    Entering curly quote marks as UDL operators, or keywords

    • Ian Oh

      Hi

      When I enter “” or ‘’ (the curly counterparts of the quotation marks on our keyboards) into any of the Operators fields of the UDL, to help me find unbalanced curly quotes, they do not register. Entering the straight quotes works: I can clearly see where the text is missing a closing quote. Is there a way for Notepad++ UDL to handle curly quotes?

      Thanks.

      • PeterJones @Ian Oh

        @Ian-Oh ,

        The UDL implementation isn’t very strong when it comes to non-ASCII Unicode characters. There have been many bug-reports/feature-requests to improve Unicode handling in the UDL, but none of them have been addressed. Sorry.

        However, you might be able to get something good enough for your purposes by using the Search > Mark dialog:

        "normal"
        “blah blah”
        “imbalanced open “blah”
        “imbalanced close” blah”
        

        FIND = “[^”]*“[^”]*”|“[^”]*”[^“]*”, purge each search, search-mode = regular expression

        That find looks for (1) an open double-quote followed by 0 or more characters that aren’t a close, followed by another open, followed by 0 or more characters that aren’t a close, followed by a close (thus finding an extra imbalanced open quote); OR (2) open followed by 0-or-more non-close, followed by close, followed by 0-or-more non-open, followed by another close (thus finding an extra imbalanced close quote).
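For anyone who wants to try the same expression outside Notepad++, here is a minimal sketch in Python. It assumes Python's standard `re` module, which is not the regex engine Notepad++ uses, so treat it as an approximation; the names `IMBALANCED`, `sample` and `flagged` are made up for this illustration. Each line is tested separately.

```python
import re

OPEN, CLOSE = "\u201c", "\u201d"  # the curly quotes “ and ”

# Same expression as the FIND field above, assembled from the two quote characters.
IMBALANCED = re.compile(
    f"{OPEN}[^{CLOSE}]*{OPEN}[^{CLOSE}]*{CLOSE}"    # (1) an extra open quote
    f"|{OPEN}[^{CLOSE}]*{CLOSE}[^{OPEN}]*{CLOSE}"   # (2) an extra close quote
)

sample = [
    '"normal"',
    "“blah blah”",
    "“imbalanced open “blah”",
    "“imbalanced close” blah”",
]

# Keep only the lines where the expression finds something to mark.
flagged = [line for line in sample if IMBALANCED.search(line)]
for line in flagged:
    print(line)
```

With the four sample lines, only the last two are flagged, which is what the Mark dialog highlights.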

        • Ian Oh @PeterJones

          @PeterJones

          Wow! That’s a thing of beauty… or as we’d say down under … Ubewdy! Thank you very much.

          And thanks for explaining the limitations of UDL with non-ASCII characters.

          • guy038

            Hello, @ian-oh, @peterjones, @alan-kilborn and All,

            Sorry for not being very responsive, but I am currently on vacation and only check our forum occasionally !

            Before presenting the regexes which allow this kind of search, paste the following five lines into a new tab and let’s study some concepts. Note that line 5 contains a period ( full stop ).

            Start blah
            “blah blah”
            “imbalanced open “blah”
            “imbalanced close” blah”
            This is a ““Te”st”“““abc”de”fgh. Ijkl”“mnop”” End
            

            If we want to find the longest areas with well-balanced delimiters “ and ”, we must first decide the total range in which to search for these areas. Let me explain :

            • Case A : We may suppose that this range is the file contents ( Default case )

            • Case B : We may suppose that this range is limited to current line contents

            • Case C : Finally, we may suppose that this range is limited to the contents of a single sentence, within the current line

            If, by convention, any text, without double curly quotes, is considered as well balanced ( zero “ char and zero ” char ), we can say that :

            • Regarding case A :

              • The multi-line area, beginning from word Start, in line 1, till the string “mnop”, in line 5, forms an area with the same number of opening and closing double curly quotes ! ( The last ”, before End, is not included )

              • Then, the final End word, preceded with a space char, is a well-balanced area, by default !

            • Regarding case B :

              • Lines 1 and 2 are well balanced

              • Line 3 contains the well balanced area imbalanced open “blah”

              • Line 4 contains the well balanced area “imbalanced close” blah

              • Line 5 contains the well balanced area This is a ““Te”st”“““abc”de”fgh. Ijkl”“mnop”

            • Regarding case C :

              • Lines 1 to 4 are identical to case B

              • Line 5 contains :

                • The well balanced area This is a ““Te”st”, in the first sentence

                • The well balanced area ““abc”de”fgh, right before the period

                • The well balanced area Ijkl, preceded with a space char, in the second sentence

                • The well balanced area “mnop”

                • The well balanced area End, preceded with a space char
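The three scopes above can be tried out with a small scanner. This is only a sketch in plain Python, not a regex, and `balanced_areas` is a hypothetical helper name. It assumes that an unmatched quote, or a character listed in `boundaries`, ends the current area, and that text without quotes counts as balanced.

```python
OPEN, CLOSE = "“", "”"

def balanced_areas(text, boundaries=""):
    """Split `text` into maximal well-balanced areas.

    Unmatched quotes and any character in `boundaries` act as separators.
    """
    separators = set()
    pending = []  # indices of open quotes still waiting for a close
    for i, ch in enumerate(text):
        if ch == OPEN:
            pending.append(i)
        elif ch == CLOSE:
            if pending:
                pending.pop()          # closes the most recent open quote
            else:
                separators.add(i)      # close quote with nothing to close
        elif ch in boundaries:
            separators.update(pending)  # opens left over at a boundary
            pending.clear()
            separators.add(i)
    separators.update(pending)          # opens left over at the end

    areas, current = [], []
    for i, ch in enumerate(text):
        if i in separators:
            if current:
                areas.append("".join(current))
            current = []
        else:
            current.append(ch)
    if current:
        areas.append("".join(current))
    return areas

line5 = "This is a ““Te”st”“““abc”de”fgh. Ijkl”“mnop”” End"
print(balanced_areas(line5))          # case B: the whole line is the range
print(balanced_areas(line5, ".!?"))   # case C: each sentence is a range
```

For line 5, the case B call returns two areas (everything up to “mnop”, then " End") and the case C call returns the five areas listed above. For case A, join all the lines with "\n" and pass the result as `text`.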


            So, if we add +1 for any opening double curly quote and -1 for any closing double curly quote, we get this table, where any • char refers to an unmatched double curly quote !

            •--------•---------------------------------------------------•
            | Line 1 | Start blah                                        |
            | Count  |                                                   |
            •--------•---------------------------------------------------•
            | Line 2 | “blah blah”                                       |
            | Count  | 1         0                                       |
            •--------•---------------------------------------------------•
            | Line 3 | “imbalanced open “blah”                           |
            | Count  | •                1    0                           |
            •--------•---------------------------------------------------•
            | Line 4 | “imbalanced close” blah”                          |
            | Count  | 1                0     •                          |
            •--------•---------------------------------------------------•
            | Line 5 | This is a ““Te”st”“““abc”de”fgh. Ijkl”“mnop”” End |
            | Count  |           12  1  0•12   1  0   .     •1    0•     |
            •--------•---------------------------------------------------•
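The Count rows of this table can be reproduced with a short script. This is a sketch in plain Python with a hypothetical helper `count_row`. It assumes the sentence scope of case C (ranges end at `.`, `!` or `?`), which is what the row for line 5 appears to use.

```python
OPEN, CLOSE = "“", "”"

def count_row(text, boundaries=".!?"):
    """Label each curly double quote with the running count, or • if unmatched."""
    # Pass 1: find the quotes that have no partner inside their range.
    unmatched, pending = set(), []
    for i, ch in enumerate(text):
        if ch == OPEN:
            pending.append(i)
        elif ch == CLOSE:
            if pending:
                pending.pop()
            else:
                unmatched.add(i)
        elif ch in boundaries:
            unmatched.update(pending)
            pending.clear()
    unmatched.update(pending)

    # Pass 2: +1 for a matched open quote, -1 for a matched close quote.
    labels, depth = [], 0
    for i, ch in enumerate(text):
        if ch not in (OPEN, CLOSE):
            continue
        if i in unmatched:
            labels.append("•")
        else:
            depth += 1 if ch == OPEN else -1
            labels.append(str(depth))
    return "".join(labels)

print(count_row("“imbalanced open “blah”"))   # •10
print(count_row("This is a ““Te”st”“““abc”de”fgh. Ijkl”“mnop”” End"))   # 1210•1210•10•
```

The labels come out in the same order as the Count row, without the spacing and without the period marker.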
            

            Thus, @ian-oh, depending on case A, B or C, you’ll run one of the following recursive regexes, written in free-spacing mode. Each one begins at (?x) and ends with the (?1)+ syntax :

            
             Case A :  (?x) (?: ( [^“”] )*        ( “ (?:  (?1)++ | (?2) )* ” ) )+ (?1)*  | (?1)+
            
             Case B :  (?x) (?: ( [^“”\r\n] )*    ( “ (?:  (?1)++ | (?2) )* ” ) )+ (?1)*  | (?1)+
            
             Case C :  (?x) (?: ( [^“”.!?\r\n] )* ( “ (?:  (?1)++ | (?2) )* ” ) )+ (?1)*  | (?1)+
                                ^                 ^
                                |                 |
            Groups  --------->  1                 2
            
            

            Notes :

            • These regexes are derived from the end of this article, in the official N++ documentation, which explains how to search for well balanced regions with parentheses :

            (?x) (?: [^()]* ( \( (?: [^()]++ | (?1) )* \) ) )+ [^()]* | [^()]+

            • For case C, I considered that a sentence ends at a period, a question mark or an exclamation mark. Add other characters to this list if necessary !

            • These regexes are mainly composed of non-capturing groups and contain only two capturing groups :

              • A non-recursive group 1 which refers to the allowed characters, for each case ( the double curly quotes are not included )

              • A recursive group 2, as one reference (?2) is located inside the group 2 itself, which adds some intelligence to the overall search by a recursive evaluation of the text

            • Note that, in case of incoherent results, it is advised to replace any (?1) syntax by its true value ( [^“”] ), or ( [^“”\r\n] ) or ( [^“”.!?\r\n] ). This may help !

            • Don’t try to perform backward searches : it won’t work !
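What the recursive group 2 does can be pictured with an ordinary recursive function. The sketch below is plain Python rather than a regex, `match_group` is a hypothetical name, and it follows the case A rules, where any character other than a curly double quote is allowed inside a group.

```python
OPEN, CLOSE = "“", "”"

def match_group(text, start):
    """Mimic group 2: return the index just past a balanced “...” group, or None."""
    if start >= len(text) or text[start] != OPEN:
        return None
    i = start + 1
    while i < len(text):
        ch = text[i]
        if ch == CLOSE:
            return i + 1                 # the closing quote of this group
        if ch == OPEN:
            end = match_group(text, i)   # a nested group: the function calls itself, like (?2)
            if end is None:
                return None              # the nested group never closes
            i = end
        else:
            i += 1                       # an ordinary character, like (?1)
    return None                          # ran out of text before the closing quote

text = "“a“b”c”d"
end = match_group(text, 0)
print(text[:end])   # “a“b”c”
```

A group whose nested part never closes, such as “a“b”, makes the function return None, just as group 2 fails to match there.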


            Here is another text of 7 lines, with a lot of double curly quotes, for additional tests of these 3 recursive regexes :

            ““““ab“““cd““ef”””gh”.ij””klm””””
            ““ab““““cd“““ef”””gh””””ijkl””””
            ““““““ab“cd“ef“”””gh”ijkl””?”mn”””””
            ““--ab“cd“ef--gh“ij--kl”mn”o!p““qr--st”uv\wx”--”yz””------”abc
            abcd--------““efghi----jk”----”lmnop------
            ““--ab”cdef--ghi.j--klmn---op““qr--stu”vwx--”yz”----
            ---abcde-----““qrs”tu--”vwxyz---
            

            If you paste these 7 lines in a new tab, you can verify that, with the case A regex, the last match is the whole well-balanced area below :

            abc
            abcd--------““efghi----jk”----”lmnop------
            ““--ab”cdef--ghi.j--klmn---op““qr--stu”vwx--”yz”----
            ---abcde-----““qrs”tu--”vwxyz---
            

            Which contains, exactly, eight “ opening characters and eight ” closing characters !

            Best Regards

            guy038
