    Search with quantifier failed

      datatraveller1 @Alan Kilborn

      @Alan-Kilborn I use the extended mode for hexadecimal search (\x), so for me it is still useful in some cases.

        Alan Kilborn @datatraveller1

        @datatraveller1 said in Search with quantifier failed:

        I use the extended mode for hexadecimal search (\x), so for me it is still useful in some cases.

        For the most part, you can also use Regular expression mode for that. But, as I’ve recently cautioned another poster, if you find you are searching for hex sequences often, you are probably doing something wrong with how you are approaching text editing. Of course, that’s said with no info about what you are doing.

          Terry R @Alan Kilborn

          Firstly, apologies to @Dr-N for possibly steering the posts in a wrong direction.

          Secondly, sorry to the rest of you. I had not actually tested the Extended mode (wasn’t on a PC when I posted), rather just read from the manual (which has been verified as missing important information). It also appeared (on the surface) to explain the OP’s issue.

          I have never used Extended because of the same reasoning as @Alan-Kilborn. I’m even more convinced now that it is a rubbish mode. If it’s intended to give users a leg up to full-blown regex then (I believe) it completely misses the mark.

          Given that the manual didn’t fully explain its use, anyone new who tries it will find that the meta characters, which so closely resemble regex syntax, only serve to confuse. I cannot see a reason for retaining this mode.

          My 2c worth
          Terry

            Alan Kilborn @Terry R

            @Terry-R said in Search with quantifier failed:

            My 2c worth

            So here’s my take on it.

            The history: Notepad++, pre version 6.0, had no regex search/replace mode. The author wanted to give some capability, and didn’t want to tackle a regex implementation himself (fun fact: in truth it was done by the author of the PythonScript plugin and some others). Thus “extended” mode was born.

            When regex mode DID come along, “extended” mode really wasn’t necessary, but features aren’t usually removed, because someone may be using them (and complain). Thus, it lives. And maybe some still cling to it because regular expressions terrify them. :-)

              datatraveller1 @Alan Kilborn

              @Alan-Kilborn I rarely use the hexadecimal search, e.g. to search for the annoying character “Non-breaking space, HEX A0, DEC 160”.
              -> Notepad++, extended search for \xA0
              … but you are right, this also works with the regular expression mode.
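As a rough illustration outside Notepad++, the same \xA0 escape can be tried in Python's re module, which also accepts two-digit hex escapes. The sample string below is made up for the example; it is only a sketch of the idea, not a statement about how Notepad++ implements the search.

```python
import re

# Hypothetical sample: "10", a non-breaking space (U+00A0), then "kg",
# followed by text that uses an ordinary space (U+0020).
sample = "10\u00a0kg and 5 kg"

# \xA0 in the pattern matches the single code point U+00A0 only;
# ordinary spaces are not matched.
nbsp_positions = [m.start() for m in re.finditer(r"\xA0", sample)]
print(nbsp_positions)

# Replace every non-breaking space with an ordinary space.
cleaned = re.sub(r"\xA0", " ", sample)
print(cleaned)
```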

              BTW, I was a bit confused by the \X paragraph in the manual
              https://npp-user-manual.org/docs/searching:
              “For example, the letter ǭ̳̚, with four combining characters after the o, can be found either with the regex (?-i)o\x{0304}\x{0328}\x{031a}\x{0333} or with the shorter regex \X.” -> I miss the example for the shorter regex \X?

                PeterJones @datatraveller1

                @datatraveller1 said in Search with quantifier failed:

                I miss the example for the shorter regex \X

                The entire regex is \X – it matches one letter plus all the combining characters that come after, hence it would match the o and the four combining characters shown.

                  guy038

                  Hello @dr-n, @alan-kilborn, @terry-r, @peterjones, @datatraveller1 and All,

                  I tried to merge the two posts below, in order to get a complete summary of the Extended search mode feature !

                  https://community.notepad-plus-plus.org/post/45753

                  https://community.notepad-plus-plus.org/post/24236

                  I hope I have not forgotten anything important !

                  Peter, if you get some spare time, just check here and see if some points of this post could be added / improved !


                  In the Extended search mode, in addition to the search/replacement of standard characters and the 5 specific characters below :

                  Character        Syntax
                  Tabulation       \t
                  New Line         \n
                  Carriage Return  \r
                  Backslash        \\
                  Null             \0

                  Within a Unicode-encoded file, any single character of code-point U+xxxx may be written, in the Find what: and the Replace with: zones, with one of the five syntaxes below :

                  Type         From        To          Character Range
                  Decimal      \d000       \d999       [0-9]
                  Octal        \o000       \o777       [0-7]
                  Binary       \b00000000  \b11111111  [0-1]
                  Hexadecimal  \x00        \xFF        [0-9A-Fa-f]
                  Unicode      \u0000      \uFFFF      [0-9A-Fa-f]

                  Consequence :

                  The character with the greatest Unicode code-point which can be searched and/or replaced, in Extended mode, is :

                  • \d999, so the Unicode character ϧ ( COPTIC SMALL LETTER KHEI ), with code-point = \u03e7, in the decimal representation

                  • \o777, so the Unicode character ǿ ( LATIN SMALL LETTER O WITH STROKE AND ACUTE ), with code-point = \u01ff, in the octal representation

                  • \b11111111, so the Unicode character ÿ ( LATIN SMALL LETTER Y WITH DIAERESIS ), with code-point = \u00ff, in the binary representation

                  • \xFF, so the Unicode character ÿ ( LATIN SMALL LETTER Y WITH DIAERESIS ), with code-point = \u00ff, in the hexadecimal representation

                  • \uFFFD, so the Unicode character � ( REPLACEMENT CHARACTER ), with code-point = \ufffd, in the Unicode representation
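The limits listed above follow directly from the number of digits each syntax allows. A minimal Python sketch that recomputes them; the table and helper names are made up for the illustration, and the \uFFFD ceiling is simply taken from the list above:

```python
import unicodedata

# (prefix, largest digit string quoted in the list above, numeric base)
LIMITS = [
    ("\\d", "999", 10),
    ("\\o", "777", 8),
    ("\\b", "11111111", 2),
    ("\\x", "FF", 16),
    ("\\u", "FFFD", 16),
]

def describe(digits, base):
    """Return the code point label and Unicode name for a digit string."""
    cp = int(digits, base)
    return f"U+{cp:04X}", unicodedata.name(chr(cp))

for prefix, digits, base in LIMITS:
    label, name = describe(digits, base)
    print(f"{prefix}{digits:<9} {label}  {name}")
```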


                  Within an ANSI-encoded file, any single character of code-point U+00xx may be written, in the Find what: and the Replace with: zones, with one of the four syntaxes below :

                  Type         From        To          Character Range
                  Decimal      \d000       \d255       [0-9]
                  Octal        \o000       \o377       [0-7]
                  Binary       \b00000000  \b11111111  [0-1]
                  Hexadecimal  \x00        \xFF        [0-9A-Fa-f]

                  Remarks :

                  • In all cases, the character with the greatest Unicode code-point which can be searched and/or replaced is, either, \d255 or \o377 or \b11111111 or \xFF which refers to the Unicode character ÿ ( LATIN SMALL LETTER Y WITH DIAERESIS )

                  • A Unicode character, of code-point U+00xx, can be found ONLY IF xx belongs to the range [00-7F] OR to the range [A0-FF]. When xx lies between 80 and 9F, the search generally looks for the question mark ( ? ) instead, as U+00xx is then a Unicode char whose code-point is not handled by the ANSI encoding ! Only the 5 characters U+0081, U+008D, U+008F, U+0090 and U+009D, which have no glyph, are correctly searched !
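The [00-7F] / [A0-FF] split can be reproduced with a short Python sketch, assuming the ANSI code page is Windows-1252 (an assumption; other ANSI code pages behave differently). Note that Python's cp1252 codec is stricter than the behaviour described above: it also rejects U+0081, U+008D, U+008F, U+0090 and U+009D, so it only illustrates the general gap, not those five exceptions.

```python
def encodable(cp, codec="cp1252"):
    """True if the code point can be stored as a byte in the given code page."""
    try:
        chr(cp).encode(codec)
        return True
    except UnicodeEncodeError:
        return False

# Every code point U+0000..U+00FF that has no byte in Windows-1252
# (as implemented by Python's codec).
missing = [cp for cp in range(0x100) if not encodable(cp)]
print(f"U+{missing[0]:04X} .. U+{missing[-1]:04X} ({len(missing)} code points)")
```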


                  Examples ( With the Match case option ticked and the Match whole word only option UN-ticked ) :

                  • If you search for the uppercase letter A, you can choose, either, the syntax \d065 or \o101 or \b01000001 or \x41 or \u0041

                  • And if you look for the character with decimal code 201 ( É ), type in, either, the syntax \d201 or \o311 or \b11001001 or \xC9 or \u00C9

                  • Of course, you may mix all these representations, in both the Search and Replace zones. For instance, the text \d065\o102\b01000011Z\x44\u0045 represents the simple string ABCZDE
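To check such mixed strings by hand, the escapes described in the tables above can be decoded with a small Python sketch. The function name decode_extended is hypothetical, and the handling of malformed escapes (they are left untouched here) is an assumption, not a description of what Notepad++ does.

```python
import re

# Single-character escapes from the first table.
_SIMPLE = {"t": "\t", "n": "\n", "r": "\r", "\\": "\\", "0": "\0"}

# Numeric base for each numeric escape letter.
_BASES = {"d": 10, "o": 8, "b": 2, "x": 16, "u": 16}

# One alternative per syntax, each with the fixed digit count from the tables.
_TOKEN = re.compile(
    r"\\(?:d(?P<d>[0-9]{3})"
    r"|o(?P<o>[0-7]{3})"
    r"|b(?P<b>[01]{8})"
    r"|x(?P<x>[0-9A-Fa-f]{2})"
    r"|u(?P<u>[0-9A-Fa-f]{4})"
    r"|(?P<s>[tnr\\0]))"
)

def decode_extended(pattern):
    """Translate Extended-mode escapes into the characters they stand for."""
    def repl(match):
        kind = match.lastgroup  # name of the alternative that matched
        if kind == "s":
            return _SIMPLE[match.group("s")]
        return chr(int(match.group(kind), _BASES[kind]))
    return _TOKEN.sub(repl, pattern)

print(decode_extended(r"\d065\o102\b01000011Z\x44\u0045"))
```

The binary form needs all eight digits, so \b01000001 decodes to A while a seven-digit \b1000001 is left as literal text by this sketch.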

                  Remark : Depending on the End of Line character(s) used in your current file, as indicated in the status bar ( \r\n for a Windows file, \n for a Unix file, and \r for a Mac file ), you can search and/or replace text containing line break(s). For instance :

                  • The search, in the Extended or Regular expression search mode, of the string ABC\r\n123 and the replacement by the string Word\r\nNumber, in a Windows file, would change the two lines :
                  ABC
                  123
                  

                  into the text :

                  Word
                  Number
                  
                  • This same S/R, in a Unix file, could be performed with the searched string ABC\n123 and the replaced string Word\nNumber

                  • With the following Windows file :

                  Line_1
                  Line_2
                  Line_3
                  Line_4
                  Line_5
                  Line_6
                  Line_7
                  Line_8
                  Line_9
                  

                  You could, perfectly, in Extended or Regular expression mode, use the following S/R :

                  • SEARCH Line_1\r\nLine_2\r\nLine_3\r\nLine_4\r\nLine_5\r\nLine_6\r\nLine_7\r\nLine_8\r\nLine_9\r\n

                  • REPLACE Modified Line #1\r\nModified Line #2\r\nModified Line #3\r\nModified Line #4\r\nModified Line #5\r\nModified Line #6\r\nModified Line #7\r\nModified Line #8\r\nModified Line #9\r\n

                  And get the text :

                  Modified Line #1
                  Modified Line #2
                  Modified Line #3
                  Modified Line #4
                  Modified Line #5
                  Modified Line #6
                  Modified Line #7
                  Modified Line #8
                  Modified Line #9
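The dependence on the file's End of Line characters can be seen in a plain Python sketch of the ABC / 123 replacement above. The helper name replace_block is made up; the point is only that the search string must contain the same line break as the file.

```python
def replace_block(text, eol):
    """Replace the two lines ABC / 123 by Word / Number, using the given EOL."""
    search = f"ABC{eol}123"
    replacement = f"Word{eol}Number"
    return text.replace(search, replacement)

windows_text = "ABC\r\n123\r\n"   # Windows file: \r\n line breaks
unix_text = "ABC\n123\n"          # Unix file: \n line breaks

print(repr(replace_block(windows_text, "\r\n")))
print(repr(replace_block(unix_text, "\n")))

# Searching a Unix file with the Windows form finds nothing,
# so the text comes back unchanged.
print(repr(replace_block(unix_text, "\r\n")))
```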
                  

                  The nice trick, with the search dialog, is that you DON’T need to separate the text of each line, with the End of Line characters \r\n :

                  • Select, first, the original 9-lines text

                  • Open the Replace dialog ( Ctrl + H )

                  => The entire searched text is automatically filled

                  Unfortunately, you CANNOT use this same work-around, for the replacement dialog -:(( So, you’ll still have to type all the text, below :

                  Modified Line #1\r\nModified Line #2\r\nModified Line #3\r\nModified Line #4\r\nModified Line #5\r\nModified Line #6\r\nModified Line #7\r\nModified Line #8\r\nModified Line #9\r\n
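Rather than typing that string by hand, it can be generated from the multi-line replacement text. A minimal sketch in Python, with a hypothetical helper to_extended; it assumes backslashes and tabs are the only other characters needing escapes and that the file uses \r\n unless told otherwise.

```python
def to_extended(text, eol=r"\r\n"):
    """Turn multi-line text into a one-line Extended-mode replacement string."""
    escaped = [
        # Double the backslashes first, then spell tabs as \t.
        line.replace("\\", "\\\\").replace("\t", "\\t")
        for line in text.splitlines()
    ]
    result = eol.join(escaped)
    # Keep the final line break if the original text ended with one.
    if text.endswith(("\n", "\r")):
        result += eol
    return result

replacement_text = "".join(f"Modified Line #{n}\n" for n in range(1, 10))
print(to_extended(replacement_text))
```

The printed line can then be pasted into the Replace with: zone; pass eol=r"\n" for a Unix file.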


                  Note that, WHATEVER the search mode used :

                  • Do not exceed 2046 characters for, both, the Search and the Replace zones. Anyway, any surplus character is simply ignored !

                  • It may be worth checking the Match case option, in order to differentiate between upper and lower case letters

                  • I strongly advise you to uncheck the Match whole word only option, especially when the searched string begins and/or ends with a NON-word character

                  • The search of individual bytes of an UTF-8 or UCS-2 encoded character is not allowed !

                  • The replacement zone may contain any char, except for the NUL char ( \0 ), whatever its representation ( \0, \d000, \o000, \b00000000, \x00 or \u0000 )

                  Best Regards,

                  guy038

                  P.S. :

                  • Personally, I think that the only advantage of using the Extended mode is when using the \dxxx syntax, where xxx represents the decimal code of the character :

                    • Between 000 and 255 ( so in range U+0000 - U+00FF) within a UTF-8 or UCS-2 encoded file

                    • Between 000 and 127 or between 160 and 255 ( so in ranges U+0000 - U+007F or U+00A0 - U+00FF ) within an ANSI file

                  In all other cases, just prefer the Regular expression search mode ;-))


                  • For information, about the Extended search mode, you may also refer to this old article, in N++ Wiki, via the web.archive site :

                  https://web.archive.org/web/20190609210114/http://docs.notepad-plus-plus.org/index.php/Searching_And_Replacing#Escape_sequences_supported_in_extended_mode


                  • Finally, @peterjones, if you need to refer to the old N++ Wiki, here are some links, via the WayBack Machine site :

                  https://web.archive.org/web/20190719202854/http://docs.notepad-plus-plus.org/index.php/Main_Page

                  https://web.archive.org/web/20190719202854/http://docs.notepad-plus-plus.org/index.php/Category:Keywords

                  https://web.archive.org/web/20190719202854/http://docs.notepad-plus-plus.org/index.php/Category:Short_Title(All)

                    datatraveller1 @PeterJones

                    @PeterJones said in Search with quantifier failed:

                    The entire regex is \X – it matches one letter plus all the combining characters that come after, hence it would match the o and the four combining characters shown.

                    Sorry, but I still don’t understand the manual text:
                    “For example, the letter ǭ̳̚, with four combining characters after the o, can be found either with the regex (?-i)o\x{0304}\x{0328}\x{031a}\x{0333} or with the shorter regex \X.”

                    -> \X seems to find any letter but the text implies \X is an alternative to find exactly ǭ̳̚?

                    … So is \X a better alternative than a dot in a regular expression to find a letter, because a dot finds the four combining characters and \X the one whole letter?

                      PeterJones @datatraveller1

                      Correction: I earlier said from memory,

                      it matches one letter plus all the combining characters that come after

                      when I should have said

                      it matches one character plus all the combining characters that come after

                      It was evening, I was tired, and I was typing on my phone from memory without the manual or a copy of Notepad++ in front of me.

                      @datatraveller1 said in Search with quantifier failed:

                      \X seems to find any letter but the text implies \X is an alternative to find exactly ǭ̳̚?

                      You read it differently than it was intended. The manual literally says “Matches a single non-combining character followed by any number of combining characters” and that’s exactly what it matches. It doesn’t matter whether that character is o or a or Z or whatever. It matches one character, along with all the modifiers that come next. Just like \u matches one uppercase letter, or \l matches one lowercase letter, or \R matches either \r or \n or \r\n, the \X regex will match a character plus all the combining characters that come next.

                      In this example text, I have o followed by those four modifiers, a followed by those four modifiers, and _ and : each followed by those same four modifiers:

                      ǭ̳̚
                      
                      ą̳̄̚
                      
                      _̨̳̄̚
                      
                      :̨̳̄̚
                      

                      [Screenshot: \X search reporting 20 matches]

                      You will see that it matches all four of those sequences.

                      It says 20, because it also matches each of the characters of the newlines in between, each of which is 1 character followed by 0 modifying characters.

                      So is \X a better alternative than a dot in a regular expression to find a letter, because a dot finds the four combining characters and \X the one whole letter?

                      Depending on how one interprets your phrasing, that’s either right, or literally the opposite of what happens: if you mean that ǭ̳̚ with all the modifiers is “one whole letter” and that “dot finds the four combining characters” means that the dot matches each combining character independently with four separate matches, then yes, you are right. If you meant it the way I first read it, where it means “the single dot matches all four combining characters at once, whereas the \X just finds the letter”, then it’s exactly the opposite of what happens.

                      Dot matches a single character – this could be the initial character, or any of the four modifiers. \X matches the character plus all the modifiers in one unit. So the screenshot above with the \X showed 20 matches, whereas the . will show 36 (one for each character).

                      [Screenshot: . search reporting 36 matches]
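The 20 versus 36 counts can be reproduced outside Notepad++ with a small Python sketch. Python's built-in re module has no \X, so the grouping rule quoted from the manual (one non-combining character followed by any number of combining characters) is emulated here with unicodedata.combining. The sample is rebuilt from the four code points in the manual's regex, with \r\n line breaks and a blank line after each sequence, which is an assumption about the test file.

```python
import unicodedata

# The four combining characters from the manual's example regex.
MARKS = "\u0304\u0328\u031a\u0333"

def clusters(text):
    """Group each character with the combining characters that follow it."""
    groups = []
    for ch in text:
        if groups and unicodedata.combining(ch):
            groups[-1] += ch      # combining mark: attach to the previous group
        else:
            groups.append(ch)     # any other character starts a new group
    return groups

# o, a, _ and : each followed by the four marks, then two Windows line breaks.
sample = "".join(base + MARKS + "\r\n\r\n" for base in "oa_:")

print(len(clusters(sample)))  # groups, the analogue of \X matches
print(len(sample))            # single characters, the analogue of . matches
```

Each \r and each \n forms its own group here, which is what makes the totals line up with the counts quoted above.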

                      I really didn’t think there was ambiguity in the phrasing, but I will try to clarify it some more.

                        datatraveller1 @PeterJones

                        @PeterJones Thank you very much indeed!
