
    Search with quantifier failed

      Terry R @Dr. N

      @Dr-N said in Search with quantifier failed:

      I know both are equivalent.

      No, they aren’t equivalent; the statement was “works equivalently on your sample text.” Your 2nd regex matches one or two digits, preferring two when available. Your first regex matches exactly two digits every time, so there is a subtle difference.
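To make the difference concrete, here is a minimal sketch in Python rather than Notepad++. The OP's actual expressions are not quoted in this part of the thread, so `\d{2}` and `\d{1,2}` are assumed stand-ins for "two digits" and "one or two digits", and the sample text is made up:

```python
import re

# Hypothetical sample text; the OP's real data is not shown in the thread.
text = "7 42 123"

# Exactly two digits: a lone digit never matches.
exactly_two = re.findall(r"\d{2}", text)

# One or two digits, greedy: takes two when available, otherwise one.
one_or_two = re.findall(r"\d{1,2}", text)

print(exactly_two)  # ['42', '12']
print(one_or_two)   # ['7', '42', '12', '3']
```

On text where every number has exactly two digits, both patterns return the same matches, which is the sense in which they "work equivalently on your sample text".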

      You still have not explained how you know the 2nd regex has failed. What did it fail to detect?

      Terry

        Terry R @Dr. N

        @Dr-N said in Search with quantifier failed:

        But strangely the second won’t work.

        I had a thought. Are you searching with the search mode set to “extended”? \d is accepted in this mode but \d{1,2} is not, or at least not as you expected: the {1,2} is treated as literal characters, not as a quantifier for the \d. Since you mentioned “regex”, we have assumed your search mode is set to “regular expression”.

        Terry

          Alan Kilborn @Terry R

          @Terry-R said in Search with quantifier failed:

          Are you searching with the search mode set as “extended”? Because \d is accepted in this mode

          I do not find this to be the case.

            PeterJones @Terry R

            @Terry-R said in Search with quantifier failed:

            search mode set as “extended”? Because \d is accepted in this mode

            \d isn’t accepted alone. It needs to be followed by three decimal digits, which together give the three-digit decimal value of an ANSI code point. In other words, \d in extended mode does not refer to “any digit character” like it does in Regular Expression mode.

            https://npp-user-manual.org/docs/searching/#extended-search-mode

            Though that section obviously needs clarification. It needs the dagger, and the \o, \d, and \x entries all need examples and expansion. Sheesh; who was the user manual editor who let that section stand as-is? ;-(

              Alan Kilborn @PeterJones

              @PeterJones said in Search with quantifier failed:

              It needs the dagger, and the \o, \d, and \x entries all need examples and expansion. Sheesh; who was the user manual editor who let that section stand as-is? ;-(

              What’s the “dagger”?

              Personally I don’t mind that that manual section is not well-rounded-out. In my mind Extended mode has always been a crippled mode I never use. I would guess the history is that it was a poor-man’s regex mode before regex mode was implemented.

                PeterJones @Alan Kilborn

                @Alan-Kilborn ,

                Dagger † U+2020 as seen in the Extended Mode Docs:
                7e250a4c-4840-4407-b336-3dacfc19bd96-image.png

                It may be a crippled mode, but it should still be correctly documented. The docs as written don’t clarify well enough that \d requires three decimal digits, nor that it is not the same as the \d from regex mode. Other entries in the list use the dagger notation to reference the note about being different from the regex syntax that looks similar.

                  datatraveller1 @Alan Kilborn

                  @Alan-Kilborn I use the extended mode for hexadecimal search (\x), so for me it is still useful in some cases.

                    Alan Kilborn @datatraveller1

                    @datatraveller1 said in Search with quantifier failed:

                    I use the extended mode for hexadecimal search (\x), so for me it is still useful in some cases.

                    For the most part, you can also use Regular expression mode for that. But, as I’ve recently cautioned another poster, if you find you are searching for hex sequences often, you are probably doing something wrong with how you are approaching text editing. Of course, that’s said with no info about what you are doing.

                      Terry R @Alan Kilborn

                      Firstly, apologies to @Dr-N for possibly steering the posts in a wrong direction.

                      Secondly, sorry to the rest of you. I had not actually tested the Extended mode (wasn’t on a PC when I posted), rather just read from the manual (which has been verified as missing important information). It also appeared (on the surface) to explain the OP’s issue.

                      I have never used Extended because of the same reasoning as @Alan-Kilborn. I’m even more convinced now that it is a rubbish mode. If it’s intended to give users a leg up to full blown regex then (I believe) it completely misses the mark.

                      Given that the manual doesn’t fully explain its use, the meta characters, which so closely resemble regex code, will only serve to confuse anyone new who tries that mode. I cannot see a reason for retaining this mode.

                      My 2c worth
                      Terry

                        Alan Kilborn @Terry R

                        @Terry-R said in Search with quantifier failed:

                        My 2c worth

                        So here’s my take on it.

                        The history: Notepad++, pre version 6.0, had no regex search/replace mode. The author wanted to give some capability, and didn’t want to tackle a regex implementation himself (fun fact: in truth it was done by the author of the PythonScript plugin and some others). Thus “extended” mode was born.

                        When regex mode DID come along, “extended” mode really wasn’t necessary, but features aren’t usually removed, because someone may be using them (and complain). Thus, it lives. And maybe some still cling to it because regular expressions terrify them. :-)

                          datatraveller1 @Alan Kilborn

                          @Alan-Kilborn I rarely use the hexadecimal search, e.g. to search for the annoying character “Non-breaking space, HEX A0, DEC 160”.
                          -> Notepad++, extended search for \xA0
                          … but you are right, this also works with the regular expression mode.
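As a rough illustration of the regex-mode equivalent, here is a small Python sketch (Python's `re` module, not Notepad++'s regex engine, and the sample string is invented). The same `\xA0` escape finds the non-breaking space and lets you replace it with an ordinary space:

```python
import re

# Hypothetical sample containing two non-breaking spaces (U+00A0).
text = "price:\u00a0100\u00a0EUR and 200 USD"

nbsp = re.compile(r"\xA0")  # two-digit hex escape for HEX A0 / DEC 160

positions = [m.start() for m in nbsp.finditer(text)]
cleaned = nbsp.sub(" ", text)  # replace each NBSP with a plain space

print(positions)  # [6, 10]
print(cleaned)    # price: 100 EUR and 200 USD
```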

                          BTW, I was a bit confused by the \X paragraph in the manual
                          https://npp-user-manual.org/docs/searching:
                          “For example, the letter ǭ̳̚, with four combining characters after the o, can be found either with the regex (?-i)o\x{0304}\x{0328}\x{031a}\x{0333} or with the shorter regex \X.” -> I miss the example for the shorter regex \X?

                            PeterJones @datatraveller1

                            @datatraveller1 said in Search with quantifier failed:

                            I miss the example for the shorter regex \X

                            The entire regex is \X: it matches one letter plus all the combining characters that come after, hence it would match the o and the four combining characters shown.

                              guy038

                              Hello @dr-n, @alan-kilborn, @terry-r, @peterjones, @datatraveller1 and All,

                              I tried to merge the two posts below, in order to get a complete summary of the Extended search mode feature !

                              https://community.notepad-plus-plus.org/post/45753

                              https://community.notepad-plus-plus.org/post/24236

                              I hope I have not forgotten anything important !

                              Peter, if you get some spare time, just check here and see if some points of this post could be added / improved !


                              In the Extended search mode, you can search for and replace standard characters and the 5 specific characters below:

                              Character         Syntax
                              Tabulation        \t
                              New Line          \n
                              Carriage Return   \r
                              Backslash         \\
                              Null              \0

                              Within a Unicode-encoded file, any single character of code point U+xxxx may be written, in the Find what: and the Replace with: zones, with one of the five syntaxes below:

                              Type          From         To           Character Range
                              Decimal       \d000        \d999        [0-9]
                              Octal         \o000        \o777        [0-7]
                              Binary        \b00000000   \b11111111   [0-1]
                              Hexadecimal   \x00         \xFF         [0-9A-Fa-f]
                              Unicode       \u0000       \uFFFF       [0-9A-Fa-f]

                              Consequence :

                              The character with the greatest Unicode code point that can be searched for and/or replaced in Extended mode depends on the representation used:

                              • \d999, so the Unicode character ϧ ( COPTIC SMALL LETTER KHEI ), with code-point = \u03e7, in the decimal representation

                              • \o777, so the Unicode character ǿ ( LATIN SMALL LETTER O WITH STROKE AND ACUTE ), with code-point = \u01ff, in the octal representation

                              • \b11111111, so the Unicode character ÿ ( LATIN SMALL LETTER Y WITH DIAERESIS ), with code-point = \u00ff, in the binary representation

                              • \xFF, so the Unicode character ÿ ( LATIN SMALL LETTER Y WITH DIAERESIS ), with code-point = \u00ff, in the hexadecimal representation

                              • \uFFFD, so the Unicode character � ( REPLACEMENT CHARACTER ), with code-point = \ufffd, in the Unicode representation


                              Within an ANSI-encoded file, any single character of code point U+00xx may be written, in the Find what: and the Replace with: zones, with one of the four syntaxes below:

                              Type          From         To           Character Range
                              Decimal       \d000        \d255        [0-9]
                              Octal         \o000        \o377        [0-7]
                              Binary        \b00000000   \b11111111   [0-1]
                              Hexadecimal   \x00         \xFF         [0-9A-Fa-f]

                              Remarks :

                              • In all cases, the character with the greatest Unicode code point that can be searched for and/or replaced is \d255, \o377, \b11111111 or \xFF, all of which refer to the Unicode character ÿ ( LATIN SMALL LETTER Y WITH DIAERESIS )

                              • A Unicode character of code point U+00xx can be found ONLY IF xx belongs to the range [00-7F] OR to the range [A0-FF]. When xx lies between 80 and 9F, the search generally looks for the question mark ( ? ) instead, because it refers to a Unicode char whose code point is not handled by the ANSI encoding! Only the 5 characters U+0081, U+008D, U+008F, U+0090 and U+009D, which have no glyph, are searched correctly!
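The 80-9F gap can be explored with a small Python sketch. It assumes that "ANSI" here means Windows-1252 (`cp1252`); other system code pages behave differently, and this uses Python's codec tables, not Notepad++ itself. In that code page, most byte values from 80 to 9F stand for characters far above U+00FF, and exactly five have no assigned character at all:

```python
# Assumption: "ANSI" means Windows-1252 (cp1252); other code pages differ.
unmapped = []       # byte values with no cp1252 character assigned
beyond_latin1 = []  # byte values whose cp1252 character lies above U+00FF

for byte in range(0x80, 0xA0):
    try:
        char = bytes([byte]).decode("cp1252")
    except UnicodeDecodeError:
        unmapped.append(byte)
        continue
    if ord(char) > 0xFF:
        beyond_latin1.append(byte)

print([f"{b:02X}" for b in unmapped])  # ['81', '8D', '8F', '90', '9D']
print(len(beyond_latin1))              # 27
```

Under that assumption, the five unassigned byte values coincide with the five code points listed above, and byte 80, for instance, is the euro sign U+20AC rather than U+0080.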


                              Examples ( With the Match case option ticked and the Match whole word only option UN-ticked ) :

                              • If you search for the uppercase letter A, you can choose any of the syntaxes \d065, \o101, \b01000001, \x41 or \u0041

                              • And if you look for the character with decimal code 201 ( É ), type in any of the syntaxes \d201, \o311, \b11001001, \xC9 or \u00C9

                              • Of course, you may mix all these representations, in both the Search and the Replace zones. For instance, the text \d065\o102\b01000011Z\x44\u0045 represents the simple string ABCZDE
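The escape table can be sketched as a small decoder in Python. This is an illustration of the syntax described above, not Notepad++'s actual implementation: `decode_extended` is a hypothetical name, and leaving malformed escapes untouched is an assumption, since the thread does not say what Notepad++ does with them.

```python
import re

# One alternative per escape type, each with its fixed digit count.
_ESCAPE = re.compile(
    r"\\(?:d([0-9]{3})"        # \dNNN       decimal
    r"|o([0-7]{3})"            # \oNNN       octal
    r"|b([01]{8})"             # \bNNNNNNNN  binary
    r"|x([0-9A-Fa-f]{2})"      # \xNN        hexadecimal
    r"|u([0-9A-Fa-f]{4})"      # \uNNNN      Unicode
    r"|([tnr0\\]))"            # \t \n \r \0 \\
)
_BASES = (10, 8, 2, 16, 16)
_SIMPLE = {"t": "\t", "n": "\n", "r": "\r", "0": "\0", "\\": "\\"}

def decode_extended(text):
    """Expand Extended-mode escapes; anything malformed is left as typed (assumption)."""
    def expand(match):
        for group, base in enumerate(_BASES, start=1):
            digits = match.group(group)
            if digits is not None:
                return chr(int(digits, base))
        return _SIMPLE[match.group(6)]
    return _ESCAPE.sub(expand, text)

print(decode_extended(r"\d065\o102\b01000011Z\x44\u0045"))  # ABCZDE
print(decode_extended(r"\d{1,2}"))                          # \d{1,2}
```

The second call shows the point made earlier in the thread: without three decimal digits after it, `\d` is not a valid escape here, so `\d{1,2}` never acts as "one or two digits".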

                              Remark: Depending on the End of Line character(s) used in your current file, as indicated in the status bar ( \r\n for a Windows file, \n for a Unix file, and \r for a Mac file ), you can search for and/or replace text containing line break(s). For instance:

                              • Searching, in the Extended or Regular expression search mode, for the string ABC\r\n123 and replacing it with the string Word\r\nNumber, in a Windows file, would change the two lines:
                              ABC
                              123

                              into the text:

                              Word
                              Number
                              
                              • This same S/R, in a Unix file, could be performed with the searched string ABC\n123 and the replaced string Word\nNumber

                              • With the following Windows file :

                              Line_1
                              Line_2
                              Line_3
                              Line_4
                              Line_5
                              Line_6
                              Line_7
                              Line_8
                              Line_9
                              

                              You could perfectly well use the following S/R, in Extended or Regular expression mode:

                              • SEARCH Line_1\r\nLine_2\r\nLine_3\r\nLine_4\r\nLine_5\r\nLine_6\r\nLine_7\r\nLine_8\r\nLine_9\r\n

                              • REPLACE Modified Line #1\r\nModified Line #2\r\nModified Line #3\r\nModified Line #4\r\nModified Line #5\r\nModified Line #6\r\nModified Line #7\r\nModified Line #8\r\nModified Line #9\r\n

                              And get the text :

                              Modified Line #1
                              Modified Line #2
                              Modified Line #3
                              Modified Line #4
                              Modified Line #5
                              Modified Line #6
                              Modified Line #7
                              Modified Line #8
                              Modified Line #9
                              

                              The nice trick, with the search dialog, is that you DON’T need to separate the text of each line, with the End of Line characters \r\n :

                              • Select, first, the original 9-lines text

                              • Open the Replace dialog ( Ctrl + H )

                              => The entire searched text is automatically filled

                              Unfortunately, you CANNOT use this same work-around, for the replacement dialog -:(( So, you’ll still have to type all the text, below :

                              Modified Line #1\r\nModified Line #2\r\nModified Line #3\r\nModified Line #4\r\nModified Line #5\r\nModified Line #6\r\nModified Line #7\r\nModified Line #8\r\nModified Line #9\r\n
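If typing that by hand is tedious, the replacement string can be generated outside Notepad++ and pasted in. A minimal Python sketch, where `to_extended` is a hypothetical helper and Windows line endings are assumed:

```python
def to_extended(lines, eol=r"\r\n"):
    """Join lines into one Replace-field string, with a literal EOL escape after each line."""
    # Double any backslash in the text so it stays a literal backslash.
    return "".join(line.replace("\\", "\\\\") + eol for line in lines)

replacement = to_extended(f"Modified Line #{n}" for n in range(1, 10))
print(replacement)
```

For a Unix file, pass `eol=r"\n"` instead.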


                              Note that, WHATEVER the search mode used :

                              • Do not exceed 2046 characters in either the Search or the Replace zone. Any surplus character is simply ignored!

                              • It may be worth checking the Match case option, in order to differentiate between upper and lower case letters

                              • I strongly advise you to uncheck the Match whole word only option, especially when the searched string begins and/or ends with a NON-word character

                              • Searching for individual bytes of a UTF-8 or UCS-2 encoded character is not allowed!

                              • The replacement zone may contain any char, except for the NUL char ( \0 ), whatever its representation ( \0, \d000, \o000, \b00000000, \x00 or \u0000 )

                              Best Regards,

                              guy038

                              P.S. :

                              • Personally, I think that the only advantage of using the Extended mode is when using the \dxxx syntax, where xxx represents the decimal code of the character :

                                • Between 000 and 255 ( so in range U+0000 - U+00FF) within a UTF-8 or UCS-2 encoded file

                                • Between 000 and 127 or between 160 and 255 ( so in ranges U+0000 - U+007F or U+00A0 - U+00FF ) within an ANSI file

                              In all other cases, just prefer the Regular expression search mode ;-))


                              • For information, about the Extended search mode, you may also refer to this old article, in N++ Wiki, via the web.archive site :

                              https://web.archive.org/web/20190609210114/http://docs.notepad-plus-plus.org/index.php/Searching_And_Replacing#Escape_sequences_supported_in_extended_mode


                              • Finally, @peterjones, if you need to refer to the old N++ Wiki, here are some links, via the WayBack Machine site :

                              https://web.archive.org/web/20190719202854/http://docs.notepad-plus-plus.org/index.php/Main_Page

                              https://web.archive.org/web/20190719202854/http://docs.notepad-plus-plus.org/index.php/Category:Keywords

                              https://web.archive.org/web/20190719202854/http://docs.notepad-plus-plus.org/index.php/Category:Short_Title(All)

                                datatraveller1 @PeterJones

                                @PeterJones said in Search with quantifier failed:

                                The entire regex is \X: it matches one letter plus all the combining characters that come after, hence it would match the o and the four combining characters shown.

                                Sorry, but I still don’t understand the manual text:
                                “For example, the letter ǭ̳̚, with four combining characters after the o, can be found either with the regex (?-i)o\x{0304}\x{0328}\x{031a}\x{0333} or with the shorter regex \X.”

                                -> \X seems to find any letter but the text implies \X is an alternative to find exactly ǭ̳̚?

                                … So is \X a better alternative than a dot in a regular expression to find a letter, because a dot finds the four combining characters and \X the one whole letter?

                                  PeterJones @datatraveller1

                                  Correction: I earlier said from memory,

                                  it matches one letter plus all the combining characters that come after

                                  when I should have said

                                  it matches one character plus all the combining characters that come after

                                  It was evening, I was tired, and I was typing on my phone from memory without the manual or a copy of Notepad++ in front of me.

                                  @datatraveller1 said in Search with quantifier failed:

                                  \X seems to find any letter but the text implies \X is an alternative to find exactly ǭ̳̚?

                                  You read it differently than it was intended. The manual literally says “Matches a single non-combining character followed by any number of combining characters” and that’s exactly what it matches. It doesn’t matter whether that character is o or a or Z or whatever. It matches one character, along with all the modifiers that come next. Just like \u matches one uppercase letter, or \l matches one lowercase letter, or \R matches either \r or \n or \r\n, the \X regex will match a character plus all the combining characters that come next.

                                  In this example text, I have o followed by those four modifiers, a followed by those four modifiers, and _ and : each followed by those four modifiers:

                                  ǭ̳̚
                                  
                                  ą̳̄̚
                                  
                                  _̨̳̄̚
                                  
                                  :̨̳̄̚
                                  

                                  23785759-0ea4-4d0e-8bda-07646ba20eae-image.png

                                  You will see that it matches all four of those sequences.

                                  It says 20, because it also matches each of the bytes of the newlines in between, each of which is 1 character followed by 0 modifying characters.

                                  So is \X a better alternative than a dot in a regular expression to find a letter, because a dot finds the four combining characters and \X the one whole letter?

                                  Depending on how one interprets your phrasing, that’s either right, or literally the opposite of what happens: if you mean that ǭ̳̚ with all the modifiers is “one whole letter” and that “dot finds the four combining characters” means that the dot matches each combining character independently with four separate matches, then yes, you are right. If you meant it the way I first read it, where it means “the single dot matches all four combining characters at once, whereas the \X just finds the letter”, then it’s exactly the opposite of what happens.

                                  Dot matches a single character – this could be the initial character, or any of the four modifiers. \X matches the character plus all the modifiers in one unit. So the screenshot above with the \X showed 20 matches, whereas the . will show 36 (one for each character).

                                  148eb546-30c2-49e9-8084-73798e049fa7-image.png
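The 20 versus 36 counts can be reproduced with a rough Python sketch. Python's built-in `re` module has no `\X`, so `split_clusters` below is a hypothetical helper that emulates the manual's wording ("a single non-combining character followed by any number of combining characters"). The sample assumes four sequences, each followed by two Windows line endings, which is a guess at the screenshot's text that happens to give the same totals:

```python
import unicodedata

def split_clusters(text):
    """Group each character with the combining marks (category M*) that follow it."""
    clusters = []
    for ch in text:
        if clusters and unicodedata.category(ch).startswith("M"):
            clusters[-1] += ch   # a combining mark joins the previous character
        else:
            clusters.append(ch)  # anything else starts a new cluster
    return clusters

MARKS = "\u0304\u0328\u031a\u0333"  # the four combining characters from the manual
EOL = "\r\n\r\n"                    # assumed: a blank line after each sequence
sample = "".join(base + MARKS + EOL for base in "oa_:")

clusters = split_clusters(sample)
print(len(clusters))  # 20: 4 sequences + 16 newline characters, like \X
print(len(sample))    # 36: every code point counted on its own, like the dot
```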

                                  I really didn’t think there was ambiguity in the phrasing, but I will try to clarify it some more.

                                    datatraveller1 @PeterJones

                                    @PeterJones Thank you very much indeed!
