    Find - Replace

    • PeterJones @Kendall DeMott

      @kendall-demott

      Notepad++ Find/Replace works just fine on quotes with numbers inside.

      c084f747-9e6a-427b-b923-9a0d0a277a3f-image.png

      You will have to give a little more detail if you are having difficulty. (Please make sure that if your XML has ASCII quotes, as it should, you are not searching for “smart/curly quotes” in your search field. “300” and "300" are not the same string: the ASCII-quoted one will match valid XML, whereas the curly-quoted one will not. My search was for "300" when the file was blah="300".)
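To see why the two look-alike strings cannot match each other, here is a minimal Python sketch (not Notepad++ itself; the helper name `code_points` is made up for illustration) that prints the code points behind each pair of quotes and tests a plain substring search against the `blah="300"` example:

```python
# Compare an ASCII-quoted search string with its "smart quote" look-alike.
xml_line = 'blah="300"'          # valid XML uses the ASCII quote U+0022

ascii_query = '"300"'            # straight quotes
curly_query = '\u201c300\u201d'  # curly quotes, as a word processor might insert


def code_points(text):
    """Return the code points of text as U+XXXX strings (illustrative helper)."""
    return [f"U+{ord(ch):04X}" for ch in text]


ascii_found = ascii_query in xml_line
curly_found = curly_query in xml_line

print(code_points(ascii_query), ascii_found)
print(code_points(curly_query), curly_found)
```

The straight-quoted query is found; the curly-quoted one is not, because U+201C and U+201D are different characters from U+0022.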

      ----

      Useful References

      • Please Read Before Posting
      • Template for Search/Replace Questions
      • FAQ: Where to find regular expressions (regex) documentation
      • Notepad++ Online User Manual: Searching/Regex
      • Kendall DeMott

        As previously stated, these are regular quotation marks. I’ve done this type of “find and replace” edit for years, but since the last update this function no longer works.
        It will not Find a numerical value with quotes, nor will it Find what / Replace with.

        Not sure why on the first pic, the error at the bottom is showing double quotes?
        Find numerical W-quotes.jpg

        Find - Replace.jpg

        • PeterJones @Kendall DeMott

          @kendall-demott

          First, to solve your problem:

          I can replicate your problem by using the same options:
          e4863f39-31fe-435e-b29c-e984bc0d8387-image.png

          But if I turn off “match whole words only”, then it finds it easily:
          fd4b91e5-117c-471d-baaf-6bf753bf8192-image.png

          This is because "1000" is not the “whole word”; price="1000"/> is the “whole word”.

          the error at the bottom is showing double quotes?

          Because that error bar takes whatever is in the FIND box and puts it between quotes to display the text. If you had said Find What: gobbeldygook, the error message would say Find: Can't find the text "gobbeldygook", as shown here:
          57ca9bc1-6a03-45bc-a44d-9db4356db3ef-image.png

          To reiterate the main solution: your search did not work because you told it to match whole words only, but then tried to match against text that wasn’t a “whole word”.

          ------
          see https://npp-user-manual.org/docs/searching/#find-replace-tabs

          9f8d5089-0ad3-4a7b-9e2e-fcc03264d1fb-image.png

          • Alan Kilborn @PeterJones

            @peterjones said in Find - Replace:

            price="1000"/> is the “whole word”.

            Can you elaborate on why this is?
            Aside from “it works”? :-)

            Reading the fine manual HERE doesn’t really shed light on it, for me.

            Note that I know how to use the option, and would never have used it like OP did, but it never hurts to know deeper meanings in things, so that maybe I can use a function better.

            • PeterJones @Alan Kilborn

              @alan-kilborn ,

              I don’t have insight into how the non-regex “word” is defined in the code.

              However, at least in my brief experimentation, the “normal mode + match whole word only” seems to agree with “regex mode” and \b.*?\b.

              For example, because the spot between the = and the " will not match a word boundary \b, a “whole word only” match will not match if just the " is included, but it will if the match starts with =" or if it starts at the 1000.

              Maybe this will show it better: If you are searching the text price="1000"/>:

              | looking for text | regex version | normal+whole word matches | regex matches | notes |
              | --- | --- | --- | --- | --- |
              | 1000 | \b1000\b | YES | YES | the zero-width between "1 is a word boundary, as is 0" |
              | "1000 | \b"1000\b | NO | NO | the zero-width between =" is not a word boundary, so fails |
              | ="1000 | \b="1000\b | YES | YES | the zero-width between e= is a boundary |
              | ="1000" | \b="1000"\b | YES | NO | ERROR: "/ is not a word boundary, so the regex fails, but the normal+whole somehow matches |
              | price="1000" | \bprice="1000"\b | NO | NO | including price before the = seems to change the normal+whole definition of “whole word”… weird. |

              Unfortunately, with experimentation, my theory broke down. I don’t know enough about the underlying details to explain exactly how it matches – someone with more insight into the source code would need to comment.

              But I think a good general rule is, “if it doesn’t also match regex=\bXXX\b, then normal+word=XXX probably won’t work, though there are subtle exceptions”. For normal+word, I would stick to words that are obviously word units, like the 1000 or price (with no spaces or punctuation), rather than trying to get normal+word to go across words or word boundaries. If you want to search across multiple words, or want mixed words and punctuation, normal+word will not always work as you expect.
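The "regex matches" column above can be reproduced outside Notepad++. The sketch below uses Python's `re` module, which is an assumption on my part: it is not the engine Notepad++ uses, but for plain ASCII text its `\b` follows the same word/non-word definition. The helper name `regex_whole_word` is made up for illustration.

```python
import re

TEXT = 'price="1000"/>'


def regex_whole_word(needle, text=TEXT):
    r"""True if \b<needle>\b (needle taken literally) is found in text."""
    return re.search(r"\b" + re.escape(needle) + r"\b", text) is not None


results = {
    needle: regex_whole_word(needle)
    for needle in ['1000', '"1000', '="1000', '="1000"', 'price="1000"']
}

for needle, found in results.items():
    print(f"{needle!r:16} {'YES' if found else 'NO'}")
```

This only checks the regex side of the rule of thumb; whether normal+whole word agrees still has to be tested in Notepad++ itself.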

              • guy038

                Hello, @kendall-demott, @peterjones, @alan-kilborn and All,

                Well, I would say :

                • For an ANSI file :

                  • If a string of word chars is immediately surrounded both, before and after, with one of the characters [\x00 - \x2F] , [\x3A - \x40] , [\x5B - \x5E] , \x60 or [ \x7B - \x7F], that string will match when the Match whole word only option is ticked

                  • In other words, if a string is immediately surrounded by, at least, one word char, in the strict range [0-9A-Z_a-z] or any char in range [\x80-\xFF], that string will not match when the Match whole word only option is ticked

                • For a NON-ANSI file ( so any encoding different from ANSI ) :

                  • If a string of word chars is immediately surrounded both, before and after, with a Unicode non-word character, recognized by Notepad++, that string will match when the Match whole word only option is ticked

                  • In other words, if a string is immediately surrounded by, at least, one Unicode word char, recognized by Notepad++, that string will not match when the Match whole word only option is ticked
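The ANSI rule described above can be sketched in a few lines of Python. This is only a model of the description in this post, not the actual Notepad++ code. The function names are hypothetical, the file is represented as single-byte data (Windows-1252 is assumed for the sample), and treating the very start and end of the data as acceptable neighbours is my assumption.

```python
def is_ansi_word_byte(b):
    """Word char per the description: [0-9A-Z_a-z] or any byte 0x80-0xFF."""
    return b >= 0x80 or chr(b).isalnum() or b == ord("_")


def whole_word_hits(needle, data):
    """Offsets where needle occurs with no word byte directly before or after."""
    hits = []
    start = data.find(needle)
    while start != -1:
        end = start + len(needle)
        # Assumption: the start/end of the data counts as a non-word neighbour.
        before_ok = start == 0 or not is_ansi_word_byte(data[start - 1])
        after_ok = end == len(data) or not is_ansi_word_byte(data[end])
        if before_ok and after_ok:
            hits.append(start)
        start = data.find(needle, start + 1)
    return hits


# 'é' becomes the single byte 0xE9, which the rule above treats as a word char.
sample = 'a 1000 b x1000 1000y é1000 "1000"'.encode("cp1252")
print(whole_word_hits(b"1000", sample))  # [2, 28]
```

Only the occurrence between spaces and the one between quotes are reported; the ones touching `x`, `y` or `é` are rejected.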


                Now, regarding the regex \b zero-width assertion, it represents, either :

                • The position between the very beginning of current file and a word character

                • The position between a non-word character and a word character

                • The position between a word character and a non-word character

                • The position between a word character and the very end of current file

                Note also that the \n and/or \r line-endings chars are always considered as non-word chars
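The four cases listed above can be made visible by asking a regex engine for every position where `\b` holds. A small sketch, using Python's `re` as a stand-in for the Notepad++ engine (the two agree on ASCII text as far as I know, but that is an assumption); `boundary_positions` is a made-up helper name:

```python
import re


def boundary_positions(text):
    r"""Return every index in text where the zero-width assertion \b holds."""
    return [m.start() for m in re.finditer(r"\b", text)]


print(boundary_positions('price="1000"/>'))  # [0, 5, 7, 11]
print(boundary_positions("\nabc\r"))         # [1, 4]
```

In the first string the boundaries sit before `p`, between `e` and `=`, between `"` and `1`, and between `0` and `"`. The second string shows the line-ending characters behaving as non-word characters.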

                Best Regards,

                guy038

                • Alan Kilborn

                  More on the subject from @guy038 in this old post: https://community.notepad-plus-plus.org/post/20424

                  Peter, could the user manual be better in this regard?

                  • PeterJones @guy038

                    @guy038 said in Find - Replace:

                    If the string to search for is, itself, surrounded with non-word characters, that string will match when the Match whole word only option is ticked ONLY IF surrounded with the \n or \r chars

                    That’s not accurate.

                    If the document is

                    <a price="1000"/> x
                    <a price="1000"/>x
                    

                    then FIND = ="1000"/> will match both those lines, even though it’s got an e to the left and either a space or an x to the right.

                    —

                    Also, I originally said that ="1000" matched normal+whole word in the document price="1000"/>, but it does not… so apparently my test was wrong yesterday. And with NORMAL=="1000" and REGEX=\b="1000"\b actually agreeing that it doesn’t match, I am back to thinking that for a “normal+whole word” FIND=☒☒☒, it is equivalent to a regex FIND=\b☒☒☒\b (or, I should say \b\Q☒☒☒\E\b, because ☒ might be a regex special character, so it needs to be escaped in the regex-equivalent). I haven’t been able to find an exception to this. If anyone can show me different, let me know.
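As an illustration of why the escaping matters, here is a sketch in Python. Python's `re` has no `\Q...\E`, so `re.escape` plays that role here; this shows the idea rather than Notepad++ syntax, and `whole_word_regex` is a hypothetical helper name.

```python
import re


def whole_word_regex(needle):
    r"""Build \b<literal needle>\b; re.escape stands in for \Q...\E."""
    return re.compile(r"\b" + re.escape(needle) + r"\b")


text = "1.0 1x0"

unescaped = re.findall(r"\b1.0\b", text)         # '.' matches any character
escaped = whole_word_regex("1.0").findall(text)  # '.' is a literal dot

print(unescaped)  # ['1.0', '1x0']
print(escaped)    # ['1.0']
```

Without the escaping, the "regex equivalent" of a normal-mode search for 1.0 would also hit 1x0.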

                    • Kendall DeMott

                      Peter, Thank You, unticking that box solved my issue.

                      • guy038

                        Hi, @kendall-demott, @peterjones, @alan-kilborn and All,

                        I said, in my previous post (now deleted):

                        • If the string to search for is, itself, surrounded with non-word characters, that string will match when the Match whole word only option is ticked ONLY IF surrounded with the \n or \r chars

                        Actually, I really misspoke! What I meant was:

                        • Any string, containing word and/or non-word characters, at any location, will match, when the Match whole word only option is ticked, IF this string is surrounded with nothing, a \n char or a \r char

                        Now, Peter, you said in your last post :

                        I am back to thinking that for a “normal+whole word” FIND=☒☒☒, it is equivalent to a regex FIND=\b☒☒☒\b …

                        So I created a file containing only the Unicode characters of the BMP (63,454 characters with code point < U+FFFF), in the form below:

                        NULabcd¤
                        SOHabcd¤
                        ...
                        ...
                        ...
                        abcd¤
                        �abcd¤
                        

                        And it happens that :

                        • The search of the string abcd, in Normal mode, with the Match whole word only option ticked, returns 12,561 matches

                        • The search of the regex string \babcd\b in Regular expression mode, returns 15,424 matches

                        So, obviously, these two kinds of searches are not equivalent at all !
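For anyone who wants to repeat the experiment, a test file of this shape could be generated along the following lines. This is a sketch under assumptions: the post does not say which code points were left out, but skipping the surrogates and the BMP noncharacters (U+FDD0..U+FDEF, U+FFFE, U+FFFF) happens to give exactly 63,454 lines, so that is the reading used here. The function name is hypothetical.

```python
def bmp_test_code_points():
    """Yield BMP code points, skipping surrogates and noncharacters (assumed)."""
    for cp in range(0x10000):
        if 0xD800 <= cp <= 0xDFFF:                  # surrogates
            continue
        if 0xFDD0 <= cp <= 0xFDEF or cp >= 0xFFFE:  # noncharacters
            continue
        yield cp


# One line per character: <char>abcd¤
lines = [chr(cp) + "abcd\u00a4" for cp in bmp_test_code_points()]
print(len(lines))  # 63454
```

Note that the entries for U+000A and U+000D are themselves line-ending characters, so they need special care when the lines are written to a file.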


                        For instance, let’s insert the string ¼abcd¤ in a new tab, whatever its encoding

                        First note that both the ¼ and the ¤ characters are non-word characters. To be convinced, just look for \w in Regular expression mode: only the four letters are matched

                        • However, the search of abcd, in Normal search mode, with the Match whole word only ticked, gives : NO match

                        • Luckily, the search of \babcd\b, in Regular expression search mode, does give the correct answer : MATCH


                        Unfortunately, the general template \bString of Word chars\b is not exact either, in numerous cases :

                        Let’s consider, for instance :

                        • The Ԩ Unicode character. It’s the CYRILLIC CAPITAL LETTER EN WITH LEFT HOOK with code-point U+0528

                        • The ᏹ Unicode character. It’s the CHEROKEE SMALL LETTER YI, with code-point U+13F9

                        • The ⴭ Unicode character. It’s the GEORGIAN SMALL LETTER AEN, with code-point U+2D2D

                        Although all these chars are seen as true letters by the Unicode Consortium, they are not yet considered word chars by our N++ regex engine :((. Thus, the search of \babcd\b, in Regular expression mode, will wrongly match the string abcd in the examples below :

                        Ԩabcd¤
                        ᏹabcd¤
                        ⴭabcd¤
                        

                        Conclusion :

                        Although the search of a whole word with the regex \b....\b seems more accurate and will give correct results with usual chars, it may fail with a lot of non-usual Unicode chars !
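The Unicode side of this can be checked independently of Notepad++. The sketch below uses Python's `unicodedata` module to print the general category and official name of the characters discussed above. It says nothing about what the N++ regex engine does; it only shows how the Unicode database bundled with a reasonably recent Python classifies them.

```python
import unicodedata

letters = ["\u0528", "\u13F9", "\u2D2D"]  # the three letters named above
others = ["\u00BC", "\u00A4"]             # ¼ and ¤

# Map each character to (general category, official name).
info = {
    ch: (unicodedata.category(ch), unicodedata.name(ch))
    for ch in letters + others
}

for ch, (category, name) in info.items():
    print(f"U+{ord(ch):04X}  {category}  {name}")
```

The first three come out in a letter category (one starting with L), whereas ¼ is a "No" (other number) and ¤ is an "Sc" (currency symbol).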

                        Best Regards,

                        guy038

                        P.S. :

                        Note that the use of the regex assertion \b may give correct but rather surprising results ! For instance, the regex \b\Q^!:/@?$\E\b matches the part ^!:/@?$, of the string A^!:/@?$Z, because the \b assertion may be the location between a word char and a non-word char ! So, definitely, the use of the \b assertion, in regexes and the option Match whole word only, in Normal mode, are not equivalent !
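The surprising match in the P.S. can be reproduced with any engine that uses the same `\b` definition. A sketch in Python, where `re.escape` stands in for `\Q...\E` (Python's `re` does not support that syntax, so this is an equivalent rather than the literal Notepad++ regex):

```python
import re

needle = "^!:/@?$"
pattern = re.compile(r"\b" + re.escape(needle) + r"\b")

inside = pattern.search("A^!:/@?$Z")    # word chars on both sides
isolated = pattern.search(" ^!:/@?$ ")  # spaces on both sides

print(inside.group() if inside else None)      # ^!:/@?$
print(isolated.group() if isolated else None)  # None
```

So `\b` around a string of punctuation requires word characters next to it, which is the opposite of what "whole word" suggests.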

                        • PeterJones @guy038

                          @guy038 ,

                          Thanks for the experiment. Basically, it boils down to “Unicode complicates things for whole word only”. ;-)

                          The phrasing I am considering for the user manual:

                          • For ASCII text

                            • if the left and right characters of your search string are both “word characters” (letters, numbers, underscore, and optionally additional characters set by your preferences), then “match whole word only” will only allow a match if the characters to the left and right of the match are non-word-characters or spaces or the beginning or ending of the line
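                            • if the left and right characters of your search string are both non-word characters (so not letters, numbers, underscore, and optionally additional characters set by your preferences), then “match whole word only” will only allow a match if the characters to the left and right of the match are word characters or spaces or the beginning or ending of the line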
                            • if the left of your search string is a word character and the right is not (or vice versa), then the characters to the left and right must be of the opposite type, or spaces, or beginning/ending of line.
                          • For non-ASCII text, the general concepts are the same; however, some edge cases may behave differently than you expect, and with thousands of possible Unicode characters and millions of combinations of pairs of Unicode characters, this manual cannot contain a full description.

                          • Either way, if you want full control of what counts as a “word” or a “word boundary”, use Search Mode = Regular Expression instead of Normal with Match Whole Word Only, which allows you full and precise control of what is allowed before and after what you consider a “whole word”.

                          And yes, I did verify that Settings > Preferences > Delimiter > add your character as part of a word does affect whether Match whole word only matches.
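To make the draft wording concrete, here is a rough Python model of the ASCII rule as phrased above: each end of the search string needs a neighbour of the opposite type, a space, or the line edge. This is a sketch of the proposed description only, not of the actual Notepad++ implementation. All names are hypothetical, only the space character counts as "spaces", and the extra word characters from the Delimiter preferences are not modelled.

```python
import string

WORD_CHARS = set(string.ascii_letters + string.digits + "_")


def is_word(ch):
    return ch in WORD_CHARS


def side_ok(edge_char, neighbor):
    """neighbor is None at the beginning or ending of the line."""
    if neighbor is None or neighbor == " ":
        return True
    # Otherwise the neighbour must be of the opposite type to the edge char.
    return is_word(neighbor) != is_word(edge_char)


def draft_whole_word_match(needle, line):
    """True if some occurrence of needle in line satisfies the draft rule."""
    if not needle:
        return False
    start = line.find(needle)
    while start != -1:
        end = start + len(needle)
        left = line[start - 1] if start > 0 else None
        right = line[end] if end < len(line) else None
        if side_ok(needle[0], left) and side_ok(needle[-1], right):
            return True
        start = line.find(needle, start + 1)
    return False


print(draft_whole_word_match('"1000"', 'price="1000"/>'))          # False
print(draft_whole_word_match('1000', 'price="1000"/>'))            # True
print(draft_whole_word_match('="1000"/>', '<a price="1000"/> x'))  # True
```

On the examples from this thread, the model agrees with what was reported: "1000" with quotes fails inside price="1000"/>, the bare 1000 succeeds, and ="1000"/> succeeds whether a space or an x follows it.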

                          • PeterJones @PeterJones

                            The phrasing I am considering for the user manual:

                            issue #349 => PR #350

                            It should be in the next release of the user manual
