    "Whole Word Only" Option in Combination with Non-Alphanumeric Characters

    • Thomas KnoefelT
      Thomas Knoefel
      last edited by

      When searching for the string “Notepad++” with the "Match whole word only" option in a CSV file, there are inconsistencies in the search results based on the placement of non-alphanumeric characters.

      Examples and Outcomes:

      1. Search Term: Notepad++
        Text: Notepad,Notepad++,Notepad
        Result: No match found.
        Explanation: There is no match even though “Notepad++” is present, because the trailing + sign is directly followed by the comma.

      2. Search Term: Notepad++
        Text: Notepad,Notepad++ ,Notepad
        Result: Match in the 2nd column.
        Explanation: A match occurs here because there is a space after “Notepad++”, which seems to affect the search behavior.

      3. Search Term: Notepad
        Text: Notepad,Notepad,Notepad
        Result: Hits in all columns.
        Explanation: Searches without non-alphanumeric characters at the ends behave as expected.

      It appears that non-alphanumeric characters at the beginning or end of the search string affect the outcome. At first glance, this could be taken for a bug.

      Is it possible to adjust the behavior in the Scintilla component for SCI_SEARCHINTARGET to handle search terms uniformly regardless of surrounding characters? I’m looking into this for the MultiReplace Plugin and would appreciate any insights or suggestions.

      • Alan KilbornA
        Alan Kilborn @Thomas Knoefel
        last edited by

        @Thomas-Knoefel

        Have you read the fine user manual about Match whole word only ?

        • Thomas KnoefelT
          Thomas Knoefel @Alan Kilborn
          last edited by

          @Alan-Kilborn
          Thanks for the hint. I checked the help documentation, and it seems this behavior is normal and something I’ll need to get used to.

          https://npp-user-manual.org/docs/searching/

          • Alan KilbornA
            Alan Kilborn @Thomas Knoefel
            last edited by

            @Thomas-Knoefel

            I don’t know if searching in regex mode works “better” or not:

            e.g. \b\QNotepad++\E\b

            • Thomas KnoefelT
              Thomas Knoefel @Alan Kilborn
              last edited by

              @Alan-Kilborn
              I was just curious about this when I first noticed it in the plugin and thought it was a bug. However, after realizing that this behavior is normal, I still found it a bit odd. Since it cannot be changed, it’s just part of the plugin’s feature set then.

              • CoisesC
                Coises @Thomas Knoefel
                last edited by

                @Thomas-Knoefel

                Based on the Scintilla documentation:

                https://www.scintilla.org/ScintillaDoc.html#searchFlags

                the plain text search with whole word enabled should be equivalent to:

                (?<!\w)\QNotepad++\E(?!\w)

                The implementation says otherwise:

                https://github.com/notepad-plus-plus/notepad-plus-plus/blob/7a401cfacef20a962bdd0cb1cdc47f8a96c0b85a/scintilla/src/Document.cxx#L2063

                “Whole word” effectively implies a word boundary; so it behaves like @Alan-Kilborn’s suggestion:

                \b\QNotepad++\E\b

                and not like the documentation indicates.
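
The two expressions can be compared side by side with a short script. This is only a sketch: it uses Python's `re` module rather than the regex engine built into Notepad++, and since Python has no `\Q...\E`, `re.escape` stands in for it. The helper name `match_starts` is made up for this illustration.

```python
import re

# Python's re has no \Q...\E quoting, so escape the literal instead.
needle = re.escape("Notepad++")

# The two candidate regex equivalents mentioned in this thread.
word_boundary = re.compile(r"\b" + needle + r"\b")
lookaround = re.compile(r"(?<!\w)" + needle + r"(?!\w)")

samples = [
    "Notepad,Notepad++,Notepad",   # example 1 from the first post
    "Notepad,Notepad++ ,Notepad",  # example 2 from the first post
]


def match_starts(pattern, text):
    """Return the start offset of every match of pattern in text."""
    return [m.start() for m in pattern.finditer(text)]


for text in samples:
    print(repr(text),
          "\\b:", match_starts(word_boundary, text),
          "lookaround:", match_starts(lookaround, text))
```

In Python, the `\b` form finds nothing in either sample, because `\b` needs a word character on one side and there is none next to the trailing `+`. The lookaround form finds the term at offset 8 in both samples. Neither output is the same as the whole-word results reported in the first post (no hit in example 1, a hit in example 2), so it is worth rerunning the comparison in Notepad++ itself before relying on either expression.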

                • PeterJonesP
                  PeterJones @Coises
                  last edited by PeterJones

                  @Coises ,

                  I respectfully disagree.

                  With text Notepad,Notepad++,Notepad, search for Notepad++ with Whole Word checkmarked. It won’t be found, because there is no character-class difference between the + at the end of the word and the comma (,) after it that’s not part of the search string. Thus, the “Check that the given range is has transitions between character classes at both” comment that was in the source-code link is not fulfilled (+ to , is not a character class transition).

                  And the manual says, “If the left of your search string is a word character and the right is not (or vice versa), then the characters to the left and right must be of the opposite type, or be spaces, or be the beginning/ending of a line.” The left of the search string is N, so a word character; the right is + so punctuation; thus, it would have to be non-word to the left of the N (it is) and word or space to the right of the + (it is a comma, so punctuation, which is neither), thus the Manual correctly describes that Whole Word will not match for that.

                  The Whole Word search for Notepad++ in the string Notepad,Notepad++,Notepad is behaving as described in both the comments of the source code and in the User Manual.
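
The transition rule described above can be modelled in a few lines. This is a simplified sketch, not Scintilla's code: the four character classes, the treatment of the start and end of the text as whitespace, and all function names (`char_class`, `class_at`, `is_whole_word`, `find_whole_word`) are assumptions made for illustration.

```python
def char_class(ch):
    """Rough stand-in for the editor's character classes (an assumption)."""
    if ch in "\r\n":
        return "newline"
    if ch.isspace():
        return "space"
    if ch.isalnum() or ch == "_":
        return "word"
    return "punctuation"


def class_at(text, index):
    """Class of text[index]; positions outside the text count as whitespace."""
    if 0 <= index < len(text):
        return char_class(text[index])
    return "space"


def is_whole_word(text, start, end):
    """True if text[start:end] has a character-class transition at both ends."""
    if start >= end:
        return False
    first = class_at(text, start)
    last = class_at(text, end - 1)
    starts_ok = first in ("word", "punctuation") and first != class_at(text, start - 1)
    ends_ok = last in ("word", "punctuation") and last != class_at(text, end)
    return starts_ok and ends_ok


def find_whole_word(text, needle):
    """Return the start offsets of all whole-word hits of needle in text."""
    hits = []
    pos = text.find(needle)
    while pos != -1:
        if is_whole_word(text, pos, pos + len(needle)):
            hits.append(pos)
        pos = text.find(needle, pos + 1)
    return hits


print(find_whole_word("Notepad,Notepad++,Notepad", "Notepad++"))   # []
print(find_whole_word("Notepad,Notepad++ ,Notepad", "Notepad++"))  # [8]
print(find_whole_word("Notepad,Notepad,Notepad", "Notepad"))       # [0, 8, 16]
```

Under these assumptions the model reproduces the three outcomes from the first post: the `+` to `,` step is punctuation to punctuation, so it is not a transition, while `+` to space is.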

                  • CoisesC
                    Coises @PeterJones
                    last edited by

                    @PeterJones said in "Whole Word Only" Option in Combination with Non-Alphanumeric Characters:

                    The Whole Word search for Notepad++ in the string Notepad,Notepad++,Notepad is behaving as described in both the comments of the source code and in the User Manual.

                    Indeed, it does.

                    The documentation I looked at was the Scintilla documentation for the search flags. (Probably because I was thinking more as a plugin developer than as an end user.)

                    The Notepad++ User Manual documentation describes the actual behavior correctly.

                    • PeterJonesP
                      PeterJones @Coises
                      last edited by PeterJones

                      @Coises said in "Whole Word Only" Option in Combination with Non-Alphanumeric Characters:

                      The documentation I looked at was the Scintilla documentation

                      Ah, okay, I misunderstood which “documentation” you were referring to. I agree that Scintilla’s description doesn’t cover the edge cases, though it probably should. (Who knows if they’ve even bothered to learn their own edge cases; I get the feeling that Notepad++ and its associated plugin authors push Scintilla in ways that the Scintilla developers never expected; though presumably other apps that use Scintilla push things in different directions than we do.)

                      • Thomas KnoefelT
                        Thomas Knoefel @PeterJones
                        last edited by Thomas Knoefel

                        I’ve been thinking about how the “Match Whole Word Only” search option might function:

                        1. Text Segmentation: The entire text is divided into chunks by separating at non-word characters, ensuring symbols like ‘+’ are not included in these chunks.
                        2. Search Within Chunks: The search then strictly focuses on these separated chunks.

                        Interestingly, spaces seem to act as primary separators, overruling other non-word characters if they are next to these characters and including adjacent non-word characters into the chunks.

                        This chunk preparation, which happens without analyzing the search string, likely makes the search process faster, especially if the text is pre-prepared. Finally, the search will only focus on these separated chunks.
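
One way to mock up the chunk idea is to split the text into maximal runs of characters of the same class and then check whether a hit starts and ends on chunk edges. This is a sketch of one interpretation only: nothing in this thread confirms that the editor pre-segments the text, and the class definitions and the names `char_class`, `chunks` and `on_chunk_edges` are assumptions.

```python
from itertools import groupby


def char_class(ch):
    """Rough character classes (an assumption, not the editor's tables)."""
    if ch in "\r\n":
        return "newline"
    if ch.isspace():
        return "space"
    if ch.isalnum() or ch == "_":
        return "word"
    return "punctuation"


def chunks(text):
    """Split text into maximal runs of characters sharing one class."""
    return ["".join(group) for _, group in groupby(text, key=char_class)]


def on_chunk_edges(text, start, end):
    """True if both start and end fall exactly on chunk boundaries."""
    edges = {0}
    offset = 0
    for chunk in chunks(text):
        offset += len(chunk)
        edges.add(offset)
    return start in edges and end in edges


print(chunks("Notepad,Notepad++,Notepad"))
# ['Notepad', ',', 'Notepad', '++,', 'Notepad']
print(chunks("Notepad,Notepad++ ,Notepad"))
# ['Notepad', ',', 'Notepad', '++', ' ', ',', 'Notepad']
```

In the first text the `++` and the comma end up in the same chunk, so a hit for Notepad++ (offsets 8 to 17) would end in the middle of a chunk. In the second text the space splits them, which may be why spaces look like "primary" separators. Note that this check only looks at chunk edges; it does not test whether the first and last characters of the hit are word or punctuation characters.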

                        • Alan KilbornA
                          Alan Kilborn @Thomas Knoefel
                          last edited by

                          @Thomas-Knoefel

                          Interesting idea. If you’re a scripter, maybe mock up a demo with a script and show it here?

                          • CoisesC
                            Coises @PeterJones
                            last edited by Coises

                            @PeterJones said in "Whole Word Only" Option in Combination with Non-Alphanumeric Characters:

                            I agree that Scintilla’s description doesn’t cover the edge cases, though it probably should.

                            Having just discovered — to some horror — that Notepad++ uses a modified version of Scintilla (context here) I am no longer inclined to “blame” Scintilla for anything without first doing a lot of investigation.

                            I somehow just assumed that modifying Scintilla would be “off limits” for the Notepad++ project.

                            • Alan KilbornA
                              Alan Kilborn @Coises
                              last edited by

                              @Coises said in "Whole Word Only" Option in Combination with Non-Alphanumeric Characters:

                              that modifying Scintilla would be “off limits” for the Notepad++ project

                              It mostly is.
                              But I think in some areas it was judged to be something that “had to be done”.
