Community

    "Whole Word Only" Option in Combination with Non-Alphanumeric Characters

    Notepad++ & Plugin Development
    13 Posts 4 Posters 858 Views
    This topic has been deleted. Only users with topic management privileges can see it.
    • Alan Kilborn @Thomas Knoefel

      @Thomas-Knoefel

      Have you read the fine user manual about Match whole word only?

      • Thomas Knoefel @Alan Kilborn

        @Alan-Kilborn
        Thanks for the hint. I checked the help documentation, and it seems this behavior is normal and something I’ll need to get used to.

        https://npp-user-manual.org/docs/searching/

        • Alan Kilborn @Thomas Knoefel

          @Thomas-Knoefel

          I don’t know if searching in regex mode works “better” or not:

          e.g. \b\QNotepad++\E\b

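Python's `re` module does not support `\Q...\E` quoting, so a minimal sketch of the same idea uses `re.escape()` to quote the literal and wraps it in `\b` anchors. The helper name `boundary_hits` and the sample strings are illustrative only, and Python's notion of a word character is not guaranteed to match Notepad++'s regex engine in every edge case.

```python
import re


def boundary_hits(text: str, needle: str) -> list:
    """Start offsets of needle with a word-boundary anchor on each side."""
    # re.escape() plays the role of \Q...\E: it quotes '+' and other metacharacters.
    pattern = r"\b" + re.escape(needle) + r"\b"
    return [m.start() for m in re.finditer(pattern, text)]


for sample in ["Notepad,Notepad++,Notepad", "Notepad++ rocks", "Notepad++x"]:
    print(repr(sample), boundary_hits(sample, "Notepad++"))
```

Because `\b` requires a word/non-word transition, a trailing `+` only counts as a boundary when a word character follows it, so of these three samples only `"Notepad++x"` produces a match.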
          • Thomas Knoefel @Alan Kilborn

            @Alan-Kilborn
            I was just curious about this when I first noticed it in the plugin and thought it was a bug. However, even after realizing that this behavior is normal, I still found it a bit odd. Since this behavior cannot be changed, it's just part of the plugin's feature set.

              • Coises @Thomas Knoefel

              @Thomas-Knoefel

              Based on the Scintilla documentation:

              https://www.scintilla.org/ScintillaDoc.html#searchFlags

              the plain text search with whole word enabled should be equivalent to:

              (?<!\w)\QNotepad++\E(?!\w)

              The implementation says otherwise:

              https://github.com/notepad-plus-plus/notepad-plus-plus/blob/7a401cfacef20a962bdd0cb1cdc47f8a96c0b85a/scintilla/src/Document.cxx#L2063

              “Whole word” effectively implies a word boundary, so it behaves like @Alan-Kilborn’s suggestion:

              \b\QNotepad++\E\b

              and not like the documentation indicates.
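For comparison, the documentation-based reading `(?<!\w)\QNotepad++\E(?!\w)` can be tried in Python as a minimal sketch. Again `re.escape()` stands in for `\Q...\E` (Python's `re` has no `\Q...\E`), and the helper name `lookaround_hits` and the samples are illustrative assumptions.

```python
import re


def lookaround_hits(text: str, needle: str) -> list:
    """Start offsets of needle not preceded or followed by a word character."""
    # (?<!\w) and (?!\w) only forbid word characters at the edges;
    # punctuation or spaces next to the match are accepted.
    pattern = r"(?<!\w)" + re.escape(needle) + r"(?!\w)"
    return [m.start() for m in re.finditer(pattern, text)]


for sample in ["Notepad,Notepad++,Notepad", "Notepad++ rocks", "Notepad++x"]:
    print(repr(sample), lookaround_hits(sample, "Notepad++"))
```

Unlike the `\b` form, this reading accepts `Notepad++` when it is followed by a comma or a space and rejects it when a word character follows.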

                • PeterJones @Coises

                @Coises,

                I respectfully disagree.

                With text Notepad,Notepad++,Notepad, search for Notepad++ with Whole Word checkmarked. It won’t be found, because there is no character-class difference between the + at the end of the word and the comma (,) after it that’s not part of the search string. Thus, the “Check that the given range is has transitions between character classes at both” comment that was in the source-code link is not fulfilled (+ to , is not a character class transition).

                And the manual says, “If the left of your search string is a word character and the right is not (or vice versa), then the characters to the left and right must be of the opposite type, or be spaces, or be the beginning/ending of a line.” The left of the search string is N, a word character; the right is +, punctuation. So the character to the left of the N must be non-word (it is), and the character to the right of the + must be a word character or a space (it is a comma, which is punctuation, so it is neither). Thus the Manual correctly describes that Whole Word will not match here.

                The Whole Word search for Notepad++ in the string Notepad,Notepad++,Notepad is behaving as described in both the comments of the source code and in the User Manual.
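The rule in the manual (and the source-code comment) boils down to requiring a character-class change at each end of the match. Below is a minimal Python sketch of that check. The four-class `char_class()` helper is a simplified assumption (ASCII-oriented, ignoring any custom word-character settings), and `whole_word_hits()` is a hypothetical name; this is not Scintilla's actual code.

```python
def char_class(ch: str) -> str:
    """Simplified stand-in for Scintilla's character classes (an assumption)."""
    if ch in "\r\n":
        return "newline"
    if ch.isspace():
        return "space"
    if ch.isalnum() or ch == "_":
        return "word"
    return "punctuation"


def whole_word_hits(text: str, needle: str) -> list:
    """Start offsets where needle begins and ends at a character-class transition."""
    if not needle:
        return []
    hits = []
    start = text.find(needle)
    while start != -1:
        end = start + len(needle)
        # Left edge: start of text, or the previous char is of a different class.
        left_ok = start == 0 or char_class(text[start - 1]) != char_class(text[start])
        # Right edge: end of text, or the next char is of a different class.
        right_ok = end == len(text) or char_class(text[end]) != char_class(text[end - 1])
        if left_ok and right_ok:
            hits.append(start)
        start = text.find(needle, start + 1)
    return hits


print(whole_word_hits("Notepad,Notepad++,Notepad", "Notepad++"))  # []
print(whole_word_hits("Notepad Notepad++ Notepad", "Notepad++"))  # [8]
```

In the first sample, `+` and `,` are both punctuation, so there is no transition after the match; in the second, the space after `++` is a different class, so the match is accepted.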

                  • Coises @PeterJones

                  @PeterJones said in "Whole Word Only" Option in Combination with Non-Alphanumeric Characters:

                  The Whole Word search for Notepad++ in the string Notepad,Notepad++,Notepad is behaving as described in both the comments of the source code and in the User Manual.

                  Indeed, it does.

                  The documentation I looked at was the Scintilla documentation for the search flags. (Probably because I was thinking more as a plugin developer than as an end user.)

                  The Notepad++ User Manual documentation describes the actual behavior correctly.

                    • PeterJones @Coises

                    @Coises said in "Whole Word Only" Option in Combination with Non-Alphanumeric Characters:

                    The documentation I looked at was the Scintilla documentation

                    Ah, okay, I misunderstood which “documentation” you were referring to. I agree that Scintilla’s description doesn’t cover the edge cases, though it probably should. (Who knows if they’ve even bothered to learn their own edge cases; I get the feeling that Notepad++ and its associated plugin authors push Scintilla in ways that the Scintilla developers never expected, though presumably other apps that use Scintilla push things in different directions than we do.)

                      • Thomas Knoefel @PeterJones

                      I’ve been thinking about how the “Match Whole Word Only” search option might function:

                      1. Text Segmentation: The entire text is divided into chunks by separating at non-word characters, ensuring symbols like ‘+’ are not included in these chunks.
                      2. Search Within Chunks: The search then strictly focuses on these separated chunks.

                        Interestingly, spaces seem to act as the primary separators: when a space sits next to other non-word characters, it overrules them, and those adjacent non-word characters are included in the chunks.

                        This chunk preparation, which happens without analyzing the search string, would likely make the search faster, especially if the text is prepared in advance. The search then only has to consider these separated chunks.
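One possible reading of this chunk idea can be mocked up in Python. This is a minimal sketch under stated assumptions, not a description of what Notepad++ actually does: a chunk is taken to be a maximal run of word characters or a maximal run of other non-space characters (so whitespace separates chunks and `++,` stays together as one run), and a match is accepted only if it starts at a chunk start and ends at a chunk end. The names `RUN`, `chunk_edges()` and `chunk_hits()` are hypothetical.

```python
import re

# Hypothetical chunking: a run of word characters, or a run of other
# non-whitespace characters; whitespace is never part of a chunk.
RUN = re.compile(r"\w+|[^\w\s]+")


def chunk_edges(text: str):
    """Pre-compute chunk start and end offsets without looking at the search string."""
    starts, ends = set(), set()
    for m in RUN.finditer(text):
        starts.add(m.start())
        ends.add(m.end())
    return starts, ends


def chunk_hits(text: str, needle: str) -> list:
    """Start offsets of needle that begin on a chunk start and end on a chunk end."""
    if not needle:
        return []
    starts, ends = chunk_edges(text)
    hits = []
    pos = text.find(needle)
    while pos != -1:
        if pos in starts and pos + len(needle) in ends:
            hits.append(pos)
        pos = text.find(needle, pos + 1)
    return hits


print(chunk_hits("Notepad,Notepad++,Notepad", "Notepad++"))  # []
print(chunk_hits("Notepad Notepad++ Notepad", "Notepad++"))  # [8]
```

On these two samples the mock-up reproduces the behavior discussed above: no match when `Notepad++` sits between commas, a match when it sits between spaces.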

                        • Alan Kilborn @Thomas Knoefel

                        @Thomas-Knoefel

                        Interesting idea. If you’re a scripter, maybe mock up a demo with a script and show it here?

                          • Coises @PeterJones

                          @PeterJones said in "Whole Word Only" Option in Combination with Non-Alphanumeric Characters:

                          I agree that Scintilla’s description doesn’t cover the edge cases, though it probably should.

                            Having just discovered, to some horror, that Notepad++ uses a modified version of Scintilla (context here), I am no longer inclined to “blame” Scintilla for anything without first doing a lot of investigation.

                          I somehow just assumed that modifying Scintilla would be “off limits” for the Notepad++ project.

                            • Alan Kilborn @Coises

                            @Coises said in "Whole Word Only" Option in Combination with Non-Alphanumeric Characters:

                            that modifying Scintilla would be “off limits” for the Notepad++ project

                            It mostly is.
                            But I think in some areas it was judged to be something that “had to be done”.

                            The Community of users of the Notepad++ text editor.