"Whole Word Only" Option in Combination with Non-Alphanumeric Characters

Notepad++ & Plugin Development
13 Posts 4 Posters 1.4k Views
  • T
    Thomas Knoefel
    last edited by Jun 6, 2024, 3:47 PM

    When searching for the string “Notepad++” with the "Match whole word only" option in a CSV file, there are inconsistencies in the search results based on the placement of non-alphanumeric characters.

    Examples and Outcomes:

    1. Search Term: Notepad++
      Text: Notepad,Notepad++,Notepad
      Result: No match found.
      Explanation: There is no match even though “Notepad++” is present, apparently because the trailing + sign is immediately followed by the comma.

    2. Search Term: Notepad++
      Text: Notepad,Notepad++ ,Notepad
      Result: Match in the 2nd column.
      Explanation: A match occurs here because there is a space after “Notepad++”, which seems to affect the search behavior.

    3. Search Term: Notepad
      Text: Notepad,Notepad,Notepad
      Result: Hits in all columns.
      Explanation: Searches without non-alphanumeric characters at the ends behave as expected.

    It appears that the placement of non-alphanumeric characters at the beginning or end of the search string affects the outcome. This might be initially perceived as a bug.

    Is it possible to adjust the behavior in the Scintilla component for SCI_SEARCHINTARGET to handle search terms uniformly regardless of surrounding characters? I’m looking into this for the MultiReplace Plugin and would appreciate any insights or suggestions.

    • A
      Alan Kilborn @Thomas Knoefel
      last edited by Jun 6, 2024, 3:54 PM

      @Thomas-Knoefel

      Have you read the fine user manual about "Match whole word only"?

      • T
        Thomas Knoefel @Alan Kilborn
        last edited by Jun 6, 2024, 4:17 PM

        @Alan-Kilborn
        Thanks for the hint. I checked the help documentation, and it seems this behavior is normal and something I’ll need to get used to.

        https://npp-user-manual.org/docs/searching/

        • A
          Alan Kilborn @Thomas Knoefel
          last edited by Jun 6, 2024, 4:54 PM

          @Thomas-Knoefel

          I don’t know if searching in regex mode works “better” or not:

          e.g. \b\QNotepad++\E\b

          • T
            Thomas Knoefel @Alan Kilborn
            last edited by Jun 6, 2024, 7:46 PM

            @Alan-Kilborn
            I was just curious about this when I first noticed it in the plugin and thought it was a bug. Even after realizing that this behavior is normal, I still find it a bit odd. Since it cannot be changed, it is simply part of the plugin's feature set.

            • C
              Coises @Thomas Knoefel
              last edited by Jun 6, 2024, 8:21 PM

              @Thomas-Knoefel

              Based on the Scintilla documentation:

              https://www.scintilla.org/ScintillaDoc.html#searchFlags

              the plain text search with whole word enabled should be equivalent to:

              (?<!\w)\QNotepad++\E(?!\w)

              The implementation says otherwise:

              https://github.com/notepad-plus-plus/notepad-plus-plus/blob/7a401cfacef20a962bdd0cb1cdc47f8a96c0b85a/scintilla/src/Document.cxx#L2063

              “Whole word” effectively implies a word boundary; so it behaves like @Alan-Kilborn’s suggestion:

              \b\QNotepad++\E\b

              and not like the documentation indicates.
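For comparison, here is a small Python check of the two regex candidates against the texts from the first post. It is only a sketch under stated assumptions: Python's `re` module stands in for the Boost regex engine that Notepad++ uses, `re.escape` replaces `\Q...\E` (which Python does not support), and the helper name `match_starts` is made up for illustration. Results inside Notepad++ itself may differ.

```python
import re

TERM = "Notepad++"
TEXTS = ["Notepad,Notepad++,Notepad", "Notepad,Notepad++ ,Notepad"]

# re.escape plays the role of \Q...\E here (assumption: Python, not Boost).
escaped = re.escape(TERM)
lookaround = re.compile(r"(?<!\w)" + escaped + r"(?!\w)")
boundary = re.compile(r"\b" + escaped + r"\b")


def match_starts(pattern, text):
    """Return the start offsets of all matches of pattern in text."""
    return [m.start() for m in pattern.finditer(text)]


for text in TEXTS:
    print(repr(text),
          "lookaround:", match_starts(lookaround, text),
          "boundary:", match_starts(boundary, text))
```

In this sketch the lookaround form matches both texts, and the `\b` form matches neither, because there is no word boundary between `+` and a following comma or space. The whole-word option, as reported in the first post, matches only the text with the trailing space.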

              • P
                PeterJones @Coises
                last edited by PeterJones Jun 6, 2024, 8:40 PM Jun 6, 2024, 8:39 PM

                @Coises ,

                I respectfully disagree.

                With text Notepad,Notepad++,Notepad, search for Notepad++ with Whole Word checkmarked. It won’t be found, because there is no character-class difference between the + at the end of the word and the comma (,) after it that’s not part of the search string. Thus, the “Check that the given range is has transitions between character classes at both” comment that was in the source-code link is not fulfilled (+ to , is not a character class transition).

                And the manual says, “If the left of your search string is a word character and the right is not (or vice versa), then the characters to the left and right must be of the opposite type, or be spaces, or be the beginning/ending of a line.” The left of the search string is N, so a word character; the right is + so punctuation; thus, it would have to be non-word to the left of the N (it is) and word or space to the right of the + (it is a comma, so punctuation, which is neither), thus the Manual correctly describes that Whole Word will not match for that.

                The Whole Word search for Notepad++ in the string Notepad,Notepad++,Notepad is behaving as described in both the comments of the source code and in the User Manual.
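The character-class rule described above can be modeled in a few lines of Python. This is a rough sketch, not Scintilla's code: the names `char_class`, `is_whole_word` and `find_whole_word` are hypothetical, and the split into word, punctuation and space characters is an assumption that ignores any configurable word-character settings.

```python
def char_class(ch):
    """Classify one character (simplified, assumed classes)."""
    if ch.isspace():
        return "space"
    if ch.isalnum() or ch == "_":
        return "word"
    return "punctuation"


def is_whole_word(text, start, end):
    """True if text[start:end] has a class transition at both ends."""
    if start >= end:
        return False
    first = char_class(text[start])
    last = char_class(text[end - 1])
    # "edge" marks the beginning or end of the text.
    before = char_class(text[start - 1]) if start > 0 else "edge"
    after = char_class(text[end]) if end < len(text) else "edge"
    starts_ok = first in ("word", "punctuation") and first != before
    ends_ok = last in ("word", "punctuation") and last != after
    return starts_ok and ends_ok


def find_whole_word(text, term):
    """Return start offsets where term occurs as a 'whole word'."""
    hits = []
    pos = text.find(term)
    while pos != -1:
        if is_whole_word(text, pos, pos + len(term)):
            hits.append(pos)
        pos = text.find(term, pos + 1)
    return hits


print(find_whole_word("Notepad,Notepad++,Notepad", "Notepad++"))
print(find_whole_word("Notepad,Notepad++ ,Notepad", "Notepad++"))
print(find_whole_word("Notepad,Notepad,Notepad", "Notepad"))
```

Under these assumptions the three calls reproduce the outcomes from the first post: no hit, one hit in the second column, and hits in all three columns.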

                • C
                  Coises @PeterJones
                  last edited by Jun 6, 2024, 8:52 PM

                  @PeterJones said in "Whole Word Only" Option in Combination with Non-Alphanumeric Characters:

                  The Whole Word search for Notepad++ in the string Notepad,Notepad++,Notepad is behaving as described in both the comments of the source code and in the User Manual.

                  Indeed, it does.

                  The documentation I looked at was the Scintilla documentation for the search flags. (Probably because I was thinking more as a plugin developer than as an end user.)

                  The Notepad++ User Manual documentation describes the actual behavior correctly.

                  • P
                    PeterJones @Coises
                    last edited by PeterJones Jun 6, 2024, 9:11 PM Jun 6, 2024, 9:08 PM

                    @Coises said in "Whole Word Only" Option in Combination with Non-Alphanumeric Characters:

                    The documentation I looked at was the Scintilla documentation

                    Ah, okay, I misunderstood which “documentation” you were referring to. I agree that Scintilla’s description doesn’t cover the edge cases, though it probably should. (Who knows if they’ve even bothered to learn their own edge cases; I get the feeling that Notepad++ and its associated plugin authors push Scintilla in ways that the Scintilla developers never expected, though presumably other apps that use Scintilla push things in different directions than we do.)

                    • T
                      Thomas Knoefel @PeterJones
                      last edited by Thomas Knoefel Jun 7, 2024, 11:03 AM Jun 7, 2024, 11:02 AM

                      I’ve been thinking about how the “Match Whole Word Only” search option might function:

                      1. Text Segmentation: The entire text is divided into chunks by separating at non-word characters, ensuring symbols like ‘+’ are not included in these chunks.
                      2. Search Within Chunks: The search then strictly focuses on these separated chunks.

                      Interestingly, spaces seem to act as primary separators, overruling other non-word characters if they are next to these characters and including adjacent non-word characters into the chunks.

                      This chunk preparation, which happens without analyzing the search string, likely makes the search process faster, especially if the text is prepared in advance. The search then only has to look at these separated chunks.

                      • A
                        Alan Kilborn @Thomas Knoefel
                        last edited by Jun 7, 2024, 11:14 AM

                        @Thomas-Knoefel

                        Interesting idea. If you’re a scripter, maybe mock up a demo with a script and show it here?
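One way such a mock-up could look in Python is sketched below. It is one possible reading of the chunk idea, not how Notepad++ or Scintilla is known to work: text is cut into maximal runs of the same character class, and a search term counts as a hit only if it starts and ends on a run boundary. The names `char_class`, `chunks`, `chunk_boundaries` and `whole_chunk_hits` are hypothetical, and terms that begin or end with whitespace are not handled.

```python
from itertools import groupby


def char_class(ch):
    """Classify one character (simplified, assumed classes)."""
    if ch.isspace():
        return "space"
    if ch.isalnum() or ch == "_":
        return "word"
    return "punctuation"


def chunks(text):
    """Split text into maximal runs of characters of the same class."""
    return ["".join(group) for _, group in groupby(text, key=char_class)]


def chunk_boundaries(text):
    """Return the set of offsets at which a chunk starts or ends."""
    offsets, pos = {0}, 0
    for piece in chunks(text):
        pos += len(piece)
        offsets.add(pos)
    return offsets


def whole_chunk_hits(text, term):
    """Offsets where term begins and ends exactly on chunk boundaries."""
    edges = chunk_boundaries(text)
    hits, pos = [], text.find(term)
    while pos != -1:
        if pos in edges and pos + len(term) in edges:
            hits.append(pos)
        pos = text.find(term, pos + 1)
    return hits


print(chunks("Notepad,Notepad++,Notepad"))
print(chunks("Notepad,Notepad++ ,Notepad"))
print(whole_chunk_hits("Notepad,Notepad++ ,Notepad", "Notepad++"))
```

In the first text the `++` and the following comma end up in the same chunk (`++,`), so "Notepad++" does not end on a chunk boundary. In the second text the space splits them apart, which fits the observation that spaces seem to act as the stronger separator.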

                        • C
                          Coises @PeterJones
                          last edited by Coises Jun 7, 2024, 5:02 PM Jun 7, 2024, 5:00 PM

                          @PeterJones said in "Whole Word Only" Option in Combination with Non-Alphanumeric Characters:

                          I agree that Scintilla’s description doesn’t cover the edge cases, though it probably should.

                          Having just discovered, to some horror, that Notepad++ uses a modified version of Scintilla (context here), I am no longer inclined to “blame” Scintilla for anything without first doing a lot of investigation.

                          I somehow just assumed that modifying Scintilla would be “off limits” for the Notepad++ project.

                          • A
                            Alan Kilborn @Coises
                            last edited by Jun 7, 2024, 6:35 PM

                            @Coises said in "Whole Word Only" Option in Combination with Non-Alphanumeric Characters:

                            that modifying Scintilla would be “off limits” for the Notepad++ project

                            It mostly is.
                            But I think in some areas it was judged to be something that “had to be done”.
