    invisible characters make difficulties -- when searching

    15 Posts 5 Posters 608 Views
    • Alan Kilborn

      So every once in a while – just enough to make it really frustrating – a search I do will fail even though I can see the text is “RIGHT THERE” in my file.

      I may have tracked this down, but let me show an example first:

      Normally, if you try to search for something you’ve searched for before, the find history will only contain the entry once; but I was seeing:

      b478db02-373b-4253-95f3-5be3d5c6107d-image.png

      Note that “association” appears twice!

      I was finding (pun!) that, depending on which “association” I chose, I got different results for a Find All in Current Document search:

      22e1a23f-f4e6-44ed-94eb-98ed96a8d5fb-image.png

      So…I stopped the task I was working on to figure this out.

      I exited N++ (to get it to save the find-history), restarted, and then examined config.xml and therein lay the truth:

      b8c1844a-f714-44a0-bcbf-c2354113a670-image.png

      Copying the “strange” one out of config.xml and pasting into a new N++ tab, I saw:

      24f6c386-a1b9-4490-9174-b20362ad8719-image.png

      But I only saw this because I have a script which makes “invisible” UTF-8 characters visible by giving them a “representation”.

      So, yes I can now see why some of my searches failed.
      But, I have no idea how an LTR “character” got into my find buffer (linked to “association”) so that it could make it into my find-history.

      Actually, I lie: I was copying and pasting some Skype data earlier, and copying from Skype can get you LTR characters, because Skype has a great love for these. It only uses them in the “date” area, though. Example:

      0ea7a2b1-8817-42ac-a2a3-7e893a1aad4f-image.png

      So I can sort of see where my “LTR” problem today originated…sort of…

      So, long-story-already-long, I’m frustrated when this kind of “accident” happens, but I’m not sure of a good way to avoid it.

      To summarize, it doesn’t seem like I should have to resort to looking at config.xml in order to see that I have a special character embedded in the search string that I think I’m searching for.

      Does anyone have any great ideas, or advice on what I can do about this (besides “be careful!”)?
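The config.xml inspection described above could be automated along these lines. This is only a sketch: it assumes the find history is stored as `Find` elements with a `name` attribute inside a `FindHistory` element, and the sample XML is a hypothetical excerpt rather than a copy of a real file, so check your own config.xml before relying on it. The function name `suspicious_find_entries` is made up for illustration.

```python
import unicodedata
import xml.etree.ElementTree as ET

# Hypothetical excerpt; element and attribute names are assumptions.
SAMPLE_CONFIG = """<NotepadPlus>
    <FindHistory>
        <Find name="association" />
        <Find name="association&#x200E;" />
    </FindHistory>
</NotepadPlus>"""


def suspicious_find_entries(xml_text):
    """Return (entry, [code points]) for entries holding format characters."""
    root = ET.fromstring(xml_text)
    results = []
    for node in root.iter("Find"):
        text = node.get("name", "")
        # Unicode category "Cf" (format) covers LRM, RLM, ZWSP, ZWJ, BOM, etc.
        hidden = ["U+%04X" % ord(ch) for ch in text
                  if unicodedata.category(ch) == "Cf"]
        if hidden:
            results.append((text, hidden))
    return results


for entry, hidden in suspicious_find_entries(SAMPLE_CONFIG):
    print(repr(entry), hidden)
```

To run it against a real file, the sample string would be replaced by the contents of config.xml read as UTF-8.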

      • PeterJones @Alan Kilborn

        @Alan-Kilborn said in invisible characters make difficulties -- when searching:

        To summarize, it doesn’t seem like I should have to resort to looking at config.xml in order to see that I have a special character embedded in the search string that I think I’m searching for.

        It won’t help you avoid it, but there is an easier way to see the character than going to the config.xml source: press Ctrl+A in the FIND field to make sure you select all of it, copy it, and paste it into an editor tab which already has the character representation enabled. That way you can at least see the invisible characters there, rather than having to do the round-trip.

        Without a feature request, I don’t know of a way to change representation in the FIND field itself. I might just request that FIND and REPLACE fields always convert any of the zero width characters (and your setRepresentation script already shows you the list of those) to \x#### notation.
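As a rough illustration of that conversion idea, the sketch below rewrites format characters in a search string using the `\x####` style mentioned above. It is not Notepad++ code: the function name `reveal` is hypothetical, and using Unicode category "Cf" as the definition of "invisible" is an assumption (a fixed list of code points would work too).

```python
import unicodedata


def reveal(text):
    """Replace Unicode format characters with a visible \\x#### escape."""
    out = []
    for ch in text:
        if unicodedata.category(ch) == "Cf":
            # Four hex digits, matching the \x#### style from the post.
            out.append("\\x%04X" % ord(ch))
        else:
            out.append(ch)
    return "".join(out)


print(reveal("association\u200e"))
```

Ordinary text passes through unchanged, so the function could be applied to every string placed in the field.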

        • Alan Kilborn @PeterJones

          @PeterJones

          Thanks for your thoughts on this.

          An “after the fact” thing I can do would be to copy the “search” line from the “Search results” window and paste it into an N++ tab. This would also then reveal invisible characters.

          But…the key thing is feeling confident in my searches. Trusting them, in other words. Granted, this type of happening is rare, but…

          Another thought I had was turning on “representation” within the “Search results” window. However, I feel this type of thing might be beyond my PythonScript abilities. In other words, I took a stab at it, but it didn’t reveal the special characters in that window. Maybe @Ekopalypse or @Eko-palypse (never quite sure which one) and his advanced skills are needed here.

          • PeterJones @Alan Kilborn

            @Alan-Kilborn said in invisible characters make difficulties -- when searching:

            Another thought I had was turning on “representation” within the “Search results” window.

            The “representation” is a scintilla message, and the Find dialog doesn’t have a scintilla component involved. There might be a way to do something similar, but it probably would require @Ekopalypse’s ctypes-library manipulation of the dialog components to affect them, rather than scintilla messages.

            • Alan Kilborn

              Ah ha! Success:

              947b2e03-c90e-47bf-ae9e-3fb3c592db88-image.png

              Now someone will ask me how I did that. :-(

              Peter said:

              The “representation” is a scintilla message, and the Find dialog doesn’t have a scintilla component involved. There might be a way to do something similar, but it probably would require @Ekopalypse’s ctypes-library manipulation of the dialog components to affect them, rather than scintilla messages.

              “Search result” window doesn’t have a “direct” way either.
              It’s involved.
              But yep, that’s why I said “Now someone will ask me how I did that”

              • Michael Vincent @Alan Kilborn

                @Alan-Kilborn said in invisible characters make difficulties -- when searching:

                But I only saw this because I have a script which sets “invisible” UTF8 characters to be visible with a “representation”.

                Are you using Scintilla Character Representations?

                • Alan Kilborn @Michael Vincent

                  @Michael-Vincent said in invisible characters make difficulties -- when searching:

                  Are you using Scintilla Character Representations?

                  Yes. Probably originating from a script in (probably) this thread: https://community.notepad-plus-plus.org/topic/14045/invisible-characters-unwanted

                  But the kicker here is I’m (now) wanting to do it (and, lately, actually doing it) in the “Search result” window. Which, in PythonScript, isn’t as simple as calling editor.setRepresentation().
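For reference, the "simple" case (an ordinary editor tab, not the “Search result” window) might look roughly like the sketch below. It assumes PythonScript 3, where strings are Unicode; the table of characters and labels is illustrative, and `RecordingEditor` is a hypothetical stand-in so the code can also be run outside Notepad++. It says nothing about how the “Search result” window was handled, which this thread does not show.

```python
# Illustrative table: a few invisible code points and the label to draw.
REPRESENTATIONS = {
    "\u200b": "ZWSP",
    "\u200c": "ZWNJ",
    "\u200d": "ZWJ",
    "\u200e": "LRM",
    "\u200f": "RLM",
    "\ufeff": "BOM",
}


class RecordingEditor:
    """Hypothetical stand-in that records calls when Npp is unavailable."""

    def __init__(self):
        self.calls = []

    def setRepresentation(self, char, label):
        self.calls.append((char, label))


def apply_representations(target):
    """Call setRepresentation for every table entry; return the count."""
    count = 0
    for char, label in REPRESENTATIONS.items():
        target.setRepresentation(char, label)
        count += 1
    return count


try:
    from Npp import editor  # only importable inside Notepad++ with PythonScript
except ImportError:
    editor = RecordingEditor()

applied = apply_representations(editor)
```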

                  I suppose the onus is now on me to show how I’m doing it, since I started this… :-) I’ll put something together…

                  EDIT: BTW, when I said:

                  “Search result” window doesn’t have a “direct” way either.

                  I should have said “pun (again) intended” because it involves using something called SciLexer.Scintilla_DirectFunction
                  Again, more later when I have time to put something (concise) together.

                  • Alan Kilborn @PeterJones

                    @PeterJones said in invisible characters make difficulties -- when searching:

                    the Find dialog doesn’t have a scintilla component involved

                    Right. It might be hard (well, harder than what I ended up doing) to show something like this in the Find what box of the Find window.

                    Why? Well, that box is pure Windows, not Scintilla, so I’m not even sure what “special, invisible” characters would look like there. I’m fairly certain that each time Ctrl+f pulls the selected characters from a Scintilla window to populate the Find what control, the text undergoes a conversion from UTF-8 (or whatever encoding is actively set in Scintilla/N++) to Windows mbcs (I probably don’t have the exact terminology right here, but the gist is that I mean “whatever encoding Windows uses internally”).

                    Bottom line, I’m happy with just the search-sanity check that seeing these characters in the “Search results” windows gives me. No more troubles of this nature for me!

                    note 1: by “mbcs” I meant “multibyte-character-set”

                    note 2: when I said I’m not even sure what “special, invisible” characters would look like there: if I select just the LTR character and press Ctrl+f, I see “nothing”. Except, no: my caret in the Find what control is now different than usual; it glimmers with a rainbow-like effect. So this is helpful in one way, but not totally helpful.
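The encoding point can be illustrated outside Notepad++. Windows wide-character APIs use UTF-16, so the sketch below round-trips a string from UTF-8 to UTF-16 and shows that an invisible character survives the conversion. It assumes the “LTR” character in this thread is U+200E (LEFT-TO-RIGHT MARK); whether Notepad++ performs exactly this conversion when filling the Find what box is the guess made above, not something this code verifies.

```python
LRM = "\u200e"  # assumption: the thread's "LTR" character is U+200E
text = "association" + LRM

utf8_bytes = text.encode("utf-8")       # how a UTF-8 document stores it
utf16_bytes = text.encode("utf-16-le")  # how Windows wide strings store it

# Convert UTF-8 -> str -> UTF-16 -> str, as a text box hand-off might.
round_tripped = utf8_bytes.decode("utf-8").encode("utf-16-le").decode("utf-16-le")

# The mark is three bytes in UTF-8 and two bytes in UTF-16, but still present.
print(utf8_bytes[-3:].hex(), utf16_bytes[-2:].hex(), round_tripped == text)
```

The character changes its byte form but is not dropped, which is consistent with it reaching the Find what control unseen.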

                    • Alan Kilborn

                      BTW, lest some be confused by why my “Search results” window examples show:

                      ⮞ "xxx" (0 hits in...

                      rather than:

                      Search "xxx" (0 hits in...

                      It’s because I already know it is a Search and I don’t need to see that over and over and over each time I run a…search!

                      So I changed a single line in my english_customizable.xml file:

                      <find-result-title value="⮞"/>
                      

                      and got rid of the pointless info, as much as is reasonable. I could have set it to "" I suppose.

                      • Ekopalypse

                        I’m a bit late on this, but I agree it would make more sense if this was already visible in the search dialog, so you are fully aware of what is being searched for.
                        But I’m not sure if this is supported by the underlying edit control. Maybe NormalizeString could be helpful in these cases.
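One caveat on the normalization idea is worth checking first. The sketch below uses Python's `unicodedata.normalize` as a stand-in for the Windows `NormalizeString` API (an assumption: both implement the standard Unicode normalization forms, but the Windows call itself is not exercised here). In this stand-in, none of the four forms removes U+200E, because that character has no decomposition; a no-break space is shown for contrast.

```python
import unicodedata

text = "association\u200e"  # assumes the stray character is U+200E

# Apply all four standard normalization forms to the same string.
forms = {form: unicodedata.normalize(form, text)
         for form in ("NFC", "NFD", "NFKC", "NFKD")}
unchanged = all(result == text for result in forms.values())

# Contrast: U+00A0 (no-break space) does change under NFKC.
nbsp_normalized = unicodedata.normalize("NFKC", "\u00a0")

print(unchanged, repr(nbsp_normalized))
```

So normalization alone would probably need to be paired with an explicit check for format characters.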

                        • mere-human

                          It makes sense to create an issue to “show non-printable characters in the find dialog” somehow.
                          I assume even if the dialog doesn’t support that character directly, it might be displayed like “·” or something.
                          And that doesn’t necessarily affect the data copied to the clipboard (i.e. it contains Unicode character as is).

                          • Ekopalypse

                            Or maybe with a warning in the status bar of the search dialog?

                            • PeterJones @Ekopalypse

                              @Ekopalypse said in invisible characters make difficulties -- when searching:

                              Or maybe with a warning in the status bar of the search dialog?

                              Actually, I really like that idea, and it’s probably the easiest to implement: if search-string contains invisible characters, then set status = "Warning: search string contains invisible character(s)" .
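The pseudocode above might translate to something like this minimal Python sketch. It is not Notepad++ source: `find_status` is a hypothetical name, and treating Unicode category "Cf" (format characters) as "invisible" is an assumption.

```python
import unicodedata

WARNING = "Warning: search string contains invisible character(s)"


def find_status(search_string):
    """Return the warning text if any format character is present, else ''."""
    has_invisible = any(unicodedata.category(ch) == "Cf"
                        for ch in search_string)
    return WARNING if has_invisible else ""


print(find_status("association\u200e"))
```

An empty return value would mean the status bar is left alone.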

                              • Ekopalypse

                                I think so too. From my naive point of view, I would guess that looping over the search string and comparing each character against a vector of known zero-width characters shouldn’t take long, since the search string itself is limited in length (2048?), as far as I remember.
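The loop described above could be sketched as follows. The list of code points is illustrative rather than complete, the 2048 limit is the figure recalled (with a question mark) in the post, and the name `zero_width_positions` is hypothetical.

```python
# Illustrative, not exhaustive: ZWSP, ZWNJ, ZWJ, LRM, RLM, word joiner, BOM.
ZERO_WIDTH = (0x200B, 0x200C, 0x200D, 0x200E, 0x200F, 0x2060, 0xFEFF)
MAX_SEARCH_LENGTH = 2048  # limit as recalled in the post; unverified


def zero_width_positions(search_string):
    """Return (index, code point) for each known zero-width character."""
    hits = []
    # A single linear pass over a string capped at MAX_SEARCH_LENGTH.
    for index, ch in enumerate(search_string[:MAX_SEARCH_LENGTH]):
        if ord(ch) in ZERO_WIDTH:
            hits.append((index, "U+%04X" % ord(ch)))
    return hits


print(zero_width_positions("association\u200e"))
```

Reporting positions as well as presence would let a warning say where the character sits in the string.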

                                • mere-human

                                  Could someone create a corresponding issue with the suggested options, please?
                                  https://github.com/notepad-plus-plus/notepad-plus-plus/issues
