
    invisible characters make difficulties -- when searching

    Help wanted · · · – – – · · ·
    • Alan Kilborn

      So every once in a while – just enough to make it really frustrating – a search I do will fail even though I can see the text is “RIGHT THERE” in my file.

      I may have tracked this down, but let me show an example first:

      Normally, if you try to search for something you’ve searched for before, the find history will only contain the entry once; but I was seeing:

      [Screenshot: the Find what history list, with “association” listed twice]

      Note that “association” appears twice!

      I was finding (pun!) that, depending upon which “association” I chose, I got differing results for a Find All in Current Document search:

      [Screenshot: Find All in Current Document results for the two “association” entries]

      So…I stopped the task I was involved in to figure this thing out.

      I exited N++ (to get it to save the find-history), restarted, and then examined config.xml and therein lay the truth:

      [Screenshot: the find-history entries in config.xml]

      Copying the “strange” one out of config.xml and pasting into a new N++ tab, I saw:

      [Screenshot: the pasted entry, with the LTR character shown via its representation]

      But I only saw this because I have a script which sets “invisible” UTF8 characters to be visible with a “representation”.

      So yes, I can now see why some of my searches failed.
      But I have no idea how an LTR (left-to-right) “character” got into my find buffer (attached to “association”) so that it could make it into my find history.
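To see why the search fails, here is a minimal Python sketch. It assumes the stray “LTR” character is U+200E LEFT-TO-RIGHT MARK placed after the word (the post does not name the exact code point or position): the two history entries render identically but are different strings, so a plain substring search with the “bad” one finds nothing.

```python
# Assumption: the invisible "LTR" character is U+200E LEFT-TO-RIGHT MARK.
LRM = "\u200e"

document = "Each association in the table has a key."
clean_term = "association"
dirty_term = "association" + LRM  # looks the same when rendered

# The two history entries print identically but are not equal strings.
print(clean_term, dirty_term)
print(clean_term == dirty_term)          # False
print(len(clean_term), len(dirty_term))  # 11 12

# A plain substring search only succeeds with the clean term.
print(document.find(clean_term))  # 5
print(document.find(dirty_term))  # -1
```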

      Actually, I lie: I was copying and pasting some Skype data earlier, and copying from Skype can get you LTR characters because Skype has a great love for them. But Skype only uses them in the “date” area, for example:

      [Screenshot: Skype date text containing LTR characters]

      So I can sort of see where my “LTR” problem today originated…sort of…

      So, long story already long, I’m frustrated when this kind of “accident” happens, but I don’t know of a great way to avoid it.

      To summarize, it doesn’t seem like I should have to resort to looking at config.xml in order to see that I have a special character embedded in the search string that I think I’m searching for.

      Does anyone have any great ideas, or advice on what I can do about this (besides “be careful!”)?

      • PeterJones (replying to @Alan Kilborn)

        @Alan-Kilborn said in invisible characters make difficulties -- when searching:

        To summarize, it doesn’t seem like I should have to resort to looking at config.xml in order to see that I have a special character embedded in the search string that I think I’m searching for.

        It won’t help avoid it, but to see the character more easily than by going to the config.xml source, you could press Ctrl+A to make sure you select all of the FIND field, copy it, then paste it into an editor tab that already has character representation enabled. That way you could at least see the invisible characters there, rather than having to do the round-trip through config.xml.

        Without a feature request, I don’t know of a way to change representation in the FIND field itself. I might just request that the FIND and REPLACE fields always convert any of the zero-width characters (your setRepresentation script already shows you the list of those) to \x#### notation.
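The \x#### idea can be sketched in Python to show what such a conversion would produce. This is only an illustration, not something the Find field does; the character set is a hand-picked assumption, and the output uses the \x{XXXX} form, assumed here to match the escape syntax Notepad++'s Boost-based regular expressions accept.

```python
# Hand-picked (assumed) set of invisible characters; extend as needed.
INVISIBLE = {
    "\u200b",  # ZERO WIDTH SPACE
    "\u200c",  # ZERO WIDTH NON-JOINER
    "\u200d",  # ZERO WIDTH JOINER
    "\u200e",  # LEFT-TO-RIGHT MARK
    "\u200f",  # RIGHT-TO-LEFT MARK
    "\u2060",  # WORD JOINER
    "\ufeff",  # ZERO WIDTH NO-BREAK SPACE / BOM
}


def escape_invisible(text):
    """Return text with each invisible character replaced by \\x{XXXX}."""
    return "".join(
        "\\x{%04X}" % ord(ch) if ch in INVISIBLE else ch for ch in text
    )


print(escape_invisible("association\u200e"))  # association\x{200E}
```

Visible characters pass through unchanged, so a clean search string would look exactly as typed.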

        • Alan Kilborn (replying to @PeterJones)

          @PeterJones

          Thanks for your thoughts on this.

          An “after the fact” thing I can do would be to copy the “search” line from the “Search results” window and paste it into an N++ tab. This would also reveal invisible characters.

          But…the key thing is feeling confident in my searches. Trusting them, in other words. Granted, this type of happening is rare, but…

          Another thought I had was turning on “representation” within the “Search results” window. However, I feel this type of thing might be beyond my PythonScript abilities. In other words, I took a stab at it, but it didn’t reveal the special characters in that window. Maybe @Ekopalypse or @Eko-palypse (never quite sure which one) and his advanced skills are needed here.

          • PeterJones (replying to @Alan Kilborn)

            @Alan-Kilborn said in invisible characters make difficulties -- when searching:

            Another thought I had was turning on “representation” within the “Search results” window.

            The “representation” is a Scintilla message, and the Find dialog doesn’t have a Scintilla component involved. There might be a way to do something similar, but it probably would require @Ekopalypse’s ctypes-library manipulation of the dialog components to affect them, rather than Scintilla messages.

            • Alan Kilborn

              Ah ha! Success:

              [Screenshot: the “Search results” window showing the invisible character]

              Now someone will ask me how I did that. :-(

              Peter said:

              The “representation” is a Scintilla message, and the Find dialog doesn’t have a Scintilla component involved. There might be a way to do something similar, but it probably would require @Ekopalypse’s ctypes-library manipulation of the dialog components to affect them, rather than Scintilla messages.

              “Search result” window doesn’t have a “direct” way either. It’s involved. But yep, that’s why I said “Now someone will ask me how I did that”.

              • Michael Vincent (replying to @Alan Kilborn)

                @Alan-Kilborn said in invisible characters make difficulties -- when searching:

                But I only saw this because I have a script which sets “invisible” UTF8 characters to be visible with a “representation”.

                Are you using Scintilla Character Representations?

                • Alan Kilborn (replying to @Michael Vincent)

                  @Michael-Vincent said in invisible characters make difficulties -- when searching:

                  Are you using Scintilla Character Representations?

                  Yes. The script probably originated in this thread: https://community.notepad-plus-plus.org/topic/14045/invisible-characters-unwanted

                  But the kicker here is that I’m (now) wanting to do it (and, lately, actually doing it) in the “Search results” window, which, in PythonScript, isn’t as simple as calling editor.setRepresentation().
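For an ordinary editor tab, the representation approach boils down to one editor.setRepresentation() call per character. The following is a minimal sketch, not the script from the linked thread; it assumes PythonScript 3.x (Python 3 str arguments), and the helper name apply_representations, the FakeEditor stand-in and the labels are hypothetical, made up for illustration. Because it takes the editor object as a parameter, it can be tried outside Notepad++.

```python
# Hypothetical labels for a few invisible characters (assumed, not from the thread).
LABELS = {
    "\u200b": "ZWSP",
    "\u200e": "LRM",
    "\u200f": "RLM",
    "\ufeff": "BOM",
}


def apply_representations(ed, labels=LABELS):
    """Ask a Scintilla-backed editor object to draw each character as its label.

    In PythonScript, pass the global `editor` object; setRepresentation
    wraps Scintilla's SCI_SETREPRESENTATION message.
    """
    for ch, label in labels.items():
        ed.setRepresentation(ch, label)
    return len(labels)


class FakeEditor:
    """Hypothetical stand-in that records calls, for use outside Notepad++."""

    def __init__(self):
        self.calls = []

    def setRepresentation(self, ch, label):
        self.calls.append((ch, label))


fake = FakeEditor()
apply_representations(fake)
print(fake.calls)
```

Inside Notepad++ the call would be apply_representations(editor). This only affects the Scintilla view that receives the message, which is why the “Search results” window needs a different route.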

                  I suppose the onus is now on me to show how I’m doing it, since I started this… :-) I’ll put something together…

                  EDIT: BTW, when I said:

                  “Search result” window doesn’t have a “direct” way either.

                  I should have said “pun (again) intended”, because it involves using something called SciLexer.Scintilla_DirectFunction.
                  Again, more later when I have time to put something (concise) together.

                  • Alan Kilborn (replying to @PeterJones)

                    @PeterJones said in invisible characters make difficulties -- when searching:

                    the Find dialog doesn’t have a scintilla component involved

                    Right. It might be hard (well, harder than what I ended up doing) to show something like this in the Find what box of the Find window.

                    Why? Well, that box is pure Windows, not Scintilla, so I’m not even sure what “special, invisible” characters would look like there. I’m fairly certain that each time Ctrl+f pulls selected characters from a Scintilla window to populate the Find what control, they undergo a conversion from UTF-8 (or whatever encoding Scintilla/N++ has active) to Windows “mbcs” (I probably don’t have the exact terminology right here, but the gist is that I mean “whatever encoding Windows uses internally”).
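To make the encoding point concrete, here is the same invisible character in UTF-8 (what a UTF-8 Scintilla document stores) and in UTF-16 little-endian, which is what wide-character (Unicode) Win32 controls such as an edit box work with internally. The sketch assumes the character is U+200E LEFT-TO-RIGHT MARK; it only shows the byte forms, not Notepad++'s actual conversion code.

```python
LRM = "\u200e"  # assumed: U+200E LEFT-TO-RIGHT MARK

utf8_bytes = LRM.encode("utf-8")       # byte form in a UTF-8 Scintilla document
utf16_bytes = LRM.encode("utf-16-le")  # byte form in a wide-character Win32 control

print(utf8_bytes.hex())   # e2808e
print(utf16_bytes.hex())  # 0e20

# Round-tripping through either encoding preserves the character,
# so the conversion itself does not remove it.
assert utf8_bytes.decode("utf-8") == utf16_bytes.decode("utf-16-le") == LRM
```

Because the character survives the conversion intact, it ends up in the Find what box (and the history) even though nothing visible is drawn for it.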

                    Bottom line, I’m happy with just the search-sanity check that seeing these characters in the “Search results” window gives me. No more troubles of this nature for me!

                    note 1: by “mbcs” I meant “multibyte-character-set”

                    note 2: when I said I’m not even sure what “special, invisible” characters would look like there: actually, if I select just the LTR character and press Ctrl+f, I see “nothing”. Except, no, my caret in the Find what control is now different than usual; it glimmers with a rainbow-like effect…so this is helpful in one way, but not totally helpful.

                    • Alan Kilborn

                      BTW, lest some be confused about why my “Search results” window examples show:

                      ⮞ "xxx" (0 hits in...

                      rather than:

                      Search "xxx" (0 hits in...

                      It’s because I already know it is a Search and I don’t need to see that over and over and over each time I run a…search!

                      So I changed a single line in my english_customizable.xml file:

                      <find-result-title value="⮞"/>
                      

                      and got rid of the pointless info, as much as is reasonable. I could have set it to "" I suppose.

                      • Ekopalypse

                        I’m a bit late on this, but I agree it would make more sense if this was already visible in the search dialog, so you are fully aware of what is being searched for.
                        But I’m not sure if this is supported by the underlying edit control. Maybe NormalizeString could be helpful in these cases.
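Whether normalization helps depends on the character. NormalizeString (a Win32 API) applies the standard Unicode normalization forms C, D, KC and KD; Python's unicodedata.normalize implements the same forms, so it serves as a stand-in in this sketch. The LEFT-TO-RIGHT MARK (assumed to be the culprit here) survives every form because it has no decomposition, whereas filtering on the Unicode general category "Cf" (format characters) removes it. The filter is an illustrative alternative, and it would also drop format characters someone searches for on purpose.

```python
import unicodedata

term = "association\u200e"  # assumed: trailing U+200E LEFT-TO-RIGHT MARK

# Normalization leaves the mark in place: it has no decomposition.
for form in ("NFC", "NFD", "NFKC", "NFKD"):
    print(form, unicodedata.normalize(form, term) == term)  # True for each


def strip_format_chars(text):
    """Drop characters whose Unicode general category is Cf (format)."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Cf")


print(strip_format_chars(term))  # association
```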

                        • mere-human

                          It makes sense to create an issue to “show non-printable characters in the find dialog” somehow.
                          I assume that even if the dialog doesn’t support that character directly, it might be displayed as “·” or something.
                          And that wouldn’t necessarily affect the data copied to the clipboard (i.e. it would still contain the Unicode character as-is).

                          • Ekopalypse

                            Or maybe with a warning in the status bar of the search dialog?

                            • PeterJones (replying to @Ekopalypse)

                              @Ekopalypse said in invisible characters make difficulties -- when searching:

                              Or maybe with a warning in the status bar of the search dialog?

                              Actually, I really like that idea, and it’s probably the easiest to implement: if the search string contains invisible characters, then set status = "Warning: search string contains invisible character(s)".

                              • Ekopalypse

                                I think so too. From my naive point of view, I would guess that looping over the search string and comparing each character to a vector of known zero-width characters shouldn’t take long, since the search string itself is limited in length (2048 characters?, as far as I remember).
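The loop-and-compare idea, combined with the status-bar warning, might look like this in Python. It is a sketch of the idea, not Notepad++'s (C++) implementation; the character set, the function name invisible_char_warning and the exact message format are assumptions for illustration.

```python
# Assumed list of zero-width / invisible code points to warn about.
ZERO_WIDTH = frozenset("\u200b\u200c\u200d\u200e\u200f\u2060\ufeff")


def invisible_char_warning(search):
    """Return a status-bar style warning, or None if the string is clean."""
    found = [(i, ch) for i, ch in enumerate(search) if ch in ZERO_WIDTH]
    if not found:
        return None
    where = ", ".join("U+%04X at %d" % (ord(ch), i) for i, ch in found)
    return "Warning: search string contains invisible character(s): " + where


print(invisible_char_warning("association"))  # None
print(invisible_char_warning("association\u200e"))
# Warning: search string contains invisible character(s): U+200E at 11
```

Since the search string's length is bounded, one pass with a set lookup per character is cheap, which matches the reasoning that the check shouldn't take long.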

                                • mere-human

                                  Could someone create a corresponding issue with the suggested options, please?
                                  https://github.com/notepad-plus-plus/notepad-plus-plus/issues

                                  Copyright © 2014 NodeBB Forums | Contributors