invisible characters make difficulties -- when searching



  • So every once in a while – just enough to make it really frustrating – a search I do will fail even though I can see the text is “RIGHT THERE” in my file.

    I may have tracked this down, but let me show an example first:

    Normally, if you try to search for something you’ve searched for before, the find history will only contain the entry once; but I was seeing:

    b478db02-373b-4253-95f3-5be3d5c6107d-image.png

    Note that “association” appears twice!

    I was finding (pun!) that, depending upon which “association” I chose, I’d get differing results for a Find All in Current Document search:

    22e1a23f-f4e6-44ed-94eb-98ed96a8d5fb-image.png

    So…I stopped the task I was working on to figure this thing out.

    I exited N++ (to get it to save the find-history), restarted, and then examined config.xml and therein lay the truth:

    b8c1844a-f714-44a0-bcbf-c2354113a670-image.png

    Copying the “strange” one out of config.xml and pasting into a new N++ tab, I saw:

    24f6c386-a1b9-4490-9174-b20362ad8719-image.png

    But I only saw this because I have a script which sets “invisible” UTF8 characters to be visible with a “representation”.

    So, yes I can now see why some of my searches failed.
    But, I have no idea how an LTR “character” got into my find buffer (linked to “association”) so that it could make it into my find-history.

    Actually, I lie: I was copying and pasting some Skype data earlier, and copying from Skype can get you LTR characters, because Skype has a great love for these. But Skype only uses them in the “date” area, for example:

    0ea7a2b1-8817-42ac-a2a3-7e893a1aad4f-image.png

    So I can sort of see where my “LTR” problem today originated…sort of…

    So, long-story-already-long, I’m frustrated when this kind of “accident” happens, but I’m not sure of a great method of avoidance.

    To summarize, it doesn’t seem like I should have to resort to looking at config.xml in order to see that I have a special character embedded in the search string that I think I’m searching for.

    Does anyone have any great ideas, or advice on what I can do about this (besides “be careful!”)?



  • @Alan-Kilborn said in invisible characters make difficulties -- when searching:

    To summarize, it doesn’t seem like I should have to resort to looking at config.xml in order to see that I have a special character embedded in the search string that I think I’m searching for.

    It won’t help avoid it, but there is an easier way to see the character than going to the config.xml source: press Ctrl+A in the FIND field to make sure you select all of it, copy it, then paste it into an editor tab that already has the character representation enabled. That way you could at least see the invisible characters there, rather than having to do the round trip through config.xml.

    Without a feature request, I don’t know of a way to change representation in the FIND field itself. I might just request that FIND and REPLACE fields always convert any of the zero width characters (and your setRepresentation script already shows you the list of those) to \x#### notation.
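
The \x#### conversion suggested in this post could look roughly like the sketch below. It is plain Python, not Notepad++ code, and it rests on an assumption: the characters to convert are taken to be everything in Unicode category "Cf" (format characters), which includes the left-to-right mark U+200E. The function name `escape_invisible` is made up for illustration.

```python
import unicodedata


def escape_invisible(text):
    """Return text with Unicode format (Cf) characters shown as \\x#### escapes."""
    out = []
    for ch in text:
        # Assumption: category "Cf" is a reasonable stand-in for "invisible".
        # It covers LRM, RLM, zero-width space/joiners, the BOM and similar.
        if unicodedata.category(ch) == "Cf":
            out.append("\\x{:04X}".format(ord(ch)))
        else:
            out.append(ch)
    return "".join(out)


print(escape_invisible("association\u200e"))  # association\x200E
```

Pasting a suspicious search string through a function like this makes a trailing mark obvious, while ordinary text passes through unchanged.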



  • @PeterJones

    Thanks for your thoughts on this.

    An “after the fact” thing I can do would be to copy the “search” line from the “Search results” window and paste into a N++ tab. This would also then reveal invisible characters.

    But…the key thing is feeling confident in my searches. Trusting them, in other words. Granted, this type of happening is rare, but…

    Another thought I had was turning on “representation” within the “Search results” window. However, I feel this type of thing might be beyond my Pythonscript abilities. In other words, I took a stab at it, but it didn’t reveal the special characters in that window. Maybe @Ekopalypse or @Eko-palypse (never quite sure which one) and his advanced skills are needed here.



  • @Alan-Kilborn said in invisible characters make difficulties -- when searching:

    Another thought I had was turning on “representation” within the “Search results” window.

    The “representation” is a scintilla message, and the Find dialog doesn’t have a scintilla component involved. There might be a way to do something similar, but it probably would require @Ekopalypse’s ctypes-library manipulation of the dialog components to affect them, rather than scintilla messages.



  • Ah ha! Success:

    947b2e03-c90e-47bf-ae9e-3fb3c592db88-image.png

    Now someone will ask me how I did that. :-(

    Peter said:

    The “representation” is a scintilla message, and the Find dialog doesn’t have a scintilla component involved. There might be a way to do something similar, but it probably would require @Ekopalypse’s ctypes-library manipulation of the dialog components to affect them, rather than scintilla messages.

    “Search result” window doesn’t have a “direct” way either.
    It’s involved.
    But yep, that’s why I said “Now someone will ask me how I did that”



  • @Alan-Kilborn said in invisible characters make difficulties -- when searching:

    But I only saw this because I have a script which sets “invisible” UTF8 characters to be visible with a “representation”.

    Are you using Scintilla Character Representations?



  • @Michael-Vincent said in invisible characters make difficulties -- when searching:

    Are you using Scintilla Character Representations?

    Yes. Probably originating from a script in (probably) this thread: https://community.notepad-plus-plus.org/topic/14045/invisible-characters-unwanted

    But the kicker here is I’m (now) wanting to do it (and, lately, actually doing it) in the “Search result” window. Which, in Pythonscript, isn’t as simple as merely calling editor.setRepresentation().

    I suppose the onus is now on me to show how I’m doing it, since I started this… :-) I’ll put something together…

    EDIT: BTW, when I said:

    “Search result” window doesn’t have a “direct” way either.

    I should have said “pun (again) intended” because it involves using something called SciLexer.Scintilla_DirectFunction
    Again, more later when I have time to put something (concise) together.
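
For the ordinary editor views only (not the “Search result” window, which needs the more involved approach described in this post), the editor.setRepresentation() idea can be sketched as below. This is not the script from the linked thread. The table of labels and the helper `apply_representations` are hypothetical, and the stub class exists only so the sketch runs outside Notepad++. Inside PythonScript the real `editor` object would be passed instead; under a Python 2 based PythonScript the character would probably need to be passed as UTF-8 encoded bytes, which is an assumption to verify.

```python
# Hypothetical table: character -> short label to display in its place.
REPRESENTATIONS = {
    "\u200b": "ZWSP",
    "\u200e": "LRM",
    "\u200f": "RLM",
    "\ufeff": "BOM",
}


def apply_representations(target, table=REPRESENTATIONS):
    """Call target.setRepresentation(character, label) for every table entry."""
    for ch, label in table.items():
        target.setRepresentation(ch, label)


class RecordingEditor(object):
    """Stand-in for PythonScript's editor object; it only records the calls."""

    def __init__(self):
        self.calls = {}

    def setRepresentation(self, encoded_character, representation):
        self.calls[encoded_character] = representation


stub = RecordingEditor()
apply_representations(stub)
print(stub.calls["\u200e"])  # LRM
```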



  • @PeterJones said in invisible characters make difficulties -- when searching:

    the Find dialog doesn’t have a scintilla component involved

    Right. It might be hard (well, harder than what I ended up doing) to show something like this in the Find what box of the Find window.

    Why? Well, that box is pure Windows, not Scintilla, so I’m not even sure what “special, invisible” characters would appear like there. I’m fairly certain that each time Ctrl+f pulls the selected characters from a Scintilla window to populate the Find what control, the text undergoes a conversion from UTF8 (or whatever Scintilla/N++ encoding is actively set) to Windows-mbcs. I probably don’t have the exact terminology right here, but the gist is that I mean “whatever encoding Windows uses internally”.

    Bottom line, I’m happy with just the search-sanity check that seeing these characters in the “Search results” window gives me. No more troubles of this nature for me!

    note 1: by “mbcs” I meant “multibyte-character-set”

    note 2: when I said I’m not even sure what “special, invisible” characters would appear like there: actually, if I select just the LTR character and press Ctrl+f, I see “nothing”. Except, no, my caret in the Find what control is now different than usual; it glimmers with a rainbow-like effect. So this is helpful in one way, but not totally helpful.
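
To illustrate the conversion described in this post, here is a small plain-Python sketch. It assumes the document side is UTF-8 and that “whatever encoding Windows uses internally” is UTF-16, which is what the wide-character Win32 APIs use; whether Notepad++ performs exactly this conversion is not confirmed in the thread. The point is that the mark survives the conversion, so it can travel into the Find what box unseen.

```python
LRM = "\u200e"  # LEFT-TO-RIGHT MARK


def utf8_to_utf16(data):
    """Decode UTF-8 bytes and re-encode them as UTF-16 little-endian."""
    return data.decode("utf-8").encode("utf-16-le")


sample = "association" + LRM
as_utf8 = sample.encode("utf-8")
as_utf16 = utf8_to_utf16(as_utf8)

print(len(as_utf8))   # 14: 11 ASCII bytes plus 3 bytes for the mark
print(len(as_utf16))  # 24: 12 code units of 2 bytes each
print(as_utf16.decode("utf-16-le") == sample)  # True: the mark is still there
```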



  • BTW, lest some be confused by why my “Search results” window examples show:

    ⮞ "xxx" (0 hits in...

    rather than:

    Search "xxx" (0 hits in...

    It’s because I already know it is a Search and I don’t need to see that over and over and over each time I run a…search!

    So I changed a single line in my english_customizable.xml file:

    <find-result-title value="⮞"/>
    

    and got rid of the pointless info, as much as is reasonable. I could have set it to "" I suppose.



  • I’m a bit late on this, but I agree it would make more sense if this was already visible in the search dialog, so you are fully aware of what is being searched for.
    But I’m not sure if this is supported by the underlying edit control. Maybe NormalizeString could be helpful in these cases.



  • It makes sense to create an issue to “show non-printable characters in the find dialog” somehow.
    I assume even if the dialog doesn’t support that character directly, it might be displayed like “·” or something.
    And that doesn’t necessarily affect the data copied to the clipboard (i.e. it would still contain the Unicode character as is).



  • Or maybe with a warning in the status bar of the search dialog ?



  • @Ekopalypse said in invisible characters make difficulties -- when searching:

    Or maybe with a warning in the status bar of the search dialog ?

    Actually, I really like that idea, and it’s probably the easiest to implement: if search-string contains invisible characters, then set status = "Warning: search string contains invisible character(s)" .



  • I think so too. From my naive point of view, I would guess that looping over the search string and comparing it to a vector of known zero-width characters shouldn’t take too long, since the search string itself is limited in length (2048?), as far as I remember.
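
The loop-and-compare check discussed in the last few posts might look something like this sketch. It is plain Python rather than Notepad++ source, and the list of characters is an assumption made for illustration; the thread does not say which characters an actual implementation would check. The function names are hypothetical, and the warning text is the one proposed above.

```python
# Assumed list of zero-width characters; a real implementation may differ.
ZERO_WIDTH = {
    "\u200b": "ZERO WIDTH SPACE",
    "\u200c": "ZERO WIDTH NON-JOINER",
    "\u200d": "ZERO WIDTH JOINER",
    "\u200e": "LEFT-TO-RIGHT MARK",
    "\u200f": "RIGHT-TO-LEFT MARK",
    "\u2060": "WORD JOINER",
    "\ufeff": "ZERO WIDTH NO-BREAK SPACE",
}


def find_invisible(search):
    """Return (index, name) pairs for each known zero-width character."""
    return [(i, ZERO_WIDTH[ch]) for i, ch in enumerate(search) if ch in ZERO_WIDTH]


def status_message(search):
    """Return the proposed warning, or an empty string for a clean search."""
    if find_invisible(search):
        return "Warning: search string contains invisible character(s)"
    return ""


print(find_invisible("association\u200e"))  # [(11, 'LEFT-TO-RIGHT MARK')]
print(status_message("association"))        # prints an empty line
```

The check is a single pass over the search string with a constant-time lookup per character, so even a search string of a couple of thousand characters costs very little.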



  • Could someone create a corresponding issue with the suggested options, please?
    https://github.com/notepad-plus-plus/notepad-plus-plus/issues

