    "Special characters" in Search Results window (encoding issues?)

    • EkopalypseE
      Ekopalypse @Alan Kilborn
      last edited by

      @Alan-Kilborn

      The other usual suspects would be the font and technology (GDI/DirectX).
      By the way, which font do you use?
      None of my installed fonts show this symbol in the editor.

      • Alan KilbornA
        Alan Kilborn @Ekopalypse
        last edited by

        @Ekopalypse said in "Special characters" in Search Results window (encoding issues?):

        The other usual suspects would be the font and technology (GDI/DirectX).
        By the way, which font do you use?

        I have DirectWrite enabled.
        I play around with different fonts (I can’t seem to find the best one for me); “Consolas” is the one I currently favor.

        None of my installed fonts show this symbol in the editor.

        You mean this actual character?:

        ⟯

        • EkopalypseE
          Ekopalypse
          last edited by Ekopalypse

          @Alan-Kilborn said in "Special characters" in Search Results window (encoding issues?):

          You mean this actual character?:

          I guess so, even my browser refuses to display it

          ed4eb222-dd59-4c3f-8a40-c5eeef40f798-image.png

          But the more I think about it, the less I’m convinced that it is a font or technology issue, as you do get a representation, just a different one.

          • Alan KilbornA
            Alan Kilborn @Ekopalypse
            last edited by

            @Ekopalypse said in "Special characters" in Search Results window (encoding issues?):

            What I hope npp does is

            • read the content from the document and convert it to utf16
            • search for the string in utf16 as well
            • display the search result in whatever encoding it needs to

            When I noticed the following in the Scintilla documentation, it reminded me of Eko’s points above:

            SCI_ENCODEDFROMUTF8(const char *utf8, char *encoded) → position
            SCI_ENCODEDFROMUTF8 converts a UTF-8 string into the document’s encoding which is useful for taking the results of a find dialog, for example, and receiving a string of bytes that can be searched for in the document.

            I’m not sure what I’m saying by pointing this out; perhaps just noticing a somewhat common theme? :-)
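
As a rough illustration of the idea in the Scintilla description above (turn a UTF-8 search string into the document's encoding, then search the document's bytes), here is a minimal Python sketch. It does not call Scintilla; `encoded_from_utf8` and `find_in_document` are hypothetical helpers, and the `cp1252` document encoding is an assumption chosen only for the example.

```python
def encoded_from_utf8(utf8_bytes: bytes, doc_encoding: str) -> bytes:
    """Hypothetical helper: re-encode a UTF-8 search string into the document's encoding."""
    return utf8_bytes.decode("utf-8").encode(doc_encoding)


def find_in_document(doc_bytes: bytes, needle_utf8: bytes, doc_encoding: str) -> int:
    """Return the byte offset of the first match, or -1 if there is none."""
    needle = encoded_from_utf8(needle_utf8, doc_encoding)
    return doc_bytes.find(needle)


# Assumed setup: a document stored in a single-byte ANSI code page.
doc_encoding = "cp1252"
doc_bytes = "price: 10 € per item".encode(doc_encoding)

# The search string arrives as UTF-8, as it would from a find dialog.
needle_utf8 = "€".encode("utf-8")

# Searching with the raw UTF-8 bytes misses, because the document stores
# the euro sign as one cp1252 byte rather than three UTF-8 bytes.
raw_offset = doc_bytes.find(needle_utf8)

# Converting the needle to the document's encoding first finds the match.
offset = find_in_document(doc_bytes, needle_utf8, doc_encoding)
print(raw_offset, offset)
```

The point of the sketch is only that the needle and the haystack have to be in the same encoding before a byte-level search can work.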

            • EkopalypseE
              Ekopalypse @Alan Kilborn
              last edited by

              @Alan-Kilborn

              Since the Windows API uses UTF-16 for internal storage of strings, this does not sound efficient.
              I wonder why the Scintilla devs thought that this might be a good idea.
              Hmm … most likely because the library is used on different platforms,
              but then why not have a compile-time variable to convert it to UTF-16 on Windows and UTF-8 on Linux … ??

              • Alan KilbornA
                Alan Kilborn @Ekopalypse
                last edited by

                @Ekopalypse

                In hindsight, I probably shouldn’t have mentioned SCI_ENCODEDFROMUTF8: I searched the Notepad++ source code, and it isn’t used there.

                • Alan KilbornA
                  Alan Kilborn
                  last edited by

                  @Alan-Kilborn said in "Special characters" in Search Results window (encoding issues?):

                  When I put it into shortcuts.xml I just pasted it as a single character, but now I notice that it looks like this if I open the xml:
                  ⟯

                  So just providing some more data on this, after I experimented with it a bit more:

                  I see that if I record a macro using the multibyte unicode character (discussed much further up in the thread), instead of “hand editing” shortcuts.xml after-the-fact, when I run the macro I DO see the correct character appearing in the Find result window:

                  fb3060bd-84e4-42c4-bff1-a4b1f40ee59d-image.png
                  AND…

                  if I later look at the saved xml, I see this for that character in the macro:

                  &#x27EF;

                  Which does indeed make sense.

                  So perhaps the error was mine and it comes down to directly inserting the unicode character into the XML instead of inserting its &#x.... code.
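
One plausible mechanism for the "garbage" characters, sketched in Python: if the UTF-8 bytes of a pasted character are later read as a single-byte ANSI code page, each byte turns into its own character. That Notepad++ does exactly this with shortcuts.xml is an assumption, not something confirmed in this thread, and `cp1252` is just one example code page. A numeric character reference avoids the question because it is plain ASCII.

```python
import html

ch = "\u27EF"  # the character discussed in this thread

# The character as UTF-8 bytes, which is what a direct paste into a UTF-8 file stores.
utf8_bytes = ch.encode("utf-8")

# Assumption for illustration: a reader decodes those bytes as Windows-1252.
# Each of the three bytes becomes a separate, unrelated character.
misread = utf8_bytes.decode("cp1252")

# The same character written as a hexadecimal numeric character reference.
reference = f"&#x{ord(ch):X};"

# Resolving the reference gives the original character back, whatever the file encoding.
restored = html.unescape(reference)

print(utf8_bytes)
print(reference)
print(restored == ch)
```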

                  • Alan KilbornA
                    Alan Kilborn @Alan Kilborn
                    last edited by

                    More thoughts:

                    From my immediately preceding post, it follows that anytime a “special” character is used in a N++ configuration xml file, the “html syntax” (for example, &#x27EF;) should be used, rather than inserting the character “directly”, for example via a paste.

                    However, I notice in english_customizable.xml the following, which does not follow this idea:

                    <Item id="1721" name="▲"/>
                    <Item id="1723" name="▼ Find Next"/>
                    

                    But yet these items display correctly…
                    Okay, different usage from the above; these appear on buttons in the UI, the earlier discussion is some text in the Find result window…

                    But in general, I would be interested to know why these don’t require any special “treatment” in the xml.
                    Or what is the “rule”?
                    Always use the “html syntax” seems the “safest”.

                    Ideas? Or is the topic too “meh” for anyone to care? :-)
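
If the "always use the html syntax" rule is adopted, the conversion does not have to be done by hand. Below is a minimal Python sketch; `escape_non_ascii` is a hypothetical helper name. It relies on the standard `xmlcharrefreplace` error handler, which writes decimal references (`&#10223;`) rather than hexadecimal ones (`&#x27EF;`); both forms are valid XML and refer to the same character. The helper does not escape `&`, `<` or quotes, so it is only meant for text that is already valid attribute content.

```python
def escape_non_ascii(text: str) -> str:
    """Hypothetical helper: replace every non-ASCII character with a numeric character reference."""
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


# The button label from english_customizable.xml quoted above.
print(escape_non_ascii("\u25bc Find Next"))

# The character from the macro discussion.
print(escape_non_ascii("\u27ef"))
```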

                    • PeterJonesP
                      PeterJones @Alan Kilborn
                      last edited by

                      @Alan-Kilborn said in "Special characters" in Search Results window (encoding issues?):

                      Always use the “html syntax” seems the “safest”.
                      Ideas? Or is the topic too “meh” for anyone to care? :-)

                      I don’t know enough about XML to know the default encoding, but my guess is Windows-1251.
                      If you added

                      <?xml version="1.0" encoding="UTF-8" ?>
                      

                      at the beginning of the shortcuts.xml (and reloaded Notepad++), does that allow you to hand-insert the ⟯ (https://graphemica.com/⟯) or other special character into the macro XML directly?
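
For comparison, here is how a standards-following parser treats the three variants, sketched with Python's `xml.etree.ElementTree`. The `Macro` element and `name` attribute are illustrative only. As far as the XML 1.0 specification goes, a document with no encoding declaration and no byte-order mark is read as UTF-8, so all three inputs produce the same attribute value here. This says nothing definite about Notepad++'s own XML reader, which may behave differently.

```python
import xml.etree.ElementTree as ET

# Raw UTF-8 bytes of the character, with an explicit encoding declaration.
with_declaration = b'<?xml version="1.0" encoding="UTF-8" ?><Macro name="\xe2\x9f\xaf"/>'

# The same raw bytes, with no declaration at all.
without_declaration = b'<Macro name="\xe2\x9f\xaf"/>'

# Pure ASCII file content using a numeric character reference.
with_reference = b'<Macro name="&#x27EF;"/>'

names = [
    ET.fromstring(doc).get("name")
    for doc in (with_declaration, without_declaration, with_reference)
]

# All three variants should decode to the same single character.
print(all(name == "\u27ef" for name in names))
```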

                      • Alan KilbornA
                        Alan Kilborn @PeterJones
                        last edited by

                        @PeterJones said in "Special characters" in Search Results window (encoding issues?):

                        If you added

                        First, I’m a bit surprised that, after I added that line manually, Notepad++ allows it to remain (after recording a new macro, forcing N++ to rewrite shortcuts.xml).

                        Second, it was a good idea, but sadly, after trying it, I get the same result as earlier, specifically, “garbage” characters in the Find result window text.

                        Third, thanks for the interest, @PeterJones
