    Interpret an Unicode value as real character in Notepad++

    • vaso blgV
      vaso blg @PeterJones
      last edited by vaso blg

      @PeterJones actually, now as I re-read your post I would be interested in your script, for some other Unicode stuff (from time-to-time) but when I click the link it gives me 404 error (page does not exist)

      • PeterJonesP
        PeterJones @vaso blg
        last edited by

        @vaso-blg said in Interpret an Unicode value as real character in Notepad++:

        @PeterJones actually, now as I re-read your post I would be interested in your script, but when I click the link it gives me 404 error (page does not exist)

        Oh, right, I copied an old link, but I had moved it to a subdirectory after the last time I posted a link to it. It’s now at https://github.com/pryrt/nppStuff/blob/main/pythonScripts/useful/pyscReplaceBackslashSequence.py
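For readers who only want the general idea, here is a minimal sketch in plain Python 3 of replacing backslash sequences with the characters they name. It is not the linked script: the function name `decode_escapes` and the set of supported forms (`\xXX`, `\uXXXX`, `\UXXXXXXXX`) are assumptions made for illustration, and it does not touch the Notepad++ editor at all.

```python
import re

# Matches \UXXXXXXXX (8 hex digits), \uXXXX (4 hex digits) or \xXX (2 hex digits).
_ESCAPE = re.compile(
    r"\\U([0-9A-Fa-f]{8})|\\u([0-9A-Fa-f]{4})|\\x([0-9A-Fa-f]{2})"
)


def decode_escapes(text):
    """Replace backslash escape sequences with the characters they name.

    Hypothetical helper for illustration only.
    Raises ValueError if a \\U sequence is above U+10FFFF.
    """
    def _sub(match):
        hex_digits = match.group(1) or match.group(2) or match.group(3)
        return chr(int(hex_digits, 16))

    decoded = _ESCAPE.sub(_sub, text)
    # A surrogate pair written as two \u escapes is still two code points
    # at this stage; a round trip through UTF-16 joins it into one character.
    # "surrogatepass" lets unpaired surrogates survive instead of raising.
    return decoded.encode("utf-16-le", "surrogatepass").decode(
        "utf-16-le", "surrogatepass"
    )
```

Under these assumptions, `decode_escapes(r"\U0001D400")` and `decode_escapes(r"\uD835\uDC00")` both give 𝐀. Inside Notepad++ the same function would need to be wired to the current selection by the scripting plugin, which is left out here.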

        • vaso blgV
          vaso blg @PeterJones
          last edited by vaso blg

          @PeterJones yes, I already found it myself by exploring that page and already tested it, great, works as you said it would - very good addition to my solution (Snippets), thank you!

          • vaso blgV
            vaso blg @PeterJones
            last edited by

            @PeterJones BTW don’t you have a reversed script, that would convert all the selected text into the Unicode values? That could be very interesting option to have indeed!!!

            • PeterJonesP
              PeterJones @vaso blg
              last edited by

              @vaso-blg said in Interpret an Unicode value as real character in Notepad++:

              don’t you have a reversed script, that would convert all the selected text into the Unicode values?

              Not quite, but similar:
              https://github.com/pryrt/nppStuff/blob/main/pythonScripts/useful/WhatUniChar.py

              It will update the status bar (until Notepad++'s next screen refresh) to show the codepoint of the single character at the typing caret. (It doesn’t do the whole selection, just a single character).
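The whole-selection variant asked about above could look roughly like the following plain Python 3 sketch. It is not part of WhatUniChar.py; the function name `to_codepoints` and the `U+` prefix default are assumptions for illustration, and the text is passed in as an ordinary string rather than read from the editor.

```python
def to_codepoints(text, prefix="U+"):
    """Return the code point of every character in text, space-separated.

    Hypothetical helper for illustration only. In Python 3 a str is a
    sequence of code points, so a character outside the Basic Multilingual
    Plane is reported once, not as two surrogates.
    """
    return " ".join("{}{:04X}".format(prefix, ord(ch)) for ch in text)


print(to_codepoints("Az"))           # U+0041 U+007A
print(to_codepoints("\U0001D400"))   # U+1D400
```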

              • vaso blgV
                vaso blg @PeterJones
                last edited by vaso blg

                @PeterJones perfect! Believe it or not, I had already downloaded exactly this script as the second one, expecting it to do something like that, but I could not work out how it operates and thought it was not functioning. Now I know that I have to look at the status bar and set the caret at the beginning instead of the end, which was one of the mistakes I was making before you explained how it operates, haha. Thank you!

                • rdipardoR
                  rdipardo
                  last edited by

                  For future visitors of this thread, note that @PeterJones’s alternative suggestion of surrogate pairs does in fact work with HTML Tag, since version 1.4 at least:

                  [Screenshot: htmltag-U+1D400-decode]

                  Conversions are reversible, i.e., literal Unicode pasted from the Web with code points above U+D800 will also be encoded as surrogates.

                  You can convert to and from the commonly used U+0000 format once you have configured the prefix in the settings:

                  [Screenshot: htmltag-U+1D400-encode]

                  There is currently a hard limit of U+DBFF for convertible code points, and only the first 4 digits are read. So, for example, U+1D400 will become \u1D40 -> ᵀ with the last 0 remaining as is, as reported above.
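The four-digit behaviour described above can be reproduced with a short Python 3 sketch. This is only an imitation of the observed result, not the plugin's code (HTML Tag is written in Pascal); the function name `decode_four_digits` and the regular expression are assumptions for illustration.

```python
import re


def decode_four_digits(text):
    """Decode U+XXXX reading exactly four hex digits, as a naive decoder would.

    Hypothetical helper for illustration only. Any fifth digit is not
    consumed, so it stays in the output as a literal character.
    """
    return re.sub(
        r"U\+([0-9A-Fa-f]{4})",
        lambda m: chr(int(m.group(1), 16)),
        text,
    )


result = decode_four_digits("U+1D400")
# U+1D40 is decoded and the trailing "0" is left over.
print(result == "\u1d40" + "0")  # True
```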

                  The official bug tracker is on GitHub.

                    • CoisesC
                      Coises
                      last edited by Coises

                      The free and open source software WinCompose is an option.

                      This program converts one of your keys (by default the right Alt key) into a Compose key, which you press and release, then enter a short mnemonic sequence to choose a character. For example, Right-Alt a " to generate ä or Right-Alt o c to generate ©.

                      You can use Right-Alt u xxx Enter to generate a Unicode character. It works for the original poster’s example: Right-Alt u 1 d 4 0 0 Enter gives 𝐀.

                      I don’t use it on a regular basis, so I can’t comment on its stability; I used the portable version to write the text above, and also verified that it works in Notepad++.

                      (Note: I replaced my earlier version of this comment because I discovered the website for this software links to an out-of-date version. The GitHub version appears to be the most recent.)

                      • rdipardoR
                        rdipardo
                        last edited by rdipardo

                        FYI

                        The HTML Tag wiki now includes a more accurate description of its decoding limitations. Not sure yet if this can be improved while the source code is targeting the Free Pascal runtime, which tends to favour UTF-8 as being more compatible with the many different platforms it supports. Perhaps for the sake of legacy Delphi code, the WideString type is an exception:

                        WideStrings consist of COM compatible UTF16 encoded bytes on Windows machines (UCS2 on Windows 2000), and they are encoded as plain UTF16 on Linux, Mac OS X and iOS.

                        The P.E. header of a recent plugin DLL shows “4.0” as the minimum required OS version, even older than Windows 2K, so it’s possible that Unicode text is actually encoded as UCS2 (!).

                        The original developer seems to have assumed that a signed 16-bit SmallInt would be enough for all potential code points. They’ve been stored as 32-bit unsigned integers for a long time now, so there’s really no excuse for not extending the logic to decompose ordinals north of U+010000 into surrogate pairs and feeding them back into the decoder.
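The decomposition into surrogate pairs mentioned above is standard UTF-16 arithmetic. A minimal Python 3 sketch follows; the function names `to_surrogates` and `from_surrogates` are made up for illustration, and this is not code from the plugin.

```python
def to_surrogates(code_point):
    """Split a supplementary code point into a (high, low) surrogate pair."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only U+10000..U+10FFFF need a surrogate pair")
    offset = code_point - 0x10000        # 20-bit value
    high = 0xD800 + (offset >> 10)       # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)      # bottom 10 bits
    return high, low


def from_surrogates(high, low):
    """Recombine a (high, low) surrogate pair into one code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


high, low = to_surrogates(0x1D400)
print("\\u{:04X}\\u{:04X}".format(high, low))  # \uD835\uDC00
```

For the thread's example, U+1D400 splits into D835 and DC00, and `from_surrogates(0xD835, 0xDC00)` gives back 0x1D400.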
