    Interpret an Unicode value as real character in Notepad++

    • vaso blg

      If I copy a Unicode character in its rendered form (for example, U+1D400 from Wikipedia’s Mathematical_Alphanumeric_Symbols page) and paste it into Notepad++, it really does show up as a “bold” character.

      But when I type it into Notepad++ as a Unicode escape (typed as \u1D400) and use the plugin HTML Tag > Decode JS (as advised in the post called “replace-unicode-values-with-characters-e-g-u00e9-with-%C3%A9”), it does not render as the “bold” character. Even worse, it converts it into two different characters!

      Why is that, and how can I make Notepad++ convert the value into the same character that I get when I simply copy and paste it, already rendered, from a webpage?

      Am I doing something wrong? Can anyone show me the proper way to type the actual Unicode value into Notepad++ and have it converted to the correct character?

      • mkupper @vaso blg

        @vaso-blg - Unfortunately, the usual tips and tricks you will find on the Internet only work for U+0000 to U+FFFF. They usually fail for Unicode characters in the range U+10000 to U+10FFFF, which includes U+1D400. I have always done what you do, which is to copy and paste characters such as 𝐀 into Notepad++.

        Hopefully, someone else here will know of a trick that works within Notepad++ and allows for more direct entry of 1D400 and getting a 𝐀.

        • PeterJones @mkupper

          @mkupper said in Interpret an Unicode value as real character in Notepad++:

          Hopefully, someone else here will know of a trick that works within Notepad++ and allows for more direct entry of 1D400 and getting a 𝐀.

          I know of at least two ways:

          1. If you know the surrogate code
            • example: 𝐀 (U+1D400). As described at https://www.fileformat.info/info/unicode/char/1d400/index.htm, its C/C++/Java source code form is "\uD835\uDC00", so its surrogate code is U+D835 U+DC00.
            • hold down Alt with one hand, and with the other, type +D835 (*) then release Alt, then hold down Alt and type +DC00 (*)
              (*: the + and all digits must be on numeric keypad; the D and any of the non-digit hex characters can be on the normal keyboard)
            • once you have typed both hex sequences, 𝐀 will appear
          2. use my pyscReplaceBackslashSequence.py script in the PythonScript plugin: type \u1D400, then, with the cursor just after it, run my script; it will convert it into 𝐀

          The first requires knowing the surrogate code, having enabled the right registry key to allow Unicode Alt codes, having a keyboard that still has a numeric keypad, and getting good at doing those sequences. The second requires knowing the full codepoint, and having the PythonScript plugin and my script (and is made easier if you map my script to a keyboard shortcut). Neither is reasonable if you have a large set of special characters that you want to be able to insert, but have not memorized them all.
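For anyone who knows the codepoint but not the surrogate code, the arithmetic is short. Below is a minimal Python sketch of the standard UTF-16 split; the function name `to_surrogates` is made up for illustration and is not taken from the script mentioned above.

```python
def to_surrogates(codepoint: int) -> tuple:
    """Split a code point in U+10000..U+10FFFF into a UTF-16 surrogate pair."""
    if not 0x10000 <= codepoint <= 0x10FFFF:
        raise ValueError("only code points above U+FFFF need a surrogate pair")
    offset = codepoint - 0x10000      # leaves a 20-bit value
    high = 0xD800 + (offset >> 10)    # top 10 bits go into the high surrogate
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits go into the low surrogate
    return high, low


high, low = to_surrogates(0x1D400)
print(f"U+{high:04X} U+{low:04X}")  # U+D835 U+DC00
```

The two values printed are the ones to type as Alt codes in the first method.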

          I usually just search that fileformat.info site for the Unicode characters I want, or launch charmap.exe (which I have in my Run command menu). But if you had an “emoji keyboard” app or some such, it might make finding the right emoji easier (similar to smartphone keyboard emoji inputs). (Caveat: I have adblocker on fileformat.info, so its gazillion ads don’t bother me; I tend to forget that it’s ad-intensive when I recommend other people use it.)

          • vaso blg @PeterJones

            @PeterJones thank you. In fact, as the final solution I just used the Snippets plugin: I simply added all the characters there, and now I can insert them as needed. A simple and useful solution, and I do not need to remember the codes :-D

            • vaso blg @PeterJones

              @PeterJones actually, now as I re-read your post I would be interested in your script, for some other Unicode stuff (from time-to-time) but when I click the link it gives me 404 error (page does not exist)

              • PeterJones @vaso blg

                @vaso-blg said in Interpret an Unicode value as real character in Notepad++:

                @PeterJones actually, now as I re-read your post I would be interested in your script, but when I click the link it gives me 404 error (page does not exist)

                Oh, right, I copied an old link, but I had moved it to a subdirectory after the last time I posted a link to it. It’s now at https://github.com/pryrt/nppStuff/blob/main/pythonScripts/useful/pyscReplaceBackslashSequence.py

                • vaso blg @PeterJones

                  @PeterJones yes, I already found it myself by exploring that page and already tested it, great, works as you said it would - very good addition to my solution (Snippets), thank you!

                  • vaso blg @PeterJones

                    @PeterJones BTW don’t you have a reversed script, that would convert all the selected text into the Unicode values? That could be a very interesting option to have indeed!!!

                    • PeterJones @vaso blg

                      @vaso-blg said in Interpret an Unicode value as real character in Notepad++:

                      don’t you have a reversed script, that would convert all the selected text into the Unicode values?

                      Not quite, but similar:
                      https://github.com/pryrt/nppStuff/blob/main/pythonScripts/useful/WhatUniChar.py

                      It will update the status bar (until Notepad++'s next screen refresh) to show the codepoint of the single character at the typing caret. (It doesn’t do the whole selection, just a single character).
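A whole-selection variant could look roughly like the plain-Python sketch below. It is not part of WhatUniChar.py and does not use the PythonScript API; the helper name `codepoints` is made up for illustration, and hooking it up to the editor's selection is left out.

```python
def codepoints(text: str) -> str:
    """List the code point of every character in text, in U+XXXX notation."""
    # Python 3 strings iterate by code point, so 𝐀 counts as one character.
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


print(codepoints("\U0001D400\u00e9"))  # U+1D400 U+00E9
```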

                      • vaso blg @PeterJones

                        @PeterJones perfect. Believe it or not, I had already downloaded exactly this script as the second one, expecting it to do something like that, but I could not figure out how it operates and thought it was not functioning. Luckily, now I know that I have to look at the status bar and set the caret at the beginning instead of the end (one of the mistakes I was making before you explained how it operates, haha). Thank you!

                        • rdipardo

                          For future visitors of this thread, note that @PeterJones’s alternative suggestion of surrogate pairs does in fact work with HTML Tag, since version 1.4 at least:

                          [image: htmltag-U+1D400-decode]

                          Conversions are reversible, i.e., literal Unicode pasted from the Web with code points above U+FFFF will also be encoded as surrogates.
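To show what a reversible surrogate-pair conversion means, here is a minimal Python sketch. It is an assumption-laden illustration, not the HTML Tag plugin's code: the function names are made up, and it assumes escapes are always `\u` followed by exactly four hex digits.

```python
import re

# One \uXXXX escape with exactly four hex digits (assumed escape format).
ESCAPE = re.compile(r"\\u([0-9A-Fa-f]{4})")


def decode_escapes(text: str) -> str:
    """Turn \\uXXXX escapes into characters, joining surrogate pairs."""
    # Each escape becomes one UTF-16 code unit, possibly half of a pair.
    units = ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)
    # A round trip through UTF-16 merges a high+low pair into one character.
    raw = units.encode("utf-16-le", "surrogatepass")
    return raw.decode("utf-16-le", "surrogatepass")


def encode_escapes(text: str) -> str:
    """Turn non-ASCII characters into \\uXXXX escapes, using pairs above U+FFFF."""
    out = []
    for ch in text:
        if ord(ch) < 0x80:
            out.append(ch)
            continue
        data = ch.encode("utf-16-be", "surrogatepass")
        for i in range(0, len(data), 2):
            unit = int.from_bytes(data[i:i + 2], "big")
            out.append(f"\\u{unit:04X}")
    return "".join(out)


print(encode_escapes("\U0001D400"))  # \uD835\uDC00
print(decode_escapes(r"\uD835\uDC00") == "\U0001D400")  # True
```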

                          You can convert to and from the commonly used U+0000 format once you have configured the prefix in the settings:

                          [image: htmltag-U+1D400-encode]

                          There is currently a hard limit of U+DBFF for convertible code points, and only the first 4 digits are read. So, for example, U+1D400 will become \u1D40 -> ᵀ with the last 0 remaining as is, as reported above.
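The four-digit behaviour is easy to reproduce outside Notepad++. The Python sketch below is only an illustration of that rule, not the plugin's implementation, and `decode_four_digits` is a made-up name.

```python
import re


def decode_four_digits(text: str) -> str:
    """Decode \\uXXXX escapes, reading exactly four hex digits each time."""
    return re.sub(
        r"\\u([0-9A-Fa-f]{4})",
        lambda m: chr(int(m.group(1), 16)),
        text,
    )


result = decode_four_digits(r"\u1D400")
# Only "1D40" is consumed; the trailing "0" is left behind as a plain digit.
print([f"U+{ord(ch):04X}" for ch in result])  # ['U+1D40', 'U+0030']
```

That is why the original poster saw two characters instead of one.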

                          The official bug tracker is on GitHub.

                            • Coises

                              The free and open source software WinCompose is an option.

                              This program converts one of your keys (by default the right Alt key) into a Compose key, which you press and release, then enter a short mnemonic sequence to choose a character. For example, Right-Alt a " to generate ä or Right-Alt o c to generate ©.

                              You can use Right-Alt u xxx Enter to generate a Unicode character. It works for the original poster’s example: Right-Alt u 1 d 4 0 0 Enter gives 𝐀.

                              I don’t use it on a regular basis, so I can’t comment on its stability; I used the portable version to write the text above, and also verified that it works in Notepad++.

                              (Note: I replaced my earlier version of this comment because I discovered the website for this software links to an out-of-date version. The GitHub version appears to be the most recent.)

                              • rdipardo

                                FYI

                                The HTML Tag wiki now includes a more accurate description of its decoding limitations. Not sure yet if this can be improved while the source code is targeting the Free Pascal runtime, which tends to favour UTF-8 as being more compatible with the many different platforms it supports. Perhaps for the sake of legacy Delphi code, the WideString type is an exception:

                                WideStrings consist of COM compatible UTF16 encoded bytes on Windows machines (UCS2 on Windows 2000), and they are encoded as plain UTF16 on Linux, Mac OS X and iOS.

                                The P.E. header of a recent plugin DLL shows “4.0” as the minimum required OS version, even older than Windows 2K, so it’s possible that Unicode text is actually encoded as UCS2 (!).

                                  The original developer seems to have assumed that a signed 16-bit SmallInt would be enough for all potential code points. They’ve been stored as 32-bit unsigned integers for a long time now, so there’s really no excuse for not extending the logic to decompose ordinals north of U+FFFF into surrogate pairs and feeding them back into the decoder.
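As a rough illustration of why a signed 16-bit type is too small, the Python sketch below reinterprets a few values as signed 16-bit integers. It only demonstrates the numeric range; it makes no claim about how the plugin actually stores or compares values, and `as_signed_16` is a made-up helper.

```python
def as_signed_16(value: int) -> int:
    """Reinterpret the low 16 bits of value as a signed 16-bit integer."""
    low_bits = (value & 0xFFFF).to_bytes(2, "little")
    return int.from_bytes(low_bits, "little", signed=True)


print(as_signed_16(0x00E9))  # 233: fits comfortably
print(as_signed_16(0xD835))  # -10187: a high surrogate wraps negative
print(0x1D400 > 0xFFFF)      # True: U+1D400 needs more than 16 bits
```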
