Interpret an Unicode value as real character in Notepad++
-
@vaso-blg - Unfortunately, the usual tips and tricks you will discover on the Internet only work for U+0000 to U+FFFF. Those tips and tricks usually fail for Unicode characters in the range U+10000 to U+10FFFF, which includes U+1D400. I have always done what you do, which is to copy and paste characters such as 𝐀 into Notepad++.
Hopefully, someone else here will know of a trick that works within Notepad++ and allows for more direct entry of 1D400 and getting a 𝐀.
-
@mkupper said in Interpret an Unicode value as real character in Notepad++:
Hopefully, someone else here will know of a trick that works within Notepad++ and allows for more direct entry of 1D400 and getting a 𝐀.
I know of at least two ways:
- If you know the surrogate code
  - example: 𝐀, as described here, https://www.fileformat.info/info/unicode/char/1d400/index.htm => C/C++/Java source code "\uD835\uDC00", has the surrogate code U+D835 U+DC00.
  - hold down Alt with one hand, and with the other, type +D835 (*), then release Alt, then hold down Alt and type +DC00 (*)
    (*: the + and all digits must be on the numeric keypad; the D and any of the non-digit hex characters can be on the normal keyboard)
  - once you have typed both hex sequences, 𝐀 will appear
- use my pyscReplaceBackslashSequence.py script in the PythonScript plugin, then type \u1D400, then with the cursor just after that, run my script; it will convert it into the 𝐀

The first requires knowing the surrogate code, having enabled the right registry key to allow Unicode Alt codes, having a keyboard that still has a numeric keypad, and getting good at doing those sequences. The second requires knowing the full codepoint, and having the PythonScript plugin and my script (and is made easier if you map my script to a keyboard shortcut). Neither is reasonable if you have a large set of special characters that you want to be able to insert, but have not memorized them all.
I usually just search that fileformat.info site for the Unicode characters I want, or launch charmap.exe (which I have in my Run command menu). But if you had an “emoji keyboard” app or some such, it might make finding the right emoji easier (similar to smartphone keyboard emoji inputs). (Caveat: I have an adblocker on fileformat.info, so its gazillion ads don’t bother me; I tend to forget that it’s ad-intensive when I recommend other people use it.)
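For anyone who does not want to look up the surrogate code on a website, the arithmetic is short. The following is a minimal sketch in plain Python 3 (run outside Notepad++); the function name to_surrogate_pair is made up for this illustration and is not part of any script mentioned in this thread.

```python
def to_surrogate_pair(code_point):
    """Split a code point in U+10000..U+10FFFF into UTF-16 high/low surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points U+10000..U+10FFFF use surrogates")
    offset = code_point - 0x10000       # 20-bit value
    high = 0xD800 + (offset >> 10)      # top 10 bits go into the high surrogate
    low = 0xDC00 + (offset & 0x3FF)     # bottom 10 bits go into the low surrogate
    return high, low


high, low = to_surrogate_pair(0x1D400)
print("U+{:04X} U+{:04X}".format(high, low))  # U+D835 U+DC00
```

The two printed values are the hex sequences to type with Alt in the first method.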
-
@PeterJones thank you, in fact as the final solution I just used Snippets plugin and simply added all the characters there and now I can insert them as needed - simple and useful solution + I do not need to remember the codes :-D
-
@PeterJones actually, now as I re-read your post I would be interested in your script, for some other Unicode stuff (from time-to-time) but when I click the link it gives me 404 error (page does not exist)
-
@vaso-blg said in Interpret an Unicode value as real character in Notepad++:
@PeterJones actually, now as I re-read your post I would be interested in your script, but when I click the link it gives me 404 error (page does not exist)
Oh, right, I copied an old link, but I had moved it to a subdirectory after the last time I posted a link to it. It’s now at https://github.com/pryrt/nppStuff/blob/main/pythonScripts/useful/pyscReplaceBackslashSequence.py
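To give a rough idea of what such a conversion involves, here is a minimal sketch in plain Python 3. It is not the linked script and does not use the PythonScript editor API; the function name, the regular expression, and the choice to accept 4 to 6 hex digits are all assumptions made for this illustration.

```python
import re

# Assumed syntax for this sketch: \u followed by 4 to 6 hex digits.
_ESCAPE = re.compile(r"\\u([0-9A-Fa-f]{4,6})")


def replace_backslash_sequences(text):
    """Replace \\uXXXX-style sequences in text with the characters they name."""
    def _to_char(match):
        code_point = int(match.group(1), 16)
        # Leave out-of-range values and lone surrogates untouched.
        if code_point > 0x10FFFF or 0xD800 <= code_point <= 0xDFFF:
            return match.group(0)
        return chr(code_point)

    return _ESCAPE.sub(_to_char, text)


print(replace_backslash_sequences(r"bold A: \u1D400"))
```

Because the pattern is greedy, a 4-digit sequence followed directly by more hex digits (for example \u00e9ab) would be read as one longer code point, so a real tool needs some rule to mark where the sequence ends.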
-
@PeterJones yes, I already found it myself by exploring that page and already tested it, great, works as you said it would - very good addition to my solution (Snippets), thank you!
-
@PeterJones BTW don’t you have a reversed script, that would convert all the selected text into the Unicode values? That could be very interesting option to have indeed!!!
-
@vaso-blg said in Interpret an Unicode value as real character in Notepad++:
don’t you have a reversed script, that would convert all the selected text into the Unicode values?
Not quite, but similar:
https://github.com/pryrt/nppStuff/blob/main/pythonScripts/useful/WhatUniChar.py
It will update the status bar (until Notepad++'s next screen refresh) to show the codepoint of the single character at the typing caret. (It doesn’t do the whole selection, just a single character.)
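For the whole-selection case that was asked about, the conversion itself is simple once the text is available as a string. This is a minimal sketch in plain Python 3, not the linked script; the function name to_code_points is hypothetical, and getting the selected text out of Notepad++ is left out.

```python
def to_code_points(text):
    """Return the code points of text in U+XXXX notation, separated by spaces."""
    return " ".join("U+{:04X}".format(ord(ch)) for ch in text)


# "A" followed by MATHEMATICAL BOLD CAPITAL A
print(to_code_points("A\U0001D400"))  # U+0041 U+1D400
```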
-
@PeterJones perfect + believe it or not, but once again I myself already downloaded exactly this script as the 2nd one, expecting it should do something like that, although I could not figure out how it operates and thought it was not functioning or something (so luckily now I know that I have to look at the status bar and set the caret at the beginning instead of the end - one of the mistakes I was making before you explained how it operates, haha - thank you!).
-
For future visitors of this thread, note that @PeterJones’s alternative suggestion of surrogate pairs does in fact work with HTML Tag, since version 1.4 at least.
Conversions are reversible, i.e., literal Unicode pasted from the Web with code points above U+D800 will also be encoded as surrogates.
You can convert to and from the commonly used U+0000 format once you have configured the prefix in the settings.
There is currently a hard limit of U+DBFF for convertible code points, and only the first 4 digits are read. So, for example, U+1D400 will become \u1D40 -> ᵀ with the last 0 remaining as is, as reported above.
The official bug tracker is on GitHub.
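The effect of reading only the first 4 digits can be reproduced in a few lines. This is a minimal sketch in plain Python 3 that only mimics the behaviour described above; it is not the plugin's code, and the function name decode_first_four is made up for this illustration.

```python
import re


def decode_first_four(text):
    """Decode \\u sequences, reading exactly four hex digits and no more."""
    return re.sub(
        r"\\u([0-9A-Fa-f]{4})",
        lambda m: chr(int(m.group(1), 16)),
        text,
    )


# Only "1D40" is consumed, so U+1D40 is produced and the final "0" is left over.
print(decode_first_four(r"\u1D400"))
```

Entering the surrogate pair instead (\uD835\uDC00) avoids the problem because each half fits in four digits.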
-
The free and open source software WinCompose is an option.
This program converts one of your keys (by default the right Alt key) into a Compose key, which you press and release, then enter a short mnemonic sequence to choose a character. For example, Right-Alt a " to generate ä or Right-Alt o c to generate ©.
You can use Right-Alt u xxx Enter to generate a Unicode character. It works for the original poster’s example: Right-Alt u 1 d 4 0 0 Enter gives 𝐀.
I don’t use it on a regular basis, so I can’t comment on its stability; I used the portable version to write the text above, and also verified that it works in Notepad++.
(Note: I replaced my earlier version of this comment because I discovered the website for this software links to an out-of-date version. The GitHub version appears to be the most recent.)
-
FYI
The HTML Tag wiki now includes a more accurate description of its decoding limitations. Not sure yet if this can be improved while the source code is targeting the Free Pascal runtime, which tends to favour UTF-8 as being more compatible with the many different platforms it supports. Perhaps for the sake of legacy Delphi code, the WideString type is an exception:
WideStrings consist of COM compatible UTF16 encoded bytes on Windows machines (UCS2 on Windows 2000), and they are encoded as plain UTF16 on Linux, Mac OS X and iOS.
The P.E. header of a recent plugin DLL shows “4.0” as the minimum required OS version, even older than Windows 2K, so it’s possible that Unicode text is actually encoded as UCS2 (!).
The original developer seems to have assumed that a signed 16-bit SmallInt would be enough for all potential code points. They’ve been stored as 32-bit unsigned integers for a long time now, so there’s really no excuse for not extending the logic to decompose ordinals north of U+010000 into surrogate pairs and feeding them back into the decoder.
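To illustrate why 16 bits are not enough, here is a minimal sketch in plain Python 3. It uses ctypes.c_int16 as a stand-in for a signed 16-bit integer and shows the reverse step of combining a surrogate pair into one code point. Both function names are hypothetical, and nothing here is taken from the plugin's Pascal source.

```python
import ctypes


def as_smallint(value):
    """Mimic storing value in a signed 16-bit integer (bits above 16 are lost)."""
    return ctypes.c_int16(value).value


def from_surrogate_pair(high, low):
    """Combine a UTF-16 high/low surrogate pair into a single code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid high/low surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


print(as_smallint(0x1D40))    # 7488: fits in 16 bits
print(as_smallint(0x1D400))   # -11264: the 17th bit is lost and the sign flips
print("U+{:04X}".format(from_surrogate_pair(0xD835, 0xDC00)))  # U+1D400
```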