Seeking Clarification on Entering Alt Keypad Characters
-
I was trying to find the keystrokes needed to create the “shrug” emoji and the middle Unicode character 0xE38384 ( ツ - Japanese Katakana Tu or Tsu) seems to require using the Alt plus numpad + followed by the full hex sequence, but when I try to use Alt with the numpad plus there is no response.
Alt with the numpad digits works fine, but any other key used with Alt triggers menu items.
The workaround was to use the Hex->ASCII Converter plugin, which is OK, but I'm curious to confirm that Alt plus the numpad plus sign just won't work.
Thanks in advance.
-
@haleba-hotmail said in Seeking Clarification on Entering Alt Keypad Characters:
middle Unicode character 0xE38384 ( ツ - Japanese Katakana Tu or Tsu)
You have one definite misunderstanding, and maybe a second.
First, 0xE38384 is not the Unicode code point for ツ (¹). U+30C4 is the code point. 0xE38384 is the three-byte sequence that encodes the U+30C4 character, ツ, in UTF-8. You should be using the ALT +30C4 sequence to enter that in any Windows application, not just Notepad++.
The second, which I'm not sure whether or not you understand, is that the + in this case isn't just saying “hold down the ALT key while typing the rest of the sequence”. The plus is actually part of the sequence: ALT +30C4 means “hold down the ALT key, then type + on the numeric keypad, then type 30C4, where the 3 and 0 and 4 must all also be on the numeric keypad”.
That said, there are a couple more caveats for the ALT +30C4 sequence.
- As detailed at http://www.fileformat.info/tip/microsoft/enter_unicode.htm, in the Method 1: Universal section,
Alas, this appears to require a registry setting. It was already set on my computer, but some readers report that this method didn’t work for them, and this is probably why. If you don’t know what the registry is, please don’t try this. Under HKEY_Current_User/Control Panel/Input Method, set EnableHexNumpad to “1”. If you have to add it, set the type to be REG_SZ.
- Sometimes, it has to do with timing; it can be difficult to get all 5 of those characters typed before Windows gives up and starts interpreting them as individual keystrokes again, instead of the unicode-entry-escape combo.
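If you would rather check that registry value than edit it blind, here is a minimal Python sketch that only reads it. It assumes the key path and value name quoted above; the function name hex_numpad_enabled is just something I made up for illustration, and on a non-Windows machine it simply returns None.

```python
import sys


def hex_numpad_enabled():
    """Return True/False for the EnableHexNumpad setting, or None off Windows.

    Hypothetical helper: it only READS the value quoted in the post above
    (HKEY_CURRENT_USER\\Control Panel\\Input Method, EnableHexNumpad = "1").
    """
    if sys.platform != "win32":
        return None  # the registry only exists on Windows

    import winreg  # standard library, Windows only

    try:
        with winreg.OpenKey(
            winreg.HKEY_CURRENT_USER, r"Control Panel\Input Method"
        ) as key:
            value, value_type = winreg.QueryValueEx(key, "EnableHexNumpad")
    except FileNotFoundError:
        return False  # key or value is missing, so the setting is not enabled

    # The quoted instructions say the value must be the string "1" (REG_SZ).
    return value_type == winreg.REG_SZ and value == "1"


if __name__ == "__main__":
    print("EnableHexNumpad set:", hex_numpad_enabled())
```

A False result would be consistent with the numpad + producing no response, though the timing caveat can give the same symptom even when the setting is present.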
FOOTNOTE 1: The first misunderstanding was probably compounded by the way that HEX->ASCII works.
The reason why Plugins > Converter > Hex->ASCII has you enter E38384 is because it is working with bytes, not characters, and it is internally using UTF-8, so after it converts the 6 hex nibbles into 3 bytes, it recognizes those three bytes as a single character; when it pastes back into the Notepad++ editor pane, it converts that character to the appropriate encoding for the active editor. For example, if I have a file containing E38384 saved as UCS-2 LE BOM:
C:\Users\peter.jones\Downloads\TempData\nppCommunity>xxd ucs2le.txt
00000000: fffe 4500 3300 3800 3300 3800 3400  ..E.3.8.3.8.4.
Then I select those six characters and run the HEX->ASCII command, it enters the ツ character. Then I save, and now on disk, I have:
C:\Users\peter.jones\Downloads\TempData\nppCommunity>xxd ucs2le.txt
00000000: fffe c430  ...0
which is the little-endian for BOM then U+30C4.
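The code point versus encoded bytes distinction can be reproduced outside Notepad++. This is a small Python sketch, not anything the plugin itself runs; the variable names are mine, and it uses Python's utf-16-le codec as a stand-in for what Notepad++ calls UCS-2 LE BOM.

```python
# One character, three different views of it.
KATAKANA_TU = "\u30c4"  # ツ

# The code point is a number assigned by Unicode, independent of any encoding.
code_point = ord(KATAKANA_TU)

# UTF-8 turns that code point into three bytes.
utf8_bytes = KATAKANA_TU.encode("utf-8")

# UTF-16 little-endian with a BOM (U+FEFF) in front, as stored on disk.
utf16le_with_bom = "\ufeff".encode("utf-16-le") + KATAKANA_TU.encode("utf-16-le")

print(f"code point : U+{code_point:04X}")
print(f"UTF-8      : {utf8_bytes.hex()}")
print(f"UTF-16 LE  : {utf16le_with_bom.hex()}")
```

The last line prints the same four bytes, fffe c430, that xxd showed for the saved file.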
-
Hello, @haleba-hotmail, @peterjones and All,
First, in your post, you’re speaking about 2 characters, one character that is part of the Basic Multilingual Plane (BMP) and the other character outside the BMP. These are:
- The KATAKANA letter TU ( = TSU ) ツ (\x{30C4}), from the Unicode block Katakana, in range 30A0–30FF
- The SHRUG 🤷 portrait symbol (\x{1F937}), from the Unicode block Supplemental Symbols and Pictographs, in range 1F900–1F9FF
The main characteristics of these two chars are :
Character: ツ
Character name: KATAKANA LETTER TU
Hex code point: 30C4
Decimal code point: 12484
Hex UTF-8 bytes: E3 83 84
Octal UTF-8 bytes: 343 203 204
UTF-8 bytes as Latin-1 characters bytes: ã <83> <84>
and
Character: 🤷
Character name: SHRUG
Hex code point: 1F937
Decimal code point: 129335
Hex UTF-8 bytes: F0 9F A4 B7
Octal UTF-8 bytes: 360 237 244 267
UTF-8 bytes as Latin-1 characters bytes: ð <9F> ¤ ·
Hex UTF-16 Surrogates: D83E DD37
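Most of the values listed for these two characters can be recomputed with a few lines of Python, which also shows why only the second character needs UTF-16 surrogates. This is just a sketch; the describe function and its dictionary keys are names I chose for illustration.

```python
def describe(ch):
    """Recompute the main properties listed above for a single character."""
    cp = ord(ch)
    utf8 = ch.encode("utf-8")
    utf16 = ch.encode("utf-16-be")  # big-endian, so code units read naturally
    # Split the UTF-16 bytes into 16-bit code units, shown as hex.
    units = [utf16[i:i + 2].hex().upper() for i in range(0, len(utf16), 2)]
    return {
        "hex_code_point": f"{cp:04X}",
        "decimal_code_point": cp,
        "hex_utf8": " ".join(f"{b:02X}" for b in utf8),
        "octal_utf8": " ".join(f"{b:o}" for b in utf8),
        "in_bmp": cp <= 0xFFFF,   # the BMP covers U+0000 to U+FFFF
        "utf16_units": units,     # two units means a surrogate pair
    }


for ch in ("\u30c4", "\U0001F937"):  # ツ and 🤷
    print(describe(ch))
```

A character inside the BMP comes back with a single UTF-16 unit equal to its code point, while one outside the BMP comes back with a pair of surrogates.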
I got the information on these characters from a useful on-line UTF-8 tool, described in the last section of the post below: https://community.notepad-plus-plus.org/post/50983
I must say that I did not pay attention, until now, to the Converter plugin, of @don-ho !!
- Seemingly, if you select one or several consecutive character(s) and use the option Plugins > Converter > ASCII -> HEX, it correctly writes the hexadecimal byte(s) needed to encode this/these character(s) in UTF-8, or in ANSI for the 255-characters allowed block !
- IMPORTANT : Even if your current encoding is UCS-2 BE BOM or UCS-2 LE BOM, it still shows the hexadecimal bytes used in a UTF-8 or a UTF-8 BOM file to encode this/these characters :-( In any case, it’s best to avoid these two encodings because they cannot handle characters which are over the BMP, like your SHRUG symbol !
For instance, in a UTF-8 file, the selection of the string 🤷Aツé and then the option Plugins > Converter > ASCII -> HEX gives the result F09FA4B741E38384C3A9, because:
- The 🤷 character is coded with the 4-byte UTF-8 sequence F09FA4B7
- The A character is coded with the 1-byte UTF-8 sequence 41
- The ツ character is coded with the 3-byte UTF-8 sequence E38384
- The é character is coded with the 2-byte UTF-8 sequence C3A9
And, in an ANSI file, the selection of the string Aé, with the option Plugins > Converter > ASCII -> HEX gives the result 41E9 because:
- The A character is coded with the 1-byte ANSI sequence 41
- The é character is coded with the 1-byte ANSI sequence E9
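The two results above can be reproduced with a rough Python imitation of what ASCII -> HEX appears to do. To be clear, this is not the plugin's code: text_to_hex is a name I invented, and I am assuming that "ANSI" means the Windows-1252 code page, which depends on the system locale.

```python
def text_to_hex(text, encoding="utf-8"):
    """Encode text to bytes, then write each byte as two uppercase hex digits."""
    return text.encode(encoding).hex().upper()


# UTF-8 file: 4 + 1 + 3 + 2 bytes for the four characters 🤷 A ツ é
print(text_to_hex("\U0001F937A\u30c4\u00e9"))

# "ANSI" file, assumed here to be Windows-1252: one byte per character
print(text_to_hex("A\u00e9", encoding="cp1252"))
```

Trying to encode ツ or 🤷 with cp1252 raises a UnicodeEncodeError in Python, since that code page has no byte for them.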
In the same way, if you select one or several consecutive hexadecimal bytes and use the option Plugins > Converter > HEX -> ASCII, it correctly writes the corresponding glyphs of this/these character(s), produced by the current font, in a UTF-8 or ANSI file. For instance, selecting the sequence F09FA4B741E38384C3A9 does give back our 4-character string 🤷Aツé
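The reverse direction can be imitated the same way. Again, this is only a Python sketch of the observed behaviour, with hex_to_text as a made-up name and Windows-1252 assumed for the ANSI case.

```python
def hex_to_text(hex_string, encoding="utf-8"):
    """Turn pairs of hex digits back into bytes, then decode them as text."""
    return bytes.fromhex(hex_string).decode(encoding)


decoded = hex_to_text("F09FA4B741E38384C3A9")
print(decoded, len(decoded))  # ten bytes, but only four characters

# Same idea for an "ANSI" file, assumed here to be Windows-1252
print(hex_to_text("41E9", encoding="cp1252"))
```

This is also why the hex string has to be the UTF-8 bytes rather than the code point: decoding the two bytes 30 C4 as UTF-8 would not produce ツ.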
Now, regarding the different Windows input methods, I strongly advise you to read this post first, where I recapitulate all of them:
https://community.notepad-plus-plus.org/topic/18903/regex-misidentifying-foreign-characters/6
And, in its last section, look at the reference to a nice monospaced font, which correctly displays the vast majority of Unicode characters, even those which are outside the BMP
As said in that post, after modifying the registry ( be careful ! ), you may directly insert, for instance, the KATAKANA letter TU, following these steps:
- Hold down the Alt key and, successively:
- Hit the + key, on the numeric keypad
- Hit the 3 key, on the numeric keypad
- Hit the 0 key, on the numeric keypad
- Hit the C key, on the main keyboard
- Hit the 4 key, on the numeric keypad
- Release the Alt key
=> Immediately, the ツ character should be inserted at the cursor location ;-))
However, note that the Shrug symbol cannot be inserted, even using this powerful input method, because its code point 1F937 is greater than \x{FFFF} ! You’ll have to use, in that case, an on-line tool to get these characters, from their Unicode code point, in the range \x{10000} - \x{10FFFF}, as, for instance, the UTF-8 tool described above !
Best Regards,
guy038
P.S. : I started writing this post, before the @peterjones reply. Also, some parts may be redundant ;-))
-