2-byte characters recently broken? Or do I misremember?
-
I was fairly sure that I recalled that Notepad++ supports 2-byte characters (e.g. an “a” with an umlaut over it, “ä”). However, recently, I notice that whenever I type such a character, save the text file in Notepad++, and then re-open the file, the ä gets replaced by a question mark (?).
-
Notepad++, even the newest v8.9.1.2, handles non-ASCII characters just fine.
You will want to check your encoding: make sure that Notepad++ thinks the encoding is what the file is actually encoded as. For example, if Notepad++ thinks it’s UTF-8, but your file is actually one of the ANSI encodings (like the Windows 1252 character set), then the file will have a single byte 0xE4 for ä, but Notepad++ sees that as an incomplete UTF-8 sequence, and doesn’t know what to do with it. 0xE4 is actually a byte that says to a UTF-8 interpreter “this is the first byte of a 3-byte sequence”, but then there are no more bytes that meet proper UTF-8 encoding that follow, so it shows a ? to indicate its reaction of “huh, what?”.
So if you have a file that is showing ? instead of ä, look down in the status bar to see if Notepad++ thinks the file is UTF-8 (it will say near the lower-right corner). If it does, try going to Encoding > ANSI and see if that now displays the file as you expect.
-
@peterjones Apologies, I hadn’t seen that you’d replied.
Weirdness. The encoding is showing as “TIS-620”. (Thai …)
If I click on Encoding->ANSI or Encoding->UTF-8 the TIS-620 in the status bar does not change.
At the bottom left it says “Normal text file”.
Further thoughts appreciated, thanks. (n.b. this is now Notepad++ v8.1.9.3)
-Jay
-
@jay-libove said in 2-byte characters recently broken? Or do I misremember?:
The encoding is showing as “TIS-620”. (Thai …)
It is probably your intent that the file is UTF-8?
And you have autodetection of encoding turned on in the Preferences?
Hmmm, there’s a known bug where UTF-8 files are detected as TIS-620 … maybe this is happening to you?
Here are some references to this bug:
- https://github.com/notepad-plus-plus/notepad-plus-plus/issues/10916
- https://github.com/notepad-plus-plus/notepad-plus-plus/search?q=TIS-620&type=issues
Autodetection is not an exact science (well, it hasn’t been proven to be, anyway). I came up with a method to mitigate this bug somewhat, you may want to have a look HERE.
Another way to “solve” this problem is to turn autodetect of encoding off. Then, with N++ settings as default, your file probably will show UTF-8 on the status bar after loading.
@jay-libove said in 2-byte characters recently broken? Or do I misremember?:
If I click on Encoding->ANSI or Encoding->UTF-8 the TIS-620 in the status bar does not change.
This is because Notepad++ thinks your file is encoded as TIS-620 and you are telling it to reinterpret it (without changing it) as UTF-8. Probably the reinterpret fails because of the corruption the bug has caused?
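To make the "reinterpret without changing" idea concrete, here is a small Python sketch. It does not model Notepad++ internals; it only shows how the same bytes read differently under different encodings. The helper name `view_as` is made up for illustration, and `tis_620` and `cp1252` are Python's codec names, not Notepad++ menu entries. The last step (ä turning into ? when text is written out in an encoding that has no ä) is only an assumption about how the question mark could end up in the saved file.

```python
# Illustration only: hypothetical helper, not Notepad++ code.
A_UMLAUT = "\u00e4"  # the character "ä"

# The same character stored under two different encodings.
ansi_bytes = A_UMLAUT.encode("cp1252")  # b'\xe4'      (one byte)
utf8_bytes = A_UMLAUT.encode("utf-8")   # b'\xc3\xa4'  (two bytes)


def view_as(data: bytes, encoding: str) -> str:
    """Reinterpret the bytes as text; the bytes themselves are not modified.

    Bytes that are invalid in the chosen encoding become U+FFFD.
    """
    return data.decode(encoding, errors="replace")


views = {
    "UTF-8 bytes read as UTF-8": view_as(utf8_bytes, "utf-8"),
    "UTF-8 bytes read as TIS-620": view_as(utf8_bytes, "tis_620"),
    "ANSI byte read as Windows-1252": view_as(ansi_bytes, "cp1252"),
    "ANSI byte read as UTF-8": view_as(ansi_bytes, "utf-8"),
}

# Assumption: if text is written out in an encoding that cannot represent
# the character, a substitute byte is stored instead of the original.
saved_as_thai = A_UMLAUT.encode("tis_620", errors="replace")

for label, text in views.items():
    # ascii() keeps the output printable on any console.
    print(f"{label}: {ascii(text)}")
print("saved as TIS-620:", saved_as_thai)
```

The lone 0xE4 byte read as UTF-8 is the incomplete 3-byte sequence described earlier in the thread, and the two UTF-8 bytes read as TIS-620 come out as two Thai letters rather than one ä.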
-
Thanks very much @Alan-Kilborn
I’ll jump in to the other thread (levicki).
-Jay