2-byte characters recently broken? Or do I misremember?
-
I was fairly sure that Notepad++ supports 2-byte characters (e.g. an “a” with an umlaut over it, “ä”). Recently, however, I have noticed that whenever I type such a character, save the text file in Notepad++, and then re-open the file, the ä gets replaced by a question mark (?).
-
Notepad++, even the newest v8.9.1.2, handles non-ASCII characters just fine.
You will want to check your encoding: make sure that Notepad++ thinks the encoding is what the file is actually encoded as. For example, if Notepad++ thinks it’s UTF-8, but your file is actually one of the ANSI encodings (like the Windows 1252 character set), then the file will have a single byte 0xE4 for ä, but Notepad++ sees that as an incomplete UTF-8 sequence and doesn’t know what to do with it. To a UTF-8 interpreter, 0xE4 says “this is the first byte of a 3-byte sequence”, but no valid UTF-8 continuation bytes follow it, so Notepad++ shows a ? to indicate its reaction of “huh, what?”.
So if you have a file that is showing ? instead of ä, look down in the status bar to see if Notepad++ thinks the file is UTF-8; the encoding is shown near the lower-right corner. If it does, try going to Encoding > ANSI and see if that now displays the file as you expect.
-
@peterjones Apologies, I hadn’t seen that you’d replied.
Weirdness. The encoding is showing as “TIS-620”. (Thai …)
If I click on Encoding->ANSI or Encoding->UTF-8 the TIS-620 in the status bar does not change.
At the bottom left it says “Normal text file”.
Further thoughts appreciated, thanks. (n.b. this is now Notepad++ v8.1.9.3)
-Jay
-
@jay-libove said in 2-byte characters recently broken? Or do I misremember?:
The encoding is showing as “TIS-620”. (Thai …)
It is probably your intent that the file is UTF-8?
And you have autodetection of encoding turned on in the Preferences?
Hmmm, there’s a known bug where UTF-8 files are detected as TIS-620 … maybe this is happening to you?
Here are some references to this bug:
- https://github.com/notepad-plus-plus/notepad-plus-plus/issues/10916
- https://github.com/notepad-plus-plus/notepad-plus-plus/search?q=TIS-620&type=issues
Autodetection is not an exact science (well, it hasn’t been proven to be, anyway). I came up with a method to mitigate this bug somewhat, you may want to have a look HERE.
Another way to “solve” this problem is to turn autodetect of encoding off. Then, with N++ settings as default, your file probably will show UTF-8 on the status bar after loading.
@jay-libove said in 2-byte characters recently broken? Or do I misremember?:
If I click on Encoding->ANSI or Encoding->UTF-8 the TIS-620 in the status bar does not change.
This is because Notepad++ thinks your file is encoded as TIS-620 and you are telling it to reinterpret it (without changing it) as UTF-8. Probably the reinterpret fails because of the corruption the bug has caused?
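To make the reinterpret-versus-convert distinction concrete, here is a small Python sketch. It is not how Notepad++ works internally; the helper names `reinterpret` and `convert` are made up for illustration, and it only uses Python's built-in `cp1252`, `utf-8` and `tis_620` codecs to show what the same bytes look like under different assumed encodings.

```python
# The same character stored two different ways on disk.
ANSI_BYTES = "ä".encode("cp1252")  # one byte: 0xE4
UTF8_BYTES = "ä".encode("utf-8")   # two bytes: 0xC3 0xA4


def reinterpret(raw: bytes, encoding: str) -> str:
    """Read the SAME bytes under another assumed encoding (bytes unchanged).

    Undecodable bytes become U+FFFD, Python's replacement character,
    which plays the role of the '?' placeholder described in this thread.
    """
    return raw.decode(encoding, errors="replace")


def convert(raw: bytes, source: str, target: str) -> bytes:
    """Decode with the correct source encoding, then re-encode (bytes change)."""
    return raw.decode(source).encode(target)


if __name__ == "__main__":
    # ascii() keeps the output printable on any console code page.
    print("ANSI bytes read as UTF-8:   ", ascii(reinterpret(ANSI_BYTES, "utf-8")))
    print("UTF-8 bytes read as TIS-620:", ascii(reinterpret(UTF8_BYTES, "tis_620")))
    print("ANSI converted to UTF-8:    ", convert(ANSI_BYTES, "cp1252", "utf-8"))
```

The second line of output is the interesting one for this thread: a UTF-8 file that is misdetected as TIS-620 shows two Thai letters where the single ä should be, even though the bytes on disk are still fine.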
-
Thanks very much @Alan-Kilborn
I’ll jump in to the other thread (levicki).
-Jay