UTF-8 doc becomes ANSI doc !

gerdb42

Let’s assume a file contains the byte sequence 20-A9-20 (in ANSI this would be space, copyright sign, space). This sequence is invalid in UTF-8, so NPP has no alternative other than assuming a single-byte encoding. And since it never changes a file’s content on its own, it is left to treat such a file as ANSI (or whatever your favorite single-byte encoding is).
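
For anyone who wants to check this, here is a minimal Python sketch. It assumes "ANSI" means Windows-1252 (`cp1252`), and the helper `is_valid_utf8` is a made-up name for illustration; this is not how NPP itself detects encodings.

```python
# The bytes from the example: space, 0xA9, space
data = bytes([0x20, 0xA9, 0x20])

def is_valid_utf8(raw: bytes) -> bool:
    """Return True if raw decodes cleanly as UTF-8 (hypothetical helper)."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

# 0xA9 is a UTF-8 continuation byte with no lead byte in front of it,
# so strict decoding fails.
print(is_valid_utf8(data))            # False

# Read as Windows-1252 (assumed meaning of "ANSI"), the same bytes are fine.
print(repr(data.decode("cp1252")))    # ' © '

# In UTF-8 the copyright sign needs two bytes.
print("©".encode("utf-8").hex("-"))   # c2-a9
```

So the file as stored simply is not UTF-8; the copyright sign would have to be C2-A9 for that.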

This is not a shortcoming of NPP but part of the single-byte heritage we still have to deal with today.

Claudia Frank

@gerdb42

I assume we have the same understanding so I’m interested to know
what I have written that could be misunderstood?
Could you point me to my error?

Thank you and cheers
Claudia

gerdb42

@Claudia-Frank said:
Not quite an error, but

I would also find it very useful if the setting
New Document->Encoding: UTF-8 and Apply to opened ANSI files (or any other configured encoding)
would force npp to treat all new opened documents as “configured encoding” when
auto detection of encoding has been disabled.

would require an implicit conversion to UTF-8. Besides breaking the principle of not making changes without user action, it would raise a whole bunch of other issues.

Claudia Frank

@gerdb42

I agree that this would break the principle, but on the other hand it could be beneficial as well.
But now that I’m typing, I’m thinking: if this conversion takes place and you don’t know which encoding the file came from,
you might corrupt the document without knowing how to fix it.
Yes - bad idea.
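
A rough Python sketch of what I mean, under made-up assumptions: the file is really Windows-1251 (Cyrillic) and a hypothetical implicit conversion guesses Windows-1252 instead. None of this is NPP code.

```python
original = "Привет"
raw = original.encode("cp1251")   # the bytes actually stored on disk

# Hypothetical implicit conversion that wrongly assumes Windows-1252
converted = raw.decode("cp1252").encode("utf-8")

# The file is now valid UTF-8, but the text is garbage.
print(converted.decode("utf-8"))  # Ïðèâåò

# Undoing it only works if you know both the wrong guess (cp1252)
# and the real source encoding (cp1251).
repaired = converted.decode("utf-8").encode("cp1252").decode("cp1251")
print(repaired == original)       # True
```

Without knowing those two encodings, there is no reliable way back.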

Cheers
Claudia