UTF-8 doc becomes ANSI doc!

  • @Claudia-Frank

    Let’s assume a file contains the byte sequence 20-A9-20 (in ANSI this would be space, copyright sign, space). This sequence is invalid in UTF-8, so NPP has no alternative but to assume a single-byte encoding. And since it never changes a file’s content on its own, it is left to treat such a file as ANSI (or whatever your favorite single-byte encoding is).

    This is not a shortcoming of NPP but part of the single-byte heritage we still have to deal with today.

  • @gerdb42

    I assume we have the same understanding, so I’m interested to know
    what I wrote that could be misunderstood.
    Could you point me to my error?

    Thank you and cheers

  • @Claudia-Frank said:
    Not quite an error, but

    I would also find it very useful if the setting
    New Document->Encoding: UTF-8 and Apply to opened ANSI files (or any other configured encoding)
    would force npp to treat all newly opened documents as “configured encoding” when
    auto detection of encoding has been disabled.

    would require an implicit conversion to UTF-8. Besides breaking the principle of not making changes without user action, it would raise a whole bunch of other issues.

  • @gerdb42

    I agree that this would break the principle, but on the other hand it could be beneficial as well.
    But now that I’m typing this, I realize that if this conversion takes place and you don’t know which encoding the file came from,
    you might corrupt the document without knowing how to fix it.
    Yes - bad idea.
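
The two points made in this thread can be checked with a few lines of Python. This is only a minimal sketch, not Notepad++'s actual detection logic: the helper names `is_valid_utf8` and `decode_candidates` are made up for illustration, and the two single-byte encodings are arbitrary picks standing in for "whatever your ANSI code page is".

```python
# The byte sequence from the first post: 20-A9-20.
raw = bytes([0x20, 0xA9, 0x20])


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def decode_candidates(data: bytes, encodings=("cp1252", "iso8859_2")) -> dict:
    """Decode the same bytes under several single-byte encodings (illustrative choice)."""
    return {enc: data.decode(enc) for enc in encodings}


# 0xA9 is a UTF-8 continuation byte with no lead byte before it,
# so the sequence cannot be UTF-8.
utf8_ok = is_valid_utf8(raw)

# The same bytes are valid in both single-byte encodings,
# but they stand for different characters.
candidates = decode_candidates(raw)

# An implicit "convert to UTF-8" would bake one of these guesses into the file.
converted = {enc: text.encode("utf-8") for enc, text in candidates.items()}

print(utf8_ok)
print(candidates)
print(converted)
```

Here `utf8_ok` is false, and the two candidate decodings disagree about the middle character, so the UTF-8 bytes written by an automatic conversion would depend on a guess the editor cannot verify. Once the file is saved that way, the original byte is gone unless you know which encoding was assumed.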

