Standard ANSI codepage still changes to something else
-
Hi
I use v8.8.7 on Windows 11 Pro v25H2.
When I right-click to create a new .txt file, press Enter, and then double-click the .txt file to open it in NPP, it's ANSI.
When I enter some text and save, it's saved in ANSI as expected.
But when I enter the Danish letters æøå and save, it's suddenly saved in something else, e.g. Windows-1255.
It shows the encoding / character set as Hebrew, which is wrong; it should not show anything.
Why is that?
How can I always create and save in ANSI?
thanks
Nolan
-
“ANSI” is the American National Standards Institute. One of the things they did during 80s-era computing was define encodings that fit various character sets into the 256 values available in an 8-bit byte. At some point, computer newbies confused the organization that defined them with the encodings themselves, and the world was stuck with incorrectly calling all those 256-character encodings “ANSI”, even when referring to the 8-bit encodings that were specific to Microsoft’s DOS and later Windows operating systems.
When doing encodings for the “Windows” GUI OS, MS thoughtfully named their encoding standards “Windows-####” or “CP####” (people write them both ways). For example, Windows-1252 is the Microsoft encoding that’s nearly identical to ISO-8859-1 (ISO is the International Organization for Standardization, an international body that makes standards, whereas ANSI is US-specific). Windows-1255, which you mentioned, is an encoding for Hebrew.
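You can see the confusion concretely: the exact bytes that mean æøå in Windows-1252 are also perfectly valid Windows-1255, where they mean Hebrew letters instead. A quick Python illustration (Python’s names for these codepages are `cp1252` and `cp1255`):

```python
# The Danish letters æøå encoded under Windows-1252 (Western European codepage)
data = "æøå".encode("cp1252")
print(data)                   # b'\xe6\xf8\xe5'

# The very same three bytes are also valid Windows-1255 (Hebrew) --
# there they decode to the letters zayin, resh, vav instead
print(data.decode("cp1255"))  # זרו
```

Nothing in those three bytes says which interpretation is “right”; that information only existed in the head of whoever typed the file.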
When Notepad++ says “ANSI”, what it means is: “whichever 8-bit encoding your installation of Windows has set as the default character set / codepage”. (This gets even more confusing now that you can tell the OS to use codepage 65001, which is the UTF-8 codepage; that causes many unexpected bugs, since Notepad++ does not expect multi-byte characters while in ANSI mode.)
But anyway: when you save a file, Notepad++ writes those bytes to disk using the default Windows encoding, but the OS saves no metadata about the file’s encoding. (Back in the DOS days, FAT and FAT32 didn’t have room to store such metadata; and when MS designed NTFS for NT, they could have added metadata like encoding, but chose not to.) That means that when any application, Notepad++ or otherwise, reads the file later, it has no way to know for sure which encoding was used for a given “text file”, either from the bytes themselves or from the non-existent metadata.

As a result, Notepad++ uses a set of heuristics, based on byte frequencies and byte sequences, to guess what encoding it probably is. But it often guesses wrong, which is why my recommendation is to always turn off Settings > Preferences > MISC > Autodetect character encoding: assuming that the majority of “ANSI” text files you read were made by you, on the same computer, with the same default codepage/encoding, you shouldn’t need Notepad++ to “guess” what encoding it thinks it is; you can just let it always apply the Windows default encoding when it reads the file.
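To see why guessing is hopeless on a short Danish file, here is a toy plausibility scorer in Python (illustrative only — Notepad++’s real heuristics are more involved, but the underlying ambiguity is the same):

```python
# Toy heuristic: what fraction of the decoded characters are letters,
# digits, or whitespace under a given codepage?  (Not Notepad++'s actual
# algorithm -- just a sketch of why byte-based detection must guess.)
def score(data: bytes, encoding: str) -> float:
    try:
        text = data.decode(encoding)
    except UnicodeDecodeError:
        return -1.0  # not even decodable under this codepage
    ok = sum(ch.isalpha() or ch.isdigit() or ch.isspace() for ch in text)
    return ok / max(len(text), 1)

data = "æøå".encode("cp1252")    # the Danish letters as Windows-1252 bytes
for enc in ("cp1252", "cp1255"):
    print(enc, score(data, enc))  # both print 1.0 -- a dead heat
```

Both codepages score identically here: three bytes of Danish simply don’t carry enough statistical evidence, so any detector has to fall back on frequency tables built from larger samples, and those tables happen to favor Hebrew for these particular bytes.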
Or you could do something that brings you into this century, by using UTF-8-with-BOM or one of the UTF-16 encodings, any of which can unambiguously encode any of the 150,000+ characters defined by Unicode – which lets you mix characters from across the world without any of the ambiguity of 1980s-style 8-bit encodings. If you have a choice in your data, choose UTF-8 or UTF-16; if you have no choice, complain to whoever is not giving you the choice that they are hindering efficiency by forcing you to keep using outdated 1980s character sets instead of a modern encoding built to interface with the whole world.
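In Python, for example, the `utf-8-sig` codec handles the BOM for you: it writes the 3-byte signature when saving and strips it when reading, which gives an editor like Notepad++ an unambiguous marker (the file path here is just for illustration):

```python
import codecs
import os
import tempfile

text = "æøå"  # Danish letters -- no codepage guessing needed

# Write UTF-8 with a BOM ("utf-8-sig" prepends the 3-byte signature)
path = os.path.join(tempfile.gettempdir(), "demo.txt")
with open(path, "w", encoding="utf-8-sig", newline="") as f:
    f.write(text)

# The file starts with the BOM bytes EF BB BF...
with open(path, "rb") as f:
    assert f.read().startswith(codecs.BOM_UTF8)

# ...and reading back with "utf-8-sig" strips the BOM again
with open(path, encoding="utf-8-sig") as f:
    assert f.read() == text
```

Any program that later opens this file can see the `EF BB BF` signature and know it is UTF-8, with no heuristics involved.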