> @NolanNolan said in Standard ANSI and code still change to something else:
>
> But really weird that using Microsofts own notepad.exe that comes with a standard windows installation makes windows search not detect characters in txt files that belongs to the installation language of the OS.
Perhaps not quite as strange as it might first appear.
Support for Unicode in Windows dates back to the first release of Windows NT in 1993. (NT was a “business” operating system; it took another eight years or so to get Unicode into “consumer” systems.) The thing is, Windows chose to support 16-bit characters: UCS-2, which later became UTF-16. UTF-8 wasn’t even presented publicly until 1993, and it took many more years for it to become popular. Most early adopters of Unicode, like Windows, used 16-bit “wide” characters.
So, for a long time, in Windows “Unicode” meant UTF-16. Windows XP (2001) introduced code page 65001 for UTF-8, but it was only useful in conversion functions and console sessions. In Windows 10 Version 1903 (May 2019), it became possible to set UTF-8 (65001) as the system code page; however, that doesn’t (yet, in 2025 at least) do as much as you might hope it would, and it can trigger odd behavior in some software. (I tested your specific case: enabling “Use Unicode UTF-8 for worldwide language support” does not change how search in Windows Explorer interprets files without a byte order mark.)
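For what it’s worth, the conversion functions have handled 65001 for a long time. Here is a minimal sketch of converting UTF-8 to Windows’s native UTF-16 with `MultiByteToWideChar`; the helper name `Utf8ToUtf16` is my own, not anything from the Windows SDK:

```cpp
#include <windows.h>
#include <string>

// Convert a UTF-8 string to UTF-16 using the Win32 conversion API
// with code page 65001 (CP_UTF8). Returns an empty string on failure;
// real code would check GetLastError().
std::wstring Utf8ToUtf16(const std::string& utf8)
{
    if (utf8.empty()) return std::wstring();

    // First call: ask how many UTF-16 code units the result needs.
    int needed = MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS,
                                     utf8.data(), static_cast<int>(utf8.size()),
                                     nullptr, 0);
    if (needed <= 0) return std::wstring();

    std::wstring utf16(needed, L'\0');

    // Second call: perform the actual conversion into the buffer.
    MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS,
                        utf8.data(), static_cast<int>(utf8.size()),
                        &utf16[0], needed);
    return utf16;
}
```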
Files using legacy (“ANSI”) encodings are too common to ignore, but, as @PeterJones pointed out in his earlier post in this thread, there is no completely reliable way to distinguish an “ANSI” encoding from UTF-8. Windows chose to use the byte order mark (already in use in UTF-16 files) to signal when a file is UTF-8. Windows simply does not recognize a file without a byte order mark as Unicode.
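Concretely, the UTF-8 byte order mark is the three bytes `EF BB BF` at the very start of the file. A toy version of that test (my own illustration, not the actual Windows code) looks like this:

```cpp
#include <cstddef>

// Sketch of a BOM check: treat a buffer as UTF-8 only if it starts
// with the three-byte UTF-8 byte order mark EF BB BF.
// (Illustrative only; not how Windows actually implements it.)
bool HasUtf8Bom(const unsigned char* data, std::size_t size)
{
    return size >= 3 && data[0] == 0xEF && data[1] == 0xBB && data[2] == 0xBF;
}
```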
Notepad++ uses byte order marks, too, but it also recognizes when a file has a very high likelihood of being UTF-8 (without a byte order mark). This is possible because the details of UTF-8 encoding make it highly unlikely that a legacy text file will “accidentally” also be a valid UTF-8 file — unless it is very short, has been intentionally crafted to trigger false detection, or contains only ASCII characters. (Since ASCII characters are represented identically in UTF-8 and in legacy code pages, the last case only matters if you edit a file which contained only ASCII characters so that it contains one or more non-ASCII characters. In that case, it is important to set your intended encoding depending on how the file will be used.)
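To give a feel for why false positives are rare: every non-ASCII character in UTF-8 must be a lead byte followed by exactly the right number of continuation bytes in the range `0x80`–`0xBF`, and legacy-encoded text almost never satisfies that for long. Here is a simplified validity check; this is my own sketch of the general technique, not Notepad++’s actual detection code, and a strict validator would also reject overlong forms, surrogates, and values above U+10FFFF:

```cpp
#include <cstddef>

// Simplified check: does this byte buffer parse as well-formed UTF-8?
// Only the lead/continuation structure is validated here.
bool LooksLikeUtf8(const unsigned char* data, std::size_t size)
{
    std::size_t i = 0;
    while (i < size)
    {
        unsigned char lead = data[i++];
        std::size_t trailing;
        if      (lead <= 0x7F)          trailing = 0;  // ASCII
        else if ((lead & 0xE0) == 0xC0) trailing = 1;  // 2-byte sequence
        else if ((lead & 0xF0) == 0xE0) trailing = 2;  // 3-byte sequence
        else if ((lead & 0xF8) == 0xF0) trailing = 3;  // 4-byte sequence
        else return false;                             // invalid lead byte

        // Each trailing byte must be a continuation byte: 10xxxxxx.
        for (std::size_t t = 0; t < trailing; ++t)
        {
            if (i >= size || (data[i] & 0xC0) != 0x80)
                return false;
            ++i;
        }
    }
    return true;
}
```

Run a few kilobytes of typical Windows-1252 text through a check like this and it fails almost immediately; for example, the single byte `0xE9` (é in Windows-1252) looks like the start of a three-byte UTF-8 sequence and is rejected as soon as the required continuation bytes fail to appear. That is the “statistically highly unlikely” part.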
What you’re confronting is the difference between how Windows detects UTF-8 (must have a byte order mark) and how Notepad++ detects UTF-8 (valid UTF-8 byte sequence, which is statistically highly unlikely to be a legacy encoding).
There is no good solution to this without inventing a time machine and changing decisions that were made over three decades ago.
Well… no good solution that does not sacrifice reasonable backward compatibility. I consider that one of Windows’ best features, and I admire Microsoft for sticking to it. Twenty-year-old programs can still run on current versions of Windows. I hate the culture of “If it’s not constantly maintained and upgraded, junk it!” that’s overtaken most of the computing world. A job once done well should stay done. (I suspect this has a lot to do with Microsoft’s dominance in business applications.) Not everyone shares my view.