How to let NP++ auto-detect UTF-8 encoding correctly?

Ben S.

I exported a WhatsApp chat to a text file and transferred it from my Android smartphone to a Win7 PC.

When I load it into NP++, it is detected as ANSI encoded.
However, all German umlauts are displayed incorrectly.
When I manually switch the encoding to UTF-8, everything is fine.

How can I tell NP++ to AUTOMATICALLY detect all such text files in the future as UTF-8 encoded?

When I apply “convert to UTF-8”, the German umlauts are NOT converted. They remain as wrong characters.
So this does not work.

SalviaSage

Try turning on this option.
https://i.imgur.com/w1996uF.png

Ben S.

Thank you for the suggestion, but it does NOT help.

Surprisingly the file is still identified as ANSI (as can be seen in the lower right part of the status bar).

BTW: The file's line endings are identified as Unix (LF), if that helps.

So is there any other suggestion?

dinkumoil

@Ben-S

Automatic encoding detection is a difficult and unreliable thing. The algorithms work heuristically by inspecting the file’s content and can fail under some circumstances.
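To illustrate why such heuristics can fail, here is a minimal Python sketch of one possible check. It is not the algorithm Notepad++ uses; the function name `looks_like_utf8` is made up for this example. It treats data as UTF-8 only if it decodes strictly as UTF-8 and contains at least one non-ASCII byte.

```python
def looks_like_utf8(data: bytes) -> bool:
    """Hypothetical heuristic, NOT Notepad++'s actual detection logic."""
    try:
        # Strict decoding raises on any byte sequence that is not valid UTF-8.
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    # Pure ASCII is valid UTF-8 and valid ANSI alike, so it proves nothing.
    return any(byte >= 0x80 for byte in data)


print(looks_like_utf8("Grüße".encode("utf-8")))   # True
print(looks_like_utf8("Grüße".encode("cp1252")))  # False
print(looks_like_utf8(b"plain ASCII text"))       # False
```

A real detector usually inspects only part of the file and weighs several candidate encodings, so a file with few non-ASCII characters can still be classified as ANSI.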

If your file names have a special file extension you could use my AutoCodepage plugin, available via Notepad++ PluginManager.

Otherwise there would be the following workaround:

1. Open Windows Notepad.
2. Press and hold the ALT key and type the sequence 0239 on the numeric keypad.
3. Press and hold the ALT key and type the sequence 0187 on the numeric keypad.
4. Press and hold the ALT key and type the sequence 0191 on the numeric keypad.
5. Save the file, keeping Notepad's ANSI encoding, under the name Header.txt in the folder where your file is stored. Do not press ENTER in the text before saving.
6. Open a Windows console and navigate to the folder where your file and the newly created Header.txt are stored.
7. Execute the following command:

copy /b "Header.txt" + "<Name-of-your-file>" "Result.txt"

With this sequence you add a UTF-8 Byte Order Mark (BOM) to the beginning of your file and store the result under the name Result.txt. When you open this file in Notepad++, it should be recognized as UTF-8 encoded.
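If Python is available, the same result can be sketched in a few lines instead of the Notepad and copy steps. This is only an illustration: the function name `add_utf8_bom` and the file names in the usage comment are assumptions, not part of Notepad++ or any plugin.

```python
import codecs
from pathlib import Path


def add_utf8_bom(src: Path, dst: Path) -> None:
    """Write the bytes of src to dst, prefixed with the UTF-8 BOM (EF BB BF)."""
    data = src.read_bytes()
    # Do not add a second BOM if the file already starts with one.
    if not data.startswith(codecs.BOM_UTF8):
        data = codecs.BOM_UTF8 + data
    dst.write_bytes(data)


# Example call with hypothetical file names:
# add_utf8_bom(Path("WhatsApp-Chat.txt"), Path("Result.txt"))
```

The file is handled as raw bytes, so the existing UTF-8 content is left untouched and only the three BOM bytes are prepended.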