UTF-8 without BOM

Dan Gargus

A WordPress bug fix suggests converting the files that throw errors to “UTF-8 without BOM”, but I cannot find that conversion option.

Can anyone tell me why it’s not available?

guy038

Hello Dan,

Starting with version 6.8.1 of N++, the entries of the main Encoding menu have changed slightly:

Before v6.8.1, the Encode entries were Encode in ANSI, Encode in UTF-8 without BOM, Encode in UTF-8, Encode in UCS-2 Big Endian and Encode in UCS-2 Little Endian. The Convert entries followed the same naming.
From v6.8.1 onwards, the Encode entries are Encode in ANSI, Encode in UTF-8, Encode in UTF-8-BOM, Encode in UCS-2 BE BOM and Encode in UCS-2 LE BOM. Again, the Convert entries follow the same naming.

NOTES:

The BOM, short for Byte Order Mark, is an invisible character, Unicode code point U+FEFF, placed at the very beginning of a file. It helps an application detect the encoding of the file and, for the two-byte encodings, the byte order, that is, whether the most or the least significant byte of each character comes first.

So, depending on the Unicode encoding used, the hidden BOM character at the beginning of the file is stored as:

The two bytes FE FF in the UCS-2 BE BOM encoding
The two bytes FF FE in the UCS-2 LE BOM encoding
The three bytes EF BB BF in the UTF-8-BOM encoding
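
If you want to check these byte sequences yourself, here is a minimal Python sketch. Python is only my choice for illustration and has nothing to do with N++ itself. It encodes U+FEFF with Python's own codecs. I am assuming that utf-16-be and utf-16-le can stand in for UCS-2 BE and UCS-2 LE, which holds for U+FEFF because it lies in the Basic Multilingual Plane. The helper name bom_bytes is hypothetical.

```python
# The BOM character, Unicode code point U+FEFF
BOM = "\ufeff"

def bom_bytes(encoding: str) -> str:
    """Return the bytes of the BOM in the given encoding, as spaced upper-case hex."""
    return " ".join(f"{byte:02X}" for byte in BOM.encode(encoding))

# Assumption: utf-16-be / utf-16-le are used in place of UCS-2 BE / LE
print(bom_bytes("utf-16-be"))  # FE FF
print(bom_bytes("utf-16-le"))  # FF FE
print(bom_bytes("utf-8"))      # EF BB BF
```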

REMARKS:

The UTF-8 sequence EF BB BF is simply the UTF-8 encoded form of the Unicode code point of the BOM (U+FEFF)!

The N++ encoding now simply called UTF-8 means that all the characters of the file are UTF-8 encoded, but NO BOM is added at the very beginning of the file. That is the ONLY difference from the UTF-8-BOM encoding! In other words, the entry formerly named UTF-8 without BOM is the one now called UTF-8.
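
Outside of N++, the same conversion can be sketched in a few lines of Python. This is only an illustration, assuming the file is already valid UTF-8 and small enough to be read into memory. The function names strip_utf8_bom and convert_file are mine, not something that N++ or WordPress provides.

```python
import codecs
from pathlib import Path

def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM (EF BB BF), if there is one."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data

def convert_file(path: str) -> bool:
    """Rewrite the file without its BOM; return True if a BOM was removed."""
    file_path = Path(path)
    original = file_path.read_bytes()
    stripped = strip_utf8_bom(original)
    if stripped != original:
        file_path.write_bytes(stripped)
        return True
    return False  # no BOM found, file left untouched
```

Calling convert_file on a file saved as UTF-8-BOM should leave you with the same content that N++ reports as plain UTF-8. Work on a backup copy first.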

Best Regards,

guy038