UTF-8 without BOM
Dan Gargus last edited by
A WordPress bug fix suggests converting the erroring files to “UTF-8 without BOM”, but I cannot find that conversion option.
Can anyone tell me why it’s not available?
guy038 last edited by guy038
Well, starting with v6.8.1 of N++, the entries of the main Encoding menu have been slightly renamed:
Before v6.8.1, the Encode lines were Encode in ANSI, Encode in UTF-8 without BOM, Encode in UTF-8, Encode in UCS-2 Big Endian and Encode in UCS-2 Little Endian. The same names applied to the Convert lines.
Since v6.8.1, the Encode lines are Encode in ANSI, Encode in UTF-8, Encode in UTF-8-BOM, Encode in UCS-2 BE BOM and Encode in UCS-2 LE BOM. Again, the same names apply to the Convert lines.
The BOM, or Byte Order Mark, is an invisible character, Unicode code point U+FEFF, placed at the very beginning of a file. It helps an application detect the encoding of the file and, for the two-byte encodings, the byte order, that is, whether the most or the least significant byte of each character comes first.
So, depending on the Unicode encoding used, the hidden BOM character at the beginning of the file is stored as:
- The two bytes FE FF in the UCS-2 BE BOM encoding
- The two bytes FF FE in the UCS-2 LE BOM encoding
- The three bytes EF BB BF in the UTF-8-BOM encoding
- The sequence EF BB BF is simply the UTF-8 encoding of the Unicode code point of the BOM (U+FEFF)!
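The byte sequences listed above can be checked with a few lines of Python. This is only a sketch: the names `bom_bytes` and `detect_bom` are made up for illustration, the labels are the N++ menu names, and only the three encodings mentioned in this post are covered (a UTF-32 LE file, for instance, would be misreported as UCS-2 LE).

```python
BOM = "\ufeff"  # the Byte Order Mark, Unicode code point U+FEFF


def bom_bytes():
    """Encode U+FEFF in each encoding; keys are the N++ menu labels."""
    return {
        "UTF-8-BOM": BOM.encode("utf-8"),         # EF BB BF
        "UCS-2 BE BOM": BOM.encode("utf-16-be"),  # FE FF
        "UCS-2 LE BOM": BOM.encode("utf-16-le"),  # FF FE
    }


def detect_bom(data: bytes):
    """Return the label of the BOM that starts `data`, or None if there is none."""
    for label, mark in bom_bytes().items():
        if data.startswith(mark):
            return label
    return None


if __name__ == "__main__":
    for label, mark in bom_bytes().items():
        print(label, mark.hex(" ").upper())
```

For example, `detect_bom(b"\xef\xbb\xbf<?php")` reports a UTF-8 BOM, while a file that starts directly with `<?php` yields `None`.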
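The N++ encoding now simply called UTF-8 means that all the characters of the file are UTF-8 encoded, but NO BOM is added at the very beginning of the file. That is the ONLY difference from the UTF-8-BOM encoding! So the option is still available: what the WordPress fix calls “UTF-8 without BOM” is now named Convert to UTF-8 in the Encoding menu.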
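Outside of N++, the same conversion amounts to dropping the three leading bytes when they are present. A minimal sketch in Python, assuming the file is small enough to read into memory; `strip_utf8_bom` and `convert_file_to_utf8_without_bom` are hypothetical helper names, while `codecs.BOM_UTF8` is the standard library constant for EF BB BF.

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Return `data` without a leading UTF-8 BOM (EF BB BF), if it has one."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


def convert_file_to_utf8_without_bom(path: str) -> bool:
    """Rewrite the file in place without its BOM; return True if it changed."""
    with open(path, "rb") as f:
        data = f.read()
    stripped = strip_utf8_bom(data)
    if stripped == data:
        return False  # no BOM found, leave the file untouched
    with open(path, "wb") as f:
        f.write(stripped)
    return True
```

Only the BOM is removed; the rest of the file is written back byte for byte, which matches the point above that the BOM is the sole difference between the two encodings.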