UTF-8 without BOM
-
A WordPress bug fix suggests converting the files that throw errors to “UTF-8 without BOM”, but I cannot find that conversion option.
Can anyone tell me why it’s not available?
-
Hello Dan,
Starting with v6.8.1 of N++, the entries of the main Encoding menu have been slightly renamed:
- Before v6.8.1, the Encode entries were Encode in ANSI, Encode in UTF-8 without BOM, Encode in UTF-8, Encode in UCS-2 Big Endian and Encode in UCS-2 Little Endian. The Convert entries followed the same naming.
- From v6.8.1 onwards, the Encode entries are Encode in ANSI, Encode in UTF-8, Encode in UTF-8-BOM, Encode in UCS-2 BE BOM and Encode in UCS-2 LE BOM. The Convert entries follow the same naming.
NOTES :
The BOM (Byte Order Mark) is an invisible character, Unicode code point U+FEFF, placed at the very beginning of a file. It helps an application detect the encoding of the file, as well as its byte order, i.e. whether the most or the least significant byte of each character comes first. Depending on the Unicode encoding used, the hidden BOM character that begins the file is stored as:
- The two bytes FE FF in the UCS-2 BE BOM encoding
- The two bytes FF FE in the UCS-2 LE BOM encoding
- The three bytes EF BB BF in the UTF-8-BOM encoding
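
To see these byte sequences for yourself, here is a minimal Python sketch (Python is my choice for illustration, it is not part of N++). It encodes the U+FEFF character in each of the three encodings and checks which BOM, if any, begins a byte string. The names `hex_bytes` and `detect_bom` are hypothetical helpers, the labels it returns just reuse the N++ menu wording, and I use Python's `utf-16-be` / `utf-16-le` codecs as a stand-in for UCS-2, which is an assumption that holds for the BOM character itself.

```python
import codecs

BOM = "\ufeff"  # the BOM character, code point U+FEFF


def hex_bytes(data: bytes) -> str:
    """Format bytes as space-separated uppercase hex, e.g. 'EF BB BF'."""
    return " ".join(f"{b:02X}" for b in data)


# Encode the same character in the three encodings discussed above.
bom_bytes = {
    "UCS-2 BE BOM": BOM.encode("utf-16-be"),
    "UCS-2 LE BOM": BOM.encode("utf-16-le"),
    "UTF-8-BOM": BOM.encode("utf-8"),
}

for name, data in bom_bytes.items():
    print(f"{name:13} {hex_bytes(data)}")


def detect_bom(data: bytes) -> str:
    """Return which of the three BOMs begins `data`, or 'no BOM'.

    Hypothetical helper: it only knows the three BOMs listed above
    and ignores other encodings such as UTF-32.
    """
    if data.startswith(codecs.BOM_UTF8):
        return "UTF-8-BOM"
    if data.startswith(codecs.BOM_UTF16_BE):
        return "UCS-2 BE BOM"
    if data.startswith(codecs.BOM_UTF16_LE):
        return "UCS-2 LE BOM"
    return "no BOM"
```

For example, `detect_bom(open("some_file.php", "rb").read())` (the file name is only a placeholder) tells you whether a file that WordPress complains about really starts with a BOM.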
REMARKS :
- The UTF-8 sequence EF BB BF is simply the UTF-8 encoding of the Unicode code point of the BOM, U+FEFF.
- The N++ encoding simply called UTF-8 means that all the characters of the file are UTF-8 encoded, but NO BOM is added at the very beginning of the file. That is the ONLY difference from the UTF-8-BOM encoding. In other words, since v6.8.1, the old “UTF-8 without BOM” entry is the one now named UTF-8.
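
If you ever need to do the same conversion outside N++, for instance on several files at once, the idea can be sketched in a few lines of Python. This is only an illustration under my own assumptions, not what N++ does internally: the function names `strip_utf8_bom` and `convert_file_to_utf8_without_bom` are hypothetical, and the sketch assumes the file is already UTF-8 encoded, so the only thing to remove is the leading EF BB BF.

```python
import codecs
from pathlib import Path


def strip_utf8_bom(data: bytes) -> bytes:
    """Return `data` without a leading UTF-8 BOM (EF BB BF), if present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


def convert_file_to_utf8_without_bom(path: str) -> bool:
    """Rewrite the file in place without its UTF-8 BOM.

    Returns True if a BOM was removed, False if the file had none.
    Assumes the rest of the file is already valid UTF-8.
    """
    file = Path(path)
    original = file.read_bytes()
    stripped = strip_utf8_bom(original)
    if stripped == original:
        return False  # nothing to do, leave the file untouched
    file.write_bytes(stripped)
    return True
```

Working on raw bytes rather than decoded text keeps the rest of the file exactly as it was. Keep a backup of the files before running anything like this on a live site.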
Best Regards,
guy038
-