UTF-8 without BOM

    Dan Gargus
    Oct 19, 2015, 5:39 PM

    A WordPress bug fix suggests converting the files that produce errors to “UTF-8 without BOM”, but I cannot find that conversion option.

    Can anyone tell me why it’s not available?

      guy038
      Oct 20, 2015, 12:54 AM (last edited by guy038 Oct 20, 2015, 1:01 AM)

      Hello Dan,

      Starting with v6.8.1 of Notepad++, the entries of the main Encoding menu have changed slightly:

      • Before v6.8.1, the Encode lines were Encode in ANSI, Encode in UTF-8 without BOM, Encode in UTF-8, Encode in UCS-2 Big Endian and Encode in UCS-2 Little Endian. The Convert lines used the same encoding names.

      • From v6.8.1 onward, the Encode lines are Encode in ANSI, Encode in UTF-8, Encode in UTF-8-BOM, Encode in UCS-2 BE BOM and Encode in UCS-2 LE BOM. Again, the Convert lines use the same encoding names.


      NOTES:

      The BOM (Byte Order Mark) is an invisible character, Unicode code point U+FEFF, placed at the very beginning of a file. It helps an application detect the encoding of the file and, for two-byte encodings, the byte order, that is, whether the most or the least significant byte of each character comes first.

      So, depending on the Unicode encoding used, the hidden BOM character at the beginning of the file is stored as:

      • The two bytes FE FF in the UCS-2 BE BOM encoding

      • The two bytes FF FE in the UCS-2 LE BOM encoding

      • The three bytes EF BB BF in the UTF-8 BOM encoding
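The three byte sequences listed above are enough to tell which BOM, if any, a file starts with. Here is a minimal Python sketch of that check. The function name `detect_bom` and the labels are only illustrative (the labels reuse the post-v6.8.1 menu names); this is not how Notepad++ itself detects encodings.

```python
import codecs

# Illustrative labels (the post-v6.8.1 menu names) paired with their BOM bytes.
BOMS = [
    ("UTF-8-BOM", codecs.BOM_UTF8),         # EF BB BF
    ("UCS-2 BE BOM", codecs.BOM_UTF16_BE),  # FE FF
    ("UCS-2 LE BOM", codecs.BOM_UTF16_LE),  # FF FE
]


def detect_bom(data: bytes):
    """Return the label of the BOM that starts `data`, or None if there is none."""
    for label, bom in BOMS:
        if data.startswith(bom):
            return label
    return None


print(detect_bom(b"\xef\xbb\xbf<?php"))  # UTF-8-BOM
print(detect_bom(b"<?php"))              # None
```

Only the first bytes matter, so reading the first three bytes of a file is sufficient. Note that a file without a BOM returns None here whatever its real encoding is.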


      REMARKS:

      • The sequence EF BB BF is simply the UTF-8 encoding of the BOM code point U+FEFF, whose two-byte big-endian form is FE FF.
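To see where EF BB BF comes from, the code point U+FEFF can be encoded by hand with the three-byte UTF-8 pattern 1110xxxx 10xxxxxx 10xxxxxx. The helper `utf8_three_bytes` below is a hypothetical name for this sketch and only handles code points from U+0800 to U+FFFF. The result is compared with Python's built-in encoder.

```python
def utf8_three_bytes(code_point: int) -> bytes:
    """Encode a code point in the range U+0800..U+FFFF as three UTF-8 bytes."""
    if not 0x0800 <= code_point <= 0xFFFF:
        raise ValueError("this sketch only covers the three-byte UTF-8 range")
    return bytes([
        0xE0 | (code_point >> 12),          # 1110xxxx: top 4 bits
        0x80 | ((code_point >> 6) & 0x3F),  # 10xxxxxx: middle 6 bits
        0x80 | (code_point & 0x3F),         # 10xxxxxx: low 6 bits
    ])


bom = "\ufeff"
print(utf8_three_bytes(0xFEFF).hex(" "))  # ef bb bf
print(bom.encode("utf-8").hex(" "))       # ef bb bf
print(bom.encode("utf-16-be").hex(" "))   # fe ff
print(bom.encode("utf-16-le").hex(" "))   # ff fe
```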

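      The N++ encoding simply called UTF-8 means that all the characters of the file are UTF-8 encoded, but no BOM is added at the very beginning of the file. That is the only difference from the UTF-8-BOM encoding. So the former UTF-8 without BOM option is still available: it is now the entry named simply UTF-8, among both the Encode and the Convert lines.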
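Since the only difference between the two encodings is the three leading bytes, the same conversion can also be done outside the editor by dropping those bytes. The following Python sketch assumes the file is already UTF-8; the function names are hypothetical and this is not part of Notepad++ or WordPress.

```python
import codecs
from pathlib import Path


def strip_utf8_bom(data: bytes) -> bytes:
    """Return `data` without a leading UTF-8 BOM (EF BB BF), if one is present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


def convert_file_to_utf8_without_bom(path) -> bool:
    """Rewrite the file in place without its BOM. Return True if it was changed."""
    file_path = Path(path)
    original = file_path.read_bytes()
    stripped = strip_utf8_bom(original)
    if stripped == original:
        return False  # no BOM found, file left untouched
    file_path.write_bytes(stripped)
    return True
```

It is safer to try this on a copy first, because the file is overwritten in place.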

      Best Regards,

      guy038
