Encodage utf-8 sans BOM (UTF-8 encoding without BOM)

bidule

Hello,
For some time now, the utf8 encoding without BOM is no longer in the list of encodings.
That is a shame: I need this encoding for the development and proper functioning of a CMS.
Could you reconsider this decision and put it back?
This editor is very practical, and I have been using it for a very long time.
Regards,

Alan Kilborn

@bidule said in Encodage utf-8 sans BOM:

the utf8 encoding without BOM is no longer in the list of encodings

Which list are you referring to?

If you mean “on the Encoding menu”, then, these options are what you seek:

It is true that quite some time ago, these menu entries in Notepad++ had the text “without BOM” on them.

It’s probably better now, anyway. I mean, well, the old way could have said:

UTF-8 without BOM and without pickles and without salt

Better to say what is contained rather than what isn’t.
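
For anyone who wants to check a file outside the editor, here is a minimal Python sketch. It is not part of Notepad++, and the helper names `has_utf8_bom` and `strip_utf8_bom` are hypothetical. It tests whether some bytes start with the three-byte UTF-8 BOM (`EF BB BF`) and removes it if present.

```python
import codecs

def has_utf8_bom(data: bytes) -> bool:
    """Return True if the bytes start with the UTF-8 BOM (EF BB BF)."""
    return data.startswith(codecs.BOM_UTF8)

def strip_utf8_bom(data: bytes) -> bytes:
    """Return the bytes without a leading UTF-8 BOM, if there is one."""
    return data[len(codecs.BOM_UTF8):] if has_utf8_bom(data) else data

# In Python, the "utf-8-sig" codec writes the BOM; plain "utf-8" does not.
with_bom = "h\u00e9llo".encode("utf-8-sig")
without_bom = "h\u00e9llo".encode("utf-8")

print(has_utf8_bom(with_bom))                   # True
print(has_utf8_bom(without_bom))                # False
print(strip_utf8_bom(with_bom) == without_bom)  # True
```

To test a real file, read it in binary mode (`open(path, "rb").read()`) and pass the result to `has_utf8_bom`.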

guy038

Hello, @bidule, @alan-kilborn and All,

@bidule, your previously installed version was probably quite old, because in very old Notepad++ releases the Encoding menu looked like this:

This picture is, for example, from the v6.4.5 release of N++.

Best Regards,

guy038

bidule

@guy038
Hi, I know that.
The version is

I went back to this v7 version because it is important for me to keep UTF-8 without BOM.
But why this abandonment?
This had already been raised back in 2012!
Have a good day.

Alan Kilborn

@bidule said in Encodage utf-8 sans BOM:

But why this abandonment?

There’s no “abandonment”.
If you want “without BOM”, in newer N++, simply choose the command that DOES NOT say “with BOM”.
This is the yellow highlighting in my screenshot earlier.
I realize that you are probably not a native speaker, but please tell us that you understand this.

guy038

Hi, @bidule, @alan-kilborn and All,

Well, I noticed that in my latest version, v7.9.2, which is compatible with Windows XP, my Encoding menu looks like this:

As you can see, between this v7.9.2 screenshot and the v8.5 screenshot in @alan-kilborn's post, there are differences in the names of the non-UTF-8/ANSI encodings:

For the 7.9.2 release:
- UCS-2 BE BOM
- UCS-2 LE BOM
- Convert to UCS-2 BE BOM
- Convert to UCS-2 LE BOM

For the 8.5 release, and all versions from v8.0 onward:
- UTF-16 BE BOM
- UTF-16 LE BOM
- Convert to UTF-16 BE BOM
- Convert to UTF-16 LE BOM

The difference is that:

- The UCS-2 encodings can ONLY encode characters of the Unicode BMP (Basic Multilingual Plane), between \x{0000} and \x{FFFF}
- The UTF-16 encodings can encode ALL Unicode characters, between \x{0000} and \x{10FFFF}, just as the UTF-8 encoding can
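
The difference can be seen with a short Python sketch. Python is used here only for illustration, it says nothing about how Notepad++ is implemented, and the helper name `utf16_units` is hypothetical. A BMP character takes a single 16-bit unit in UTF-16, while a character above U+FFFF takes two, which UCS-2 has no way to express.

```python
def utf16_units(ch: str) -> list[int]:
    """Return the 16-bit code units of a string encoded as UTF-16 BE (no BOM)."""
    raw = ch.encode("utf-16-be")
    return [int.from_bytes(raw[i:i + 2], "big") for i in range(0, len(raw), 2)]

bmp_char = "\u00e9"         # U+00E9, inside the BMP
astral_char = "\U0001F4A6"  # U+1F4A6, above U+FFFF

print([hex(u) for u in utf16_units(bmp_char)])     # ['0xe9']
print([hex(u) for u in utf16_units(astral_char)])  # ['0xd83d', '0xdca6']
print(len(astral_char.encode("utf-8")))            # 4
```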

So, since the v8.0 release, there is a significant improvement in writing characters exactly when their Unicode code-point is above \x{FFFF}!

However, note that when you want to search for any character outside the BMP, so > \x{FFFF}, you MUST use the equivalent surrogate-pair regex syntax for this character!

For instance, the 💦 character, with the Unicode code-point 1F4A6, cannot be found with the regex \x{1F4A6}, but can be matched with its equivalent regex syntax \x{D83D}\x{DCA6}. Of course, you may also paste this character directly into the search field!

Refer to any Internet site about Unicode characters to get the correspondence between the hexadecimal code-point of a character and its surrogate pair, expressed as two consecutive 16-bit (double-byte) values.
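
The surrogate values can also be computed directly. Below is a minimal Python sketch of the standard UTF-16 formula. The function names `surrogate_pair` and `to_surrogate_regex` are hypothetical, and the output format simply mirrors the \x{....} syntax used above.

```python
def surrogate_pair(code_point: int) -> tuple[int, int]:
    """Split a code point above 0xFFFF into its UTF-16 high/low surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points above the BMP have surrogates")
    offset = code_point - 0x10000    # 20-bit value
    high = 0xD800 + (offset >> 10)   # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)  # bottom 10 bits
    return high, low

def to_surrogate_regex(code_point: int) -> str:
    """Format the surrogate pair in the \\x{....}\\x{....} regex syntax."""
    high, low = surrogate_pair(code_point)
    return f"\\x{{{high:04X}}}\\x{{{low:04X}}}"

print(to_surrogate_regex(0x1F4A6))  # \x{D83D}\x{DCA6}
```

The printed string can then be pasted into the search field with the regular expression search mode selected.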

Best Regards,

guy038