Encodage utf-8 sans BOM (UTF-8 encoding without BOM)
-
Hi
Good morning,
For some time, the utf8 encoding without BOM is no longer in the list of encodings.
It's a shame: I need this encoding to develop a CMS and keep it working properly.
Could you reconsider the decision to remove it?
This editor is very practical; I have been using it for a very long time.
Kind regards,
-
@bidule said in Encodage utf-8 sans BOM:
the utf8 encoding without BOM is no longer in the list of encodings
Which list are you referring to?
If you mean "on the Encoding menu", then these options are what you seek:
It is true that quite some time ago, these menu entries in Notepad++ had the text “without BOM” on them.
It’s probably better now, anyway. I mean, well, the old way could have said:
UTF-8 without BOM and without pickles and without salt
Better to say what is contained rather than what isn’t.
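To make "what is contained" concrete: a UTF-8 BOM is just the three bytes EF BB BF written at the very start of a file. An entry that does not mention a BOM writes the text without those bytes; an entry marked "BOM" writes them first. The minimal Python sketch below uses only the standard `codecs` module to show both byte sequences and one quick way to check whether a file's raw bytes (for example, a CMS template) start with a BOM. The helpers `has_utf8_bom` and `strip_utf8_bom` are hypothetical names for illustration, not part of Notepad++ or any CMS.

```python
import codecs

text = "Héllo"

# Plain "utf-8" never writes a BOM
no_bom = text.encode("utf-8")
# "utf-8-sig" prepends the 3-byte UTF-8 BOM (EF BB BF)
with_bom = text.encode("utf-8-sig")

print(no_bom.hex(" "))    # 48 c3 a9 6c 6c 6f
print(with_bom.hex(" "))  # ef bb bf 48 c3 a9 6c 6c 6f

def has_utf8_bom(data: bytes) -> bool:
    """Hypothetical helper: True if the raw bytes start with the UTF-8 BOM."""
    return data.startswith(codecs.BOM_UTF8)

def strip_utf8_bom(data: bytes) -> bytes:
    """Hypothetical helper: return the bytes without a leading UTF-8 BOM, if any."""
    return data[len(codecs.BOM_UTF8):] if has_utf8_bom(data) else data

print(has_utf8_bom(no_bom), has_utf8_bom(with_bom))  # False True
print(strip_utf8_bom(with_bom) == no_bom)            # True
```

In practice you would read the file in binary mode (`open(path, "rb").read()`) before calling such a check, so that no decoder silently consumes the BOM.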
-
Hello, @bidule, @alan-kilborn and All,
@bidule, your previously installed version was probably quite old, because in very old Notepad++ releases, the Encoding menu looked like this:
This picture is, for example, from the v6.4.5 release of N++.

Best Regards,
guy038
-
@guy038
Hi, I know that.
The version is:
I went back to this v7 version, because it is important for me to keep UTF-8 without BOM.
But why this abandonment?
This had already been raised back in 2012!
Good day,
-
@bidule said in Encodage utf-8 sans BOM:
But why this abandonment?
There’s no “abandonment”.
If you want "without BOM", in newer N++, simply choose the command that DOES NOT say "with BOM".
This is the yellow highlighting in my screenshot earlier.
I realize that you are probably not a native speaker, but please tell us that you understand this.
-
Hi, @bidule, @alan-kilborn and All,
Well, I noticed that in my last v7.9.2 version, compatible with Windows XP, my Encoding menu looks like this:
As you can see, between this v7.9.2 screenshot and the v8.5 screenshot in @alan-kilborn's post, there are differences in the names of the non-UTF-8/ANSI encodings:
For
7.9.2
release :-
UCS-2 BE BOM
-
UCS-2 LE BOM
-
Convert to UCS-2 BE BOM
-
Convert to UCS-2 LE BOM
-
-
For
8.5
release and versions fromv8.0
:-
UTF-16 BE BOM
-
UTF-16 LE BOM
-
Convert to UTF-16 BE BOM
-
Convert to UTF-16 LE BOM
-
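In these names, BE and LE give the byte order (big-endian or little-endian) of each 16-bit unit, and BOM means the file starts with a byte order mark: FE FF for big-endian, FF FE for little-endian. As a rough illustration with Python's standard `codecs` module (not how Notepad++ itself is implemented), here are the bytes written for a short sample string in each flavour:

```python
import codecs

text = "A\u20ac"  # "A€": U+0041 and U+20AC, both inside the BMP

# Python's "utf-16-be"/"utf-16-le" codecs write no BOM, so prepend it explicitly
be_bom = codecs.BOM_UTF16_BE + text.encode("utf-16-be")
le_bom = codecs.BOM_UTF16_LE + text.encode("utf-16-le")

print(be_bom.hex(" "))  # fe ff 00 41 20 ac
print(le_bom.hex(" "))  # ff fe 41 00 ac 20

# Decoding with plain "utf-16" reads the BOM to pick the byte order, then drops it
print(be_bom.decode("utf-16") == text, le_bom.decode("utf-16") == text)  # True True
```

For characters of the BMP, such as these two, the UCS-2 and UTF-16 bytes are identical; the two families only differ for characters above \x{FFFF}.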
The difference is that:

- The UCS-2 encodings can ONLY encode characters of the BMP (Basic Multilingual Plane), between \x{0000} and \x{FFFF}.
- The UTF-16 encodings can encode ALL Unicode characters, between \x{0000} and \x{10FFFF}, just like the UTF-8 encoding.
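A small sketch of the practical consequence: a text fits in UCS-2 only if every code point is at most 0xFFFF, while UTF-16 accepts anything up to 0x10FFFF by spending two 16-bit units on each character beyond the BMP. Since Python does not ship a UCS-2 codec, the check below is written by hand; `fits_in_ucs2` and `utf16_units` are hypothetical helper names chosen for this illustration.

```python
def fits_in_ucs2(text: str) -> bool:
    """Hypothetical helper: True if every character is in the BMP (U+0000..U+FFFF)."""
    return all(ord(ch) <= 0xFFFF for ch in text)

def utf16_units(text: str) -> int:
    """Number of 16-bit code units the text needs in UTF-16 (BOM not counted)."""
    return len(text.encode("utf-16-le")) // 2

for sample in ["caf\u00e9", "\U0001F4A6"]:  # U+1F4A6 lies outside the BMP
    print(f"{sample!a}: UCS-2 ok={fits_in_ucs2(sample)}, UTF-16 units={utf16_units(sample)}")
# 'caf\xe9': UCS-2 ok=True, UTF-16 units=4
# '\U0001f4a6': UCS-2 ok=False, UTF-16 units=2
```

The `!a` conversion prints an ASCII-safe representation, so the output does not depend on the console's encoding.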
So, since the v8.0 release, there has been a significant improvement in writing the exact characters when their Unicode code-point is over \x{FFFF}!
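For a character beyond \x{FFFF}, UTF-16 subtracts 0x10000 from the code point and splits the remaining 20 bits into a high surrogate (0xD800 plus the top 10 bits) and a low surrogate (0xDC00 plus the bottom 10 bits). The short Python sketch below (`to_surrogates` is a hypothetical helper name) does that arithmetic and cross-checks it against Python's own UTF-16 encoder. For U+1F4A6 it yields D83D and DCA6, the same values as the \x{D83D}\x{DCA6} regex syntax discussed in this post.

```python
def to_surrogates(code_point: int) -> tuple[int, int]:
    """Hypothetical helper: split a code point above U+FFFF into a UTF-16 surrogate pair."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points U+10000..U+10FFFF need surrogates")
    offset = code_point - 0x10000      # 20-bit value
    high = 0xD800 + (offset >> 10)     # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)    # bottom 10 bits
    return high, low

high, low = to_surrogates(0x1F4A6)
# Format the pair in the \x{....}\x{....} regex notation
print(f"\\x{{{high:04X}}}\\x{{{low:04X}}}")  # \x{D83D}\x{DCA6}

# Cross-check against Python's UTF-16 big-endian encoder
assert "\U0001F4A6".encode("utf-16-be") == high.to_bytes(2, "big") + low.to_bytes(2, "big")
```

The function requires Python 3.9 or later for the `tuple[int, int]` annotation.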
However, note that when you want to search for any character beyond the BMP (i.e. > \x{FFFF}), you MUST use the equivalent surrogate regex syntax for this character! For instance, the 💦 character, with the Unicode code-point 1F4A6, cannot be searched with the regex \x{1F4A6}, but can be found with its equivalent regex syntax \x{D83D}\x{DCA6}. Of course, you may also paste this specific character directly into the search field!

Refer to any Internet site about characters to get the correspondence between the hexadecimal code-point of a character and its surrogate value, expressed as two consecutive 16-bit code units.

Best Regards,
guy038
-