Hi, @bidule, @alan-kilborn and All,
Well, I noticed that in my latest v7.9.2 version, which is compatible with Windows XP, my Encoding menu looks like this:
[Screenshot: Encoding menu in v7.9.2]
As you can see, between this v7.9.2 screenshot and the v8.5 screenshot of @alan-kilborn, in his post, there are differences in the names of the non UTF-8/ANSI encodings:
For the 7.9.2 release:
UCS-2 BE BOM
UCS-2 LE BOM
Convert to UCS-2 BE BOM
Convert to UCS-2 LE BOM
For the 8.5 release, and all versions from v8.0 onwards:
UTF-16 BE BOM
UTF-16 LE BOM
Convert to UTF-16 BE BOM
Convert to UTF-16 LE BOM
The difference is that:
The encodings relative to UCS-2 can ONLY encode characters of the BMP Unicode plane, between \x{0000} and \x{FFFF}
The encodings relative to UTF-16 can encode ALL Unicode characters, between \x{0000} and \x{10FFFF}, just like the UTF-8 encoding
So, since the v8.0 release, there is a significant improvement in writing the exact characters, when they have a Unicode code-point over \x{FFFF}!
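To make the UCS-2 / UTF-16 difference concrete, here is a small Python sketch. It is independent of Notepad++ itself, and the helper names `fits_in_ucs2` and `utf16_code_units` are just made up for this illustration. It only shows that a character above \x{FFFF} needs two 16-bit units in UTF-16, which a fixed-width UCS-2 encoding has no room for:

```python
def fits_in_ucs2(ch: str) -> bool:
    """True if the character lies in the BMP, i.e. its code-point is <= 0xFFFF."""
    return ord(ch) <= 0xFFFF


def utf16_code_units(ch: str) -> int:
    """Number of 16-bit code units that UTF-16 needs for this single character."""
    # "utf-16-be" writes no BOM, so the byte length is exactly 2 bytes per code unit.
    return len(ch.encode("utf-16-be")) // 2


if __name__ == "__main__":
    for ch in ("A", "\u20AC", "\U0001F4A6"):
        print(f"U+{ord(ch):04X}  BMP: {fits_in_ucs2(ch)}  UTF-16 units: {utf16_code_units(ch)}")
```

A BMP character such as `A` or the euro sign takes one 16-bit unit, while the character at code-point 1F4A6 takes two.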
However, note that, when you want to search for any character outside the BMP, so with a code-point > \x{FFFF}, you MUST use the equivalent surrogate-pair regex syntax for this character!
For instance, the 💦 character, with the Unicode code-point 1F4A6, cannot be found with the regex \x{1F4A6} but can be matched with its equivalent regex syntax \x{D83D}\x{DCA6}. Of course, you may also paste this specific character directly into the search field!
Refer to any Internet site about Unicode characters to get the correspondence between the hexadecimal code-point of a character and its surrogate pair, expressed as two consecutive 16-bit (double-byte) values
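If you prefer to compute the surrogate pair yourself rather than look it up, the standard UTF-16 arithmetic is short enough to sketch in Python. The function names `surrogate_pair` and `surrogate_regex` are hypothetical helpers for this post, not part of Notepad++. The code subtracts 0x10000 from the code-point, then splits the remaining 20 bits into a high half (added to 0xD800) and a low half (added to 0xDC00):

```python
from typing import Tuple


def surrogate_pair(code_point: int) -> Tuple[int, int]:
    """Return the (high, low) UTF-16 surrogates for a code-point above the BMP."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code-points from 0x10000 to 0x10FFFF have a surrogate pair")
    offset = code_point - 0x10000        # 20-bit value
    high = 0xD800 + (offset >> 10)       # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)      # bottom 10 bits
    return high, low


def surrogate_regex(code_point: int) -> str:
    """Build the \\x{HHHH}\\x{LLLL} search string for that code-point."""
    high, low = surrogate_pair(code_point)
    return f"\\x{{{high:04X}}}\\x{{{low:04X}}}"


if __name__ == "__main__":
    print(surrogate_regex(0x1F4A6))  # prints \x{D83D}\x{DCA6}
```

For the code-point 1F4A6 this gives \x{D83D}\x{DCA6}, the same pair as quoted above for the 💦 character.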
Best Regards,
guy038