
    UTF-8 encoding without BOM (Encodage utf-8 sans BOM)

    General Discussion
    Tag: encoding
    6 Posts 3 Posters 885 Views
    • bidule

      Good morning,
      For some time now, UTF-8 without BOM has no longer been in the list of encodings.
      It's a shame: I need this encoding to develop a CMS and keep it working properly.
      Could you reconsider this decision and put it back?
      This editor is very practical; I have been using it for a very long time.
      Best regards

      • Alan Kilborn, replying to @bidule

        @bidule said in UTF-8 encoding without BOM:

        the utf8 encoding without BOM is no longer in the list of encodings

        Which list are you referring to?

        If you mean “on the Encoding menu”, then, these options are what you seek:

        [Screenshot: the Notepad++ v8.5 Encoding menu, with the plain "UTF-8" entries highlighted in yellow]

        It is true that quite some time ago, these menu entries in Notepad++ had the text “without BOM” on them.

        It’s probably better now, anyway. I mean, well, the old way could have said:

        UTF-8 without BOM and without pickles and without salt

        Better to say what is contained rather than what isn’t.
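A quick way to see that the plain "UTF-8" entry really is "UTF-8 without BOM": a BOM is just the three bytes EF BB BF at the very start of the file. The Python sketch below uses the standard `utf-8` and `utf-8-sig` codecs as stand-ins for the two menu entries; it is an analogy to show the bytes involved, not a description of how Notepad++ itself writes files.

```python
import codecs

text = "\u00e9"  # the character é

# Plain UTF-8: only the encoded characters, no signature (like the "UTF-8" entry).
plain = text.encode("utf-8")

# "utf-8-sig" prepends the UTF-8 BOM (like the "UTF-8-BOM" entry).
with_bom = text.encode("utf-8-sig")

print(plain.hex(" "))                         # c3 a9
print(with_bom.hex(" "))                      # ef bb bf c3 a9
print(with_bom.startswith(codecs.BOM_UTF8))   # True
```

Everything after the first three bytes is identical, which is why the menu only needs to name the variant that adds something.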

        • guy038

          Hello, @bidule, @alan-kilborn and All,

          @bidule, your previously installed version was probably quite old, because in very old Notepad++ releases the Encoding menu looked like this:

          [Screenshot: the Encoding menu of Notepad++ v6.4.5]


          This screenshot, for example, is from the v6.4.5 release of N++.

          Best Regards,

          guy038

          • bidule, replying to @guy038

            @guy038
            Hi, I know that. My version is:
            [Screenshot: bidule's Notepad++ version, a v7 release]
            I went back to this v7 version because keeping UTF-8 without BOM is important to me.
            But why was it abandoned?
            This had already been raised back in 2012!
            Have a good day

            • Alan Kilborn, replying to @bidule

              @bidule said in UTF-8 encoding without BOM:

              But why this abandonment?

              There’s no “abandonment”.
              If you want "without BOM" in newer N++, simply choose the command that DOES NOT say "with BOM".
              This is the yellow highlighting in my screenshot earlier.
              I realize that you are probably not a native speaker, but please tell us that you understand this.
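If some files were already saved as UTF-8-BOM and must be BOM-less (for a CMS, say), the BOM can also be checked and stripped outside the editor. A minimal Python sketch: the helper names `has_utf8_bom`, `strip_utf8_bom` and `rewrite_without_bom` are hypothetical, and the `<?php` sample is only an illustrative assumption about the kind of file involved.

```python
import codecs
from pathlib import Path


def has_utf8_bom(data: bytes) -> bool:
    """True if the bytes start with the UTF-8 BOM (EF BB BF)."""
    return data.startswith(codecs.BOM_UTF8)


def strip_utf8_bom(data: bytes) -> bytes:
    """Return the bytes without a leading UTF-8 BOM; unchanged if there is none."""
    return data[len(codecs.BOM_UTF8):] if has_utf8_bom(data) else data


def rewrite_without_bom(path: Path) -> bool:
    """Rewrite a file in place without its BOM; return True if one was removed."""
    data = path.read_bytes()
    if not has_utf8_bom(data):
        return False
    path.write_bytes(strip_utf8_bom(data))
    return True


if __name__ == "__main__":
    # Hypothetical sample: a script file saved as "UTF-8-BOM".
    sample = "<?php echo 'ok';".encode("utf-8-sig")
    print(has_utf8_bom(sample))        # True
    print(strip_utf8_bom(sample)[:5])  # b'<?php'
```

Working on raw bytes, rather than decoding and re-encoding, leaves the rest of the file byte-for-byte unchanged.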

              • guy038

                Hi, @bidule, @alan-kilborn and All,

                Well, I noticed that in my latest installed version, v7.9.2 (compatible with Windows XP), my Encoding menu looks like this:

                [Screenshot: the Encoding menu of Notepad++ v7.9.2]

                As you can see, between this v7.9.2 screenshot and the v8.5 screenshot in @alan-kilborn's post, there are differences in the names of the non-UTF-8/ANSI encodings:

                • For the v7.9.2 release:

                  • UCS-2 BE BOM

                  • UCS-2 LE BOM

                  • Convert to UCS-2 BE BOM

                  • Convert to UCS-2 LE BOM

                • For v8.0 and later releases, including v8.5:

                  • UTF-16 BE BOM

                  • UTF-16 LE BOM

                  • Convert to UTF-16 BE BOM

                  • Convert to UTF-16 LE BOM


                The difference is that:

                • The UCS-2 encodings can ONLY encode characters of the Basic Multilingual Plane (BMP), between \x{0000} and \x{FFFF}.

                • The UTF-16 encodings, like UTF-8, can encode ALL Unicode characters, between \x{0000} and \x{10FFFF}.
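The difference can be checked per character: UCS-2 stores every character in exactly one 16-bit unit, so it stops at \x{FFFF}, while UTF-16 spends two units (a surrogate pair) on anything above. A small Python sketch; `fits_in_ucs2` and `utf16_units` are hypothetical helper names, and Python's `utf-16-le` codec stands in for the editor's UTF-16 LE encoding.

```python
def fits_in_ucs2(ch: str) -> bool:
    """UCS-2 has one 16-bit unit per character, so only BMP code points fit."""
    return ord(ch) <= 0xFFFF


def utf16_units(ch: str) -> int:
    """Number of 16-bit code units UTF-16 (little-endian, no BOM) needs for one character."""
    return len(ch.encode("utf-16-le")) // 2


# é, the euro sign, and the 💦 character (U+1F4A6)
for ch in ("\u00e9", "\u20ac", "\U0001F4A6"):
    print(f"U+{ord(ch):04X}  UCS-2: {fits_in_ucs2(ch)}  UTF-16 units: {utf16_units(ch)}")
```

U+00E9 and U+20AC fit in one unit either way; U+1F4A6 does not fit in UCS-2 and takes two UTF-16 units.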

                So, since the v8.0 release, there is a significant improvement in writing the exact characters when they have a Unicode code point above \x{FFFF}!


                However, note that when you want to search for any character outside the BMP (above \x{FFFF}), you MUST use the equivalent surrogate-pair regex syntax for that character!

                For instance, the 💦 character, with Unicode code point U+1F4A6, cannot be searched with the regex \x{1F4A6}, but it can be found with its equivalent surrogate syntax \x{D83D}\x{DCA6}. Of course, you may also paste this character directly into the search field!

                Refer to any Unicode character reference website to get the correspondence between a character's hexadecimal code point and its surrogate pair, expressed as two consecutive 16-bit code units.
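The surrogate pair can also be computed instead of looked up, using the standard UTF-16 rule: subtract 0x10000, then put the top 10 bits of the result in a high surrogate (0xD800 plus those bits) and the bottom 10 bits in a low surrogate (0xDC00 plus those bits). A Python sketch; the function names are hypothetical, and the output string simply mimics the \x{...} search syntax used in this post.

```python
def surrogate_pair(code_point: int) -> tuple[int, int]:
    """Split a code point above U+FFFF into its UTF-16 high and low surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points U+10000..U+10FFFF use surrogate pairs")
    offset = code_point - 0x10000       # a 20-bit value
    high = 0xD800 + (offset >> 10)      # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)     # bottom 10 bits
    return high, low


def surrogate_regex(code_point: int) -> str:
    """Format the pair as the \\x{HHHH}\\x{LLLL} search syntax."""
    high, low = surrogate_pair(code_point)
    return f"\\x{{{high:04X}}}\\x{{{low:04X}}}"


print(surrogate_regex(0x1F4A6))  # \x{D83D}\x{DCA6}
```

The result for U+1F4A6 matches the \x{D83D}\x{DCA6} pair given above.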

                Best Regards,

                guy038

                The Community of users of the Notepad++ text editor.
                Powered by NodeBB | Contributors