    Forcing Notepad++ into Little Endian without BOM

    Help wanted · 9 Posts · 4 Posters · 7.7k Views
    • 백준영

      I have a problem changing the encoding to Little Endian in Notepad++.
      The reason I'm asking is that this happens: http://imgur.com/a/vbcyP
      If I replace all the \x00 bytes with blanks, it looks like this: http://imgur.com/a/vbcyP
      And this is what I wanted to see (not the same file): http://imgur.com/a/vbcyP
      As you can see, Notepad++ detects that one as Little Endian, without a BOM.

      Even if I change a non-working file to Little Endian, it shows up as LE BOM, not Little Endian.

      I have 10 of these files, and only two of them work properly.
      Is there any way to force Little Endian without a BOM?

      • Claudia Frank @백준영

        @백준영

        Is there any way to force Little Endian without a BOM?

        It looks like there isn't.

        When you open a text file and see strange symbols,
        it either means that the wrong encoding was assumed
        or that the font you are using cannot display the proper symbol.

        In your case it is more likely an encoding issue.
        In such a case it is not a good idea to replace those chars
        with whatever chars you think they should be, as that will most likely
        break the encoding altogether. Instead, use the Encoding menu
        to find out the proper encoding.

        If you can only make it work with a BOM and you need to have
        it without one, there is always the option of creating the file
        with the BOM and afterwards using a hex editor to remove the
        BOM bytes, which are always at the beginning of the file.

        BOMs used per UTF-X encoding

        Bytes          Encoding Form
        00 00 FE FF    UTF-32, big-endian
        FF FE 00 00    UTF-32, little-endian
        FE FF          UTF-16, big-endian
        FF FE          UTF-16, little-endian
        EF BB BF       UTF-8
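
        If you go the hex editor route, the same step can also be scripted. Below is a
        minimal sketch (hypothetical Python, not from the original post), assuming the
        file was saved as UTF-16 little-endian with the FF FE BOM from the table above;
        the file name is just a placeholder.

        # strip_bom.py - remove a leading UTF-16 LE BOM (FF FE), if present
        BOM_UTF16_LE = b"\xff\xfe"

        with open("output.txt", "rb") as f:
            data = f.read()

        if data.startswith(BOM_UTF16_LE):
            with open("output.txt", "wb") as f:
                # write everything after the two BOM bytes back to the file
                f.write(data[len(BOM_UTF16_LE):])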
        

        One other change I would make, since I assume
        you work with Unicode files more often,
        is to use a different default encoding:

        Settings->Preferences->New Document->Encoding
        

        Cheers
        Claudia

        • gstavi

          It is not clear if you are referring to loading or saving.

          And while, in general, it would be nice to allow a user to override the encoding setting for either of them, it is not clear why you would want UTF-16 without a BOM. Unlike UTF-8, in which a BOM is just wrong, for UTF-16 a BOM is actually very helpful. UTF-16 without a BOM makes little sense.
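
          To make the byte-order point concrete, here is a small illustration
          (hypothetical Python, not part of the original post): the same character is
          stored with opposite byte order in UTF-16 LE and UTF-16 BE, and the BOM is
          what tells a reader which order to expect.

          text = "A"
          print(text.encode("utf-16-le").hex(" "))  # 41 00
          print(text.encode("utf-16-be").hex(" "))  # 00 41
          # the generic codec prepends a BOM, e.g. ff fe 41 00 on little-endian machines
          print(text.encode("utf-16").hex(" "))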

          • Claudia Frank @gstavi

            @gstavi

            I assume your post refers to the original post but I do have a question.
            You stated

            Unlike UTF-8, in which a BOM is just wrong

            Why do you think so?

            Personally, I can't see why, nowadays, it would still be necessary to have
            UTF-8 files encoded with a BOM, but it isn't wrong if there is one.
            See http://www.unicode.org/faq/utf_bom.html#bom4

            Cheers
            Claudia

            • gstavi @Claudia Frank

              @Claudia-Frank

              The main justification for a BOM is to distinguish little endian from big endian. This is not needed for UTF-8.

              One can claim that a BOM can distinguish ANSI from UTF-8.
              But there is little reason to open a file as ANSI today. It should be acceptable to open every ANSI/UTF-8 file as UTF-8, since the ASCII range of ANSI is a subset of UTF-8, so ASCII-only ANSI files are already valid UTF-8. It is true that an ANSI file may use a different code page for characters in the 128-255 range, and those will load wrong as UTF-8. But to load such a file correctly into a Unicode editor one would need to guess, or ask the user, which code page should be used to translate those characters into Unicode code points, and a BOM does not help to detect the proper code page.

              • Claudia Frank @gstavi

                @gstavi

                I agree 100% with what you've said - I just got confused by the statement

                Unlike UTF-8, in which a BOM is just wrong

                I thought I had missed something, but it seems you have the same understanding as I do.

                Cheers
                Claudia

                • Gabr-F @gstavi

                  @gstavi I imagine you're either not living in a country that has special characters or you don't have many old files. Otherwise you'd still have to deal very frequently with files in your local code page, and it would be very inconvenient to use a text editor that doesn't default to your local code page for files without a BOM.

                  I have been saving all my new files in UTF-8 (always with a BOM) for about 10 years, but I still found it extremely frustrating the time I tried an editor that defaulted to UTF-8.

                  The fact that you can't know the exact code page a file is in (if it doesn't have headers) is a minor problem for most people, because the vast majority of your non-Unicode files will be in your system's default code page.

                  I should add that I used DOS very little; someone who had used it a lot would probably have many more problems, given the differences between the default DOS and Windows code pages.

                  • gstavi @Gabr-F

                    @Gabr-F
                    I do live in a country with lots of special characters, but thankfully I have not needed to deal with old files for a long time.
                    Do remember that NPP need not be the only tool in the toolbox. You can use tools like iconv to convert old files to UTF-8 in a batch.
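
                    A minimal sketch of such a batch conversion (hypothetical Python; the
                    folder name and the cp1252 source encoding are assumptions to adjust
                    for your own files - on the command line, iconv does the same job):

                    # Convert every .txt file in a folder from a legacy code page to UTF-8.
                    from pathlib import Path

                    SOURCE_ENCODING = "cp1252"  # assumed legacy code page of the old files

                    for path in Path("old_files").glob("*.txt"):
                        text = path.read_text(encoding=SOURCE_ENCODING)
                        # write a UTF-8 copy alongside the original instead of overwriting it
                        path.with_name(path.stem + ".utf8.txt").write_text(text, encoding="utf-8")
                        print(f"converted {path}")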

                    • Gabr-F

                      @gstavi I prefer not to touch my old files; I'm fond of them and like to leave them as they are :)
                      The advice to convert all files from one line ending to another, or from one encoding to another, has always looked like heresy to me :)
