Forcing Notepad++ into Little Endian without BOM



  • I have a problem changing the encoding to Little Endian in Notepad++.
    The reason for asking is that this happens: http://imgur.com/a/vbcyP
    If I change all \x00 characters to blanks, it looks like this: http://imgur.com/a/vbcyP
    And this is what I want to see (not the same file): http://imgur.com/a/vbcyP
    As you can see, Notepad++ detects it as Little Endian, without BOM.

    Even if I change a non-working one to Little Endian, it shows as LE BOM, not Little Endian.

    I have 10 of these files, and only two work properly.
    Is there any way to force Little Endian without a BOM?



  • @백준영

    Is there any way to force Little endian without BOM?

    Looks like there is not.

    When you open a text file and you see strange symbols,
    it either means that the wrong encoding was assumed
    or that the font you used isn't able to show the proper symbol.

    In your case it is more likely an encoding issue.
    In such a case it is not a good idea to replace those chars
    with whatever chars you think they might be, as that most likely
    breaks the encoding altogether. Instead, use the Encoding menu
    to find out the proper encoding.
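
    Just to illustrate what you were seeing: in UTF-16 LE every ASCII character is
    stored as two bytes, and the second byte is 0x00. Here is a small Python sketch
    (not Notepad++ itself, just an illustration of the effect):

    # In UTF-16 LE every ASCII character is stored as two bytes,
    # and the second byte is NUL (0x00).
    text = "Hello"
    utf16 = text.encode("utf-16-le")
    print(utf16)                           # b'H\x00e\x00l\x00l\x00o\x00'

    # An editor that assumes a single-byte encoding (e.g. Latin-1) sees
    # each of those NUL bytes as a separate \x00 character.
    print(repr(utf16.decode("latin-1")))   # 'H\x00e\x00l\x00l\x00o\x00'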

    If you can only make it work with a BOM but you need the file
    without one, you can always create the file with a BOM and
    afterwards use a hex editor to remove the BOM bytes, which are
    always at the beginning of the file.

    BOMs used per UTF-X encoding

    Bytes          Encoding Form
    00 00 FE FF    UTF-32, big-endian
    FF FE 00 00    UTF-32, little-endian
    FE FF          UTF-16, big-endian
    FF FE          UTF-16, little-endian
    EF BB BF       UTF-8
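
    If you don't want to patch the files by hand, the same BOM removal can be
    scripted. A minimal Python sketch, assuming a UTF-16 LE file and using
    placeholder file names:

    # Read the file as raw bytes, drop the UTF-16 LE BOM (FF FE) if present,
    # and write the result back out. "input.txt"/"output.txt" are placeholders.
    with open("input.txt", "rb") as f:
        data = f.read()

    if data.startswith(b"\xff\xfe"):
        data = data[2:]         # the BOM is always the first two bytes

    with open("output.txt", "wb") as f:
        f.write(data)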
    

    One other change I would make, since I assume
    you work with Unicode files more often,
    is to use a different default encoding:

    Settings->Preferences->New Document->Encoding
    

    Cheers
    Claudia



  • It is not clear if you are referring to loading or saving.

    And while in general it would be nice to allow a user to override the encoding setting in either case, it is not clear why you would want UTF-16 without a BOM. Unlike UTF-8, in which a BOM is just wrong, for UTF-16 a BOM is actually very helpful. UTF-16 without a BOM makes little sense.



  • @gstavi

    I assume your post refers to the original post but I do have a question.
    You stated

    Unlike UTF-8, in which a BOM is just wrong

    Why do you think so?

    Personally, I can't see why, nowadays, it would still be necessary to have
    UTF-8 files encoded with a BOM, but it isn't wrong if there is one.
    See http://www.unicode.org/faq/utf_bom.html#bom4

    Cheers
    Claudia



  • @Claudia-Frank

    The main justification for BOM is to distinguish little endian from big endian. This is not needed for UTF-8.

    One can claim that a BOM can distinguish ANSI from UTF-8.
    But there is little reason to open a file as ANSI today. It should be acceptable to open every ANSI/UTF-8 file as UTF-8, since ANSI is a subset of UTF-8, so ANSI files are UTF-8. It is true that an ANSI file may use a different code page for chars in the 128-255 range, and these will load incorrectly as UTF-8. But to load such a file correctly into a Unicode editor, one would need to guess, or ask the user, which code page should be used to translate these chars into Unicode code points, and a BOM does not help to detect the proper code page.



  • @gstavi

    I agree 100% with what you've said - I just got confused by the statement

    Unlike UTF-8, in which a BOM is just wrong

    I thought I had missed something, but it seems you have the same understanding as I do.

    Cheers
    Claudia



  • @gstavi I imagine you either don't live in a country whose language has special characters or you don't have many old files. Otherwise you would still have to deal very frequently with files in your local code page, and it would be very inconvenient to use a text editor that doesn't default to your local code page for files without a BOM.

    I have been saving all my new files in UTF-8 (always with a BOM) for about 10 years, but I still found it extremely frustrating the time I tried an editor that defaulted to UTF-8.

    The fact that you can't know exactly which code page a file is in (if it doesn't have headers) is a minor problem for most people, because the vast majority of your non-Unicode files will be in your system's default code page.

    I have to add that I used DOS very little; someone who used it a lot would probably have far more problems, given the differences between the DOS and Windows default code pages.



  • @Gabr-F
    I do live in a country with lots of special characters, but thankfully I have not needed to deal with old files for a long time.
    Do remember that NPP need not be the only tool in the toolbox. You can use tools like iconv to convert old files to UTF-8 in a batch.
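
    As a rough sketch of what such a batch conversion could look like (done here in
    Python rather than iconv; the source code page cp1252 and the folder name are
    assumptions):

    # Convert every .txt file in a folder from a legacy code page to UTF-8
    # (written without a BOM). The code page (cp1252) and the folder name
    # ("old_files") are assumptions for the sake of the example.
    from pathlib import Path

    for path in Path("old_files").glob("*.txt"):
        text = path.read_text(encoding="cp1252")
        path.write_text(text, encoding="utf-8")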



  • @gstavi I prefer not to touch my old files, I'm fond of them and like to leave them as they are :)
    The advice of converting all files from one line ending to another, or one encoding to another, has always looked like heresy to me :)

