Problem with Hexa editor add-on
-
Hello, I opened a binary file with the Notepad++ Hex Editor plugin and with another hex editor, and I do not get the same result. I trust the other one more, as the file was generated with the MATLAB code below:
Signal = zeros(4*64000,1);
t=1;
for i=1 : 4000
Signal(t) = mod(t,200);
t=t+1;
end
FileID = fopen(fullfile(pwd, 'output4.int16'), 'w');
fwrite(FileID, Signal, 'int16');
fclose(FileID);
Where is the problem?
Thank you for taking the time to consider my question.
Cédric Freycenon
-
Notepad++ is a text editor, so it reasonably assumes that anything you open with Notepad++ is a text file; that’s its job, after all. When it sees a consistent alternation of some byte followed by a 0x00 byte, for the entire document, it reasonably guesses that you have a UCS-2 LE encoded file (because that’s what UCS-2 LE looks like for text), and treats each pair of bytes as a single character. Then when you run Plugins > Hex Editor > View in Hex, the plugin takes the characters rather than the bytes.
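To see why the file from the question matches that pattern, here is a minimal Python sketch that rebuilds the first 4000 samples of the MATLAB loop. It assumes the int16 values are written little-endian, which is the usual native byte order on x86 machines; the names `samples` and `data` are only for illustration.

```python
import struct

# Same values as the MATLAB loop: mod(t, 200) for t = 1..4000.
samples = [t % 200 for t in range(1, 4001)]

# Assumption: little-endian int16 ("<h"), the usual native order on x86.
data = b"".join(struct.pack("<h", s) for s in samples)

# First four samples: low byte first, then the high byte.
print(data[:8].hex(" "))   # 01 00 02 00 03 00 04 00

# Every value is below 256, so every high byte is zero.
print(set(data[1::2]))     # {0}
```

Because every sample is smaller than 256, each second byte is 0x00, which is exactly the "some byte, then 0x00" alternation described above.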
I just ran an experiment: I created a new true UCS-2 LE w/ BOM file (with text 12345) in Notepad++, and saved it; an external hex dumper, xxd.exe, shows:
C:\Users\peter.jones\Downloads\TempData\nppCommunity>xxd ucs2le-1234.txt
00000000: fffe 3100 3200 3300 3400 3500  ..1.2.3.4.5.
It starts with the 0xFF 0xFE BOM, then the two-byte sequences for those 5 characters.
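The same byte layout can be reproduced outside Notepad++. This is a small Python sketch, not what Notepad++ does internally: it encodes the text 12345 as UTF-16 LE (identical to UCS-2 LE for these characters) and prepends the BOM.

```python
import codecs

text = "12345"

# BOM (FF FE) followed by each character as a 16-bit little-endian code unit.
encoded = codecs.BOM_UTF16_LE + text.encode("utf-16-le")

# Group the hex digits two bytes at a time, like the xxd output.
print(encoded.hex(" ", 2))   # fffe 3100 3200 3300 3400 3500
```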
When I View in Hex on that file, the hex editor plugin shows
When I turn off the hex editor, it still says UCS-2 LE BOM.
However, when I create a binary file with the bytes 01 00 02 00,
C:\Users\peter.jones\Downloads\TempData\nppCommunity>perl -e "print qq(\x01\x00\x02\x00)" > ucs2le-bytes.txt
C:\Users\peter.jones\Downloads\TempData\nppCommunity>xxd ucs2le-bytes.txt
00000000: 0100 0200  ....
Notepad++ sees that as auto-interpreted UCS-2 Little Endian (without BOM):
And the hex editor plugin only displays the hex of the characters, rather than the hex of the individual bytes:
When you stop viewing in hex, the plugin sends back the characters, not the original bytes, and now Notepad++ thinks it’s a normal ANSI file:
and if I save that to disk, it has gotten rid of the 00 bytes:
C:\Users\peter.jones\Downloads\TempData\nppCommunity>xxd ucs2le-bytes.txt
00000000: 0102  ..
So, unfortunately, it looks like somewhere in the handoff between Notepad++ and the Hex Editor plugin, a UCS-2 LE file without BOM gets converted to ANSI encoding instead of UCS-2 LE encoding, so it drops the zero bytes.
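The effect can be modelled in a few lines of Python. This is only a model of the observed result, not the plugin's actual code, and it assumes Latin-1 as a stand-in for the ANSI code page: the four bytes are decoded as two 16-bit characters, and those characters are then written back with one byte each.

```python
original = bytes([0x01, 0x00, 0x02, 0x00])

# Interpreted as UCS-2/UTF-16 LE: two characters, U+0001 and U+0002.
chars = original.decode("utf-16-le")

# Written back as a single-byte encoding: one byte per character,
# so the 0x00 high bytes disappear.
saved = chars.encode("latin-1")

print(len(chars))       # 2
print(saved.hex(" "))   # 01 02
```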
If you are allowed to put a BOM at the start of the output file from MATLAB (writing the bytes 0xFF and 0xFE, i.e. 255 and 254), then Notepad++ will truly believe it’s UCS-2 LE BOM, and the hex editor plugin will treat it that way; you might then also have to have MATLAB strip out those bytes if you later read that file into MATLAB again. Or you could go through a temporary converter (not provided), which adds the BOM to the start of the file before you load it in Notepad++/HexEditor, then strips the BOM after you’re done using it in Notepad++.
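One possible shape for such a converter is sketched here in Python. The function names `add_bom`, `strip_bom` and `convert_file` are hypothetical, and the sketch assumes the whole file fits in memory.

```python
BOM = b"\xff\xfe"  # UCS-2 / UTF-16 little-endian byte order mark


def add_bom(data: bytes) -> bytes:
    """Prepend the BOM unconditionally (the data may legitimately start with FF FE)."""
    return BOM + data


def strip_bom(data: bytes) -> bytes:
    """Remove one leading BOM if present; otherwise return the data unchanged."""
    return data[len(BOM):] if data.startswith(BOM) else data


def convert_file(src: str, dst: str, transform) -> None:
    """Read src as raw bytes, apply transform, and write the result to dst."""
    with open(src, "rb") as f:
        data = f.read()
    with open(dst, "wb") as f:
        f.write(transform(data))
```

For example, `convert_file("output4.int16", "output4-bom.int16", add_bom)` would produce a copy to open in Notepad++ (the second file name is made up), and the same call with `strip_bom` would undo it afterwards. Note that `strip_bom` cannot tell an added BOM from a first sample that happens to be the int16 value -257 (bytes FF FE), so it should only be applied to files that went through `add_bom`.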
However, as I implied earlier, the fundamental issue is that you are expecting Notepad++, which is a text editor, to read and not mangle a binary file, which is not its primary purpose. If you are careful, it might sometimes work. But Notepad++ was not written to be a binary editor, so you should not expect it to work perfectly for something it wasn’t intended for. That said, the workaround shouldn’t be difficult for someone who can program in MATLAB.
-
Thanks for your answer.
Best regards,
Cédric Freycenon