Copy buffer converts NUL to SPACE



  • I was trying to take a binary file into a hex-edit-type mode, edit the hex strings, and convert it back to a binary.

    I started with the HEX-Editor plugin (0.9.5), but it wouldn’t let me paste into the hex side, and would start crashing NPP (6.8.6) when I’d do something silly like File | New.

    If I Select All, then use TextFX > TextFX Convert > Convert Text to Hex-16, I noticed that all NUL characters converted from 00 to 20.

    At first, I thought it was a bug with TextFX, so I tried with Plugins > Converter > ASCII to HEX, but it does the same thing.

    I made a test case: first, I started with an artificially-generated Hex-16:

    "000000000  00 01 02 03 04 05 06 07-08 09 0a 0b 0c 0d 0e 0f   |0123456789abcdef|"
    "000000010  10 11 12 13 14 15 16 17-18 19 1a 1b 1c 1d 1e 1f   |0123456789abcdef|"
    "000000020  20 21 22 23 24 25 26 27-28 29 2a 2b 2c 2d 2e 2f   |0123456789abcdef|"
    "000000030  30 31 32 33 34 35 36 37-38 39 3a 3b 3c 3d 3e 3f   |0123456789abcdef|"
    "000000040  40 41 42 43 44 45 46 47-48 49 4a 4b 4c 4d 4e 4f   |0123456789abcdef|"
    "000000050  50 51 52 53 54 55 56 57-58 59 5a 5b 5c 5d 5e 5f   |0123456789abcdef|"
    "000000060  60 61 62 63 64 65 66 67-68 69 6a 6b 6c 6d 6e 6f   |0123456789abcdef|"
    "000000070  70 71 72 73 74 75 76 77-78 79 7a 7b 7c 7d 7e 7f   |0123456789abcdef|"
    "000000080  54 68 65 20 71 75 69 63-6B 20 62 7b 6F 77 6E 20   |The quick brown |"
    "000000090  66 6F 78 00 41 6E 6F 74-68 65 72 20 73 69 6C 6C   |fox Another sill|"
    "0000000A0  79 20 73 74 72 69 6E 67-00 59 65 74 20 61 6E 6F   |y string Yet ano|"
    "0000000B0  74 68 65 72 20 73 74 72-69 6E 67 20 00 00 00 00   |ther string     |"
    "0000000C0  00 01 02 03 04 05 06 07-08 09 0a 0b 0c 0d 0e 0f   |0123456789abcdef|"
    "0000000D0  10 11 12 13 14 15 16 17-18 19 1a 1b 1c 1d 1e 1f   |0123456789abcdef|"
    "0000000E0  20 21 22 23 24 25 26 27-28 29 2a 2b 2c 2d 2e 2f   |0123456789abcdef|"
    "0000000F0  30 31 32 33 34 35 36 37-38 39 3a 3b 3c 3d 3e 3f   |0123456789abcdef|"
    "000000100  40 41 42 43 44 45 46 47-48 49 4a 4b 4c 4d 4e 4f   |0123456789abcdef|"
    "000000110  50 51 52 53 54 55 56 57-58 59 5a 5b 5c 5d 5e 5f   |0123456789abcdef|"
    "000000120  60 61 62 63 64 65 66 67-68 69 6a 6b 6c 6d 6e 6f   |0123456789abcdef|"
    "000000130  70 71 72 73 74 75 76 77-78 79 7a 7b 7c 7d 7e 7f   |0123456789abcdef|"
    

    Then I select all, and run TextFX > TextFX Convert > Convert Hex to Text: this does what I would expect, and when I show all characters, I can see it properly creates the NUL character throughout.
    So then I try to reverse the process: Select All, TextFX > TextFX Convert > Convert Text to Hex-16, and I get:

    "000000000  20 01 02 03 04 05 06 07-08 09 0A 0B 0C 0D 0E 0F   | xxxxxxxx..xx.xx|"
    "000000010  10 11 12 13 14 15 16 17-18 19 1A 1B 1C 1D 1E 1F   |xxxxxxxxxxxxxxxx|"
    ...
    "000000080  54 68 65 20 71 75 69 63-6B 20 62 7B 6F 77 6E 20   |The quick b{own |"
    "000000090  66 6F 78 20 41 6E 6F 74-68 65 72 20 73 69 6C 6C   |fox Another sill|"
    "0000000A0  79 20 73 74 72 69 6E 67-20 59 65 74 20 61 6E 6F   |y string Yet ano|"
    "0000000B0  74 68 65 72 20 73 74 72-69 6E 67 20 20 20 20 20   |ther string     |"
    

    … where all the NUL characters became 20 instead of 00 (I edited out the {SOH}{STX}{...} characters and converted them to x’s in the |...|-delimited section on the right for pasting into the forum, but those characters were all correct.)

    If I follow the same procedure, but with Plugins > Converter > ASCII to HEX, I get

    200102030405060708090A0B0C0D0E0F
    101112131415161718191A1B1C1D1E1F
    202122232425262728292A2B2C2D2E2F
    303132333435363738393A3B3C3D3E3F
    404142434445464748494A4B4C4D4E4F
    505152535455565758595A5B5C5D5E5F
    606162636465666768696A6B6C6D6E6F
    707172737475767778797A7B7C7D7E7F
    54686520717569636B20627B6F776E20
    666F7820416E6F746865722073696C6C
    7920737472696E672059657420616E6F
    7468657220737472696E672020202020
    200102030405060708090A0B0C0D0E0F
    101112131415161718191A1B1C1D1E1F
    202122232425262728292A2B2C2D2E2F
    303132333435363738393A3B3C3D3E3F
    404142434445464748494A4B4C4D4E4F
    505152535455565758595A5B5C5D5E5F
    606162636465666768696A6B6C6D6E6F
    707172737475767778797A7B7C7D7E7F
    

    … which once again shows 20 instead of 00 for NUL characters.

    I then took the binary version, and just copy/pasted into another file, and saw that most of the control characters stayed correct, but NUL became a space, too.

    At this point, I assume that the act of copying text into the copy buffer (which I assume both TextFX Convert and Converter use internally) converts NUL to SPACE. Is there a setting which changes that behavior for the copy buffer, or any way I can “protect” my NUL characters when doing the conversion? I need this because I cannot tell whether any given 20 in the resulting hex dump was originally a NUL or a genuine SPACE.
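For readers who need this round trip without going through the editor's copy buffer at all, the conversion can be sketched outside Notepad++. The Python below is a minimal sketch, not what TextFX or Converter actually do: the names `to_hex16` and `from_hex16` are made up for this example, and the row layout only approximates the Hex-16 dumps shown above. The point it illustrates is that a NUL byte stays `00` and a SPACE stays `20` when the data is handled as bytes.

```python
def to_hex16(data: bytes) -> str:
    """Format bytes as 16-per-row hex lines, loosely following the dumps above."""
    rows = []
    for offset in range(0, len(data), 16):
        chunk = data[offset:offset + 16]
        hex_bytes = [f"{b:02X}" for b in chunk]
        left = " ".join(hex_bytes[:8])
        right = " ".join(hex_bytes[8:])
        hex_part = f"{left}-{right}" if right else left
        # Printable ASCII is shown as-is; everything else (including NUL) as '.'
        text = "".join(chr(b) if 0x20 <= b < 0x7F else "." for b in chunk)
        rows.append(f"{offset:09X}  {hex_part:<47}   |{text}|")
    return "\n".join(rows)


def from_hex16(dump: str) -> bytes:
    """Rebuild the bytes from the hex columns, ignoring offsets and the text column."""
    out = bytearray()
    for line in dump.splitlines():
        if not line.strip():
            continue
        # Drop the offset column, then cut everything from the first '|' onward
        hex_part = line.split(None, 1)[1].split("|", 1)[0]
        out += bytes.fromhex(hex_part.replace("-", " ").strip())
    return bytes(out)
```

To use it on a file, read the file in binary mode (for example `open(path, "rb").read()`, where `path` is whatever file is being edited), edit the hex columns of the dump as text, and write `from_hex16(...)` back out in binary mode. Only the hex columns are parsed, so edits to the `|...|` column have no effect.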



  • TL;DR?

    Short Version: When I copy a string that includes a NUL character, it appears to be translated into a SPACE before scripts or plugins get access to the buffer. Is there any way to turn off this behavior, so that the NUL remains present in the copy buffer?



  • Hello PeterCJ-AtWork,

    This seems to be a Scintilla feature, introduced in 2013, as discussed here.

    So, using npp for manipulating binary data doesn’t seem to work.

    Cheers
    Claudia



  • Thanks for the info. I guess I won’t be using my otherwise favorite editor for binary edits.



  • Hello, PeterCJ AtWork,

    I think that I found a work-around to your problem :-))

    • First, check the menu option TextFX - TextFX Viz Settings - Viz Paste/Append binary

    • Now, open and select any piece of text, which is part of a binary text

    • Select the menu option TextFX - TextFX Viz - Copy Visible Selection

    • Open a new tab ( CTRL + N )

    • If necessary, select the same encoding as the binary source text

    • Select the last menu option TextFX - TextFX Viz - Paste

    Well, you could think, at first sight, that it’s just OK !

    Unfortunately, ALL the LF ( \x0A ) and CR ( \x0D ) characters are wiped out of the pasted text :-((

    Quite weird ! I did some tests but I could NOT find a setting/way which allows copy of, both, NUL and EOL characters


    So, we have to cheat, a bit ! Before copying the selection, with the TextFX command :

    • Replace, in the binary source text, every \n character with a string ( one or several characters long ) which does NOT exist in the binary text

    • Replace, in the binary source text, every \r character with a different string ( one or several characters long ) which does NOT exist in the binary text either

    • Once the paste action is done, use simple regexes to change the temporary strings, created for \n and \r, back to the original characters, in order to get back to the original text :-))
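The placeholder trick in the three steps above can be sketched in Python to show the logic; inside Notepad++ the same thing would be done with Replace. This is only an illustration under assumptions: the function names and the `<LF0>` / `<CR0>` token format are invented for the example, and any strings absent from the data would do.

```python
def restore_eols(protected: bytes, lf_token: bytes, cr_token: bytes) -> bytes:
    """Turn the temporary tokens back into the original CR and LF bytes."""
    return protected.replace(cr_token, b"\r").replace(lf_token, b"\n")


def protect_eols(data: bytes):
    """Swap LF and CR for tokens absent from data.

    Returns (protected, lf_token, cr_token).
    """
    n = 0
    while True:
        # Hypothetical token format; the number is bumped until both are unused
        lf_token = f"<LF{n}>".encode("ascii")
        cr_token = f"<CR{n}>".encode("ascii")
        if lf_token not in data and cr_token not in data:
            protected = data.replace(b"\n", lf_token).replace(b"\r", cr_token)
            # Guard against a token appearing by accident across a boundary
            if restore_eols(protected, lf_token, cr_token) == data:
                return protected, lf_token, cr_token
        n += 1
```

The round-trip check before returning is the reason to prefer tokens with distinctive delimiters: a placeholder that does not occur in the original can still be formed by accident where a replaced character sits next to similar-looking text.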

    Best Regards

    guy038



  • Thanks for that workaround. It helps. (Fortunately, this is something I only do every few months, not on a daily basis. Unfortunately, it means I’ll have to bookmark the workaround, because I’ll forget it every time. :-) )



  • Revisiting this three years later: thanks to @Meta-Chuh in the topic “Keep line endings on paste (no LF to CRLF or CRLF to LF)?”, I learned of the Edit > Paste Special > Copy/Cut/Paste Binary Content feature. It has apparently existed since v5.9 (so it was available in v6.8.6 three years ago), and it does allow binary copy/paste. It doesn’t work in conjunction with the TextFX process originally described, but I wanted to point out for all future readers of this topic that the copy/paste buffer can work with binary data inside Notepad++.

