Simple test file is losing its encoding :(



  • Egads! I have a problem that just cropped up - no idea why it is happening all of a sudden, as it has been working fine for years!

    Simple test case: I have a bunch of text surrounded by some color-replacement tokens.


    §YThis is a ‘test’!§W
    §YThis is a ‘test’!§W
    §YThis is a ‘test’!§W
    §YThis is a ‘test’!§W
    §YThis is a ‘test’!§W
    §YThis is a ‘test’!§W

    The file is initially created in ANSI encoding and saved that way. When reopened, it seems to “lose” its encoding, and now looks like this!


    即This is a 'test’劬
    即This is a 'test’劬
    即This is a 'test’劬
    即This is a 'test’劬
    即This is a 'test’劬

    And the encoding is gone. :( Any ideas? I’ve tried all manner of things to try and stop it. Running the latest version of N++. Thanks in advance for any guidance!
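A minimal sketch of how this kind of garbling can arise, assuming (the thread never confirms this) that "ANSI" here means Windows-1252 and that the file was reopened under a double-byte East Asian code page. Big5 is used purely as an illustrative guess. The point is only that the bytes on disk stay the same while the decoder changes:

```python
# The sample line from the post, encoded the way an ANSI (assumed: Windows-1252)
# save would write it to disk.
line = "§YThis is a ‘test’!§W"
raw = line.encode("cp1252")

# Decoding with the matching code page gives the original text back.
as_ansi = raw.decode("cp1252")

# Decoding the very same bytes as a double-byte code page (Big5 is only a
# guess) pairs the 0xA7 lead byte with the letter after it, so the § and
# the following character merge into a single CJK character.
as_big5 = raw.decode("big5", errors="replace")

print(as_ansi)
print(as_big5)
```

If the second line of output resembles the garbled text above, the file contents are probably intact and only the encoding guessed on load is wrong.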



  • If you have a backup of the file, make a copy of it and open the copy in NPP.
    You can select Encoding > Convert to … to change the file's encoding without changing its displayed content.

    Also, check the default Encoding setting under Settings > Preferences, in “New Document” on the right side.
    The Format and Encoding set here apply to new files only, though.



  • I have a backup of the file, that isn’t a problem. The issue is that every time I make changes to it, save it, then reload it, it gets wonky!



  • I could not replicate that. If I create a new file, choose Encoding > Encode in ANSI, paste in the text copied from the first half of your post, and save, then when I reload the file (even if I rename it), it loads the same as before and still claims to be encoded in ANSI in both the Encoding menu and the lower-right of the NPP status bar.

    Using the gnuwin32 copy of hexdump, I see the following:

    C:>hexdump 15054-renamed.txt
    00000000: A7 59 54 68 69 73 20 69 - 73 20 61 20 91 74 65 73 | YThis is a  tes|
    00000010: 74 92 21 A7 57 0D 0A A7 - 59 54 68 69 73 20 69 73 |t ! W   YThis is|
    00000020: 20 61 20 91 74 65 73 74 - 92 21 A7 57 0D 0A A7 59 | a  test ! W   Y|
    00000030: 54 68 69 73 20 69 73 20 - 61 20 91 74 65 73 74 92 |This is a  test |
    00000040: 21 A7 57 0D 0A A7 59 54 - 68 69 73 20 69 73 20 61 |! W   YThis is a|
    00000050: 20 91 74 65 73 74 92 21 - A7 57 0D 0A A7 59 54 68 |  test ! W   YTh|
    00000060: 69 73 20 69 73 20 61 20 - 91 74 65 73 74 92 21 A7 |is is a  test ! |
    00000070: 57 0D 0A A7 59 54 68 69 - 73 20 69 73 20 61 20 91 |W   YThis is a  |
    00000080: 74 65 73 74 92 21 A7 57 -                         |test ! W|
    00000088;
    

    (I also get similar using the xxd.exe that ships with VIM for Windows.)

    That’s exactly what I’d expect to see for ANSI encoding of that file.
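As a cross-check on that expectation, the bytes can be computed directly. This is a small sketch, assuming "ANSI" on this machine means Windows-1252 (Python codec name `cp1252`); other ANSI code pages would give different bytes for the § and the smart quotes:

```python
# One line of the sample text, without the trailing CR LF.
line = "§YThis is a ‘test’!§W"

# Encode it the way an ANSI (assumed: Windows-1252) save would.
ansi = line.encode("cp1252")

# Format as space-separated uppercase hex, like the dump above.
ansi_hex = ansi.hex(" ").upper()
print(ansi_hex)
# → A7 59 54 68 69 73 20 69 73 20 61 20 91 74 65 73 74 92 21 A7 57
```

Each § is the single byte A7, and the opening and closing smart quotes are 91 and 92, matching the first row and a half of the dump.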

    But, note: if I create the file by File > New while my default is Encoding > Encode in UTF-8, paste into that new file, and then incorrectly do Encoding > Encode in ANSI after pasting, it changes the high-bit characters (the § and smart quotes) into multi-byte sequences that look like Â§Y and similar. If I save it, however, and reload (same name), it will come back in as UTF-8 again, and look right again. As such, it’s still UTF-8 on disk:

    C:>hexdump 15054-wrong.txt
    00000000: A7 59 54 68 69 73 20 69 - 73 20 61 20 91 74 65 73 | YThis is a  tes|
    00000010: 74 92 21 A7 57 0D 0A A7 - 59 54 68 69 73 20 69 73 |t ! W   YThis is|
    00000020: 20 61 20 91 74 65 73 74 - 92 21 A7 57 0D 0A A7 59 | a  test ! W   Y|
    00000030: 54 68 69 73 20 69 73 20 - 61 20 91 74 65 73 74 92 |This is a  test |
    00000040: 21 A7 57 0D 0A A7 59 54 - 68 69 73 20 69 73 20 61 |! W   YThis is a|
    00000050: 20 91 74 65 73 74 92 21 - A7 57 0D 0A A7 59 54 68 |  test ! W   YTh|
    00000060: 69 73 20 69 73 20 61 20 - 91 74 65 73 74 92 21 A7 |is is a  test ! |
    00000070: 57 0D 0A A7 59 54 68 69 - 73 20 69 73 20 61 20 91 |W   YThis is a  |
    00000080: 74 65 73 74 92 21 A7 57 -                         |test ! W|
    00000088;
    

    That’s exactly what I’d expect for the UTF-8 encoding of those characters.
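For reference, here is a short sketch of what UTF-8 does to the high-bit characters, and why they show up as extra symbols when the bytes are viewed as ANSI. It assumes ANSI means Windows-1252 (`cp1252`); the variable names are just for illustration:

```python
line = "§YThis is a ‘test’!§W"

# In UTF-8, § takes two bytes (C2 A7) and each smart quote takes three
# (E2 80 98 and E2 80 99), so the line grows from 21 to 27 bytes.
utf8 = line.encode("utf-8")
print(utf8.hex(" ").upper())

# Viewing those UTF-8 bytes as ANSI shows each byte as its own character,
# which is where the stray "Â" in front of § comes from.
misread = utf8.decode("cp1252")
print(misread)

# Nothing is lost: reading the same bytes back as UTF-8 restores the text.
restored = misread.encode("cp1252").decode("utf-8")
print(restored == line)
```

A dump of a UTF-8 copy should therefore start with C2 A7 59, whereas an ANSI copy starts with A7 59.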

    I can change my 15054-renamed.txt inside NPP to my heart’s content, save it, and reload it, and it still preserves the essential ANSI encoding, and it continues to behave properly on reload.

    For your copy of the file that has the encoding problem, what does hexdump (or similar tool) show?
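If no hexdump tool is at hand, a few lines of Python can produce a similar listing. This is only a rough sketch: the `hexdump` and `dump_file` helpers are made up for this example, and the layout approximates the gnuwin32 output above rather than reproducing it exactly:

```python
from pathlib import Path


def hexdump(data: bytes, width: int = 16) -> str:
    """Return an offset / hex / printable-text listing of data."""
    rows = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02X}" for b in chunk)
        # Show printable ASCII as-is and everything else as a space.
        text_part = "".join(chr(b) if 32 <= b < 127 else " " for b in chunk)
        rows.append(f"{offset:08X}: {hex_part.ljust(width * 3 - 1)} |{text_part}|")
    return "\n".join(rows)


def dump_file(path: str) -> str:
    """Dump the raw bytes of a file; the path is whatever file you want to inspect."""
    return hexdump(Path(path).read_bytes())


# Demo on in-memory bytes: one ANSI (assumed Windows-1252) line of the sample.
print(hexdump("§YThis is a ‘test’!§W\r\n".encode("cp1252")))
```

Calling `dump_file` on the problem file and posting the first few rows would show whether the § is stored as A7 (ANSI) or C2 A7 (UTF-8).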



