    Simple test file is losing its encoding :(

    5 Posts 4 Posters 2.2k Views
    • Scott Burkett

      Egads! I have a problem that just cropped up - no idea why it is happening all of a sudden, as it has been working fine for years!

      Simple test case, I have a bunch of text surrounded by some color replacement tokens.


      §YThis is a ‘test’!§W
      §YThis is a ‘test’!§W
      §YThis is a ‘test’!§W
      §YThis is a ‘test’!§W
      §YThis is a ‘test’!§W
      §YThis is a ‘test’!§W

      File is initially created in ANSI encoding, and saved that way. Reopened, it seems to “lose” its encoding, and now looks like this!


      即This is a 'test’劬
      即This is a 'test’劬
      即This is a 'test’劬
      即This is a 'test’劬
      即This is a 'test’劬

      And the encoding is gone. :( Any ideas? I’ve tried all manner of things to try and stop it. Running the latest version of N++. Thanks in advance for any guidance!

      • decoderman

        If you have a backup of the file, make a copy of it and open the copy in NPP.
        You can select Encoding > Convert to … to change the file’s encoding without changing the content it displays.

        Also, check the default Encoding setting in Settings > Preferences, under “New Document” on the right side.
        The Format and Encoding set there apply to new files only, though.

        • Scott Burkett

          I have a backup of the file, that isn’t a problem. The issue is that every time I make changes to it, save it, then reload it, it gets wonky!

          • PeterJones

            I could not replicate that. If I create a new file, choose Encoding > Encode in ANSI, paste in the text copied from the first half of your post, and save, then when I reload the file (even if I rename it), it loads the same as before. It still claims to be encoded in ANSI, both in the Encoding menu and in the lower-right of the NPP status bar.

            Using the gnuwin32 copy of hexdump, I see the following:

            C:\>hexdump 15054-renamed.txt
            00000000: A7 59 54 68 69 73 20 69 - 73 20 61 20 91 74 65 73 | YThis is a  tes|
            00000010: 74 92 21 A7 57 0D 0A A7 - 59 54 68 69 73 20 69 73 |t ! W   YThis is|
            00000020: 20 61 20 91 74 65 73 74 - 92 21 A7 57 0D 0A A7 59 | a  test ! W   Y|
            00000030: 54 68 69 73 20 69 73 20 - 61 20 91 74 65 73 74 92 |This is a  test |
            00000040: 21 A7 57 0D 0A A7 59 54 - 68 69 73 20 69 73 20 61 |! W   YThis is a|
            00000050: 20 91 74 65 73 74 92 21 - A7 57 0D 0A A7 59 54 68 |  test ! W   YTh|
            00000060: 69 73 20 69 73 20 61 20 - 91 74 65 73 74 92 21 A7 |is is a  test ! |
            00000070: 57 0D 0A A7 59 54 68 69 - 73 20 69 73 20 61 20 91 |W   YThis is a  |
            00000080: 74 65 73 74 92 21 A7 57 -                         |test ! W|
            00000088;
            

            (I also get similar using the xxd.exe that ships with VIM for Windows.)

            That’s exactly what I’d expect to see for ANSI encoding of that file.
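To double-check those bytes without a hexdump tool, here is a minimal Python sketch. It assumes that "ANSI" on this machine means Windows-1252 (Python's `cp1252` codec); on a system with a different ANSI code page the codec name would need to change.

```python
# One line of the test file, as posted at the top of the thread.
line = "§YThis is a ‘test’!§W"

# Assumption: "ANSI" here means Windows-1252 (Python codec "cp1252").
ansi_bytes = line.encode("cp1252")

# Format as space-separated uppercase hex, in the style of the dump above.
hex_text = " ".join(f"{b:02X}" for b in ansi_bytes)
print(hex_text)
```

Each high-bit character becomes a single byte here: A7 for §, and 91 and 92 for the two smart quotes.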

            But note: if I create the file with File > New while it is in my default Encoding > Encode in UTF-8, paste into that new file, and then incorrectly choose Encoding > Encode in ANSI after pasting, the high-bit characters (the § and smart quotes) turn into multi-byte sequences that look like Â§Y and similar. If I save it, however, and reload (same name), it comes back in as UTF-8 and looks right again. As such, hexdump shows:

            C:\>hexdump 15054-wrong.txt
            00000000: A7 59 54 68 69 73 20 69 - 73 20 61 20 91 74 65 73 | YThis is a  tes|
            00000010: 74 92 21 A7 57 0D 0A A7 - 59 54 68 69 73 20 69 73 |t ! W   YThis is|
            00000020: 20 61 20 91 74 65 73 74 - 92 21 A7 57 0D 0A A7 59 | a  test ! W   Y|
            00000030: 54 68 69 73 20 69 73 20 - 61 20 91 74 65 73 74 92 |This is a  test |
            00000040: 21 A7 57 0D 0A A7 59 54 - 68 69 73 20 69 73 20 61 |! W   YThis is a|
            00000050: 20 91 74 65 73 74 92 21 - A7 57 0D 0A A7 59 54 68 |  test ! W   YTh|
            00000060: 69 73 20 69 73 20 61 20 - 91 74 65 73 74 92 21 A7 |is is a  test ! |
            00000070: 57 0D 0A A7 59 54 68 69 - 73 20 69 73 20 61 20 91 |W   YThis is a  |
            00000080: 74 65 73 74 92 21 A7 57 -                         |test ! W|
            00000088;
            

            That’s exactly what I’d expect for the UTF-8 encoding of those characters.
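For comparison, this rough Python sketch shows what the UTF-8 form of one line looks like, and what happens when those UTF-8 bytes are viewed as ANSI. As before, treating ANSI as Windows-1252 (`cp1252`) is an assumption.

```python
line = "§YThis is a ‘test’!§W"

# In UTF-8, § takes two bytes (C2 A7) and each smart quote takes three
# (E2 80 98 and E2 80 99).
utf8_bytes = line.encode("utf-8")
utf8_hex = " ".join(f"{b:02X}" for b in utf8_bytes)
print(utf8_hex)

# Viewing the UTF-8 bytes as ANSI (assumed to be cp1252) yields the
# multi-character sequences described above.
misread = utf8_bytes.decode("cp1252")
print(misread)
```

A file whose dump shows C2 A7 where § should be is stored as UTF-8. A file that shows a lone A7 is stored as ANSI.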

            I can change my 15054-renamed.txt inside NPP to my heart’s content, save it, and reload it, and it still preserves the essential ANSI encoding, and it continues to behave properly on reload.

            For your copy of the file that has the encoding problem, what does hexdump (or similar tool) show?
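If no hexdump tool is installed, a few lines of Python give a comparable view. This is only a sketch: `format_hexdump` is a hypothetical helper name, the layout just approximates the dumps above, and the sample bytes stand in for the real file contents.

```python
def format_hexdump(data: bytes, width: int = 16) -> list[str]:
    """Return hexdump-style lines: offset, hex bytes, printable ASCII."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02X}" for b in chunk)
        # Show printable ASCII as-is and every other byte as ".".
        text_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{offset:08X}: {hex_part:<{width * 3 - 1}} |{text_part}|")
    return lines


# Stand-in data: one ANSI-encoded line of the test file plus CRLF.
# For a real file, read the raw bytes instead, e.g.
#   data = open("your-file.txt", "rb").read()   # hypothetical path
sample = b"\xa7YThis is a \x91test\x92!\xa7W\r\n"
dump_lines = format_hexdump(sample)
print("\n".join(dump_lines))
```

Opening the file in binary mode (`"rb"`) matters, because it shows the bytes on disk without any decoding step in between.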

            • chcg

              Maybe related to this issue: https://github.com/notepad-plus-plus/notepad-plus-plus/issues/3188
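One way to reproduce the garbled characters from the first post is to decode the ANSI bytes with a multi-byte East Asian codec. The sketch below assumes Big5; nothing in this thread confirms that this is the encoding Notepad++ picked, so treat it as an illustration of the mechanism only. Treating ANSI as Windows-1252 (`cp1252`) is likewise an assumption.

```python
# ANSI (assumed cp1252) bytes for one line of the test file.
ansi_bytes = "§YThis is a ‘test’!§W".encode("cp1252")

# In Big5, byte A7 is a lead byte, so it pairs with the following ASCII
# letter to form a single CJK character.
head = ansi_bytes[:2].decode("big5")   # bytes A7 59
tail = ansi_bytes[-2:].decode("big5")  # bytes A7 57
print(head, tail)
```

If the file on disk still shows A7 59 in a hex dump, the bytes themselves are intact and only the encoding chosen at load time is wrong.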
