    BUG: N++ does not keep in UTF8 unsaved open files

      dz15mlru

      BUG,
      I’m using N++ with a lot of unsaved open files, and in the settings I have chosen UTF-8 as the encoding for all new documents. Recently, however, I discovered that a number of my unsaved open documents with Cyrillic content are not kept in UTF-8 by N++ and are instead shown as “Cyrillic -> Macintosh”, and some of the Cyrillic text is malformed, unintelligible and lost. After converting back to UTF-8 the text remains malformed.
      https://i.imgur.com/jt05fe5.png

        Coises @dz15mlru

        @dz15mlru said in BUG: N++ does not keep in UTF8 unsaved open files:

        I’m using N++ with a lot of unsaved open files, and in the settings I have chosen UTF-8 as the encoding for all new documents. Recently, however, I discovered that a number of my unsaved open documents with Cyrillic content are not kept in UTF-8 by N++ and are instead shown as “Cyrillic -> Macintosh”, and some of the Cyrillic text is malformed, unintelligible and lost.

        First, would you open the ? menu, select Debug Info… and paste the information here? That helps make sure we know some details that can be important when analyzing bugs.

        Second, at Settings | Preferences… | MISC. is the box labeled Autodetect character encoding checked? If it is, try unchecking it. That option sometimes does more harm than good.

        in recent times

        If there is any way you can remember or otherwise figure out what change(s) might have happened around the time this changed, it will help with working out what is happening.

        After converting back to UTF8 the text remains to be malformed.

        Once the text in the edit window is garbled, using one of the Encoding | Convert to options will never help. Those just convert what you’re already seeing to a different encoding.
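A minimal Python sketch of why the Convert to options cannot help. It uses Python's built-in `utf-8` and `mac_cyrillic` codecs as stand-ins for the editor's behaviour; the sample string and variable names are made up for illustration, and this is not Notepad++'s actual code.

```python
# Illustration only: Python codecs stand in for what the editor does.
original = "Привет, мир"

# Bytes as they would be stored for a UTF-8 document.
raw = original.encode("utf-8")

# What is displayed if those bytes are interpreted as "Cyrillic -> Macintosh".
garbled = raw.decode("mac_cyrillic")

# "Convert to UTF-8" re-encodes the characters currently displayed,
# so the garbled characters are faithfully written out as UTF-8.
converted_bytes = garbled.encode("utf-8")
after_convert = converted_bytes.decode("utf-8")

print(garbled != original)       # the text is already mangled after loading
print(after_convert == garbled)  # converting keeps the mangled text as it is
print(converted_bytes != raw)    # and the stored bytes are no longer the originals
```

The conversion step works on characters, not on the original bytes, which is why it preserves whatever is on screen, garbled or not.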

        I never use persistent unsaved files, so hopefully someone else will come along with experience about how to manage an unsaved file carried over from a previous session (I assume that’s the condition you’re describing) that opens in the wrong encoding. If it were a saved file that you were opening anew, the right thing to do would be to select the correct encoding from the top of the Encoding menu (not the Convert to options at the bottom) before making any changes. I don’t know if that works with persistent unsaved files, though.

          PeterJones @dz15mlru

          @dz15mlru said in BUG: N++ does not keep in UTF8 unsaved open files:

          BUG,
          I’m using N++ with a lot of unsaved open files, and in the settings I have chosen UTF-8 as the encoding for all new documents. Recently, however, I discovered that a number of my unsaved open documents with Cyrillic content are not kept in UTF-8 by N++ and are instead shown as “Cyrillic -> Macintosh”

          If you have a new UTF-8 file, the session file stores its encoding as “-1”, which I believe means it will use auto-detection the next time around. And the auto-detection that Notepad++ uses is imperfect (as any encoding autodetection will be; this is explained in the new “Encoding” section in the User Manual, but I just discovered that the manual stopped publishing updates a few days ago, so until it publishes, you can read the encoding description in the repo instead).

          As @coises suggested while I was writing this up, try turning off the auto-detection, and it should prevent that in the future.

          And the “Convert to…” won’t work to fix what you are seeing on your existing files… but maybe Encoding > UTF-8 will cause it to re-interpret the bytes correctly (assuming the bytes haven’t been re-written at this point to something else).
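The idea of re-interpreting the bytes can be sketched in Python as follows. This is an assumption-laden illustration, not what Notepad++ does internally: `reinterpret` is a hypothetical helper, and `mac_cyrillic` is assumed to be the encoding that was wrongly guessed.

```python
def reinterpret(displayed: str, wrong: str = "mac_cyrillic", right: str = "utf-8") -> str:
    """Undo a wrong decoding: recover the raw bytes, then decode them properly.

    Raises UnicodeError if the text was altered and the bytes no longer
    form valid data in the `right` encoding.
    """
    raw = displayed.encode(wrong)  # back to the bytes that were originally stored
    return raw.decode(right)       # decode with the encoding that was really used


original = "Привет, мир"
garbled = original.encode("utf-8").decode("mac_cyrillic")

recovered = reinterpret(garbled)
print(recovered == original)
```

This only works while the garbled text still maps back to the original bytes. If characters were edited, dropped, or replaced in the meantime, the second decode is likely to fail or to return partly damaged text.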

            PeterJones @Coises

            @Coises said in BUG: N++ does not keep in UTF8 unsaved open files:

            If there is any way you can remember or otherwise figure out what change(s) might have happened around the time this changed, it will help with working out what is happening

            Assuming autodetection is on (and that’s the best assumption, given the data), it depends on what other characters are also in the file, so if you get a combination of bytes that look to the algorithm like “Cyrillic -> Macintosh” instead of “UTF-8”, then it will pick that. So “in recent times” may mean that additional text was added to those files which makes the algorithm think the file looks the way “Cyrillic -> Macintosh” should look.

              Coises @PeterJones

              @PeterJones said in BUG: N++ does not keep in UTF8 unsaved open files:

              Assuming autodetection is on (and that’s the best assumption, given the data), it depends on what other characters are also in the file, so if you get a combination of bytes that look to the algorithm like “Cyrillic -> Macintosh” instead of “UTF-8”, then it will pick that. So “in recent times” may mean that additional text was added to those files which makes the algorithm think the file looks the way “Cyrillic -> Macintosh” should look.

              The thing is… it is very unusual for a UTF-8 file of any size that contains non-ASCII characters to “look like” anything but UTF-8. (Unless Cyrillic/Macintosh is some strange exception.) I suspect something is going on here that we haven’t heard about yet.
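A rough Python sketch of why UTF-8 is usually easy to tell apart from the legacy Cyrillic encodings. `is_valid_utf8` is a hypothetical helper, the sample text is made up, and nothing here describes the detector Notepad++ actually uses.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes form well-formed UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


text = "Привет, мир"

# Encode the same text in UTF-8 and in three single-byte Cyrillic encodings,
# then check which byte sequences pass strict UTF-8 validation.
results = {codec: is_valid_utf8(text.encode(codec))
           for codec in ("utf-8", "mac_cyrillic", "cp1251", "koi8_r")}

for codec, ok in results.items():
    print(codec, ok)
```

Text stored in a single-byte Cyrillic encoding almost never happens to be well-formed UTF-8, so passing strict validation is strong evidence for UTF-8. The reverse does not hold: UTF-8 bytes can be decoded without error as Cyrillic/Macintosh, so a detector that does not give priority to UTF-8 validity could still guess wrong.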

              One possibility might be if new files are set to open as UTF-8 but “Apply to opened ANSI files” is not checked, then the user exits when a file has only ASCII characters; on re-opening, perhaps (as I said, I don’t use Remember current session) Notepad++ opens it as ANSI, the user doesn’t notice and adds non-ASCII characters. Now it really would be in something other than UTF-8 — but why it would be mis-identified as the wrong Cyrillic code page, I don’t know.
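The scenario above can be sketched in Python, assuming (purely for illustration) that the ANSI code page is Windows-1251 (`cp1251`); the sample strings are made up.

```python
ascii_only = "notes for tomorrow"

# An ASCII-only file is byte-for-byte identical in UTF-8 and in the code page,
# so nothing in the file itself says which encoding was intended.
same_bytes = ascii_only.encode("utf-8") == ascii_only.encode("cp1251")

# Later, Cyrillic text is added while the buffer is treated as ANSI (cp1251 here).
edited = ascii_only + " заметки"
stored = edited.encode("cp1251")

# The stored bytes are now not valid UTF-8 at all.
try:
    stored.decode("utf-8")
    reads_as_utf8 = True
except UnicodeDecodeError:
    reads_as_utf8 = False

print(same_bytes, reads_as_utf8)
```

Under these assumptions the file really would end up in something other than UTF-8, which matches the first half of the hypothesis. It does not explain why the result would then be identified as a different Cyrillic code page.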

                dz15mlru

                @Coises said in BUG: N++ does not keep in UTF8 unsaved open files:

                at Settings | Preferences… | MISC. is the box labeled Autodetect character encoding checked? If it is, try unchecking it. That option sometimes does more harm than good.

                Yes, it is checked. I’ll try disabling it, but I’m not sure when I’ll see the result.

                @Coises said in BUG: N++ does not keep in UTF8 unsaved open files:

                If there is any way you can remember or otherwise figure out what change(s) might have happened around the time this changed, it will help with working out what is happening.

                Well, a few weeks ago I encountered a few BSODs, caused by a faulty RAM unit or slot. I fixed it by removing one. N++ was open during at least one of those incidents. Afterwards I was happy that N++ did not lose my huge session of unsaved files, and apparently everything was OK, at least in the most recently opened files, but those mainly contained standard Latin-alphabet text. A few days later, though, I discovered files with malformed text in the wrong encoding. However, I can’t say for sure whether the BSODs caused this issue or whether it already existed for some short time before. I have a lot of unsaved files and I don’t open all of them daily, so I can’t be sure when the changes occurred.

                @PeterJones said in BUG: N++ does not keep in UTF8 unsaved open files:

                Assuming autodetection is on (and that’s the best assumption, given the data), it depends on what other characters are also in the file, so if you get a combination of bytes that look to the algorithm like “Cyrillic -> Macintosh” instead of “UTF-8”, then it will pick that

                Yes, it is ON. Most of my documents have mixed content, both Cyrillic and Latin. But I expected that UTF-8 would preserve all the file content intact…
                I’ll disable autodetection.

                @Coises said in BUG: N++ does not keep in UTF8 unsaved open files:

                One possibility might be if new files are set to open as UTF-8 but “Apply to opened ANSI files” is not checked

                Indeed, that is so. In “Settings -> New Document -> Encoding -> UTF-8”, should I tick the option “Apply to opened ANSI files”?
                I’m not sure whether I had it checked over the years, or whether the setting was changed recently.
                As for the file content, I’m pretty sure that the file where I noticed the problem had been unchanged for a very long time, already contained both Cyrillic and Latin text, and had been in UTF-8 all along.
                I always keep all my files in UTF-8 and never change the encoding to another. In this case something happened and a number of files were changed from UTF-8 to “Cyrillic -> Macintosh” without a valid reason. Perhaps it was due to those BSODs.

                  dz15mlru

                  Thank you guys for your support.

                  So, I’ll 1) disable the “autodetection of character encoding”, and 2) check the option “Apply to opened ANSI files”.

                  Also, a fresh update from N++ arrived just now.
                  I will perform this update as well, and afterwards I’ll monitor whether the problem appears again.
