    Every time I start notepad++, the encoding of some files will be changed

    • Freya Smith

      Every time I start Notepad++, the encoding of some files changes, which causes me a lot of trouble.
      My encoding settings are as follows:
      p402.jpg

      p301.jpg

      Example: the file was still changed from UTF-8 to ANSI:

      p403.jpg

      p404.jpg

      Notepad++ v8.7 (64-bit)

        • Coises @Freya Smith

        @Freya-Smith:
        Have you tried checking UTF-8 | Apply to opened ANSI files in the New document screen?

        I know it doesn’t sound like it should be relevant, but I think it will help.

        In Windows, the encoding of a file is not stored anywhere. Since “ANSI” means the default non-Unicode interpretation of a file in the current locale, there is no such thing as a file that can’t be ANSI. There is no way to look at a UTF-8 file and tell for certain that it is UTF-8 and not ANSI. (However, the presence of a byte order mark at the beginning of an ANSI file is so unlikely that it is assumed to indicate Unicode.)

        The reverse is sometimes possible. Some ANSI files contain character sequences which cannot be UTF-8. Checking that box will cause files that cannot be UTF-8 to be opened as ANSI; files that could be either will be opened as UTF-8.
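To make the asymmetry concrete, here is a minimal Python sketch. It assumes Windows-1252 (`cp1252`) stands in for the "ANSI" code page, which depends on the locale, and the helper name `could_be_utf8` is made up for illustration; it is not how Notepad++ itself performs the check.

```python
def could_be_utf8(data: bytes) -> bool:
    """Return True if the bytes form a valid UTF-8 sequence (hypothetical helper)."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


utf8_bytes = "café".encode("utf-8")    # b'caf\xc3\xa9'
ansi_bytes = "café".encode("cp1252")   # b'caf\xe9' (cp1252 assumed as the ANSI code page)

# The ANSI bytes are not valid UTF-8, so they can be ruled out as UTF-8.
print(could_be_utf8(ansi_bytes))       # False

# The UTF-8 bytes are also readable as ANSI; they just display differently.
print(could_be_utf8(utf8_bytes))       # True
print(utf8_bytes.decode("cp1252"))     # cafÃ©
```

The last line shows why a UTF-8 file can never be proven not to be ANSI: the same bytes decode without error either way, and only the displayed text differs.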

          • Alan Kilborn @Coises

          @Coises said in Every time I start notepad++, the encoding of some files will be changed:

          Apply to opened ANSI files

          My recollection of what this checkbox (when checkmarked) does is:

          • if a file has no content (it’s 0 bytes on disk), open it as UTF-8
          • if a file’s entire content is “7-bit ASCII” (no bytes with highest bit set), open it as UTF-8

          This “recollection” was found in some notes I had.

          The USER MANUAL is “light” on detail on this feature, saying only “If you open an ANSI file, this allows it to be “upgraded” to UTF-8.”

            • Coises @Alan Kilborn

            @Alan-Kilborn said in Every time I start notepad++, the encoding of some files will be changed:

            @Coises said in Every time I start notepad++, the encoding of some files will be changed:

            Apply to opened ANSI files

            My recollection of what this checkbox (when checkmarked) does is:

            • if a file has no content (it’s 0 bytes on disk), open it as UTF-8
            • if a file’s entire content is “7-bit ASCII” (no bytes with highest bit set), open it as UTF-8

            This “recollection” was found in some notes I had.

            After doing my best to follow the code, I believe you are correct. The relevant routines appear to be:

            FileManager::setLoadedBufferEncodingAndEol
            and
            Utf8_16_Read::utf8_7bits_8bits

            which appear to come into play when there is no byte order mark and the file is not HTML or XML with a detected character set specification. First, utf8_7bits_8bits decides that if a file contains a null byte, it’s 8-bit ANSI; if it contains only bytes from 1-127, it’s 7-bit ANSI; otherwise, if it contains only character sequences that are legal UTF-8, it’s UTF-8; otherwise, it’s 8-bit ANSI. Then setLoadedBufferEncodingAndEol uses the New Document | UTF-8 | Apply to opened ANSI files setting to determine whether existing files that are empty or contain only 7-bit ANSI should be opened as UTF-8.
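The decision sequence described above can be sketched roughly as follows. This is a Python paraphrase of my reading of the logic, not the actual Notepad++ C++ code: the names `Guess`, `classify` and `encoding_on_open` are invented for illustration, the byte order mark and HTML/XML charset checks are assumed to have already come up empty, and Python's strict UTF-8 decoder may not agree with Notepad++ on every edge case.

```python
from enum import Enum


class Guess(Enum):
    ASCII_7BIT = "7-bit ANSI"
    ANSI_8BIT = "8-bit ANSI"
    UTF8 = "UTF-8"


def classify(data: bytes) -> Guess:
    """Rough stand-in for the utf8_7bits_8bits decision (hypothetical)."""
    if b"\x00" in data:
        return Guess.ANSI_8BIT       # any null byte: treat as 8-bit ANSI
    if all(b < 0x80 for b in data):
        return Guess.ASCII_7BIT      # only bytes 1-127 (or an empty file)
    try:
        data.decode("utf-8")         # high bytes present: are they legal UTF-8?
    except UnicodeDecodeError:
        return Guess.ANSI_8BIT
    return Guess.UTF8


def encoding_on_open(data: bytes, apply_to_opened_ansi: bool) -> str:
    """Apply the 'Apply to opened ANSI files' setting on top of the guess."""
    guess = classify(data)
    if guess is Guess.UTF8:
        return "UTF-8"
    # The checkbox only upgrades empty or pure 7-bit files.
    if apply_to_opened_ansi and guess is Guess.ASCII_7BIT:
        return "UTF-8"
    return "ANSI"


print(encoding_on_open(b"hello", apply_to_opened_ansi=False))    # ANSI
print(encoding_on_open(b"hello", apply_to_opened_ansi=True))     # UTF-8
print(encoding_on_open(b"caf\xe9", apply_to_opened_ansi=True))   # ANSI
```

In this sketch an empty file falls into the 7-bit case, which matches the observation that both empty and pure 7-bit files are affected by the checkbox, while a file with bytes that are not legal UTF-8 stays ANSI regardless of the setting.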

            It looks like MISC | Autodetect character encoding tries to detect ANSI codepages that are not the default (corresponding to an Encoding | Character sets submenu selection, rather than Encoding | ANSI), but I haven’t attempted to follow that all the way through. I’m not sure where that fits into the sequence of decisions and how it interacts with the Apply to opened ANSI files setting.
