    Encoding of files with ASCII only

    4 Posts 3 Posters 4.2k Views
    • Karl Karlser

      Hello,

      When I open a text file that contains only ASCII characters, Notepad++ shows the encoding as UTF-8 without BOM.
      If I add a non-ASCII character like §, it shows the encoding as ANSI, which is what I actually defined for this file.

      Is this normal or is my file really UTF8 encoding of some sort?

      • Ekopalypse @Karl Karlser

        @Karl-Karlser

        ASCII is a subset of many encodings, such as UTF-8 and the ANSI code pages, so a file containing only ASCII characters has identical bytes in all of them, and there is no way to figure out which encoding was intended.
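
A minimal sketch of this point, assuming Python 3.8+ and using cp1252 as a stand-in for "ANSI" (the sample text and the list of encodings are illustrative choices, not anything Notepad++ itself uses):

```python
# Sketch: pure-ASCII text yields identical bytes in every ASCII-compatible encoding.
# The sample text and the encodings listed here are illustrative assumptions.
text = "Hello, Notepad++!"
encodings = ["ascii", "utf-8", "cp1252", "latin-1"]

# Encode the same text once per encoding.
encoded = {name: text.encode(name) for name in encodings}

for name, data in encoded.items():
    print(f"{name:8} {data.hex(' ')}")

# If the set of byte strings has one member, the encodings are indistinguishable.
all_same = len(set(encoded.values())) == 1
print("identical bytes:", all_same)
```

Because every encoding produces the same byte sequence, nothing in the file itself records which one was chosen when it was saved.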

        • rdipardo @Ekopalypse

          @Ekopalypse said: "ASCII is a subset of many encodings such as utf8, ansi, etc., so there is no way to figure out which encoding was intended"

          All you need is a hex viewer (*1). Strictly speaking, ASCII covers only the 7-bit range 0x00 to 0x7F, but the term is often used loosely for any single-byte encoding. In a single-byte encoding, expect to see a 1:1 correspondence between characters and bytes:

          [Screenshot: ansi.png]

          § is included in many single-byte encodings, like Windows-1252, the default ANSI code page on Western-locale Windows PCs. Go to ? > Debug Info... and check the Current ANSI codepage. If the number is 1252, then § is a valid single-byte ("ANSI") character. Or just type this into a Python REPL:

          print('§'.encode('cp1252'))
          

          The output will be the single byte b'\xa7'.

          If the file is truly UTF-8, then § (and only §) will occupy multiple bytes:

          [Screenshot: utf8.png]

          Or, at the Python REPL:

          print('§'.encode('utf8'))
          # => b'\xc2\xa7'
          

          (*1) I used the HexEdit plugin.

          • Ekopalypse @rdipardo

            @rdipardo

            If I understood the question correctly, OP implicitly asked whether there is a way to have the file reported as ANSI (in his case) when it contains only ASCII characters. Based on my previous statement, this is not possible. Even with a hex editor, there is no way to tell whether the file was meant to be ANSI or some other encoding that has ASCII as a subset. If I misunderstood the question, sorry.
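
To make the limitation concrete, here is a rough sketch of what any byte-level check can and cannot conclude. It is not Notepad++'s actual detection logic; `classify` and its return labels are hypothetical, and Python 3 is assumed:

```python
def classify(data: bytes) -> str:
    """Roughly classify raw file bytes; a hypothetical helper for illustration."""
    # Every byte below 0x80 means pure ASCII: the intended encoding is ambiguous.
    if all(b < 0x80 for b in data):
        return "ascii-only"
    # Bytes >= 0x80 must form valid UTF-8 sequences, or the file is not UTF-8.
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return "not-utf8"
    return "utf8"


print(classify(b"plain text"))          # ascii-only
print(classify("§".encode("cp1252")))   # not-utf8
print(classify("§".encode("utf-8")))    # utf8
```

For the "ascii-only" case the bytes carry no further information, so whatever label an editor shows is a default rather than a detection.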

            The Community of users of the Notepad++ text editor.