Encoding of files with ASCII only

    Karl Karlser
    last edited by Jan 9, 2023, 8:07 AM

    Hello,

    when I open a text file that contains only ASCII characters, Notepad++ shows the encoding as UTF-8 without BOM.
    If I add a non-ASCII character like §, it shows the encoding as ANSI, which is what I actually defined for this file.

    Is this normal, or is my file really encoded as UTF-8 of some sort?

      Ekopalypse @Karl Karlser
      last edited by Jan 9, 2023, 9:35 AM

      @Karl-Karlser

      ASCII is a subset of many encodings, such as UTF-8 and ANSI, so for a file that contains only ASCII characters there is no way to figure out which encoding was intended.
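
A minimal sketch of that point in Python, assuming cp1252 stands in for "ANSI" and using a made-up sample string: ASCII-only text produces exactly the same bytes under each encoding, so the bytes alone cannot reveal which one was intended.

```python
# Hypothetical ASCII-only sample; any text limited to 0x00-0x7F behaves the same.
text = "Hello, world!"

# Encode the same text three ways and show the bytes as hex.
encoded = {name: text.encode(name) for name in ("ascii", "utf-8", "cp1252")}
for name, data in encoded.items():
    print(name, data.hex(" "))

# All three byte strings are identical, so the set collapses to one element.
all_identical = len(set(encoded.values())) == 1
print(all_identical)
```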

        rdipardo @Ekopalypse
        last edited by Jan 9, 2023, 10:58 AM

        ASCII is a subset of many encodings such as utf8, ansi, etc., so there is no way to figure out which encoding was intended

        All you need is a hex viewer (*1). “ASCII” is often used loosely for any variety of single-byte encoding, so expect to see a 1:1 correspondence between characters and bytes:

        [Image: ansi.png]

        § is included in many single-byte encodings, like Windows-1252, the default ANSI code page on many Windows PCs. Go to ? > Debug Info... and check the Current ANSI codepage. If the number is 1252, then § is a valid single-byte character in that code page. Or just type this into a Python REPL:

        print('§'.encode('cp1252'))
        

        The output will be the single byte b'\xa7'.
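
For anyone without a hex viewer plugin, the character-to-byte correspondence can be roughly reproduced in Python. This is only a sketch: `hex_view` is a hypothetical helper name, and the sample string `a§b` is made up for illustration.

```python
def hex_view(text, encoding):
    """Return one 'char -> hex bytes' line per character of text."""
    return [f"{ch!r} -> {ch.encode(encoding).hex(' ')}" for ch in text]

# In cp1252 every character, including §, maps to exactly one byte.
for line in hex_view("a§b", "cp1252"):
    print(line)
# 'a' -> 61
# '§' -> a7
# 'b' -> 62
```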

        If the file is truly UTF-8, then § (and only §) will occupy multiple bytes, while every ASCII character still takes a single byte:

        [Image: utf8.png]

        Or, at the Python REPL:

        print('§'.encode('utf8'))
        # => b'\xc2\xa7'
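
Putting the two cases together, here is a rough sketch of how raw bytes could be sorted into "ASCII only", "valid UTF-8" or "something else". The `classify` function and its labels are hypothetical, and this is not a description of how Notepad++ detects encodings; it only shows why the ASCII-only case stays ambiguous.

```python
def classify(data: bytes) -> str:
    """Roughly classify raw file bytes by what they could be."""
    if data.isascii():
        # Every byte is below 0x80: valid in ASCII, UTF-8, cp1252 and many more.
        return "ascii-only"
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        # Bytes >= 0x80 that do not form valid UTF-8 sequences,
        # e.g. a lone 0xA7 from a single-byte code page.
        return "not utf-8"
    return "valid utf-8"

print(classify(b"plain text"))         # ascii-only
print(classify("§".encode("cp1252")))  # not utf-8
print(classify("§".encode("utf-8")))   # valid utf-8
```

Note that "ascii-only" says nothing about the intended encoding, and "not utf-8" does not say which single-byte code page was used.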
        

        (*1) I used the HexEdit plugin.

          Ekopalypse @rdipardo
          last edited by Jan 9, 2023, 11:07 AM

          @rdipardo

          If I understood the question correctly, OP was implicitly asking whether Notepad++ can report the file as, in his case, ANSI when it contains only ASCII characters. Based on my previous statement, this is not possible. Even with a hex editor, there is no way to tell whether the file was meant to be ANSI or some other encoding that has ASCII as a subset. If I misunderstood the question, sorry.
