    Encoding of files with ASCII only

    • Karl Karlser

      Hello,

      When I open a text file that contains only ASCII characters, Notepad++ shows the encoding as UTF-8 w/o BOM.
      If I add a non-ASCII character such as §, it shows the encoding as ANSI, which is what I actually chose for this file.

      Is this normal, or is my file really encoded as some form of UTF-8?

      • Ekopalypse @Karl Karlser

        @Karl-Karlser

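        ASCII is a subset of many encodings, such as UTF-8 and the ANSI code pages, so a file containing only ASCII characters is byte-for-byte identical in all of them. There is no way to figure out from the bytes which encoding was intended.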
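
A minimal sketch of the point above, assuming Windows-1252 stands in for "ANSI" (the actual ANSI code page depends on the system locale). The sample string and variable names are made up for illustration:

```python
# Illustrative sample: every character is in the ASCII range (U+0000 to U+007F).
text = "plain ASCII text, nothing above U+007F"

# Encode the same string with several ASCII-compatible encodings.
# cp1252 is an assumed stand-in for the "ANSI" code page.
encodings = ["ascii", "utf-8", "cp1252", "latin-1"]
encoded = {name: text.encode(name) for name in encodings}

# Collapse the results into a set: one element means all byte strings match.
all_identical = len(set(encoded.values())) == 1
print(all_identical)
```

Because the byte sequences are the same, nothing stored in the file records which of these encodings the author had in mind.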

        • rdipardo @Ekopalypse

          @Ekopalypse wrote: "ASCII is a subset of many encodings such as utf8, ansi, etc., so there is no way to figure out which encoding was intended"

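          All you need is a hex viewer (*1). Strictly speaking, ASCII covers only the 128 code points 0x00 to 0x7F; what Notepad++ calls “ANSI” is a single-byte code page that extends it. In any single-byte encoding, expect to see a 1:1 correspondence between characters and bytes: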

          [Image: ansi.png]

          § is included in many single-byte encodings, such as Windows-1252, the default ANSI code page on Western-language Windows PCs. Go to ? > Debug Info... and check the Current ANSI codepage. If the number is 1252, then § is a valid single-byte character in that code page (though not ASCII in the strict sense). Or just type this into a Python REPL:

          print('§'.encode('cp1252'))
          

          The output will be the single byte b'\xa7'.

          If the file is truly UTF-8, then § (and only §) will occupy two bytes:

          [Image: utf8.png]

          Or, at the Python REPL:

          print('§'.encode('utf8'))
          # => b'\xc2\xa7'
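
The same check can be done without a hex viewer. Below is a rough sketch, not how Notepad++ itself detects encodings; `classify` is a hypothetical helper name, and the category labels are invented for this example:

```python
def classify(data: bytes) -> str:
    """Sort raw file bytes into three rough categories (hypothetical helper)."""
    # Only bytes below 0x80: identical in UTF-8 and every ANSI code page,
    # so the intended encoding cannot be recovered.
    if data.isascii():
        return "ascii-only"
    # Bytes >= 0x80 are present: see whether they form valid UTF-8 sequences.
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        # Not valid UTF-8, so most likely some single-byte code page.
        return "not-utf-8"
    return "utf-8"


print(classify(b"plain text"))         # ascii-only
print(classify("§".encode("utf-8")))   # utf-8
print(classify("§".encode("cp1252")))  # not-utf-8
```

To try it on a real file, read it in binary mode (`open(path, "rb").read()`) and pass the result in. Note that "not-utf-8" only rules UTF-8 out; it does not say which code page was used.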
          

          (*1) I used the HexEdit plugin.

          • Ekopalypse @rdipardo

            @rdipardo

            If I understood the question correctly, OP implicitly asked whether Notepad++ can report the file as ANSI (in his case) even when it contains only ASCII characters. Based on my previous statement, this is not possible. Even with a hex editor, there is no way to tell whether the file was meant to be ANSI or some other encoding that has ASCII as a subset. If I misunderstood the question, sorry.
