    Invisible characters

    General Discussion
    • Ertan Küçükoglu
      Ertan Küçükoglu last edited by

      Hello,

      I received a CSV file containing an invisible character that is displayed as “PM” when Show All Characters is turned on. The file format is Turkish ANSI.
      7909d3d5-30dd-4584-b4ed-81f9751a03b4-image.png

      A hex editor displays this character as 0x9E.
      f5efa5ca-9826-4455-8557-2fc6fdddda1a-image.png

      I checked http://www.december.com/html/spec/ascii.html but could not work out what it is. Is it a common character of some sort?

      Thanks & regards,
      Ertan
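To see where such bytes sit in a file without a hex editor, a small script can list every byte in the 0x80-0x9F range together with its offset. This is only a minimal sketch: the function name and the sample bytes are made up for illustration and are not taken from the CSV in question.

```python
def find_high_control_bytes(data: bytes, lo: int = 0x80, hi: int = 0x9F):
    """Return (offset, byte value) pairs for every byte in the range lo..hi."""
    return [(offset, value) for offset, value in enumerate(data) if lo <= value <= hi]


# Hypothetical sample line; real data would come from open(path, "rb").read()
sample = b"ABC;\x9eEKER;123\r\n"

for offset, value in find_high_control_bytes(sample):
    print(f"offset {offset}: 0x{value:02X}")
```

Reading the file in binary mode matters here, because any text-mode decoding would already have interpreted (or rejected) the byte under some encoding.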

      • PeterJones
        PeterJones last edited by

        Why are you looking at a 7bit ASCII chart for a specific 8bit Turkish “ANSI” encoding?

        Narrowing to the sub-list of characters "ZYTİNYAО TABAĞI"

        Notepad++ lists four Turkish encodings:

        • ISO 8859-3: this has a different byte for İ, so I’m assuming that’s not what you meant.
        • ISO 8859-9: this looks compatible on the bytes I checked, but has nothing defined in positions 0x80-0x9F, so 0x9D would not be a valid character in that encoding
        • OEM-857: this doesn’t match most of your characters, but does put Ş at hex position 0x9E
        • Windows-1254: this looks compatible on the bytes I checked, but has nothing defined in 0x9D and 0x9E

        Ş is at 0x9E in OEM-857, but nothing else matches with that one.
        Ş is at 0xDE in Win-1254 and ISO 8859-9
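The comparison above can be reproduced roughly in Python. This sketch assumes that Python's codec names `iso8859_3`, `iso8859_9`, `cp857` and `cp1254` correspond to the four encodings listed; Python's tables may treat unassigned slots differently from Notepad++ (for example, its ISO 8859 codecs map 0x80-0x9F to C1 control characters rather than rejecting them). The helper names are made up for illustration.

```python
ENCODINGS = ["iso8859_3", "iso8859_9", "cp857", "cp1254"]


def decode_byte(value: int, encoding: str):
    """Decode a single byte; return None if the slot is undefined in that codec."""
    try:
        return bytes([value]).decode(encoding)
    except UnicodeDecodeError:
        return None


def encode_char(char: str, encoding: str):
    """Encode a single character; return None if the codec cannot represent it."""
    try:
        return char.encode(encoding)
    except UnicodeEncodeError:
        return None


for enc in ENCODINGS:
    print(
        enc,
        "0x9E decodes to", repr(decode_byte(0x9E, enc)),
        "| Ş encodes to", encode_char("Ş", enc),
        "| İ encodes to", encode_char("İ", enc),
    )
```

Using `repr` for the decoded value keeps control characters visible in the output instead of printing them as invisible characters.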

        It may be that the program or person that generated that unknown character was using a “standard-encoding-plus-extra” to try to get more characters (sometimes “unused” slots are filled with custom characters in certain applications or derived standards); or it could be that the program/generator mixed up the 857 codepage Ş with the 1254 encoding of Ş. Or it could be a transmission error.

        Or my analysis could be completely wrong: I am not an encoding expert, and definitely not a Turkish-encoding expert. I just looked up the various encodings and compared to your listed hexdump.
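If the codepage mix-up guess turns out to be right (and only then), one possible repair is to swap the stray 0x9E byte for 0xDE before decoding as Windows-1254. This is a sketch under that assumption, not a confirmed fix; the function name and sample bytes are hypothetical.

```python
def repair_stray_s_cedilla(raw: bytes) -> str:
    """Replace the OEM-857 byte for Ş (0x9E) with the Windows-1254 byte (0xDE),
    then decode the result as Windows-1254.

    Assumption: every 0x9E in the data is a misplaced Ş. 0x9E is unassigned in
    Windows-1254, so no legitimate character is overwritten by the swap.
    """
    patched = raw.replace(b"\x9e", b"\xde")
    return patched.decode("cp1254")


# Hypothetical sample: the first field starts with the stray byte
sample = b"\x9eEKER;TABA\xd0I"
print(repair_stray_s_cedilla(sample))
```

It is worth checking a few repaired words against what the text should say before applying this to the whole file, since a transmission error or a custom character would need a different fix.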

        • Ertan Küçükoglu
          Ertan Küçükoglu @PeterJones last edited by Ertan Küçükoglu

          @PeterJones said in Invisible characters:

          Why are you looking at a 7bit ASCII chart for a specific 8bit Turkish “ANSI” encoding?

          I probably overlooked that by mistake.

          @PeterJones said in Invisible characters:

          Why are you looking at a 7bit ASCII chart for a specific 8bit Turkish “ANSI” encoding?

          • ISO 8859-3: this has a different byte for İ, so I’m assuming that’s not what you meant.

          That should be it. You should read it as I; most likely it is the lowercase ı (without the dot on top).

          @PeterJones said in Invisible characters:

          It may be that the program or person that generated that unknown character was using a “standard-encoding-plus-extra” to try to get more characters (sometimes “unused” slots are filled with custom characters in certain applications or derived standards);

          This is most likely correct: I suspect the data was taken from an Oracle database of some kind and loaded into the FirebirdSQL database from which the CSV I was given was exported.

          Thank you!

          • PeterJones
            PeterJones last edited by

            Oh, I also meant to say (but got distracted – ooh, squirrel!) that Edit > Character Panel will give all 256 character codes (0 to 255) for 8-bit encodings; if you change from one encoding to another, the Character Panel will update to match. It lists both the decimal and hexadecimal character codes, along with the character at each position. If you double-click on the character, it will insert that character in the active editor (be warned: if you double-click on the hex value, it will “helpfully” type the hex value for you in the editor).
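Outside Notepad++, a similar table can be approximated with a few lines of Python. This is a rough sketch rather than a reproduction of the Character Panel: `character_table` is a made-up helper, and Python's codec tables may differ from Notepad++ in how they handle unassigned positions.

```python
def character_table(encoding: str):
    """Return (decimal, hex, character) rows for byte values 0..255.

    The character is None where the codec defines nothing for that byte.
    """
    rows = []
    for code in range(256):
        try:
            char = bytes([code]).decode(encoding)
        except UnicodeDecodeError:
            char = None
        rows.append((code, f"0x{code:02X}", char))
    return rows


# Show only the upper half (0x80-0xFF), where 8-bit encodings differ
for decimal, hexcode, char in character_table("cp1254")[0x80:]:
    print(decimal, hexcode, repr(char))
```

Calling it with another codec name, such as `cp857`, gives the table to compare against.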

            • Ertan Küçükoglu
              Ertan Küçükoglu @PeterJones last edited by

              @PeterJones I didn’t know about the Character Panel. Thanks for mentioning it, too.
