Invisible characters



  • Hello,

    I received a CSV file containing an invisible character that is displayed as “PM” when Show All Characters is turned on. The file encoding is Turkish ANSI.

    A hex editor displays this character as 0x9E.

    I checked http://www.december.com/html/spec/ascii.html but could not work out what it is. Is it a common character of some sort?

    Thanks & regards,
    Ertan



  • Why are you looking at a 7bit ASCII chart for a specific 8bit Turkish “ANSI” encoding?

    Narrowing to the sub-list of characters "ZYTİNYAО TABAĞI"

    Notepad++ lists four Turkish encodings:

    • ISO 8859-3: this has a different byte for İ, so I’m assuming that’s not what you meant.
    • ISO 8859-9: this looks compatible on the bytes I checked, but has nothing defined in positions 0x80-0x9F, so 0x9D would not be a valid character in that encoding
    • OEM-857: this doesn’t match most of your characters, but does put Ş at hex position 0x9E
    • Windows-1254: this looks compatible on the bytes I checked, but has nothing defined in 0x9D and 0x9E

    Ş is at 0x9E in OEM-857, but nothing else matches with that one.
    Ş is at 0xDE in Win-1254 and ISO 8859-9
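
    The comparison above can be reproduced with a short Python sketch. It assumes Python's built-in codecs (`iso8859_3`, `iso8859_9`, `cp857`, `cp1254`) are acceptable stand-ins for the four encodings Notepad++ lists; the helper names `decode_byte` and `describe` are made up for illustration.

```python
import unicodedata

# Python codec names assumed to correspond to the four Notepad++ encodings.
CANDIDATES = ["iso8859_3", "iso8859_9", "cp857", "cp1254"]


def decode_byte(value, encoding):
    """Return the character for a single byte, or None if the codec has no mapping."""
    try:
        return bytes([value]).decode(encoding)
    except UnicodeDecodeError:
        return None


def describe(value):
    """List how each candidate encoding interprets the byte."""
    rows = []
    for enc in CANDIDATES:
        ch = decode_byte(value, enc)
        if ch is None:
            rows.append((enc, "undefined"))
        else:
            # Control characters have no Unicode name, hence the fallback text.
            name = unicodedata.name(ch, "control character")
            rows.append((enc, f"U+{ord(ch):04X} {name}"))
    return rows


if __name__ == "__main__":
    for enc, info in describe(0x9E):
        print(f"{enc:10} 0x9E -> {info}")
    # Where does each encoding put S-cedilla (U+015E)?
    for enc in ("cp857", "cp1254", "iso8859_9"):
        print(f"{enc:10} U+015E -> 0x{chr(0x15E).encode(enc).hex().upper()}")
```

    In the ISO 8859 codecs the byte comes back as U+009E, the C1 control code "Privacy Message", which would explain the “PM” label shown for the invisible character.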

    It may be that the program or person that generated that unknown character was using a “standard-encoding-plus-extra” to try to get more characters (sometimes “unused” slots are filled with custom characters in certain applications or derived standards); or it could be that the program/generator mixed up the 857 codepage Ş with the 1254 encoding of Ş. Or it could be a transmission error.
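
    If the mix-up theory is the right one (and that is only a guess), a repair could look roughly like the sketch below. It assumes the rest of the file really is Windows-1254 and that every stray 0x9E byte was meant to be Ş; the function name `repair_mixed_cedilla` is hypothetical.

```python
def repair_mixed_cedilla(raw: bytes) -> str:
    """Decode Windows-1254 text after remapping a stray OEM-857 S-cedilla byte.

    Assumption: 0x9E (S-cedilla in OEM-857, undefined in Windows-1254)
    should have been 0xDE (S-cedilla in Windows-1254).
    """
    table = bytes.maketrans(b"\x9e", b"\xde")
    return raw.translate(table).decode("cp1254")


if __name__ == "__main__":
    sample = b"TABA\xd0I \x9e"  # made-up bytes, not taken from the real file
    # ascii() keeps the output printable on any console.
    print(ascii(repair_mixed_cedilla(sample)))
```

    This should only be run on a copy of the file, and the result checked by someone who reads Turkish, since the remapping is wrong if 0x9E was a custom character or a transmission error.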

    Or my analysis could be completely wrong: I am not an encoding expert, and definitely not a Turkish-encoding expert. I just looked up the various encodings and compared to your listed hexdump.



  • @PeterJones said in Invisible characters:

    Why are you looking at a 7bit ASCII chart for a specific 8bit Turkish “ANSI” encoding?

    I probably overlooked that by mistake.

    @PeterJones said in Invisible characters:

    • ISO 8859-3: this has a different byte for İ, so I’m assuming that’s not what you meant.

    That should be it. You should read it as I; most likely it is the lowercase ı (without the dot on top).

    @PeterJones said in Invisible characters:

    It may be that the program or person that generated that unknown character was using a “standard-encoding-plus-extra” to try to get more characters (sometimes “unused” slots are filled with custom characters in certain applications or derived standards);

    And this is most likely correct: I suspect the data was taken from an Oracle database of some kind and loaded into the FirebirdSQL database from which my CSV was exported.

    Thank you!



  • Oh, I also meant to say (but got distracted – ooh, squirrel!) that Edit > Character Panel will give the 256 character codes (0 to 255) for 8-bit encodings; if you change from one encoding to another, the Character Panel will correctly update to match. It lists both decimal and hexadecimal character codes, along with the character at that point. If you double-click on the character, it will insert that character in the active editor. (Be warned: if you double-click on the hex value, it will “helpfully” type the hex value for you in the editor.)
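
    For anyone without Notepad++ at hand, a similar table can be approximated in Python. This is only a rough sketch, not how the Character Panel is implemented; `character_table` is a made-up helper and `cp1254` is assumed to be the codec of interest.

```python
def character_table(encoding, start=0x80, stop=0x100):
    """Return (decimal, hex, character) rows for one 8-bit encoding.

    The character is None where the codec defines no mapping for that byte.
    """
    rows = []
    for code in range(start, stop):
        try:
            ch = bytes([code]).decode(encoding)
        except UnicodeDecodeError:
            ch = None
        rows.append((code, f"0x{code:02X}", ch))
    return rows


if __name__ == "__main__":
    for dec, hexa, ch in character_table("cp1254"):
        # ascii() keeps the listing printable on any console.
        shown = "undefined" if ch is None else ascii(ch)
        print(f"{dec:3d}  {hexa}  {shown}")
```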



  • @PeterJones I didn’t know about the Character Panel. Thanks for mentioning it, too.

