Invisible characters

    Ertan Küçükoglu
    last edited by Feb 19, 2020, 3:21 PM

    Hello,

    I received a CSV file containing an invisible character that Notepad++ displays as “PM” when Show All Characters is turned on. The file encoding is Turkish ANSI.
    [Screenshot: the character displayed as “PM” in Notepad++]

    A hex editor displays this character as 0x9E.
    [Screenshot: hex editor view showing byte 0x9E]

    I checked http://www.december.com/html/spec/ascii.html but could not work out what it is. Is it a common character used for some purpose?

    Thanks & regards,
    Ertan

      PeterJones
      last edited by Feb 19, 2020, 4:09 PM

      Why are you looking at a 7bit ASCII chart for a specific 8bit Turkish “ANSI” encoding?

      I narrowed the comparison to the characters in "ZYTİNYAО TABAĞI" from your hex dump.

      Notepad++ lists four Turkish encodings:

      • ISO 8859-3: this has a different byte for İ, so I’m assuming that’s not what you meant.
      • ISO 8859-9: this looks compatible on the bytes I checked, but has nothing defined in positions 0x80-0x9F, so 0x9D would not be a valid character in that encoding.
      • OEM-857: this doesn’t match most of your characters, but does put Ş at hex position 0x9E.
      • Windows-1254: this looks compatible on the bytes I checked, but has nothing defined at 0x9D and 0x9E.
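The per-encoding check above can be reproduced with a short script. This is a minimal sketch, assuming Python 3; the mapping from the Notepad++ encoding names to Python codec names in `CANDIDATES` is our own assumption, and `describe_byte` is a hypothetical helper. Note that Python's ISO 8859 codecs decode 0x80-0x9F to the C1 control codes rather than rejecting them, while its `cp1254` codec raises an error for bytes it leaves undefined.

```python
# Candidate Turkish 8-bit encodings from the Notepad++ Encoding menu, mapped to
# Python codec names (this mapping is an assumption, not taken from Notepad++).
CANDIDATES = {
    "ISO 8859-3": "iso8859_3",
    "ISO 8859-9": "iso8859_9",
    "OEM-857": "cp857",
    "Windows-1254": "cp1254",
}


def describe_byte(value: int, codec: str) -> str:
    """Describe what a single byte means in the given 8-bit codec."""
    try:
        char = bytes([value]).decode(codec)
    except UnicodeDecodeError:
        # The codec has no mapping for this byte at all.
        return "undefined"
    if 0x80 <= ord(char) <= 0x9F:
        # A C1 control code: not a printable character.
        return f"C1 control U+{ord(char):04X}"
    return f"{char!r} (U+{ord(char):04X})"


if __name__ == "__main__":
    for value in (0x9D, 0x9E):
        for label, codec in CANDIDATES.items():
            print(f"0x{value:02X} in {label}: {describe_byte(value, codec)}")
```

Only OEM-857 should report a printable letter for 0x9E; the others report either a control code or an undefined byte.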

      Ş is at 0x9E in OEM-857, but nothing else matches with that one.
      Ş is at 0xDE in Win-1254 and ISO 8859-9
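Going the other direction, encoding a letter shows which byte each code page uses for it. This is a minimal sketch, assuming Python 3 and the same codec-name mapping as above (our assumption); `byte_for` is a hypothetical helper. It covers Ş and also İ, whose position is what rules out ISO 8859-3.

```python
# Notepad++ encoding labels mapped to Python codec names (assumed mapping).
CODECS = {
    "ISO 8859-3": "iso8859_3",
    "ISO 8859-9": "iso8859_9",
    "OEM-857": "cp857",
    "Windows-1254": "cp1254",
}


def byte_for(char: str, codec: str) -> str:
    """Return the single byte (as hex) that encodes char in codec, or 'n/a'."""
    try:
        encoded = char.encode(codec)
    except UnicodeEncodeError:
        # The character does not exist in this code page.
        return "n/a"
    return f"0x{encoded[0]:02X}"


if __name__ == "__main__":
    for char in "Şİ":
        row = ", ".join(f"{label}={byte_for(char, codec)}" for label, codec in CODECS.items())
        print(f"{char}: {row}")
```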

      It may be that the program or person that generated that unknown character was using a “standard-encoding-plus-extra” to get more characters (some applications and derived standards fill the “unused” slots with custom characters). Or the program that generated the file may have mixed up the code page 857 byte for Ş with the Windows-1254 byte for Ş. Or it could be a transmission error.
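If the mixed-up code page theory is right, the file would be Windows-1254 except for stray OEM-857 bytes. The sketch below, assuming Python 3, lists the bytes that the `cp1254` codec cannot decode and applies a hypothetical repair that swaps the 857 byte for Ş with the 1254 byte. The sample data and both function names are made up for illustration, and the repair is only valid if that theory actually holds for the file. It handles capital Ş only; lowercase letters would need their own mapping.

```python
from __future__ import annotations


def undecodable_bytes(data: bytes, codec: str) -> list[tuple[int, int]]:
    """Return (offset, byte value) pairs that a single-byte codec cannot decode.

    Decoding byte by byte is only valid for single-byte code pages
    such as cp1254 or cp857, not for multi-byte encodings like UTF-8.
    """
    bad = []
    for offset, value in enumerate(data):
        try:
            bytes([value]).decode(codec)
        except UnicodeDecodeError:
            bad.append((offset, value))
    return bad


def repair_857_s_cedilla(data: bytes) -> str:
    """Hypothetical repair: treat the data as Windows-1254, except that
    capital Ş was written with its OEM-857 byte (0x9E)."""
    return data.replace("Ş".encode("cp857"), "Ş".encode("cp1254")).decode("cp1254")


if __name__ == "__main__":
    # Hypothetical sample: "TA" encoded as Windows-1254, Ş encoded as OEM-857.
    sample = "TA".encode("cp1254") + "Ş".encode("cp857")
    print(sample)                               # b'TA\x9e'
    print(undecodable_bytes(sample, "cp1254"))  # [(2, 158)]
    print(repair_857_s_cedilla(sample))         # TAŞ
```

Because 0x9E is undefined in Windows-1254, replacing it cannot clobber a legitimate 1254 character, which is what makes this particular substitution reasonably safe to try.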

      Or my analysis could be completely wrong: I am not an encoding expert, and definitely not a Turkish-encoding expert. I just looked up the various encodings and compared to your listed hexdump.

        Ertan Küçükoglu @PeterJones
        last edited by Ertan Küçükoglu Feb 19, 2020, 5:24 PM

        @PeterJones said in Invisible characters:

        Why are you looking at a 7bit ASCII chart for a specific 8bit Turkish “ANSI” encoding?

        I probably looked at it by mistake.

        @PeterJones said in Invisible characters:

        • ISO 8859-3 : this has a different byte for İ, so I’m assuming that’s not what you meant.

        That should be it. You should read it as I, and it is most likely a lowercase ı (without the dot on top).

        @PeterJones said in Invisible characters:

        It may be that the program or person that generated that unknown character was using a “standard-encoding-plus-extra” to try to get more characters (sometimes “unused” slots are filled with custom characters in certain applications or derived standards);

        That is most likely correct, since I suspect the data was taken from some kind of Oracle database and imported into the FirebirdSQL database from which I was given the CSV.

        Thank you!

          PeterJones
          last edited by Feb 19, 2020, 5:24 PM

          Oh, I also meant to say (but got distracted: ooh, squirrel!) that Edit > Character Panel lists the character codes 0-255 for 8-bit encodings. If you switch from one encoding to another, the Character Panel updates to match. It shows both the decimal and hexadecimal code along with the character at that position. Double-clicking the character inserts it into the active editor (be warned: if you double-click the hex value, it will “helpfully” type the hex value into the editor for you).
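Outside Notepad++, a similar decimal/hex/character listing can be generated for any 8-bit codec Python supports, which makes it easy to compare two code pages row by row. This is a rough sketch, assuming Python 3; `code_chart` is a hypothetical helper, not how Notepad++ builds its panel, and showing undefined bytes as an empty string is our own choice.

```python
from __future__ import annotations


def code_chart(codec: str, start: int = 0, stop: int = 256) -> list[tuple[int, str, str]]:
    """Return (decimal, hex, character) rows for a single-byte codec,
    loosely mirroring the columns of a character panel."""
    rows = []
    for value in range(start, stop):
        try:
            char = bytes([value]).decode(codec)
        except UnicodeDecodeError:
            char = ""  # the byte has no mapping in this codec
        rows.append((value, f"{value:02X}", char))
    return rows


if __name__ == "__main__":
    # Compare the 0x98-0x9F region of two Turkish code pages side by side.
    for (dec, hex_code, c1254), (_, _, c857) in zip(
        code_chart("cp1254", 0x98, 0xA0), code_chart("cp857", 0x98, 0xA0)
    ):
        print(f"{dec:3d}  {hex_code}  cp1254={c1254!r}  cp857={c857!r}")
```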

            Ertan Küçükoglu @PeterJones
            last edited by Feb 19, 2020, 5:41 PM

            @PeterJones I didn’t know about the Character Panel. Thanks for mentioning it, too.

            The Community of users of the Notepad++ text editor.
            Powered by NodeBB | Contributors