Mac encoding

  • G
    guy038
    last edited by guy038 Jan 7, 2016, 8:29 PM Jan 7, 2016, 8:19 PM

    So, Valerio,

    I have slightly improved the table above by adding, for each character, the corresponding Windows-1252 hex code. (For instance, the Mac OS Roman hex value 80 represents the Ä character, which must be replaced with the hex code \xC4 in order to be displayed correctly in a document with an ANSI or Windows-1252 encoding.)

    Note that some characters of the Mac OS Roman encoding do NOT exist in the Windows-1252 encoding. These are the characters with the following Mac OS Roman codes:

    [\xAD\xB0\xB2\xB3\xB6\xB7\xB8\xB9\xBA\xBD\xC3\xC5\xC6\xD7\xDA\xDE\xDF\xF0\xF5\xF9\xFA\xFB\xFD\xFE\xFF]

    For these characters, NO has been entered in the fourth column, and no corresponding Windows-1252 hex value is given in the fifth column.
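The list of characters without a Windows-1252 equivalent can be cross-checked with a short Python 3 sketch. It assumes that Python's built-in `mac_roman` and `cp1252` codecs follow the same mapping as the table; the helper name `unmappable_mac_roman_bytes` is made up for this illustration.

```python
def unmappable_mac_roman_bytes():
    """Return the Mac OS Roman byte values (0x80-0xFF) whose character
    cannot be encoded in Windows-1252."""
    missing = []
    for value in range(0x80, 0x100):
        # Decode the single byte as Mac OS Roman to get its Unicode character.
        char = bytes([value]).decode("mac_roman")
        try:
            char.encode("cp1252")
        except UnicodeEncodeError:
            # No Windows-1252 code exists for this character.
            missing.append(value)
    return missing


if __name__ == "__main__":
    # Print the byte values as two-digit hex codes, for comparison with the list.
    print(" ".join("%02X" % v for v in unmappable_mac_roman_bytes()))
```

Comparing the printed codes with the character class in the regex is a quick way to spot a typo in either.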


    I came up with an ugly, but correct, regex that converts Mac OS Roman text to Windows-1252 text. Basically, this regex looks for two types of characters:

    • Any character code of the form \xnn that has a Windows-1252 equivalent is changed to the corresponding Windows-1252 code, so that the same glyph is displayed. For instance, the hex code \x80, captured as (\x80) in group 1, is replaced with \xC4 by the conditional replacement (?1\xC4)

    • Any character code from the list ([\xAD\xB0\xB2\xB3\xB6\xB7\xB8\xB9\xBA\xBD\xC3\xC5\xC6\xD7\xDA\xDE\xDF\xF0\xF5\xF9\xFA\xFB\xFD\xFE\xFF]) (the last group, 104), which has NO corresponding code in the Windows-1252 encoding, is replaced with the usual question mark character ?, of hex code \x3F

    Of course, any character with a code < \x80 is NOT changed at all!

    Note that the (?-i) modifier, at the beginning of the search regex, forces the regex engine to be case-sensitive, even if you did not check the Match case option
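The same byte-for-byte substitution can be sketched outside Notepad++ with a 256-entry translation table in Python 3. This is only an illustration of the idea, not the regex itself: it assumes Python's `mac_roman` and `cp1252` codecs agree with the table, and the names `build_mac_to_cp1252_table` and `mac_roman_to_cp1252` are hypothetical.

```python
def build_mac_to_cp1252_table():
    """Build a 256-byte table mapping each Mac OS Roman byte to the
    Windows-1252 byte that shows the same glyph, or to '?' if none exists."""
    table = bytearray(range(256))  # start with the identity mapping
    for value in range(0x80, 0x100):
        char = bytes([value]).decode("mac_roman")
        # errors="replace" yields b"?" for characters missing from cp1252,
        # which mirrors the role of group 104 in the regex.
        table[value] = char.encode("cp1252", errors="replace")[0]
    return bytes(table)


MAC_TO_CP1252 = build_mac_to_cp1252_table()


def mac_roman_to_cp1252(data):
    """Translate raw Mac OS Roman bytes to Windows-1252 bytes."""
    return data.translate(MAC_TO_CP1252)
```

Bytes below \x80 keep their value because the table starts out as the identity mapping, just as the regex leaves them alone.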


    So, Valerio, follow the few steps below:

    • Open Notepad++

    • Open a new document (CTRL + N)

    • If necessary, choose the ANSI encoding (menu option Encoding - Convert to ANSI)

    • Copy your Mac OS Roman text into this new document

    -> At this point, some accented characters in your text should still be displayed wrongly!

    • Move back to the very beginning of the file (CTRL + Home)

    • Open the Replace dialog (CTRL + H)

    • Select the Regular expression search mode

    • In the Find what field, type the regex below:

    (?-i)(\x80)|(\x81)|(\x82)|(\x83)|(\x84)|(\x85)|(\x86)|(\x87)|(\x88)|(\x89)|(\x8A)|(\x8B)|(\x8C)|(\x8D)|(\x8E)|(\x8F)|(\x90)|(\x91)|(\x92)|(\x93)|(\x94)|(\x95)|(\x96)|(\x97)|(\x98)|(\x99)|(\x9A)|(\x9B)|(\x9C)|(\x9D)|(\x9E)|(\x9F)|(\xA0)|(\xA1)|(\xA2)|(\xA3)|(\xA4)|(\xA5)|(\xA6)|(\xA7)|(\xA8)|(\xA9)|(\xAA)|(\xAB)|(\xAC)|(\xAE)|(\xAF)|(\xB1)|(\xB4)|(\xB5)|(\xBB)|(\xBC)|(\xBE)|(\xBF)|(\xC0)|(\xC1)|(\xC2)|(\xC4)|(\xC7)|(\xC8)|(\xC9)|(\xCA)|(\xCB)|(\xCC)|(\xCD)|(\xCE)|(\xCF)|(\xD0)|(\xD1)|(\xD2)|(\xD3)|(\xD4)|(\xD5)|(\xD6)|(\xD8)|(\xD9)|(\xDB)|(\xDC)|(\xDD)|(\xE0)|(\xE1)|(\xE2)|(\xE3)|(\xE4)|(\xE5)|(\xE6)|(\xE7)|(\xE8)|(\xE9)|(\xEA)|(\xEB)|(\xEC)|(\xED)|(\xEE)|(\xEF)|(\xF1)|(\xF2)|(\xF3)|(\xF4)|(\xF6)|(\xF7)|(\xF8)|(\xFC)|([\xAD\xB0\xB2\xB3\xB6\xB7\xB8\xB9\xBA\xBD\xC3\xC5\xC6\xD7\xDA\xDE\xDF\xF0\xF5\xF9\xFA\xFB\xFD\xFE\xFF])

    • In the Replace with field, type the replacement expression below:

    (?1\xC4)(?2\xC5)(?3\xC7)(?4\xC9)(?5\xD1)(?6\xD6)(?7\xDC)(?8\xE1)(?9\xE0)(?10\xE2)(?11\xE4)(?12\xE3)(?13\xE5)(?14\xE7)(?15\xE9)(?16\xE8)(?17\xEA)(?18\xEB)(?19\xED)(?20\xEC)(?21\xEE)(?22\xEF)(?23\xF1)(?24\xF3)(?25\xF2)(?26\xF4)(?27\xF6)(?28\xF5)(?29\xFA)(?30\xF9)(?31\xFB)(?32\xFC)(?33\x86)(?34\xB0)(?35\xA2)(?36\xA3)(?37\xA7)(?38\x95)(?39\xB6)(?40\xDF)(?41\xAE)(?42\xA9)(?43\x99)(?44\xB4)(?45\xA8)(?46\xC6)(?47\xD8)(?48\xB1)(?49\xA5)(?50\xB5)(?51\xAA)(?52\xBA)(?53\xE6)(?54\xF8)(?55\xBF)(?56\xA1)(?57\xAC)(?58\x83)(?59\xAB)(?60\xBB)(?61\x85)(?62\xA0)(?63\xC0)(?64\xC3)(?65\xD5)(?66\x8C)(?67\x9C)(?68\x96)(?69\x97)(?70\x93)(?71\x94)(?72\x91)(?73\x92)(?74\xF7)(?75\xFF)(?76\x9F)(?77\x80)(?78\x8B)(?79\x9B)(?80\x87)(?81\xB7)(?82\x82)(?83\x84)(?84\x89)(?85\xC2)(?86\xCA)(?87\xC1)(?88\xCB)(?89\xC8)(?90\xCD)(?91\xCE)(?92\xCF)(?93\xCC)(?94\xD3)(?95\xD4)(?96\xD2)(?97\xDA)(?98\xDB)(?99\xD9)(?{100}\x88)(?{101}\x98)(?{102}\xAF)(?{103}\xB8)(?{104}\x3F)

    • Click on the Replace All button

    Et voilà! This time, after that search/replace, the text should be displayed correctly :-)) Then:

    • Select the menu option Encoding - Convert to UTF-8 or Encoding - Convert to UTF-8 BOM

    • Finally, save the changed file!
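For comparison, here is a minimal Python 3 sketch of going from Mac OS Roman bytes straight to UTF-8, with or without a BOM, skipping the Windows-1252 step. It is an alternative route, not what the steps above do, and the function name `mac_roman_to_utf8` is an assumption made for this example.

```python
def mac_roman_to_utf8(data, with_bom=False):
    """Decode Mac OS Roman bytes and re-encode them as UTF-8.

    with_bom=True prepends the UTF-8 byte order mark, similar in spirit
    to the "Convert to UTF-8 BOM" menu option.
    """
    text = data.decode("mac_roman")
    # The "utf-8-sig" codec writes the BOM (EF BB BF) before the text.
    return text.encode("utf-8-sig" if with_bom else "utf-8")
```

Because UTF-8 can represent every Mac OS Roman character, nothing has to be replaced with a question mark on this route.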

    Best Regards,

    guy038

    P.S. :

    BTW, Claudia, if you see this post: I saw a Python script called mac_roman.py in the folder …\Plugins\PythonScript\lib\encodings. Unfortunately, I couldn’t make it work. I suppose that it converts Mac Roman text to standard UTF-8 text!

    • C
      Claudia Frank @guy038
      last edited by Jan 7, 2016, 9:47 PM

      Hello guy038,

      This file isn’t meant to be used directly. It’s part of the codecs module, which uses it internally
      when you specify its codec, e.g.:

      import codecs
      
      with codecs.open(r'd:\macroman.txt', 'r', encoding='macroman') as fin:
          file_content = fin.read()
      
      with codecs.open(r'd:\macroman_utf8.txt', 'w', encoding='utf-8') as fout:
          fout.write(file_content)
      

      The first with block opens a file that is assumed to be Mac Roman encoded, reads it, stores the content in the variable file_content and
      closes the file automatically.

      The next with block writes that content to a new file with UTF-8 encoding and, again, closes it automatically.

      Cheers
      Claudia

      • V
        Valerio J
        last edited by Jan 8, 2016, 8:25 PM

        Thank you very much, guy038.
        I’m sure that’d do the trick. I’m just surprised we don’t have an option for Mac Roman right in the menu. Do you think there’s a reason for that? It’s just another encoding method, right? Is it fundamentally different from any of the iso-8859-x?

        • G
          guy038
          last edited by guy038 Jan 8, 2016, 9:42 PM Jan 8, 2016, 9:41 PM

          Hi, Valerio,

          Do you think there’s a reason for that?

          Maybe it’s just because Notepad++ is rather “Windows oriented”!

          Is it fundamentally different from any of the iso-8859-x?

          Not at all. It’s just another encoding, like all the others!


          You may also ask for the Mac Roman encoding to be added to N++ at the address below:

          https://github.com/notepad-plus-plus/notepad-plus-plus/pulls

          However, as you can see, you have to be (very) patient :-(( There are plenty of requests there!!

          Cheers,

          guy038

          • A
            Ali Abdelzaher
            last edited by Oct 21, 2016, 11:30 PM

            Thanks for sharing such amazing info
            مجلة رقيقة

            • H
              Hamza Hamza
              last edited by Nov 26, 2016, 1:14 AM

              Thanks for sharing
              فوائد الليمون

              • C
                chcg
                last edited by Nov 26, 2016, 1:27 PM

                At least Windows has a code page for it, Code Page 10000 (Macintosh Roman):
                https://msdn.microsoft.com/en-us/library/cc195076.aspx
                so adding support should not be too complicated.

                • R
                  Ryan Webber
                  last edited by Mar 1, 2017, 8:48 PM

                  I’m trying to find a way to convert to MacRoman (needed for embedding subtitles into QuickTime videos).
                  I am trying to use the Python scripting option,
                  using:

                  import codecs
                  with codecs.open(r'd:\utf8.txt', 'r', encoding='utf-8') as fin:
                      file_content = fin.read()
                  with codecs.open(r'd:\macroman.txt', 'w', encoding='macroman') as fout:
                      fout.write(file_content)

                  I end up with macroman.txt encoded as ANSI, and empty.

                  Any help here would be much appreciated.

                  • C
                    Claudia Frank @Ryan Webber
                    last edited by Mar 1, 2017, 8:55 PM

                    @Ryan-Webber

                    what about printing file_content to the python script console using

                    console.write(file_content)

                    Cheers
                    Claudia
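The reverse direction (UTF-8 to Mac Roman) can also be tried on in-memory bytes, which takes file paths and the console out of the picture. This Python 3 sketch does not diagnose the empty file reported above; it only illustrates the conversion under the assumption that the input really is UTF-8. The function name `utf8_to_mac_roman` is hypothetical.

```python
def utf8_to_mac_roman(data, errors="replace"):
    """Decode UTF-8 bytes and re-encode them as Mac OS Roman.

    "utf-8-sig" is used for decoding so that a leading BOM, if present,
    is dropped instead of ending up in the output.
    With errors="replace", characters that Mac OS Roman cannot represent
    become "?"; with errors="strict" they raise UnicodeEncodeError.
    """
    text = data.decode("utf-8-sig")
    return text.encode("mac_roman", errors=errors)
```

Calling it with `errors="strict"` is one way to find out whether the source text contains characters that Mac Roman cannot hold.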

                    • Aziz AbdelA
                      Aziz Abdel
                      last edited by May 24, 2017, 8:32 PM

                      Thanks for sharing such amazing info
                      sihtk.com
                      كل ما يهمك سيدتي

                      • J. JonesJ
                        J. Jones
                        last edited by Nov 17, 2017, 6:11 AM

                        I did find an Encoding option for it, but it’s buried where you wouldn’t expect!

                        Encoding > Character Sets > Cyrillic > Macintosh

                        Presto, all the Õ symbols become ’ symbols the way they were typed on the Mac!
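Why Õ turns into ’ can be shown with a tiny Python 3 sketch: the same byte, \xD5, is read through two different code pages. It assumes Python's `cp1252` and `mac_roman` codecs, and the sample bytes are made up.

```python
# One raw byte sequence, as a Mac would have saved the text "It’s".
raw = b"It\xd5s"

# Interpreted as Windows-1252 (what an ANSI document shows): Õ appears.
as_windows = raw.decode("cp1252")

# Interpreted as Mac OS Roman: the typographic apostrophe comes back.
as_mac = raw.decode("mac_roman")

print(as_windows)
print(as_mac)
```

The bytes in the file never change; only the code page used to display them does.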
