Mac encoding



  • Hi everyone.

    A friend sent me a plain text document he created on a Mac. We’re both Italian, so naturally it contains a lot of è, é, ì characters, which I cannot see correctly in his file. By inspecting the document with a hex editor, I’ve come to the conclusion that he must be using this encoding:
    https://en.wikipedia.org/wiki/Mac_OS_Roman
    Normally I would reinterpret the document in that encoding and then, perhaps, convert it to UTF-8 so I can work on it more portably. But I can’t find an entry for Mac OS Roman in the encodings list.
    Is it completely unsupported?
    Thanks.



  • Hello Valerio,

    As you’re Italian, your default ANSI encoding is probably Windows-1252. See the link below :

    https://msdn.microsoft.com/en-us/goglobal/cc305145

    You can verify that I’m not mistaken by opening the Character Panel ( menu option Edit - Character Panel ). The list displayed should be the same as the Microsoft table, above !

    Below is a table of the MAC OS Roman encoding, for characters over \x7F only. Remember that characters with a code point < \x80 are always identical in the Windows, OEM, ISO-8859-x and UTF-8 encodings.

    •-------------------------------------------------------------------------------------------------•
    |                        MAC OS Roman Encoding  ( Windows Code Page 10000 )                       |
    •--------------•-------•------------------•--------•----------------------------------------------•
    | MAC OS Roman | Char. |   Windows-1252   |  UNI-  |                   UNICODE                    |
    |--------------|       |------------------•  CODE  |                                              |
    | Hexa | Deci. | Glyph | Encoded |  Hexa  | Value  |                Character Name                |
    •------•-------•-------•---------•--------•--------•----------------------------------------------•
    |  80  |  128  |   Ä   |         |   C4   |  00C4  |  LATIN CAPITAL LETTER A WITH DIAERESIS       |
    |  81  |  129  |   Å   |         |   C5   |  00C5  |  LATIN CAPITAL LETTER A WITH RING ABOVE      |
    |  82  |  130  |   Ç   |         |   C7   |  00C7  |  LATIN CAPITAL LETTER C WITH CEDILLA         |
    |  83  |  131  |   É   |         |   C9   |  00C9  |  LATIN CAPITAL LETTER E WITH ACUTE           |
    |  84  |  132  |   Ñ   |         |   D1   |  00D1  |  LATIN CAPITAL LETTER N WITH TILDE           |
    |  85  |  133  |   Ö   |         |   D6   |  00D6  |  LATIN CAPITAL LETTER O WITH DIAERESIS       |
    |  86  |  134  |   Ü   |         |   DC   |  00DC  |  LATIN CAPITAL LETTER U WITH DIAERESIS       |
    |  87  |  135  |   á   |         |   E1   |  00E1  |  LATIN SMALL LETTER A WITH ACUTE             |
    |  88  |  136  |   à   |         |   E0   |  00E0  |  LATIN SMALL LETTER A WITH GRAVE             |
    |  89  |  137  |   â   |         |   E2   |  00E2  |  LATIN SMALL LETTER A WITH CIRCUMFLEX        |
    |  8A  |  138  |   ä   |         |   E4   |  00E4  |  LATIN SMALL LETTER A WITH DIAERESIS         |
    |  8B  |  139  |   ã   |         |   E3   |  00E3  |  LATIN SMALL LETTER A WITH TILDE             |
    |  8C  |  140  |   å   |         |   E5   |  00E5  |  LATIN SMALL LETTER A WITH RING ABOVE        |
    |  8D  |  141  |   ç   |         |   E7   |  00E7  |  LATIN SMALL LETTER C WITH CEDILLA           |
    |  8E  |  142  |   é   |         |   E9   |  00E9  |  LATIN SMALL LETTER E WITH ACUTE             |
    |  8F  |  143  |   è   |         |   E8   |  00E8  |  LATIN SMALL LETTER E WITH GRAVE             |
    •------•-------•-------•---------•--------•--------•----------------------------------------------•
    |  90  |  144  |   ê   |         |   EA   |  00EA  |  LATIN SMALL LETTER E WITH CIRCUMFLEX        |
    |  91  |  145  |   ë   |         |   EB   |  00EB  |  LATIN SMALL LETTER E WITH DIAERESIS         |
    |  92  |  146  |   í   |         |   ED   |  00ED  |  LATIN SMALL LETTER I WITH ACUTE             |
    |  93  |  147  |   ì   |         |   EC   |  00EC  |  LATIN SMALL LETTER I WITH GRAVE             |
    |  94  |  148  |   î   |         |   EE   |  00EE  |  LATIN SMALL LETTER I WITH CIRCUMFLEX        |
    |  95  |  149  |   ï   |         |   EF   |  00EF  |  LATIN SMALL LETTER I WITH DIAERESIS         |
    |  96  |  150  |   ñ   |         |   F1   |  00F1  |  LATIN SMALL LETTER N WITH TILDE             |
    |  97  |  151  |   ó   |         |   F3   |  00F3  |  LATIN SMALL LETTER O WITH ACUTE             |
    |  98  |  152  |   ò   |         |   F2   |  00F2  |  LATIN SMALL LETTER O WITH GRAVE             |
    |  99  |  153  |   ô   |         |   F4   |  00F4  |  LATIN SMALL LETTER O WITH CIRCUMFLEX        |
    |  9A  |  154  |   ö   |         |   F6   |  00F6  |  LATIN SMALL LETTER O WITH DIAERESIS         |
    |  9B  |  155  |   õ   |         |   F5   |  00F5  |  LATIN SMALL LETTER O WITH TILDE             |
    |  9C  |  156  |   ú   |         |   FA   |  00FA  |  LATIN SMALL LETTER U WITH ACUTE             |
    |  9D  |  157  |   ù   |         |   F9   |  00F9  |  LATIN SMALL LETTER U WITH GRAVE             |
    |  9E  |  158  |   û   |         |   FB   |  00FB  |  LATIN SMALL LETTER U WITH CIRCUMFLEX        |
    |  9F  |  159  |   ü   |         |   FC   |  00FC  |  LATIN SMALL LETTER U WITH DIAERESIS         |
    •------•-------•-------•---------•--------•--------•----------------------------------------------•
    |  A0  |  160  |   †   |         |   86   |  2020  |  DAGGER                                      |
    |  A1  |  161  |   °   |         |   B0   |  00B0  |  DEGREE SIGN                                 |
    |  A2  |  162  |   ¢   |         |   A2   |  00A2  |  CENT SIGN                                   |
    |  A3  |  163  |   £   |         |   A3   |  00A3  |  POUND SIGN                                  |
    |  A4  |  164  |   §   |         |   A7   |  00A7  |  SECTION SIGN                                |
    |  A5  |  165  |   •   |         |   95   |  2022  |  BULLET                                      |
    |  A6  |  166  |   ¶   |         |   B6   |  00B6  |  PILCROW SIGN                                |
    |  A7  |  167  |   ß   |         |   DF   |  00DF  |  LATIN SMALL LETTER SHARP S                  |
    |  A8  |  168  |   ®   |         |   AE   |  00AE  |  REGISTERED SIGN                             |
    |  A9  |  169  |   ©   |         |   A9   |  00A9  |  COPYRIGHT SIGN                              |
    |  AA  |  170  |   ™   |         |   99   |  2122  |  TRADE MARK SIGN                             |
    |  AB  |  171  |   ´   |         |   B4   |  00B4  |  ACUTE ACCENT                                |
    |  AC  |  172  |   ¨   |         |   A8   |  00A8  |  DIAERESIS                                   |
    |  AD  |  173  |   ≠   |   NO    |        |  2260  |  NOT EQUAL TO                                |
    |  AE  |  174  |   Æ   |         |   C6   |  00C6  |  LATIN CAPITAL LETTER AE                     |
    |  AF  |  175  |   Ø   |         |   D8   |  00D8  |  LATIN CAPITAL LETTER O WITH STROKE          |
    •------•-------•-------•---------•--------•--------•----------------------------------------------•
    |  B0  |  176  |   ∞   |   NO    |        |  221E  |  INFINITY                                    |
    |  B1  |  177  |   ±   |         |   B1   |  00B1  |  PLUS-MINUS SIGN                             |
    |  B2  |  178  |   ≤   |   NO    |        |  2264  |  LESS-THAN OR EQUAL TO                       |
    |  B3  |  179  |   ≥   |   NO    |        |  2265  |  GREATER-THAN OR EQUAL TO                    |
    |  B4  |  180  |   ¥   |         |   A5   |  00A5  |  YEN SIGN                                    |
    |  B5  |  181  |   µ   |         |   B5   |  00B5  |  MICRO SIGN                                  |
    |  B6  |  182  |   ∂   |   NO    |        |  2202  |  PARTIAL DIFFERENTIAL                        |
    |  B7  |  183  |   ∑   |   NO    |        |  2211  |  N-ARY SUMMATION                             |
    |  B8  |  184  |   ∏   |   NO    |        |  220F  |  N-ARY PRODUCT                               |
    |  B9  |  185  |   π   |   NO    |        |  03C0  |  GREEK SMALL LETTER PI                       |
    |  BA  |  186  |   ∫   |   NO    |        |  222B  |  INTEGRAL                                    |
    |  BB  |  187  |   ª   |         |   AA   |  00AA  |  FEMININE ORDINAL INDICATOR                  |
    |  BC  |  188  |   º   |         |   BA   |  00BA  |  MASCULINE ORDINAL INDICATOR                 |
    |  BD  |  189  |   Ω   |   NO    |        |  03A9  |  GREEK CAPITAL LETTER OMEGA                  |
    |  BE  |  190  |   æ   |         |   E6   |  00E6  |  LATIN SMALL LETTER AE                       |
    |  BF  |  191  |   ø   |         |   F8   |  00F8  |  LATIN SMALL LETTER O WITH STROKE            |
    •------•-------•-------•---------•--------•--------•----------------------------------------------•
    |  C0  |  192  |   ¿   |         |   BF   |  00BF  |  INVERTED QUESTION MARK                      |
    |  C1  |  193  |   ¡   |         |   A1   |  00A1  |  INVERTED EXCLAMATION MARK                   |
    |  C2  |  194  |   ¬   |         |   AC   |  00AC  |  NOT SIGN                                    |
    |  C3  |  195  |   √   |   NO    |        |  221A  |  SQUARE ROOT                                 |
    |  C4  |  196  |   ƒ   |         |   83   |  0192  |  LATIN SMALL LETTER F WITH HOOK              |
    |  C5  |  197  |   ≈   |   NO    |        |  2248  |  ALMOST EQUAL TO                             |
    |  C6  |  198  |   ∆   |   NO    |        |  2206  |  INCREMENT                                   |
    |  C7  |  199  |   «   |         |   AB   |  00AB  |  LEFT-POINTING DOUBLE ANGLE QUOTATION MARK   |
    |  C8  |  200  |   »   |         |   BB   |  00BB  |  RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK  |
    |  C9  |  201  |   …   |         |   85   |  2026  |  HORIZONTAL ELLIPSIS                         |
    |  CA  |  202  |       |         |   A0   |  00A0  |  NO-BREAK SPACE                              |
    |  CB  |  203  |   À   |         |   C0   |  00C0  |  LATIN CAPITAL LETTER A WITH GRAVE           |
    |  CC  |  204  |   Ã   |         |   C3   |  00C3  |  LATIN CAPITAL LETTER A WITH TILDE           |
    |  CD  |  205  |   Õ   |         |   D5   |  00D5  |  LATIN CAPITAL LETTER O WITH TILDE           |
    |  CE  |  206  |   Œ   |         |   8C   |  0152  |  LATIN CAPITAL LIGATURE OE                   |
    |  CF  |  207  |   œ   |         |   9C   |  0153  |  LATIN SMALL LIGATURE OE                     |
    •------•-------•-------•---------•--------•--------•----------------------------------------------•
    |  D0  |  208  |   –   |         |   96   |  2013  |  EN DASH                                     |
    |  D1  |  209  |   —   |         |   97   |  2014  |  EM DASH                                     |
    |  D2  |  210  |   “   |         |   93   |  201C  |  LEFT DOUBLE QUOTATION MARK                  |
    |  D3  |  211  |   ”   |         |   94   |  201D  |  RIGHT DOUBLE QUOTATION MARK                 |
    |  D4  |  212  |   ‘   |         |   91   |  2018  |  LEFT SINGLE QUOTATION MARK                  |
    |  D5  |  213  |   ’   |         |   92   |  2019  |  RIGHT SINGLE QUOTATION MARK                 |
    |  D6  |  214  |   ÷   |         |   F7   |  00F7  |  DIVISION SIGN                               |
    |  D7  |  215  |   ◊   |   NO    |        |  25CA  |  LOZENGE                                     |
    |  D8  |  216  |   ÿ   |         |   FF   |  00FF  |  LATIN SMALL LETTER Y WITH DIAERESIS         |
    |  D9  |  217  |   Ÿ   |         |   9F   |  0178  |  LATIN CAPITAL LETTER Y WITH DIAERESIS       |
    |  DA  |  218  |   ⁄   |   NO    |        |  2044  |  FRACTION SLASH                              |
    |  DB  |  219  |   €   |         |   80   |  20AC  |  EURO SIGN                                   |
    |  DC  |  220  |   ‹   |         |   8B   |  2039  |  SINGLE LEFT-POINTING ANGLE QUOTATION MARK   |
    |  DD  |  221  |   ›   |         |   9B   |  203A  |  SINGLE RIGHT-POINTING ANGLE QUOTATION MARK  |
    |  DE  |  222  |   fi   |   NO    |        |  FB01  |  LATIN SMALL LIGATURE FI                     |
    |  DF  |  223  |   fl   |   NO    |        |  FB02  |  LATIN SMALL LIGATURE FL                     |
    •------•-------•-------•---------•--------•--------•----------------------------------------------•
    |  E0  |  224  |   ‡   |         |   87   |  2021  |  DOUBLE DAGGER                               |
    |  E1  |  225  |   ·   |         |   B7   |  00B7  |  MIDDLE DOT                                  |
    |  E2  |  226  |   ‚   |         |   82   |  201A  |  SINGLE LOW-9 QUOTATION MARK                 |
    |  E3  |  227  |   „   |         |   84   |  201E  |  DOUBLE LOW-9 QUOTATION MARK                 |
    |  E4  |  228  |   ‰   |         |   89   |  2030  |  PER MILLE SIGN                              |
    |  E5  |  229  |   Â   |         |   C2   |  00C2  |  LATIN CAPITAL LETTER A WITH CIRCUMFLEX      |
    |  E6  |  230  |   Ê   |         |   CA   |  00CA  |  LATIN CAPITAL LETTER E WITH CIRCUMFLEX      |
    |  E7  |  231  |   Á   |         |   C1   |  00C1  |  LATIN CAPITAL LETTER A WITH ACUTE           |
    |  E8  |  232  |   Ë   |         |   CB   |  00CB  |  LATIN CAPITAL LETTER E WITH DIAERESIS       |
    |  E9  |  233  |   È   |         |   C8   |  00C8  |  LATIN CAPITAL LETTER E WITH GRAVE           |
    |  EA  |  234  |   Í   |         |   CD   |  00CD  |  LATIN CAPITAL LETTER I WITH ACUTE           |
    |  EB  |  235  |   Î   |         |   CE   |  00CE  |  LATIN CAPITAL LETTER I WITH CIRCUMFLEX      |
    |  EC  |  236  |   Ï   |         |   CF   |  00CF  |  LATIN CAPITAL LETTER I WITH DIAERESIS       |
    |  ED  |  237  |   Ì   |         |   CC   |  00CC  |  LATIN CAPITAL LETTER I WITH GRAVE           |
    |  EE  |  238  |   Ó   |         |   D3   |  00D3  |  LATIN CAPITAL LETTER O WITH ACUTE           |
    |  EF  |  239  |   Ô   |         |   D4   |  00D4  |  LATIN CAPITAL LETTER O WITH CIRCUMFLEX      |
    •------•-------•-------•---------•--------•--------•----------------------------------------------•
    |  F0  |  240  |      |   NO    |        |  F8FF  |  APPLE LOGO                                  |
    |  F1  |  241  |   Ò   |         |   D2   |  00D2  |  LATIN CAPITAL LETTER O WITH GRAVE           |
    |  F2  |  242  |   Ú   |         |   DA   |  00DA  |  LATIN CAPITAL LETTER U WITH ACUTE           |
    |  F3  |  243  |   Û   |         |   DB   |  00DB  |  LATIN CAPITAL LETTER U WITH CIRCUMFLEX      |
    |  F4  |  244  |   Ù   |         |   D9   |  00D9  |  LATIN CAPITAL LETTER U WITH GRAVE           |
    |  F5  |  245  |   ı   |   NO    |        |  0131  |  LATIN SMALL LETTER DOTLESS I                |
    |  F6  |  246  |   ˆ   |         |   88   |  02C6  |  MODIFIER LETTER CIRCUMFLEX ACCENT           |
    |  F7  |  247  |   ˜   |         |   98   |  02DC  |  SMALL TILDE                                 |
    |  F8  |  248  |   ¯   |         |   AF   |  00AF  |  MACRON                                      |
    |  F9  |  249  |   ˘   |   NO    |        |  02D8  |  BREVE                                       |
    |  FA  |  250  |   ˙   |   NO    |        |  02D9  |  DOT ABOVE                                   |
    |  FB  |  251  |   ˚   |   NO    |        |  02DA  |  RING ABOVE                                  |
    |  FC  |  252  |   ¸   |         |   B8   |  00B8  |  CEDILLA                                     |
    |  FD  |  253  |   ˝   |   NO    |        |  02DD  |  DOUBLE ACUTE ACCENT                         |
    |  FE  |  254  |   ˛   |   NO    |        |  02DB  |  OGONEK                                      |
    |  FF  |  255  |   ˇ   |   NO    |        |  02C7  |  CARON                                       |
    •------•-------•-------•---------•--------•--------•----------------------------------------------•
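
    As a cross-check of the table above, here is a minimal sketch that derives the same columns from Python's built-in mac_roman and cp1252 codecs. It assumes a standalone Python 3 interpreter (the PythonScript plugin mentioned later in this thread is a different environment), and the function name mac_roman_rows is hypothetical.

```python
import unicodedata


def mac_roman_rows():
    """Yield (mac_byte, char, cp1252_byte_or_None, code_point, name) for 0x80-0xFF."""
    for mac_byte in range(0x80, 0x100):
        # Decode the single Mac OS Roman byte to its Unicode character.
        char = bytes([mac_byte]).decode("mac_roman")
        try:
            # Windows-1252 byte that shows the same glyph, if one exists.
            win_byte = char.encode("cp1252")[0]
        except UnicodeEncodeError:
            win_byte = None  # corresponds to "NO" in the table
        # Private-use characters (such as 0xF0) have no Unicode name.
        name = unicodedata.name(char, "(no Unicode name)")
        yield mac_byte, char, win_byte, ord(char), name
```

    Iterating over mac_roman_rows() and printing each tuple should reproduce the table row by row, with None wherever the table says NO.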
    

    IMPORTANT : I continue in another post, below, because a single post can’t hold more than 16384 characters !!

    guy038



  • So, Valerio,

    I slightly improved the above table by adding the corresponding Windows-1252 hex code of each character. For instance, the Mac OS Roman hex value 80 represents the Ä character, so it must be replaced with the hex code \xC4 in order to be displayed correctly in a document with the ANSI ( Windows-1252 ) encoding.

    Note that some characters which exist in the MAC OS Roman encoding DON’T exist in the Windows-1252 encoding. These are the characters with the following Mac OS Roman codes :

    [\xAD\xB0\xB2\xB3\xB6\xB7\xB8\xB9\xBA\xBD\xC3\xC5\xC6\xD7\xDA\xDE\xDF\xF0\xF5\xF9\xFA\xFB\xFD\xFE\xFF]

    For these characters, NO appears in the fourth column, and the fifth column ( Windows-1252 hex value ) is left empty.


    I found an awful, but correct, regex which converts a MAC OS Roman text into a Windows-1252 text. Basically, this regex matches two types of characters :

    • Any character whose code \xnn has a Windows-1252 equivalent is changed into the corresponding Windows-1252 code, in order to get the same glyph. For instance, the hex code (\x80) ( group 1 ) is replaced with \xC4, thanks to the conditional replacement (?1\xC4)

    • Any character from the list ([\xAD\xB0\xB2\xB3\xB6\xB7\xB8\xB9\xBA\xBD\xC3\xC5\xC6\xD7\xDA\xDE\xDF\xF0\xF5\xF9\xFA\xFB\xFD\xFE\xFF]) ( the last group, 104 ), which has NO corresponding code in the Windows-1252 encoding, is replaced with the usual question mark character ?, of hex code \x3F

    Of course, any character of code < \x80 is NOT changed at all !

    Note that the (?-i) modifier, at the beginning of the search regex, forces the regex engine to take case into account ( a case-sensitive search ), even if you didn’t check the Match case option
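
    For comparison, the same byte-for-byte mapping can be sketched outside Notepad++. This is only an illustration, assuming a standalone Python 3 interpreter and its built-in mac_roman and cp1252 codecs; the function name mac_roman_to_cp1252 is hypothetical. Like the regex, it turns characters without a Windows-1252 equivalent into a question mark.

```python
def mac_roman_to_cp1252(data: bytes) -> bytes:
    """Re-encode Mac OS Roman bytes as Windows-1252 bytes.

    Characters with no Windows-1252 equivalent become b"?", which mirrors
    the last group of the regex. Bytes below 0x80 pass through unchanged.
    """
    text = data.decode("mac_roman")  # every byte value is defined in Mac OS Roman
    return text.encode("cp1252", errors="replace")


# "perché" followed by a not-equal sign, as Mac OS Roman bytes
sample = b"perch\x8e \xad"
converted = mac_roman_to_cp1252(sample)
```

    Here converted keeps the é ( as \xE9 ) and replaces the ≠ sign with ?. Encoding the decoded text to UTF-8 instead of cp1252 would lose nothing, since every Mac OS Roman character has a Unicode code point.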


    So, Valerio, follow the few steps, below :

    • Open Notepad++

    • Open a new document ( CTRL + N )

    • If necessary, choose the ANSI encoding ( Menu option Encoding - Convert to ANSI )

    • Paste your MAC OS Roman text into this new document

    –> At this point, your text should still show some wrong accented characters !

    • Move back to the very beginning of the file ( CTRL + Home )

    • Open the Replace dialog ( CTRL + H )

    • Select the Regular expression search mode

    • In the Find what field, type the regex, below :

    (?-i)(\x80)|(\x81)|(\x82)|(\x83)|(\x84)|(\x85)|(\x86)|(\x87)|(\x88)|(\x89)|(\x8A)|(\x8B)|(\x8C)|(\x8D)|(\x8E)|(\x8F)|(\x90)|(\x91)|(\x92)|(\x93)|(\x94)|(\x95)|(\x96)|(\x97)|(\x98)|(\x99)|(\x9A)|(\x9B)|(\x9C)|(\x9D)|(\x9E)|(\x9F)|(\xA0)|(\xA1)|(\xA2)|(\xA3)|(\xA4)|(\xA5)|(\xA6)|(\xA7)|(\xA8)|(\xA9)|(\xAA)|(\xAB)|(\xAC)|(\xAE)|(\xAF)|(\xB1)|(\xB4)|(\xB5)|(\xBB)|(\xBC)|(\xBE)|(\xBF)|(\xC0)|(\xC1)|(\xC2)|(\xC4)|(\xC7)|(\xC8)|(\xC9)|(\xCA)|(\xCB)|(\xCC)|(\xCD)|(\xCE)|(\xCF)|(\xD0)|(\xD1)|(\xD2)|(\xD3)|(\xD4)|(\xD5)|(\xD6)|(\xD8)|(\xD9)|(\xDB)|(\xDC)|(\xDD)|(\xE0)|(\xE1)|(\xE2)|(\xE3)|(\xE4)|(\xE5)|(\xE6)|(\xE7)|(\xE8)|(\xE9)|(\xEA)|(\xEB)|(\xEC)|(\xED)|(\xEE)|(\xEF)|(\xF1)|(\xF2)|(\xF3)|(\xF4)|(\xF6)|(\xF7)|(\xF8)|(\xFC)|([\xAD\xB0\xB2\xB3\xB6\xB7\xB8\xB9\xBA\xBD\xC3\xC5\xC6\xD7\xDA\xDE\xDF\xF0\xF5\xF9\xFA\xFB\xFD\xFE\xFF])

    • In the Replace with field, type the replacement expression, below :

    (?1\xC4)(?2\xC5)(?3\xC7)(?4\xC9)(?5\xD1)(?6\xD6)(?7\xDC)(?8\xE1)(?9\xE0)(?10\xE2)(?11\xE4)(?12\xE3)(?13\xE5)(?14\xE7)(?15\xE9)(?16\xE8)(?17\xEA)(?18\xEB)(?19\xED)(?20\xEC)(?21\xEE)(?22\xEF)(?23\xF1)(?24\xF3)(?25\xF2)(?26\xF4)(?27\xF6)(?28\xF5)(?29\xFA)(?30\xF9)(?31\xFB)(?32\xFC)(?33\x86)(?34\xB0)(?35\xA2)(?36\xA3)(?37\xA7)(?38\x95)(?39\xB6)(?40\xDF)(?41\xAE)(?42\xA9)(?43\x99)(?44\xB4)(?45\xA8)(?46\xC6)(?47\xD8)(?48\xB1)(?49\xA5)(?50\xB5)(?51\xAA)(?52\xBA)(?53\xE6)(?54\xF8)(?55\xBF)(?56\xA1)(?57\xAC)(?58\x83)(?59\xAB)(?60\xBB)(?61\x85)(?62\xA0)(?63\xC0)(?64\xC3)(?65\xD5)(?66\x8C)(?67\x9C)(?68\x96)(?69\x97)(?70\x93)(?71\x94)(?72\x91)(?73\x92)(?74\xF7)(?75\xFF)(?76\x9F)(?77\x80)(?78\x8B)(?79\x9B)(?80\x87)(?81\xB7)(?82\x82)(?83\x84)(?84\x89)(?85\xC2)(?86\xCA)(?87\xC1)(?88\xCB)(?89\xC8)(?90\xCD)(?91\xCE)(?92\xCF)(?93\xCC)(?94\xD3)(?95\xD4)(?96\xD2)(?97\xDA)(?98\xDB)(?99\xD9)(?{100}\x88)(?{101}\x98)(?{102}\xAF)(?{103}\xB8)(?{104}\x3F)

    • Click on the Replace All button

    Et voilà ! This time, after that S/R, the text should be correctly displayed :-)) Then :

    • Select the Menu option Encoding - Convert to UTF-8 OR Encoding - Convert to UTF-8 BOM

    • Finally, save this changed file !

    Best Regards,

    guy038

    P.S. :

    BTW, Claudia, if you see this post : I noticed a Python script called mac_roman.py in the folder …\Plugins\PythonScript\lib\encodings. Unfortunately, I couldn’t make it work. I suppose that it converts a MAC Roman text into standard UTF-8 text !



  • Hello guy038,

    This file isn’t supposed to be used directly. It’s part of the codecs module, which uses it internally
    when you specify that codec, e.g.

    import codecs
    
    with codecs.open(r'd:\macroman.txt', 'r', encoding='macroman') as fin:
        file_content = fin.read()
    
    with codecs.open(r'd:\macroman_utf8.txt', 'w', encoding='utf-8') as fout:
        fout.write(file_content)
    

    The first with block opens a file which is assumed to be macroman encoded, reads it, stores the content in the variable file_content and
    closes the file automatically.

    The next with block writes the content to a new file with utf-8 encoding and, again, closes it automatically.

    Cheers
    Claudia



  • Thank you very much, guy038.
    I’m sure that’d do the trick. I’m just surprised we don’t have an option for Mac Roman right in the menu. Do you think there’s a reason for that? It’s just another encoding method, right? Is it fundamentally different from any of the iso-8859-x?



  • Hi, Valerio,

    Do you think there’s a reason for that?

    Maybe it’s just because Notepad++ is rather “Windows oriented” !

    Is it fundamentally different from any of the iso-8859-x?

    Not at all. It’s just another single-byte encoding, like all the others !
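
    To illustrate the point, here is a small sketch, assuming a standalone Python 3 interpreter, which decodes one byte under several single-byte encodings. The function name decode_with and the choice of encodings are only for illustration.

```python
def decode_with(byte_value, encodings=("mac_roman", "cp1252", "iso8859_1", "iso8859_15")):
    """Decode a single byte under each encoding; None means the byte is undefined there."""
    raw = bytes([byte_value])
    result = {}
    for enc in encodings:
        try:
            result[enc] = raw.decode(enc)
        except UnicodeDecodeError:
            result[enc] = None
    return result


# 0xE8 is the byte that Windows-1252 and ISO-8859-1 use for è
same_byte = decode_with(0xE8)
```

    Each encoding is just a different 256-entry lookup table: the byte 0xE8 is è in Windows-1252 and the two ISO-8859 variants, but Ë in Mac OS Roman, which stores è at 0x8F instead.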


    You may also ask for the MAC Roman encoding to be added to N++, at the address below :

    https://github.com/notepad-plus-plus/notepad-plus-plus/pulls

    However, as you can see, you have to be ( very ) patient :-(( There are plenty of requests, in that place !!

    Cheers,

    guy038




  • At least there is a Windows code page for it, Code Page 10000 ( Macintosh Roman ):
    https://msdn.microsoft.com/en-us/library/cc195076.aspx
    so adding support should not be too complicated.



  • I’m trying to find a way to convert to macRoman (needed for embedding subtitles into QuickTime videos).
    I am trying to use the Python scripting option,
    using:

    import codecs
    with codecs.open(r'd:\utf8.txt', 'r', encoding='utf-8') as fin:
        file_content = fin.read()
    with codecs.open(r'd:\macroman.txt', 'w', encoding='macroman') as fout:
        fout.write(file_content)

    I end up with macroman.txt encoded as ANSI, and empty.

    Any help here would be much appreciated.



  • @Ryan-Webber

    What about printing file_content to the Python Script console, using

    console.write(file_content)

    Cheers
    Claudia
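
    The thread does not establish why the output file ended up empty. One possibility, stated here purely as an assumption, is that the text contains a character with no Mac OS Roman equivalent ( a UTF-8 byte order mark, for instance ), so the write raises an error after the file has already been created. The sketch below, for a standalone Python 3 interpreter rather than the PythonScript plugin, strips a leading BOM and replaces unmappable characters with ? instead of failing. The function name and the paths in the usage note are hypothetical.

```python
def utf8_file_to_mac_roman(src_path, dst_path, errors="replace"):
    """Convert a UTF-8 text file to Mac OS Roman.

    "utf-8-sig" drops a leading BOM if present. With errors="replace",
    characters that Mac OS Roman cannot represent are written as "?";
    pass errors="strict" to get a UnicodeEncodeError instead.
    """
    # newline="" keeps the original line endings untouched.
    with open(src_path, "r", encoding="utf-8-sig", newline="") as fin:
        text = fin.read()
    with open(dst_path, "w", encoding="mac_roman", errors=errors, newline="") as fout:
        fout.write(text)
```

    A call such as utf8_file_to_mac_roman(r'd:\utf8.txt', r'd:\macroman.txt') would then mirror the script above. Running it once with errors="strict" is a quick way to find out whether an unmappable character is really the cause.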




  • I did find an Encoding option for it, but it’s buried where you wouldn’t expect!

    Encoding > Character Sets > Cyrillic > Macintosh

    Presto, all the Õ symbols become the characters that were typed on the Mac!

