    Mac encoding

    • Valerio J

      Hi everyone.

      A friend sent me a plain text document he created on a Mac. We’re both Italian, so naturally there are a lot of è, é, ì characters involved, which I cannot see correctly in his file. By inspecting the document with a hex editor, I’ve come to the conclusion that he must be using this encoding:
      https://en.wikipedia.org/wiki/Mac_OS_Roman
      Normally I would reopen the document with that encoding and then, perhaps, convert it to UTF-8 so I can work on it more portably. But I can’t find an entry for Mac OS Roman in the encodings list.
      Is it completely unsupported?
      Thanks.

      • khouya abelwahed

        thanks for this post :D

        • guy038

          Hello Valerio,

          As you’re Italian, your default ANSI encoding is probably Windows-1252. Refer to the link below :

          https://msdn.microsoft.com/en-us/goglobal/cc305145

          You can verify this by opening the Character Panel ( Menu option Edit - Character Panel ). The list displayed should be the same as the Microsoft table above !

          Below is a table of the Mac OS Roman encoding, for characters above \x7F only. Remember that characters with a code point < \x80 are identical in any Windows, OEM, ISO or UTF-8 encoding.

          •-------------------------------------------------------------------------------------------------•
          |                        MAC OS Roman Encoding  ( Windows Code Page 10000 )                       |
          •--------------•-------•------------------•--------•----------------------------------------------•
          | MAC OS Roman | Char. |   Windows-1252   |  UNI-  |                   UNICODE                    |
          |--------------|       |------------------•  CODE  |                                              |
          | Hexa | Deci. | Glyph | Encoded |  Hexa  | Value  |                Character Name                |
          •------•-------•-------•---------•--------•--------•----------------------------------------------•
          |  80  |  128  |   Ä   |         |   C4   |  00C4  |  LATIN CAPITAL LETTER A WITH DIAERESIS       |
          |  81  |  129  |   Å   |         |   C5   |  00C5  |  LATIN CAPITAL LETTER A WITH RING ABOVE      |
          |  82  |  130  |   Ç   |         |   C7   |  00C7  |  LATIN CAPITAL LETTER C WITH CEDILLA         |
          |  83  |  131  |   É   |         |   C9   |  00C9  |  LATIN CAPITAL LETTER E WITH ACUTE           |
          |  84  |  132  |   Ñ   |         |   D1   |  00D1  |  LATIN CAPITAL LETTER N WITH TILDE           |
          |  85  |  133  |   Ö   |         |   D6   |  00D6  |  LATIN CAPITAL LETTER O WITH DIAERESIS       |
          |  86  |  134  |   Ü   |         |   DC   |  00DC  |  LATIN CAPITAL LETTER U WITH DIAERESIS       |
          |  87  |  135  |   á   |         |   E1   |  00E1  |  LATIN SMALL LETTER A WITH ACUTE             |
          |  88  |  136  |   à   |         |   E0   |  00E0  |  LATIN SMALL LETTER A WITH GRAVE             |
          |  89  |  137  |   â   |         |   E2   |  00E2  |  LATIN SMALL LETTER A WITH CIRCUMFLEX        |
          |  8A  |  138  |   ä   |         |   E4   |  00E4  |  LATIN SMALL LETTER A WITH DIAERESIS         |
          |  8B  |  139  |   ã   |         |   E3   |  00E3  |  LATIN SMALL LETTER A WITH TILDE             |
          |  8C  |  140  |   å   |         |   E5   |  00E5  |  LATIN SMALL LETTER A WITH RING ABOVE        |
          |  8D  |  141  |   ç   |         |   E7   |  00E7  |  LATIN SMALL LETTER C WITH CEDILLA           |
          |  8E  |  142  |   é   |         |   E9   |  00E9  |  LATIN SMALL LETTER E WITH ACUTE             |
          |  8F  |  143  |   è   |         |   E8   |  00E8  |  LATIN SMALL LETTER E WITH GRAVE             |
          •------•-------•-------•---------•--------•--------•----------------------------------------------•
          |  90  |  144  |   ê   |         |   EA   |  00EA  |  LATIN SMALL LETTER E WITH CIRCUMFLEX        |
          |  91  |  145  |   ë   |         |   EB   |  00EB  |  LATIN SMALL LETTER E WITH DIAERESIS         |
          |  92  |  146  |   í   |         |   ED   |  00ED  |  LATIN SMALL LETTER I WITH ACUTE             |
          |  93  |  147  |   ì   |         |   EC   |  00EC  |  LATIN SMALL LETTER I WITH GRAVE             |
          |  94  |  148  |   î   |         |   EE   |  00EE  |  LATIN SMALL LETTER I WITH CIRCUMFLEX        |
          |  95  |  149  |   ï   |         |   EF   |  00EF  |  LATIN SMALL LETTER I WITH DIAERESIS         |
          |  96  |  150  |   ñ   |         |   F1   |  00F1  |  LATIN SMALL LETTER N WITH TILDE             |
          |  97  |  151  |   ó   |         |   F3   |  00F3  |  LATIN SMALL LETTER O WITH ACUTE             |
          |  98  |  152  |   ò   |         |   F2   |  00F2  |  LATIN SMALL LETTER O WITH GRAVE             |
          |  99  |  153  |   ô   |         |   F4   |  00F4  |  LATIN SMALL LETTER O WITH CIRCUMFLEX        |
          |  9A  |  154  |   ö   |         |   F6   |  00F6  |  LATIN SMALL LETTER O WITH DIAERESIS         |
          |  9B  |  155  |   õ   |         |   F5   |  00F5  |  LATIN SMALL LETTER O WITH TILDE             |
          |  9C  |  156  |   ú   |         |   FA   |  00FA  |  LATIN SMALL LETTER U WITH ACUTE             |
          |  9D  |  157  |   ù   |         |   F9   |  00F9  |  LATIN SMALL LETTER U WITH GRAVE             |
          |  9E  |  158  |   û   |         |   FB   |  00FB  |  LATIN SMALL LETTER U WITH CIRCUMFLEX        |
          |  9F  |  159  |   ü   |         |   FC   |  00FC  |  LATIN SMALL LETTER U WITH DIAERESIS         |
          •------•-------•-------•---------•--------•--------•----------------------------------------------•
          |  A0  |  160  |   †   |         |   86   |  2020  |  DAGGER                                      |
          |  A1  |  161  |   °   |         |   B0   |  00B0  |  DEGREE SIGN                                 |
          |  A2  |  162  |   ¢   |         |   A2   |  00A2  |  CENT SIGN                                   |
          |  A3  |  163  |   £   |         |   A3   |  00A3  |  POUND SIGN                                  |
          |  A4  |  164  |   §   |         |   A7   |  00A7  |  SECTION SIGN                                |
          |  A5  |  165  |   •   |         |   95   |  2022  |  BULLET                                      |
          |  A6  |  166  |   ¶   |         |   B6   |  00B6  |  PILCROW SIGN                                |
          |  A7  |  167  |   ß   |         |   DF   |  00DF  |  LATIN SMALL LETTER SHARP S                  |
          |  A8  |  168  |   ®   |         |   AE   |  00AE  |  REGISTERED SIGN                             |
          |  A9  |  169  |   ©   |         |   A9   |  00A9  |  COPYRIGHT SIGN                              |
          |  AA  |  170  |   ™   |         |   99   |  2122  |  TRADE MARK SIGN                             |
          |  AB  |  171  |   ´   |         |   B4   |  00B4  |  ACUTE ACCENT                                |
          |  AC  |  172  |   ¨   |         |   A8   |  00A8  |  DIAERESIS                                   |
          |  AD  |  173  |   ≠   |   NO    |        |  2260  |  NOT EQUAL TO                                |
          |  AE  |  174  |   Æ   |         |   C6   |  00C6  |  LATIN CAPITAL LETTER AE                     |
          |  AF  |  175  |   Ø   |         |   D8   |  00D8  |  LATIN CAPITAL LETTER O WITH STROKE          |
          •------•-------•-------•---------•--------•--------•----------------------------------------------•
          |  B0  |  176  |   ∞   |   NO    |        |  221E  |  INFINITY                                    |
          |  B1  |  177  |   ±   |         |   B1   |  00B1  |  PLUS-MINUS SIGN                             |
          |  B2  |  178  |   ≤   |   NO    |        |  2264  |  LESS-THAN OR EQUAL TO                       |
          |  B3  |  179  |   ≥   |   NO    |        |  2265  |  GREATER-THAN OR EQUAL TO                    |
          |  B4  |  180  |   ¥   |         |   A5   |  00A5  |  YEN SIGN                                    |
          |  B5  |  181  |   µ   |         |   B5   |  00B5  |  MICRO SIGN                                  |
          |  B6  |  182  |   ∂   |   NO    |        |  2202  |  PARTIAL DIFFERENTIAL                        |
          |  B7  |  183  |   ∑   |   NO    |        |  2211  |  N-ARY SUMMATION                             |
          |  B8  |  184  |   ∏   |   NO    |        |  220F  |  N-ARY PRODUCT                               |
          |  B9  |  185  |   π   |   NO    |        |  03C0  |  GREEK SMALL LETTER PI                       |
          |  BA  |  186  |   ∫   |   NO    |        |  222B  |  INTEGRAL                                    |
          |  BB  |  187  |   ª   |         |   AA   |  00AA  |  FEMININE ORDINAL INDICATOR                  |
          |  BC  |  188  |   º   |         |   BA   |  00BA  |  MASCULINE ORDINAL INDICATOR                 |
          |  BD  |  189  |   Ω   |   NO    |        |  03A9  |  GREEK CAPITAL LETTER OMEGA                  |
          |  BE  |  190  |   æ   |         |   E6   |  00E6  |  LATIN SMALL LETTER AE                       |
          |  BF  |  191  |   ø   |         |   F8   |  00F8  |  LATIN SMALL LETTER O WITH STROKE            |
          •------•-------•-------•---------•--------•--------•----------------------------------------------•
          |  C0  |  192  |   ¿   |         |   BF   |  00BF  |  INVERTED QUESTION MARK                      |
          |  C1  |  193  |   ¡   |         |   A1   |  00A1  |  INVERTED EXCLAMATION MARK                   |
          |  C2  |  194  |   ¬   |         |   AC   |  00AC  |  NOT SIGN                                    |
          |  C3  |  195  |   √   |   NO    |        |  221A  |  SQUARE ROOT                                 |
          |  C4  |  196  |   ƒ   |         |   83   |  0192  |  LATIN SMALL LETTER F WITH HOOK              |
          |  C5  |  197  |   ≈   |   NO    |        |  2248  |  ALMOST EQUAL TO                             |
          |  C6  |  198  |   ∆   |   NO    |        |  2206  |  INCREMENT                                   |
          |  C7  |  199  |   «   |         |   AB   |  00AB  |  LEFT-POINTING DOUBLE ANGLE QUOTATION MARK   |
          |  C8  |  200  |   »   |         |   BB   |  00BB  |  RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK  |
          |  C9  |  201  |   …   |         |   85   |  2026  |  HORIZONTAL ELLIPSIS                         |
          |  CA  |  202  |       |         |   A0   |  00A0  |  NO-BREAK SPACE                              |
          |  CB  |  203  |   À   |         |   C0   |  00C0  |  LATIN CAPITAL LETTER A WITH GRAVE           |
          |  CC  |  204  |   Ã   |         |   C3   |  00C3  |  LATIN CAPITAL LETTER A WITH TILDE           |
          |  CD  |  205  |   Õ   |         |   D5   |  00D5  |  LATIN CAPITAL LETTER O WITH TILDE           |
          |  CE  |  206  |   Œ   |         |   8C   |  0152  |  LATIN CAPITAL LIGATURE OE                   |
          |  CF  |  207  |   œ   |         |   9C   |  0153  |  LATIN SMALL LIGATURE OE                     |
          •------•-------•-------•---------•--------•--------•----------------------------------------------•
          |  D0  |  208  |   –   |         |   96   |  2013  |  EN DASH                                     |
          |  D1  |  209  |   —   |         |   97   |  2014  |  EM DASH                                     |
          |  D2  |  210  |   “   |         |   93   |  201C  |  LEFT DOUBLE QUOTATION MARK                  |
          |  D3  |  211  |   ”   |         |   94   |  201D  |  RIGHT DOUBLE QUOTATION MARK                 |
          |  D4  |  212  |   ‘   |         |   91   |  2018  |  LEFT SINGLE QUOTATION MARK                  |
          |  D5  |  213  |   ’   |         |   92   |  2019  |  RIGHT SINGLE QUOTATION MARK                 |
          |  D6  |  214  |   ÷   |         |   F7   |  00F7  |  DIVISION SIGN                               |
          |  D7  |  215  |   ◊   |   NO    |        |  25CA  |  LOZENGE                                     |
          |  D8  |  216  |   ÿ   |         |   FF   |  00FF  |  LATIN SMALL LETTER Y WITH DIAERESIS         |
          |  D9  |  217  |   Ÿ   |         |   9F   |  0178  |  LATIN CAPITAL LETTER Y WITH DIAERESIS       |
          |  DA  |  218  |   ⁄   |   NO    |        |  2044  |  FRACTION SLASH                              |
          |  DB  |  219  |   €   |         |   80   |  20AC  |  EURO SIGN                                   |
          |  DC  |  220  |   ‹   |         |   8B   |  2039  |  SINGLE LEFT-POINTING ANGLE QUOTATION MARK   |
          |  DD  |  221  |   ›   |         |   9B   |  203A  |  SINGLE RIGHT-POINTING ANGLE QUOTATION MARK  |
          |  DE  |  222  |   fi   |   NO    |        |  FB01  |  LATIN SMALL LIGATURE FI                     |
          |  DF  |  223  |   fl   |   NO    |        |  FB02  |  LATIN SMALL LIGATURE FL                     |
          •------•-------•-------•---------•--------•--------•----------------------------------------------•
          |  E0  |  224  |   ‡   |         |   87   |  2021  |  DOUBLE DAGGER                               |
          |  E1  |  225  |   ·   |         |   B7   |  00B7  |  MIDDLE DOT                                  |
          |  E2  |  226  |   ‚   |         |   82   |  201A  |  SINGLE LOW-9 QUOTATION MARK                 |
          |  E3  |  227  |   „   |         |   84   |  201E  |  DOUBLE LOW-9 QUOTATION MARK                 |
          |  E4  |  228  |   ‰   |         |   89   |  2030  |  PER MILLE SIGN                              |
          |  E5  |  229  |   Â   |         |   C2   |  00C2  |  LATIN CAPITAL LETTER A WITH CIRCUMFLEX      |
          |  E6  |  230  |   Ê   |         |   CA   |  00CA  |  LATIN CAPITAL LETTER E WITH CIRCUMFLEX      |
          |  E7  |  231  |   Á   |         |   C1   |  00C1  |  LATIN CAPITAL LETTER A WITH ACUTE           |
          |  E8  |  232  |   Ë   |         |   CB   |  00CB  |  LATIN CAPITAL LETTER E WITH DIAERESIS       |
          |  E9  |  233  |   È   |         |   C8   |  00C8  |  LATIN CAPITAL LETTER E WITH GRAVE           |
          |  EA  |  234  |   Í   |         |   CD   |  00CD  |  LATIN CAPITAL LETTER I WITH ACUTE           |
          |  EB  |  235  |   Î   |         |   CE   |  00CE  |  LATIN CAPITAL LETTER I WITH CIRCUMFLEX      |
          |  EC  |  236  |   Ï   |         |   CF   |  00CF  |  LATIN CAPITAL LETTER I WITH DIAERESIS       |
          |  ED  |  237  |   Ì   |         |   CC   |  00CC  |  LATIN CAPITAL LETTER I WITH GRAVE           |
          |  EE  |  238  |   Ó   |         |   D3   |  00D3  |  LATIN CAPITAL LETTER O WITH ACUTE           |
          |  EF  |  239  |   Ô   |         |   D4   |  00D4  |  LATIN CAPITAL LETTER O WITH CIRCUMFLEX      |
          •------•-------•-------•---------•--------•--------•----------------------------------------------•
          |  F0  |  240  |      |   NO    |        |  F8FF  |  APPLE LOGO                                  |
          |  F1  |  241  |   Ò   |         |   D2   |  00D2  |  LATIN CAPITAL LETTER O WITH GRAVE           |
          |  F2  |  242  |   Ú   |         |   DA   |  00DA  |  LATIN CAPITAL LETTER U WITH ACUTE           |
          |  F3  |  243  |   Û   |         |   DB   |  00DB  |  LATIN CAPITAL LETTER U WITH CIRCUMFLEX      |
          |  F4  |  244  |   Ù   |         |   D9   |  00D9  |  LATIN CAPITAL LETTER U WITH GRAVE           |
          |  F5  |  245  |   ı   |   NO    |        |  0131  |  LATIN SMALL LETTER DOTLESS I                |
          |  F6  |  246  |   ˆ   |         |   88   |  02C6  |  MODIFIER LETTER CIRCUMFLEX ACCENT           |
          |  F7  |  247  |   ˜   |         |   98   |  02DC  |  SMALL TILDE                                 |
          |  F8  |  248  |   ¯   |         |   AF   |  00AF  |  MACRON                                      |
          |  F9  |  249  |   ˘   |   NO    |        |  02D8  |  BREVE                                       |
          |  FA  |  250  |   ˙   |   NO    |        |  02D9  |  DOT ABOVE                                   |
          |  FB  |  251  |   ˚   |   NO    |        |  02DA  |  RING ABOVE                                  |
          |  FC  |  252  |   ¸   |         |   B8   |  00B8  |  CEDILLA                                     |
          |  FD  |  253  |   ˝   |   NO    |        |  02DD  |  DOUBLE ACUTE ACCENT                         |
          |  FE  |  254  |   ˛   |   NO    |        |  02DB  |  OGONEK                                      |
          |  FF  |  255  |   ˇ   |   NO    |        |  02C7  |  CARON                                       |
          •------•-------•-------•---------•--------•--------•----------------------------------------------•
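
A few rows of the table above can be spot-checked with Python's built-in codecs. This is only a sketch: it assumes Python 3 and the standard `mac_roman` and `cp1252` codec names, and the helper `row` is a made-up name for illustration, not something from the thread.

```python
# Spot-check rows of the table with Python 3's standard codecs.
# `row` is a hypothetical helper, not part of any library.
def row(mac_byte):
    """Return (glyph, Windows-1252 byte or None, Unicode code point)."""
    glyph = bytes([mac_byte]).decode("mac_roman")
    try:
        win = glyph.encode("cp1252")[0]
    except UnicodeEncodeError:
        win = None  # a "NO" row: no Windows-1252 equivalent
    return glyph, win, ord(glyph)

# Print Mac byte, Windows-1252 byte and Unicode value for a few rows.
for b in (0x8E, 0x8F, 0x93, 0xAD):
    _, win, cp = row(b)
    win_hex = "NO" if win is None else f"{win:02X}"
    print(f"{b:02X} -> {win_hex}  U+{cp:04X}")
```

The three Italian letters from the original question (é, è, ì) sit at 8E, 8F and 93, which is why they look wrong when the file is read as Windows-1252.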
          

          IMPORTANT : I continue in another post, below, because a single post can’t hold more than 16384 characters !!

          guy038

          • guy038

            So, Valerio,

            I slightly improved the above table by adding the corresponding Windows-1252 hex code of each character, so that it is displayed correctly in a document with an ANSI or Windows-1252 encoding. ( For instance, the Mac OS Roman hex value 80 represents the Ä character, which must be replaced with the hex code \xC4 )

            Note that some characters of the Mac OS Roman encoding DON’T exist in the Windows-1252 encoding. These are the characters :

            [\xAD\xB0\xB2\xB3\xB6\xB7\xB8\xB9\xBA\xBD\xC3\xC5\xC6\xD7\xDA\xDE\xDF\xF0\xF5\xF9\xFA\xFB\xFD\xFE\xFF]

            For these characters, the mention NO has been added in the fourth column and NO corresponding Windows-1252 hex value is given in the fifth column
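
The list above can also be derived rather than typed by hand. A minimal sketch, assuming Python 3 and its standard `mac_roman` and `cp1252` codecs (the function name `unmappable_bytes` is invented here):

```python
# Find every Mac OS Roman byte >= 0x80 whose character cannot be
# encoded in Windows-1252. `unmappable_bytes` is a hypothetical name.
def unmappable_bytes():
    missing = []
    for b in range(0x80, 0x100):
        ch = bytes([b]).decode("mac_roman")
        try:
            ch.encode("cp1252")
        except UnicodeEncodeError:
            missing.append(b)
    return missing

missing = unmappable_bytes()
# Print the result in the same character-class notation as the post.
print("[" + "".join(f"\\x{b:02X}" for b in missing) + "]")
```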


            I came up with an ugly, but correct, regex which converts Mac OS Roman text to Windows-1252 text. Basically, this regex looks for two types of characters :

            • Any character code of the form \xnn that has a Windows-1252 equivalent is changed to its corresponding Windows-1252 code, in order to get the same glyph. For instance, the hex code (\x80) ( group 1 ) is replaced with \xC4, thanks to the replacement form (?1\xC4)

            • Any character code from the list ([\xAD\xB0\xB2\xB3\xB6\xB7\xB8\xB9\xBA\xBD\xC3\xC5\xC6\xD7\xDA\xDE\xDF\xF0\xF5\xF9\xFA\xFB\xFD\xFE\xFF]) ( last group, 104 ), which has NO corresponding code in the Windows-1252 encoding, is replaced with the usual question mark character ?, of hex code \x3F

            Of course, any character of code < \x80 is NOT changed at all !

            Note that the (?-i) form, at the beginning of the search regex, forces the regex engine to be case-sensitive, even if you didn’t check the Match case option


            So, Valerio, follow the few steps, below :

            • Open Notepad++

            • Open a new document ( CTRL + N )

            • If necessary, choose the ANSI encoding ( Menu option Encoding - Convert to ANSI )

            • Copy your MAC OS Roman text, in this new document

            –> At this point, some accented characters in your text will still be displayed incorrectly !

            • Move back to the very beginning of the file ( CTRL + Home )

            • Open the Replace dialog ( CTRL + H )

            • Select the Regular expression search mode

            • In the Find what field, type the regex, below :

            (?-i)(\x80)|(\x81)|(\x82)|(\x83)|(\x84)|(\x85)|(\x86)|(\x87)|(\x88)|(\x89)|(\x8A)|(\x8B)|(\x8C)|(\x8D)|(\x8E)|(\x8F)|(\x90)|(\x91)|(\x92)|(\x93)|(\x94)|(\x95)|(\x96)|(\x97)|(\x98)|(\x99)|(\x9A)|(\x9B)|(\x9C)|(\x9D)|(\x9E)|(\x9F)|(\xA0)|(\xA1)|(\xA2)|(\xA3)|(\xA4)|(\xA5)|(\xA6)|(\xA7)|(\xA8)|(\xA9)|(\xAA)|(\xAB)|(\xAC)|(\xAE)|(\xAF)|(\xB1)|(\xB4)|(\xB5)|(\xBB)|(\xBC)|(\xBE)|(\xBF)|(\xC0)|(\xC1)|(\xC2)|(\xC4)|(\xC7)|(\xC8)|(\xC9)|(\xCA)|(\xCB)|(\xCC)|(\xCD)|(\xCE)|(\xCF)|(\xD0)|(\xD1)|(\xD2)|(\xD3)|(\xD4)|(\xD5)|(\xD6)|(\xD8)|(\xD9)|(\xDB)|(\xDC)|(\xDD)|(\xE0)|(\xE1)|(\xE2)|(\xE3)|(\xE4)|(\xE5)|(\xE6)|(\xE7)|(\xE8)|(\xE9)|(\xEA)|(\xEB)|(\xEC)|(\xED)|(\xEE)|(\xEF)|(\xF1)|(\xF2)|(\xF3)|(\xF4)|(\xF6)|(\xF7)|(\xF8)|(\xFC)|([\xAD\xB0\xB2\xB3\xB6\xB7\xB8\xB9\xBA\xBD\xC3\xC5\xC6\xD7\xDA\xDE\xDF\xF0\xF5\xF9\xFA\xFB\xFD\xFE\xFF])

            • In the Replace with field, type the regex, below :

            (?1\xC4)(?2\xC5)(?3\xC7)(?4\xC9)(?5\xD1)(?6\xD6)(?7\xDC)(?8\xE1)(?9\xE0)(?10\xE2)(?11\xE4)(?12\xE3)(?13\xE5)(?14\xE7)(?15\xE9)(?16\xE8)(?17\xEA)(?18\xEB)(?19\xED)(?20\xEC)(?21\xEE)(?22\xEF)(?23\xF1)(?24\xF3)(?25\xF2)(?26\xF4)(?27\xF6)(?28\xF5)(?29\xFA)(?30\xF9)(?31\xFB)(?32\xFC)(?33\x86)(?34\xB0)(?35\xA2)(?36\xA3)(?37\xA7)(?38\x95)(?39\xB6)(?40\xDF)(?41\xAE)(?42\xA9)(?43\x99)(?44\xB4)(?45\xA8)(?46\xC6)(?47\xD8)(?48\xB1)(?49\xA5)(?50\xB5)(?51\xAA)(?52\xBA)(?53\xE6)(?54\xF8)(?55\xBF)(?56\xA1)(?57\xAC)(?58\x83)(?59\xAB)(?60\xBB)(?61\x85)(?62\xA0)(?63\xC0)(?64\xC3)(?65\xD5)(?66\x8C)(?67\x9C)(?68\x96)(?69\x97)(?70\x93)(?71\x94)(?72\x91)(?73\x92)(?74\xF7)(?75\xFF)(?76\x9F)(?77\x80)(?78\x8B)(?79\x9B)(?80\x87)(?81\xB7)(?82\x82)(?83\x84)(?84\x89)(?85\xC2)(?86\xCA)(?87\xC1)(?88\xCB)(?89\xC8)(?90\xCD)(?91\xCE)(?92\xCF)(?93\xCC)(?94\xD3)(?95\xD4)(?96\xD2)(?97\xDA)(?98\xDB)(?99\xD9)(?{100}\x88)(?{101}\x98)(?{102}\xAF)(?{103}\xB8)(?{104}\x3F)

            • Click on the Replace All button

            Et voilà ! This time, after that S/R, the text should be correctly displayed :-)) Then :

            • Select the Menu option Encoding - Convert to UTF-8 OR Encoding - Convert to UTF-8 BOM

            • Finally, save this changed file !
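
For comparison, the same two-stage conversion (Mac OS Roman to Windows-1252 with ? for unmappable characters, then on to UTF-8) can be sketched outside Notepad++. This assumes Python 3 and its standard codecs; the function names and the sample bytes are invented for illustration and are not part of the steps above.

```python
# Sketch of the conversion the regex performs, using Python 3 codecs.
# Function names and sample data are hypothetical.
def mac_roman_to_cp1252(data):
    """Re-encode Mac OS Roman bytes as Windows-1252.

    Characters with no Windows-1252 equivalent become b'?',
    like group 104 of the regex.
    """
    return data.decode("mac_roman").encode("cp1252", errors="replace")

def mac_roman_to_utf8(data):
    """Convert straight to UTF-8, which loses nothing."""
    return data.decode("mac_roman").encode("utf-8")

# "perché, città, così, π" as Mac OS Roman bytes
sample = b"perch\x8e, citt\x88, cos\x93, \xb9"
print(mac_roman_to_cp1252(sample))
print(mac_roman_to_utf8(sample))
```

Going directly to UTF-8 avoids the question marks, since every Mac OS Roman character has a Unicode code point.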

            Best Regards,

            guy038

            P.S. :

            BTW, Claudia, if you see this post : I saw a Python script called mac_roman.py in the folder …\Plugins\PythonScript\lib\encodings. Unfortunately, I couldn’t make it work. I suppose that it converts Mac Roman text to standard UTF-8 text !

            • Claudia Frank @guy038

              Hello guy038,

              this file isn’t supposed to be used directly. It’s part of the codecs module which uses it internally
              when you specify its codec. E.g.

              import codecs
              
              with codecs.open(r'd:\macroman.txt', 'r', encoding='macroman') as fin:
                  file_content = fin.read()
              
              with codecs.open(r'd:\macroman_utf8.txt', 'w', encoding='utf-8') as fout:
                  fout.write(file_content)
              

              The first with block opens a file that is assumed to be macroman encoded, reads it, stores the content in the variable file_content and
              closes the file automatically.

              The next with block writes that content to a new file with utf-8 encoding and, again, closes it automatically.

              Cheers
              Claudia

              • Valerio J

                Thank you very much, guy038.
                I’m sure that’d do the trick. I’m just surprised we don’t have an option for Mac Roman right in the menu. Do you think there’s a reason for that? It’s just another encoding method, right? Is it fundamentally different from any of the iso-8859-x?

                • guy038

                  Hi, Valerio,

                  Do you think there’s a reason for that?

                  Maybe it’s just because Notepad++ is rather “Windows oriented” !

                  Is it fundamentally different from any of the iso-8859-x?

                  Not at all. It’s just another single-byte encoding, like all the others !
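
To illustrate that point: like the iso-8859-x family, Mac OS Roman is a plain 256-entry lookup table, and only the upper half differs. A small sketch, assuming Python 3 and its standard `mac_roman` and `iso8859_1` codecs (the helper `table` is a made-up name):

```python
# Build the full byte -> character table for a single-byte codec.
# `table` is a hypothetical helper used only for this comparison.
def table(codec):
    return [bytes([b]).decode(codec) for b in range(256)]

mac = table("mac_roman")
latin1 = table("iso8859_1")

# Byte values on which the two encodings agree.
same = [b for b in range(256) if mac[b] == latin1[b]]
print("identical below 0x80:", all(b in same for b in range(0x80)))
print("agreeing byte values at or above 0x80:", len([b for b in same if b >= 0x80]))
```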


                  You may also ask for the Mac Roman encoding to be added to N++, at the address below :

                  https://github.com/notepad-plus-plus/notepad-plus-plus/pulls

                  However, as you can see, you have to be ( very ) patient :-(( There are plenty of requests there !!

                  Cheers,

                  guy038

                  • Ali Abdelzaher

                    Thanks for sharing such amazing info

                    • Hamza Hamza

                      Thanks for sharing

                      • chcg

                        At least Windows has a code page for it, Code Page 10000 ( Macintosh Roman ):
                        https://msdn.microsoft.com/en-us/library/cc195076.aspx
                        so adding support should not be too complicated.

                        • Ryan Webber

                          I’m trying to find a way to convert to macRoman (needed for embedding subtitles into QuickTime videos).
                          I am trying to use the Python scripting option,
                          using:

                          import codecs

                          with codecs.open(r'd:\utf8.txt', 'r', encoding='utf-8') as fin:
                              file_content = fin.read()

                          with codecs.open(r'd:\macroman.txt', 'w', encoding='macroman') as fout:
                              fout.write(file_content)

                          I end up with macroman.txt encoded as ANSI, and empty.

                          Any help here would be much appreciated.

                          • Claudia Frank @Ryan Webber

                            @Ryan-Webber

                            what about printing file_content to the python script console using

                            console.write(file_content)

                            Cheers
                            Claudia
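
The thread does not say why the output file ended up empty. One possibility (an assumption, not confirmed here) is that encoding raised an exception after the output file had already been created, for example because the text contains a character with no Mac OS Roman equivalent, or starts with a UTF-8 BOM. The sketch below, assuming Python 3, shows both cases; `utf8_to_mac_roman` is a hypothetical helper.

```python
# Reverse direction: UTF-8 bytes -> Mac OS Roman bytes.
# `utf8_to_mac_roman` is a hypothetical helper, not a library function.
def utf8_to_mac_roman(data, errors="strict"):
    # "utf-8-sig" drops a leading BOM if present; a BOM (U+FEFF) has
    # no Mac OS Roman equivalent and would otherwise fail to encode.
    text = data.decode("utf-8-sig")
    return text.encode("mac_roman", errors=errors)

ok = utf8_to_mac_roman("Perch\u00e9 no?".encode("utf-8"))
print(ok)

cyrillic = "\u041f\u0440\u0438\u0432\u0435\u0442".encode("utf-8")
try:
    utf8_to_mac_roman(cyrillic)  # strict mode: raises
except UnicodeEncodeError as exc:
    print("cannot encode:", exc)

# errors="replace" keeps going and writes '?' for each such character.
lossy = utf8_to_mac_roman(cyrillic, errors="replace")
print(lossy)
```

Printing the content first, as suggested above, helps tell a read problem apart from an encode problem.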

                            • Aziz Abdel

                              Thanks for sharing such amazing info

                              • J. Jones

                                I did find an Encoding option for it, but it’s buried where you wouldn’t expect!

                                Encoding > Character Sets > Cyrillic > Macintosh

                                Presto, all the Õ symbols become ’ symbols the way they were typed on the Mac!
