Mac encoding

guy038

So, Valerio,

I have slightly improved the table above by adding, for each character, the corresponding Windows-1252 hex code, so that the character is displayed correctly in a document with an ANSI ( Windows-1252 ) encoding. For instance, the Mac OS Roman hex value 80 represents the Ä character, which must be replaced with the hex code \xC4

Note that some characters of the Mac OS Roman encoding do NOT exist in the Windows-1252 encoding. These are the characters with the following Mac OS Roman codes :

[\xAD\xB0\xB2\xB3\xB6\xB7\xB8\xB9\xBA\xBD\xC3\xC5\xC6\xD7\xDA\xDE\xDF\xF0\xF5\xF9\xFA\xFB\xFD\xFE\xFF]

For these characters, NO has been added in the fourth column, and no corresponding Windows-1252 hex value is given in the fifth column
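
As a cross-check, the list above can be recomputed with Python's built-in mac_roman and cp1252 codecs. This is only a sketch, written for Python 3; the function name unmappable_mac_roman_bytes is made up for the illustration and is not part of any library.

```python
def unmappable_mac_roman_bytes():
    """Return the Mac OS Roman byte values (0x80-0xFF) whose character
    has no equivalent in Windows-1252."""
    missing = []
    for value in range(0x80, 0x100):
        # Decode the single byte as Mac OS Roman to get the character.
        char = bytes(bytearray([value])).decode("mac_roman")
        try:
            char.encode("cp1252")
        except UnicodeEncodeError:
            # The character cannot be represented in Windows-1252.
            missing.append(value)
    return missing


missing = unmappable_mac_roman_bytes()
# Print the values in the same \xNN notation as the character class.
print("[" + "".join("\\x%02X" % v for v in missing) + "]")
```

The printed class can then be compared, by eye or with a diff tool, against the one used in the regex.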

I found an awful, but correct, regex, which converts a Mac OS Roman text into a Windows-1252 text. Basically, this regex finds two types of characters :

Any character code of the form \xnn that has a Windows-1252 equivalent is changed into its corresponding Windows-1252 code, in order to get the same character glyph. For instance, the hex code \x80, captured as (\x80) ( group 1 ), is replaced with \xC4, thanks to the replacement form (?1\xC4)
Any character code from the list ([\xAD\xB0\xB2\xB3\xB6\xB7\xB8\xB9\xBA\xBD\xC3\xC5\xC6\xD7\xDA\xDE\xDF\xF0\xF5\xF9\xFA\xFB\xFD\xFE\xFF]) ( the last group, 104 ), which does NOT have any corresponding code in the Windows-1252 encoding, is replaced with the usual question mark character ?, of hex code \x3F

Of course, any character with a code < \x80 is NOT changed at all !
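
The same two rules can be sketched outside Notepad++ with Python's built-in codecs. This is a minimal illustration, assuming Python 3 and raw bytes read from the Mac file; the helper name mac_roman_to_cp1252 is hypothetical. The "replace" error handler plays the role of group 104 and writes a ? for every character that Windows-1252 lacks.

```python
def mac_roman_to_cp1252(raw):
    """Re-encode Mac OS Roman bytes as Windows-1252 bytes.

    Characters without a Windows-1252 equivalent become '?' (0x3F);
    bytes below 0x80 are plain ASCII and pass through unchanged.
    """
    text = raw.decode("mac_roman")
    return text.encode("cp1252", "replace")


# 0x80 is A-umlaut in Mac OS Roman, 0xAD is the not-equal sign.
print(repr(mac_roman_to_cp1252(b"abc \x80 \xad")))
```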

Note that the (?-i) modifier, at the beginning of the search regex, forces the regex engine to be case-sensitive, even if you did not check the Match case option

So, Valerio, follow the few steps, below :

Open Notepad++
Open a new document ( CTRL + N )
If necessary, choose the ANSI encoding ( Menu option Encoding - Convert to ANSI )
Paste your Mac OS Roman text into this new document

–> Well, at this point, some accented characters should still be displayed wrongly !

Move back to the very beginning of the file ( CTRL + Home )
Open the Replace dialog ( CTRL + H )
Select the Regular expression search mode
In the Find what field, type the regex, below :

(?-i)(\x80)|(\x81)|(\x82)|(\x83)|(\x84)|(\x85)|(\x86)|(\x87)|(\x88)|(\x89)|(\x8A)|(\x8B)|(\x8C)|(\x8D)|(\x8E)|(\x8F)|(\x90)|(\x91)|(\x92)|(\x93)|(\x94)|(\x95)|(\x96)|(\x97)|(\x98)|(\x99)|(\x9A)|(\x9B)|(\x9C)|(\x9D)|(\x9E)|(\x9F)|(\xA0)|(\xA1)|(\xA2)|(\xA3)|(\xA4)|(\xA5)|(\xA6)|(\xA7)|(\xA8)|(\xA9)|(\xAA)|(\xAB)|(\xAC)|(\xAE)|(\xAF)|(\xB1)|(\xB4)|(\xB5)|(\xBB)|(\xBC)|(\xBE)|(\xBF)|(\xC0)|(\xC1)|(\xC2)|(\xC4)|(\xC7)|(\xC8)|(\xC9)|(\xCA)|(\xCB)|(\xCC)|(\xCD)|(\xCE)|(\xCF)|(\xD0)|(\xD1)|(\xD2)|(\xD3)|(\xD4)|(\xD5)|(\xD6)|(\xD8)|(\xD9)|(\xDB)|(\xDC)|(\xDD)|(\xE0)|(\xE1)|(\xE2)|(\xE3)|(\xE4)|(\xE5)|(\xE6)|(\xE7)|(\xE8)|(\xE9)|(\xEA)|(\xEB)|(\xEC)|(\xED)|(\xEE)|(\xEF)|(\xF1)|(\xF2)|(\xF3)|(\xF4)|(\xF6)|(\xF7)|(\xF8)|(\xFC)|([\xAD\xB0\xB2\xB3\xB6\xB7\xB8\xB9\xBA\xBD\xC3\xC5\xC6\xD7\xDA\xDE\xDF\xF0\xF5\xF9\xFA\xFB\xFD\xFE\xFF])

In the Replace with field, type the regex, below :

(?1\xC4)(?2\xC5)(?3\xC7)(?4\xC9)(?5\xD1)(?6\xD6)(?7\xDC)(?8\xE1)(?9\xE0)(?10\xE2)(?11\xE4)(?12\xE3)(?13\xE5)(?14\xE7)(?15\xE9)(?16\xE8)(?17\xEA)(?18\xEB)(?19\xED)(?20\xEC)(?21\xEE)(?22\xEF)(?23\xF1)(?24\xF3)(?25\xF2)(?26\xF4)(?27\xF6)(?28\xF5)(?29\xFA)(?30\xF9)(?31\xFB)(?32\xFC)(?33\x86)(?34\xB0)(?35\xA2)(?36\xA3)(?37\xA7)(?38\x95)(?39\xB6)(?40\xDF)(?41\xAE)(?42\xA9)(?43\x99)(?44\xB4)(?45\xA8)(?46\xC6)(?47\xD8)(?48\xB1)(?49\xA5)(?50\xB5)(?51\xAA)(?52\xBA)(?53\xE6)(?54\xF8)(?55\xBF)(?56\xA1)(?57\xAC)(?58\x83)(?59\xAB)(?60\xBB)(?61\x85)(?62\xA0)(?63\xC0)(?64\xC3)(?65\xD5)(?66\x8C)(?67\x9C)(?68\x96)(?69\x97)(?70\x93)(?71\x94)(?72\x91)(?73\x92)(?74\xF7)(?75\xFF)(?76\x9F)(?77\x80)(?78\x8B)(?79\x9B)(?80\x87)(?81\xB7)(?82\x82)(?83\x84)(?84\x89)(?85\xC2)(?86\xCA)(?87\xC1)(?88\xCB)(?89\xC8)(?90\xCD)(?91\xCE)(?92\xCF)(?93\xCC)(?94\xD3)(?95\xD4)(?96\xD2)(?97\xDA)(?98\xDB)(?99\xD9)(?{100}\x88)(?{101}\x98)(?{102}\xAF)(?{103}\xB8)(?{104}\x3F)

Click on the Replace All button

Et voilà ! This time, after that search/replace, the text should be correctly displayed :-)) Then :

Select the Menu option Encoding - Convert to UTF-8 OR Encoding - Convert to UTF-8 BOM
Finally, save this changed file !
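
For anyone who would rather not trust a hand-typed regex of that size, the Find what and Replace with strings can be regenerated from Python's mac_roman and cp1252 codecs. This is a sketch under a few assumptions: Python 3, the hypothetical function name build_search_and_replace, and the same group notation as above ( (?N...) below 100, (?{N}...) from 100 on ).

```python
def build_search_and_replace():
    """Build the Find what / Replace with strings for the
    Mac OS Roman -> Windows-1252 conversion."""
    mapped = []    # (Mac OS Roman byte, Windows-1252 byte) pairs
    unmapped = []  # Mac OS Roman bytes with no Windows-1252 equivalent
    for value in range(0x80, 0x100):
        char = bytes(bytearray([value])).decode("mac_roman")
        try:
            target = bytearray(char.encode("cp1252"))[0]
            mapped.append((value, target))
        except UnicodeEncodeError:
            unmapped.append(value)

    # One capturing group per convertible byte, then one final
    # character class for everything that cannot be converted.
    groups = ["(\\x%02X)" % src for src, _ in mapped]
    groups.append("([" + "".join("\\x%02X" % v for v in unmapped) + "])")
    search = "(?-i)" + "|".join(groups)

    def ref(n):
        # Conditional replacement prefix for group n.
        return "(?%d" % n if n < 100 else "(?{%d}" % n

    parts = [ref(i) + "\\x%02X)" % dst
             for i, (_, dst) in enumerate(mapped, 1)]
    parts.append(ref(len(mapped) + 1) + "\\x3F)")  # '?' for the rest
    return search, "".join(parts)


search, replace = build_search_and_replace()
print(search)
print(replace)
```

Comparing the two printed strings with the ones typed in the Replace dialog is a quick way to spot a copy/paste accident.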

Best Regards,

guy038

P.S. :

BTW, Claudia, if you see this post : I saw a Python script called mac_roman.py, in the folder …\Plugins\PythonScript\lib\encodings. Unfortunately, I couldn’t make it work. I suppose that it changes a Mac OS Roman text into a standard UTF-8 text !

Claudia Frank

Hello guy038,

this file isn’t supposed to be used directly. It’s part of the codecs module which uses it internally
when you specify its codec. E.g.

import codecs

with codecs.open(r'd:\macroman.txt', 'r', encoding='macroman') as fin:
    file_content = fin.read()

with codecs.open(r'd:\macroman_utf8.txt', 'w', encoding='utf-8') as fout:
    fout.write(file_content)

The first with block opens a file which is assumed to be macroman encoded, reads it, stores the content in the variable file_content and
closes the file automatically.

The next with block writes the content to a new file with utf-8 encoding and, again, closes the file automatically.

Cheers
Claudia

Valerio J

Thank you very much, guy038.
I’m sure that’d do the trick. I’m just surprised we don’t have an option for Mac Roman right in the menu. Do you think there’s a reason for that? It’s just another encoding method, right? Is it fundamentally different from any of the iso-8859-x?

guy038

Hi, Valerio,

Do you think there’s a reason for that?

Maybe it’s just because Notepad++ is rather “Windows oriented” !

Is it fundamentally different from any of the iso-8859-x?

Not at all. It’s just another encoding, like all the others !

You may, also, ask for adding the MAC Roman encoding, in N++, at the address, below :

https://github.com/notepad-plus-plus/notepad-plus-plus/pulls

However, as you can see, you have to be ( very ) patient :-(( There are plenty of requests, in that place !!

Cheers,

guy038

chcg

At least there is a code page for it on Windows, Code Page 10000 ( Macintosh Roman ):
https://msdn.microsoft.com/en-us/library/cc195076.aspx
so adding support should not be too complicated.

Ryan Webber

I’m trying to find a way to convert to macRoman. (needed for embedding subtitles into quicktime videos)
I am trying to use the python scripting option.
using:

import codecs
with codecs.open(r'd:\utf8.txt', 'r', encoding='utf-8') as fin:
    file_content = fin.read()
with codecs.open(r'd:\macroman.txt', 'w', encoding='macroman') as fout:
    fout.write(file_content)

I end up with macroman.txt encoded as ANSI, and empty.

Any help here would be much appreciated.

Claudia Frank

@Ryan-Webber

what about printing file_content to the python script console using

console.write(file_content)

Cheers
Claudia
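
One thing that may be worth checking (an assumption, nothing in the thread confirms it is the cause): if the UTF-8 file starts with a BOM, or contains a character that Mac OS Roman cannot represent, the strict encoder raises UnicodeEncodeError, and the output file, already opened for writing, can be left empty. The sketch below, written for Python 3 with a hypothetical helper name utf8_to_mac_roman, drops a leading BOM and substitutes ? for unsupported characters.

```python
import codecs


def utf8_to_mac_roman(raw, errors="replace"):
    """Convert UTF-8 bytes to Mac OS Roman bytes.

    'utf-8-sig' drops a leading BOM if there is one; errors="replace"
    turns characters missing from Mac OS Roman into '?' instead of
    raising UnicodeEncodeError.
    """
    text = raw.decode("utf-8-sig")
    return text.encode("mac_roman", errors)


# Sample: BOM + A-umlaut + a right arrow, which Mac OS Roman lacks.
sample = codecs.BOM_UTF8 + u"\u00c4rger \u2192 ok".encode("utf-8")
print(repr(utf8_to_mac_roman(sample)))
```

Passing errors="strict" instead makes the function raise on the first unsupported character, which helps to locate it.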

J. Jones

I did find an Encoding option for it, but it’s buried where you wouldn’t expect!

Encoding > Character Sets > Cyrillic > Macintosh

Presto, all the Õ symbols become ’ symbols the way they were typed on the Mac!
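
The Õ versus ’ mix-up comes down to a single byte, 0xD5, being read with two different tables. A small Python 3 sketch of that, using Python's mac_roman codec (which may not be the exact table behind that menu entry):

```python
raw = b"\xd5"

# Read as Windows-1252 (what an ANSI document shows): O with tilde.
as_windows_1252 = raw.decode("cp1252")

# Read as Mac OS Roman: right single quotation mark.
as_mac_roman = raw.decode("mac_roman")

print(repr(as_windows_1252), repr(as_mac_roman))
```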