Mac encoding
-
So, Valerio,
I slightly improved the above table by adding the corresponding Windows-1252 hex code of each character. For instance, the Mac OS Roman hex value 80 represents the Ä character, which must be replaced with the hex code \xC4 in order to be displayed correctly in a document with an ANSI or Windows-1252 encoding.
Note that some characters which can be displayed in Mac OS Roman encoding DON’T exist in Windows-1252 encoding. These are the characters :
[\xAD\xB0\xB2\xB3\xB6\xB7\xB8\xB9\xBA\xBD\xC3\xC5\xC6\xD7\xDA\xDE\xDF\xF0\xF5\xF9\xFA\xFB\xFD\xFE\xFF]
For these characters, NO has been entered in the fourth column, and no corresponding Windows-1252 hex value is given in the fifth column.
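The list above can be cross-checked with a few lines of Python. This is only a sketch: it assumes a standalone Python 3 interpreter (not the PythonScript plugin) and relies on Python's built-in mac_roman and cp1252 codecs, whose mapping tables may differ in small details from the table in this thread. The function name is made up for the illustration.

```python
def unmapped_mac_roman_bytes():
    """Return the byte values 0x80-0xFF whose Mac OS Roman character
    cannot be encoded in Windows-1252 (according to Python's codecs)."""
    missing = []
    for value in range(0x80, 0x100):
        char = bytes([value]).decode("mac_roman")
        try:
            char.encode("cp1252")
        except UnicodeEncodeError:
            missing.append(value)
    return missing

missing = unmapped_mac_roman_bytes()
# Print the count, then the values in the same \xNN notation as the regex
print(len(missing), "".join("\\x%02X" % v for v in missing))
```

The printed values can then be compared, one by one, with the character class used at the end of the search regex.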
I came up with an ugly, but correct, regex which converts Mac OS Roman text into Windows-1252 text. Basically, this regex handles two types of characters :
-
Any character code of the form \xnn which has a Windows-1252 equivalent is changed into its corresponding Windows-1252 code, in order to get the same character glyph. For instance, the hex code \x80, matched by (\x80) ( group 1 ), is replaced with \xC4, thanks to the replacement form (?1\xC4)
-
Any character code from the list
([\xAD\xB0\xB2\xB3\xB6\xB7\xB8\xB9\xBA\xBD\xC3\xC5\xC6\xD7\xDA\xDE\xDF\xF0\xF5\xF9\xFA\xFB\xFD\xFE\xFF])
( last group, 104 ), which DOESN’T have any corresponding code in Windows-1252 encoding, is replaced with the usual question mark character ?, of hex code \x3F
Of course, any character with a code below \x80 is NOT changed at all !
Note that the (?-i) form, at the beginning of the search regex, forces the regex engine to be case-sensitive, even if you didn’t check the Match case option
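For comparison, the same byte-for-byte substitution can be sketched in Python 3 with a 256-entry translation table. This is not the method used in this post, just an illustration of the same idea; it assumes Python's built-in mac_roman and cp1252 codecs, and the names build_table, MAC_TO_CP1252 and mac_roman_to_cp1252 are made up for the example.

```python
def build_table():
    """Build a 256-byte table: Mac OS Roman byte -> Windows-1252 byte."""
    table = bytearray(range(256))          # bytes below 0x80 map to themselves
    for value in range(0x80, 0x100):
        char = bytes([value]).decode("mac_roman")
        # errors="replace" yields b"?" when cp1252 has no such character,
        # which mirrors the last group ( 104 ) of the regex
        table[value] = char.encode("cp1252", errors="replace")[0]
    return bytes(table)

MAC_TO_CP1252 = build_table()

def mac_roman_to_cp1252(data):
    """Translate raw Mac OS Roman bytes into Windows-1252 bytes."""
    return data.translate(MAC_TO_CP1252)

print(mac_roman_to_cp1252(b"\x80"))   # the Mac OS Roman byte for the A-umlaut
```

As with the regex, every byte is handled independently of its neighbours, so the conversion never changes the length of the text.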
So, Valerio, follow the few steps, below :
-
Open Notepad++
-
Open a new document ( CTRL + N )
-
If necessary, choose the ANSI encoding ( Menu option Encoding - Convert to ANSI )
-
Paste your Mac OS Roman text into this new document
–> At this point, some accented characters of your text should still be displayed wrongly !
-
Move back to the very beginning of the file ( CTRL + Home )
-
Open the Replace dialog ( CTRL + H )
-
Select the Regular expression search mode
-
In the Find what field, type the regex, below :
(?-i)(\x80)|(\x81)|(\x82)|(\x83)|(\x84)|(\x85)|(\x86)|(\x87)|(\x88)|(\x89)|(\x8A)|(\x8B)|(\x8C)|(\x8D)|(\x8E)|(\x8F)|(\x90)|(\x91)|(\x92)|(\x93)|(\x94)|(\x95)|(\x96)|(\x97)|(\x98)|(\x99)|(\x9A)|(\x9B)|(\x9C)|(\x9D)|(\x9E)|(\x9F)|(\xA0)|(\xA1)|(\xA2)|(\xA3)|(\xA4)|(\xA5)|(\xA6)|(\xA7)|(\xA8)|(\xA9)|(\xAA)|(\xAB)|(\xAC)|(\xAE)|(\xAF)|(\xB1)|(\xB4)|(\xB5)|(\xBB)|(\xBC)|(\xBE)|(\xBF)|(\xC0)|(\xC1)|(\xC2)|(\xC4)|(\xC7)|(\xC8)|(\xC9)|(\xCA)|(\xCB)|(\xCC)|(\xCD)|(\xCE)|(\xCF)|(\xD0)|(\xD1)|(\xD2)|(\xD3)|(\xD4)|(\xD5)|(\xD6)|(\xD8)|(\xD9)|(\xDB)|(\xDC)|(\xDD)|(\xE0)|(\xE1)|(\xE2)|(\xE3)|(\xE4)|(\xE5)|(\xE6)|(\xE7)|(\xE8)|(\xE9)|(\xEA)|(\xEB)|(\xEC)|(\xED)|(\xEE)|(\xEF)|(\xF1)|(\xF2)|(\xF3)|(\xF4)|(\xF6)|(\xF7)|(\xF8)|(\xFC)|([\xAD\xB0\xB2\xB3\xB6\xB7\xB8\xB9\xBA\xBD\xC3\xC5\xC6\xD7\xDA\xDE\xDF\xF0\xF5\xF9\xFA\xFB\xFD\xFE\xFF])
- In the Replace with field, type the regex, below :
(?1\xC4)(?2\xC5)(?3\xC7)(?4\xC9)(?5\xD1)(?6\xD6)(?7\xDC)(?8\xE1)(?9\xE0)(?10\xE2)(?11\xE4)(?12\xE3)(?13\xE5)(?14\xE7)(?15\xE9)(?16\xE8)(?17\xEA)(?18\xEB)(?19\xED)(?20\xEC)(?21\xEE)(?22\xEF)(?23\xF1)(?24\xF3)(?25\xF2)(?26\xF4)(?27\xF6)(?28\xF5)(?29\xFA)(?30\xF9)(?31\xFB)(?32\xFC)(?33\x86)(?34\xB0)(?35\xA2)(?36\xA3)(?37\xA7)(?38\x95)(?39\xB6)(?40\xDF)(?41\xAE)(?42\xA9)(?43\x99)(?44\xB4)(?45\xA8)(?46\xC6)(?47\xD8)(?48\xB1)(?49\xA5)(?50\xB5)(?51\xAA)(?52\xBA)(?53\xE6)(?54\xF8)(?55\xBF)(?56\xA1)(?57\xAC)(?58\x83)(?59\xAB)(?60\xBB)(?61\x85)(?62\xA0)(?63\xC0)(?64\xC3)(?65\xD5)(?66\x8C)(?67\x9C)(?68\x96)(?69\x97)(?70\x93)(?71\x94)(?72\x91)(?73\x92)(?74\xF7)(?75\xFF)(?76\x9F)(?77\x80)(?78\x8B)(?79\x9B)(?80\x87)(?81\xB7)(?82\x82)(?83\x84)(?84\x89)(?85\xC2)(?86\xCA)(?87\xC1)(?88\xCB)(?89\xC8)(?90\xCD)(?91\xCE)(?92\xCF)(?93\xCC)(?94\xD3)(?95\xD4)(?96\xD2)(?97\xDA)(?98\xDB)(?99\xD9)(?{100}\x88)(?{101}\x98)(?{102}\xAF)(?{103}\xB8)(?{104}\x3F)
- Click on the Replace All button
Et voilà ! This time, after that S/R, the text should be correctly displayed :-)) Then :
-
Select the Menu option Encoding - Convert to UTF-8 OR Encoding - Convert to UTF-8 BOM
-
Finally, save this changed file !
Best Regards,
guy038
P.S. :
BTW, Claudia, if you see this post : I noticed a Python script called mac_roman.py in the folder …\Plugins\PythonScript\lib\encodings. Unfortunately, I couldn’t make it work. I suppose that it converts Mac Roman text into standard UTF-8 text !
-
-
Hello guy038,
this file isn’t supposed to be used directly. It’s part of the codecs module, which uses it internally
when you specify its codec. E.g.
import codecs
# read the file, assuming it is macroman encoded
with codecs.open(r'd:\macroman.txt', 'r', encoding='macroman') as fin:
    file_content = fin.read()
# write the same content back as utf-8
with codecs.open(r'd:\macroman_utf8.txt', 'w', encoding='utf-8') as fout:
    fout.write(file_content)
The first with block opens a file which is assumed to be macroman encoded, reads it, stores its content in the variable file_content and
closes the file automatically. The next with block writes the file with utf-8 encoding and, again, closes it automatically.
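Outside of the PythonScript plugin, the same two steps can be sketched with the built-in open() of Python 3. This is an assumption-laden variant, not the script above: the function name and the temporary file names are hypothetical, and newline="" is used so that line endings ( old Mac files often use CR only ) are left untouched.

```python
import os
import tempfile

def macroman_file_to_utf8(src_path, dst_path):
    # newline="" disables newline translation on both read and write
    with open(src_path, "r", encoding="mac_roman", newline="") as fin:
        content = fin.read()
    with open(dst_path, "w", encoding="utf-8", newline="") as fout:
        fout.write(content)

# Small self-contained demo in a temporary folder
with tempfile.TemporaryDirectory() as folder:
    src = os.path.join(folder, "macroman.txt")
    dst = os.path.join(folder, "macroman_utf8.txt")
    with open(src, "wb") as f:
        f.write(b"caf\x8e\r")              # "cafe" with e-acute, plus CR, as Mac Roman bytes
    macroman_file_to_utf8(src, dst)
    with open(dst, "rb") as f:
        converted = f.read()

print(converted)
```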
Cheers
Claudia -
Thank you very much, guy038.
I’m sure that’d do the trick. I’m just surprised we don’t have an option for Mac Roman right in the menu. Do you think there’s a reason for that? It’s just another encoding method, right? Is it fundamentally different from any of the iso-8859-x? -
Hi, Valerio,
Do you think there’s a reason for that?
Maybe it’s just because Notepad++ is rather “Windows oriented” !
Is it fundamentally different from any of the iso-8859-x?
Not at all. It’s just another encoding, like all the others !
You may also ask for the Mac Roman encoding to be added to N++, at the address below :
https://github.com/notepad-plus-plus/notepad-plus-plus/pulls
However, as you can see, you have to be ( very ) patient :-(( There are plenty of requests, in that place !!
Cheers,
guy038
-
At least there is a code page for it on Windows, Code Page 10000 (Macintosh Roman):
https://msdn.microsoft.com/en-us/library/cc195076.aspx
so adding support should not be too complicated. -
I’m trying to find a way to convert to macRoman (needed for embedding subtitles into QuickTime videos).
I am trying to use the Python scripting option, using:
import codecs
with codecs.open(r'd:\utf8.txt', 'r', encoding='utf-8') as fin:
    file_content = fin.read()
with codecs.open(r'd:\macroman.txt', 'w', encoding='macroman') as fout:
    fout.write(file_content)
I end up with macroman.txt encoded as ANSI, and empty.
Any help here would be much appreciated.
-
What about printing file_content to the Python Script console, using
console.write(file_content)
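One possible cause, which nothing in this thread confirms, is a character that macroman cannot encode, for instance a UTF-8 BOM at the start of the file: the write then raises an error after the output file has already been created, which leaves it empty. A minimal Python 3 sketch of that situation, working on bytes in memory ( the function name is made up for the example ):

```python
def utf8_bytes_to_macroman(raw, errors="strict"):
    # "utf-8-sig" decodes UTF-8 and drops a leading BOM, if there is one
    text = raw.decode("utf-8-sig")
    return text.encode("mac_roman", errors=errors)

sample = b"\xef\xbb\xbfcaf\xc3\xa9"        # BOM + "cafe" with e-acute, in UTF-8
print(utf8_bytes_to_macroman(sample))

# With plain "utf-8" the BOM survives as U+FEFF, which Mac Roman cannot encode
try:
    sample.decode("utf-8").encode("mac_roman")
except UnicodeEncodeError as exc:
    print("strict conversion failed:", exc)

# errors="replace" substitutes ? for anything Mac Roman cannot represent
print(utf8_bytes_to_macroman("\u65e5\u672c".encode("utf-8"), errors="replace"))
```

If the console shows an encoding error when running the original script, passing errors='replace' ( or cleaning the offending characters first ) would be one way around it.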
Cheers
Claudia -
I did find an Encoding option for it, but it’s buried where you wouldn’t expect!
Encoding > Character Sets > Cyrillic > Macintosh
Presto, all the Õ symbols become ’ symbols the way they were typed on the Mac!
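Why the Õ turns into ’ can be seen by decoding the same byte with different codecs. The sketch below uses Python 3's built-in codecs and assumes that the menu entry under Cyrillic corresponds to the Mac Cyrillic code page, which is a guess based on its name. If so, punctuation such as ’ would come out right, but accented letters would not, since Mac Cyrillic puts Cyrillic letters where Mac Roman has accented Latin ones.

```python
def decode_all(raw):
    """Decode the same bytes with three single-byte codecs."""
    return {codec: raw.decode(codec)
            for codec in ("cp1252", "mac_roman", "mac_cyrillic")}

# Byte 0xD5: the curly apostrophe typed on the Mac
print(decode_all(b"\xD5"))

# Byte 0x8E: an accented letter in Mac Roman, a Cyrillic letter in Mac Cyrillic
print(decode_all(b"\x8E"))
```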