Issue with Polish letters

Ekopalypse

That means, select Windows-1250

and if it looks ok - convert to utf8 - save it - done.

nightznero

@Ekopalypse Thanks Eko a lot!

Ekopalypse

@nightznero - my pleasure.

Alan Kilborn

I didn’t follow super-closely, but was there a reason not to convert to UTF-8 and then stay with that?

Ekopalypse

@Alan-Kilborn

We had to find the right (ansi) encoding first, otherwise the conversion to utf-8 would result in incorrect text.
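To illustrate the point, here is a minimal Python sketch (not something from Notepad++ itself). It assumes Python's built-in `cp1250` and `cp1252` codecs stand in for the Windows code pages, and it uses the word leży from this thread as sample data.

```python
# Bytes of the Polish word "leży" as they are stored in a Windows-1250 (ANSI) file.
raw = "leży".encode("cp1250")

# Wrong guess: read the bytes as Windows-1252, then convert to UTF-8.
# The mistake is now baked into the UTF-8 file.
wrong = raw.decode("cp1252")
wrong_utf8 = wrong.encode("utf-8")

# Right guess: read the bytes as Windows-1250, then convert to UTF-8.
right = raw.decode("cp1250")
right_utf8 = right.encode("utf-8")

print(wrong, wrong_utf8)
print(right, right_utf8)
```

The conversion itself never fails in either case, which is why the wrong guess is easy to miss: both results are valid UTF-8, but only one of them contains the intended letter.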

guy038

Hello, @nightznero, @alan-kilborn, @ekopalypse and All,

Encoding notions are really difficult to handle and are usually a nightmare for most of us !

Starting from @nightznero’s problem, I tried to build a method for guessing the right encoding of an ANSI-encoded file that contains wrongly displayed characters !

First, copy all the text, below, in the clipboard :

•--------•---------•---------•---------•---------•---------•---------•---------•
|  Code  | Win1250 | Win1251 | Win1252 | Win1253 | Win1254 | Win1257 | Win1258 |
•--------•---------•---------•---------•---------•---------•---------•---------•
|   80   |    €    |    Ђ    |    €    |    €    |    €    |    €    |    €    |
|   81   |    ◊    |    Ѓ    |    ◊    |    ◊    |    ◊    |    ◊    |    ◊    |
|   82   |    ‚    |    ‚    |    ‚    |    ‚    |    ‚    |    ‚    |    ‚    |
|   83   |    ◊    |    ѓ    |    ƒ    |    ƒ    |    ƒ    |    ◊    |    ƒ    |
|   84   |    „    |    „    |    „    |    „    |    „    |    „    |    „    |
|   85   |    …    |    …    |    …    |    …    |    …    |    …    |    …    |
|   86   |    †    |    †    |    †    |    †    |    †    |    †    |    †    |
|   87   |    ‡    |    ‡    |    ‡    |    ‡    |    ‡    |    ‡    |    ‡    |
|   88   |    ◊    |    €    |    ˆ    |    ◊    |    ˆ    |    ◊    |    ˆ    |
|   89   |    ‰    |    ‰    |    ‰    |    ‰    |    ‰    |    ‰    |    ‰    |
|   8A   |    Š    |    Љ    |    Š    |    ◊    |    Š    |    ◊    |    ◊    |
|   8B   |    ‹    |    ‹    |    ‹    |    ‹    |    ‹    |    ‹    |    ‹    |
|   8C   |    Ś    |    Њ    |    Œ    |    ◊    |    Œ    |    ◊    |    Œ    |
|   8D   |    Ť    |    Ќ    |    ◊    |    ◊    |    ◊    |    ¨    |    ◊    |
|   8E   |    Ž    |    Ћ    |    Ž    |    ◊    |    ◊    |    ˇ    |    ◊    |
|   8F   |    Ź    |    Џ    |    ◊    |    ◊    |    ◊    |    ¸    |    ◊    |
|   90   |    ◊    |    ђ    |    ◊    |    ◊    |    ◊    |    ◊    |    ◊    |
|   91   |    ‘    |    ‘    |    ‘    |    ‘    |    ‘    |    ‘    |    ‘    |
|   92   |    ’    |    ’    |    ’    |    ’    |    ’    |    ’    |    ’    |
|   93   |    “    |    “    |    “    |    “    |    “    |    “    |    “    |
|   94   |    ”    |    ”    |    ”    |    ”    |    ”    |    ”    |    ”    |
|   95   |    •    |    •    |    •    |    •    |    •    |    •    |    •    |
|   96   |    –    |    –    |    –    |    –    |    –    |    –    |    –    |
|   97   |    —    |    —    |    —    |    —    |    —    |    —    |    —    |
|   98   |    ◊    |    ◊    |    ˜    |    ◊    |    ˜    |    ◊    |    ˜    |
|   99   |    ™    |    ™    |    ™    |    ™    |    ™    |    ™    |    ™    |
|   9A   |    š    |    љ    |    š    |    ◊    |    š    |    ◊    |    ◊    |
|   9B   |    ›    |    ›    |    ›    |    ›    |    ›    |    ›    |    ›    |
|   9C   |    ś    |    њ    |    œ    |    ◊    |    œ    |    ◊    |    œ    |
|   9D   |    ť    |    ќ    |    ◊    |    ◊    |    ◊    |    ¯    |    ◊    |
|   9E   |    ž    |    ћ    |    ž    |    ◊    |    ◊    |    ˛    |    ◊    |
|   9F   |    ź    |    џ    |    Ÿ    |    ◊    |    Ÿ    |    ◊    |    Ÿ    |
•--------•---------•---------•---------•---------•---------•---------•---------•
|   A0   |         |         |         |         |         |         |         |
|   A1   |    ˇ    |    Ў    |    ¡    |    ΅    |    ¡    |    ◊    |    ¡    |
|   A2   |    ˘    |    ў    |    ¢    |    Ά    |    ¢    |    ¢    |    ¢    |
|   A3   |    Ł    |    Ј    |    £    |    £    |    £    |    £    |    £    |
|   A4   |    ¤    |    ¤    |    ¤    |    ¤    |    ¤    |    ¤    |    ¤    |
|   A5   |    Ą    |    Ґ    |    ¥    |    ¥    |    ¥    |    ◊    |    ¥    |
|   A6   |    ¦    |    ¦    |    ¦    |    ¦    |    ¦    |    ¦    |    ¦    |
|   A7   |    §    |    §    |    §    |    §    |    §    |    §    |    §    |
|   A8   |    ¨    |    Ё    |    ¨    |    ¨    |    ¨    |    Ø    |    ¨    |
|   A9   |    ©    |    ©    |    ©    |    ©    |    ©    |    ©    |    ©    |
|   AA   |    Ş    |    Є    |    ª    |    ◊    |    ª    |    Ŗ    |    ª    |
|   AB   |    «    |    «    |    «    |    «    |    «    |    «    |    «    |
|   AC   |    ¬    |    ¬    |    ¬    |    ¬    |    ¬    |    ¬    |    ¬    |
|   AD   |        |        |        |        |        |        |        |
|   AE   |    ®    |    ®    |    ®    |    ®    |    ®    |    ®    |    ®    |
|   AF   |    Ż    |    Ї    |    ¯    |    ―    |    ¯    |    Æ    |    ¯    |
|   B0   |    °    |    °    |    °    |    °    |    °    |    °    |    °    |
|   B1   |    ±    |    ±    |    ±    |    ±    |    ±    |    ±    |    ±    |
|   B2   |    ˛    |    І    |    ²    |    ²    |    ²    |    ²    |    ²    |
|   B3   |    ł    |    і    |    ³    |    ³    |    ³    |    ³    |    ³    |
|   B4   |    ´    |    ґ    |    ´    |    ΄    |    ´    |    ´    |    ´    |
|   B5   |    µ    |    µ    |    µ    |    µ    |    µ    |    µ    |    µ    |
|   B6   |    ¶    |    ¶    |    ¶    |    ¶    |    ¶    |    ¶    |    ¶    |
|   B7   |    ·    |    ·    |    ·    |    ·    |    ·    |    ·    |    ·    |
|   B8   |    ¸    |    ё    |    ¸    |    Έ    |    ¸    |    ø    |    ¸    |
|   B9   |    ą    |    №    |    ¹    |    Ή    |    ¹    |    ¹    |    ¹    |
|   BA   |    ş    |    є    |    º    |    Ί    |    º    |    ŗ    |    º    |
|   BB   |    »    |    »    |    »    |    »    |    »    |    »    |    »    |
|   BC   |    Ľ    |    ј    |    ¼    |    Ό    |    ¼    |    ¼    |    ¼    |
|   BD   |    ˝    |    Ѕ    |    ½    |    ½    |    ½    |    ½    |    ½    |
|   BE   |    ľ    |    ѕ    |    ¾    |    Ύ    |    ¾    |    ¾    |    ¾    |
|   BF   |    ż    |    ї    |    ¿    |    Ώ    |    ¿    |    æ    |    ¿    |
•--------•---------•---------•---------•---------•---------•---------•---------•
|   C0   |    Ŕ    |    А    |    À    |    ΐ    |    À    |    Ą    |    À    |
|   C1   |    Á    |    Б    |    Á    |    Α    |    Á    |    Į    |    Á    |
|   C2   |    Â    |    В    |    Â    |    Β    |    Â    |    Ā    |    Â    |
|   C3   |    Ă    |    Г    |    Ã    |    Γ    |    Ã    |    Ć    |    Ă    |
|   C4   |    Ä    |    Д    |    Ä    |    Δ    |    Ä    |    Ä    |    Ä    |
|   C5   |    Ĺ    |    Е    |    Å    |    Ε    |    Å    |    Å    |    Å    |
|   C6   |    Ć    |    Ж    |    Æ    |    Ζ    |    Æ    |    Ę    |    Æ    |
|   C7   |    Ç    |    З    |    Ç    |    Η    |    Ç    |    Ē    |    Ç    |
|   C8   |    Č    |    И    |    È    |    Θ    |    È    |    Č    |    È    |
|   C9   |    É    |    Й    |    É    |    Ι    |    É    |    É    |    É    |
|   CA   |    Ę    |    К    |    Ê    |    Κ    |    Ê    |    Ź    |    Ê    |
|   CB   |    Ë    |    Л    |    Ë    |    Λ    |    Ë    |    Ė    |    Ë    |
|   CC   |    Ě    |    М    |    Ì    |    Μ    |    Ì    |    Ģ    |    ̀     |
|   CD   |    Í    |    Н    |    Í    |    Ν    |    Í    |    Ķ    |    Í    |
|   CE   |    Î    |    О    |    Î    |    Ξ    |    Î    |    Ī    |    Î    |
|   CF   |    Ď    |    П    |    Ï    |    Ο    |    Ï    |    Ļ    |    Ï    |
|   D0   |    Đ    |    Р    |    Ð    |    Π    |    Ğ    |    Š    |    Đ    |
|   D1   |    Ń    |    С    |    Ñ    |    Ρ    |    Ñ    |    Ń    |    Ñ    |
|   D2   |    Ň    |    Т    |    Ò    |    ◊    |    Ò    |    Ņ    |    ̉     |
|   D3   |    Ó    |    У    |    Ó    |    Σ    |    Ó    |    Ó    |    Ó    |
|   D4   |    Ô    |    Ф    |    Ô    |    Τ    |    Ô    |    Ō    |    Ô    |
|   D5   |    Ő    |    Х    |    Õ    |    Υ    |    Õ    |    Õ    |    Ơ    |
|   D6   |    Ö    |    Ц    |    Ö    |    Φ    |    Ö    |    Ö    |    Ö    |
|   D7   |    ×    |    Ч    |    ×    |    Χ    |    ×    |    ×    |    ×    |
|   D8   |    Ř    |    Ш    |    Ø    |    Ψ    |    Ø    |    Ų    |    Ø    |
|   D9   |    Ů    |    Щ    |    Ù    |    Ω    |    Ù    |    Ł    |    Ù    |
|   DA   |    Ú    |    Ъ    |    Ú    |    Ϊ    |    Ú    |    Ś    |    Ú    |
|   DB   |    Ű    |    Ы    |    Û    |    Ϋ    |    Û    |    Ū    |    Û    |
|   DC   |    Ü    |    Ь    |    Ü    |    ά    |    Ü    |    Ü    |    Ü    |
|   DD   |    Ý    |    Э    |    Ý    |    έ    |    İ    |    Ż    |    Ư    |
|   DE   |    Ţ    |    Ю    |    Þ    |    ή    |    Ş    |    Ž    |    ̃     |
|   DF   |    ß    |    Я    |    ß    |    ί    |    ß    |    ß    |    ß    |
•--------•---------•---------•---------•---------•---------•---------•---------•
|   E0   |    ŕ    |    а    |    à    |    ΰ    |    à    |    ą    |    à    |
|   E1   |    á    |    б    |    á    |    α    |    á    |    į    |    á    |
|   E2   |    â    |    в    |    â    |    β    |    â    |    ā    |    â    |
|   E3   |    ă    |    г    |    ã    |    γ    |    ã    |    ć    |    ă    |
|   E4   |    ä    |    д    |    ä    |    δ    |    ä    |    ä    |    ä    |
|   E5   |    ĺ    |    е    |    å    |    ε    |    å    |    å    |    å    |
|   E6   |    ć    |    ж    |    æ    |    ζ    |    æ    |    ę    |    æ    |
|   E7   |    ç    |    з    |    ç    |    η    |    ç    |    ē    |    ç    |
|   E8   |    č    |    и    |    è    |    θ    |    è    |    č    |    è    |
|   E9   |    é    |    й    |    é    |    ι    |    é    |    é    |    é    |
|   EA   |    ę    |    к    |    ê    |    κ    |    ê    |    ź    |    ê    |
|   EB   |    ë    |    л    |    ë    |    λ    |    ë    |    ė    |    ë    |
|   EC   |    ě    |    м    |    ì    |    μ    |    ì    |    ģ    |    ́     |
|   ED   |    í    |    н    |    í    |    ν    |    í    |    ķ    |    í    |
|   EE   |    î    |    о    |    î    |    ξ    |    î    |    ī    |    î    |
|   EF   |    ď    |    п    |    ï    |    ο    |    ï    |    ļ    |    ï    |
|   F0   |    đ    |    р    |    ð    |    π    |    ğ    |    š    |    đ    |
|   F1   |    ń    |    с    |    ñ    |    ρ    |    ñ    |    ń    |    ñ    |
|   F2   |    ň    |    т    |    ò    |    ς    |    ò    |    ņ    |    ̣     |
|   F3   |    ó    |    у    |    ó    |    σ    |    ó    |    ó    |    ó    |
|   F4   |    ô    |    ф    |    ô    |    τ    |    ô    |    ō    |    ô    |
|   F5   |    ő    |    х    |    õ    |    υ    |    õ    |    õ    |    ơ    |
|   F6   |    ö    |    ц    |    ö    |    φ    |    ö    |    ö    |    ö    |
|   F7   |    ÷    |    ч    |    ÷    |    χ    |    ÷    |    ÷    |    ÷    |
|   F8   |    ř    |    ш    |    ø    |    ψ    |    ø    |    ų    |    ø    |
|   F9   |    ů    |    щ    |    ù    |    ω    |    ù    |    ł    |    ù    |
|   FA   |    ú    |    ъ    |    ú    |    ϊ    |    ú    |    ś    |    ú    |
|   FB   |    ű    |    ы    |    û    |    ϋ    |    û    |    ū    |    û    |
|   FC   |    ü    |    ь    |    ü    |    ό    |    ü    |    ü    |    ü    |
|   FD   |    ý    |    э    |    ý    |    ύ    |    ı    |    ż    |    ư    |
|   FE   |    ţ    |    ю    |    þ    |    ώ    |    ş    |    ž    |    ₫    |
|   FF   |    ˙    |    я    |    ÿ    |    ◊    |    ÿ    |    ˙    |    ÿ    |
•--------•---------•---------•---------•---------•---------•---------•---------•

Note that, in this table, the ◊ character means that the character is not defined for the corresponding encoding !
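For anyone who would rather regenerate rows of this table than copy it, here is a rough Python sketch. It assumes that Python's built-in `cp125x` codecs agree with the Windows code pages (they may differ from what Windows displays for a few undefined bytes), and the names `CODE_PAGES`, `glyph` and `table_row` are made up for this illustration.

```python
# Code pages in the same column order as the table above.
CODE_PAGES = ["cp1250", "cp1251", "cp1252", "cp1253", "cp1254", "cp1257", "cp1258"]

def glyph(byte_value, code_page):
    """Return the character a byte maps to, or ◊ when the code page leaves it undefined."""
    try:
        return bytes([byte_value]).decode(code_page)
    except UnicodeDecodeError:
        return "◊"

def table_row(byte_value):
    """Build one table line: the hex code followed by one cell per code page."""
    cells = [glyph(byte_value, cp) for cp in CODE_PAGES]
    return "|   {:02X}   |".format(byte_value) + "".join("    {}    |".format(c) for c in cells)

# Print a few rows, BC to BF.
for b in range(0xBC, 0xC0):
    print(table_row(b))
```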

Open a new N++ tab ( Ctrl + N )
Run the command Encoding > Convert to UTF-8-BOM ( IMPORTANT )
Paste the clipboard contents in that new tab ( Ctrl + V )
Save this file as Windows_European_Encodings.txt
In the first wrongly displayed word of your ANSI file ( le¿y in @nightznero’s text ), select the wrong character ( ¿ )
Open the Find dialog ( Ctrl + F )
Tick the Match case and the Wrap around options
Select the Normal search mode
Switch back to the Windows_European_Encodings.txt file, that we just created
Click on the Find Next button

=> The caret should be on the line :

|   BF   |    ż    |    ї    |    ¿    |    Ώ    |    ¿    |    æ    |    ¿    |

Necessarily, the correct character, the one that should appear instead of the ¿ char, is one of the characters on that line !
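The same lookup can be sketched in Python, as an alternative to searching the table by hand. This is only an illustration: it assumes the file was shown as Windows-1252 (the `shown_as` parameter), that Python's `cp125x` codecs match the Windows code pages, and the function name `candidates` is invented here.

```python
CODE_PAGES = ["cp1250", "cp1251", "cp1252", "cp1253", "cp1254", "cp1257", "cp1258"]

def candidates(wrong_char, shown_as="cp1252"):
    """Map each code page to the character the same byte would display there.

    None means the byte is undefined in that code page.
    """
    raw = wrong_char.encode(shown_as)  # recover the byte behind the wrong glyph
    result = {}
    for cp in CODE_PAGES:
        try:
            result[cp] = raw.decode(cp)
        except UnicodeDecodeError:
            result[cp] = None
    return result

for cp, ch in candidates("¿").items():
    print(cp, ch)
```

Looking at the printed candidates plays the same role as reading the table line: a human still has to decide which character makes a real word.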

And @nightznero would have easily detected that the right character was ż, forming the word leży ! Now, as the ż belongs to the Windows-1250 encoding :

Select the command Encoding > Character Sets > Central European > Windows-1250

=> All the text seems, now, completely readable ;-))

So, encode this file with the UTF-8 encoding, running one of these two commands :
- Encoding > Convert to UTF-8
- Encoding > Convert to UTF-8-BOM
Save the changed contents ( Ctrl + S )
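Outside Notepad++, the same re-encoding step could look roughly like the Python sketch below. It assumes the source encoding is already known (Windows-1250 here), the function name `convert_to_utf8` and the file names are hypothetical, and `utf-8-sig` is Python's codec name for UTF-8 with a BOM.

```python
from pathlib import Path

def convert_to_utf8(src, dst, source_encoding="cp1250", with_bom=False):
    """Read an ANSI file with a known code page and write it back as UTF-8."""
    text = Path(src).read_bytes().decode(source_encoding)
    # "utf-8-sig" writes the 3-byte BOM first, plain "utf-8" does not.
    target = "utf-8-sig" if with_bom else "utf-8"
    Path(dst).write_bytes(text.encode(target))

# Hypothetical usage:
# convert_to_utf8("poem_ansi.txt", "poem_utf8.txt")
# convert_to_utf8("poem_ansi.txt", "poem_utf8_bom.txt", with_bom=True)
```

Writing to a separate destination file keeps the original bytes available in case the guessed code page turns out to be wrong.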

Note that we could have searched for any of the other characters listed below, which correspond to the accented characters in @nightznero’s text :

    •--------•---------•      •---------•
    |  Code  | Win1252 |      | Win1250 |
    •--------•---------•      •---------•
    |   8C   |    Œ    |      |    Ś    |
    |   9C   |    œ    |      |    ś    |
    |   9F   |    Ÿ    |      |    ź    |
    |   A3   |    £    |      |    Ł    |
    |   A5   |    ¥    |      |    Ą    |
    |   AF   |    ¯    |      |    Ż    |
    |   B3   |    ³    |  =>  |    ł    |
    |   B9   |    ¹    |      |    ą    |
    |   BF   |    ¿    |      |    ż    |
    |   C6   |    Æ    |      |    Ć    |
    |   E6   |    æ    |      |    ć    |
    |   EA   |    ê    |      |    ę    |
    |   F1   |    ñ    |      |    ń    |
    •--------•---------•      •---------•
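A comparison like this one can be derived for the whole Polish alphabet with a few lines of Python. This is a sketch under the assumption that Python's `cp1250` and `cp1252` codecs match the Windows code pages; `POLISH` and `as_shown_in_1252` are names chosen for this example. Bytes that Python's `cp1252` codec treats as undefined (for instance 8F, the code of Ź) come out as the replacement character U+FFFD.

```python
# The 18 Polish letters with diacritics, upper case then lower case.
POLISH = "ĄĆĘŁŃÓŚŹŻąćęłńóśźż"

def as_shown_in_1252(ch):
    """Return (byte code in Windows-1250, glyph shown when read as Windows-1252)."""
    raw = ch.encode("cp1250")
    return raw[0], raw.decode("cp1252", errors="replace")

for ch in POLISH:
    code, shown = as_shown_in_1252(ch)
    print("{:02X}  {}  =>  {}".format(code, shown, ch))
```

Note that Ó and ó have the same code in both encodings, so they look right even when the wrong code page is selected.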

BTW, I found a character code which displays a different character in each of the Windows-125# encodings. This is the ANSI char \x{de}. To write it, simply hold down the Alt key and hit, successively, the keys 0, 2, 2 and 2, from the numeric keypad !

•--------•--------------•--------------•--------------•--------------•--------------•--------------•--------------•--------------•--------------•
|  Code  |   Win-1250   |   Win-1251   |   Win-1252   |   Win-1253   |   Win-1254   |   Win-1257   |   Win-1258   |   Win-1255   |   Win-1256   |
|  ALT   •--------------•--------------•--------------•--------------•--------------•--------------•--------------•--------------•--------------•
| + 0222 |  Centr. Eur. |   Cyrillic   |  West. Eur.  |    Greek     |   Turkish    |    Baltic    |  Vietnamese  |    Hebrew    |    Arabic    |
•--------•--------------•--------------•--------------•--------------•--------------•--------------•--------------•--------------•--------------•
|   DE   |      Ţ       |      Ю       |      Þ       |      ή       |      Ş       |      Ž       |      ̃        |  Undefined   |      ق       |
•--------•--------------•--------------•--------------•--------------•--------------•--------------•--------------•--------------•--------------•

So, for instance, if you type the \x{de} character, in an ANSI encoded file :

If the character displayed is ή, this means that your current ANSI codepage is probably Win-1253
If the character displayed is Ţ, this means that your current ANSI codepage must be Win-1250

Just run the command ? > Debug Info... to verify !
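The \x{de} test can also be expressed as a small reverse lookup in Python. It is only a sketch: it assumes Python's `cp125x` codecs behave like the Windows code pages, and `PROBE`, `CODE_PAGES` and `code_pages_showing` are names invented for this example.

```python
PROBE = bytes([0xDE])  # the byte typed with Alt + 0222

CODE_PAGES = {
    "cp1250": "Central European",
    "cp1251": "Cyrillic",
    "cp1252": "Western European",
    "cp1253": "Greek",
    "cp1254": "Turkish",
    "cp1255": "Hebrew",
    "cp1256": "Arabic",
    "cp1257": "Baltic",
    "cp1258": "Vietnamese",
}

def code_pages_showing(displayed):
    """List the code pages in which byte DE displays as the given character."""
    matches = []
    for cp in CODE_PAGES:
        try:
            if PROBE.decode(cp) == displayed:
                matches.append(cp)
        except UnicodeDecodeError:
            pass  # byte DE is undefined in this code page
    return matches

print(code_pages_showing("ή"))
print(code_pages_showing("Ţ"))
```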

To end with, from this link, you should be convinced to always manage UTF-8 encoded files ! ( ~ 96,7 % of all files coded in Websites ! )

You may also click on the yearly list, in the left part of the page, which clearly shows the growth of the UTF-8 encoding and the decline of all the other encodings during the last ten years !

Now, to get the contents of the Windows encodings, as text files, click on : https://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS
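Once downloaded, such a mapping file can be turned into a lookup table. The Python sketch below is an assumption-laden example: it supposes that each data line holds a hex byte code, an optional hex Unicode code point and a `#` comment, and the `SAMPLE` text is a hand-written excerpt in that assumed layout, not a copy of a real file. The function name `parse_mapping` is hypothetical.

```python
# Hand-written sample in the assumed layout: byte code, code point, comment.
SAMPLE = (
    "# sample excerpt\n"
    "0x80\t0x20AC\t#EURO SIGN\n"
    "0x81\t      \t#UNDEFINED\n"
    "0xBF\t0x017C\t#LATIN SMALL LETTER Z WITH DOT ABOVE\n"
)

def parse_mapping(text):
    """Return {byte value: character}, with None for bytes that have no code point."""
    table = {}
    for line in text.splitlines():
        fields = line.split("#", 1)[0].split()  # drop the comment, split the rest
        if not fields:
            continue  # blank line or pure comment line
        byte_value = int(fields[0], 16)
        table[byte_value] = chr(int(fields[1], 16)) if len(fields) > 1 else None
    return table

print(parse_mapping(SAMPLE))
```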

Best Regards,

guy038

ArkadiuszMichalski

For Polish he should use ISO 8859-2 (Eastern European), but nowadays I would rather recommend UTF-8.
@nightznero Is this your own work?