Issue with Polish letters
-
Do you have a file which you can share?
-
@Ekopalypse yes, the one from the screenshot I can share, it's just text. Here is a link: https://www.mediafire.com/file/2u26drmdltsrnsg/Rap_Teksty.txt/file
-
that file seems to be Cyrillic (Windows-1251) encoded.
-
No, Windows-1250 makes more sense, I guess.
-
@Ekopalypse when I try the encoding you mentioned, the letter ‘k’ is changed to ‘к’, so it can't have been Cyrillic-encoded before
-
Does my last post look ok?
-
@Ekopalypse yes, it looks fine now
-
That means, select Windows-1250
and if it looks ok - convert to utf8 - save it - done.
-
@Ekopalypse Thanks Eko a lot!
-
@nightznero - my pleasure.
-
I didn’t follow super-closely, but was there a reason not to convert to UTF-8 and then stay with that?
-
We had to find the right (ansi) encoding first, otherwise the conversion to utf-8 would result in incorrect text.
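A minimal Python sketch of why the source encoding has to be right first. The sample bytes are made up for illustration (the word "leży" stored as Windows-1250), not taken from the shared file.

```python
# Hypothetical sample: the Polish word "leży" stored as Windows-1250 bytes.
raw = "leży".encode("cp1250")            # b'le\xbfy'

# Reading the same bytes with the wrong ANSI code page gives wrong characters.
as_1252 = raw.decode("cp1252")           # 'le¿y'
as_1251 = raw.decode("cp1251")           # 'leїy'

# Converting the wrongly decoded text to UTF-8 only makes the mistake permanent.
wrong_utf8 = as_1252.encode("utf-8")

# Decoding with the right code page first yields a correct UTF-8 result.
right_utf8 = raw.decode("cp1250").encode("utf-8")
```

Only the last conversion round-trips back to "leży".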
-
Hello, @nightznero, @alan-kilborn, @ekopalypse and All,
Encoding notions are really difficult to handle and are usually a nightmare for most of us !
From @nightznero’s problem, I tried to build a method to guess the right encoding of an ANSI encoded file containing wrongly displayed characters !
- First, copy all the text, below, in the clipboard :
•--------•---------•---------•---------•---------•---------•---------•---------•
|  Code  | Win1250 | Win1251 | Win1252 | Win1253 | Win1254 | Win1257 | Win1258 |
•--------•---------•---------•---------•---------•---------•---------•---------•
|   80   |    €    |    Ђ    |    €    |    €    |    €    |    €    |    €    |
|   81   |    ◊    |    Ѓ    |    ◊    |    ◊    |    ◊    |    ◊    |    ◊    |
|   82   |    ‚    |    ‚    |    ‚    |    ‚    |    ‚    |    ‚    |    ‚    |
|   83   |    ◊    |    ѓ    |    ƒ    |    ƒ    |    ƒ    |    ◊    |    ƒ    |
|   84   |    „    |    „    |    „    |    „    |    „    |    „    |    „    |
|   85   |    …    |    …    |    …    |    …    |    …    |    …    |    …    |
|   86   |    †    |    †    |    †    |    †    |    †    |    †    |    †    |
|   87   |    ‡    |    ‡    |    ‡    |    ‡    |    ‡    |    ‡    |    ‡    |
|   88   |    ◊    |    €    |    ˆ    |    ◊    |    ˆ    |    ◊    |    ˆ    |
|   89   |    ‰    |    ‰    |    ‰    |    ‰    |    ‰    |    ‰    |    ‰    |
|   8A   |    Š    |    Љ    |    Š    |    ◊    |    Š    |    ◊    |    ◊    |
|   8B   |    ‹    |    ‹    |    ‹    |    ‹    |    ‹    |    ‹    |    ‹    |
|   8C   |    Ś    |    Њ    |    Œ    |    ◊    |    Œ    |    ◊    |    Œ    |
|   8D   |    Ť    |    Ќ    |    ◊    |    ◊    |    ◊    |    ¨    |    ◊    |
|   8E   |    Ž    |    Ћ    |    Ž    |    ◊    |    ◊    |    ˇ    |    ◊    |
|   8F   |    Ź    |    Џ    |    ◊    |    ◊    |    ◊    |    ¸    |    ◊    |
|   90   |    ◊    |    ђ    |    ◊    |    ◊    |    ◊    |    ◊    |    ◊    |
|   91   |    ‘    |    ‘    |    ‘    |    ‘    |    ‘    |    ‘    |    ‘    |
|   92   |    ’    |    ’    |    ’    |    ’    |    ’    |    ’    |    ’    |
|   93   |    “    |    “    |    “    |    “    |    “    |    “    |    “    |
|   94   |    ”    |    ”    |    ”    |    ”    |    ”    |    ”    |    ”    |
|   95   |    •    |    •    |    •    |    •    |    •    |    •    |    •    |
|   96   |    –    |    –    |    –    |    –    |    –    |    –    |    –    |
|   97   |    —    |    —    |    —    |    —    |    —    |    —    |    —    |
|   98   |    ◊    |    ◊    |    ˜    |    ◊    |    ˜    |    ◊    |    ˜    |
|   99   |    ™    |    ™    |    ™    |    ™    |    ™    |    ™    |    ™    |
|   9A   |    š    |    љ    |    š    |    ◊    |    š    |    ◊    |    ◊    |
|   9B   |    ›    |    ›    |    ›    |    ›    |    ›    |    ›    |    ›    |
|   9C   |    ś    |    њ    |    œ    |    ◊    |    œ    |    ◊    |    œ    |
|   9D   |    ť    |    ќ    |    ◊    |    ◊    |    ◊    |    ¯    |    ◊    |
|   9E   |    ž    |    ћ    |    ž    |    ◊    |    ◊    |    ˛    |    ◊    |
|   9F   |    ź    |    џ    |    Ÿ    |    ◊    |    Ÿ    |    ◊    |    Ÿ    |
•--------•---------•---------•---------•---------•---------•---------•---------•
|   A0   |         |         |         |         |         |         |         |
|   A1   |    ˇ    |    Ў    |    ¡    |    ΅    |    ¡    |    ◊    |    ¡    |
|   A2   |    ˘    |    ў    |    ¢    |    Ά    |    ¢    |    ¢    |    ¢    |
|   A3   |    Ł    |    Ј    |    £    |    £    |    £    |    £    |    £    |
|   A4   |    ¤    |    ¤    |    ¤    |    ¤    |    ¤    |    ¤    |    ¤    |
|   A5   |    Ą    |    Ґ    |    ¥    |    ¥    |    ¥    |    ◊    |    ¥    |
|   A6   |    ¦    |    ¦    |    ¦    |    ¦    |    ¦    |    ¦    |    ¦    |
|   A7   |    §    |    §    |    §    |    §    |    §    |    §    |    §    |
|   A8   |    ¨    |    Ё    |    ¨    |    ¨    |    ¨    |    Ø    |    ¨    |
|   A9   |    ©    |    ©    |    ©    |    ©    |    ©    |    ©    |    ©    |
|   AA   |    Ş    |    Є    |    ª    |    ◊    |    ª    |    Ŗ    |    ª    |
|   AB   |    «    |    «    |    «    |    «    |    «    |    «    |    «    |
|   AC   |    ¬    |    ¬    |    ¬    |    ¬    |    ¬    |    ¬    |    ¬    |
|   AD   |         |         |         |         |         |         |         |
|   AE   |    ®    |    ®    |    ®    |    ®    |    ®    |    ®    |    ®    |
|   AF   |    Ż    |    Ї    |    ¯    |    ―    |    ¯    |    Æ    |    ¯    |
|   B0   |    °    |    °    |    °    |    °    |    °    |    °    |    °    |
|   B1   |    ±    |    ±    |    ±    |    ±    |    ±    |    ±    |    ±    |
|   B2   |    ˛    |    І    |    ²    |    ²    |    ²    |    ²    |    ²    |
|   B3   |    ł    |    і    |    ³    |    ³    |    ³    |    ³    |    ³    |
|   B4   |    ´    |    ґ    |    ´    |    ΄    |    ´    |    ´    |    ´    |
|   B5   |    µ    |    µ    |    µ    |    µ    |    µ    |    µ    |    µ    |
|   B6   |    ¶    |    ¶    |    ¶    |    ¶    |    ¶    |    ¶    |    ¶    |
|   B7   |    ·    |    ·    |    ·    |    ·    |    ·    |    ·    |    ·    |
|   B8   |    ¸    |    ё    |    ¸    |    Έ    |    ¸    |    ø    |    ¸    |
|   B9   |    ą    |    №    |    ¹    |    Ή    |    ¹    |    ¹    |    ¹    |
|   BA   |    ş    |    є    |    º    |    Ί    |    º    |    ŗ    |    º    |
|   BB   |    »    |    »    |    »    |    »    |    »    |    »    |    »    |
|   BC   |    Ľ    |    ј    |    ¼    |    Ό    |    ¼    |    ¼    |    ¼    |
|   BD   |    ˝    |    Ѕ    |    ½    |    ½    |    ½    |    ½    |    ½    |
|   BE   |    ľ    |    ѕ    |    ¾    |    Ύ    |    ¾    |    ¾    |    ¾    |
|   BF   |    ż    |    ї    |    ¿    |    Ώ    |    ¿    |    æ    |    ¿    |
•--------•---------•---------•---------•---------•---------•---------•---------•
|   C0   |    Ŕ    |    А    |    À    |    ΐ    |    À    |    Ą    |    À    |
|   C1   |    Á    |    Б    |    Á    |    Α    |    Á    |    Į    |    Á    |
|   C2   |    Â    |    В    |    Â    |    Β    |    Â    |    Ā    |    Â    |
|   C3   |    Ă    |    Г    |    Ã    |    Γ    |    Ã    |    Ć    |    Ă    |
|   C4   |    Ä    |    Д    |    Ä    |    Δ    |    Ä    |    Ä    |    Ä    |
|   C5   |    Ĺ    |    Е    |    Å    |    Ε    |    Å    |    Å    |    Å    |
|   C6   |    Ć    |    Ж    |    Æ    |    Ζ    |    Æ    |    Ę    |    Æ    |
|   C7   |    Ç    |    З    |    Ç    |    Η    |    Ç    |    Ē    |    Ç    |
|   C8   |    Č    |    И    |    È    |    Θ    |    È    |    Č    |    È    |
|   C9   |    É    |    Й    |    É    |    Ι    |    É    |    É    |    É    |
|   CA   |    Ę    |    К    |    Ê    |    Κ    |    Ê    |    Ź    |    Ê    |
|   CB   |    Ë    |    Л    |    Ë    |    Λ    |    Ë    |    Ė    |    Ë    |
|   CC   |    Ě    |    М    |    Ì    |    Μ    |    Ì    |    Ģ    |    ̀    |
|   CD   |    Í    |    Н    |    Í    |    Ν    |    Í    |    Ķ    |    Í    |
|   CE   |    Î    |    О    |    Î    |    Ξ    |    Î    |    Ī    |    Î    |
|   CF   |    Ď    |    П    |    Ï    |    Ο    |    Ï    |    Ļ    |    Ï    |
|   D0   |    Đ    |    Р    |    Ð    |    Π    |    Ğ    |    Š    |    Đ    |
|   D1   |    Ń    |    С    |    Ñ    |    Ρ    |    Ñ    |    Ń    |    Ñ    |
|   D2   |    Ň    |    Т    |    Ò    |    ◊    |    Ò    |    Ņ    |    ̉    |
|   D3   |    Ó    |    У    |    Ó    |    Σ    |    Ó    |    Ó    |    Ó    |
|   D4   |    Ô    |    Ф    |    Ô    |    Τ    |    Ô    |    Ō    |    Ô    |
|   D5   |    Ő    |    Х    |    Õ    |    Υ    |    Õ    |    Õ    |    Ơ    |
|   D6   |    Ö    |    Ц    |    Ö    |    Φ    |    Ö    |    Ö    |    Ö    |
|   D7   |    ×    |    Ч    |    ×    |    Χ    |    ×    |    ×    |    ×    |
|   D8   |    Ř    |    Ш    |    Ø    |    Ψ    |    Ø    |    Ų    |    Ø    |
|   D9   |    Ů    |    Щ    |    Ù    |    Ω    |    Ù    |    Ł    |    Ù    |
|   DA   |    Ú    |    Ъ    |    Ú    |    Ϊ    |    Ú    |    Ś    |    Ú    |
|   DB   |    Ű    |    Ы    |    Û    |    Ϋ    |    Û    |    Ū    |    Û    |
|   DC   |    Ü    |    Ь    |    Ü    |    ά    |    Ü    |    Ü    |    Ü    |
|   DD   |    Ý    |    Э    |    Ý    |    έ    |    İ    |    Ż    |    Ư    |
|   DE   |    Ţ    |    Ю    |    Þ    |    ή    |    Ş    |    Ž    |    ̃    |
|   DF   |    ß    |    Я    |    ß    |    ί    |    ß    |    ß    |    ß    |
•--------•---------•---------•---------•---------•---------•---------•---------•
|   E0   |    ŕ    |    а    |    à    |    ΰ    |    à    |    ą    |    à    |
|   E1   |    á    |    б    |    á    |    α    |    á    |    į    |    á    |
|   E2   |    â    |    в    |    â    |    β    |    â    |    ā    |    â    |
|   E3   |    ă    |    г    |    ã    |    γ    |    ã    |    ć    |    ă    |
|   E4   |    ä    |    д    |    ä    |    δ    |    ä    |    ä    |    ä    |
|   E5   |    ĺ    |    е    |    å    |    ε    |    å    |    å    |    å    |
|   E6   |    ć    |    ж    |    æ    |    ζ    |    æ    |    ę    |    æ    |
|   E7   |    ç    |    з    |    ç    |    η    |    ç    |    ē    |    ç    |
|   E8   |    č    |    и    |    è    |    θ    |    è    |    č    |    è    |
|   E9   |    é    |    й    |    é    |    ι    |    é    |    é    |    é    |
|   EA   |    ę    |    к    |    ê    |    κ    |    ê    |    ź    |    ê    |
|   EB   |    ë    |    л    |    ë    |    λ    |    ë    |    ė    |    ë    |
|   EC   |    ě    |    м    |    ì    |    μ    |    ì    |    ģ    |    ́    |
|   ED   |    í    |    н    |    í    |    ν    |    í    |    ķ    |    í    |
|   EE   |    î    |    о    |    î    |    ξ    |    î    |    ī    |    î    |
|   EF   |    ď    |    п    |    ï    |    ο    |    ï    |    ļ    |    ï    |
|   F0   |    đ    |    р    |    ð    |    π    |    ğ    |    š    |    đ    |
|   F1   |    ń    |    с    |    ñ    |    ρ    |    ñ    |    ń    |    ñ    |
|   F2   |    ň    |    т    |    ò    |    ς    |    ò    |    ņ    |    ̣    |
|   F3   |    ó    |    у    |    ó    |    σ    |    ó    |    ó    |    ó    |
|   F4   |    ô    |    ф    |    ô    |    τ    |    ô    |    ō    |    ô    |
|   F5   |    ő    |    х    |    õ    |    υ    |    õ    |    õ    |    ơ    |
|   F6   |    ö    |    ц    |    ö    |    φ    |    ö    |    ö    |    ö    |
|   F7   |    ÷    |    ч    |    ÷    |    χ    |    ÷    |    ÷    |    ÷    |
|   F8   |    ř    |    ш    |    ø    |    ψ    |    ø    |    ų    |    ø    |
|   F9   |    ů    |    щ    |    ù    |    ω    |    ù    |    ł    |    ù    |
|   FA   |    ú    |    ъ    |    ú    |    ϊ    |    ú    |    ś    |    ú    |
|   FB   |    ű    |    ы    |    û    |    ϋ    |    û    |    ū    |    û    |
|   FC   |    ü    |    ь    |    ü    |    ό    |    ü    |    ü    |    ü    |
|   FD   |    ý    |    э    |    ý    |    ύ    |    ı    |    ż    |    ư    |
|   FE   |    ţ    |    ю    |    þ    |    ώ    |    ş    |    ž    |    ₫    |
|   FF   |    ˙    |    я    |    ÿ    |    ◊    |    ÿ    |    ˙    |    ÿ    |
•--------•---------•---------•---------•---------•---------•---------•---------•
Note that, in this table, the ◊ character means that the character is not defined for the corresponding encoding !
- Open a new N++ tab ( Ctrl + N )
- Run the command Encoding > Convert to UTF-8-BOM ( IMPORTANT )
- Paste the clipboard contents in that new tab ( Ctrl + V )
- Save this file as Windows_European_Encodings.txt
- From the first incorrectly displayed word of your ANSI file ( le¿y in @nightznero’s text ), select the wrong character ( ¿ )
- Open the Find dialog ( Ctrl + F )
- Tick the Match case and the Wrap around options
- Select the Normal search mode
- Switch back to the Windows_European_Encodings.txt file that we just created
- Click on the Find Next button
=> The caret should be on the line :
|   BF   |    ż    |    ї    |    ¿    |    Ώ    |    ¿    |    æ    |    ¿    |
Necessarily, your correct character, instead of the ¿ char, must be found within that line ! And @nightznero would have easily detected that the right character was ż, forming the word leży ! Now, as the ż belongs to the Windows-1250 encoding :
- Select the command Encoding > Character Sets > Central European > Windows-1250
=> All the text seems, now, completely readable ;-))
- So, encode this file with the UTF-8 encoding, running one of these two commands :
  - Encoding > Convert to UTF-8
  - Encoding > Convert to UTF-8-BOM
- Save the changed contents ( Ctrl + S )
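The same lookup-then-convert procedure can be sketched in Python with the standard codecs. This is only an illustration: the function names `candidates` and `convert_file` are made up here, and the default of `cp1252` for the code page the text was wrongly shown in is an assumption.

```python
CANDIDATES = ["cp1250", "cp1251", "cp1252", "cp1253", "cp1254", "cp1257", "cp1258"]

def candidates(wrong_char, shown_as="cp1252"):
    """For a wrongly displayed character, list what the same byte means elsewhere."""
    raw = wrong_char.encode(shown_as)    # recover the underlying byte
    result = {}
    for cp in CANDIDATES:
        try:
            result[cp] = raw.decode(cp)
        except UnicodeDecodeError:
            pass                         # byte is undefined in this code page
    return result

def convert_file(src, dst, code_page):
    """Read src in the chosen ANSI code page and save it as UTF-8 (no BOM)."""
    # newline="" keeps the original line endings untouched.
    with open(src, "r", encoding=code_page, newline="") as f:
        text = f.read()
    with open(dst, "w", encoding="utf-8", newline="") as f:
        f.write(text)
```

For example, `candidates("¿")` plays the role of the `BF` line of the table: the `cp1250` entry is `ż`, which is the one that makes sense in a Polish word.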
Note that we could have searched for other characters, listed below, which are accented characters from @nightznero’s text :
•--------•---------•      •---------•
|  Code  | Win1252 |      | Win1250 |
•--------•---------•      •---------•
|   8C   |    Œ    |      |    Ś    |
|   9C   |    œ    |      |    ś    |
|   9F   |    Ÿ    |      |    ź    |
|   A3   |    £    |      |    Ł    |
|   A5   |    ¥    |      |    Ą    |
|   AF   |    ¯    |      |    Ż    |
|   B3   |    ³    |  =>  |    ł    |
|   B9   |    ¹    |      |    ą    |
|   BF   |    ¿    |      |    ż    |
|   C6   |    Æ    |      |    Ć    |
|   E6   |    æ    |      |    ć    |
|   EA   |    ê    |      |    ę    |
|   F1   |    ñ    |      |    ń    |
•--------•---------•      •---------•
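The Win1252 => Win1250 mapping in this table is just "same byte, different code page", so it can be checked with a small Python sketch. The helper name `repair` is hypothetical, and it assumes the text was wrongly shown as Windows-1252.

```python
def repair(text, shown_as="cp1252", actual="cp1250"):
    """Re-interpret wrongly displayed text: back to bytes, then decode correctly."""
    return text.encode(shown_as).decode(actual)

# The thirteen wrong characters from the table, and the word from the thread.
fixed_chars = repair("ŒœŸ£¥¯³¹¿Ææêñ")
fixed_word = repair("le¿y")
```

This only works while the wrong text has not been edited or saved in a lossy way; it is no substitute for picking the right encoding when opening the file.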
BTW, I found a character which is different in every Windows-125# encoding. This is the ANSI char \x{de}. To write it, simply hold down the Alt key and hit, successively, the keys 0, 2, 2 and 2, from the numeric keypad !
•--------•--------------•--------------•--------------•--------------•--------------•--------------•--------------•--------------•--------------•
|  Code  |   Win-1250   |   Win-1251   |   Win-1252   |   Win-1253   |   Win-1254   |   Win-1257   |   Win-1258   |   Win-1255   |   Win-1256   |
|  ALT   •--------------•--------------•--------------•--------------•--------------•--------------•--------------•--------------•--------------•
| + 0222 | Centr. Eur.  |   Cyrillic   |  West. Eur.  |    Greek     |   Turkish    |    Baltic    |  Vietnamese  |    Hebrew    |    Arabic    |
•--------•--------------•--------------•--------------•--------------•--------------•--------------•--------------•--------------•--------------•
|   DE   |      Ţ       |      Ю       |      Þ       |      ή       |      Ş       |      Ž       |      ̃       |  Undefined   |      ق       |
•--------•--------------•--------------•--------------•--------------•--------------•--------------•--------------•--------------•--------------•
So, for instance, if you type the \x{de} character, in an ANSI encoded file :
- If the character displayed is ή, this means that your current ANSI codepage is probably Win-1253
- If the character displayed is Ţ, this means that your current ANSI codepage must be Win-1250
Just run the command ? > Debug Info... to verify !
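The \x{de} trick can be mirrored in Python as a rough check. This sketch relies on Python's built-in cp125x codecs, which I assume match the table above; `guess_code_page` is a made-up name.

```python
PROBE = 0xDE
CODE_PAGES = ["cp1250", "cp1251", "cp1252", "cp1253", "cp1254",
              "cp1255", "cp1256", "cp1257", "cp1258"]

def probe_map():
    """Map the character shown for byte 0xDE to the code page that produces it."""
    mapping = {}
    for cp in CODE_PAGES:
        try:
            mapping[bytes([PROBE]).decode(cp)] = cp
        except UnicodeDecodeError:
            pass    # 0xDE is undefined in this codec (the table lists Win-1255 as such)
    return mapping

def guess_code_page(displayed_char):
    """Return the code page that shows 0xDE as displayed_char, or None."""
    return probe_map().get(displayed_char)
```

So `guess_code_page("ή")` gives `cp1253`, the same conclusion as in the first bullet above.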
To end with, from this link, you should be convinced to always work with UTF-8 encoded files ! ( ~96.7 % of all files coded in Websites ! )
You may also click, in the left part, on the yearly list, which perfectly shows the growth of the UTF-8 encoding and the decrease of all other encodings, during these last ten years !
Now, to get the contents of the Windows encodings, as text files, click on : https://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS
Best Regards,
guy038
-
For Polish he should use ISO 8859-2 (Eastern European), but nowadays I would rather recommend UTF-8.
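For reference, ISO 8859-2 and Windows-1250 are not interchangeable for Polish text. A quick sketch with Python's codecs shows which letters sit at different byte values (the letter list and the function name are my own, for illustration):

```python
POLISH = "ĄąĆćĘęŁłŃńÓóŚśŹźŻż"

def differing_letters(a="iso8859_2", b="cp1250"):
    """Polish letters whose byte value differs between the two encodings."""
    return [ch for ch in POLISH if ch.encode(a) != ch.encode(b)]

different = differing_letters()
```

Only Ą, ą, Ś, ś, Ź and ź differ, so a file opened with the wrong one of the two looks almost right, which makes the mistake easy to miss.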
@nightznero Is this your own work?