Strange code page behavior; CP/Windows-1250
-
I see code page behavior that I don’t understand. Try this:
- Create a new file.
- Type some glyphs.
- Save the file under any name, with any extension.
- Change the code page to Windows-1250 (Central European).
- Delete the text and paste ◘○•🔶 ěščřžýáíéúůťďňÓ.
- Save, close and reopen the file.
- You’ll get •0•?? ěščřžýáíéúůťďňÓ.
- Specify the code page again.
- Still •0•?? ěščřžýáíéúůťďňÓ.
Why are the glyphs shown correctly before saving if they are not after reopening?
-
Hello, @uzivatel919 and All
Before trying to explain this N++ behavior, we need some additional information :
- At step 1, when you create your new file ( File > New ) or with the Ctrl + N shortcut, what’s the current encoding of this new # file, before doing anything else ? I guess it should be Windows-1250. Am I right about it ?
- At step 2, before saving the file, did you type in pure ASCII glyphs only ( with code in the range [\x{0020}-\x{007f}] ) or did you add some accented characters ( as, for instance, čř or ť ) ?
- At step 8, just to confirm : you meant “Specify the Windows-1250 code page, again”, didn’t you ?
Best Regards
guy038
-
- I used Ctrl + N as well as File/New. For me the encoding is set to UTF-8 by default.
- I used arbitrary glyphs. It is just not allowed to save an empty file.
- Yes, exactly: choose the Windows-1250 CP again.
-
Hi, @uzivatel919 and All
First, note that the tests below give the same results whether the option Autodetect character encoding is checked or not, in the dialog Settings > Preferences... > MISC
Your method can be simplified to this first scenario, below :
- Create a new file ( Ctrl + N )
Note that the present encoding is UTF-8
- Select the option Encoding > Character Sets > Central European > Windows-1250
- Paste the text ◘○•🔶 ěščřžýáíéúůťďňÓ
=> This text is encoded with the Windows-1250 encoding, using 1 byte to describe each character. Note that some of the graphical characters, which do not belong to the Windows-1250 encoding, are replaced, of course, with a question mark ( ? ) ! See https://en.wikipedia.org/wiki/Windows-1250
- Save the file with, for instance, the name Test.txt
- Close Test.txt ( Ctrl + W )
- Re-open Test.txt ( Ctrl + Shift + T )
=> The letters of the text are correct but, as expected, some graphical characters are replaced with a ?. Moreover, Notepad++ detects the ANSI encoding, which is, indeed, equivalent to the Windows-1250 encoding used by your system for all NON-Unicode files
- Select, again, the option Encoding > Character Sets > Central European > Windows-1250
=> As the encoding process just re-interprets all the 1-byte encoded characters, nothing has changed, as Windows-1250 encoding ≡ ANSI encoding on your system. Note that, as I’m French, on my system, for instance, the equivalence is Windows-1252 encoding ≡ ANSI encoding
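The loss in this first scenario can be reproduced outside Notepad++. Here is a minimal Python sketch, assuming that Python's built-in cp1250 codec is a fair stand-in for Windows-1250; the helper name save_and_reopen is hypothetical. Note that Python puts a ? in place of every character missing from the code page, whereas Windows may pick "best-fit" substitutes, so the exact replacement characters can differ from what Notepad++ shows.

```python
# Sketch only: Python's "cp1250" codec stands in for Windows-1250.
# save_and_reopen is a hypothetical helper, not a Notepad++ function.

def save_and_reopen(text, encoding="cp1250"):
    """Encode text to bytes (save), then decode those bytes (re-open)."""
    on_disk = text.encode(encoding, errors="replace")  # unencodable -> b"?"
    return on_disk.decode(encoding)

original = "◘○•🔶 ěščřžýáíéúůťďňÓ"
reopened = save_and_reopen(original)

# Characters that the code page cannot represent at all
missing = [ch for ch in original if ch.encode("cp1250", errors="ignore") == b""]

print(reopened)
print(missing)
```

The accented letters survive the round trip because each of them has a 1-byte slot in the code page; only the characters listed in missing are lost, and that loss happens at save time.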
Now, let’s imagine the second scenario, below :
- Create a new file ( Ctrl + N )
Note that the present encoding is UTF-8
- Paste the text ◘○•🔶 ěščřžýáíéúůťďňÓ
=> This time, as we haven’t changed the current encoding yet, this text is encoded with the UTF-8 encoding, using between 1 and 4 bytes to describe each character
- Select the option Encoding > Character Sets > Central European > Windows-1250
- Click on the Yes button of the small dialog Save Current Modification
- Choose, again, the name Test.txt and save the file
=> So, the encoding is changed to Windows-1250. But the encoding operation does NOT change the present contents of the file. Notepad++ just re-interprets all the bytes of the file as if they were a sequence of 1-byte encoded characters of the Windows-1250 encoding
=> So, obviously, the whole text looks rather incomprehensible !
Thus, internally, the Test.txt file is still a sequence of characters, each described according to the UTF-8 encoding
- Close Test.txt ( Ctrl + W )
- Re-open Test.txt ( Ctrl + Shift + T )
=> The text and most of the graphical chars are correct, according to your current font, and the UTF-8 encoding is automatically chosen ;-))
Remarks :
- As this text is UTF-8 encoded, you may “test” any other character set, using the Encoding > Character Sets > .... menu option
- You’ll notice that, during this test phase, the file contents are NOT modified at all and the icon of the file remains blue !
- At the end of that test phase, just select the option Encoding > Encode in UTF-8 to get the original text back ;-))
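This "test phase" can be pictured with a short Python sketch. It assumes that Python codec names (cp1250, cp1252, iso8859_2) roughly correspond to entries of the Character Sets menu; that mapping is my assumption, not something Notepad++ documents here. The point is only that the bytes stay the same while their reading changes.

```python
# Sketch only: Python codec names stand in for Notepad++ character sets.
original = "◘○•🔶 ěščřžýáíéúůťďňÓ"
file_bytes = original.encode("utf-8")  # what the UTF-8 file holds on disk

# "Test" several character sets: only the interpretation changes.
views = {}
for charset in ("cp1250", "cp1252", "iso8859_2"):
    # errors="replace" because some byte values are undefined in a code page
    views[charset] = file_bytes.decode(charset, errors="replace")
    print(charset, "->", views[charset])

# The bytes were never touched, so reading them as UTF-8 restores the text.
restored = file_bytes.decode("utf-8")
print(restored == original)
```

Every view shows one character per byte, which is why the text looks longer and garbled, yet going back to UTF-8 recovers it exactly.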
Remember :
- During an encoding operation, the present contents of the current file are just re-interpreted as if they were encoded with this new encoding, and are never modified
- During a conversion operation, the present contents of the current file are modified, so that the new contents of the file correspond to the new encoding of the same characters
In other words :
- The options “Encode in ...” and “Character sets/ ...” just read the present file contents according to the newly chosen encoding, generally giving a new representation of the current file contents
- The option “Convert to ...” does modify the present file contents, so that they read identically with the newly chosen encoding.
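The difference between the two menu families can be modelled in a few lines of Python. This is a sketch under assumptions: reinterpret and convert are hypothetical helpers of mine, not Notepad++ functions, and cp1250 plays the role of the ANSI encoding on a Central European system.

```python
# Sketch only: hypothetical helpers modelling the two menu families.

def reinterpret(data, new_encoding):
    """'Encode in ...' / 'Character sets': same bytes, new reading."""
    return data, data.decode(new_encoding, errors="replace")

def convert(data, old_encoding, new_encoding):
    """'Convert to ...': new bytes, same characters."""
    text = data.decode(old_encoding)
    new_data = text.encode(new_encoding)
    return new_data, new_data.decode(new_encoding)

text = "ěščřžýáíéúůťďňÓ"
utf8_bytes = text.encode("utf-8")

kept_bytes, garbled = reinterpret(utf8_bytes, "cp1250")
new_bytes, same_text = convert(utf8_bytes, "utf-8", "cp1250")

print(kept_bytes == utf8_bytes, garbled == text)   # bytes kept, text changed
print(new_bytes == utf8_bytes, same_text == text)  # bytes changed, text kept
```

Here the conversion is lossless only because every character of the sample exists in the target code page; with ◘ or 🔶 in the text, the strict encode call in convert would raise an error instead.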
To end with :
- As you see, encoding and conversion concepts are not easy to assimilate. So, I advise everyone to always use the UTF-8 encoding or, better, the UTF-8-BOM encoding, either of which is able to encode absolutely all the Unicode characters !
- Of course, to fully exploit UTF-8 files, your system must contain some fonts which cover most Unicode characters and/or symbols !
- For the record, as of today, 92.6% of Web pages are encoded in UTF-8 ;-)) Refer to the link, below :
https://w3techs.com/technologies/history_overview/character_encoding/ms/y
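For what it's worth, the only difference between the two UTF-8 flavours is a 3-byte signature at the start of the file. A small Python sketch, assuming that Notepad++'s UTF-8-BOM matches what Python calls utf-8-sig:

```python
import codecs

text = "◘○•🔶 ěščřžýáíéúůťďňÓ"

plain = text.encode("utf-8")         # UTF-8 without a BOM
with_bom = text.encode("utf-8-sig")  # UTF-8 with a leading BOM

print(with_bom[:3] == codecs.BOM_UTF8)  # the signature bytes EF BB BF
print(with_bom[3:] == plain)            # the payload is identical

# Both forms decode back to every original character.
print(plain.decode("utf-8") == text, with_bom.decode("utf-8-sig") == text)
```

So both flavours cover all Unicode characters; the BOM simply helps an editor recognize the file as UTF-8 without guessing.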
Best Regards,
guy038
-
Yes, yes, I know my way around these things. I was just surprised by this error, since I was used to Notepad++ code page functions working perfectly. I did not realize at that moment what CP-1250 really includes.
Btw, thanks.