Code page weird function; CP/Windows-1250
-
I see a code page behavior that I don’t understand. Try this:
1. Create a new file.
2. Write down some glyphs.
3. Save the file as an arbitrary file with any extension.
4. Change the code page to Windows-1250 (Central European).
5. Delete the text and paste ◘○•🔶 ěščřžýáíéúůťďňÓ.
6. Save, close and re-open the file.
7. You’ll get •0•?? ěščřžýáíéúůťďňÓ.
8. Specify the code page again.
9. Still •0•?? ěščřžýáíéúůťďňÓ.
Why are the glyphs displayed correctly before saving, if they are not after re-opening?
-
Hello, @uzivatel919 and All
Before trying to explain this N++ behavior, we need additional information :
- At step 1, when you create your new file ( File > New ) or with the Ctrl + N shortcut, what’s the current encoding of this new file, before doing anything else ? I guess it should be Windows-1250. Am I right about it ?
- At step 2, before saving the file, did you type in pure ASCII glyphs only ( with code in range [\x{0020}-\x{007f}] ) or did you add some accented characters ( as, for instance, č, ř or ť ) ?
- At step 8, just to confirm : you meant “Specify Windows-1250 code page, again”, didn’t you ?
Best Regards
guy038
-
-
- Step 1: I use Ctrl + N as well as File > New. For me, the encoding is set to UTF-8 by default.
- Step 2: Use arbitrary glyphs. It is just that saving an empty file is not allowed.
- Step 8: Yes, exactly, choose the Windows-1250 code page again.
-
Hi, @uzivatel919 and All
First, note that the tests below were performed both with and without the option Autodetect character encoding checked, in the dialog Settings > Preferences... > MISC
Your method can be simplified to this first scenario, below :
- Create a new file ( Ctrl + N )
  Note that the present encoding is UTF-8
- Select the option Encoding > Character Sets > Central European > Windows-1250
- Paste the text ◘○•🔶 ěščřžýáíéúůťďňÓ.
  => This text is encoded with the Windows-1250 encoding, using 1 byte to describe each character. Note that some of the graphical characters, which do not belong to the Windows-1250 encoding, are replaced, of course, with a question mark ? Refer to https://en.wikipedia.org/wiki/Windows-1250
- Save the file with, for instance, the name Test.txt
- Close Test.txt ( Ctrl + W )
- Re-open Test.txt ( Ctrl + Shift + T )
  => The letters of the text are correct but, as expected, some graphical characters are replaced with a ?. Moreover, Notepad++ detects the ANSI encoding, which is, indeed, quite equivalent to the Windows-1250 encoding, used by your system, for all NON-Unicode files
- Select, again, the option Encoding > Character Sets > Central European > Windows-1250
  => As the encoding process just re-interprets all the 1-byte encoded characters, nothing has changed, as Windows-1250 encoding ≡ ANSI encoding. Note that, as I’m French, on my system, for instance, there is the equivalence Windows-1252 encoding ≡ ANSI encoding
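
The loss described in this first scenario can be reproduced outside Notepad++. Below is a minimal Python sketch, assuming that Python’s cp1250 codec corresponds to what Notepad++ calls Windows-1250. It is only an approximation: Python replaces every unencodable character with a plain ?, whereas the result reported at the top of this thread ( •0•?? ) suggests that Windows also applies some look-alike substitutions, which this sketch does not model.

```python
# The text pasted in the scenario above
text = "◘○•🔶 ěščřžýáíéúůťďňÓ"

# Encode to Windows-1250; characters outside the code page become "?"
# (assumption: Python's "replace" handler, not the Windows conversion)
raw = text.encode("cp1250", errors="replace")

# In a single-byte code page, every character occupies exactly one byte
print(len(text), len(raw))

# Decoding the saved bytes shows what survives a save / re-open cycle
reopened = raw.decode("cp1250")
print(reopened)
```

The Czech letters and the bullet • come back unchanged because they all exist in Windows-1250. The other symbols are already lost in the bytes written to disk, which is why selecting the code page again cannot bring them back.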
Now, let’s imagine the second scenario, below :
- Create a new file ( Ctrl + N )
  Note that the present encoding is UTF-8
- Paste the text ◘○•🔶 ěščřžýáíéúůťďňÓ.
  => This time, as we haven’t changed the current encoding yet, this text is encoded with the UTF-8 encoding, using between 1 and 4 bytes to describe each character
- Select the option Encoding > Character Sets > Central European > Windows-1250
- Click on the Yes button of the small dialog Save Current Modification
- Choose, again, the name Test.txt and save the file
  => So, the encoding is changed to Windows-1250. But the encoding operation does NOT change the present contents of the file. Notepad++ just re-interprets all the bytes of the file as if they were a sequence of 1-byte encoded characters of the Windows-1250 encoding. So, obviously, the whole text looks rather incomprehensible ! Thus, internally, the Test.txt file is still a sequence of characters, each described according to the UTF-8 encoding
- Close Test.txt ( Ctrl + W )
- Re-open Test.txt ( Ctrl + Shift + T )
  => The text and most of the graphical chars are correct, according to your current font, and the UTF-8 encoding is automatically chosen ;-))
Remarks :
- As this text is UTF-8 encoded, you may “test” any other character set, using the Encoding > Character Sets > .... menu option
- You’ll notice that, during this test phase, the file contents are NOT modified at all and the icon of the file remains blue !
- At the end, after that test phase, just select the option Encoding > Encode in UTF-8 to get the original text back ;-))
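
The second scenario can be sketched in Python as well. This is only an illustration, assuming that Python’s utf-8 and cp1250 codecs stand in for the Notepad++ encodings; bytes that have no meaning in cp1250 are shown here as the replacement character, which may differ from what Notepad++ displays.

```python
text = "◘○•🔶 ěščřžýáíéúůťďňÓ"

# The bytes stored on disk while the encoding is UTF-8
raw = text.encode("utf-8")

# UTF-8 uses between 1 and 4 bytes per character
sizes = {ch: len(ch.encode("utf-8")) for ch in " ě•🔶"}
print(sizes)

# Reading the very same bytes as Windows-1250 gives an unreadable text;
# bytes undefined in cp1250 are shown as U+FFFD by the "replace" handler
garbled = raw.decode("cp1250", errors="replace")
print(garbled)

# The bytes were never touched, so reading them as UTF-8 again
# gives the original text back
restored = raw.decode("utf-8")
print(restored == text)
```

For instance, the single letter ě is stored as two bytes in UTF-8 and shows up as two unrelated characters once those bytes are read as Windows-1250.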
Remember :
- During an encoding operation, the present contents of the current file are just re-interpreted as if they were encoded with this new encoding, and are never modified
- During a conversion operation, the present contents of the current file are modified, so that the new contents of the file correspond to the new encoding of the same characters
In other words :
- The options “Encode in ...” OR “Character sets/ ...” just read the present file contents according to the newly chosen encoding, generally giving a new representation of the current file contents
- The option “Convert to ...” does modify the present file contents, so that they are read in an identical way with the newly chosen encoding.
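
The difference between the two kinds of operation can be modelled with two small Python functions. The names reinterpret and convert are hypothetical and chosen for this sketch only; they are not Notepad++ functions, and the code is just a model of the distinction described above, not of the editor’s implementation.

```python
def reinterpret(raw: bytes, new_encoding: str):
    """Model of "Encode in ..." / "Character sets": same bytes, new reading."""
    return raw, raw.decode(new_encoding, errors="replace")


def convert(raw: bytes, old_encoding: str, new_encoding: str):
    """Model of "Convert to ...": same characters, new bytes."""
    text = raw.decode(old_encoding)
    new_raw = text.encode(new_encoding)
    return new_raw, new_raw.decode(new_encoding)


# Two Czech letters stored as Windows-1250 bytes
original = "čř".encode("cp1250")

# Re-interpretation: the bytes stay the same, the displayed text changes
same_bytes, new_text = reinterpret(original, "cp1252")
print(same_bytes == original, new_text)

# Conversion: the bytes change, the displayed text stays the same
new_bytes, same_text = convert(original, "cp1250", "utf-8")
print(new_bytes == original, same_text)
```

In the first call only the reading of the bytes changes, so the text looks different; in the second call the bytes themselves are rewritten, so the text looks the same under the new encoding.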
To end with :
- As you see, encoding and conversion concepts are not easy to assimilate. So, I advise everyone to always use the UTF-8 encoding or, better, the UTF-8-BOM encoding, which is able to encode absolutely all the Unicode characters !
- Of course, to fully exploit UTF-8 files, your system must contain some fonts which cover most of the Unicode characters and/or symbols !
- For the record, as of today, 92.6% of Web pages are encoded in UTF-8 ;-)) Refer to the link, below :
https://w3techs.com/technologies/history_overview/character_encoding/ms/y
Best Regards,
guy038
-
Yes, yes. I know these things. I was just surprised by the error, since I was used to Notepad++’s code page functions working perfectly. I did not realize at that moment what CP-1250 really includes.
Btw, thanks.