Code page weird function; CP/Windows-1250
-
I see code page behavior that I don’t understand. Try this:
- Create a new file.
- Type some glyphs.
- Save the file under an arbitrary name, with any extension.
- Change the code page to Windows-1250 (Central European).
- Delete the text and paste ◘○•🔶 ěščřžýáíéúůťďňÓ.
- Save, close and re-open the file.
- You’ll get •0•?? ěščřžýáíéúůťďňÓ.
- Specify the code page again.
- Still •0•?? ěščřžýáíéúůťďňÓ.
Why are the glyphs shown correctly before saving, if they are not after re-opening?
-
Hello, @uzivatel919 and All
Before trying to explain this N++ behavior, we need additional information:
- At step 1, when you create your new file (File > New, or with the Ctrl + N shortcut), what is the current encoding of this new # file, before doing anything else? I guess it should be Windows-1250. Am I right about it?
- At step 2, before saving the file, did you type in pure ASCII glyphs only (with code in range [\x{0020}-\x{007f}]), or did you add some accented characters (as, for instance, č, ř or ť)?
- At step 8, just to confirm: you meant “Specify Windows-1250 code page, again”, didn’t you?
Best Regards
guy038
-
-
- I use Ctrl + N as well as File > New. For me, the encoding is set to UTF-8 by default.
- I use arbitrary glyphs. It is just that saving an empty file is not allowed.
- Yes, exactly: choose the Windows-1250 code page again.
-
Hi, @uzivatel919 and All
First, note that the tests below have been performed both with and without the option Autodetect character encoding checked, in the dialog Settings > Preferences... > MISC.
Your method can be simplified to this first scenario:
- Create a new file (Ctrl + N). Note that the present encoding is UTF-8.
- Select the option Encoding > Character Sets > Central European > Windows-1250.
- Paste the text ◘○•🔶 ěščřžýáíéúůťďňÓ.
=> This text is encoded with the Windows-1250 encoding, using 1 byte to describe each character. Note that some of the graphical characters, which do not belong to the Windows-1250 encoding, are, of course, replaced with a question mark ?. See https://en.wikipedia.org/wiki/Windows-1250
- Save the file with, for instance, the name Test.txt.
- Close Test.txt (Ctrl + W).
- Re-open Test.txt (Ctrl + Shift + T).
=> The letters of the text are correct but, as expected, some graphical characters are replaced with a ?. Moreover, Notepad++ detects the ANSI encoding, which is, indeed, equivalent to the Windows-1250 encoding used by your system for all non-Unicode files.
- Select, again, the option Encoding > Character Sets > Central European > Windows-1250.
=> As the encoding process just re-interprets all the 1-byte encoded characters, nothing has changed, as Windows-1250 encoding ≡ ANSI encoding. Note that, as I’m French, on my system, for instance, the equivalence is Windows-1252 encoding ≡ ANSI encoding.
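The loss described in this first scenario can be reproduced outside Notepad++. The following is only a minimal Python sketch, not Notepad++ code: it assumes that Python's built-in cp1250 codec is a fair stand-in for Windows-1250, and it uses errors="replace", which turns every unsupported character into a ?. The helper names are made up for the illustration, and the exact replacement characters may differ from the ones reported in the first post.

```python
# Sketch: what survives when text is stored with a 1-byte code page.
# Assumption: Python's "cp1250" codec stands in for Windows-1250.
TEXT = "◘○•🔶 ěščřžýáíéúůťďňÓ"


def store_as_cp1250(text: str) -> bytes:
    """Encode text as Windows-1250; unsupported characters become b'?'."""
    return text.encode("cp1250", errors="replace")


def unsupported(text: str) -> list[str]:
    """Return the characters that have no Windows-1250 code."""
    missing = []
    for ch in text:
        try:
            ch.encode("cp1250")
        except UnicodeEncodeError:
            missing.append(ch)
    return missing


raw = store_as_cp1250(TEXT)
print(len(TEXT), len(raw))    # one byte per character
print(raw.decode("cp1250"))   # what a re-opened file would contain
print(unsupported(TEXT))      # the glyphs that could not be stored
```

Once the bytes are written, the original glyphs are gone: choosing Windows-1250 again can only read back what was actually stored.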
Now, let’s imagine the second scenario, below:
- Create a new file (Ctrl + N). Note that the present encoding is UTF-8.
- Paste the text ◘○•🔶 ěščřžýáíéúůťďňÓ.
=> This time, as we haven’t changed the current encoding yet, this text is encoded with the UTF-8 encoding, using between 1 and 4 bytes to describe each character.
- Select the option Encoding > Character Sets > Central European > Windows-1250.
- Click on the Yes button of the small dialog Save Current Modification.
- Choose, again, the name Test.txt and save the file.
=> So, the encoding is changed to Windows-1250. But the encoding operation does NOT change the present contents of the file. Notepad++ just re-interprets all the bytes of the file as if they were a sequence of 1-byte encoded characters of the Windows-1250 encoding. So, obviously, the whole text looks rather incomprehensible! Internally, though, the Test.txt file is still a sequence of characters, each described according to the UTF-8 encoding.
- Close Test.txt (Ctrl + W).
- Re-open Test.txt (Ctrl + Shift + T).
=> The text and most of the graphical chars are correct, according to your current font, and the UTF-8 encoding is automatically chosen ;-))
Remarks:
- As this text is UTF-8 encoded, you may “test” any other character set, using the Encoding > Character Sets > .... menu option.
- You’ll notice that, during this test phase, the file contents are NOT modified at all and the icon of the file remains blue!
- At the end, after that test phase, just select the option Encoding > Encode in UTF-8 to get the original text back ;-))
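The second scenario can be imitated with a few lines of Python. This is a hedged sketch, under the assumption that Python's utf-8 and cp1250 codecs behave like the corresponding Notepad++ encodings; the variable names are only illustrative. The bytes are created once and then merely read in two different ways.

```python
# Sketch: the same bytes, read with two different encodings.
TEXT = "◘○•🔶 ěščřžýáíéúůťďňÓ"

raw = TEXT.encode("utf-8")  # the bytes stored in Test.txt

# UTF-8 byte counts per character that occur in this text.
sizes = sorted({len(ch.encode("utf-8")) for ch in TEXT})

# Like "Character Sets > Windows-1250": the bytes are only read differently.
# errors="replace" is needed because a few byte values are unassigned
# in Python's cp1250 table.
as_cp1250 = raw.decode("cp1250", errors="replace")

# Like "Encode in UTF-8": the untouched bytes are read as UTF-8 again.
as_utf8 = raw.decode("utf-8")

print(sizes)
print(as_cp1250)        # incomprehensible text, one character per byte
print(as_utf8 == TEXT)  # the original text is back
```

Since raw is never rewritten, switching back to UTF-8 always restores the text, which matches the file icon staying blue during the test phase.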
Remember:
- During an encoding operation, the present contents of the current file are just re-interpreted as if they were encoded with this new encoding, and are never modified.
- During a conversion operation, the present contents of the current file are modified, so that the new contents of the file correspond to the new encoding of the same characters.
In other words:
- The options “Encode in ...” and “Character sets/ ...” just read the present file contents according to the newly chosen encoding, generally giving a new representation of the current file contents.
- The option “Convert to ...” does modify the present file contents, so that they are read in an identical way with the newly chosen encoding.
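The difference between the two kinds of operation can be sketched in Python. The function names reinterpret and convert are hypothetical and only mirror the two menu families; they are not Notepad++ internals, and Python's cp1250, cp1252 and utf-8 codecs are assumed to match the encodings discussed here.

```python
# Sketch: "Encode in ..." versus "Convert to ...", applied to raw bytes.
# The function names are hypothetical, not Notepad++ internals.


def reinterpret(data: bytes, new_encoding: str) -> tuple[bytes, str]:
    """Keep the bytes and read them with another encoding (text may change)."""
    return data, data.decode(new_encoding, errors="replace")


def convert(data: bytes, old_encoding: str, new_encoding: str) -> tuple[bytes, str]:
    """Rewrite the bytes so that the same text is stored in the new encoding."""
    text = data.decode(old_encoding)
    new_data = text.encode(new_encoding)
    return new_data, new_data.decode(new_encoding)


original_text = "ěščřžýáíéúůťďňÓ"
cp1250_bytes = original_text.encode("cp1250")

same_bytes, other_text = reinterpret(cp1250_bytes, "cp1252")
new_bytes, same_text = convert(cp1250_bytes, "cp1250", "utf-8")

print(same_bytes == cp1250_bytes, other_text == original_text)  # True False
print(new_bytes == cp1250_bytes, same_text == original_text)    # False True
```

Re-interpreting keeps the 15 bytes and changes the text shown; converting keeps the text and produces 30 different bytes.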
To end with:
- As you see, the encoding and conversion concepts are not easy to assimilate. So, I advise everyone to always use the UTF-8 encoding or, better, the UTF-8-BOM encoding, which is able to encode absolutely all the Unicode characters!
- Of course, to fully exploit UTF-8 files, your system must contain some fonts which cover most Unicode characters and/or symbols!
- For the record, as of today, 92.6% of Web pages are encoded in UTF-8 ;-)) Refer to the link below:
https://w3techs.com/technologies/history_overview/character_encoding/ms/y
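As a side note, the two recommended variants differ only by a 3-byte signature at the very beginning of the file. Here is a small Python sketch, assuming that Python's utf-8-sig codec corresponds to what Notepad++ calls UTF-8-BOM:

```python
import codecs

text = "ěščřžýáíéúůťďňÓ"

plain = text.encode("utf-8")         # "UTF-8" in Notepad++ terms (assumption)
with_bom = text.encode("utf-8-sig")  # "UTF-8-BOM": same bytes after a signature

print(with_bom[:3] == codecs.BOM_UTF8)       # the signature bytes EF BB BF
print(with_bom[3:] == plain)                 # the text bytes are identical
print(with_bom.decode("utf-8-sig") == text)  # the signature is dropped on reading
```

Both variants can store any Unicode character. The signature simply lets a program recognize the encoding without guessing.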
Best Regards,
guy038
-
Yes, yes, I know these things. I was just surprised by the resulting error, since I was used to the Notepad++ code page functions working perfectly. I did not realize at that moment what CP-1250 really includes.
Btw, thanks.