The Vietnamese language doesn't show up correctly after saving
-
@Nick-Boescht is this your private computer?
When you open config.xml and search for stylerTheme, what does path report?
-
@Ekopalypse Yes, I am working on a personal laptop
<GUIConfig name="stylerTheme" path="C:\Users\ASUS\AppData\Roaming\Notepad++\themes\Monokai.xml" />
-
So your style has been saved!?
What is reverting?
-
@Ekopalypse Seems like it’s only the Encoding that changes every time I open it, but setting it back to UTF-8 still makes the accents show properly
-
Ahh, I see: it is an XML file, and its declaration sets the encoding to windows-1258. Hmm, I’m not sure whether it can be forced to open as UTF-8.
-
@Ekopalypse I am translating the text directly from the game files of S.T.A.L.K.E.R: Shadow Of Chernobyl, extracted from the db files using an unpacker from here https://www.moddb.com/mods/old-good-stalker-evolution/downloads/stalker-game-archives-unpacker
-
Could you please show us a screenshot of this text file open in your copy of Notepad++, with enough in the screenshot so we can see any header information (like an XML encoding tag), and the full Notepad++ status bar (so it shows file type, length, encoding, EOL format, etc). It’d be really nice if it also showed us what the example text you pasted looks like on your machine, so we can know what you think is “wrong” about it.
When asking us for help, especially in discussions that have gone on this long, it’s often best to err on the side of providing too much information rather than too little.
-
one thing you can try is to use this setting in addition.
-
For me it seems the same. If I open an XML file with an encoding tag, then that tag overrides my UTF-8 setting.
Now the question is, @Nick-Boescht: do you want it to be stored in UTF-8? Or is Windows1258 indeed the correct encoding?
-
@PeterJones This is what I see and I am using Ekopalypse’s settings
-
@Ekopalypse Yep, Windows1258 is the correct encoding. This is what happens when I change it to the original encoding Windows1251. Damn, now I have to type it again
-
So the issue is that you saved it with UTF-8 encoding, but you should have saved it with the encoding it was opened with.
-
I would give it a try to use “Convert to ANSI”.
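For anyone following along, here is a minimal Python sketch of why the save encoding matters. It is only an illustration and has nothing to do with how Notepad++ works internally. The sample word is my own and was picked because all of its letters are stored as single bytes in Windows-1258; many accented Vietnamese letters are stored in that code page as a base letter plus a combining mark, which this sketch does not deal with.

```python
# Illustration only: the sample text is an assumption chosen so that every
# letter has a single-byte slot in Windows-1258 (Python codec name "cp1258").
text = "đơn"

utf8_bytes = text.encode("utf-8")
cp1258_bytes = text.encode("cp1258")

# The same text produces different byte sequences in the two encodings.
print(utf8_bytes)
print(cp1258_bytes)

# Bytes saved as UTF-8 but read back as Windows-1258 turn into mojibake.
misread = utf8_bytes.decode("cp1258", errors="replace")
print(misread)

# Reading the bytes back with the encoding they were saved in is lossless.
print(cp1258_bytes.decode("cp1258") == text)
```

So if the consumer of the file (here, the game) expects Windows-1258, a file saved as UTF-8 will show up garbled there even if it looks fine in the editor.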
-
@Ekopalypse Okay, I’ll do it and snip a pic for you if anything odd happens
Everything went great! Thanks a lot, man.
-
@Nick-Boescht said in The Vietnamese language doesn't show up correctly after saving:
Everything went great!
I assume this means the problem is solved for you. At least for now.
However, I am going to give future advice on this problem in case there are others with similar difficulties, or if you have more problems. As such, I will make one more comment on an earlier statement, then outline what I understand so far. If I have stuff wrong, you’ll have to correct me either now or whenever you need our help again.
This is what I see
But it’s not all I asked for. You did not show the status bar, which still leaves us guessing, or assuming that you have correctly understood what we’ve been asking.
If you give a detailed, step by step explanation of exactly what you are doing, you can save a lot of the back-and-forth we’ve had to go through to get to this point.
The process I understand you are doing:
- use some extractor tool to extract XML from a game’s binary db file
- open the xml in Notepad++
- the file automatically opens in encoding _____ (I think it’s opening either as Win-1251 or as some western-european encoding instead of Win-1258, but I’m not sure, because you’ve been too vague), and characters are not showing up right. Or maybe, if you’ve really set everything like @Ekopalypse showed, it was opening as UTF-8.
- You manually selected Encoding > ____ (I assume Win-1258), and it appeared correct. So you saved.
- When you exit and reload, it once again comes up in the same encoding as in the third step.
If this is not your process, you’ll need to correct it before we can give you more help.
Back to the problem at hand: I don’t actually do a lot of encoding-based text editing (except when I help others in this forum), as my own files rarely need anything special. But from what I’ve picked up over the years: if at all possible, it’s best to use UTF-8 or another Unicode encoding. That’s been the right way of doing things since the 90s when Unicode was invented, and should have been more encouraged since the turn of the century once UTF-8 started gaining in popularity. The fact that any modern tool (game, what have you) is still using old 256-codepoint Win-#### encodings shows the complete lack of understanding on the part of those developers.
Unfortunately, without a Unicode encoding, Notepad++ is left with two options: guess the encoding based on the frequency of particular bytes with values between 128 and 255, or use a default setting. The guessing is often wrong, and using a default setting can make things even worse (especially if you’re dealing with a mix). This is because Windows (based on 1980s pre-Unicode DOS) didn’t store encoding information or other file meta-data in the directory table or any other file meta-data location, so it was up to applications to decide what to do with any particular sequence of bytes found in a disk file.
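To make the "bytes don't carry their encoding" point concrete, here is a rough Python sketch of one simple strategy: check strictly whether the bytes are valid UTF-8, and otherwise fall back to a fixed default code page. This is not Notepad++'s actual detection logic (which I have not looked at), and it is not frequency-based guessing either. The function name `decode_with_fallback` and the choice of Windows-1258 as the fallback are my own assumptions for illustration.

```python
from typing import Tuple


def decode_with_fallback(data: bytes, fallback: str = "cp1258") -> Tuple[str, str]:
    """Return (text, encoding_used) for a raw byte sequence.

    Hypothetical helper: strict UTF-8 first, then a fixed default code page.
    """
    try:
        # Strict decoding fails on most legacy 8-bit text, because bytes
        # above 127 rarely happen to form valid UTF-8 sequences.
        return data.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # Nothing in the bytes says which code page was meant, so the
        # fallback is a guess supplied by the caller.
        return data.decode(fallback, errors="replace"), fallback


# The same word stored two different ways on disk.
print(decode_with_fallback("đơn".encode("utf-8")))
print(decode_with_fallback("đơn".encode("cp1258")))
```

The weak spot is the fallback: if the file was really Win-1251 or something else, the result is silently wrong, which is the same trap as a wrong default setting in the editor.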
In this situation, I would err on the side of use-the-default-setting, then when it’s wrong, manually change the encoding. If you are frequently going to use Win-1258, then use Settings > Shortcut Mapper, set filter: 1258, and assign a keyboard shortcut to Win-1258, so from then on you can just hit that keyboard shortcut to set that encoding.
But actually, I might try the experiment of changing the xml encoding line to
<?xml version="1.0" encoding="utf-8"?>
and doing an Encoding > Convert to UTF-8 on the file. That way, when Notepad++ applies the reasonable default of UTF-8 to 8bit files, it will Do What You Mean. After saving it as UTF-8, it should maintain the right encoding from then on in Notepad++. You should then check whether the game will accept an XML config file encoded in UTF-8. If not, complain to the game company that they don’t care about non-western languages, and see if you can convince them to accept utf-8 and other unicode-based encodings.
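If you would rather script the UTF-8 experiment than do it by hand, the idea can be sketched in Python as below. This is only a sketch under assumptions: the helper name `xml_bytes_to_utf8` is made up, the sample XML content is invented for the demo, and it assumes the source bytes really are Windows-1258. Whether the game accepts the result is still something you would have to test.

```python
import re


def xml_bytes_to_utf8(data: bytes, source_encoding: str = "cp1258") -> bytes:
    """Re-encode XML bytes as UTF-8 and update the declared encoding.

    Hypothetical helper for illustration; source_encoding is an assumption
    the caller has to get right.
    """
    text = data.decode(source_encoding)
    # Rewrite only the encoding attribute of the leading XML declaration,
    # keeping whichever quote character the file already uses.
    text = re.sub(
        r'(<\?xml[^>]*?encoding=)(["\'])[^"\']*\2',
        r"\g<1>\g<2>utf-8\g<2>",
        text,
        count=1,
    )
    return text.encode("utf-8")


# Invented sample standing in for an extracted game file.
original = '<?xml version="1.0" encoding="windows-1258"?>\n<text>đơn</text>\n'.encode("cp1258")
converted = xml_bytes_to_utf8(original)
print(converted.decode("utf-8"))
```

For a real file you would read the bytes from disk, pass them through the helper, and write the result to a new file, keeping the original as a backup.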