Can I tell np++ the encoding via pseudo comment?

Oliver Meyer

Hi,
Is there a way to permanently tell notepad++ which encoding to use for a specific file? Can I include something like

-*- encoding: "OEM 850" -*-

in my text file and notepad++ will set the encoding to OEM 850 after loading the file?

I have a file that displays perfectly when I set the encoding manually to OEM 850. I would like notepad++ to display the file correctly automatically. Of course, it cannot detect the encoding itself, so I am happy to tell it … but not every time I open it.

If it is not possible, this is a feature request :) Documentation says that for XML and HTML, notepad++ uses the encoding given there. Adding <?xml version="1.0" encoding="IBM850"> did not help my case.

Thank you for pointers,
Oliver

PeterJones

@Oliver-Meyer said in Can I tell np++ the encoding via pseudo comment?:

Adding <?xml version="1.0" encoding="IBM850"> did not help my case.

I thought maybe the encoding value was wrong, because I’ve always seen it as CP850 in encoding fields instead.

But I created a dummy encode.xml,

<?xml version="1.0" encoding="IBM850" ?>
<blah>
</blah>

And it worked perfectly: when I open that file in Notepad++, it recognizes it as OEM 850. (It also works with <?xml version="1.0" encoding="CP850" ?>)

I noticed you said, <?xml version="1.0" encoding="IBM850">, which is not quite the same thing: you are missing the ? before the > … I thought that might have been the culprit, but after I edited encode.xml to have that typo, Notepad++ still recognized it as OEM 850.

Also, if I have two identical files with the “right” contents I showed above, one named encode.xml and one named encode.850, and I open them both in Notepad++, it will recognize that encode.xml is in OEM 850, but it will fall back to its heuristics to determine the encoding of encode.850. So if you want Notepad++ to honor that setting, the file needs an XML or HTML extension.
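To make the behaviour described above concrete, here is a minimal Python sketch of declaration-based detection: read the encoding name from the XML declaration and decode the bytes with it. This is only an illustration of the idea, not Notepad++'s actual detection code; the function names are made up for this example, and the pattern deliberately does not require the closing ? so that it tolerates the typo discussed above.

```python
import codecs
import re

# Matches encoding="..." inside an XML declaration at the start of the data.
# The closing "?" is not required (assumption: lenient, like the test above).
XML_DECL = re.compile(
    rb'^\s*<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z0-9._-]+)["\']'
)


def sniff_xml_encoding(data):
    """Return the declared encoding name, or None if nothing is declared."""
    match = XML_DECL.match(data)
    return match.group(1).decode("ascii") if match else None


def decode_with_declaration(data, fallback="utf-8"):
    """Decode bytes with the declared encoding, else with the fallback."""
    name = sniff_xml_encoding(data) or fallback
    try:
        codecs.lookup(name)  # Python knows both "IBM850" and "CP850"
    except LookupError:
        name = fallback
    return data.decode(name, errors="replace")


# An OEM 850 file whose body contains a non-ASCII character (e with acute).
sample = '<?xml version="1.0" encoding="IBM850" ?>\n<blah>\u00e9</blah>\n'.encode("cp850")
print(sniff_xml_encoding(sample))  # IBM850
text = decode_with_declaration(sample)
```

Without the declaration, a tool has to guess; the byte 0x82 that stands for é in code page 850 means something else in other single-byte code pages, which is why the file only displays correctly once the right encoding is selected.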

For HTML, I got the autodetection to work with

<!DOCTYPE html>
<html>
<head>
    <meta http-equiv="Content-type" content="text/html; charset=ibm850">
</head>
<body>
<p>hello world</p>
</body>
</html>

(I had hoped maybe the EditorConfig plugin would parse and honor those; my experiments showed no, and this EditorConfig plugin issue says that it doesn’t honor the charset in the .editorconfig config file, either. But if that is ever implemented, then an .editorconfig including the charset = ibm850 setting for a specific file/mask might work. But not yet.)

BTW, you said,

Documentation says that for XML and HTML, notepad++ uses the encoding given there.

Out of curiosity, which documentation says that, and where (link please)? As the primary maintainer of the official online Notepad++ User Manual, I didn’t remember having seen that, and couldn’t see that when I searched for <?xml or encoding= … but maybe it’s phrased in a different way: after all, my search-fu is bad and my memory worse. ;-) But seriously, if it is in there, I want to rephrase things so I can find it; and if it’s in some other documentation site than the official, I’d like to know about it.

dinkumoil

@Oliver-Meyer

I wrote a plugin named AutoCodepage that may be useful for you. You can install it via the built-in Plugins Admin.

With this plugin it is possible to specify character encodings for certain filename extensions (e.g. *.bat or *.cmd files should always use code page 850). BUT with this plugin it is NOT possible to persistently set an encoding for a specific file.
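The per-extension idea can be sketched in a few lines of Python. This is not AutoCodepage's code or its configuration format; the mapping and the function name are hypothetical and only illustrate why the approach works per filename pattern rather than per individual file.

```python
import fnmatch
import os

# Hypothetical mapping for illustration only (not the plugin's real config).
CODEPAGE_BY_PATTERN = {
    "*.bat": "cp850",
    "*.cmd": "cp850",
}


def codec_for(path, default="utf-8"):
    """Pick a codec from the file name alone; the content is never inspected."""
    name = os.path.basename(path).lower()
    for pattern, codec in CODEPAGE_BY_PATTERN.items():
        if fnmatch.fnmatchcase(name, pattern):
            return codec
    return default


print(codec_for("C:/scripts/backup.BAT"))  # cp850
print(codec_for("notes.txt"))              # utf-8
```

Because the decision depends only on the name, two files with the same extension always get the same code page, which matches the limitation stated above.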

Since I wasn’t able to understand your exact use case from your posting, I’m not sure if my plugin can help you.

Oliver Meyer

@dinkumoil Thanks. It is not quite what I am looking for, but it might solve my problem, since I have full control over the suffix of the file.

I am currently investigating the reply from @PeterJones, hoping to find a solution that way.

Oliver Meyer

@PeterJones Thank you very much for your detailed reply and experiments. Using the xml document declaration AND the xml suffix will resolve my case.

In my tests I used the .txt extension. Notepad++ did select XML as the language and changed the formatting based on the first line, but it did not honor the encoding value. That was unexpected.

Using @dinkumoil's plugin with .cp850-txt as the suffix might be a better solution, because the file is indeed not XML.

The documentation was linked in some very old Stack Overflow entry.

Alan Kilborn

@Oliver-Meyer

Check back in a few days to a week; I’m thinking of a possibly better solution to this problem…

PeterJones

I earlier said,

(I had hoped maybe the EditorConfig plugin would parse and honor those; my experiments showed no, and this EditorConfig plugin issue says that it doesn’t honor the charset in the .editorconfig config file, either. But if that is ever implemented, then an .editorconfig including the charset = ibm850 setting for a specific file/mask might work. But not yet.)

After continuing the conversation in that plugin’s issue, the author didn’t comment on when/if the .editorconfig “charset” property might be implemented. However, I was directed to the not-yet-published NppFileSettings plugin, which currently handles other properties from vim-style modelines; I put in the request there to add coding/encoding to the modeline processing, and to publish the plugin. We’ll see if anything ever comes of it.
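For reference, the kind of modeline processing requested here could look roughly like the following Python sketch. It parses the Emacs-style pseudo comment from the original question (not the vim-style syntax that NppFileSettings handles), and it is not code from either plugin. The alias table is a hypothetical helper, needed because a display name such as "OEM 850" is not a codec name Python recognizes.

```python
import re

# Emacs-style modeline: -*- coding: NAME -*- or -*- encoding: "NAME" -*-
MODELINE = re.compile(
    r'-\*-\s*(?:en)?coding\s*:\s*"?([^"*]+?)"?\s*-\*-', re.IGNORECASE
)

# Hypothetical aliases mapping display names to Python codec names.
ALIASES = {"oem 850": "cp850", "ibm850": "cp850"}


def modeline_encoding(lines):
    """Return the encoding named in the first modeline found, or None."""
    for line in lines:
        match = MODELINE.search(line)
        if match:
            return match.group(1).strip()
    return None


def to_python_codec(name):
    """Translate a display name to a codec name; unknown names pass through."""
    return ALIASES.get(name.lower(), name)


found = modeline_encoding(['-*- encoding: "OEM 850" -*-', "first line of text"])
print(found)                   # OEM 850
print(to_python_codec(found))  # cp850
```

A real implementation would only scan the first line or two of the file and would still need a rule for what to do when the modeline and the detected encoding disagree.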

Alan Kilborn

@Alan-Kilborn said:

@Oliver-Meyer
Check back in a few days to a week; I’m thinking of a possibly better solution to this problem…

So I’ve looked into this a little bit and I believe there is a workable solution with 2 caveats that I see (at least so far):

1. It would have to be a scripted solution, thus you’d have to be willing to install and use the PythonScript plugin, as well as set the script up.
2. Due to the nature of the way Find in Files and Replace in Files do their work, the scripted solution would NOT work for these actions when the files in question are not already open in Notepad++ tabs (the workarounds being to have the files open and use Find All in All Opened Documents and Replace All in All Opened Documents).

If those limitations are acceptable, I will “demo up” the solution for you, but I want to hear a “Let’s do it” from @Oliver-Meyer before I bother. I have no need to use non-UTF-8 encodings myself and thus I’m only interested in this solution for the sake of helping someone else, and a little bit of “let’s see if this can be done” coding fun.

Alan Kilborn

@Alan-Kilborn said in Can I tell np++ the encoding via pseudo comment?:

So I’ve looked into this a little bit…

I should also say that, aside from the small caveats, this would be a TRANSPARENT solution, meaning that once an encoding was selected for a file, that encoding would be remembered WITHOUT the need for a “pseudo comment” or anything similarly user-artificial.

Note that this only comes into play when a file is closed (removed from the active session) and then later reopened. If a file is not closed, Notepad++ will remember the set encoding across restarts (of the program or PC). Note that this is only true when the remember-current-session setting is active, but that’s the default case.
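The script itself was never posted in this thread, so the following is only a guess at the "remembering" half of such a solution, written as plain Python: a small JSON file that maps each path to the encoding last chosen for it. The class and file names are hypothetical, and the part that would hook into Notepad++ through PythonScript (noticing when a file is opened and applying the stored encoding) is deliberately left out.

```python
import json
import os
import tempfile


class EncodingMemory:
    """Persist a path -> encoding-name mapping in a JSON file."""

    def __init__(self, store_path):
        self.store_path = store_path
        self._map = {}
        if os.path.exists(store_path):
            with open(store_path, "r", encoding="utf-8") as handle:
                self._map = json.load(handle)

    @staticmethod
    def _key(file_path):
        # Normalize so the same file is found again regardless of spelling.
        return os.path.normcase(os.path.abspath(file_path))

    def remember(self, file_path, encoding_name):
        """Record the encoding chosen for a file and save immediately."""
        self._map[self._key(file_path)] = encoding_name
        with open(self.store_path, "w", encoding="utf-8") as handle:
            json.dump(self._map, handle, indent=2)

    def recall(self, file_path):
        """Return the remembered encoding, or None if the file is unknown."""
        return self._map.get(self._key(file_path))


# Usage: remember once, then recall from a fresh instance (as after a restart).
store = os.path.join(tempfile.mkdtemp(), "encodings.json")
EncodingMemory(store).remember("report.txt", "OEM 850")
print(EncodingMemory(store).recall("report.txt"))  # OEM 850
```

Keying on the normalized absolute path is what would make this transparent: nothing has to be written into the file itself.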

Alan Kilborn

Just to circle back (finally) on this: I ended up NOT pursuing a scripted solution because I didn’t hear anything back from the OP, and I don’t have great interest in this for my own use. Just FYI.