shortcuts.xml With UTF-8 BOM Fails to Load

Bob Smith 0

From within N++, I can edit shortcuts.xml, and it shows the encoding as UTF-8 w/o BOM. When I change its encoding to have a UTF-8 BOM, save it, exit, and restart N++, the shortcuts.xml file appears to be ignored.

My preference is to save the file with a BOM so my little utilities can just look at the first few bytes and use that to determine what to do with the file. I presume that when N++ loads a file such as shortcuts.xml, it scans the file to determine the encoding (in this case UTF-8 w/o BOM). My little utilities can’t afford to scan the file, so I rely on the presence/absence of a BOM to give me direction.

What am I missing? Does N++ actually allow that file to have a BOM, but I’m just doing it wrong?

PeterJones

@Bob-Smith-0,

Then your utilities are not implementing UTF-8, because UTF-8 does not require a BOM. If your utilities do require one, they are fundamentally flawed.

Moreover, when dealing with another app’s internal configuration files, you have to play by their rules, not your own.

When dealing with XML, you don’t have to scan very far… Just read the XML declaration assuming it’s ASCII, and only change your interpretation if the encoding listed in the declaration is something else. That’s the way XML works.
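As a rough illustration of that approach, here is a minimal Python sketch. The function name `sniff_xml_encoding` is made up for this example, and it assumes the file is in an ASCII-compatible encoding (it does not handle UTF-16 or UTF-32). It looks at only the first bytes of the file: a UTF-8 BOM wins if present, otherwise the `encoding` attribute of the XML declaration is used, otherwise it falls back to UTF-8, which is XML's default when nothing else is specified.

```python
import re

UTF8_BOM = b"\xef\xbb\xbf"

def sniff_xml_encoding(head: bytes) -> str:
    """Guess an XML file's encoding from its first bytes (hypothetical helper)."""
    # A UTF-8 BOM, if present, settles the question immediately.
    if head.startswith(UTF8_BOM):
        return "utf-8-sig"
    # Otherwise read the declaration as ASCII and look for encoding="...".
    m = re.match(rb'<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z0-9._-]+)["\']', head)
    if m:
        return m.group(1).decode("ascii").lower()
    # No BOM and no declared encoding: fall back to UTF-8.
    return "utf-8"
```

Reading the first hundred bytes or so of the file is normally enough for this, so nothing has to scan the whole file.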

Alan Kilborn

@PeterJones

Also, to quote Wikipedia: The Unicode Standard permits the BOM in UTF-8, but does not … recommend its use.

I’d say it isn’t the smartest thing to write any code that depends upon a BOM being there (as is hinted by OP’s “so my little utilities can just look at the first few bytes”).

Hmm, yea, guess Peter already said this. :-)

Coises

@PeterJones said in shortcuts.xml With UTF-8 BOM Fails to Load:

Then your utilities are not implementing UTF-8, because UTF-8 does not require a BOM. If your utilities do require one, they are fundamentally flawed.

In practice, on Windows, all methods of character set detection are fundamentally flawed:

https://devblogs.microsoft.com/oldnewthing/20070417-00/?p=27223

Adding a byte order mark at the beginning of a UTF-8 file is against recommendation, but it greatly increases the chances that a character detection algorithm in a Windows program will assume UTF-8, rather than ANSI using whatever codepage is default on that system.

Coises

@Bob-Smith-0 said in shortcuts.xml With UTF-8 BOM Fails to Load:

My preference is to save the file with a BOM so my little utilities can just look at the first few bytes and use that to determine what to do with the file. I presume that when N++ loads a file such as shortcuts.xml, it scans the file to determine the encoding (in this case UTF-8 w/o BOM). My little utilities can’t afford to scan the file, so I rely on the presence/absence of a BOM to give me direction.

Do you in fact have non-ASCII characters in shortcuts.xml? If not, it shouldn’t matter.
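A one-off check like the following Python sketch would answer that. The function name `first_non_ascii` is just for illustration. If it returns -1, every byte is below 0x80, and the file decodes to the same text as UTF-8 or as any ASCII-compatible ANSI codepage.

```python
def first_non_ascii(data: bytes) -> int:
    """Return the offset of the first byte >= 0x80, or -1 if the data is pure ASCII."""
    for i, b in enumerate(data):
        if b >= 0x80:
            return i
    return -1
```

To use it, read the file in binary mode, for example `first_non_ascii(open("shortcuts.xml", "rb").read())`.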

PeterJones

@Coises said in shortcuts.xml With UTF-8 BOM Fails to Load:

Adding a byte order mark at the beginning of a UTF-8 file … greatly increases the chances

I don’t disagree. And I often use BOM in UTF-8. Personally, I wish that the Unicode Consortium had defined UTF-8 to always have a BOM, much as UTF-16 relies on a BOM to signal its byte order. It would have simplified things greatly, and made interoperability even easier.

But I cannot expect that every other application will also do that. And if Notepad++ has defined its config file XML to be UTF-8-without-BOM, I cannot force it to accept my preference… and @Bob-Smith-0’s utilities should not enforce Bob’s preferences on Notepad++'s config files, either.

mkupper

<BOM/>@Bob-Smith-0, I am astonished that you did not preface your message with a BOM!

As you seem to have already recognized the issue, I suspect it would be enough to add two more tools to your bag: one that inserts a BOM, and another that removes it. You can then do your desired magic on Notepad++'s config.xml.<EOM/>
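Those two tools could be as small as the following Python sketch. The names `add_bom`, `strip_bom`, and `rewrite` are invented for this example, and it only deals with the UTF-8 BOM. Both transforms are idempotent, so running either one twice does no harm.

```python
import codecs
from pathlib import Path

BOM = codecs.BOM_UTF8  # the three bytes EF BB BF

def add_bom(data: bytes) -> bytes:
    """Prepend a UTF-8 BOM unless one is already present."""
    return data if data.startswith(BOM) else BOM + data

def strip_bom(data: bytes) -> bytes:
    """Remove a leading UTF-8 BOM if present."""
    return data[len(BOM):] if data.startswith(BOM) else data

def rewrite(path: str, transform) -> None:
    """Apply add_bom or strip_bom to a file in place (hypothetical helper)."""
    p = Path(path)
    p.write_bytes(transform(p.read_bytes()))
```

A plausible workflow is to work on a copy with the BOM added, then call `rewrite(path, strip_bom)` on the result before Notepad++ is started again, with Notepad++ closed while the file is changed.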