    shortcuts.xml With UTF-8 BOM Fails to Load

    7 Posts 5 Posters 601 Views
    • Bob Smith 0

      From within N++, I can edit shortcuts.xml, and it shows the encoding as UTF-8 w/o BOM. When I change its encoding to have a UTF-8 BOM, save it, exit, and restart N++, the shortcuts.xml file appears to be ignored.

      My preference is to save the file with a BOM so my little utilities can just look at the first few bytes and use that to determine what to do with the file. I presume that when N++ loads a file such as shortcuts.xml, it scans the file to determine the encoding (in this case UTF-8 w/o BOM). My little utilities can’t afford to scan the file, so I rely on the presence/absence of a BOM to give me direction.

      What am I missing? Does N++ actually allow that file to have a BOM, but I’m just doing it wrong?
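For illustration, the "look at the first few bytes" check described in this post might be sketched in Python as follows. The function name `sniff_bom` and the `default` fallback are hypothetical, not anything from the poster's utilities; the returned strings are Python codec names that consume the BOM when decoding.

```python
import codecs

# BOM signatures mapped to Python codecs that strip the BOM on decode.
# UTF-32 must be tested before UTF-16: the UTF-32 LE BOM begins with
# the same two bytes as the UTF-16 LE BOM.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def sniff_bom(head: bytes, default: str = "utf-8") -> str:
    """Return a codec name based only on the leading bytes of a file.

    `default` is an assumption about what a BOM-less file contains;
    the absence of a BOM does not prove any particular encoding.
    """
    for bom, codec in _BOMS:
        if head.startswith(bom):
            return codec
    return default
```

Reading the first four bytes of a file is enough input for this check, for example `sniff_bom(open(path, "rb").read(4))`.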

      • PeterJones @Bob Smith 0

        @Bob-Smith-0,

        Then your utilities are not implementing UTF-8, because UTF-8 does not require a BOM. If your utilities do require one, they are fundamentally flawed.

        Moreover, when dealing with another app’s internal configuration files, you have to play by their rules, not your own.

        When dealing with XML, you don’t have to scan very far… Just read the XML declaration assuming it’s ASCII, and only change your interpretation if the encoding listed in the declaration is something else. That’s the way XML works.
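A rough Python sketch of that approach: read only the start of the file, treat it as ASCII, and take the encoding from the XML declaration if one is present. The function name `declared_encoding` and the sample bytes in the usage note are hypothetical. The sketch only covers ASCII-compatible encodings (it would not recognize a UTF-16 file), and it falls back to UTF-8, which is what XML assumes when there is no declaration and no BOM.

```python
import re

# Matches encoding="..." inside an XML declaration at the very start
# of the data. Encoding names are ASCII by definition in XML.
_DECL = re.compile(
    rb"^<\?xml[^>]*?\bencoding\s*=\s*[\"']([A-Za-z][A-Za-z0-9._-]*)[\"']"
)

_UTF8_BOM = b"\xef\xbb\xbf"


def declared_encoding(head: bytes, default: str = "utf-8") -> str:
    """Return the encoding named in the XML declaration, lowercased.

    `head` is the first chunk of the file (a few hundred bytes is plenty).
    An optional UTF-8 BOM is skipped, so the function works either way.
    """
    if head.startswith(_UTF8_BOM):
        head = head[len(_UTF8_BOM):]
    match = _DECL.match(head)
    if match:
        return match.group(1).decode("ascii").lower()
    return default
```

For example, `declared_encoding(b'<?xml version="1.0" encoding="UTF-8" ?>')` returns `"utf-8"` without looking at the rest of the file.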

        • Alan Kilborn @PeterJones

          @PeterJones

          Also, to quote Wikipedia: The Unicode Standard permits the BOM in UTF-8, but does not … recommend its use.

          I’d say it isn’t the smartest thing to write any code that depends upon BOM being there (as is hinted by OP’s “so my little utilities can just look at the first few bytes”).

          Hmm, yea, guess Peter already said this. :-)

          • Coises @PeterJones

            @PeterJones said in shortcuts.xml With UTF-8 BOM Fails to Load:

            Then your utilities are not implementing UTF-8, because UTF-8 does not require a BOM. If your utilities do require one, they are fundamentally flawed.

            In practice, on Windows, all methods of character set detection are fundamentally flawed:

            https://devblogs.microsoft.com/oldnewthing/20070417-00/?p=27223

            Adding a byte order mark at the beginning of a UTF-8 file is against recommendation, but it greatly increases the chances that a character detection algorithm in a Windows program will assume UTF-8, rather than ANSI using whatever codepage is default on that system.

            • Coises @Bob Smith 0

              @Bob-Smith-0 said in shortcuts.xml With UTF-8 BOM Fails to Load:

              My preference is to save the file with a BOM so my little utilities can just look at the first few bytes and use that to determine what to do with the file. I presume that when N++ loads a file such as shortcuts.xml, it scans the file to determine the encoding (in this case UTF-8 w/o BOM). My little utilities can’t afford to scan the file, so I rely on the presence/absence of a BOM to give me direction.

              Do you in fact have non-ASCII characters in shortcuts.xml? If not, it shouldn’t matter.
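The reasoning here is that bytes in the ASCII range decode identically as UTF-8 and as the common Windows "ANSI" code pages, so a pure-ASCII file reads the same under either guess. A small Python sketch of that check follows; the helper name `is_pure_ascii` is hypothetical, and code page 1252 is used only as one example of an ANSI code page.

```python
def is_pure_ascii(data: bytes) -> bool:
    """True if every byte is in the range 0x00-0x7F."""
    return data.isascii()  # bytes.isascii() requires Python 3.7+


def decodes_the_same(data: bytes) -> bool:
    """Compare a UTF-8 decode with a Windows-1252 decode of the same bytes.

    Undecodable bytes are replaced rather than raising, so the comparison
    always yields an answer.
    """
    as_utf8 = data.decode("utf-8", errors="replace")
    as_ansi = data.decode("cp1252", errors="replace")
    return as_utf8 == as_ansi
```

If `is_pure_ascii` returns `True` for the file's contents, the encoding guess makes no difference to the decoded text; once a character such as `é` appears, the two decodes diverge.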

              • PeterJones @Coises

                @Coises said in shortcuts.xml With UTF-8 BOM Fails to Load:

                Adding a byte order mark at the beginning of a UTF-8 file … greatly increases the chances

                I don’t disagree, and I often use a BOM in UTF-8 myself. Personally, I wish that the Unicode Consortium had defined UTF-8 to always have a BOM, much as UTF-16 files typically carry a BOM to signal byte order. It would have simplified things greatly, and made interoperability even easier.

                But I cannot expect that every other application will also do that. And if Notepad++ has defined its config file XML to be UTF-8-without-BOM, I cannot force it to accept my preference… and @Bob-Smith-0’s utilities should not enforce Bob’s preferences on Notepad++'s config files, either.

                • mkupper @Bob Smith 0

                  <BOM/>@Bob-Smith-0, I am astonished that you did not preface your message with a BOM!

                  As you seem to have already recognized the issue, I suspect it would be enough to add two more tools to your bag: one that inserts a BOM, and another that removes it. You can then do your desired magic on Notepad++'s config.xml.<EOM/>
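The two tools suggested here could be as small as the following Python sketch. The function names `add_bom` and `strip_bom` are hypothetical; both work on raw bytes and leave the data unchanged when there is nothing to do, so running either one twice is harmless.

```python
import codecs

UTF8_BOM = codecs.BOM_UTF8  # the three bytes EF BB BF


def add_bom(data: bytes) -> bytes:
    """Prepend a UTF-8 BOM unless one is already present."""
    if data.startswith(UTF8_BOM):
        return data
    return UTF8_BOM + data


def strip_bom(data: bytes) -> bytes:
    """Remove a leading UTF-8 BOM if there is one."""
    if data.startswith(UTF8_BOM):
        return data[len(UTF8_BOM):]
    return data
```

One possible workflow is to keep the file on disk without a BOM and have a utility call `add_bom` on a working copy, then pass the result through `strip_bom` before writing it back. With `pathlib`, that write-back step could be `path.write_bytes(strip_bom(path.read_bytes()))`.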
