
    How to properly display French characters in HTML?

    • Pour Svdeux

      Hello, I sometimes save Thunderbird emails as HTML files to easily share them with colleagues. The problem is that most of the time special French characters (e.g., é, è, à, etc…) are wrongly displayed, i.e., as é, è, Ã, etc…
      As I didn’t find a way to solve the issue in Thunderbird, I was wondering whether there would be an easy and quick way to do the conversion in Notepad++.
      As I just discovered Notepad++, and not being familiar with it, any help would be much appreciated.
      Thank you
      Jean-Marc

      • PeterJones @Pour Svdeux

        @Pour-Svdeux,

        I have Thunderbird, so I tried to replicate; I started a new email, pasted in é, è, à, and did Thunderbird > File > Save As > File to thunderbird.html. When I open that HTML in Notepad++, it understands the file as UTF-8, so the characters aren’t displayed wrong. And the file has the HTML meta encoding tag saying it should be UTF-8, so any HTML viewer (web browser) that you open the file in should interpret the bytes as UTF-8 and display the characters correctly.

        [Screenshot: thunderbird.html open in Notepad++, interpreted as UTF-8]

        The only time you would see é, è, Ã would be if whatever was loading the file didn’t know that the file was UTF-8, and was reading the two- and three-byte characters as individual bytes of ANSI text rather than as UTF-8 multi-byte characters.
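To make the misreading concrete, here is a minimal Python sketch of the same effect. It assumes that "ANSI" on the machine in question means Windows-1252 (`cp1252`), which is typical for Western European Windows setups but is an assumption, not something stated in the thread.

```python
# The three characters from the original post.
original = "é, è, à"

# A UTF-8 file stores each of these characters as two bytes.
utf8_bytes = original.encode("utf-8")

# A viewer that assumes Windows-1252 (an assumption about what "ANSI"
# means here) turns every byte into its own character.
misread = utf8_bytes.decode("cp1252")

# Undoing the wrong step recovers the text, provided the bytes
# themselves were never modified or re-saved in the wrong encoding.
repaired = misread.encode("cp1252").decode("utf-8")

print(utf8_bytes.hex(" "))
print(repr(misread))
print(repaired)
```

The `repr` output makes the no-break space after the last `Ã` visible; that invisible character is why à tends to show up as a lone Ã.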

        If I force Notepad++ to interpret thunderbird.html as ANSI (using Encoding > ANSI, not using Encoding > Convert to ANSI), then I see those characters.

        [Screenshot: the same file interpreted as ANSI in Notepad++, showing the garbled characters]

        If you are in that state in Notepad++, then you can use Encoding > UTF-8 to get your Notepad++ to interpret the bytes correctly (back to my original image).

        Once you have Notepad++ reading it correctly (whether because it did “the right thing” right out of the box, or because you forced it to interpret the bytes in the file correctly), then you can use Encoding > Convert to UTF-8-BOM to get Notepad++ to add the Unicode BOM (U+FEFF) at the beginning of the file; then save the file. Technically, UTF-8 doesn’t require the BOM, but when most editors or viewers see the three-byte sequence (0xEF 0xBB 0xBF) that makes up the UTF-8 encoding of the BOM, they can be reasonably certain that the file is UTF-8 and will interpret it that way. When you then give the file to your colleagues, it is more likely that they will be able to read the file correctly.
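For anyone who wants to do the same thing outside Notepad++, the following is a rough Python sketch of what adding the BOM amounts to. The file name and the sample text are placeholders chosen for illustration; Python's `utf-8-sig` codec writes the three BOM bytes before the content.

```python
import codecs
import os
import tempfile

# Placeholder content standing in for the saved email.
text = "<p>é, è, à</p>"

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical file name, reused from the example above.
    path = os.path.join(tmp, "thunderbird.html")

    # "utf-8-sig" writes EF BB BF first, then the UTF-8 encoded text.
    with open(path, "w", encoding="utf-8-sig") as f:
        f.write(text)

    # Read the raw bytes back to inspect what actually landed on disk.
    with open(path, "rb") as f:
        raw = f.read()

has_bom = raw.startswith(codecs.BOM_UTF8)
without_bom = raw[len(codecs.BOM_UTF8):].decode("utf-8")

print(raw[:3].hex(" "))
print(has_bom)
print(without_bom)
```

Everything after the first three bytes is the same UTF-8 data as before, so the BOM only adds a hint for the reader and does not change the text itself.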

        • Scott Nielson @PeterJones

          @PeterJones What about other languages? Will Hindi, Cyrillic, etc. also be displayed?

              • Pour Svdeux @PeterJones

              @PeterJones said in How to properly display French characters in HTML?:

              Convert to UTF-8-BOM

              Thanks a lot Peter, perfect solution!

              The problem was that I use an old (61) version of Firefox (because I want to have the bookmarks Description Property, which disappeared after version 61, but that’s another story), with which special characters appear scrambled (they can be restored to normal with Alt -> View -> Text Encoding -> Unicode [set to Western by default]). Newer versions do interpret the UTF-8 correctly, and so does Edge. However, my iPhone does not (I do not know about Safari on a computer). Therefore, from now on I will add the UTF-8 BOM to my HTML email files, in case other people may have the same issue with their browser or telephone…
              (btw, I do not see any difference in the source code with and without the BOM; is the code hidden?)

              • Alan Kilborn @Pour Svdeux

                @Pour-Svdeux said in How to properly display French characters in HTML?:

                (btw, I do not see any difference in the source code with and without the BOM; is the code hidden?)

                I presume you mean “is it hidden in Notepad++”.
                The answer is yes: you will not see anything “special” in the text of your file.
                However, if you open the file in a hex editor, you will see the BOM at the start of file, because to a hex editor, these bytes are just bytes:

                [Screenshot: hex editor view showing the BOM bytes at the start of the file]
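If no hex editor is at hand, a few lines of Python can show the same thing. This is only an illustrative sketch: `describe_start` is a hypothetical helper, and the byte strings are built in memory instead of being read from a real file.

```python
import codecs


def describe_start(raw: bytes, count: int = 8) -> str:
    """Return the first `count` bytes as hex, roughly as a hex editor lists them."""
    return raw[:count].hex(" ")


# The same character, once with a leading BOM and once without.
without_bom = "é".encode("utf-8")
with_bom = codecs.BOM_UTF8 + without_bom

print(describe_start(with_bom))
print(describe_start(without_bom))

# "utf-8-sig" strips a leading BOM when decoding and accepts input
# without one, so the visible text is identical in both cases.
same_text = with_bom.decode("utf-8-sig") == without_bom.decode("utf-8-sig")
print(same_text)
```

That is the reason the source looks the same in the editor either way: the BOM is consumed while decoding and never becomes part of the displayed text.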

                • Pour Svdeux @Alan Kilborn

                  @Alan-Kilborn
                  Got it, thanks!
