
How to properly display French characters in HTML?

Help wanted
    Pour Svdeux
    last edited by Aug 6, 2021, 7:25 AM

    Hello, I sometimes save Thunderbird emails as HTML files to share them easily with colleagues. The problem is that most of the time the special French characters (e.g., é, è, à) are displayed incorrectly, i.e., as Ã©, Ã¨, Ã, etc.
    As I didn’t find a way to solve the issue in Thunderbird, I was wondering whether there is a quick and easy way to do the conversion in Notepad++.
    As I have only just discovered Notepad++ and am not yet familiar with it, any help would be much appreciated.
    Thank you
    Jean-Marc

      PeterJones @Pour Svdeux
      last edited by Aug 6, 2021, 9:18 PM

      @Pour-Svdeux,

      I have Thunderbird, so I tried to replicate the problem: I started a new email, pasted in é, è, à, and did Thunderbird > File > Save As > File to thunderbird.html. When I open that HTML file in Notepad++, it recognizes the file as UTF-8, so the characters are displayed correctly. The file also contains the HTML meta encoding tag declaring it as UTF-8, so any HTML viewer (web browser) you open it in should interpret the bytes as UTF-8 and display them correctly as well.

      [Screenshot: thunderbird.html opened in Notepad++, read as UTF-8, characters displayed correctly]
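As a rough illustration of the meta encoding tag mentioned above, here is a minimal Python sketch that looks for a declared charset near the start of an HTML file. It is a simplification of what browsers do (they prescan roughly the first kilobyte for a `<meta>` declaration). The function name `declared_charset` and the regex are hypothetical, my own, and only handle the two common forms `<meta charset=...>` and `<meta http-equiv="Content-Type" content="...; charset=...">`.

```python
import re
from typing import Optional

# Matches both common declaration forms (case-insensitive):
#   <meta charset="utf-8">
#   <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
_CHARSET_RE = re.compile(
    rb"""<meta[^>]+charset\s*=\s*["']?([A-Za-z0-9_\-]+)""", re.IGNORECASE
)


def declared_charset(raw: bytes, limit: int = 1024) -> Optional[str]:
    """Return the charset declared by the first matching <meta> tag, lowercased.

    Only the first `limit` bytes are scanned, loosely mirroring how browsers
    prescan the start of a file. Returns None if no declaration is found.
    """
    match = _CHARSET_RE.search(raw[:limit])
    if match is None:
        return None
    return match.group(1).decode("ascii").lower()
```

For example, `declared_charset(open("thunderbird.html", "rb").read())` would be expected to return `"utf-8"` for a file like the one described above. Note that the tag only says what the file *should* be; a viewer that ignores it can still misread the bytes.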

      The only time you would see Ã©, Ã¨, Ã would be if whatever was loading the file didn’t know that the file was UTF-8, and was reading the two- and three-byte characters as individual bytes of ANSI text rather than as UTF-8 multi-byte characters.
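A small Python sketch of that misreading, assuming "ANSI" means Windows-1252 (the usual code page on Western-European Windows; on other systems it may differ). The helper names are hypothetical. Each of these French letters is two bytes in UTF-8 (é is C3 A9); read one byte per character in Windows-1252, C3 becomes Ã and A9 becomes ©. For à, the second byte A0 is a non-breaking space, which is why it shows up as a lone Ã.

```python
# Assumption: "ANSI" here means Windows-1252, the usual code page on
# Western-European Windows systems.
ANSI = "cp1252"


def utf8_bytes(text: str) -> str:
    """Show the UTF-8 bytes of `text` as space-separated hex."""
    return text.encode("utf-8").hex(" ")


def misread_as_ansi(text: str) -> str:
    """Encode text as UTF-8, then decode every byte as a separate ANSI character.

    This reproduces what a viewer does when it does not know the file is UTF-8.
    """
    return text.encode("utf-8").decode(ANSI)


# Print each character, its UTF-8 bytes, and how an ANSI reader would show it.
for ch in "éèà":
    print(ch, utf8_bytes(ch), repr(misread_as_ansi(ch)))
```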

      If I force Notepad++ to interpret thunderbird.html as ANSI (using Encoding > ANSI, not using Encoding > Convert to ANSI), then I see those characters.

      [Screenshot: the same file after Encoding > ANSI, showing the garbled characters]

      If you are in that state in Notepad++, then you can use Encoding > UTF-8 to get your Notepad++ to interpret the bytes correctly (back to my original image).
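The difference between Encoding > ANSI and Encoding > Convert to ANSI can be modeled in a few lines of Python. This is only an analogy with hypothetical function names, not how Notepad++ is implemented: "reinterpreting" decodes the same bytes differently and leaves the file untouched, while "converting" decodes with whatever encoding is currently assumed and writes new bytes. It also shows why you should switch the interpretation back (Encoding > UTF-8) rather than convert while the text looks wrong: converting misread text makes the garbling permanent.

```python
# Hypothetical model of the two kinds of Encoding-menu action;
# an analogy only, not Notepad++'s actual implementation.
ANSI = "cp1252"  # assumption: Western-European Windows code page


def reinterpret(raw: bytes, encoding: str) -> str:
    """Like 'Encoding > <name>': the bytes stay the same,
    only the way they are read changes."""
    return raw.decode(encoding)


def convert(raw: bytes, current: str, target: str) -> bytes:
    """Like 'Encoding > Convert to <name>': decode with the encoding the
    editor currently assumes, then re-encode, producing different bytes."""
    return raw.decode(current).encode(target)


raw = "é, è, à".encode("utf-8")
print(reinterpret(raw, ANSI) == "Ã©, Ã¨, Ã\xa0")  # True: misread, raw untouched
print(reinterpret(raw, "utf-8"))                  # é, è, à
# Converting while the text is misread bakes the garbling into the bytes:
print(convert(raw, ANSI, "utf-8").decode("utf-8") == "Ã©, Ã¨, Ã\xa0")  # True
```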

      Once you have Notepad++ reading it correctly (whether because it did “the right thing” right out of the box, or because you forced it to interpret the bytes in the file correctly), you can use Encoding > Convert to UTF-8-BOM to get Notepad++ to add the Unicode BOM (U+FEFF) at the beginning of the file; then save the file. Technically, UTF-8 doesn’t require the BOM, but when most editors or viewers see the three-byte sequence (0xEF 0xBB 0xBF) that makes up the UTF-8 encoding of the BOM, they can be reasonably certain that the file is UTF-8 and will interpret it that way. When you then give the file to your colleagues, it is more likely that they will be able to read it correctly.
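For anyone who wants to do the same outside Notepad++ (for example, for a batch of saved emails), here is a minimal Python sketch of "add a BOM if it is missing". The function names are hypothetical, and it assumes the file's bytes are already valid UTF-8; it refuses to prepend a BOM otherwise, so a file actually in another encoding is not mislabeled. As a side note, in the HTML standard's encoding-sniffing rules a BOM takes precedence over a `<meta charset>` declaration.

```python
import codecs
from pathlib import Path

BOM = codecs.BOM_UTF8  # b"\xef\xbb\xbf", the UTF-8 encoding of U+FEFF


def with_utf8_bom(raw: bytes) -> bytes:
    """Return `raw` with a UTF-8 BOM in front, unless it already has one.

    Raises UnicodeDecodeError if `raw` is not valid UTF-8, so a file that is
    really in some other encoding is not mislabeled.
    """
    if raw.startswith(BOM):
        return raw
    raw.decode("utf-8")  # validation only; the decoded text is discarded
    return BOM + raw


def add_utf8_bom(path: Path) -> None:
    """Rough equivalent of Convert to UTF-8-BOM followed by Save."""
    path.write_bytes(with_utf8_bom(path.read_bytes()))
```

Calling `with_utf8_bom` twice is harmless: the second call sees the BOM and returns the bytes unchanged.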

        Scott Nielson @PeterJones
        last edited by Aug 7, 2021, 4:02 AM

        @PeterJones What about other languages? Will Hindi, Cyrillic, etc. also be displayed correctly?

          Pour Svdeux @PeterJones
          last edited by Aug 8, 2021, 12:31 PM

          This post is deleted!
            Pour Svdeux @PeterJones
            last edited by Aug 8, 2021, 12:50 PM

            @PeterJones said in How to properly display French characters in HTML?:

            Convert to UTF-8-BOM

            Thanks a lot Peter, perfect solution!

            The problem was that I use an old version of Firefox (61), because I want to keep the bookmarks' Description property, which disappeared after version 61 (but that’s another story). In that version, special characters appear scrambled; they can be restored to normal with Alt -> View -> Text Encoding -> Unicode (it is set to Western by default). Newer versions of Firefox interpret the UTF-8 correctly, and so does Edge. However, my iPhone does not (I don't know about Safari on a computer). So from now on I will add the UTF-8 BOM to my HTML email files, in case other people have the same issue with their browser or phone.
            (btw, I do not see any difference in the source code with and without the BOM; is the code hidden?)

              Alan Kilborn @Pour Svdeux
              last edited by Aug 8, 2021, 1:08 PM

              @Pour-Svdeux said in How to properly display French characters in HTML?:

              (btw, I do not see any difference in the source code with and without the BOM; is the code hidden?)

              I presume you mean “is it hidden in Notepad++?”
              The answer is yes: you will not see anything “special” in the text of your file.
              However, if you open the file in a hex editor, you will see the BOM at the start of the file, because to a hex editor these bytes are just bytes:

              [Screenshot: the file opened in a hex editor, with the BOM bytes visible at the start]
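The same "hidden in the editor, visible as raw bytes" effect can be reproduced in Python, as an analogy for what the editor and the hex editor do (the helper names are hypothetical). The `utf-8-sig` codec consumes a leading BOM the way a BOM-aware editor does, whereas the plain `utf-8` codec keeps it as an invisible U+FEFF character at position 0.

```python
BOM = b"\xef\xbb\xbf"


def hex_head(raw: bytes, n: int = 8) -> str:
    """First n bytes as space-separated hex, roughly what a hex editor shows."""
    return raw[:n].hex(" ")


def visible_text(raw: bytes) -> str:
    """Decode the way a BOM-aware editor does: the 'utf-8-sig' codec
    consumes a leading BOM instead of returning it as a character."""
    return raw.decode("utf-8-sig")


raw = BOM + b"<html>"
print(hex_head(raw))      # ef bb bf 3c 68 74 6d 6c
print(visible_text(raw))  # <html>
```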

                Pour Svdeux @Alan Kilborn
                last edited by Aug 8, 2021, 1:18 PM

                @Alan-Kilborn
                Got it, thanks!

                The Community of users of the Notepad++ text editor.
                Powered by NodeBB | Contributors