How to properly display French characters in HTML?

Pour Svdeux

Hello, I sometimes save Thunderbird emails as HTML files to easily share them with colleagues. The problem is that most of the time special French characters (e.g., é, è, à, etc…) are wrongly displayed, i.e., as Ã©, Ã¨, Ã, etc…
As I didn’t find a way to solve the issue in Thunderbird, I was wondering whether there would be an easy and quick way to do the conversion in Notepad++.
As I just discovered Notepad++, and not being familiar with it, any help would be much appreciated.
Thank you
Jean-Marc

PeterJones

@Pour-Svdeux ,

I have Thunderbird, so I tried to replicate this: I started a new email, pasted in é, è, à, and used Thunderbird > File > Save As > File to save it as thunderbird.html. When I open that HTML file in Notepad++, it detects the file as UTF-8, so the characters are displayed correctly. The file also contains the HTML meta encoding tag declaring UTF-8, so any HTML viewer (web browser) that opens the file should interpret the bytes as UTF-8 and display the characters correctly.

The only time you would see Ã©, Ã¨, Ã would be if whatever was loading the file didn’t know that the file was UTF-8, and was reading the two- and three-byte characters as individual bytes of ANSI text rather than as UTF-8 multi-byte characters.
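To make that concrete, here is a minimal Python sketch of the same mix-up. It assumes "ANSI" means Windows-1252 (cp1252), which is the usual Western European code page but may differ on your system; the function name read_as_ansi is only for illustration.

```python
def read_as_ansi(text: str) -> str:
    """Encode text as UTF-8, then misread the bytes as Windows-1252.

    Assumption: "ANSI" is cp1252; other code pages give other garbage.
    """
    raw = text.encode("utf-8")      # é, è, à each become two bytes
    return raw.decode("cp1252")     # each byte is shown as its own character


for ch in "éèà":
    # Show the raw bytes, the wrong reading, and the right reading.
    raw = ch.encode("utf-8")
    print(raw.hex(" "), repr(read_as_ansi(ch)), repr(raw.decode("utf-8")))
```

Note that à comes out as Ã followed by a non-breaking space (byte 0xA0), which is why it looks like a lone Ã on screen.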

If I force Notepad++ to interpret thunderbird.html as ANSI (using Encoding > ANSI, not using Encoding > Convert to ANSI), then I see those characters.

If you are in that state in Notepad++, then you can use Encoding > UTF-8 to get your Notepad++ to interpret the bytes correctly (back to my original image).

Once you have Notepad++ reading it correctly (whether because it did “the right thing” right out of the box, or because you forced it to interpret the bytes in the file correctly), you can use Encoding > Convert to UTF-8-BOM to get Notepad++ to add the Unicode BOM (U+FEFF) at the beginning of the file; then save the file. Technically, UTF-8 doesn’t require the BOM, but when most editors or viewers see the three-byte sequence (0xEF 0xBB 0xBF) that is the UTF-8 encoding of the BOM, they can be reasonably certain that the file is UTF-8 and will interpret it that way. When you then give the file to your colleagues, it is more likely that they will be able to read it correctly.
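If you have many files, the same step can be scripted outside Notepad++. The sketch below is not what Notepad++ does internally; it is only an illustration in Python, and add_utf8_bom is a hypothetical helper name. It assumes the file is already valid UTF-8 and refuses to modify it otherwise.

```python
import codecs
from pathlib import Path


def add_utf8_bom(path: str) -> bool:
    """Prepend the UTF-8 BOM (EF BB BF) to a file if it is missing.

    Returns True if the file was changed, False if it already had a BOM.
    Raises UnicodeDecodeError if the content is not valid UTF-8.
    """
    p = Path(path)
    data = p.read_bytes()
    if data.startswith(codecs.BOM_UTF8):
        return False                    # already marked, leave it alone
    data.decode("utf-8")                # validity check only; result unused
    p.write_bytes(codecs.BOM_UTF8 + data)
    return True
```

For example, add_utf8_bom("thunderbird.html") would mark the file saved above; calling it a second time leaves the file untouched.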

Scott Nielson

@PeterJones What about other languages? Will Hindi, Cyrillic, etc. also be displayed correctly?

Pour Svdeux

@PeterJones said in How to properly display French characters in HTML?:

Convert to UTF-8-BOM

Thanks a lot Peter, perfect solution!

The problem was that I use an old version (61) of Firefox (because I want to keep the bookmarks Description property, which disappeared after version 61, but that’s another story), in which special characters appear scrambled (they can be restored to normal with Alt -> View -> Text Encoding -> Unicode [set to Western by default]). Newer versions do interpret the UTF-8 correctly, and so does Edge. However, my iPhone does not (I do not know about Safari on a computer). Therefore, from now on I will add the UTF-8 BOM to my HTML email files, in case other people have the same issue with their browser or phone…
(btw, I do not see any difference in the source code with and without the BOM; is the code hidden?)

Alan Kilborn

@Pour-Svdeux said in How to properly display French characters in HTML?:

(btw, I do not see any difference in the source code with and without the BOM; is the code hidden?)

I presume you mean “is it hidden in Notepad++”.
That answer is yes, you will not see anything “special” in the text of your file.
However, if you open the file in a hex editor, you will see the BOM at the start of file, because to a hex editor, these bytes are just bytes:

[Image: hex editor view of the file, with the BOM bytes at the start]
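If you don't have a hex editor handy, a few lines of Python show the same thing. This is only a sketch: has_utf8_bom is a made-up helper name, and the sample bytes are built in memory rather than read from your actual file.

```python
import codecs


def has_utf8_bom(data: bytes) -> bool:
    """Return True if the data starts with the UTF-8 BOM (EF BB BF)."""
    return data.startswith(codecs.BOM_UTF8)


# Stand-in for the start of a file saved as UTF-8-BOM.
sample = codecs.BOM_UTF8 + "é".encode("utf-8")

print(sample[:5].hex(" "))          # ef bb bf c3 a9
print(has_utf8_bom(sample))         # True
print(sample.decode("utf-8-sig"))   # é (the BOM is consumed, not shown)
```

The "utf-8-sig" codec drops the BOM while decoding, which is similar in spirit to a text editor not showing it as a visible character.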

Pour Svdeux

@Alan-Kilborn
Got it, thanks!