HTML files with charset=iso: I don't want to see the diacritics (accent marks) in bold

Hellena Crainicu

I used Python to parse text from an old website into a new one. The old website has charset=iso-8859-1 and the new one has charset="utf-8".

What is the best way to stop the letters with diacritics (accent marks) from appearing bold? I tried changing charset="iso-8859-1" to charset="utf-8" and vice versa, but the result is the same: the diacritics are still highlighted.

This is the code of the text in the image:

<p class="text_obisnuit">Într-o oarecare măsură, fără să întămpin vreo dificultate, dacă mi-aş măsura capacităţile de inventator prin experienţa confruntării cu moartea, cu privire la modul de a schimba o constantă a reprezentării cuceririi ştiinţei pe o traiectorie axiomatică unitară, atunci probabil m-aş ciocni de o latură trecută cu vederea, o excepţie, ceva ce nu aş fi gândit că pot face. O fi doar o problemă de credinţă şi de for interior?</p>

<p class="text_obisnuit">Totuşi, sunt un inventator-autodidact, şi pe această cale sunt îndreptăţit să accept modul de încadrare a invenţiilor mele în categoria celor ce nu se rostesc, dar se închipuiesc. Fără excepţie, lucrul cu materia se poate transforma într-o relaţie desfăşurată între ceea ce proiectez ca intenție, și efectivitatea protecţiei pe care natura mi-o asigură cu un singur scop: pentru a-i lărgi valenţele de “miracol” în afara materiei vizibile.</p>

What should I do?

PeterJones

@hellena-crainicu

This forum is for Notepad++ questions. Your question has nothing to do with Notepad++: the answer will be the same whether you are using Notepad++, MS notepad.exe, or copy con. If you think “I am typing this with Notepad++, so it should be on topic,” then you haven’t read our FAQ which explains why that is a false interpretation, using the example of baking cookies.

But I’ll give you a hint: on my machine, that HTML doesn’t display with bold characters.
(My guess is that it’s a font issue on your PC.)
Further, the snippet you showed has no characters outside of the ASCII range, so it doesn’t matter whether you have set charset="iso-8859-1" or charset="utf-8". If you do not understand why having no characters outside of the ASCII range necessarily implies the “so…” part of my previous sentence, you need to go find a better tutorial on character encodings and HTML, because you obviously don’t understand the technology you are working with sufficiently. If you still don’t understand, you will have to find a forum that’s about HTML and web formatting, not one for a particular editor, and ask there. The Notepad++ Community Forum is not the right place for further discussion on this.

You can even use Notepad++ to prove to yourself that it doesn’t matter which charset you pick, given the data you showed:

FIND = [^\x20-\x7e\r\n] – this will find any character that is not between ASCII 32 (0x20) and ASCII 126 (0x7E) and is not a CR or LF newline character.
COUNT

In your snippet, it finds 0 characters outside of that range. That means there is nothing in that snippet which is not ASCII, and thus nothing that will be different between ISO-8859-1 and UTF-8.

OTOH, if I add the characters ÀÁÅËË and do the COUNT again, it now counts 5 matches in the file, for those five characters.
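
Since the conversion was done with Python, the same check can be sketched there too. This is only a minimal illustration, not anything Notepad++ does internally: the function names `count_non_ascii` and `same_bytes_in_both` and the sample string are made up for this example, and the five accented characters are written as escapes so the script itself stays pure ASCII.

```python
import re

# Same character class as the Notepad++ search above: anything that is
# not printable ASCII (0x20-0x7E) and not a CR or LF.
NON_ASCII = re.compile(r"[^\x20-\x7e\r\n]")


def count_non_ascii(text: str) -> int:
    """Count characters matched by the search expression (hypothetical helper)."""
    return len(NON_ASCII.findall(text))


def same_bytes_in_both(text: str) -> bool:
    """Return True if text encodes to identical bytes in ISO-8859-1 and UTF-8."""
    try:
        return text.encode("iso-8859-1") == text.encode("utf-8")
    except UnicodeEncodeError:
        # The character does not exist in ISO-8859-1 at all.
        return False


# Hypothetical ASCII-only sample, standing in for the snippet in question.
ascii_only = '<p class="text_obisnuit">plain ASCII text</p>'
# The five characters used in the COUNT example: À Á Å Ë Ë
with_accents = ascii_only + "\u00c0\u00c1\u00c5\u00cb\u00cb"

print(count_non_ascii(ascii_only), same_bytes_in_both(ascii_only))      # 0 True
print(count_non_ascii(with_accents), same_bytes_in_both(with_accents))  # 5 False
```

With zero matches, the bytes are identical under both charsets, so the declared charset cannot change how the text renders. Once accented characters are present, ISO-8859-1 stores each as one byte while UTF-8 uses two, and only then does the declaration matter.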