HTML files with charset = iso: I don't want to see the diacritics (accent marks) in bold
-
I used Python to parse the text of an old website into a new website. The old website has charset=iso-8859-1 and the new one has charset="utf-8".
What is the best way to stop the letters with diacritics (accent marks) from appearing bold? I tried changing charset="iso-8859-1" to charset="utf-8" and vice versa. The result is the same: the diacritics are still highlighted. This is the code of the text in the image:
<p class="text_obisnuit">Într-o oarecare măsură, fără să întămpin vreo dificultate, dacă mi-aş măsura capacităţile de inventator prin experienţa confruntării cu moartea, cu privire la modul de a schimba o constantă a reprezentării cuceririi ştiinţei pe o traiectorie axiomatică unitară, atunci probabil m-aş ciocni de o latură trecută cu vederea, o excepţie, ceva ce nu aş fi gândit că pot face. O fi doar o problemă de credinţă şi de for interior?</p>
<p class="text_obisnuit">Totuşi, sunt un inventator-autodidact, şi pe această cale sunt îndreptăţit să accept modul de încadrare a invenţiilor mele în categoria celor ce nu se rostesc, dar se închipuiesc. Fără excepţie, lucrul cu materia se poate transforma într-o relaţie desfăşurată între ceea ce proiectez ca intenție, și efectivitatea protecţiei pe care natura mi-o asigură cu un singur scop: pentru a-i lărgi valenţele de “miracol” în afara materiei vizibile.</p>
What should I do?
-
This forum is for Notepad++ questions. Your question has nothing to do with Notepad++: the answer will be the same whether you are using Notepad++, MS notepad.exe, or copy con. If you think “I am typing this with Notepad++, so it should be on topic,” then you haven’t read our FAQ, which explains why that is a false interpretation, using the example of baking cookies.
But I’ll give you a hint: on my machine, that HTML doesn’t display with bold characters.
(My guess is that it’s a font issue on your PC.)
Further, the snippet you showed has no characters outside of the ASCII range, so it doesn’t matter whether you have set charset="iso-8859-1" or charset="utf-8". If you don’t understand why having no characters outside of the ASCII range necessarily implies the “so…” part of my previous sentence, you need to go find a better tutorial on character encodings and HTML, because you obviously don’t understand the technology you are working with sufficiently. If you still don’t understand, you will have to find a forum that’s about HTML and web formatting, not one for a particular editor, and ask there. The Notepad++ Community Forum is not the right place for further discussion on this.
You can even use Notepad++ to prove to yourself that it doesn’t matter which charset you pick, given the data you showed:
- FIND = [^\x20-\x7e\r\n] – this will find any character that is neither between ASCII 32 (0x20) and ASCII 126 (0x7E) nor a CR or LF newline character.
- COUNT
In your snippet, it finds 0 characters outside of that range. That means there is nothing in that snippet which is not ASCII, and thus nothing that will be different between ISO-8859-1 and UTF-8.
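For anyone who would rather check this outside the editor, here is a minimal Python sketch of the same test. It reuses the character class from the FIND expression above and also compares the bytes produced by the two encodings. The helper names and the sample strings are made up for illustration; they are not taken from the poster's script or file.

```python
import re

# Same character class as the Notepad++ FIND expression above.
NON_ASCII = re.compile(r"[^\x20-\x7e\r\n]")


def count_non_ascii(text: str) -> int:
    """Count characters outside printable ASCII, ignoring CR and LF."""
    return len(NON_ASCII.findall(text))


def same_bytes_in_both_charsets(text: str) -> bool:
    """Return True if text encodes to identical bytes in ISO-8859-1 and UTF-8."""
    try:
        return text.encode("iso-8859-1") == text.encode("utf-8")
    except UnicodeEncodeError:
        # The text contains a character that ISO-8859-1 cannot represent at all.
        return False


# Hypothetical samples (assumptions for this sketch, not the real page).
ascii_only = '<p class="text_obisnuit">Plain ASCII text</p>\r\n'
with_accents = ascii_only + "ÀÁÅËË"

print(count_non_ascii(ascii_only), same_bytes_in_both_charsets(ascii_only))
# prints: 0 True
print(count_non_ascii(with_accents), same_bytes_in_both_charsets(with_accents))
# prints: 5 False
```

When the count is 0, the two encodings produce byte-for-byte identical output, which is why the declared charset cannot change how such a file is displayed.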
OTOH, if I add the characters ÀÁÅËË and do the COUNT again, it now counts 5 matches in the file, for those five characters.
- FIND =