
HTML files with charset = iso: I don't want to see the diacritics (accent marks) in bold

    Hellena Crainicu
    last edited Feb 9, 2022, 1:43 PM

    I used Python to parse text from an old website and move it to a new one. The old website uses charset=iso-8859-1 and the new one uses charset="utf-8".

    What is the best way to stop letters with diacritics (accent marks) from appearing in bold? I tried changing charset="iso-8859-1" to charset="utf-8" and vice versa, with the same result: the diacritics are still highlighted.

    [Screenshot of the rendered text]

    This is the code of the text in the image:

    <p class="text_obisnuit">&Icirc;ntr-o oarecare m&#259;sur&#259;, f&#259;r&#259; s&#259; &icirc;nt&#259;mpin vreo dificultate, dac&#259; mi-a&#351; m&#259;sura capacit&#259;&#355;ile de inventator prin experien&#355;a confrunt&#259;rii cu moartea, cu privire la modul de a schimba o constant&#259; a reprezent&#259;rii cuceririi &#351;tiin&#355;ei pe o traiectorie axiomatic&#259; unitar&#259;, atunci probabil m-a&#351; ciocni de o latur&#259; trecut&#259; cu vederea, o excep&#355;ie, ceva ce nu a&#351; fi g&acirc;ndit c&#259; pot face. O fi doar o problem&#259; de credin&#355;&#259; &#351;i de for interior?</p>

    <p class="text_obisnuit">Totu&#351;i, sunt un inventator-autodidact, &#351;i pe aceast&#259; cale sunt &icirc;ndrept&#259;&#355;it s&#259; accept modul de &icirc;ncadrare a inven&#355;iilor mele &icirc;n categoria celor ce nu se rostesc, dar se &icirc;nchipuiesc. F&#259;r&#259; excep&#355;ie, lucrul cu materia se poate transforma &icirc;ntr-o rela&#355;ie desf&#259;&#351;urat&#259; &icirc;ntre ceea ce proiectez ca inten&#539;ie, &#537;i efectivitatea protec&#355;iei pe care natura mi-o asigur&#259; cu un singur scop: pentru a-i l&#259;rgi valen&#355;ele de &ldquo;miracol&rdquo; &icirc;n afara materiei vizibile.</p>

    What should I do?

      PeterJones @Hellena Crainicu
      posted Feb 9, 2022, 2:03 PM, last edited by PeterJones Feb 9, 2022, 2:17 PM

      @hellena-crainicu

      This forum is for Notepad++ questions. Your question has nothing to do with Notepad++: the answer will be the same whether you are using Notepad++, MS notepad.exe, or copy con. If you think “I am typing this with Notepad++, so it should be on topic,” then you haven’t read our FAQ which explains why that is a false interpretation, using the example of baking cookies.

      But I’ll give you a hint: on my machine, that HTML doesn’t display with bold characters. (My guess is that it’s a font issue on your PC.)

      Further, the snippet you showed has no characters outside of the ASCII range, so it doesn’t matter whether you have set charset="iso-8859-1" or charset="utf-8". If you don’t understand why having no characters outside of the ASCII range necessarily implies the “so…” part of my previous sentence, you need to go find a better tutorial on character encodings and HTML, because you don’t yet understand the technology you are working with sufficiently. If you still don’t understand after that, you will have to find a forum that’s about HTML and web formatting, not one for a particular editor, and ask there. The Notepad++ Community Forum is not the right place for further discussion on this.
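For readers who want to see the reasoning behind that “so…” in code: ISO-8859-1 and UTF-8 assign the same byte values to every ASCII character, and the accented letters in the snippet are written as character references such as `&#259;`, which are themselves plain ASCII. The following is a minimal Python sketch of that idea, not part of the original reply; the `fragment` variable is just a short piece copied from the question, and `html.unescape` stands in for what a browser does after it has decoded the bytes.

```python
import html

# A short fragment copied from the snippet in the question (ASCII only).
fragment = "<p>f&#259;r&#259; s&#259; &icirc;nt&#259;mpin</p>"

# Encode the same text under both charsets.
latin1_bytes = fragment.encode("iso-8859-1")
utf8_bytes = fragment.encode("utf-8")

# Every character is ASCII, so both encodings yield identical bytes.
print(latin1_bytes == utf8_bytes)

# The character references are resolved only after the bytes are decoded,
# so the accented letters appear regardless of the declared charset.
decoded = html.unescape(fragment)
print(decoded.isascii())
```

The first check prints `True` and the second prints `False`: the bytes on disk are the same either way, and the non-ASCII letters only come into existence once the references are expanded.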

      You can even use Notepad++ to prove to yourself that it doesn’t matter which charset you pick, given the data you showed:

      • FIND = [^\x20-\x7e\r\n] – this will find any character that is neither between ASCII 32 (0x20) and ASCII 126 (0x7E) nor a CR or LF newline character.
      • COUNT

      In your snippet, it finds 0 characters outside of that range. That means there is nothing in that snippet which is not ASCII, and thus nothing that will be different between ISO-8859-1 and UTF-8.


      OTOH, if I add the characters ÀÁÅËË and do the COUNT again, it now counts 5 matches in the file, for those five characters.
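Since the original text was parsed with Python, the same check can be sketched there too. This is only an illustration of the FIND/COUNT steps above, assuming Python's `re` module in place of the Notepad++ search dialog; `count_non_ascii`, `snippet`, and `extra` are hypothetical names, and `snippet` is a shortened piece of the HTML from the question.

```python
import re

# Same character class as the FIND expression: anything that is not
# printable ASCII (0x20-0x7E) and not a CR or LF.
NON_ASCII = re.compile(r"[^\x20-\x7e\r\n]")


def count_non_ascii(text: str) -> int:
    """Return how many characters fall outside the allowed range."""
    return len(NON_ASCII.findall(text))


# Shortened piece of the HTML from the question, plus a CRLF line ending.
snippet = '<p class="text_obisnuit">&Icirc;ntr-o oarecare m&#259;sur&#259;</p>\r\n'
extra = "ÀÁÅËË"

print(count_non_ascii(snippet))          # 0
print(count_non_ascii(snippet + extra))  # 5

# Only the added characters are stored differently by the two encodings.
print(len(extra.encode("iso-8859-1")))   # 5 bytes, one per character
print(len(extra.encode("utf-8")))        # 10 bytes, two per character
```

The byte counts at the end are the part that actually depends on the charset declaration: each of the five added letters is one byte in ISO-8859-1 but two bytes in UTF-8, whereas the entity-only snippet is byte-for-byte the same in both.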
