
    HTML files with charset = iso: I don't want to see the diacritics (accent marks) in bold

    2 Posts 2 Posters 333 Views
    • Hellena Crainicu

      I used Python to parse text from an old website and move it to a new website. The old website has charset="iso-8859-1" and the new one has charset="utf-8".

      What is the best way to stop the letters with diacritics (accent marks) from appearing in bold? I tried changing charset="iso-8859-1" to charset="utf-8" and vice versa, but the result is the same: the diacritics are still highlighted.

      [image]

      This is the code of the text in the image:

      <p class="text_obisnuit">&Icirc;ntr-o oarecare m&#259;sur&#259;, f&#259;r&#259; s&#259; &icirc;nt&#259;mpin vreo dificultate, dac&#259; mi-a&#351; m&#259;sura capacit&#259;&#355;ile de inventator prin experien&#355;a confrunt&#259;rii cu moartea, cu privire la modul de a schimba o constant&#259; a reprezent&#259;rii cuceririi &#351;tiin&#355;ei pe o traiectorie axiomatic&#259; unitar&#259;, atunci probabil m-a&#351; ciocni de o latur&#259; trecut&#259; cu vederea, o excep&#355;ie, ceva ce nu a&#351; fi g&acirc;ndit c&#259; pot face. O fi doar o problem&#259; de credin&#355;&#259; &#351;i de for interior?</p>

      <p class="text_obisnuit">Totu&#351;i, sunt un inventator-autodidact, &#351;i pe aceast&#259; cale sunt &icirc;ndrept&#259;&#355;it s&#259; accept modul de &icirc;ncadrare a inven&#355;iilor mele &icirc;n categoria celor ce nu se rostesc, dar se &icirc;nchipuiesc. F&#259;r&#259; excep&#355;ie, lucrul cu materia se poate transforma &icirc;ntr-o rela&#355;ie desf&#259;&#351;urat&#259; &icirc;ntre ceea ce proiectez ca inten&#539;ie, &#537;i efectivitatea protec&#355;iei pe care natura mi-o asigur&#259; cu un singur scop: pentru a-i l&#259;rgi valen&#355;ele de &ldquo;miracol&rdquo; &icirc;n afara materiei vizibile.</p>

      What should I do?

      • PeterJones @Hellena Crainicu

        @hellena-crainicu

        This forum is for Notepad++ questions. Your question has nothing to do with Notepad++: the answer will be the same whether you are using Notepad++, MS notepad.exe, or copy con. If you think “I am typing this with Notepad++, so it should be on topic,” then you haven’t read our FAQ which explains why that is a false interpretation, using the example of baking cookies.

        But I’ll give you a hint: on my machine, that HTML doesn’t display with bold characters: [image].
        (My guess is that it’s a font issue on your PC.)
        Further, the snippet you showed has no characters outside of the ASCII range, so it doesn’t matter whether you have set charset="iso-8859-1" or charset="utf-8". If you don’t understand why having no characters outside of the ASCII range necessarily implies the “so…” part of my previous sentence, you need to go find a better tutorial on character encodings and HTML, because you obviously don’t understand the technology you are working with sufficiently. If you still don’t understand, you will have to find a forum that’s about HTML and web formatting, not one for a particular editor, and ask there. The Notepad++ Community Forum is not the right place for further discussion on this.
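
The point about the ASCII range can be checked with a few lines of Python. This is only a minimal sketch: `fragment` is a shortened piece of the snippet from the question, and the variable names are illustrative. Because every accented letter is written as an HTML entity, the text contains only ASCII characters, and ASCII characters map to the same bytes in both encodings.

```python
# Shortened fragment of the HTML from the question (assumption: the rest of
# the snippet behaves the same way, since it also uses only entities).
fragment = '<p class="text_obisnuit">&Icirc;ntr-o oarecare m&#259;sur&#259;</p>'

# Encode the same text with both charsets.
as_latin1 = fragment.encode("iso-8859-1")
as_utf8 = fragment.encode("utf-8")

# True when every character is in the ASCII range (Python 3.7+).
print(fragment.isascii())

# True when both encodings produce byte-for-byte identical output.
print(as_latin1 == as_utf8)
```

Both checks print `True` for this fragment, which is why switching the declared charset makes no visible difference here.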

        You can even use Notepad++ to prove to yourself that it doesn’t matter which charset you pick, given the data you showed:

        • FIND = [^\x20-\x7e\r\n]: this will find any character that is neither between ASCII 32 (0x20) and ASCII 126 (0x7E) nor a CR or LF newline character.
        • COUNT

        In your snippet, it finds 0 characters outside of that range. That means there is nothing in that snippet which is not ASCII, and thus nothing that will be different between ISO-8859-1 and UTF-8.
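
For anyone who wants to repeat that count outside the editor, the same character class also works in Python's `re` module. This is a rough sketch, not the Notepad++ search itself: the helper name `count_outside_ascii` is made up for illustration, and `snippet` is one shortened line of the HTML from the question.

```python
import re

# Same character class as the FIND expression: anything that is not
# printable ASCII (0x20-0x7E) and not a CR or LF.
OUTSIDE_ASCII = re.compile(r"[^\x20-\x7e\r\n]")


def count_outside_ascii(text: str) -> int:
    """Return how many characters in text match the class above."""
    return len(OUTSIDE_ASCII.findall(text))


# Shortened line from the question, with a Windows-style line ending.
snippet = "Totu&#351;i, sunt un inventator-autodidact, &#351;i pe aceast&#259; cale\r\n"

print(count_outside_ascii(snippet))  # 0
```

Note that, exactly as with the FIND expression, a tab character would also be counted, because it falls below 0x20.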

        OTOH, if I add the characters ÀÁÅËË and do the COUNT again, it now counts 5 matches in the file, for those five characters.
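
A short Python sketch shows why those five characters are where the two charsets start to matter. The characters are written as `\u` escapes so the example does not depend on how this page itself is encoded; the variable names are illustrative.

```python
import re

# Same class as the FIND expression: not printable ASCII, not CR or LF.
pattern = re.compile(r"[^\x20-\x7e\r\n]")

# The five added characters: A-grave, A-acute, A-ring, E-diaeresis (twice).
added = "\u00c0\u00c1\u00c5\u00cb\u00cb"

# All five characters match the pattern.
print(len(pattern.findall(added)))  # 5

as_latin1 = added.encode("iso-8859-1")  # one byte per character
as_utf8 = added.encode("utf-8")         # two bytes per character

print(len(as_latin1), len(as_utf8))  # 5 10
print(as_latin1 == as_utf8)          # False
```

Since the byte sequences differ, a file containing these characters will only display correctly when the declared charset matches the encoding the file was actually saved in.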
