
    Notepad++: How to find in page with UTF-8 instead of ANSI?

    • Robin Cruise

      Hello, I have a lot of words like ştiinţific and stiintific (with and without diacritics/accent marks). How can I search so as to find both versions?

      I can do this in all PDF and MS Word files, but not in Notepad++. Is there a way to do this kind of find, and also the replace, just with UTF-8?

      • guy038

        Hello, @robin-cruise and All,

        You can achieve this with equivalence classes. Their general syntax is [[=<Single_Letter>=]]

        For instance, the regex [[=A=]] would match any of these 82 Unicode chars : AaªÀÁÂÃÄÅàáâãäåĀāĂ㥹ǍǎǞǟǠǡǺǻȀȁȂȃȦȧȺɐɑɒᴀᴬᵃᵄᶏᶐᶛḀḁẚẠạẢảẤấẦầẨẩẪẫẬậẮắẰằẲẳẴẵẶặₐÅ⒜ⒶⓐⱥⱭⱯⱰ, all of which are related, in some way, to the first letter of the Latin alphabet !

        Strictly speaking, the equivalence class itself is the [=<Single_Letter>=] part, which must be embedded in a usual character class [•••••]. For instance, the regex
        (?-i)[012[=A=]@b-y[=z=]|] matches any of the following characters, sorted by ascending Unicode code-point :

        • ASCII chars :

          • 012
          • @
          • A
          • Z
          • a
          • bcdefghijklmnopqrstuvwxy
          • z
          • |
        • ANSI chars

          • ª
          • ÀÁÂÃÄÅ
          • àáâãäå
        • UNICODE chars, with code over \x{00ff}

          • ĀāĂ㥹
          • ŹźŻżŽž
          • Ǎǎ
          • Ǻǻ
          • ẠạẢảẤấẦầẨẩẪẫẬậẮắẰằẲẳẴẵẶặ

        So, in practice, to match both of your strings, ştiinţific and stiintific, use the regex :

        [[=s=]]tiin[[=t=]]ific
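
        Outside Notepad++, a similar with-or-without-diacritics comparison can be approximated by decomposing the text and dropping the combining marks. The following is only a minimal Python sketch of that different technique, not what the Notepad++ regex engine does internally; the helper name strip_diacritics is hypothetical.

```python
import unicodedata

def strip_diacritics(text: str) -> str:
    """Return text with combining marks removed after NFD decomposition."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

# Both spellings fold to the same plain-ASCII form.
print(strip_diacritics("ştiinţific") == strip_diacritics("stiintific"))
```

        This only handles letters that Unicode can decompose into a base letter plus marks, so it covers fewer characters than the [[=s=]] and [[=t=]] classes above.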

        Best Regards,

        guy038

        • Robin Cruise

          Yes, nice answer. But it is very hard to use, because I need to change almost every word in every sentence :)
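
          One way to reduce the manual work might be to generate the search pattern from the plain text instead of typing it by hand. The sketch below is a hedged Python illustration: the function names and the list of characters to escape are assumptions, and it wraps every letter in the [[=x=]] form shown earlier in this thread, without checking that Notepad++ defines an equivalence class for each letter.

```python
import unicodedata

# Characters assumed to need a backslash in the search regex.
REGEX_SPECIALS = set(r"\.^$|()[]{}*+?")

def base_letter(ch: str) -> str:
    """Return ch without combining marks, e.g. 'ş' becomes 's'."""
    decomposed = unicodedata.normalize("NFD", ch)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return stripped or ch

def to_equivalence_regex(text: str) -> str:
    """Wrap each letter of text in [[=x=]]; escape regex metacharacters."""
    parts = []
    for ch in text:
        if ch.isalpha():
            parts.append("[[=" + base_letter(ch) + "=]]")
        elif ch in REGEX_SPECIALS:
            parts.append("\\" + ch)
        else:
            parts.append(ch)
    return "".join(parts)

print(to_equivalence_regex("ştiinţific"))
```

          The printed pattern can then be pasted into the Find what field with the Regular expression search mode selected. It is longer than the hand-written [[=s=]]tiin[[=t=]]ific, because every letter is wrapped, not only the ones that carry diacritics in the text at hand.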

          The Community of users of the Notepad++ text editor.