Notepad++: How to find in page with UTF-8 instead of ANSI?



  • Hello, I have a lot of words like ştiinţific and stiintific (with and without diacritics/accent marks). How can I search so as to find both versions?

    I can do this in all PDF and MS Word files, but not in Notepad++. So, is there a way to do this kind of find, and also the replace, with UTF-8?



  • Hello, @robin-cruise and All,

    You can achieve this with equivalence classes. Their general syntax is [[=<Single_Letter>=]]

    For instance, the regex [[=A=]] would match any of these 82 Unicode chars : AaªÀÁÂÃÄÅàáâãäåĀāĂăĄąǍǎǞǟǠǡǺǻȀȁȂȃȦȧȺɐɑɒᴀᴬᵃᵄᶏᶐᶛḀḁẚẠạẢảẤấẦầẨẩẪẫẬậẮắẰằẲẳẴẵẶặₐÅ⒜ⒶⓐⱥⱭⱯⱰ, which are all related, in some way, to the first letter of the Latin alphabet !

    More precisely, the construct itself is the [=<Single_Letter>=] syntax, embedded in a usual character class [•••••], so it can be combined with other items of the class. For instance, the regex
    (?-i)[012[=A=]@b-y[=z=]|] matches all the following characters, sorted by ascending Unicode code point :

    • ASCII chars :

      • 012
      • @
      • A
      • Z
      • a
      • bcdefghijklmnopqrstuvwxy
      • z
      • |
    • ANSI chars :

      • ª
      • ÀÁÂÃÄÅ
      • àáâãäå
    • UNICODE chars, with code over \x{00ff}

      • ĀāĂăĄą
      • ŹźŻżŽž
      • Ǎǎ
      • Ǻǻ
      • ẠạẢảẤấẦầẨẩẪẫẬậẮắẰằẲẳẴẵẶặ

    So, in practice, to match both of your strings, ştiinţific and stiintific, use the regex :

    [[=s=]]tiin[[=t=]]ific

    Best Regards,

    guy038
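
To check outside Notepad++ whether two spellings differ only by diacritics, here is a minimal Python sketch. It is only an approximation of the equivalence classes described above: it relies on Unicode canonical decomposition (NFD) and drops the combining marks, so characters without such a decomposition (for example ª or Ⱥ) are not folded to a. The function names are hypothetical, chosen for illustration.

```python
import unicodedata


def strip_marks(text: str) -> str:
    """Return text with combining marks removed after NFD decomposition."""
    decomposed = unicodedata.normalize("NFD", text)
    # unicodedata.combining() is non-zero for combining marks such as the cedilla
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def same_base_letters(a: str, b: str) -> bool:
    """True if a and b are equal once diacritics and letter case are ignored."""
    return strip_marks(a).casefold() == strip_marks(b).casefold()


if __name__ == "__main__":
    print(strip_marks("ştiinţific"))
    print(same_base_letters("ştiinţific", "stiintific"))
```

This does not replace the Notepad++ search; it may simply help to list, in a word file, which entries would collapse to the same unaccented form.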



  • Yes, nice answer. But it is very hard, because I need to change almost all the words in every sentence :)
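
One possible way to avoid typing the classes by hand is to generate the search pattern from the plain word. The following is a rough Python sketch, not a Notepad++ feature: the helper name to_equivalence_regex is hypothetical, and it assumes that an equivalence class may be written for every ASCII letter, as the [[=<Single_Letter>=]] syntax above suggests. Letters that already carry a diacritic are first reduced to their base letter via NFD.

```python
import unicodedata

# Characters with a special meaning in the regex; they get a backslash.
REGEX_SPECIALS = set(r"\.^$|()[]{}*+?")


def to_equivalence_regex(word: str) -> str:
    """Build a pattern where each Latin letter becomes [[=letter=]]."""
    parts = []
    for ch in word:
        # First code point of the canonical decomposition: the base letter
        base = unicodedata.normalize("NFD", ch)[0]
        if base.isascii() and base.isalpha():
            parts.append(f"[[={base}=]]")
        elif ch in REGEX_SPECIALS:
            parts.append("\\" + ch)
        else:
            parts.append(ch)
    return "".join(parts)


if __name__ == "__main__":
    # Paste the printed pattern into the Find what: field (Regular expression mode)
    print(to_equivalence_regex("stiintific"))
```

The output is longer than the hand-written [[=s=]]tiin[[=t=]]ific, because every letter is wrapped, but it can be produced for a whole word list without deciding letter by letter where an accent may occur.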

