Notepad++: How to find in page with UTF-8 instead of ANSI?
-
Hello, I have a lot of words like ştiinţific and stiintific (with and without diacritics/accent marks). How can I search so as to find both versions? I can do this in PDF and MS Word files, but in Notepad++ I cannot. So, is there a way to do this kind of find, and also the replace, just with UTF-8?
-
Hello, @robin-cruise and All,
You can achieve this kind of goal with equivalence class structures. Their general syntax is:
[[=<Single_Letter>=]]
For instance, the regex
[[=A=]]
would match any of these 82 Unicode chars:
AaªÀÁÂÃÄÅàáâãäåĀāĂ㥹ǍǎǞǟǠǡǺǻȀȁȂȃȦȧȺɐɑɒᴀᴬᵃᵄᶏᶐᶛḀḁẚẠạẢảẤấẦầẨẩẪẫẬậẮắẰằẲẳẴẵẶặₐÅ⒜ⒶⓐⱥⱭⱯⱰ
which are related, in some way, to the first letter of the Latin alphabet!
Actually, the regex should rather be considered as the
[=<Single_Letter>=]
syntax, embedded in a usual character class [•••••]. For instance, the regex
(?-i)[012[=A=]@b-y[=z=]|]
matches all the following characters, sorted by ascending Unicode code point:

ASCII chars:
- 012
- @
- A
- Z
- a
- bcdefghijklmnopqrstuvwxy
- z
- |

ANSI chars:
- ª
- ÀÁÂÃÄÅ
- àáâãäå

UNICODE chars, with code point over \x{00ff}:
- ĀāĂ㥹
- ŹźŻżŽž
- Ǎǎ
- Ǻǻ
- ẠạẢảẤấẦầẨẩẪẫẬậẮắẰằẲẳẴẵẶặ
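As a rough cross-check outside Notepad++, the idea behind [[=a=]] can be approximated in Python. This is only a sketch under stated assumptions: Python's `re` module does not support POSIX equivalence classes, so the hypothetical `equivalents()` helper below groups letters by their canonical (NFD) decomposition instead. That is not how Notepad++'s Boost-based regex engine builds its classes, so the resulting set differs from the 82 characters listed above (for example, ª and ɐ have no canonical decomposition to a, so they are not included).

```python
import unicodedata

def base_letter(ch: str) -> str:
    """Return the first code point of the canonical decomposition (NFD), lowercased."""
    return unicodedata.normalize("NFD", ch)[0].lower()

def equivalents(letter: str, max_cp: int = 0x1EFF) -> list[str]:
    """Approximate [[=letter=]]: every letter code point up to max_cp whose
    NFD base letter equals `letter`, ignoring case (hypothetical helper)."""
    target = letter.lower()
    result = []
    for cp in range(max_cp + 1):
        ch = chr(cp)
        # Keep letters only (categories Lu, Ll, Lt, Lm, Lo), not combining marks.
        if unicodedata.category(ch).startswith("L") and base_letter(ch) == target:
            result.append(ch)
    return result

print("".join(equivalents("a")))
```

The upper bound `max_cp` is an arbitrary choice for the sketch; it covers the Latin Extended blocks up to U+1EFF.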
So, in practice, to match either of your strings, ştiinţific or stiintific, use the regex (with the Regular expression search mode selected in the Find dialog):
[[=s=]]tiin[[=t=]]ific
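For comparison, here is a minimal Python sketch of the same kind of accent-insensitive search, outside Notepad++. It uses a different technique from the equivalence classes above: each word is decomposed to NFD and its combining marks (category Mn) are dropped before comparing. The `fold()` and `find_insensitive()` names are hypothetical, and the sketch assumes the text uses precomposed characters such as ş (U+015F) and ţ (U+0163).

```python
import re
import unicodedata

def fold(text: str) -> str:
    """Remove diacritics: decompose to NFD, then drop combining marks (category Mn)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")

def find_insensitive(word: str, text: str) -> list[str]:
    """Return every word of `text` equal to `word` once both are folded and lowercased."""
    target = fold(word).lower()
    return [m.group() for m in re.finditer(r"\w+", text)
            if fold(m.group()).lower() == target]

sample = "Un articol ştiinţific, apoi unul stiintific."
print(find_insensitive("stiintific", sample))  # ['ştiinţific', 'stiintific']
```

Because every letter is folded, there is no need to mark in advance which letters might carry an accent.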
Best Regards,
guy038
-
Yes, nice answer. But it's very hard, because I would need to change almost every word in every sentence :)
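One way to avoid editing every word by hand is to generate the pattern. The hypothetical Python helper below (a sketch, not a Notepad++ feature) wraps each ASCII letter of a plain search term in [[=x=]] and escapes everything else with `re.escape`, so the output can be pasted into the Find dialog with the Regular expression search mode. It assumes, as in the answer above, that Notepad++'s engine treats [[=s=]] as covering ş and similar letters; non-ASCII letters in the input are kept literally.

```python
import re

def to_equivalence_regex(term: str) -> str:
    """Build a regex in which every ASCII letter becomes an equivalence
    class, e.g. 's' -> '[[=s=]]'. Other characters are escaped literally."""
    parts = []
    for ch in term:
        if ch.isascii() and ch.isalpha():
            parts.append(f"[[={ch}=]]")
        else:
            parts.append(re.escape(ch))
    return "".join(parts)

print(to_equivalence_regex("stiintific"))
# [[=s=]][[=t=]][[=i=]][[=i=]][[=n=]][[=t=]][[=i=]][[=f=]][[=i=]][[=c=]]
```

The trade-off is a long, hard-to-read pattern; it is meant to be generated rather than typed.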