Search for accented words.
-
How can I search for text that may contain accented characters?
For example, if a file contains the word “Comparación” and I type “Comparacion” in the search box, how can I make it find every occurrence, whether accented or not?
I have the “Regular expression” option checked, but the search does not find the accented word.
Thank you.
-
@socu ,
Search for
Comparaci[[=o=]]n
in regular expression mode to find either Comparación or Comparacion. The [[=o=]] syntax is called an equivalence class.
So if you wanted to match the accented version of any of the vowels or the n in that word for some reason, it would be
C[[=o=]]mp[[=a=]]r[[=a=]]c[[=i=]][[=o=]][[=n=]]
-
Hello, @socu and All,
First, here are two regexes which help you locate the characters that are almost certainly accented:
- In a Unicode encoded file (so with any encoding option except ANSI):
  - Open the Mark dialog (Ctrl + M)
  - SEARCH (?-i)[\x{00C0}-\x{024F}]
  - Untick all options
  - Tick the Purge for each search option
  - Tick the Wrap around option
  - Select the Regular expression search mode
  - Click on the Mark All button
- In an ANSI encoded file:
  - Open the Mark dialog (Ctrl + M)
  - SEARCH (?i)[\x8A\x8E\x9A\x9E\xC0-\xFF]
  - Untick all options
  - Tick the Purge for each search option
  - Tick the Wrap around option
  - Select the Regular expression search mode
  - Click on the Mark All button
-
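As a rough cross-check outside Notepad++, the two character ranges above can be inspected with a short Python sketch. It is only an illustration and rests on one assumption: that "ANSI" means the Windows-1252 code page, which may not be the case on every system. The helper name `find_accent_candidates` is made up for this example.

```python
import re

# Same range as the Unicode-file regex above: U+00C0 .. U+024F
ACCENT_RANGE = re.compile(r"[\u00C0-\u024F]")


def find_accent_candidates(text):
    """Return (offset, character) pairs for characters in U+00C0..U+024F."""
    return [(m.start(), m.group()) for m in ACCENT_RANGE.finditer(text)]


# Bytes matched by the ANSI-file regex, decoded as Windows-1252 (assumption).
ansi_bytes = bytes([0x8A, 0x8E, 0x9A, 0x9E]) + bytes(range(0xC0, 0x100))
ansi_chars = ansi_bytes.decode("cp1252")

print(find_accent_candidates("Comparación y niño"))
print(ansi_chars)

# Both ranges also contain the multiplication and division signs,
# which is one reason the matches are only "almost sure" to be accented letters.
print("\u00d7" in ansi_chars, "\u00f7" in ansi_chars)
```

Under the Windows-1252 assumption, every character matched by the ANSI regex also falls inside the Unicode range, so the two searches target the same set of candidates in that code page.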
As explained by @peterjones, the general method to find any vowel, accented or not, is to use the regex equivalence class syntax [[=vowel=]]. Of course, you must replace the string vowel with the exact single vowel, accented or not, to search for!
Now, this may be tedious to type when you want to find any form of a specific word!
So, here is a workaround which enables you to search for any form of a specific word:
- Select the specific word, which may contain one or several accented characters
- Open the Replace dialog (Ctrl + H)
- Wipe out the SEARCH field
- SEARCH (?i)([aeiouy])|\w
- REPLACE ?1[[=$0=]]:$0
- Untick all options
- Tick the Wrap around option
- Tick the In selection option (IMPORTANT)
- Select the Regular expression search mode
- Click once on the Replace All button (do not use the Replace button)
=> A new string should be selected
- Hit the Esc key to close the Replace dialog
- Open the Mark dialog (Ctrl + M)
=> The string, previously selected, should be automatically written in the SEARCH field
( SEARCH C[[=o=]]mp[[=a=]]r[[=a=]]c[[=i=]][[=o=]]n )
- Untick all options
- If preferred, tick the Bookmark line option
- Tick the Purge for each search option
- Tick the Wrap around option
- Select the Regular expression search mode
- Click on the Mark All button
=> This regex should find any form of the word comparacion, whatever its case and whether or not its vowels are accented, throughout the entire file!
For instance, it would mark all the strings below, based on the root comparacion:
comparacion cÒmparación CompàraciÔn cömparÅciõn Compâraciøn
Best Regards,
guy038
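The Replace step of this workaround can be mimicked outside Notepad++ with a small Python sketch, which may help to see what the SEARCH and REPLACE pair produces. This is a port under assumptions, not the Notepad++ implementation: the function names `strip_accents` and `to_equivalence_pattern` are invented here, and the accent stripping is an extra step that is not part of the procedure above. It is included because, in Python at least, an already accented vowel such as ó is not matched by [aeiouy] and would be left as a literal character.

```python
import re
import unicodedata

# Python equivalent of the SEARCH regex (?i)([aeiouy])|\w
VOWEL_OR_WORD_CHAR = re.compile(r"([aeiouy])|\w", re.IGNORECASE)


def strip_accents(word):
    """Drop combining marks after NFD decomposition (ó -> o, ñ -> n)."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def to_equivalence_pattern(word):
    """Wrap each plain vowel in [[=x=]]; copy every other word character as is."""
    def repl(match):
        # Group 1 is set only when a plain vowel matched (the ?1 test in REPLACE).
        return f"[[={match.group(0)}=]]" if match.group(1) else match.group(0)
    return VOWEL_OR_WORD_CHAR.sub(repl, word)


print(to_equivalence_pattern("Comparacion"))
print(to_equivalence_pattern(strip_accents("Comparación")))
```

Both calls print the pattern shown in the SEARCH field above, C[[=o=]]mp[[=a=]]r[[=a=]]c[[=i=]][[=o=]]n. Python's own re module does not understand the [[=x=]] syntax, so the generated string is only meant to be pasted back into Notepad++.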
-
-
Thanks. I thought these searches would be easier to perform. In truth, it is not practical to have to type [[=x=]] for every vowel in the word; it can be a waste of time.
Maybe this is something that could be changed in a future update, so that this type of search can be performed without filling the word with so many extra characters.
-
That is standard behavior in every regular expression engine that I have ever used in my 25+ years of using regular expression engines: if you want to match a single literal character, you type that literal character; if you want to match something more complicated (like a list of potential characters, predefined or not), then you have to use special syntax to invoke that mode. The Notepad++ application uses a pre-built regular expression engine rather than its own, because the developers wanted to focus on the interesting things, not on designing yet another regular expression engine from the ground up. So even if this Forum were the feature request tracker (and it’s not, as explained in “Please Read This Before Posting” and “Feature Request and Bug Report”), I would bet that the Developers would not implement such a request. Moreover, I would lobby against such a change, because it would break decades of expectation that when you say “search for o”, it searches for the literal character o, and not o plus some accented o-like characters.
-
I understand. It is clear that I am not very knowledgeable; I thought that the search engine could be given an option to ignore accents on characters when performing a search.
Thank you.
-
@socu ,
It would make sense if there were an “accent-insensitive” flag in the standard regex engines, just like there’s a “case-insensitive” flag. But no regex engine that I’ve ever used has had such a flag. Some of those engines have decades of development behind them (for example, the Boost regex engine used by Notepad++ implements Perl-style syntax, which has its roots in 1990s Perl regular expressions), and most of that development has included knowing about Unicode. Given that, and the number of times I’ve seen “is there an accent-insensitive flag for regex-flavor-X” questions answered in the negative in programming forums, I would assume that if it were technically reasonable to include, it would have been developed and included in the major ones by now. Since it hasn’t been, I am assuming that there’s a huge technical roadblock that’s beyond my pay grade to understand.
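For comparison, Python's built-in re module is in the same position: it has a case-insensitive flag but, as far as I know, no accent-insensitive one. When the search runs in a script rather than in an editor, one common workaround is to fold accents on both sides before comparing. The sketch below shows that technique (Unicode NFD decomposition, then dropping combining marks); it is not what Notepad++ or Boost does internally, and the names `fold` and `accent_insensitive_find` are hypothetical.

```python
import unicodedata


def fold(text):
    """Strip combining accents (NFD, then drop marks) and case-fold the result."""
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


def accent_insensitive_find(needle, words):
    """Return the words whose folded form equals the folded needle."""
    target = fold(needle)
    return [w for w in words if fold(w) == target]


words = ["comparacion", "cÒmparación", "CompàraciÔn",
         "cömparÅciõn", "Compâraciøn", "comparar"]
print(accent_insensitive_find("Comparacion", words))
```

The first four words are returned. "Compâraciøn" is not, because ø has no canonical decomposition into o plus a combining mark, so this simple folding leaves it untouched. Edge cases like that one hint at why a general accent-insensitive flag is harder to define than a case-insensitive one.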
-