Search for accented words.
-
How can I search for text in words that may or may not be accented?
For example, if I have the word “Comparación”, and in the search I type “Comparacion”, how can I make it show me all the words whether they are accented or not?
I have the option “Regular expression” checked, but it does not show it.
Thank you.
-
@socu ,
Search for
Comparaci[[=o=]]n
in regular expression mode to find either Comparación or Comparacion.
This syntax is called an equivalence class.
So if, for some reason, you wanted to match the accented versions of any of the vowels or the n in that word, it would be
C[[=o=]]mp[[=a=]]r[[=a=]]c[[=i=]][[=o=]][[=n=]]
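For readers who want to try the same idea outside Notepad++, here is a minimal Python sketch. Python's `re` module does not support the `[[=o=]]` equivalence class syntax, so this sketch swaps in a different technique: it strips accents with Unicode decomposition before comparing. The helper names `strip_accents` and `find_accent_insensitive` are made up for this illustration.

```python
import unicodedata

def strip_accents(text: str) -> str:
    """Drop combining marks after canonical decomposition (NFD)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def find_accent_insensitive(needle: str, words: list[str]) -> list[str]:
    """Return the words equal to needle once accents and case are ignored."""
    key = strip_accents(needle).casefold()
    return [w for w in words if strip_accents(w).casefold() == key]

words = ["Comparación", "Comparacion", "Comparado"]
print(find_accent_insensitive("Comparacion", words))
# ['Comparación', 'Comparacion']
```

Note that this only folds characters that decompose into a base letter plus a combining mark. A letter such as ø has no such decomposition and is left unchanged by this sketch.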
-
Hello, @socu and All,
First, here are two regexes that help you see where a file contains accented characters:
-
In a Unicode encoded file (that is, any encoding option except ANSI):
- Open the Mark dialog (Ctrl + M)
- SEARCH (?-i)[\x{00C0}-\x{024F}]
- Untick all options
- Tick the Purge for each search option
- Tick the Wrap around option
- Select the Regular expression search mode
- Click on the Mark All button
-
-
In an ANSI encoded file:
- Open the Mark dialog (Ctrl + M)
- SEARCH (?i)[\x8A\x8E\x9A\x9E\xC0-\xFF]
- Untick all options
- Tick the Purge for each search option
- Tick the Wrap around option
- Select the Regular expression search mode
- Click on the Mark All button
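As a rough cross-check outside Notepad++, the same two character ranges can be tried in Python. This is only a sketch: it assumes that "ANSI" means the Windows-1252 code page (`cp1252`), which may not hold on every system, and the names `ACCENTED_UNICODE` and `ACCENTED_ANSI` are invented for the example.

```python
import re

# Same range as the Unicode regex above: U+00C0 to U+024F
ACCENTED_UNICODE = re.compile(r"[\u00C0-\u024F]")

# Same byte values as the ANSI regex above, applied to raw bytes.
# Assumption: the ANSI file is encoded as Windows-1252.
ACCENTED_ANSI = re.compile(rb"[\x8A\x8E\x9A\x9E\xC0-\xFF]")

text = "Comparación de años"

print(ACCENTED_UNICODE.findall(text))
# ['ó', 'ñ']
print(len(ACCENTED_ANSI.findall(text.encode("cp1252"))))
# 2
```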
-
As explained by @peterjones, the general method to find any vowel, accented or not, is to use the regex equivalence class syntax: [[=vowel=]]. Of course, you must replace the string vowel with the exact single vowel, accented or not, that you want to search for. However, this may be tedious when you want to find every form of a specific word.
So, here is a work-around which enables you to search for any form of a specific word:
- Select the specific word, which may contain one or several accented characters
- Open the Replace dialog (Ctrl + H)
- Wipe out the SEARCH field
- SEARCH (?i)([aeiouy])|\w
- REPLACE ?1[[=$0=]]:$0
- Untick all options
- Tick the Wrap around option
- Tick the In selection option (IMPORTANT)
- Select the Regular expression search mode
- Click once on the Replace All button (do not use the Replace button)
=> A new string should be selected
- Hit the Esc key to close the Replace dialog
- Open the Mark dialog (Ctrl + M)
=> The string, previously selected, should be automatically written in the SEARCH field
( SEARCH C[[=o=]]mp[[=a=]]r[[=a=]]c[[=i=]][[=o=]]n )
- Untick all options
- If preferred, tick the Bookmark line option
- Tick the Purge for each search option
- Tick the Wrap around option
- Select the Regular expression search mode
- Click on the Mark All button
=> This regex should find any form of the word comparacion throughout the entire file, whatever its case and whether or not its vowels carry accents. For instance, it would mark all the strings below, based on the root comparacion:
comparacion cÒmparación CompàraciÔn cömparÅciõn Compâraciøn
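The pattern-building step above can also be sketched as a small Python script, for anyone who prefers to generate the search string outside the Replace dialog. This is an illustration under assumptions, not the Notepad++ mechanism itself: the function names are hypothetical, and unlike the Replace step it reduces an accented vowel to its base letter before wrapping it.

```python
import unicodedata

VOWELS = set("aeiouy")

def base_letter(ch: str) -> str:
    """Return ch without combining marks (for example 'ó' becomes 'o')."""
    decomposed = unicodedata.normalize("NFD", ch)
    return "".join(c for c in decomposed if not unicodedata.combining(c))

def to_equivalence_pattern(word: str) -> str:
    """Wrap each vowel in [[=x=]]; copy every other character unchanged."""
    parts = []
    for ch in word:
        base = base_letter(ch).lower()
        if base in VOWELS:
            parts.append(f"[[={base}=]]")
        else:
            parts.append(ch)
    return "".join(parts)

print(to_equivalence_pattern("Comparación"))
# C[[=o=]]mp[[=a=]]r[[=a=]]c[[=i=]][[=o=]]n
```

The printed string can then be pasted into the SEARCH field of the Mark dialog in Regular expression mode. The sketch assumes the word contains only letters, so no regex metacharacters need escaping.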
Best Regards,
guy038
-
-
Thanks, I thought that these searches would be easier to perform, the truth is that it is not practical to have to put so many characters [[=x=]] for each vowel in the word, it can be a waste of time.
Maybe it is something that needs to be changed, you could think about it for future updates, to be able to perform this type of searches so as not to fill the words with so many characters.
-
@socu said in Search for accented words.:
Thanks, I thought that these searches would be easier to perform, the truth is that it is not practical to have to put so many characters [[=x=]] for each vowel in the word, it can be a waste of time.
Maybe it is something that needs to be changed, you could think about it for future updates, to be able to perform this type of searches so as not to fill the words with so many characters.
That is standard behavior in every regular expression engine that I have ever used in my 25+ years of using regular expression engines: if you want to match a single literal character, you type that literal character; if you want to match something more complicated (like a list of potential characters, predefined or not), then you have to use special syntax to invoke that mode. The Notepad++ application uses a pre-built regular expression engine rather than its own, because the developers wanted to focus on the interesting things, not on designing yet another regular expression engine from the ground up. So even if this Forum were the feature request tracker (and it’s not, as explained in “Please Read This Before Posting” and “Feature Request and Bug Report”), I would bet that the Developers would not implement such a request. Moreover, I would lobby against such a change, because it would break decades of expectation that when you say “search for o”, it searches for the literal character o, and not o plus some accented o-like characters.
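The literal-versus-special-syntax distinction can be seen in a short Python sketch. Python's `re` module has no `[[=o=]]` syntax, so an explicit character class listing a few accented forms of o stands in for it here; the list of variants is only illustrative, not complete.

```python
import re

text = "comparacion comparación"

# A literal o matches only the literal character o.
literal = re.findall(r"comparacion", text)

# Special syntax (here a character class) is needed to accept variants.
with_class = re.findall(r"comparaci[oóòôöõ]n", text)

print(literal)
# ['comparacion']
print(with_class)
# ['comparacion', 'comparación']
```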
I understand. It is clear that I am not very knowledgeable; I thought that you could add some exceptions to the search engine, such as for accented characters, so that it ignores them when performing a search.
Thank you.
-
@socu ,
It would make sense if there were an “accent-insensitive” flag in the standard regex engines, just like there’s a “case-insensitive” flag. But no regex engine that I’ve ever used has had such a flag. Some of those engines have decades of development behind them (for example, the Boost regex engine used by Notepad++ implements a Perl-compatible syntax, which has its roots in late-90s Perl regular expressions), most of which has included knowing about Unicode. Given that, and given the number of times I’ve seen “is there an accent-insensitive flag for regex-flavor-X” questions answered in the negative in programming forums, I would assume that if it were technically reasonable to include, it would have been developed and included in the major engines by now. Since it hasn’t been, I am assuming there’s a huge technical roadblock that’s beyond my pay grade to understand.
-