Regex: search operators (like Google)
-
Hi. I want to run a search like the ones I use on Google. A simple example:
"I love * car"
In practice, the * will be replaced with a missing word, like “This” (I love this car). How can I search this way with Notepad++?
-
Hi, Vasile
Easy ! To simulate this Google search in N++, use the Regular expression search mode and simply look for the regex below :
SEARCH
I love \w+ car
Remember that \w stands for a single word character, in the range [0-9A-Za-z_], as well as any accented letter or any word character from the Greek, Cyrillic, Hebrew or Arabic scripts !
Cheers,
guy038
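For readers who want to try the pattern outside Notepad++, here is a minimal sketch in Python. It assumes Python's built-in re module, whose \w is Unicode-aware for text patterns; it behaves much like the N++ engine for this simple regex, but the two engines are not guaranteed to be identical. The sample sentences are made up for illustration.

```python
import re

# The regex from the post: exactly one word between "I love" and "car".
pattern = re.compile(r"I love \w+ car")

# Hypothetical sample sentences, not taken from the thread.
samples = [
    "I love this car",        # one word in the gap
    "I love caf\u00e9 car",   # an accented letter is still a word character
    "I love car",             # no word in the gap
    "I love this blue car",   # two words in the gap
]

# Keep only the sentences in which the pattern is found.
matched = [s for s in samples if pattern.search(s)]
for s in matched:
    print(s)
```

Only the first two samples are kept: \w+ covers a single word, so an empty gap or a gap of two words does not match.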
-
Hello guy038, it works just fine. But when I want to find more words, I tried this, and it doesn’t work:
I love \w{1,6}+ car
-
Anyway, I found another solution to find all words between “I love” and “car”:
I love \w.+? car
-
Vasile and All,
Ah, of course, Vasile, your regex
I love \w{1,6}+ car
contains the part \w{1,6}+, with the quantifier {1,6} followed by a + sign. It represents an atomic sequence (a possessive quantifier). That is to say that, ONCE the regex engine has matched the greatest possible number of word characters, up to 6, it will NEVER backtrack in order to satisfy the remainder of the overall regex ! In addition, \w never matches a space, so this part can only cover a single word.
Actually, among the lines below, your regex matches just the 6 lines with exactly ONE word ( of 1 to 6 characters ) between the words love and car :
I love car
I love 1 car
I love 12 car
I love 123 car
I love 1234 car
I love 12345 car
I love 123456 car
I love 1234567 car
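The claim about these eight lines can be checked with a short Python sketch. It is an illustration under assumptions, not the N++ engine itself: it uses Python's re module and the plain quantifier {1,6} rather than the possessive {1,6}+, which Python only accepts from version 3.11 on. For this particular pattern both forms select the same lines.

```python
import re

# Plain {1,6} instead of the possessive {1,6}+ : giving back word characters
# can never help here, because the next character must be a space.
pattern = re.compile(r"I love \w{1,6} car")

lines = [
    "I love car",
    "I love 1 car",
    "I love 12 car",
    "I love 123 car",
    "I love 1234 car",
    "I love 12345 car",
    "I love 123456 car",
    "I love 1234567 car",
]

# Keep the lines where the pattern is found.
matching = [line for line in lines if pattern.search(line)]
for line in matching:
    print(line)
```

The first line (no word in the gap) and the last one (a 7-character word) are rejected, and the six lines in between are kept.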
Here are TWO regexes that look for a range of words between TWO boundary words, let’s say WORD_1 and WORD_2, separated by at least M words and NO more than N words :
- WORD_1(\W+\w+){M,N}\W+WORD_2
- WORD_1([^\w\r\n]+\w+){M,N}[^\w\r\n]+WORD_2
The first syntax may match over several consecutive lines.
The second syntax forces the regex engine to match the two boundary words WORD_1 and WORD_2 on the SAME line.
One example :
- Let WORD_1 be the pronoun I
- Let WORD_2 be the noun car
- Let M and N be the values 2 and 5
So, the two resulting regexes are :
- I(\W+\w+){2,5}\W+car
- I([^\w\r\n]+\w+){2,5}[^\w\r\n]+car
=> The first syntax first matches lines 1 and 2 together, as a single match, then each of the lines 3 to 6
=> The second syntax matches ONLY single lines : each of the lines 3 to 6, below :
I car !
I love car !
I love this car !
I love this blue car !
I love this nice blue car !
I love this very nice blue car !
I love this gleaming and very nice blue car !
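To see the difference between the two syntaxes on these seven lines, here is a minimal Python sketch. It assumes Python's re module (case-sensitive by default) rather than the N++ engine, and the variable names are only illustrative.

```python
import re

M, N = 2, 5
quant = "{%d,%d}" % (M, N)  # builds the quantifier {2,5}

# First syntax: \W also matches line breaks, so a match may span lines.
multi_line = re.compile(r"I(\W+\w+)" + quant + r"\W+car")
# Second syntax: [^\w\r\n] excludes line breaks, so a match stays on one line.
same_line = re.compile(r"I([^\w\r\n]+\w+)" + quant + r"[^\w\r\n]+car")

text = "\n".join([
    "I car !",
    "I love car !",
    "I love this car !",
    "I love this blue car !",
    "I love this nice blue car !",
    "I love this very nice blue car !",
    "I love this gleaming and very nice blue car !",
])

# group(0) is the whole match (findall would return only the last group).
multi = [m.group(0) for m in multi_line.finditer(text)]
single = [m.group(0) for m in same_line.finditer(text)]

for found in multi:
    print(repr(found))
print("---")
for found in single:
    print(repr(found))
```

With the first regex, the first match runs from I on line 1 to car on line 2, line break included. With the second regex, only lines 3 to 6 produce a match. Line 7 has seven words between I and car, so neither regex matches it.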
I would like to point out a special regex construction, which may, sometimes, help to get powerful matches. It’s the part [^\w\r\n] !
Indeed, I originally considered the classical syntax \W+ to match any range of NON-word characters which occurs before a word. However, as the class \W is the opposite of the \w class, \W may also match any EOL character, like \n or \r, leading, sometimes, to matches on two consecutive lines !
So, to handle the second case, I built the NEGATIVE class [^\w\r\n], which matches a character that is NOT a word character AND is neither the \n nor the \r EOL character !
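The difference between the two classes shows up on a tiny sample. This is a sketch using Python's re module, which treats \W and negated classes the same way for this purpose. The sample string is just the end of one test line followed by the start of the next.

```python
import re

# End of one line, a line break, then the start of the next line.
text = "car !\nI"

# \W+ swallows the space, the "!" and the line break in one run.
with_eol = re.findall(r"\W+", text)
# [^\w\r\n]+ stops before the line break.
without_eol = re.findall(r"[^\w\r\n]+", text)

print(with_eol)     # [' !\n']
print(without_eol)  # [' !']
```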
Another example : the regex [^\W_a-z] is a kind of double-negation construction. In the end, it matches any word character, except for the underscore (_) and all the usual lower-case letters ([a-z]). In other words, this regex would match :
- Any digit or number-like symbol
- Any upper-case letter, accented or NOT
- Among the lower-case letters, ONLY the accented ones
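A quick way to check this double negation is the Python sketch below. It assumes Python's re module with a case-sensitive search, and the sample string is invented for the test.

```python
import re

# Sample: a, B, underscore, 9, e-acute (lower), E-acute (upper), space, z
sample = "aB_9\u00e9\u00c9 z"

# Any word character that is neither "_" nor a plain lower-case a-z letter.
found = re.findall(r"[^\W_a-z]", sample)
print(found)
```

The result keeps B, 9 and the two accented letters, and drops a, z, the underscore and the space.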
Cheers,
guy038
-