regex: Search operators (like google)

Vasile Caraus

Hi. I want to run a search like the one I use on Google. A simple example:

"I love * car"

In practice, * will be replaced by a missing word, like “This” (I love this car).

How can I search this way in Notepad++?

guy038

Hi, Vasile

Easy ! To simulate this Google search in N++, use the Regular expression search mode and simply look for the regex below :

SEARCH I love \w+ car

Remember that \w stands for a single word character, in the range [0-9A-Za-z_], as well as any accented letter or any word character from the Greek, Cyrillic, Hebrew or Arabic scripts ! The + quantifier then repeats it, so \w+ matches one whole word.
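
For anyone who wants to try this outside N++, here is a minimal sketch in Python. It assumes Python's re module, which is not the Boost engine that N++ uses, although \w+ behaves the same way for this pattern. The sample sentence is made up for illustration.

```python
import re

# Hypothetical sample text, not taken from the thread.
text = "I love this car. I love my car. I love this blue car."

# \w+ stands in for exactly one word, like * in the Google query.
matches = re.findall(r"I love \w+ car", text)
print(matches)

# On str patterns, \w is Unicode-aware in Python 3, so accented
# letters count as word characters too ("\u00e9" is e-acute).
accented_ok = re.fullmatch(r"\w+", "\u00e9t\u00e9_42") is not None
print(accented_ok)
```

The third sentence is not matched, because two words sit between love and car and \w never matches a space.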

Cheers,

guy038

Vasile Caraus

Hello guy038, it works just fine. But when I want to find more than one word, I tried this, and it doesn’t work:

I love \w{1,6}+ car

Vasile Caraus

Anyway, I found another solution that finds all the words between “I love” and “car”:

I love \w.+? car

guy038

Vasile and All,

Ah, of course, Vasile : your regex I love \w{1,6}+ car contains the part \w{1,6}+, that is, the quantifier {1,6} followed by a + sign. This makes the quantifier possessive, so it behaves as an atomic sequence. That is to say, ONCE the regex engine has matched the greatest possible number of word characters, up to 6, it will NEVER backtrack to give some of them up, even if that would let the remainder of the overall regex match ! In any case, \w never matches a space, so this part can only ever cover a single word.

Actually, your regex would match only six of the eight lines below ( the second to the seventh ), those with exactly ONE word, of 1 to 6 word characters, between the words love and car

I love car
I love 1 car
I love 12 car
I love 123 car
I love 1234 car
I love 12345 car
I love 123456 car
I love 1234567 car
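
The list above can be checked with a small Python sketch. It rests on one assumption worth stating loudly: possessive quantifiers such as \w{1,6}+ are only accepted by Python's re module from version 3.11 on, so the sketch emulates the atomic behaviour with the lookahead-plus-backreference idiom (?=(\w{1,6}))\1 instead. This is an emulation for illustration, not what N++ does internally.

```python
import re

lines = [
    "I love car",
    "I love 1 car",
    "I love 12 car",
    "I love 123 car",
    "I love 1234 car",
    "I love 12345 car",
    "I love 123456 car",
    "I love 1234567 car",
]

# (?=(\w{1,6}))\1 emulates the possessive \w{1,6}+ : the lookahead grabs
# up to 6 word characters, and \1 must then consume exactly that text,
# so the engine cannot give characters back afterwards.
pattern = re.compile(r"I love (?=(\w{1,6}))\1 car")

matched = [line for line in lines if pattern.fullmatch(line)]
for line in matched:
    print(line)
```

Only the second to the seventh lines are reported: the first has no word left for car once \w{1,6} has consumed it, and the last has a seven-character word.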

Here are, below, TWO regexes that look for a range of words between TWO boundary words, let’s say WORD_1 and WORD_2, separated by at least M words and NO more than N words :

1. WORD_1(\W+\w+){M,N}\W+WORD_2
2. WORD_1([^\w\r\n]+\w+){M,N}[^\w\r\n]+WORD_2

The first syntax may match over several consecutive lines

The second syntax forces the regex engine to match the two boundary words WORD_1 and WORD_2 on the SAME line

One example :

Let WORD_1 be the pronoun I
Let WORD_2 be the noun car
Let M and N be the values 2 and 5

So, the two resulting regexes are :

1. I(\W+\w+){2,5}\W+car
2. I([^\w\r\n]+\w+){2,5}[^\w\r\n]+car

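=> The first syntax first matches across the first two lines together ( from the I of line 1 to the car of line 2 ), then matches lines 3 to 6, one match per line

=> The second syntax matches ONLY within a single line : one match on each of the lines 3 to 6, below :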

I car !
I love car !
I love this car !
I love this blue car !
I love this nice blue car !
I love this very nice blue car !
I love this gleaming and very nice blue car !
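
To see the difference between the two syntaxes on these seven lines, here is a minimal Python sketch. It assumes Python's re module rather than the N++ engine, and a case-sensitive search ( the equivalent of ticking Match case ); the variable names are illustrative only.

```python
import re

sample = "\n".join([
    "I car !",
    "I love car !",
    "I love this car !",
    "I love this blue car !",
    "I love this nice blue car !",
    "I love this very nice blue car !",
    "I love this gleaming and very nice blue car !",
])

# First syntax: \W also matches line breaks, so a match may span lines.
multi_line = re.compile(r"I(\W+\w+){2,5}\W+car")
# Second syntax: [^\w\r\n] excludes \r and \n, so a match stays on one line.
same_line = re.compile(r"I([^\w\r\n]+\w+){2,5}[^\w\r\n]+car")

multi_hits = [m.group(0) for m in multi_line.finditer(sample)]
same_hits = [m.group(0) for m in same_line.finditer(sample)]

for hit in multi_hits:
    print(repr(hit))
print("---")
for hit in same_hits:
    print(repr(hit))
```

The first list starts with one match that runs from line 1 into line 2, which the second regex never produces. Line 7 is missed by both, since seven words separate I and car.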

I would like to point out a special regex construction, which may, sometimes, help to get powerful matches. It’s the part [^\w\r\n] !

Indeed, I, originally, considered the classical syntax \W+, to match any range of NON-words characters, which occurs before a word. However, as the class \W is the opposite of the \w class, the NON-word \W may, also, match any EOL character, like \n or \r, leading, sometimes, to matches on two consecutive lines !

So, to handle the second case, I built this NEGATIVE class [^\w\r\n], which matches a character that is NOT a word character AND is neither the \n nor the \r EOL character !
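
A quick way to convince yourself, sketched in Python ( again assuming the re module, not the N++ engine ) :

```python
import re

# \W means "not a word character", and a line break is not a word character.
newline_is_nonword = re.fullmatch(r"\W", "\n") is not None

# The negated class rejects word characters AND the two EOL characters.
newline_in_class = re.fullmatch(r"[^\w\r\n]", "\n") is not None

# Spaces and punctuation are still accepted by the negated class.
space_in_class = re.fullmatch(r"[^\w\r\n]+", " ! ") is not None

print(newline_is_nonword, newline_in_class, space_in_class)
```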

Another example : the regex [^\W_a-z] is a kind of double-negation construction : in the end, it matches any Word character, except for the underscore ( _ ) and all the usual lower-case letters ( [a-z] ). In other words, this regex would match :

Any digit or number-like symbol
Any upper-case letter, accented or NOT
Any lower-case letter, but ONLY if it is accented
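
A small Python sketch of this class, assuming the re module and a case-sensitive search; the probe string is made up, and the accented letters are written as escapes so that they are unambiguous :

```python
import re

# Probe string: a, Z, underscore, 9, e-acute, hyphen, E-acute.
probe = "aZ_9\u00e9-\u00c9"

# Not a NON-word character, not the underscore, not a plain a-z letter.
kept = re.findall(r"[^\W_a-z]", probe)
print(kept)
```

The plain a, the underscore and the hyphen are dropped; the upper-case Z, the digit and both accented letters are kept.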

Cheers,

guy038