regex: Search operators (like google)



  • Hi. I want to run a search like the one I use on Google. A simple example:

    "I love * car"

    In practice, * will be replaced by a missing word, such as “this” (I love this car).

    How can I search this way with Notepad++?



  • Hi, Vasile

    Easy! To simulate this Google search in N++, use the Regular expression search mode and simply look for the regex below:

    SEARCH I love \w+ car

    Remember that \w stands for a single word character, in the range [0-9A-Za-z_], as well as any accented letter or any word character from the Greek, Cyrillic, Hebrew or Arabic scripts!
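
    For readers who want to try the same regex outside Notepad++, here is a minimal sketch using Python's re module. This is an assumption made for illustration only: Notepad++ uses a different regex engine, and the sample sentences are made up.

```python
import re

# Python's re module stands in for Notepad++'s engine (an assumption).
pattern = re.compile(r"I love \w+ car")

# Hypothetical sample sentences, not taken from the thread.
samples = [
    "I love this car",
    "I love that car",
    "I love car",            # no word between "love" and "car"
    "I love this blue car",  # two words: \w+ cannot cross the space
]

# Keep only the sentences in which the regex finds a match.
matched = [s for s in samples if pattern.search(s)]
print(matched)
```

    Only the sentences with exactly one word between love and car are kept, because \w never matches a space.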

    Cheers,

    guy038



  • Hello guy038, it works just fine. But if I want to find more words, I tried this, and it doesn’t work:

    I love \w{1,6}+ car



  • Anyway, I found another solution to find all the words between “I love” and “car”:

    I love \w.+? car



  • Vasile and All,

    Ah, of course, Vasile, your regex I love \w{1,6}+ car contains the part \w{1,6}+, that is, the quantifier {1,6} followed by a + sign. This makes the quantifier possessive (an atomic sequence): ONCE the regex engine has matched the greatest possible number of word characters, up to 6, it will NEVER backtrack in order to satisfy the remainder of the overall regex! Note, also, that \w never matches a space, so this part can only cover a single word.

    Actually, your regex would match only 6 of the 8 lines below (the second to the seventh), those with exactly ONE word of 1 to 6 characters between the words love and car:

    I love car
    I love 1 car
    I love 12 car
    I love 123 car
    I love 1234 car
    I love 12345 car
    I love 123456 car
    I love 1234567 car
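
    That count can be checked with a short Python sketch. Two assumptions: Python's re module stands in for the Notepad++ engine, and the plain quantifier {1,6} replaces the possessive {1,6}+ (which needs Python 3.11 or later). For this particular regex both forms select the same lines, since backtracking never helps \w{1,6} reach the space before car.

```python
import re

# The eight test lines from the post above.
lines = [
    "I love car",
    "I love 1 car",
    "I love 12 car",
    "I love 123 car",
    "I love 1234 car",
    "I love 12345 car",
    "I love 123456 car",
    "I love 1234567 car",
]

# Non-possessive variant of the regex (assumption, see the note above).
pattern = re.compile(r"I love \w{1,6} car")

hits = [line for line in lines if pattern.search(line)]
print(len(hits))
for line in hits:
    print(line)
```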
    

    Here are, below, TWO regexes that look for a range of words between TWO boundary words, let’s say WORD_1 and WORD_2, separated by at least M words and NO more than N words:

      1. WORD_1(\W+\w+){M,N}\W+WORD_2
      2. WORD_1([^\w\r\n]+\w+){M,N}[^\w\r\n]+WORD_2

    The first syntax may match over several consecutive lines.

    The second syntax forces the regex engine to match the two boundary words WORD_1 and WORD_2 on the SAME line.
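
    As a rough sketch, the two templates can be generated by a small Python function. The name words_between and its parameters are hypothetical, introduced here only for illustration, and Python's re module is an assumption since Notepad++ uses its own engine.

```python
import re

def words_between(word_1, word_2, m, n, same_line=False):
    """Build a regex for M to N words between two boundary words.

    Hypothetical helper: same_line=True selects the second template,
    whose separator class excludes the \r and \n characters.
    """
    sep = r"[^\w\r\n]+" if same_line else r"\W+"
    # re.escape protects boundary words that contain regex metacharacters.
    return (re.escape(word_1) + "(" + sep + r"\w+){" + str(m) + "," + str(n) + "}"
            + sep + re.escape(word_2))

print(words_between("I", "car", 2, 5))
print(words_between("I", "car", 2, 5, same_line=True))
```

    With WORD_1 = I, WORD_2 = car, M = 2 and N = 5, the function prints the two regexes of the example that follows.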


    One example :

    • Let WORD_1 be the pronoun I

    • Let WORD_2 be the noun car

    • Let M and N be the values 2 and 5

    So, the two resulting regexes are:

      1. I(\W+\w+){2,5}\W+car
      2. I([^\w\r\n]+\w+){2,5}[^\w\r\n]+car

    => The first syntax first matches a block spanning the first two lines (from the I of line 1 to the car of line 2), then each of the lines 3 to 6

    => The second syntax ONLY matches within a SINGLE line at a time, that is, each of the lines 3 to 6, below:

    I car !
    I love car !
    I love this car !
    I love this blue car !
    I love this nice blue car !
    I love this very nice blue car !
    I love this gleaming and very nice blue car !
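
    The behaviour of the two regexes on these seven lines can be reproduced with a short Python sketch. Python's re module is an assumption here, used as a stand-in for the Notepad++ engine, and the lines are joined with \n line endings.

```python
import re

# The seven test lines from the post, joined with \n line endings.
text = "\n".join([
    "I car !",
    "I love car !",
    "I love this car !",
    "I love this blue car !",
    "I love this nice blue car !",
    "I love this very nice blue car !",
    "I love this gleaming and very nice blue car !",
])

multi_line = re.compile(r"I(\W+\w+){2,5}\W+car")                 # \W may cross line breaks
single_line = re.compile(r"I([^\w\r\n]+\w+){2,5}[^\w\r\n]+car")  # stays on one line

multi_hits = [m.group(0) for m in multi_line.finditer(text)]
single_hits = [m.group(0) for m in single_line.finditer(text)]

# repr() makes any line break inside a match visible.
for hit in multi_hits:
    print("regex 1:", repr(hit))
for hit in single_hits:
    print("regex 2:", repr(hit))
```

    The first match of regex 1 contains a line break, which is exactly what the second regex prevents.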
    

    I would like to point out a special regex construction, which may sometimes help to get powerful matches. It’s the part [^\w\r\n]!

    Indeed, I originally considered the classical syntax \W+ to match any range of NON-word characters occurring before a word. However, as the class \W is the opposite of the \w class, the NON-word class \W may also match any EOL character, like \n or \r, sometimes leading to matches over two consecutive lines!

    So, to handle the second case, I built this NEGATIVE class [^\w\r\n], which matches a character that is NOT a word character AND is neither the \n nor the \r EOL character!
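
    A minimal Python sketch of that difference, assuming Python's re module as a stand-in for the Notepad++ engine and using a made-up two-line string:

```python
import re

# Hypothetical two-line text: "red !" on one line, "blue" on the next.
text = "red !\nblue"

# \W+ also accepts the \n, so the match runs across the line break.
loose = re.search(r"red\W+blue", text)

# [^\w\r\n]+ refuses \r and \n, so no match is found across lines.
strict = re.search(r"red[^\w\r\n]+blue", text)

# On a single line, the strict class still matches.
same_line = re.search(r"red[^\w\r\n]+blue", "red ! blue")

print(loose is not None, strict is not None, same_line is not None)
```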


    Another example: the regex [^\W_a-z] is a kind of double-negation construction. In the end, it matches any word character, except for the underscore ( _ ) and all the usual lower-case letters ( [a-z] ). In other words, this regex would match:

    • Any digit or number-like symbol

    • Any upper-case letter, accented or NOT

    • Any lower-case letter, but ONLY if it is accented
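
    The double negation can be tried out with a short Python sketch. Python's re module is an assumption (its Unicode-aware \w is used in place of the Notepad++ engine) and the sample string is made up.

```python
import re

# Any word character, minus the underscore and the plain lower-case letters a-z.
pattern = re.compile(r"[^\W_a-z]")

# Hypothetical sample: a, Z, underscore, 9, space, then the accented
# letters e-acute in lower case (\u00e9) and upper case (\u00c9).
sample = "aZ_9 \u00e9\u00c9"

kept = pattern.findall(sample)
print(kept)
```

    The plain a, the underscore and the space are dropped. The upper-case Z, the digit 9 and both accented letters are kept.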

    Cheers,

    guy038

