    regex: Search operators (like google)

    5 Posts 2 Posters 3.7k Views
    • Vasile Caraus

      Hi. I want to search the way I do on Google. A simple example:

      "I love * car"

      In practice, * stands in for a missing word, such as “this” (I love this car).

      How can I search this way with Notepad++?

      • guy038

        Hi, Vasile

        Easy! To simulate this Google search in N++, select the Regular expression search mode and simply look for the regex below:

        SEARCH I love \w+ car

        Remember that \w stands for a single word character: any character in the range [0-9A-Za-z_], as well as any accented letter or any word character from the Greek, Cyrillic, Hebrew or Arabic scripts!
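A minimal sketch of the same search outside Notepad++, using Python's re module as a stand-in for the N++ regex engine (an assumption: the two engines agree on this simple pattern, but they are not identical in general). The sample strings are made up for illustration.

```python
import re

# Same pattern as in the post above.
pattern = re.compile(r"I love \w+ car")

samples = [
    "I love this car",               # one ordinary word
    "I love my_2nd car",             # digits and underscore count as word characters
    "I love \u00e9t\u00e9 car",      # accented letters (here: ete with acute accents) count too
    "I love this blue car",          # two words: \w+ cannot cross the space
]

# Keep only the strings that the pattern matches in full.
matched = [s for s in samples if pattern.fullmatch(s)]
print(matched)
```

Only the last sample is rejected, because \w+ covers exactly one word.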

        Cheers,

        guy038

        • Vasile Caraus

          Hello guy038, it works just fine. But when I want to find more than one word, I tried this and it doesn’t work:

          I love \w{1,6}+ car

          • Vasile Caraus

            Anyway, I found another solution that finds all the words between “I love” and “car”:

            I love \w.+? car
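A small sketch of how this lazy pattern behaves, again using Python's re module as a stand-in for the N++ engine (an assumption). The sample strings are hypothetical. Note that \w.+? needs at least two characters, so a single-letter word in the gap is not enough.

```python
import re

# \w  : one word character
# .+? : one or more further characters, as few as possible (lazy)
pattern = re.compile(r"I love \w.+? car")

# The lazy quantifier stops at the FIRST " car" it can reach.
first = pattern.search("I love this very nice blue car and that red car")
print(first.group(0))

# With a one-letter word, .+? has to consume the space itself,
# and no later " car" remains to complete the match.
short = pattern.search("I love a car")
print(short)
```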

            • guy038

              Vasile and All,

              Ah, of course, Vasile: your regex I love \w{1,6}+ car contains the part \w{1,6}+, that is, the quantifier {1,6} followed by a + sign. This makes the quantifier possessive, so the part behaves like an atomic group. That is to say, ONCE the regex engine has matched the greatest possible number of word characters, up to 6, it NEVER backtracks to satisfy the remainder of the overall regex! And since \w cannot match a space, this part can only ever cover ONE word.

              Actually, your regex matches just 6 of the 8 lines below: those with exactly ONE word (of 1 to 6 word characters) between the words love and car

              I love car
              I love 1 car
              I love 12 car
              I love 123 car
              I love 1234 car
              I love 12345 car
              I love 123456 car
              I love 1234567 car
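The eight lines above can be checked with a short sketch. It uses Python's re module as a stand-in for the N++ engine (an assumption), and the plain greedy form \w{1,6} instead of the possessive \w{1,6}+, because Python versions before 3.11 reject the possessive syntax. On these particular lines both forms give the same result, since \w can never match the space between two words.

```python
import re

# Greedy variant of the regex discussed above (no trailing + on the quantifier).
pattern = re.compile(r"I love \w{1,6} car")

lines = [
    "I love car",
    "I love 1 car",
    "I love 12 car",
    "I love 123 car",
    "I love 1234 car",
    "I love 12345 car",
    "I love 123456 car",
    "I love 1234567 car",
]

# Keep the lines that the pattern matches in full.
hits = [line for line in lines if pattern.fullmatch(line)]
print(len(hits), "of", len(lines), "lines match")
```

The first line fails because there is no word between love and car, and the last one fails because its word has 7 characters.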
              

              Here are TWO regexes that look for a range of words between TWO boundary words, say WORD_1 and WORD_2, separated by at least M words and NO more than N words:

                1. WORD_1(\W+\w+){M,N}\W+WORD_2
                2. WORD_1([^\w\r\n]+\w+){M,N}[^\w\r\n]+WORD_2

              The first syntax may match over several consecutive lines.

              The second syntax forces the regex engine to match the two boundary words, WORD_1 and WORD_2, on the SAME line.


              One example:

              • Let WORD_1 be the pronoun I

              • Let WORD_2 be the noun car

              • Let M and N be the values 2 and 5

              So, the two resulting regexes are:

                1. I(\W+\w+){2,5}\W+car
                2. I([^\w\r\n]+\w+){2,5}[^\w\r\n]+car

              => The first syntax first matches lines 1 and 2 together as a single match (from the I of line 1 to the car of line 2), then matches each of the lines 3 to 6 individually

              => The second syntax matches ONLY within a single line, so it matches each of the lines 3 to 6, below:

              I car !
              I love car !
              I love this car !
              I love this blue car !
              I love this nice blue car !
              I love this very nice blue car !
              I love this gleaming and very nice blue car !
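The behaviour described above can be reproduced with a short sketch, using Python's re module as a stand-in for the N++ engine (an assumption; the search here is case-sensitive, which corresponds to ticking "Match case" in N++). The variable names are illustrative only.

```python
import re

text = "\n".join([
    "I car !",
    "I love car !",
    "I love this car !",
    "I love this blue car !",
    "I love this nice blue car !",
    "I love this very nice blue car !",
    "I love this gleaming and very nice blue car !",
])

# First syntax: \W+ may also consume end-of-line characters.
multi_line = re.compile(r"I(\W+\w+){2,5}\W+car")
# Second syntax: separators exclude \r and \n, so a match stays on one line.
same_line = re.compile(r"I([^\w\r\n]+\w+){2,5}[^\w\r\n]+car")

multi_hits = [m.group(0) for m in multi_line.finditer(text)]
same_hits = [m.group(0) for m in same_line.finditer(text)]

for hit in multi_hits:
    print("multi:", repr(hit))
for hit in same_hits:
    print("same :", repr(hit))
```

The first pattern yields one match that spans lines 1 and 2, followed by lines 3 to 6. The second yields lines 3 to 6 only. Line 7 has seven words between I and car, so neither pattern matches it.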
              

              I would like to point out a special regex construction that can sometimes help to get powerful matches: the part [^\w\r\n]!

              Indeed, I originally considered the classical syntax \W+ to match any range of NON-word characters occurring before a word. However, as the class \W is the opposite of the \w class, \W also matches any EOL character, like \n or \r, which sometimes leads to matches spanning two consecutive lines!

              So, to handle the second case, I built this NEGATIVE class [^\w\r\n], which matches a character that is NOT a word character and is NEITHER the \n NOR the \r EOL character!


              Another example: the regex [^\W_a-z] is a kind of double-negation construction. It matches any word character, except for the underscore ( _ ) and the usual lower-case letters ( [a-z] ). In other words, this regex matches:

              • Any digit or number-like symbol

              • Any upper-case letter, accented or NOT

              • Any lower-case letter that is accented, ONLY
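A quick check of this double-negation class, sketched with Python's re module as a stand-in for the N++ engine (an assumption; the search is case-sensitive here). The sample string is made up and uses escapes for the two accented letters.

```python
import re

# Negated class: excludes NON-word characters, the underscore and a-z,
# so what remains is every word character other than _ and a-z.
double_negation = re.compile(r"[^\W_a-z]")

# Characters in order: a Z _ 7 e-acute E-acute - !
sample = "aZ_7\u00e9\u00c9-!"

kept = double_negation.findall(sample)
print(kept)
```

The plain a, the underscore and the two punctuation marks are skipped, while Z, 7 and both accented letters are kept.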

              Cheers,

              guy038
