    regex: Search operators (like google)

    • Vasile Caraus

      Hi. I want to run a search like the one I use on Google. A simple example:

      "I love * car"

      In practice, * stands for a missing word, such as “this” (I love this car).

      How can I search this way with Notepad++?

      • guy038

        Hi, Vasile

        Easy! To simulate this Google search in N++, use the Regular expression search mode and simply look for the regex below:

        SEARCH I love \w+ car

        Remember that \w stands for a single word character, in the range [0-9A-Za-z_], as well as any accented letter or any word character from the Greek, Cyrillic, Hebrew or Arabic scripts!
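        For readers who want to try the pattern outside the editor, here is a minimal sketch in Python. It assumes that Python's re module treats this simple pattern the same way as the Notepad++ engine, and the sample sentences are made up for illustration.

```python
import re

# Same pattern as the SEARCH line above. Python's re module stands in for
# the Notepad++ regex engine here, which is an assumption of this sketch.
pattern = re.compile(r"I love \w+ car")

# Hypothetical sample sentences, not taken from the thread.
samples = [
    "I love this car",           # one plain word
    "I love my_2nd car",         # digits and underscore are word characters
    "I love \u00e9t\u00e9 car",  # accented letters are word characters too
    "I love this blue car",      # two words: \w+ cannot cross the space
    "I love car",                # no word between "love" and "car"
]

matched = [s for s in samples if pattern.search(s)]
print(matched)
```

        Only the first three samples are matched, because \w+ covers exactly one word.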

        Cheers,

        guy038

        • Vasile Caraus

          Hello guy038, it works just fine. But if I want to find several words, I tried this, and it doesn’t work:

          I love \w{1,6}+ car

          • Vasile Caraus

            Anyway, I found another solution that finds all the words between “I love” and “car”:

            I love \w.+? car
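            A quick way to see what this lazy pattern captures is to run it in Python. This is only a sketch: it assumes Python's re module behaves like the Notepad++ engine for this pattern, with the dot not matching line breaks, and the sample sentences are invented.

```python
import re

# Vasile's pattern: one word character, then a lazy run of any characters,
# stopping at the first " car" that lets the whole regex succeed.
lazy = re.compile(r"I love \w.+? car")

several_words = lazy.search("I love this nice blue car !")
print(several_words.group(0))

# The lazy quantifier stops at the FIRST " car", not the last one.
first_car = lazy.search("I love this car and that car")
print(first_car.group(0))

# \w followed by .+? needs at least two characters before " car",
# so a one-letter word in the gap is not found.
one_letter = lazy.search("I love a car")
print(one_letter)
```

            Note that .+? accepts any characters, including punctuation, so the gap is not limited to words.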

            • guy038

              Vasile and All,

              Ah, of course, Vasile: your regex I love \w{1,6}+ car contains the part \w{1,6}+, that is, the quantifier {1,6} followed by a + sign. This is a possessive quantifier, which behaves like an atomic group. That is to say, ONCE the regex engine has matched the greatest possible number of word characters, up to 6, it will NEVER backtrack in order to satisfy the remainder of the overall regex! In any case, \w cannot match the spaces between words, so this part can only ever cover a single word.

              So your regex matches only 6 of the 8 lines below (the second to the seventh), those with exactly ONE word (of 1 to 6 word characters) between the words love and car:

              I love car
              I love 1 car
              I love 12 car
              I love 123 car
              I love 1234 car
              I love 12345 car
              I love 123456 car
              I love 1234567 car
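              The count can be checked with a short Python sketch. To stay runnable on older Python versions it does not use the possessive syntax itself; it swaps in the lookahead-plus-backreference idiom (?=(\w{1,6}))\1, which is a common emulation of a possessive quantifier. Python 3.11 and later also accept \w{1,6}+ directly. That both engines agree on these lines is an assumption.

```python
import re

# (?=(\w{1,6}))\1 grabs up to 6 word characters inside a lookahead and then
# consumes exactly that text; the engine cannot backtrack into the lookahead,
# which emulates the possessive \w{1,6}+ from the thread.
possessive = re.compile(r"I love (?=(\w{1,6}))\1 car")

lines = [
    "I love car",
    "I love 1 car",
    "I love 12 car",
    "I love 123 car",
    "I love 1234 car",
    "I love 12345 car",
    "I love 123456 car",
    "I love 1234567 car",
]

hits = [line for line in lines if possessive.search(line)]
print(len(hits))
print(hits)
```

              The first line fails because there is no word between love and car, and the last one because its word has 7 characters.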
              

              Here are TWO regexes that look for a range of words between TWO boundary words, say WORD_1 and WORD_2, separated by at least M words and NO more than N words:

                1. WORD_1(\W+\w+){M,N}\W+WORD_2
                2. WORD_1([^\w\r\n]+\w+){M,N}[^\w\r\n]+WORD_2

              The first syntax may match across several consecutive lines

              The second syntax forces the regex engine to match the two boundary words WORD_1 and WORD_2 on the SAME line
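              The two templates can be assembled programmatically. The helper below is a hypothetical sketch in Python; the name words_between and its same_line flag are inventions for illustration, not part of Notepad++ or of the thread.

```python
import re

def words_between(word_1, word_2, m, n, same_line=False):
    """Build WORD_1(SEP\\w+){M,N}SEP WORD_2 as a compiled regex.

    Hypothetical helper: SEP is \\W+ (may cross line breaks) or
    [^\\w\\r\\n]+ (stays on one line) depending on same_line.
    """
    sep = r"[^\w\r\n]+" if same_line else r"\W+"
    quantifier = "{%d,%d}" % (m, n)
    source = (
        re.escape(word_1)      # boundary words are taken literally
        + "(" + sep + r"\w+)"  # one separator followed by one word
        + quantifier           # repeated M to N times
        + sep
        + re.escape(word_2)
    )
    return re.compile(source)

any_lines = words_between("I", "car", 2, 5)
one_line = words_between("I", "car", 2, 5, same_line=True)
print(any_lines.pattern)
print(one_line.pattern)
```

              With I, car, 2 and 5 as arguments, the helper reproduces the two templates with the placeholders filled in.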


              One example:

              • Let WORD_1 be the pronoun I

              • Let WORD_2 be the noun car

              • Let M and N be the values 2 and 5

              So, the two resulting regexes are:

                1. I(\W+\w+){2,5}\W+car
                2. I([^\w\r\n]+\w+){2,5}[^\w\r\n]+car

              => The first syntax matches, first, a block spanning lines 1 and 2, then each of the lines 3 to 6

              => The second syntax matches ONLY the lines 3 to 6, one line per match, in the text below:

              I car !
              I love car !
              I love this car !
              I love this blue car !
              I love this nice blue car !
              I love this very nice blue car !
              I love this gleaming and very nice blue car !
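              The same experiment can be reproduced in Python. This is a sketch under the assumption that Python's re module and the Notepad++ engine agree on these two patterns; Python is case-sensitive by default, which corresponds to searching with Match case ticked.

```python
import re

text = "\n".join([
    "I car !",
    "I love car !",
    "I love this car !",
    "I love this blue car !",
    "I love this nice blue car !",
    "I love this very nice blue car !",
    "I love this gleaming and very nice blue car !",
])

# First syntax: \W+ may swallow the line break between two lines.
multi_line = re.compile(r"I(\W+\w+){2,5}\W+car")
# Second syntax: the separator excludes \r and \n, so a match stays on one line.
single_line = re.compile(r"I([^\w\r\n]+\w+){2,5}[^\w\r\n]+car")

multi = [m.group(0) for m in multi_line.finditer(text)]
single = [m.group(0) for m in single_line.finditer(text)]

for found in multi:
    print(repr(found))
print(single)
```

              The first regex yields five matches, the first of which runs from line 1 into line 2, while the second regex yields four matches, one on each of the lines 3 to 6. Line 7 has seven words between I and car, so neither regex matches it.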
              

              I would like to point out a special regex construction which may sometimes help to get powerful matches: the part [^\w\r\n]!

              Indeed, I originally considered the classical syntax \W+ to match any range of NON-word characters occurring before a word. However, as the class \W is the opposite of the \w class, \W may also match an EOL character, like \n or \r, which sometimes leads to matches spanning two consecutive lines!

              So, to handle the second case, I built the NEGATIVE class [^\w\r\n], which matches a character that is NOT a word character and is NEITHER the \n NOR the \r EOL character!
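              The difference between the two classes shows up on a tiny input. The Python sketch below assumes Windows-style \r\n line endings in a made-up sample string.

```python
import re

# Hypothetical two-line sample: end of one line, start of the next.
text = "car !\r\nI"

# \W+ treats the EOL characters as ordinary non-word characters.
gaps_any = re.findall(r"\W+", text)
# [^\w\r\n]+ stops right before the EOL characters.
gaps_same_line = re.findall(r"[^\w\r\n]+", text)

print(gaps_any)
print(gaps_same_line)
```

              The first list contains the run of space, exclamation mark, \r and \n as one piece, while the second contains only the space and the exclamation mark.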


              Another example: the regex [^\W_a-z] is a kind of double-negation construction. It matches any word character except the underscore ( _ ) and the usual lower-case letters ( [a-z] ). In other words, this regex matches:

              • Any digit or number-like symbol

              • Any upper-case letter, accented or NOT

              • Any lower-case letter, but ONLY if it is accented
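              A character-by-character check in Python illustrates the double negation. This is a sketch assuming Python's Unicode-aware \w, which may not classify every symbol exactly as Notepad++ does; the sample characters are arbitrary.

```python
import re

double_negation = re.compile(r"[^\W_a-z]")

# Sample characters: lower-case ASCII, underscore, digit, upper-case ASCII,
# accented upper-case (\u00c9), accented lower-case (\u00e9), punctuation.
sample = "a z _ 7 Z \u00c9 \u00e9 - !"

kept = double_negation.findall(sample)
print(kept)
```

              The digit, the upper-case letters and the accented lower-case letter are kept; the plain lower-case letters, the underscore, the spaces and the punctuation are rejected.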

              Cheers,

              guy038
