Regex: Find words between words



  • I have a gift for my mother. I want a nice book for myself. I love a beautiful woman for her eyes.

    So, as you can see, I want to find all expressions starting with a and ending with for, but with no more than 6 words between them.

    I made a regex, but it is not so good.

    \bA[\s\S]*?\w+{1,6}\bFOR[\s\S]*?



  • @Vasile-Caraus ,

    First, may I say, thank you for showing example data and what you had previously tried. That makes it so much easier to help you.

    I’ve got a solution that I think will work: (?i)\bA\b(?:\s*\b\w+\b\s*){1,6}\bFOR\b See it described at https://regexr.com/3uuht.

    In your sequence, the problems I see:

    • [\s\S]*? means a non-greedy selection of 0 or more spaces or non-spaces; in other words, it non-greedily matches anything, including line breaks. I am not sure that’s what you really intended.
    • \w+{1,6}: one or more word characters (from \w+), followed by a {1,6} quantifier that has nothing left to repeat. This is actually an error in the regex, and probably will cause the regex to do nothing (NPP Find says “invalid regular expression”). If you want the {1,6} to apply to groups of one-or-more word characters, you have to put parentheses around the \w+, as shown.

    I fixed those in my version.
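To illustrate the stacked-quantifier problem outside Notepad++, here is a minimal sketch using Python's `re` module. Python is only a stand-in for the Boost engine that Notepad++ uses (an assumption on my part; the two engines differ in details), and the helper name `compiles` is made up for this example.

```python
import re

def compiles(pattern):
    """Return True if Python's re module accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

# A quantifier directly after another quantifier is rejected.
stacked = compiles(r"\w+{1,6}")

# Grouping \w+ first gives {1,6} something to repeat.
grouped = compiles(r"(?:\w+){1,6}")

print(stacked, grouped)
```

In Python the first pattern is rejected at compile time and the grouped one is accepted, which mirrors the “invalid regular expression” message mentioned above.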

    I also added a couple more word boundaries, just to be explicit, and to prevent some false matches that I think may or may not be what you want.

    Caveats in my interpretation:

    • (?i): I was explicit about case insensitive (otherwise A and FOR would not match a and for)
    • I wanted to make it match a day for, but not an evening for, because you said “a”. If you want it to be able to start with a or an, but not and, use \bAN?\b
    • Since you said “for”, I assumed you wanted to match a day for, but not a day to go forth nor a day to go forward. If you want the latter as well, then \bFOR\w*\b
    • You weren’t explicit as to whether you wanted the space after “for” to be included in the match or not (your final [\s\S]*? kind of hinted you do, but I wasn’t sure). If you do, then \bFOR\b\s+
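As a quick check of the regex and the caveats above, here is a small sketch in Python. It assumes Python's `re` module behaves like Notepad++'s engine for this particular pattern, which holds for the constructs used here as far as I know; the two extra test strings are made up for illustration.

```python
import re

text = ("I have a gift for my mother. I want a nice book for myself. "
        "I love a beautiful woman for her eyes.")

# Same pattern as in the post: "a", then 1 to 6 whole words, then "for".
pattern = re.compile(r"(?i)\bA\b(?:\s*\b\w+\b\s*){1,6}\bFOR\b")

matches = pattern.findall(text)
print(matches)
# ['a gift for', 'a nice book for', 'a beautiful woman for']

# Caveats: "an" does not count as "a", and "forth" does not count as "for".
starts_with_an = pattern.search("an evening for two")
ends_with_forth = pattern.search("a day to go forth")
print(starts_with_an, ends_with_forth)
# None None
```

The word boundaries around A and FOR are what keep "an" and "forth" out; dropping them would change both results.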


  • Hello @vasile-caraus, @peterjones and All,

    Why not this regex :

    a\s+(\w+\s+){1,6}?for

    OR

    a\h+(\w+\h+){1,6}?for

    With the sample text below, it matches only in sentences 2 to 7!

    1 : If was a for sale ( Incorrect sentence ! )
    2 : It was a house for sale.
    3 : It was a small house for sale.
    4 : It was a small old house for sale.
    5 : It was a very small old house for sale.
    6 : It was a very small old green house for sale.
    7 : It was a very small old green wooden house for sale.
    8 : It was a very small old green wooden house designed for sale.
    9 : It was a very small old green wooden house not designed for sale.
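The nine sample sentences can also be checked with a short script. This is a sketch in Python, not Notepad++ itself: it uses the \s variant of the regex because Python's `re` module has no \h escape, it leaves out the "1 :" style prefixes, and it searches case-sensitively.

```python
import re

sentences = [
    "If was a for sale ( Incorrect sentence ! )",
    "It was a house for sale.",
    "It was a small house for sale.",
    "It was a small old house for sale.",
    "It was a very small old house for sale.",
    "It was a very small old green house for sale.",
    "It was a very small old green wooden house for sale.",
    "It was a very small old green wooden house designed for sale.",
    "It was a very small old green wooden house not designed for sale.",
]

# "a", whitespace, then 1 to 6 words (lazily), then "for".
pattern = re.compile(r"a\s+(\w+\s+){1,6}?for")

# Sentence numbers (1-based) in which the pattern is found.
matching = [n for n, s in enumerate(sentences, start=1) if pattern.search(s)]
print(matching)
# [2, 3, 4, 5, 6, 7]

# Same test with all sentences joined into one long line.
joined = " ".join(sentences)
hits = [m.group(0) for m in pattern.finditer(joined)]
print(len(hits))
# 6
```

Sentence 1 fails because no word sits between "a" and "for", and sentences 8 and 9 fail because they have 7 and 8 words in between.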
    

    With a single long line of text, joining lines 1 to 9, it works nicely too!

    If was a for sale ( Incorrect sentence ! ) It was a house for sale. It was a small house for sale. It was a small old house for sale. It was a very small old house for sale. It was a very small old green house for sale. It was a very small old green wooden house for sale. It was a very small old green wooden house designed for sale. It was a very small old green wooden house not designed for sale.
    

    Best Regards,

    guy038



  • thank you

    also,

    a(\W+\w+){1,6}\W+for

