Regex: Match the first three words from every line
-
Hello, I want to match the first three words from every line of my file, whether or not they have a dash.
For example:
Intr-o zi plecarea mea s-a amanat pentru toata viata.
The output match of the first three words should be:
Intr-o zi plecarea
I tried a few regexes, but none of them work:
^(\w+){3}
or ^(\w+\s+){3}
or ^(?=(\w+)){3}
-
Hello, @vasile-caraus and All,
Interesting problem, indeed! In this search, we must consider that the dash - is also a virtual word character. So, in this example, a word char is defined as [\w-] and common words are defined as [\w-]+
Thus, a non-word char, in this specific example, is necessarily defined with the regex [^\w-], but we usually use the regex [^\w\r\n-] in order not to match EOL chars, too! Note that the dash should be placed at the very end (or the very beginning) of the character class, because otherwise it defines a range inside square brackets!
So, in this specific example, a non-empty range of non-word characters is matched with the regex [^\w\r\n-]+
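To see what these two character classes pick up on the sample line, here is a minimal sketch using Python's `re` module. This is an assumption for illustration only: Python's engine is not the one behind the editor's Search dialog, and the variable names are hypothetical, but both classes behave the same way on this line.

```python
import re

line = "Intr-o zi plecarea mea s-a amanat pentru toata viata."

# "Word" range: letters, digits, underscore, plus the dash (placed last in the class)
words = re.findall(r"[\w-]+", line)

# "Non-word" range: anything that is not a word char, a dash or an EOL char
gaps = re.findall(r"[^\w\r\n-]+", line)

print(words[:3])  # the first three "words", dashes included
print(gaps[-1])   # the final full stop counts as a non-word range
```

The dashed forms `Intr-o` and `s-a` each come out as a single word, which is the point of adding `-` to the class.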
Now, the first three common words of each line can be expressed, in plain language, as:
^ Word range + Non-word range + Word range + Non-word range + Word range
or, more simply:
^ ( Word range + Non-word range ) {2} + Word range
which gives, when translated to regex, with the free-spacing mode:
SEARCH
(?x) ^ ( [\w-]+ [^\w\r\n-]+ ) {2} [\w-]+
So, the minimal form is:
SEARCH
^([\w-]+[^\w\r\n-]+){2}[\w-]+
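As a quick check outside the editor, the minimal form can be tried with Python's `re` module. This is only a sketch: the sample text and the names `FIRST_THREE` and `matches` are made up for the illustration, and `re.MULTILINE` is assumed so that `^` matches at the start of every line, as it does in the editor's search.

```python
import re

# Same pattern as the SEARCH regex above; MULTILINE makes ^ match at each line start
FIRST_THREE = re.compile(r"^([\w-]+[^\w\r\n-]+){2}[\w-]+", re.MULTILINE)

# Hypothetical sample: one full sentence, then a line with only two words
text = "Intr-o zi plecarea mea s-a amanat pentru toata viata.\nIntr-o zi\n"

matches = [m.group(0) for m in FIRST_THREE.finditer(text)]
print(matches)
```

Because the non-word class excludes `\r` and `\n`, a match can never run over onto the next line, so the two-word line gives no match with `{2}`.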
@vasile-caraus, you did not mention the case of lines with only one or two words, as, for instance:
Intr-o zi
Intr-o
If you also want to match these cases, prefer the following search regex:
SEARCH
^([\w-]+[^\w\r\n-]+){0,2}[\w-]+
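The effect of changing `{2}` to `{0,2}` can be sketched in the same way with Python's `re` module. The sample lines and the names `UP_TO_THREE` and `matches` are assumptions for the illustration.

```python
import re

# {0,2} lets the "word + non-word" group repeat zero, one or two times
UP_TO_THREE = re.compile(r"^([\w-]+[^\w\r\n-]+){0,2}[\w-]+", re.MULTILINE)

# Hypothetical sample: lines with four, two and one word(s)
text = "Intr-o zi plecarea mea\nIntr-o zi\nIntr-o\n"

matches = [m.group(0) for m in UP_TO_THREE.finditer(text)]
print(matches)
```

Since the quantifier is greedy, a line with three or more words still gives exactly its first three words, and shorter lines give whatever they have.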
Best Regards,
guy038
-
Thank you, @guy038. Also, there is another case:
a space at the beginning of the line.
Intr-o zi plecarea mea s-a amanat pentru toata viata. Intr-o zi plecarea mea s-a amanat pentru toata viata.
I tried to add \s\S to your regex, but it is not working: ^\s\S([\w-]+[^\w\r\n-]+){2}[\w-]+
-
Hi, @vasile-caraus,
Ah… OK! Note that you could have stated, in your initial post, that blank characters may occur before the first word!
Moreover, the regexes that you provided in your first post were all anchored to the beginning of line with ^!
Now, we still need additional information: do you want to match these leading blank chars as well, along with the three “words”, or not?
BR
guy038
-
@guy038 said in Regex: Match the first three words from every line:
non-word
Empty spaces are non-words.
So, the match for those 3 words must not include the spaces in front of them. I don’t need to find empty spaces :)
-
Hi, @vasile-caraus,
Then, use the following search regex :
^\h*\K([\w-]+[^\w\r\n-]+){2}[\w-]+
And, in case of replacement, click on the Replace All button only, because of the \K syntax!
Cheers,
guy038
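For anyone testing the `^\h*\K…` regex outside the editor: Python's standard `re` module supports neither `\K` nor `\h`. A rough equivalent is sketched below, under two assumptions: `[ \t]*` stands in for `\h*` (spaces and tabs only), and a capture group replaces `\K` to leave the leading blanks out of the result. The sample text and the names `LEADING_BLANKS` and `found` are made up for the illustration.

```python
import re

# [ \t]* approximates \h* ; group 1 plays the role of "everything after \K"
LEADING_BLANKS = re.compile(
    r"^[ \t]*((?:[\w-]+[^\w\r\n-]+){2}[\w-]+)", re.MULTILINE
)

# Hypothetical sample: the first line is indented, the second is not
text = "   Intr-o zi plecarea mea s-a amanat\nIntr-o zi plecarea mea s-a amanat\n"

found = [m.group(1) for m in LEADING_BLANKS.finditer(text)]
print(found)
```

The whole match still contains the leading blanks here, so only group 1 should be read when the three words are wanted without them.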