Regex: Match the first three words from every line
-
Hello, I want to match the first three words from every line of my file, whether or not they have a dash.
For example:
Intr-o zi plecarea mea s-a amanat pentru toata viata.
The output match of the first three words should be:
Intr-o zi plecarea
I wrote some regexes, but they are not working:
^(\w+){3}
or
^(\w+\s+){3}
or
^(?=(\w+)){3}
-
Hello, @vasile-caraus and All,
Interesting problem, indeed ! In this search, we must consider that the dash - is also a virtual word character. So, in this example, a word char is defined as [\w-] and common words are defined as [\w-]+
Thus, a non-word char, in this specific example, is, necessarily, defined with the regex [^\w-] but we usually use the regex [^\w\r\n-], in order to not match EOL chars, too ! Note that the dash is placed last in the character class so that it is taken literally : between two characters, inside square brackets, it would define a range !
So, in this specific example, a non-empty range of non-word characters is matched with the regex
[^\w\r\n-]+
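To see the effect of these character classes outside the editor, here is a minimal Python sketch. It assumes Python's built-in re module, whose syntax differs in places from the editor's regex engine, and the variable names are illustrative only. It compares plain \w+ with the dash-aware class on the sample sentence from the first post.

```python
import re

line = "Intr-o zi plecarea mea s-a amanat pentru toata viata."

# Plain \w+ stops at each dash, so "Intr-o" is split into two tokens.
plain_tokens = re.findall(r"\w+", line)

# With the dash added to the class, hyphenated words stay in one piece.
dashed_tokens = re.findall(r"[\w-]+", line)

# Runs of non-word characters, excluding the dash and the EOL characters.
separators = re.findall(r"[^\w\r\n-]+", line)

print(plain_tokens[:3])   # ['Intr', 'o', 'zi']
print(dashed_tokens[:3])  # ['Intr-o', 'zi', 'plecarea']
```

This is also why an attempt such as ^(\w+\s+){3} cannot match the sample line: after "Intr", the next character is a dash, which is neither \w nor \s.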
Now the first three common words, of each line can be expressed, in common language, as :
^ Word range + Non-word range + Word range + Non-word range + Word range
or, more simply :
^ ( Word range + Non-word range ) {2} + Word range
which gives, when translated to regex, with the free-spacing mode :
SEARCH
(?x) ^ ( [\w-]+ [^\w\r\n-]+ ) {2} [\w-]+
So the minimal form :
SEARCH
^([\w-]+[^\w\r\n-]+){2}[\w-]+
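The minimal form can be tried outside the editor with a short Python sketch. It assumes Python's built-in re module and uses re.MULTILINE so that ^ matches at the start of every line; the names FIRST_THREE and text are illustrative only. The sample text reuses lines quoted in this thread.

```python
import re

# Two "word + separator" groups, followed by a third word.
FIRST_THREE = re.compile(r"^([\w-]+[^\w\r\n-]+){2}[\w-]+", re.MULTILINE)

text = (
    "Intr-o zi plecarea mea s-a amanat pentru toata viata.\n"
    "Intr-o zi\n"
    "Intr-o\n"
)

# group(0) is the whole match, i.e. the first three words of a line.
matches = [m.group(0) for m in FIRST_THREE.finditer(text)]
print(matches)  # ['Intr-o zi plecarea']
```

Only the first line matches here: since \r and \n are excluded from the separator class, a match can never run over a line break, so lines with fewer than three words are skipped.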
@Vasile-caraus, you did not speak about the case of sentences with one or two words only, as, for instance:
Intr-o zi
Intr-o
If you also want to match these cases, prefer the following search regex :
SEARCH
^([\w-]+[^\w\r\n-]+){0,2}[\w-]+
Best Regards,
guy038
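
As a quick check of the {0,2} variant, here is a minimal Python sketch, again assuming Python's built-in re module with re.MULTILINE; the names UP_TO_THREE and text are illustrative only.

```python
import re

# Zero to two "word + separator" groups, followed by a final word.
UP_TO_THREE = re.compile(r"^([\w-]+[^\w\r\n-]+){0,2}[\w-]+", re.MULTILINE)

text = (
    "Intr-o zi plecarea mea s-a amanat pentru toata viata.\n"
    "Intr-o zi\n"
    "Intr-o\n"
)

matches = [m.group(0) for m in UP_TO_THREE.finditer(text)]
print(matches)  # ['Intr-o zi plecarea', 'Intr-o zi', 'Intr-o']
```

The quantifier is greedy, so a line with three or more words still yields exactly its first three, while shorter lines yield whatever words they contain.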
-
Thank you, @guy038. Also, there is another case:
The space at the beginning.
Intr-o zi plecarea mea s-a amanat pentru toata viata.
 Intr-o zi plecarea mea s-a amanat pentru toata viata.
I tried to add \s\S to your regex, but it is not working:
^\s\S([\w-]+[^\w\r\n-]+){2}[\w-]+
-
Hi, @vasile-caraus,
Ah… OK ! Note that you could have stated, in your initial post, that possible blank spaces may occur before the first word !
Moreover, the regexes that you provided, in your first post, were all anchored to the beginning of line with ^ !
Now, we still need additional information : do you want to match these leading blanks chars as well, along with the three “words” or not ?
BR
guy038
-
@guy038 said in Regex: Match the first three words from every line:
non-word
empty spaces are non-words.
So, the match of those 3 words must not include the spaces in front of them. I don’t need to find empty spaces :)
-
Hi, @vasile-caraus,
Then, use the following search regex :
^\h*\K([\w-]+[^\w\r\n-]+){2}[\w-]+
And, in case of replacement, click on the Replace All button, only, because of the \K syntax !
Cheers,
guy038
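
In this regex, \h* consumes any leading horizontal blanks and \K drops them from the reported match. Python's built-in re module supports neither \h nor \K, so the sketch below is only an approximation under stated assumptions: [ \t]* stands in for \h* (it covers spaces and tabs only) and a capturing group stands in for \K. The names LEADING_BLANKS and text are illustrative only.

```python
import re

# [ \t]* skips leading blanks; group 1 holds the three words without them.
LEADING_BLANKS = re.compile(
    r"^[ \t]*((?:[\w-]+[^\w\r\n-]+){2}[\w-]+)", re.MULTILINE
)

text = (
    "Intr-o zi plecarea mea s-a amanat pentru toata viata.\n"
    "   Intr-o zi plecarea mea s-a amanat pentru toata viata.\n"
)

words = [m.group(1) for m in LEADING_BLANKS.finditer(text)]
starts = [m.start(1) for m in LEADING_BLANKS.finditer(text)]
print(words)  # ['Intr-o zi plecarea', 'Intr-o zi plecarea']
```

The group is needed because, without \K, the overall match (group 0) would still include the leading blanks of the second line.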