Regex: Match the first three words from every line
-
Hello, I want to match the first three words of every line of my file, whether or not they contain a dash.
For example:
Intr-o zi plecarea mea s-a amanat pentru toata viata.
The match for the first three words should be:
Intr-o zi plecarea
I made some regexes, but they are not working:
^(\w+){3} or ^(\w+\s+){3} or ^(?=(\w+)){3}
-
Hello, @vasile-caraus and All,
Interesting problem, indeed! In this search, we must consider that the dash - is also a virtual word character. So, in this example, a word char is defined as [\w-] and common words are defined as [\w-]+. Thus, in this specific example, a non-word char is necessarily defined with the regex
[^\w-]
but we usually use the regex [^\w\r\n-] instead, so that EOL chars are not matched too! Note that an unescaped dash must be the last (or first) character of the character class because, elsewhere inside square brackets, it denotes a range! So, in this specific example, a non-empty range of non-word characters is matched with the regex
[^\w\r\n-]+
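To see what difference the dash makes, here is a small sketch that splits the example line three ways. It assumes Python's `re` module as a stand-in for the editor's regex engine; the two engines differ in general, but these particular character classes behave the same in both.

```python
import re

line = "Intr-o zi plecarea mea s-a amanat pentru toata viata."

# Plain \w+ does not include the dash, so "Intr-o" is split in two.
plain_words = re.findall(r"\w+", line)

# [\w-]+ treats the dash as a word character, so "Intr-o" and "s-a" stay whole.
dash_words = re.findall(r"[\w-]+", line)

# [^\w\r\n-]+ matches the runs between words, never crossing a line end.
separators = re.findall(r"[^\w\r\n-]+", line)

print(plain_words[:3])  # ['Intr', 'o', 'zi']
print(dash_words[:3])   # ['Intr-o', 'zi', 'plecarea']
print(separators[:2])   # [' ', ' ']
```

This also explains why ^(\w+){3} from the first post cannot work: \w+ stops at the dash in "Intr-o".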
Now, the first three common words of each line can be expressed, in plain language, as:
^ Word range + Non-word range + Word range + Non-word range + Word range
or, more simply:
^ ( Word range + Non-word range ) {2} + Word range
which gives, when translated to regex with the free-spacing mode:
SEARCH
(?x) ^ ( [\w-]+ [^\w\r\n-]+ ) {2} [\w-]+
Hence the minimal form:
SEARCH
^([\w-]+[^\w\r\n-]+){2}[\w-]+
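As a quick check outside the editor, the sketch below runs both the free-spacing and the minimal form with Python's `re` (an assumed stand-in for the editor's engine). `re.MULTILINE` makes ^ match at the start of every line; the second sample line is made up from the example sentence.

```python
import re

# Minimal form: ^ anchors each match at the start of a line (MULTILINE).
minimal = re.compile(r"^([\w-]+[^\w\r\n-]+){2}[\w-]+", re.MULTILINE)

# Free-spacing form: (?x) makes the engine ignore the spaces between tokens.
verbose = re.compile(r"(?x) ^ ( [\w-]+ [^\w\r\n-]+ ) {2} [\w-]+", re.MULTILINE)

text = (
    "Intr-o zi plecarea mea s-a amanat pentru toata viata.\n"
    "s-a amanat pentru toata viata.\n"
)

# Use finditer + group(0): findall would return only group 1,
# i.e. the last repetition of the parenthesized part ("zi ").
matches = [m.group(0) for m in minimal.finditer(text)]
same = [m.group(0) for m in verbose.finditer(text)]
print(matches)  # ['Intr-o zi plecarea', 's-a amanat pentru']
```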
@Vasile-caraus, you did not mention the case of lines with only one or two words, for instance:
Intr-o zi
Intr-o
If you also want to match these cases, prefer the following search regex:
SEARCH
^([\w-]+[^\w\r\n-]+){0,2}[\w-]+
Best Regards,
guy038
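The only change between the two regexes is the quantifier {2} versus {0,2}. The sketch below, again using Python's `re` as an assumed stand-in for the editor's engine and hypothetical sample lines, shows which lines each one matches.

```python
import re

three_words = re.compile(r"^([\w-]+[^\w\r\n-]+){2}[\w-]+", re.MULTILINE)
up_to_three = re.compile(r"^([\w-]+[^\w\r\n-]+){0,2}[\w-]+", re.MULTILINE)

# Sample lines with many words, two words and one word.
text = "Intr-o zi plecarea mea s-a amanat pentru toata viata.\nIntr-o zi\nIntr-o\n"

# {2} needs a non-word run after the second word, so short lines fail.
strict = [m.group(0) for m in three_words.finditer(text)]

# {0,2} accepts zero, one or two "word + separator" pairs before the last word.
relaxed = [m.group(0) for m in up_to_three.finditer(text)]

print(strict)   # ['Intr-o zi plecarea']
print(relaxed)  # ['Intr-o zi plecarea', 'Intr-o zi', 'Intr-o']
```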
-
Thank you, @guy038. Also, there is another case to handle:
leading spaces at the beginning of a line, for example:
Intr-o zi plecarea mea s-a amanat pentru toata viata.
   Intr-o zi plecarea mea s-a amanat pentru toata viata.
I tried to add \s\S to your regex, but it is not working:
^\s\S([\w-]+[^\w\r\n-]+){2}[\w-]+
-
Hi, @vasile-caraus,
Ah… OK! Note that you could have stated, in your initial post, that blank spaces may occur before the first word!
Moreover, the regexes that you provided in your first post were all anchored to the beginning of the line with ^!
Now, we still need additional information: do you want to match these leading blank chars as well, along with the three “words”, or not?
BR
guy038
-
@guy038 said in Regex: Match the first three words from every line:
non-word
Empty spaces are non-words.
So the match of those 3 words must not include the spaces in front of them. I don’t need to match the empty spaces :)
-
Hi, @vasile-caraus,
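Then, use the following search regex:
^\h*\K([\w-]+[^\w\r\n-]+){2}[\w-]+
Here, \h* skips any leading horizontal blanks (such as spaces or tabs) and \K discards everything matched so far, so only the three words are reported as the match. And, in case of replacement, click only on the Replace All button, because of the \K syntax!
Cheers,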
guy038
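Python's `re` module supports neither \h nor \K, so the sketch below swaps in two substitutes: [ \t]* in place of \h* (an assumption that leading blanks are only spaces or tabs, whereas \h also covers other horizontal spaces) and a capturing group in place of \K. The sample text and the <...> replacement are hypothetical, chosen only to show that the leading blanks survive a replacement.

```python
import re

# Hypothetical sample: the second line starts with three spaces.
text = (
    "Intr-o zi plecarea mea s-a amanat pentru toata viata.\n"
    "   Intr-o zi plecarea mea s-a amanat pentru toata viata.\n"
)

# [ \t]* stands in for \h*; group 1 holds what \K would leave as the match.
find = re.compile(r"^[ \t]*((?:[\w-]+[^\w\r\n-]+){2}[\w-]+)", re.MULTILINE)
words = [m.group(1) for m in find.finditer(text)]
print(words)  # ['Intr-o zi plecarea', 'Intr-o zi plecarea']

# For a replacement, capture the blanks too and write them back unchanged,
# so only the three words are rewritten (here: wrapped in <...>).
replace = re.compile(r"^([ \t]*)((?:[\w-]+[^\w\r\n-]+){2}[\w-]+)", re.MULTILINE)
result = replace.sub(r"\1<\2>", text)
print(result)
```

The capturing-group approach works with any replace operation, whereas a \K-based regex, as noted above, should only be used with Replace All.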