Regex: Put a dot after every 10 words

Neculai I. Fantanaru

hi, I need to put a dot after every 10 words. My regex is not so good:

SEARCH: (\w+){0,5}
REPLACE BY: \1.

Alan Kilborn

Now THAT looks like a regex just thrown out there, in order to get some help.
Can you please explain how that regex is supposed to work?
If you do that, perhaps you’ll see what’s wrong with it.

Neculai I. Fantanaru

@Alan-Kilborn good day, sir. My regex was not just thrown out there in order to get some help. It is an attempt at a solution, a step forward. I know it is not good, but I don’t know any other solution…

Neculai I. Fantanaru

OK, I have another attempt, which should put a dot after every 5 words. But it seems to work only on the first line, not on all lines:

SEARCH: (\w+){10}\K
REPLACE BY: \1.

Alan Kilborn

@Neculai-I-Fantanaru

Sorry, it’s just that your initial regex looked very suspicious, because you mentioned 10 but then nothing even close to 10 appeared in your regex.

So \w does NOT mean to match a “word”, it means match a “word character”.

For example, there are three word characters in the first word of this:

abc defg hijkl
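To make the difference concrete, here is a small sketch using Python's standard re module (an assumption for illustration only: Notepad++ uses its own regex engine, but \w and \w+ behave the same way in this respect). The second sample string is made up.

```python
import re

sample = "abc defg hijkl"

# \w matches exactly one word character, so the first match is just "a".
first_char = re.search(r"\w", sample).group()

# \w+ matches a run of word characters, i.e. one whole word per match.
words = re.findall(r"\w+", sample)

# (\w+){3} asks for three runs with nothing in between them,
# so three characters of a single word are enough to satisfy it ...
three_chars = re.fullmatch(r"(\w+){3}", "abc")

# ... while three short words separated by spaces do not match at all.
three_words = re.search(r"(\w+){3}", "ab cd ef")

print(first_char, words, three_chars is not None, three_words is None)
```

So a quantifier placed directly on (\w+) counts word characters, not words; to count words, the separator between them has to be part of the repeated group.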

Neculai I. Fantanaru

Oh, yes. I mentioned 5 words; sorry, it should be 10 words. But the regex should be the same (I will just change the number).

Neculai I. Fantanaru

can anyone help me? @guy038

guy038

Hello, @neculai-i-fantanaru, @alan-kilborn and All,

I agree with @alan-kilborn’s comments and wanted to let you look for a solution by yourself ! But, apparently, you’ve reached a dead-end !

You said :

hi, I need to put a dot after every 10 words

But you haven’t shown us which kind of text is concerned :-(

I suppose that your initial text does not contain any punctuation and is, mainly, a list of words, separated with space characters ?!

As an example, let’s take the first paragraph of the preamble of the license.txt file :

The licenses for most software are designed to take away your freedom to share and change it. By contrast, the GNU General Public License is intended to guarantee your freedom to share and change free software--to make sure the software is free for all its users. This General Public License applies to most of the Free Software Foundation's software and to any other program whose authors commit to using it. (Some other Free Software Foundation software is covered by the GNU Library General Public License instead.) You can apply it to your programs, too.

AFTER removing any punctuation sign, we get :

The licenses for most software are designed to take away your freedom to share and change it By contrast the GNU General Public License is intended to guarantee your freedom to share and change free software to make sure the software is free for all its users This General Public License applies to most of the Free Software Foundation's software and to any other program whose authors commit to using it Some other Free Software Foundation software is covered by the GNU Library General Public License instead You can apply it to your programs too

Note that this text still contains the possessive form Free Software Foundation's software. To my mind, Foundation's should be considered a single word, as should contracted English forms such as I'm, don't,…
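The effect of allowing apostrophes inside a word can be checked with a short sketch in Python's standard re module (an assumption for illustration; the sample phrase is made up, and the character class is the same one used in the search regex of this post):

```python
import re

phrase = "Foundation's software, I’m sure"

# With \w+ alone, the apostrophe splits each form into two "words".
plain = re.findall(r"\w+", phrase)

# Adding the straight and the curly apostrophe to the class keeps them whole.
with_apostrophes = re.findall(r"[\w'’]+", phrase)

print(plain)
print(with_apostrophes)
```

With plain \w+, the word count of a text drifts by one at every possessive or contraction, so the dots would land in different places.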

So, an appropriate regex S/R could be :

SEARCH (?:([\w'’]+)\W+){9}(?1)\K or (?:[\w'’]+\W+){9}[\w'’]+\K

REPLACE .

And, after a single click on the Replace All button, the text becomes :

The licenses for most software are designed to take away. your freedom to share and change it By contrast the. GNU General Public License is intended to guarantee your freedom. to share and change free software to make sure the. software is free for all its users This General Public. License applies to most of the Free Software Foundation's software. and to any other program whose authors commit to using. it Some other Free Software Foundation software is covered by. the GNU Library General Public License instead You can apply. it to your programs too

However, note that the resulting text is not meaningful : a full stop is inserted mechanically after every 10 words, which, obviously, does not respect English sentence boundaries !!
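For anyone who wants to try the same idea outside Notepad++, here is a minimal sketch in Python's standard re module. It is an adaptation, not the regex above verbatim: Python's re has no \K, so the whole match is re-emitted with \g<0> followed by the dot. The function name dot_every_n_words and its parameter n are hypothetical names chosen for this example.

```python
import re

WORD = r"[\w'’]+"  # one word, apostrophes included, as in the search regex


def dot_every_n_words(text: str, n: int = 10) -> str:
    """Append a dot after every n-th word of text (hypothetical helper)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # n - 1 words, each followed by separators, then the n-th word itself.
    pattern = rf"(?:{WORD}\W+){{{n - 1}}}{WORD}"
    # No \K in Python's re: put back the whole match, then add the dot.
    return re.sub(pattern, r"\g<0>.", text)


print(dot_every_n_words("a b c d e f g", n=3))
print(dot_every_n_words(
    "one two three four five six seven eight nine ten eleven twelve"))
```

Because \W+ also matches line breaks, the words are counted across lines, and a trailing group of fewer than n words is left without a dot.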

Best Regards,

guy038

Neculai I. Fantanaru

@guy038 said in Regex: Put a dot after every 10 words:

(?:([\w'’]+)\W+){9}(?1)\K

thank you @guy038

guy038

Hi, @neculai-i-fantanaru and All,

I forgot to explain why the first provided regex was (?:([\w'’]+)\W+){9}(?1)\K and not the regex (?:([\w'’]+)\W+){9}\1\K

Well, \1 is a back-reference : placed right before \K, it would have to match the exact text captured by the last iteration of group 1, that is, the ninth word found by the regex [\w'’]+ !

So, the regex (?:([\w'’]+)\W+){9}\1 would match strings like :

111 222 333 444 555 666 777 888 999 999

or

000 111 222 333 444 555 666 777 888 888

but NOT the string :

000 111 222 333 444 555 666 777 888 999

By contrast, the (?1) syntax is a subroutine call to group 1 : it re-runs the pattern of that group and is, fundamentally, identical to writing the regex [\w'’]+ again. So, the regex (?:([\w'’]+)\W+){9}(?1) also matches my third example, whatever the tenth word is ;-))
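The three examples can be checked with a short sketch in Python's standard re module. One assumption to keep in mind: Python's re supports the \1 back-reference but not the (?1) subroutine call, so the word pattern is simply written out a second time as a stand-in, which is what (?1) amounts to here.

```python
import re

WORD = r"[\w'’]+"

# \1 must repeat the exact text captured by the last iteration of group 1.
with_backref = re.compile(rf"(?:({WORD})\W+){{9}}\1")

# Stand-in for (?1): the same sub-pattern, written out again.
with_repeat = re.compile(rf"(?:({WORD})\W+){{9}}{WORD}")

repeated_ninth = "111 222 333 444 555 666 777 888 999 999"
ten_different = "000 111 222 333 444 555 666 777 888 999"

print(with_backref.fullmatch(repeated_ninth) is not None)  # ninth word repeated
print(with_backref.search(ten_different) is None)          # tenth word differs
print(with_repeat.fullmatch(ten_different) is not None)    # any tenth word is fine
```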

Best Regards

guy038