Regex: Put a dot after every 10 words



  • Hi, I need to put a dot after every 10 words. My regex is not so good:

    SEARCH: (\w+){0,5}
    REPLACE BY: \1.



  • @Neculai-I-Fantanaru

    Now THAT looks like a regex just thrown out there, in order to get some help.
    Can you please explain how that regex is supposed to work?
    If you do that, perhaps you’ll see what’s wrong with it.



  • @Alan-Kilborn good day, sir. My regex was not just thrown out there in order to get some help. It is an attempt at a solution, a step forward. I know it is not good, but I don’t know any other solution…



  • OK, I have another solution, which should put a dot after every 5 words. But it seems to work only for the first line, not for all lines:

    SEARCH: (\w+){10}\K
    REPLACE BY: \1.



  • @Neculai-I-Fantanaru

    Sorry, it’s just that your initial regex looked very suspicious, because you mentioned 10 but then nothing even close to 10 appeared in your regex.

    So \w does NOT mean to match a “word”, it means match a “word character”.

    For example, there are three word characters in the first word of this:

    abc defg hijkl
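
    A minimal sketch of that distinction, written in Python rather than in Notepad++ (an assumption made only so the behaviour can be run and checked; the variable names are illustrative). It shows that \w consumes one character, \w+ consumes one whole word, and (\w+){10} asks for ten word characters in a row rather than ten words:

```python
import re

text = "abc defg hijkl"

# \w matches ONE word character, so findall returns individual characters.
chars = re.findall(r"\w", text)

# \w+ matches a run of word characters, i.e. one whole "word".
words = re.findall(r"\w+", text)

# (\w+){10} has no separator between repetitions, so it needs at least
# ten consecutive word characters. The longest run here is five.
ten_chars = re.search(r"(\w+){10}", text)

print(len(chars))   # 12
print(words)        # ['abc', 'defg', 'hijkl']
print(ten_chars)    # None
```

    This is why a pattern meant to count words has to consume the separators between them as well.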



  • Oh, yes. I mentioned 5 words; sorry, it should be 10 words. But the regex should be the same (I will change the number).



  • Can anyone help me? @guy038



  • Hello, @neculai-i-fantanaru, @alan-kilborn and All,

    I agree with @alan-kilborn’s comments and would have let you look for a solution by yourself! But, apparently, you’ve reached a dead end!

    You said :

    hi, I need to put a dot after every 10 words

    But you haven’t shown us what kind of text is involved :-(

    I suppose that your initial text does not contain any punctuation and is, mainly, a list of words, separated with space characters ?!


    As an example, let’s take the first paragraph of the preamble of the license.txt file:

    The licenses for most software are designed to take away your freedom to share and change it. By contrast, the GNU General Public License is intended to guarantee your freedom to share and change free software--to make sure the software is free for all its users. This General Public License applies to most of the Free Software Foundation's software and to any other program whose authors commit to using it. (Some other Free Software Foundation software is covered by the GNU Library General Public License instead.) You can apply it to your programs, too.
    

    AFTER removing all punctuation marks, we get:

    The licenses for most software are designed to take away your freedom to share and change it By contrast the GNU General Public License is intended to guarantee your freedom to share and change free software to make sure the software is free for all its users This General Public License applies to most of the Free Software Foundation's software and to any other program whose authors commit to using it Some other Free Software Foundation software is covered by the GNU Library General Public License instead You can apply it to your programs too
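
    The post does not say how the punctuation was removed. One possible way to do it, sketched in Python under the assumption that everything except word characters, whitespace and apostrophes counts as punctuation (the function name strip_punctuation is hypothetical):

```python
import re

def strip_punctuation(text: str) -> str:
    # Replace each run of characters that are neither word characters,
    # whitespace nor apostrophes (straight or curly) with a single space.
    no_punct = re.sub(r"[^\w\s'’]+", " ", text)
    # Collapse the repeated whitespace left behind and trim both ends.
    return re.sub(r"\s+", " ", no_punct).strip()

print(strip_punctuation("change free software--to make sure"))
# change free software to make sure
```

    Apostrophes are kept on purpose so that possessives and contractions stay in one piece.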
    

    Note that the possessive form Free Software Foundation's software remains in this text. To my mind, Foundation's should be considered a single word, as should contracted English forms such as I'm, don't, …

    So, an appropriate regex S/R could be :

    SEARCH (?:([\w'’]+)\W+){9}(?1)\K    or    (?:[\w'’]+\W+){9}[\w'’]+\K

    REPLACE .

    And, after a single click on the Replace All button, we get this OUTPUT text:

    The licenses for most software are designed to take away. your freedom to share and change it By contrast the. GNU General Public License is intended to guarantee your freedom. to share and change free software to make sure the. software is free for all its users This General Public. License applies to most of the Free Software Foundation's software. and to any other program whose authors commit to using. it Some other Free Software Foundation software is covered by. the GNU Library General Public License instead You can apply. it to your programs too
    

    However, note that the resulting text is not meaningful: a full stop is inserted blindly after every 10 words, which, obviously, does not respect English sentence structure!
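
    For anyone who wants to try the same idea outside Notepad++, here is a rough Python equivalent. It is a sketch, not the Notepad++ behaviour itself: Python's re module supports neither \K nor (?1), so it uses the second form of the search regex and re-emits the whole match followed by a dot. The names WORD, TEN_WORDS and dot_every_ten_words are made up for this illustration.

```python
import re

# One "word": word characters plus straight or curly apostrophes.
WORD = r"[\w'’]+"

# Nine words, each followed by separators, then a tenth word.
TEN_WORDS = re.compile(r"(?:" + WORD + r"\W+){9}" + WORD)

def dot_every_ten_words(text: str) -> str:
    # No \K in Python's re, so put back the whole match and append the dot.
    return TEN_WORDS.sub(lambda m: m.group(0) + ".", text)

sample = " ".join(f"w{i}" for i in range(1, 24))  # w1 ... w23
print(dot_every_ten_words(sample))
# w1 w2 w3 w4 w5 w6 w7 w8 w9 w10. w11 w12 w13 w14 w15 w16 w17 w18 w19 w20. w21 w22 w23
```

    The last three words are left alone because fewer than ten remain.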

    Best Regards,

    guy038



  • @guy038 said in Regex: Put a dot after every 10 words:

    (?:([\w'’]+)\W+){9}(?1)\K

    thank you @guy038



  • Hi, @neculai-i-fantanaru and All,

    I forgot to explain why the first provided regex was (?:([\w'’]+)\W+){9}(?1)\K and not the regex (?:([\w'’]+)\W+){9}\1\K


    Well, the \1 right before \K is a back-reference: it would match only the text last captured by group 1, that is, the ninth word found by the regex [\w'’]+ !

    So, the regex (?:([\w'’]+)\W+){9}\1 would match strings like :

    111 222 333 444 555 666 777 888 999 999
    

    or

    000 111 222 333 444 555 666 777 888 888
    

    but NOT the string :

    000 111 222 333 444 555 666 777 888 999
    

    In contrast, the (?1) syntax is a subroutine call to group 1: it re-runs the group's pattern and is, fundamentally, identical to the regex [\w'’]+ itself. So, the regex (?:([\w'’]+)\W+){9}(?1) also matches my third example, whatever the tenth word is ;-))
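
    The difference can be checked with a small Python sketch. Python's re module has back-references but no (?1) subroutine call, so, as an assumption made for this illustration, the subroutine call is imitated by writing the word pattern a second time, which is what it amounts to according to the explanation above. The variable names are hypothetical.

```python
import re

WORD = r"[\w'’]+"

# \1 is a back-reference: it must repeat the TEXT last captured by group 1.
backref = re.compile(r"(?:(" + WORD + r")\W+){9}\1")

# Stand-in for (?1): repeat the group's PATTERN, not its captured text.
subpattern = re.compile(r"(?:(" + WORD + r")\W+){9}" + WORD)

repeated = "111 222 333 444 555 666 777 888 999 999"
distinct = "000 111 222 333 444 555 666 777 888 999"

for name, rx in (("backref", backref), ("subpattern", subpattern)):
    print(name, bool(rx.fullmatch(repeated)), bool(rx.fullmatch(distinct)))
# backref True False
# subpattern True True
```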

    Best Regards

    guy038

