Find and replace the 4th occurrence of string </p>



  • Hello,
    Please help: I have a text file with hundreds of lines containing a lot of HTML tags.
    I want to find and replace the 4th occurrence of the string </p> in every line.
    some text </p> some text </p> some text </p> some text </p> some text </p>.
    Can I do that with a regex in Notepad++?

    Thanks



  • Hello @Radu-Lucian,

    I assume so.
    But if this is a valid HTML document, each closing tag </p>
    has a matching opening tag <p>.
    So by deleting it you may corrupt the HTML document.
    Or is this needed to repair it?
    And what text should be used as the replacement?
    Can you give a more concrete example line?

    Cheers
    Claudia



  • Hello,
    In fact, every line of my CSV file is an HTML article, and I want to split every article after the 4th paragraph. For this, I want to replace the 4th occurrence of </p> in every line with </p>"," because my CSV columns are separated by commas.
    I hope it is clear now.

    Thank you very much!



  • So you do not want to replace something but rather insert text, correct?

    If so, this

    some text </p> some text </p> some text </p> some text </p> some text </p>.
    

    should become that

    some text </p> some text </p> some text </p> some text </p>"," some text </p>.
    

    If so,

    find what:^((.+?</p>){4})(.+)$
    

    and

    replace with:\1","\2\.
    

    \1 gets what is matched by ((.+?</p>){4}) (text followed by </p>, four times)
    \2 gets what is matched by (.+)
    \. I actually don’t know why it is needed, as the final dot should have been matched by (.+), but it wasn’t

    Cheers
    Claudia



  • Thank you very much for your response. The solution works only partially. The problem is best seen in the example below:

    some text1 </p> some text2 </p> some text3 </p> some text4 </p> some text5 </p> some text6 </p>.

    Result is

    some text1 </p> some text2 </p> some text3 </p> some text4 </p>"," some text4 </p>.

    I need to leave the article itself unchanged and just insert "," after the 4th paragraph:

    some text1 </p> some text2 </p> some text3 </p> some text4 </p>"," some text5 </p> some text6 </p>.

    Best regards,
    Radu



  • Hello Radu Lucian,

    There is a small mistake in the regex proposed by Claudia, which concerns group numbering. Moreover, the \. at the end of the replacement seems unnecessary. Indeed:

    • Group 1, the outer group, contains (.+?</p>){4}
    • Group 2, the inner group, contains .+?</p> (after a match, only its last repetition)
    • Group 3 contains .+, before the $ anchor

    So, I suppose that Claudia meant the following syntax:

    Find what    : ^((.+?</p>){4})(.+)$
    
    Replace with : \1","\3
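
    For anyone who wants to check the group numbering outside Notepad++, here is a minimal sketch using Python's re module. It is only an approximation of the Notepad++ behaviour (Notepad++ uses a different regex engine), and the function name split_after_fourth is made up for illustration.

```python
import re

# Same pattern as above: group 1 = first four paragraphs,
# group 2 = last repetition of the inner group, group 3 = rest of the line.
PATTERN = re.compile(r'^((.+?</p>){4})(.+)$')

def split_after_fourth(line: str) -> str:
    # Re-emit group 1, insert the CSV separator, then the rest (group 3).
    return PATTERN.sub(r'\1","\3', line)

line = ("some text1 </p> some text2 </p> some text3 </p> "
        "some text4 </p> some text5 </p> some text6 </p>.")

m = PATTERN.match(line)
print(repr(m.group(2)))          # only the 4th paragraph, not the rest
print(split_after_fourth(line))
```

    Group 2 holds only the last repetition of the inner group, which is why using \2 in the replacement duplicated "some text4".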
    

    With that fix, Claudia’s solution works quite well if each line of your file contains fewer than 8 </p> strings. Otherwise, if you want to add the string "," after each group ----n----</p>---n+1---</p>---n+2---</p>---n+3---</p>, you need to omit the anchor ^ at the beginning of the regex.

    Here is, below, a second solution, which searches for the empty string located just after each 4th string </p>:

    Find what    :   (.*?</p>){4}\K
    
    Replace with :   ","
    

    Notes:

    • Before running this search/replacement, you must move the cursor to the beginning of a line. Then use ONLY the Replace All button
      (do NOT use the Replace button if your search regex contains any \K syntax)

    • The \K syntax resets the start of the match, so that everything matched before it is left out of the final match. As nothing follows \K here, the regex matches only the empty string located right after 4 consecutive ranges .*?</p>

    • Remember the difference between the two syntaxes .*</p> and .*?</p>:

      • The former matches, from the cursor location, any text, even empty, up to the last string </p> found

      • The latter matches, from the cursor location, any text, even empty, up to the first string </p> found

    • Finally, the replacement simply replaces that empty string with the literal string ","
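
    The greedy/lazy difference described in the notes can be checked quickly with Python's re module. This is just an illustrative sketch on a made-up sample string, not a Notepad++ feature.

```python
import re

text = "a</p>b</p>c</p>"

# Greedy: .* grabs as much as possible, so the match ends at the LAST </p>.
greedy = re.match(r'.*</p>', text).group(0)

# Lazy: .*? grabs as little as possible, so the match ends at the FIRST </p>.
lazy = re.match(r'.*?</p>', text).group(0)

print(greedy)
print(lazy)
```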

    So, for instance, the text

    ---1---</p>---2---</p>---3---</p>---4---</p>---5---</p>---6---</p>---7---</p>---8---</p>---9---</p>---10---</p>
    

    will be changed into:

    ---1---</p>---2---</p>---3---</p>---4---</p>","---5---</p>---6---</p>---7---</p>---8---</p>","---9---</p>---10---</p>
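
    The same transformation can be sketched outside Notepad++. Python's built-in re module has no \K, so this version captures each run of four paragraphs and writes it back followed by the separator; the names EVERY_FOURTH and insert_after_every_fourth are hypothetical, chosen only for this example.

```python
import re

# No \K in Python's re: capture the four-paragraph run and re-emit it instead.
EVERY_FOURTH = re.compile(r'((?:.*?</p>){4})')

def insert_after_every_fourth(line: str, sep: str = '","') -> str:
    # Each match is one complete run of four paragraphs; append the separator.
    return EVERY_FOURTH.sub(lambda m: m.group(1) + sep, line)

# Rebuild the sample line used above: ---1---</p> ... ---10---</p>
line = "".join(f"---{i}---</p>" for i in range(1, 11))
result = insert_after_every_fourth(line)
print(result)
```

    The trailing paragraphs 9 and 10 are left alone because they do not form a complete group of four.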
    

    Best Regards,

    guy038



  • Hello guy038,

    thank you for helping out and correcting my mistake - very much appreciated.
    I knew that there was a problem but couldn’t see the wood for the trees.
    Sometimes it isn’t good to do many tasks in parallel, but I’m actually doing this
    to train myself to become better at multitasking.

    Sorry Radu Lucian for not being precise enough.

    Cheers
    Claudia



  • Thank you all. I solved my problem with your help, and I really appreciate it.

    Best Regards,
    Radu

