Regex: Find duplicate tags/words from some tags



  • Hello, I have this sentence: “The use of private-label products by small companies has grown.”

    The problem is that, somehow, there are extra <em> and </em> tags in this sentence:

    <p class="text_obisnuit2"><em>The use of </em>private-label products by small <em>companies has grown.</em></p>
    

    So, I need to find all sentences of this kind that have extra <em> and </em> tags. After the regex, the output should be:

    <p class="text_obisnuit2"><em>The use of private-label products by small companies has grown.</em></p>


  • Hi, @vasile-caraus,

    Due to the lack of additional information, I assumed two points:

    • Any <em>.......</em> range is located on a single line

    • All the <em>.......</em> ranges are simply consecutive and are NOT nested. So the case below never occurs!

    <em>.....<em>..... </em>.........</em>
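
Before running the S/R, it may be worth checking that a file really satisfies these two assumptions. The following is only a rough Python sketch, not something from this thread: the helper name `em_assumption_violations` is hypothetical, and it simply counts `<em>` / `</em>` tags line by line.

```python
import re

def em_assumption_violations(text):
    """Return 1-based numbers of lines whose <em> tags are nested or unbalanced."""
    bad = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        depth = 0
        ok = True
        for tag in re.findall(r"</?em>", line):
            depth += 1 if tag == "<em>" else -1
            if depth not in (0, 1):  # 2 means nested, -1 means a stray </em>
                ok = False
                break
        if not ok or depth != 0:     # depth left at 1: the range continues on another line
            bad.append(lineno)
    return bad

sample = "<em>a</em> b <em>c</em>\n<em>a<em>b</em>c</em>\n<em>open only\nplain"
print(em_assumption_violations(sample))
```

Lines reported by this check are the ones where the regexes in this thread should not be trusted.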


    Then a possible regex S/R could be:

    SEARCH (?-s)(^.*?<em>)|</?em>(?=.*</em>)

    REPLACE ?1\1

    So the text below:

    ....<em>..........</em>.....<em>..........</em>............<em></em>......<em>.............</em>...........<em>....</em>...
    

    will be changed into:

    ....<em>.......................................................................</em>...
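
For anyone who wants to try the same idea outside Notepad++, here is a minimal Python sketch of the S/R above. It is an approximation, not the Boost regex engine: Python's `re` has no `?1\1` conditional replacement, so a small callback stands in for it; the leading `(?-s)` is dropped because `.` already stops at a newline by default; and `re.MULTILINE` makes `^` match at the start of each line. The function name `merge_em_ranges` is made up for illustration.

```python
import re

# Same alternation as the SEARCH regex above, minus the leading (?-s).
EM_PATTERN = re.compile(r"(^.*?<em>)|</?em>(?=.*</em>)", re.MULTILINE)

def merge_em_ranges(text):
    """Keep the first <em> and the last </em> of each line, drop the tags in between."""
    # Group 1 (line start up to the first <em>) is put back unchanged;
    # any other match is an inner tag and is replaced by nothing.
    return EM_PATTERN.sub(lambda m: m.group(1) or "", text)

line = ('<p class="text_obisnuit2"><em>The use of </em>private-label products '
        'by small <em>companies has grown.</em></p>')
print(merge_em_ranges(line))
```

As in Notepad++, this touches every line that contains `<em>` tags, whatever the surrounding element is.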
    

    Cheers,

    guy038



  • Hello Guy, your regex works great, but it matches all the text and all the tags in my HTML pages. I only want it to apply to this particular tag:

    <p class="text_obisnuit2"><em>...</p>



  • Hi, @vasile-caraus, and All,

    Ah, OK! So, I propose two consecutive regex S/R:

    A)

    SEARCH (?-s)^\h*<p class="text_obisnuit2"><em>.+

    REPLACE $0#

    which appends the character # (acting as a marker) to the line if it begins with the string <p class="text_obisnuit2"><em>, possibly preceded by some blank characters

    B)

    SEARCH (?-s)(^.*?<em>)|</?em>(?=.*</em>.+>#)|#

    REPLACE ?1\1

    which deletes the # marker as well as any <em> or </em> tag located inside the outer <em>........</em> range, ONLY IF a last </em> tag exists further on and a # symbol is the last character of the current line!

    Of course, you may choose any other marker character. It just must not already be present in your file!

    Preferably, tick the Wrap around option
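
The two passes above can also be sketched in Python, for testing outside Notepad++. This is an approximation under a few assumptions: `[ \t]*` stands in for Boost's `\h*`, a callback replaces the `?1\1` conditional replacement, lines end with a plain `\n`, and the function name `merge_em_in_target_paragraphs` is hypothetical. The marker check at the top is an addition, reflecting the requirement that the marker must not already occur in the file.

```python
import re

MARKER = "#"  # as in the post; must not occur anywhere in the input

# Step A pattern: a line starting with the target <p ...><em>, optionally indented.
TARGET = re.compile(r'^[ \t]*<p class="text_obisnuit2"><em>.+', re.MULTILINE)
# Step B pattern: same alternation as the SEARCH regex in B), minus (?-s).
CLEANUP = re.compile(r"(^.*?<em>)|</?em>(?=.*</em>.+>#)|#", re.MULTILINE)

def merge_em_in_target_paragraphs(text):
    if MARKER in text:
        raise ValueError("marker already present; pick another character")
    # Step A: append the marker to every matching line.
    marked = TARGET.sub(lambda m: m.group(0) + MARKER, text)
    # Step B: keep group 1, delete inner tags on marked lines, delete the marker.
    return CLEANUP.sub(lambda m: m.group(1) or "", marked)

html = ('<p class="text_obisnuit2"><em>The use of </em>private-label products '
        'by small <em>companies has grown.</em></p>\n'
        '<p class="other"><em>a</em> b <em>c</em></p>')
print(merge_em_in_target_paragraphs(html))
```

Only the first line is changed here; the `<p class="other">` line keeps all of its `<em>` tags because it never receives the marker.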

    Cheers,

    guy038



  • @guy038

    Can you give me your Facebook account, please?



  • Works great, thanks a lot, Guy!

