How to find two sets of strings, replace the first set with the second, and keep the second set?

dr ramaanand

<h2>What is the best cure for cancer without surgery in Bangalore ?</h2>
<h2>What is the best cure for cancer in Bangalore without surgery ?</h2>
<h2>What is the best cure for carcinoma without surgery in Bangalore ?</h2>
Some paragraphs here
<p>What is the best cure for cancer without surgery in Bangalore ?</p>
<p>Homeopathy has the best cure for cancer without surgery in Bangalore.</p>

should become

<h2>What is the best cure for cancer without surgery in Bangalore ?</h2>
<h2>Homeopathy has the best cure for cancer without surgery in Bangalore.</h2>
Some paragraphs here
<p>What is the best cure for cancer without surgery in Bangalore ?</p>
<p>Homeopathy has the best cure for cancer without surgery in Bangalore.</p>

dr ramaanand

@dr-ramaanand If I tick the Regular expression mode and type <h2[^<>]+>\K(?>.*?</h2>\s*<h2[^<>]*+>)+.*?(?=</h2>)?=[\S\s]*?<p[^<>]*>(What[^<>].*)</p>\R<p[^<>]*>(.*)</p> in the Find field, I am able to find both strings, but I need help with what to type in the Replace field. I need to do this replacement across multiple files in a folder, which is why I am asking for help here!

PeterJones

@dr-ramaanand ,

Since you already know how to match what you need, I’m just going to give you the simplified concept rather than the full solution.

The simplified version is that you want to match from a to b and replace it with what’s found later, from y to z.

a to b
something else
y to z

To accomplish this, your search regex needs to look at the text after b without making it part of the match, so that nothing after b gets replaced. The concept needed is a numbered capture group inside a lookahead.

With the simplified data I showed,
FIND = (?s)a.*?b(?=.*(y.*?z))
REPLACE = $1

After the replacement, the text will be:

y to z
something else
y to z
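
For anyone who wants to try the idea outside Notepad++, here is a minimal sketch in Python. It assumes Python's `re` module behaves like Notepad++'s regex engine for this particular pattern (it does support capture groups inside lookaheads), and it writes the backreference as `\1` where Notepad++ uses `$1`. The variable names are only for illustration.

```python
import re

# Simplified sample from the post above.
text = "a to b\nsomething else\ny to z\n"

# Group 1 sits inside the lookahead, so it is captured but not consumed:
# only "a to b" is part of the match that gets replaced.
pattern = re.compile(r"(?s)a.*?b(?=.*(y.*?z))")

# Python's re module writes the backreference as \1 (Notepad++ uses $1).
result = pattern.sub(r"\1", text)
print(result)
```

The later "y to z" stays in place because the lookahead never becomes part of the matched text.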

Explanation:

(?s)a.*?b(?=.*(y.*?z))
^^^^ = . matches newline

(?s)a.*?b(?=.*(y.*?z))
    ^^^^^ = the actual match will be everything from a to the first subsequent b

(?s)a.*?b(?=.*(y.*?z))
         ^^^^^^^^^^^^^ = this is a lookahead, so nothing in this section will be replaced by your REPLACE text

(?s)a.*?b(?=.*(y.*?z))
            ^^ = this is the "everything in between"

(?s)a.*?b(?=.*(y.*?z))
              ^^^^^^^ = this is numbered capture group #1

(?s)a.*?b(?=.*(y.*?z))              
               ^^^^^ = everything from y to the subsequent z
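
As a rough illustration of how the same concept might carry over to the sample from the question, here is a sketch in Python. It is not the regex from the question and has not been tested in Notepad++. It assumes plain `<h2>` and `<p>` tags without attributes, one element per line, Unix line endings, and that the answer paragraph directly follows a paragraph starting with "What". The backreferences are written as `\1` and `\2`, which Notepad++ would write as `$1` and `$2`.

```python
import re

# Sample text from the original question.
text = (
    "<h2>What is the best cure for cancer without surgery in Bangalore ?</h2>\n"
    "<h2>What is the best cure for cancer in Bangalore without surgery ?</h2>\n"
    "<h2>What is the best cure for carcinoma without surgery in Bangalore ?</h2>\n"
    "Some paragraphs here\n"
    "<p>What is the best cure for cancer without surgery in Bangalore ?</p>\n"
    "<p>Homeopathy has the best cure for cancer without surgery in Bangalore.</p>\n"
)

pattern = re.compile(
    r"(?s)(<h2>[^<]*</h2>\n)"    # group 1: the first heading, which is kept
    r"(?:<h2>[^<]*</h2>\n)+"     # the headings that follow, which get replaced
    r"(?=.*<p>What[^<]*</p>\n"   # lookahead: skip ahead to the question paragraph
    r"<p>([^<]*)</p>)"           # group 2: text of the answer paragraph
)

# Put the first heading back, then add one heading built from group 2.
result = pattern.sub(r"\1<h2>\2</h2>\n", text)
print(result)
```

Because the paragraphs are only seen through the lookahead, they are left untouched, and just the extra headings are swapped for the captured answer text.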

dr ramaanand

@PeterJones OK, thanks, I could manage the rest.