Regex: Delete all the instances of html tag, except the first one</h1> <article> <h1>Reply to Regex: Delete all the instances of <title> html tag, except the first one on Tue, 05 Dec 2023 17:00:32 GMT</h1> <p>mkupper — Tue, 05 Dec 2023 17:00:32 GMT</p>
<p dir="auto"><a class="mention plugin-mentions-user plugin-mentions-a" href="https://community.notepad-plus-plus.org/uid/21856">@Hellena-Crainicu</a> I was puzzled by your comment. I suspect you were testing with the expression itself at the top of the file. In that case, the first <code>title</code> in the file was the one inside the expression itself.</p>
<p dir="auto">I modified my search expression slightly, replacing <code><</code> with <code>\x3c</code>, so that the search/replace expression can be kept in the file for testing. I put it at the bottom in these examples.</p>
<p dir="auto">Here is the test I ran:</p>
<h3>Original data</h3>
<pre><code class="language-txt"><p>但是减脂期可不一定就意味着每天只吃水煮鸡胸和水煮西蓝花，完全苦行僧一样的生活。如果在营养均衡的三餐之间适当的尝试一些健康的小零食，不仅能为减脂期提供动力和新鲜感，还可以为生活增添不少的趣味呢。</p>除此之外，从营养学角度，适当的加餐可以预防三餐之间出现低血糖的现象，还能防止因为饥饿而在下一餐中暴饮暴食，摄入过多热量的情况发生。</p>因此，健康的小零食不仅有助于完成减脂目标，还能让你的减肥期丰富多彩，何乐而不为呢？下面就来推荐给大家10种好吃又健康的小零食。</p>
<title>用正确方式打开 MyGainer增健肌粉！ - MYPROTEIN™</title>
blah bhla
blah bhla
<title>Home is me</title>
blah bhla
<title>Payton is your name</title>


Search: (?s)(\x3ctitle>.*?\x3c/title>.*?)\x3ctitle>.*?\x3c/title>
Replace: $1</code></pre>

### First pass
This is the result after running Replace All once. It removed the second title, which was on line 5.

但是减脂期可不一定就意味着每天只吃水煮鸡胸和水煮西蓝花，完全苦行僧一样的生活。如果在营养均衡的三餐之间适当的尝试一些健康的小零食，不仅能为减脂期提供动力和新鲜感，还可以为生活增添不少的趣味呢。
除此之外，从营养学角度，适当的加餐可以预防三餐之间出现低血糖的现象，还能防止因为饥饿而在下一餐中暴饮暴食，摄入过多热量的情况发生。
因此，健康的小零食不仅有助于完成减脂目标，还能让你的减肥期丰富多彩，何乐而不为呢？下面就来推荐给大家10种好吃又健康的小零食。
<title>用正确方式打开 MyGainer增健肌粉！ - MYPROTEIN™</title>
blah bhla
blah bhla

blah bhla
<title>Payton is your name</title>


Search: (?s)(\x3ctitle>.*?\x3c/title>.*?)\x3ctitle>.*?\x3c/title>
Replace: $1

### Second pass
This is the result after running Replace All twice. The first pass removed the second title, which was on line 5, and the second pass removed the third title, which was on line 7.

但是减脂期可不一定就意味着每天只吃水煮鸡胸和水煮西蓝花，完全苦行僧一样的生活。如果在营养均衡的三餐之间适当的尝试一些健康的小零食，不仅能为减脂期提供动力和新鲜感，还可以为生活增添不少的趣味呢。
除此之外，从营养学角度，适当的加餐可以预防三餐之间出现低血糖的现象，还能防止因为饥饿而在下一餐中暴饮暴食，摄入过多热量的情况发生。
因此，健康的小零食不仅有助于完成减脂目标，还能让你的减肥期丰富多彩，何乐而不为呢？下面就来推荐给大家10种好吃又健康的小零食。
<title>用正确方式打开 MyGainer增健肌粉！ - MYPROTEIN™</title>
blah bhla
blah bhla

blah bhla



Search: (?s)(\x3ctitle>.*?\x3c/title>.*?)\x3ctitle>.*?\x3c/title>
Replace: $1

If you watch the status line at the bottom of the search/replace dialog, you will see:

After pass 1: Replace All: 1 occurrence was replaced in entire file
After pass 2: Replace All: 1 occurrence was replaced in entire file
After pass 3: Replace All: 0 occurrences were replaced in entire file
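The pass counts above can be reproduced outside Notepad++ with a small Python sketch. It assumes that Python's `re` behaves like Notepad++'s Boost engine for this expression, since both resume scanning after each replacement. Python writes the back-reference as `\1` rather than `$1`. It also reads `\x3c` as `<`, so the literal `\x3ctitle>` text of the Search line stored in the sample is never matched. The sample is a shortened, hypothetical stand-in for the test data.

```python
import re

# mkupper's search expression; Python's re also reads \x3c as "<".
PATTERN = re.compile(r"(?s)(\x3ctitle>.*?\x3c/title>.*?)\x3ctitle>.*?\x3c/title>")


def replace_all_until_done(text):
    """Run one 'Replace All' per loop; stop at the first pass that replaces nothing."""
    counts = []
    while True:
        text, replaced = PATTERN.subn(r"\1", text)  # \1 is Python's spelling of $1
        counts.append(replaced)
        if replaced == 0:
            return text, counts


# Shortened, hypothetical version of the test file, with the expression at the bottom.
sample = "\n".join([
    "<p>intro</p>",
    "<title>MyGainer</title>",
    "blah bhla",
    "blah bhla",
    "<title>Home is me</title>",
    "blah bhla",
    "<title>Payton is your name</title>",
    "",
    r"Search: (?s)(\x3ctitle>.*?\x3c/title>.*?)\x3ctitle>.*?\x3c/title>",
    "Replace: $1",
])

result, counts = replace_all_until_done(sample)
print(counts)                   # [1, 1, 0]
print(result.count("<title>"))  # 1
```

Each list entry corresponds to one of the three status messages. Only the first title survives, and the expression stored in the file is untouched.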

While your examples had the titles on their own lines, I wrote the expression to allow them anywhere in a line and to span lines, as HTML allows. If you only want to match titles that sit on a line by themselves, we can add some anchoring:
Search: (?s)^(\x3ctitle>.*?\x3c/title>\R.*?\R)\x3ctitle>.*?\x3c/title>$
Replace: $1

Even that is not perfect, as it still allows titles to span multiple lines. If you insist on matching only titles that fit on one line, toggle the dot-matches-newline flag within the expression:
Search: ^(\x3ctitle>(?-s).*?\x3c/title>\R(?s).*?\R)\x3ctitle>(?-s).*?\x3c/title>$
Replace: $1
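The difference between the two anchored versions can be checked with a Python sketch, under a few assumptions. Python's `re` lacks `\R`, so `(?:\r\n|\n|\r)` stands in for it. `(?m)` is added because `^` and `$` work per line by default in Notepad++ but not in Python. Recent Python versions reject a bare `(?-s)` in the middle of a pattern, so the stricter version uses a scoped `(?s:...)` group instead. The sample strings are hypothetical.

```python
import re

EOL = r"(?:\r\n|\n|\r)"  # stand-in for Boost's \R, which Python's re does not support

# First anchored version: (?s) applies everywhere, so a title may still span lines.
ANCHORED = re.compile(
    r"(?sm)^(<title>.*?</title>" + EOL + r".*?" + EOL + r")<title>.*?</title>$"
)

# Stricter version: title bodies use a plain dot (no line breaks); only the text
# between the two titles may span lines, via the scoped (?s:...) group.
ONE_LINE = re.compile(
    r"(?m)^(<title>.*?</title>" + EOL + r"(?s:.*?)" + EOL + r")<title>.*?</title>$"
)

spanning = "<title>A</title>\ntext\n<title>B\ncontinued</title>\n"
single = "<title>A</title>\ntext\n<title>B</title>\n"

print(bool(ANCHORED.search(spanning)))    # True: the two-line title still matches
print(bool(ONE_LINE.search(spanning)))    # False: a title body may not cross a line break
print(repr(ONE_LINE.sub(r"\1", single)))  # '<title>A</title>\ntext\n\n'
```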

As you can see, the expression gets more complicated as it deals with more edge cases and requirements.

A) The `.......` line is the very first one of your file(s):

Then, I personally found two other solutions:

SEARCH `(?s)(?!\A)<title>.*?</title>\R`

REPLACE `Leave EMPTY`

AND

SEARCH `(?s)\A(<title>.*?</title>\R)(*SKIP)(*F)|(?1)`

REPLACE `Leave EMPTY`

However, @Coises's formulation, with the leading modifier `(?s)`,

SEARCH `(?s)\R<title>.*?</title>`

REPLACE `Leave EMPTY`

is really clever and definitely the best one, as the `\R` syntax is quicker to execute than the negative look-ahead `(?!\A)`, which could matter if numerous files are concerned!
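When the title to keep is the very first line of the file, the look-ahead formulation and the `\R` formulation above should give the same result. This Python sketch checks that under stated assumptions. The patterns are my reading of those two formulations with the title tags written out, Python's `re` stands in for Boost, and `(?:\r\n|\n|\r)` replaces `\R`. The look-ahead version removes each extra title together with the line break after it, and the `\R` version removes the line break before it, so the two results coincide. The sample is hypothetical. The sketch makes no timing claims.

```python
import re

EOL = r"(?:\r\n|\n|\r)"  # stand-in for \R, which Python's re lacks

# Any title not at the very start of the text, plus the line break after it.
NOT_AT_START = re.compile(r"(?s)(?!\A)<title>.*?</title>" + EOL)
# A line break followed by a title: only titles after the first line qualify.
AFTER_BREAK = re.compile(r"(?s)" + EOL + r"<title>.*?</title>")

# Hypothetical sample whose first line is the title to keep.
sample = "<title>Keep</title>\nbody\n<title>Drop 1</title>\nmore\n<title>Drop 2</title>\nend\n"

a = NOT_AT_START.sub("", sample)
b = AFTER_BREAK.sub("", sample)
print(a == b)   # True
print(repr(a))  # '<title>Keep</title>\nbody\nmore\nend\n'
```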

B) The `.......` line is NOT necessarily the very first one of your file(s):

In this case, a solution derived from my second formulation above could be:

SEARCH `(?s)\A.*?(<title>.*?</title>\R)(*SKIP)(*F)|(?1)`

REPLACE `Leave EMPTY`
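Python's standard `re` module has no `(*SKIP)(*F)` verbs and no `(?1)` subroutine calls. For the case where the title to keep is not necessarily the first line, this sketch therefore swaps in a different technique: a replacement callback that keeps the first match and deletes the others. It also shows why the `\R` approach is not enough in that case. A first title that comes after other text also follows a line break, so it gets deleted too. The helper `keep_first_title` and the sample are hypothetical.

```python
import re

EOL = r"(?:\r\n|\n|\r)"  # stand-in for \R
TITLE_LINE = re.compile(r"(?s)<title>.*?</title>" + EOL)  # one title plus its line break


def keep_first_title(text):
    """Delete every title line except the first, wherever the first one is."""
    seen = False

    def replace(match):
        nonlocal seen
        if not seen:  # first match: keep it unchanged
            seen = True
            return match.group(0)
        return ""     # every later match: delete it

    return TITLE_LINE.sub(replace, text)


# Hypothetical sample where text comes before the first title.
sample = "<p>intro</p>\n<title>Keep</title>\nbody\n<title>Drop</title>\nend\n"

kept = keep_first_title(sample)
print(repr(kept))  # '<p>intro</p>\n<title>Keep</title>\nbody\nend\n'

# The line-break shortcut also removes the first title here, because it follows "\n".
shortcut = re.sub(r"(?s)" + EOL + r"<title>.*?</title>", "", sample)
print(repr(shortcut))  # '<p>intro</p>\nbody\nend\n'
```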

Best Regards,

guy038

<h1>Reply to Regex: Delete all the instances of html tag, except the first one on Tue, 05 Dec 2023 10:42:34 GMT</h1> <p>Hellena Crainicu — Tue, 05 Dec 2023 10:42:34 GMT</p> <p dir="auto"><a class="mention plugin-mentions-user plugin-mentions-a" href="https://community.notepad-plus-plus.org/uid/5329">@mkupper</a></p>
<p dir="auto">Yes, but if I have text before the first <title> instance, it will delete exactly the <title> instances that are not needed.</p>
<p dir="auto">Try your regex with this example. You will see that the first instance of <title> gets deleted, and that is exactly the one I need to keep.</p>
<pre><code><p>但是减脂期可不一定就意味着每天只吃水煮鸡胸和水煮西蓝花，完全苦行僧一样的生活。如果在营养均衡的三餐之间适当的尝试一些健康的小零食，不仅能为减脂期提供动力和新鲜感，还可以为生活增添不少的趣味呢。</p>除此之外，从营养学角度，适当的加餐可以预防三餐之间出现低血糖的现象，还能防止因为饥饿而在下一餐中暴饮暴食，摄入过多热量的情况发生。</p>因此，健康的小零食不仅有助于完成减脂目标，还能让你的减肥期丰富多彩，何乐而不为呢？下面就来推荐给大家10种好吃又健康的小零食。</p>
<title>用正确方式打开 MyGainer增健肌粉！ - MYPROTEIN™</title>
blah bhla
blah bhla
<title>Home is me</title>
blah bhla
<title>Payton is your name</title></code></pre>

<h1>Reply to Regex: Delete all the instances of html tag, except the first one on Mon, 04 Dec 2023 04:24:01 GMT</h1> <p>mkupper — Mon, 04 Dec 2023 04:24:01 GMT</p> <p dir="auto"><a class="mention plugin-mentions-user plugin-mentions-a" href="https://community.notepad-plus-plus.org/uid/21856">@Hellena-Crainicu</a> Another way to do this is:</p>

Search: `(?s)(<title>.*?</title>.*?)<title>.*?</title>`
Replace: `$1`

You would need to repeat this until it stops replacing.
In summary:

`(?s)` puts the regex engine in dot-matches-newline mode, meaning `.` also matches end-of-line characters. Normally, `.` stops at the end of the line.

`(<title>.*?</title>.*?)` grabs the first title and everything after it, using non-greedy scans.

`<title>.*?</title>` is the second title.

Thus we save the first title and everything after it up to the second title, and discard the second title. As the search/replace runs, it re-positions after each replacement, so the second replacement saves the third title and discards the fourth. Keep repeating. Eventually there will be just one title left, and it will always be the first one.
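The alternating behaviour described above can be traced with a short Python sketch. It assumes that `re.subn` resumes scanning after each replacement the way Replace All does. The sample uses five numbered, hypothetical titles. The first pass drops the second and fourth titles, and later passes remove the rest one at a time.

```python
import re

# Same expression as above: keep the first title and what follows, drop the next title.
PATTERN = re.compile(r"(?s)(<title>.*?</title>.*?)<title>.*?</title>")

text = "\n".join(f"<title>T{i}</title>" for i in range(1, 6))  # hypothetical titles T1..T5
passes = []
while True:
    text, replaced = PATTERN.subn(r"\1", text)
    if replaced == 0:  # nothing left to replace: stop repeating
        break
    survivors = re.findall(r"<title>(.*?)</title>", text)
    passes.append((replaced, survivors))

for number, (replaced, survivors) in enumerate(passes, start=1):
    print(f"pass {number}: {replaced} replaced, left {survivors}")
# pass 1: 2 replaced, left ['T1', 'T3', 'T5']
# pass 2: 1 replaced, left ['T1', 'T5']
# pass 3: 1 replaced, left ['T1']
```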

<h1>Reply to Regex: Delete all the instances of html tag, except the first one on Sun, 03 Dec 2023 20:31:59 GMT</h1> <p>Terry R — Sun, 03 Dec 2023 20:31:59 GMT</p> <p dir="auto"><a class="mention plugin-mentions-user plugin-mentions-a" href="https://community.notepad-plus-plus.org/uid/21856">@Hellena-Crainicu</a></p>
<p dir="auto">I agree with <a class="mention plugin-mentions-user plugin-mentions-a" href="https://community.notepad-plus-plus.org/uid/27184">@Coises</a>. In fact, I had exactly the same regex. As asked, if the title you want to keep is definitely at the very start of the file, that should work.</p>
<p dir="auto">Otherwise, if the first <title> isn't at the very start of the file, a regex won't be able to do this in one pass. The other option would be to find the first instance and tag it, then use a second pass to remove all other instances and a third pass to remove the tag from the remaining <title>.</p>
<p dir="auto">Terry</p> </article> <article> <h1>Reply to Regex: Delete all the instances of <title> html tag, except the first one on Sun, 03 Dec 2023 20:49:03 GMT</h1> <p>Hellena Crainicu — Sun, 03 Dec 2023 20:49:03 GMT</p> <p dir="auto"><a class="mention plugin-mentions-user plugin-mentions-a" href="https://community.notepad-plus-plus.org/uid/27184">@Coises</a> Super formula, it was so easy.
Thanks a lot!</p> <p dir="auto">Also, here is another Python snippet that does the same thing:</p>
<pre><code>import regex  # third-party module (pip install regex)


def remove_last_title_tags(text):
    """Remove every title element that starts a line, except the first one."""
    # (?<=^|\n) is a variable-width look-behind: the regex module accepts it, the stdlib re does not.
    matches = list(regex.finditer(r"(?<=^|\n)<title>.*?</title>", text, flags=regex.DOTALL))
    # Cut out every match after the first, last one first, so earlier offsets stay valid.
    # Slicing by offset avoids str.replace(), which would also delete the first title
    # if a later title had identical text.
    for match in reversed(matches[1:]):
        text = text[:match.start()] + text[match.end():]
    return text


# extracted_content is the page text loaded earlier in the script.
extracted_content = remove_last_title_tags(extracted_content)
</code></pre> </article> <article> <h1>Reply to Regex: Delete all the instances of <title> html tag, except the first one on Sun, 03 Dec 2023 20:26:52 GMT</h1> <p>Coises — Sun, 03 Dec 2023 20:26:52 GMT</p> <p dir="auto"><a class="mention plugin-mentions-user plugin-mentions-a" href="https://community.notepad-plus-plus.org/uid/21856">@Hellena-Crainicu</a></p>
<p dir="auto">If your example is precise, in that the title line you want to keep is the very first line of the file, then you can use the fact that all the other title lines will be preceded by line-ending characters, so use:</p>
<p dir="auto"><strong><code>\R<title>.*?</title>

and replace with an empty string.