How to delete a duplicate paragraph at a particular place in multiple files
-
<H1…>Heading1</H1>
<H2…>Some text</H2>
<H2…>Different text</H2>
<H2…>Altogether different text</H2>
Some paragraphs here
</P> (or </ul>)
<P…><span…><b>Please E-mail us</b></span></P>
<H2…>Heading that should not be reproduced</H2>
Some paragraphs here
</ul> (or </P>)
<P…>We have</P>
<P …>Some text</P>
<P …>Different text</P>
<P …>Same text</P>
<P …>Same text</P>
<P…><b><span…>Please E-mail us</span></b></P>
-
For the test string above, if I put
(?s)\A.+?\K((<h2.+?</h2>\R)+).*\K(?=<p.*?</p>\R<p.*?Please\s*E-mail\s*us)
in the Find field and select the Regular expression mode, I can find (and remove) the paragraph just before the “Please E-mail us” paragraph. In most files of the folder, that paragraph has the same text as the one above it. In some other files, however, the text differs, so how do I avoid finding/removing the paragraph when it is not a duplicate? I believe the duplicate paragraph was added by Notepad++ during my previous find-and-replace exercise due to a bug.
-
@dr-ramaanand Please don’t tell me to do it on my own. I have tried and failed already.
-
@dr-ramaanand said in How to delete a duplicate paragraph at a particular place in multiple files:
Please don’t tell me to do it on my own. I have tried and failed already.
Probably the best thing to do is to seek help on a site that specializes in regular-expression help.
-
@Alan-Kilborn I asked at www.regex101.com and they told me to put
(?s)^(<p.*?<\/p>\R)(\1<p.*?Please\s*E-mail\s*us)
in the Find field, select the Regular expression mode, put
$2
in the Replace field and hit “Replace All”, and all the duplicate paragraphs disappeared.
-
@dr-ramaanand said in How to delete a duplicate paragraph at a particular place in multiple files:
and all the duplicate paragraphs disappeared.
So that’s good, right?
What you wanted?
-
@Alan-Kilborn Yes, and thanks for your time as well. Please keep this community going, as there are lots of people who will ask for solutions here (the Notepad++ community)!
-
Our goal is to get you the best help available.
We can answer regex questions here, but the same or similar questions from the same poster get tiring, as we are interested in much more diverse Notepad++ topics than just data conversion with regex.
So, if we can redirect you to a site where they are excited about regex, and only regex, well, we’ll do that.
I think maybe you’ve found a site for that now.
But I encourage you to learn to do it yourself – if someone else can write something that works, then so can you!
-
@Alan-Kilborn I have learnt quite a bit, but not everything, which is why I seek solutions here. Notepad++ has a “delete duplicate lines” feature for an open file, which is why I asked for a solution here first.
-
@Alan-Kilborn I can even explain the above. In that RegEx,
(?s)^(<p.*?<\/p>\R)(\1<p.*?Please\s*E-mail\s*us)
(?s) makes the dot match line breaks as well, so .*? can span several lines.
^ means at the beginning of a line.
(<p.*?<\/p>\R) is the first captured group: a paragraph from <p to </p>, including the line break after it (which is matched by the \R).
The rest is the second captured group, in which \1 matches a duplicate of the first captured group, followed by another <p string and everything up to and including “Please E-mail us”.
The \s* before and after “E-mail” lets the words “Please E-mail us” match even if they are on different lines (as well as when they are all on the same line).
The $2 in the Replace field (“Replace in files” in this case) reproduces the second captured group in the final result, so the first copy of the duplicated paragraph is dropped.
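For anyone who wants to try the same idea outside Notepad++, here is a minimal Python sketch of the find/replace described above. It is only an approximation under a few assumptions: Python's re module has no \R, so the line break is spelled out as (?:\r\n|\n|\r); re.IGNORECASE stands in for leaving “Match case” unchecked; re.MULTILINE makes ^ match at the start of each line; and the function name remove_duplicate_paragraph is made up for this example.

```python
import re

# Rough Python equivalent of the Notepad++ expression
#   (?s)^(<p.*?<\/p>\R)(\1<p.*?Please\s*E-mail\s*us)
# Assumption: \R is replaced by an explicit line-break alternation,
# because Python's re module does not support \R.
PATTERN = re.compile(
    r"^(<p.*?</p>(?:\r\n|\n|\r))(\1<p.*?Please\s*E-mail\s*us)",
    re.DOTALL | re.MULTILINE | re.IGNORECASE,
)


def remove_duplicate_paragraph(text: str) -> str:
    """Drop one copy of a paragraph that is repeated verbatim
    right before the "Please E-mail us" paragraph.

    Hypothetical helper name; the replacement keeps only group 2,
    which mirrors putting $2 in the Replace field.
    """
    return PATTERN.sub(r"\2", text)


if __name__ == "__main__":
    sample = (
        "<P>Some text</P>\n"
        "<P>Different text</P>\n"
        "<P>Same text</P>\n"
        "<P>Same text</P>\n"
        "<P><b><span>Please E-mail us</span></b></P>\n"
    )
    print(remove_duplicate_paragraph(sample))
```

Because the second group starts with \1, a paragraph is only removed when the next paragraph is an exact copy of it, so files where the paragraph before “Please E-mail us” has different text are left unchanged.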