How to stop searching or replacing after a string?

PeterJones

@dr-ramaanand said in How to stop searching or replacing after a string?:

@PeterJones I went to the FAQ section and searched for, “within a range”

Sorry, I was on my phone browser at the time, and thought I had given enough information for someone to find it:

1. FAQ section
2. I said “find the generic regex”. So search for “generic”. => Generic Regex Formulas
3. I said “range” but the entry’s name is actually “zone”; the concept is the same, and reading the descriptions from the entry you went to in step 2 should have gotten you to the right place, even if I didn’t use the exact words from the FAQ. So go to Replacing in a specific zone of text

That page gives the regex formula you will have to use.

dr ramaanand

@PeterJones https://community.notepad-plus-plus.org/topic/22690/generic-regex-replacing-in-a-specific-zone-of-text says to use the RegEx (?-si:BSR|(?!\A)\G)(?s-i:(?!ESR).)*?\K(?-si:FR), so perhaps (?-si:<h2|(?!\A)\G)(?s-i:(?!</h2).)*?\K(?-si:</p>\R<p[^<>]*>) will find what I want to find. Then I probably have to use </h2>\r\n<h2> in the “Replace with” field. Please comment!
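The substitution of BSR, ESR and FR into the generic formula can be sketched as below. This is only a string-building illustration: the helper name `zone_regex` is hypothetical, and the resulting expression is meant to be pasted into Notepad++, not run through Python's own `re` module, which does not support `\G`, `\K` or `\R`.

```python
def zone_regex(bsr: str, esr: str, fr: str) -> str:
    """Fill the three placeholders of the FAQ's generic formula.

    bsr = Begin Search Region, esr = End Search Region, fr = Find Regex.
    The pieces are inserted verbatim, so they must already be valid regex.
    """
    return rf"(?-si:{bsr}|(?!\A)\G)(?s-i:(?!{esr}).)*?\K(?-si:{fr})"


# The three pieces proposed in this post.
find_what = zone_regex(r"<h2", r"</h2", r"</p>\R<p[^<>]*>")
print(find_what)
```

Printing the string makes it easy to compare it character by character with the hand-written expression before using it.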

mkupper

@dr-ramaanand, It’s not clear what you are trying to do. For example I saw that:

Line 1 starts with an <h2> and ends with an </p>.
Line 5 starts with a <p> and ends with an </h2>.

Is the intent to just find and correct those two lines or are you trying to do more?

If the goal is to find and correct those two lines then one option is:

Search: (?i)<([a-z][a-z0-9]*)>([^<]*)</(?!\1)[a-z][a-z0-9]*>
Replace: <$1>$2</$1>

I’ll unpack that, reading from left to right:

(?i) - case insensitive matches - this is optional
<([a-z][a-z0-9]*)> - Match an HTML tag that starts with a letter and is followed by any number of letters or digits, and capture its name as group 1. If you only care about <p> and <h2> then this can be <(p|h2)>.
([^<]*)</ - Capture everything that is not a < as group 2 and then expect a < followed by a /.
(?!\1) - Require that what comes next is not the HTML tag name we captured in the <([a-z][a-z0-9]*)> (or <(p|h2)> if you use that version). Inside the search expression the back-reference is written \1; $1 is only for the replacement. The (?!...) style matches are peculiar in that they are what are known as negative lookaheads. They are checked while searching but consume nothing, so they are not part of the “match” when replacing.
[a-z][a-z0-9]*> - Because the (?!...) lookahead consumes nothing, this part does the actual matching of the closing tag name and its >. You could use (p|h2)> here instead if you want.

The replacement part has:

<$1> - Generate a starting tag which is < followed by the first HTML tag, followed by the trailing >.
$2 - Generate whatever was between the starting and ending tags.
</$1> - Generate a closing tag using the same tag name as the starting tag.
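For readers who want to try the idea outside Notepad++, here is a minimal sketch in Python. It assumes Python's `re` module, where the back-reference is spelled `\1` in both the pattern and the replacement; the function name and the sample lines are made up for illustration.

```python
import re

# Opening tag (group 1), text without '<' (group 2), then a closing tag
# whose name does not start with the captured opening tag name.
MISMATCHED = re.compile(r"(?i)<([a-z][a-z0-9]*)>([^<]*)</(?!\1)[a-z][a-z0-9]*>")


def fix_closing_tags(text: str) -> str:
    """Rewrite each mismatched closing tag so it matches its opening tag."""
    return MISMATCHED.sub(r"<\1>\2</\1>", text)


# Hypothetical sample: two unbalanced lines and one balanced line.
sample = "<h2>First line</p>\n<p>Fifth line</h2>\n<p>Balanced line</p>"
fixed = fix_closing_tags(sample)
print(fixed)
```

The balanced line is left alone because the lookahead rejects a closing tag that repeats the opening tag name.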

PeterJones

@dr-ramaanand said in How to stop searching or replacing after a string?:

Please comment!

What is there to comment on? If you plug in those FIND and REPLACE expressions, and then hit REPLACE ALL, it does the replacement you want, to the best of my ability to understand.

For example, if it started with

<h2>What is the best cure for warts in Boston without cutting ?</p>
<p>What is the best cure for warts without cutting in Boston ?</p>
<p>What is the best cure for warts in Boston without burning ?</p>
<p>What is the best cure for warts without burning in Boston ?</p>
<p>What is the best warts treatment in Boston without cutting ?</h2>
<p>What is the best cure for warts in Boston without cutting ?</p>
<p>What is the best cure for warts without cutting in Boston ?</p>
<p>What is the best cure for warts in Boston without burning ?</p>
<p>What is the best cure for warts without burning in Boston ?</p>
<p>What is the best warts treatment in Boston without cutting ?</p>

clicking REPLACE ALL with the expressions you figured out from the formula gives me

<h2>What is the best cure for warts in Boston without cutting ?</h2>
<h2>What is the best cure for warts without cutting in Boston ?</h2>
<h2>What is the best cure for warts in Boston without burning ?</h2>
<h2>What is the best cure for warts without burning in Boston ?</h2>
<h2>What is the best warts treatment in Boston without cutting ?</h2>
<p>What is the best cure for warts in Boston without cutting ?</p>
<p>What is the best cure for warts without cutting in Boston ?</p>
<p>What is the best cure for warts in Boston without burning ?</p>
<p>What is the best cure for warts without burning in Boston ?</p>
<p>What is the best warts treatment in Boston without cutting ?</p>

That is the only logical behavior I can come up with from your initial description: since your initial data showed 10 lines of input and 13 lines of output, I had to assume you were just giving the general idea, and not an exact “before” and “after”, because that “after” would be impossible. And when I tried the expression from your original follow-on post, running it on the original “before” multiple times (until it stopped finding matches), I was left with the identical ten lines to the “after” that I showed.

So that one regex with hitting REPLACE ALL once does the same thing as your original regex with four REPLACE ALLs.

So again, why ask for comment. It works.
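The single-pass behavior can be approximated outside Notepad++ with a rough Python sketch. Python's `re` has no `\G`, `\K` or `\R`, so this is not the same expression: it is a two-step emulation that first isolates the zone from `<h2` up to the next `</h2` and then replaces the paragraph breaks inside it. The names `ZONE`, `BREAK` and `promote_zone` and the shortened sample lines are assumptions made for illustration.

```python
import re

# Zone: from "<h2" up to (but not including) the next "</h2".
ZONE = re.compile(r"<h2.*?(?=</h2)", re.DOTALL)
# The FR part from the thread, with \R approximated as \r?\n and the
# line break captured so the original line ending is kept.
BREAK = re.compile(r"</p>(\r?\n)<p[^<>]*>")


def promote_zone(text: str) -> str:
    """Inside each zone, turn '</p>' NEWLINE '<p ...>' into '</h2>' NEWLINE '<h2>'."""
    return ZONE.sub(lambda m: BREAK.sub(r"</h2>\1<h2>", m.group(0)), text)


# Hypothetical, shortened version of the ten-line example.
before = (
    "<h2>Line one ?</p>\n"
    "<p>Line two ?</p>\n"
    "<p>Line three ?</h2>\n"
    "<p>Line four ?</p>\n"
)
after = promote_zone(before)
print(after)
```

Unlike the Notepad++ expression, this sketch only acts on a zone that actually has a closing `</h2`; text after the zone is never touched.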

mkupper

@PeterJones said in How to stop searching or replacing after a string?:

So again, why ask for comment. It works.

I think I now understand what @dr-ramaanand was struggling with which was
Search: \?</p>\R<p[^<>]*>(.*)(?=<\/h2)
Replace: ?</h2>\r\n<h2>$1

The OP was looking for a line that ends with ?</p>, followed by a line that starts with <p> (matched in a convoluted way) and ends with </h2>. The ending match was a lookahead.

In the replacement part the end of that first line, which had ended in </p> is replaced with </h2> and that second line which started with <p> now starts with <h2>.

If the OP does the search/replace over and over it ends up walking the unbalanced p / h2 lines up one line with each round of search/replace without ever fixing the underlying issue.

The OP’s expression also did not detect nor fix line 1 with its unbalanced <h2> ...</p> until the search/replace had been repeated enough times that it accidentally made line 1 become a balanced <h2> ...</h2> as the lower down unbalanced line was walked up to lines 1 and 2.

The puzzle is that in the OP’s “after” text block three additional lines were added between the expected lines 5 and 6. I suspect the OP, in the fog of war, was copy/pasting the blocks of lines and then trying various expressions.
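The “walking up” effect can be reproduced with a small Python sketch. It assumes Python's `re` module, so `\R` is approximated as `\r?\n` and `$1` becomes `\1`; the six sample lines are a made-up, shortened stand-in for the OP's data.

```python
import re

# The OP's original search, adapted for Python's re module.
OP_FIND = re.compile(r"\?</p>\r?\n<p[^<>]*>(.*)(?=</h2)")
# re.sub expands \n to a newline and \1 to the first captured group.
OP_REPLACE = r"?</h2>\n<h2>\1"

lines = [
    "<h2>Question one ?</p>",
    "<p>Question two ?</p>",
    "<p>Question three ?</p>",
    "<p>Question four ?</p>",
    "<p>Question five ?</h2>",
    "<p>Question six ?</p>",
]
text = "\n".join(lines)

rounds = 0
while True:
    # One subn call stands in for one click on Replace All.
    text, count = OP_FIND.subn(OP_REPLACE, text)
    if count == 0:
        break
    rounds += 1
    print(f"after round {rounds}:")
    print(text)

print("rounds needed:", rounds)
```

Each round finds a single match, because only one line ends with `</h2>` directly after a line ending in `?</p>`, and the first line stays unbalanced until the final round.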

dr ramaanand

@mkupper @PeterJones Akismet.com wasn’t letting me post all the 13 lines, so I deleted some lines. I however, did not delete the 3 lines from the output results. I am sorry about that. I will check out the RegEx you helped me figure out as soon as I get back home. Thanks a lot!

dr ramaanand

@PeterJones it is working perfectly. Thanks a lot! For your information, www.regex101.com suggests to use the RegEx, “(?-si:<h2|(?!\A)\G)(?s-i:(?!<\/h2).)*?\K(?-si:<\/p>\R<p[^<>]*>)”

Terry R

@dr-ramaanand
You need to understand that there are many flavours of regular expression engines used in the world. Regex101.com can test using some of them, however I don’t think it has access to the exact regex engine used with Notepad++.

The site itself is very useful though in describing a particular regular expression, but don’t rely on the expression defined as correct on that site working in Notepad++.

Read the FAQ post on it here.

Terry

PS your regex changes equated to “escaping” what regex101.com described as meta-characters (special characters). To see the actual special characters as used within Notepad++ see the online manual reference here. Note the / is not one of those special characters and therefore does not need escaping with Notepad++. The regex will however still work as the “escape” is basically ignored for a normal character. However it’s not a good idea to just use a regex as confirmed by regex101.com without fully understanding it as it could be possible for the regex to make unwanted changes if used in Notepad++
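As a small illustration of the escaping point, here is a sketch using Python's `re` module. This is yet another engine and not the one inside Notepad++, so it only demonstrates the general idea that escaping a character that is not special changes nothing; the sample string is made up.

```python
import re

escaped = re.compile(r"<\/p>")    # the form regex101 produced
unescaped = re.compile(r"</p>")   # the form written without the escape

sample = "<p>text</p>"
# Both patterns should find exactly the same closing tag.
same = escaped.findall(sample) == unescaped.findall(sample) == ["</p>"]
print(same)
```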

guy038

Hi, @dr-ramaanand, @peterjones, @mkupper, @terry-R and All,

@dr-ramaanand, in a nutshell, I would say :

From this INPUT text, relative to the beginning of our GNU license, pasted in a new tab :

<h2>The licenses for most software are designed to take away your freedom to share and change it.</p>
<p>By contrast, the GNU General Public License is intended to guarantee your freedom to share</p>
<p>and change free software--to make sure the software is free for all its users.</p>
<p>This General Public License applies to most of the Free Software Foundation's software</p>
<p>and to any other program whose authors commit to using it.</h2>
<p>(Some other Free Software Foundation software is covered by the GNU Library General</p>
<p>Public License instead) You can apply it to your programs, too.</p>
<p>When we speak of free software, we are referring to freedom, not price.</p>
<h2>Our General Public Licenses are designed to make sure that you have the freedom to distribute</p>
<p>copies of free software (and charge for this service if you wish), that you receive source code</p>
<p>or can get it if you want it, that you can change the software or use pieces of it in new</p>
<p>free programs; and that you know you can do these things.</h2>
<p>To protect your rights, we need to make restrictions that forbid anyone to deny you these rights or to ask you to surrender the rights.</p>
<p>These restrictions translate to certain responsibilities for you if you distribute copies of the software, or if you modify it.</p>

<p>For example, if you distribute copies of such a program, whether gratis or for a fee, you must give the recipients all the rights that you have.</p>
<p>You must make sure that they, too, receive or can get the source code</p>
<h2>and you must show them these terms so they know their rights.</p>
<p>We protect your rights with two steps: (1) copyright the software, and (2) offer you this</p>
<p>license which gives you legal permission to copy, distribute and/or modify the software.</h2>

<h2>Also, for each author's protection and ours, we want to make certain that</p>
<p>everyone understands that there is no warranty for this free software.</h2>
<p>If the software is modified by someone else and passed on, we want its recipients to know that what they have</p>
<p>is not the original, so that any problems introduced by others will not reflect on the original authors' reputations.</p>

Here is the minimal syntax of the generic S/R to get what you expect :

Move to the very beginning of the file ( Ctrl + Home ) (IMPORTANT )
Open the Replace dialog
Untick all box options
SEARCH (?-i:<h2|(?!\A)\G)(?s-i:(?!</h2).)*?\K(?-i:/p>\R<p)
REPLACE /h2>\r\n<h2
Select the Regular expression search mode
Click on the Replace All button

=> Voila ! You should get your expected OUTPUT text :

<h2>The licenses for most software are designed to take away your freedom to share and change it.</h2>
<h2>By contrast, the GNU General Public License is intended to guarantee your freedom to share</h2>
<h2>and change free software--to make sure the software is free for all its users.</h2>
<h2>This General Public License applies to most of the Free Software Foundation's software</h2>
<h2>and to any other program whose authors commit to using it.</h2>
<p>(Some other Free Software Foundation software is covered by the GNU Library General</p>
<p>Public License instead) You can apply it to your programs, too.</p>
<p>When we speak of free software, we are referring to freedom, not price.</p>
<h2>Our General Public Licenses are designed to make sure that you have the freedom to distribute</h2>
<h2>copies of free software (and charge for this service if you wish), that you receive source code</h2>
<h2>or can get it if you want it, that you can change the software or use pieces of it in new</h2>
<h2>free programs; and that you know you can do these things.</h2>
<p>To protect your rights, we need to make restrictions that forbid anyone to deny you these rights or to ask you to surrender the rights.</p>
<p>These restrictions translate to certain responsibilities for you if you distribute copies of the software, or if you modify it.</p>

<p>For example, if you distribute copies of such a program, whether gratis or for a fee, you must give the recipients all the rights that you have.</p>
<p>You must make sure that they, too, receive or can get the source code</p>
<h2>and you must show them these terms so they know their rights.</h2>
<h2>We protect your rights with two steps: (1) copyright the software, and (2) offer you this</h2>
<h2>license which gives you legal permission to copy, distribute and/or modify the software.</h2>

<h2>Also, for each author's protection and ours, we want to make certain that</h2>
<h2>everyone understands that there is no warranty for this free software.</h2>
<p>If the software is modified by someone else and passed on, we want its recipients to know that what they have</p>
<p>is not the original, so that any problems introduced by others will not reflect on the original authors' reputations.</p>

Best Regards,

guy038

guy038

Hello, @dr-ramaanand and All,

I said :

Here is the minimal syntax of the generic S/R …

However, I received a message from @terry-R, by chat, who has seemingly found a shorter syntax than mine, which looks like the true minimal possible syntax ;-))

@terry-r, from my previous solution :

SEARCH (?-i:<h2|(?!\A)\G)(?s-i:(?!</h2).)*?\K(?-i:/p>\R<p)
REPLACE /h2>\r\n<h2

I could have slightly changed this S/R by :

SEARCH (?-i:<h2>|(?!\A)\G)(?s:(?!<).)*?\K(?-i:</p>(\R)<p>)
REPLACE </h2>$1<h2>

Which, in turn, can be expressed as :

SEARCH (?-i:<h2>|(?!\A)\G)[^<]+\K(?-i:</p>(\R)<p>)
REPLACE </h2>$1<h2>

If we assume that there is a single < char near the very end of each line ( remember that, AFTER the replacement, the regex position is RIGHT AFTER the string <p> )

Note that this new solution is quite close to your solution, given in your chat, if we use neither non-capturing groups nor the -i modifier to ensure the search of valid HTML tags :

SEARCH (<h2>|\G)[^<]+\K</p>(\R)<p>
REPLACE </h2>${2}<h2>

How does this new formulation work with my example ?

SEARCH   (?x) (?-i: <h2> | (?!\A) \G )    [^<]+    \K  (?-i: </p> ( \R ) <p> )
                    ----                  -----              ---------------
                    BSR                 ESR = '<'                  FR


First, as we start at VERY BEGINNING of file, the regex looks for a FIRST '<h2>' string


<h2>The licenses for most software are designed to take away your freedom to share and change it.  </p>
<h2>-----------------------------------------[^<]+-----------------------------------------------\K</p>CRLF

<p>  By contrast, the GNU General Public License is intended to guarantee your freedom to share  </p>
<p>\G-----------------------------------------[^<]+--------------------------------------------\K</p>CRLF

<p>  and change free software--to make sure the software is free for all its users.  </p>
<p>\G-----------------------------------------[^<]+--------------------------------\K</p>CRLF

<p>  This General Public License applies to most of the Free Software Foundation's software  </p>
<p>\G--------------------------------------------------------------------------------------\K</p>CRLF

<p> and to any other program whose authors commit to using it. </h2>
<p>

    ( On THIS line, impossible to find the string '</p>CRLF<p>', near the END of line, so the NEXT \G syntax is NOT true anymore
      Thus, NO MORE replacement occurs till a NEW '<h2>' string happens and the NEXT THREE lines remained UNTOUCHED ! )

<p>(Some other Free Software Foundation software is covered by the GNU Library General</p>
<p>Public License instead) You can apply it to your programs, too.</p>
<p>When we speak of free software, we are referring to freedom, not price.</p>

<h2>Our General Public Licenses are designed to make sure that you have the freedom to distribute  </p>
<h2>-----------------------------------------[^<]+-----------------------------------------------\K</p>CRLF...

On this LAST line, the RE-SYNCHRONIZATION occurs because a '<h2>' string is found and the cycle RESTARTS !
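The trace above can be mimicked with a line-by-line sketch in Python. It does not use `\G` at all (Python's `re` lacks it); instead a boolean flag plays the role of the chain being alive. It assumes every non-empty line has the shape `<tag>text</tag>` with a single `<` near the end, as in the example. The function name `promote_runs` and the shortened sample lines are hypothetical.

```python
def promote_runs(lines):
    """Emulate the chained replacement on a list of lines."""
    out = list(lines)
    chain = False  # True while the \G chain would still be alive
    for i in range(len(out) - 1):
        if out[i].startswith("<h2>"):
            chain = True  # BSR found: the cycle (re)starts here
        if not chain:
            continue
        if out[i].endswith("</p>") and out[i + 1].startswith("<p>"):
            # FR found: '</p>' NEWLINE '<p>' becomes '</h2>' NEWLINE '<h2>'
            out[i] = out[i][: -len("</p>")] + "</h2>"
            out[i + 1] = "<h2>" + out[i + 1][len("<p>"):]
        else:
            chain = False  # FR not found here, so the chain is broken
    return out


# Hypothetical, shortened version of the GNU license sample.
sample = [
    "<h2>The licenses for most software</p>",
    "<p>By contrast, the GNU General Public License</p>",
    "<p>and to any other program</h2>",
    "<p>(Some other Free Software Foundation software</p>",
    "<p>When we speak of free software</p>",
    "<h2>Our General Public Licenses</p>",
    "<p>free programs; and that you know</h2>",
]
result = promote_runs(sample)
print("\n".join(result))
```

The two `<p>` lines after the first `</h2>` stay untouched, and the replacement resumes only at the next line that starts with `<h2>`.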

Best Regards,

guy038

dr ramaanand

@guy038 there is only one, “<h2>” and one, “</h2>” in the file, so what @PeterJones helped me, “guess”, based on what you typed in an earlier thread seems to be good enough

dr ramaanand

@PeterJones using “(?-si:<h2|(?!\A)\G)(?s-i:(?!<\/h2).)*?\K(?-si:<\/p>\R<p[^<>]*>)” in the “Find what” field and “</h2>\r\n<h2>” in the “Replace All” field is working perfectly. Thank you very much. Happy New Year!

dr ramaanand

@PeterJones @guy038 I am sorry to say that I tested this on my laptop just now as I was busy with the New Year celebrations and observed that the RegEx which Peter Jones told me to use is also replacing only one </p> and one <p.............. > on the next line instead of replacing all at once.

PeterJones

@dr-ramaanand said in How to stop searching or replacing after a string?:

@PeterJones @guy038 I am sorry to say that I tested this on my laptop just now as I was busy with the New Year celebrations and observed that the RegEx which Peter Jones told me to use is also replacing only one </p> and one <p.............. > on the next line instead of replacing all at once.

I’m sorry to say that my experience is different than yours. Replace All with the regex I suggested works all at once.

dr ramaanand

@PeterJones it looks like you took a long time to reply, so I asked at www.regex101.com and was told to use, “(?-si:<h2[^<>]*+>|\G)[^<>]*+\K<\/p>\s*+<p[^<>]*+>” which is replacing all that I want replaced, all at once!

PeterJones

@dr-ramaanand ,

It was less than an hour. If that’s a “long time” in your mind, then using a free, Community-based service is probably not the right question/answer format for you. I would suggest finding someone to pay to give you instant 24/7/365.2425 support. Because you aren’t going to find anything guaranteed faster in any free online Q&A site.

And as we’ve told you before, regex101 uses a different flavor of regex engine than Notepad++ does, so there is no guarantee that a regex suggested by that site will be compatible with Notepad++. Use at your own risk.

Further, as you have reported multiple times, and as my video showed, the regex shown does work (as you once said, “perfectly”), and you didn’t need my video to know that. If you want to use a different regex, that’s fine, go ahead and use whatever “works” for you. But don’t pretend (and publicly state) that it was a deficiency in the regex already given to you or a “slow response” from me or any of the other regulars here who answer questions out of the kindness of our hearts.

dr ramaanand

@PeterJones I am sorry if it offended you. I didn’t mean to.

dr ramaanand

@dr-ramaanand said in How to stop searching or replacing after a string?:

(?-si:<h2[^<>]*+>|\G)[^<>]*+\K<\/p>\s*+<p[^<>]*+>

@guy038 or anybody else, can you please explain what the above RegEx does (since it is working on Notepad++)? It replaces all the </p> and the <p............ > immediately after that, even if it’s on the next line, all at once with </h2> and <h2> on the immediate next line, if I put </h2>\r\n<h2> in the “Replace with” field and hit “Replace All” with the Regular expression mode ticked and the matches newline unticked. The best thing about it is it does not do any replacing after the last </h2> !

mkupper

@dr-ramaanand I think the people on regex101 are using ChatGPT or other artificial intelligence applications to generate regular expressions.

The expressions you get from regex101 are extremely difficult to follow and do not come with any explanation of how the logic works or why certain choices were made. It’s likely that the regex101 people also have little to no idea of what the expressions do.

That’s a large part of why using ChatGPT or other artificial intelligence applications is banned on this forum. Ideally, we all can learn from the answers we see posted on this forum.

As I posted earlier, it would be much better for you to stick with things you know and understand well. It’s ok to push the limits a little at times as you can learn from those. Trying to make sense of what looks like AI generated content is a waste of time.