Regex: Select only the first instance of search results / first match

Alan Kilborn

Nice treatment, Peter. +1 (or more).
It’s an analysis by a thinking human being that has demonstrated he is capable of learning, adapting, and growing – wouldn’t it be great if everyone was like that?
Thank you on behalf of 32.6K readers (if I may be so bold as to speak for them) for this and your other thoughtful and thorough contributions to the Notepad++ user Community, and of course the N++ user manual.

Alan Kilborn

@PeterJones

BTW, I solved it on my own as well, for my own “pleasure”.
But I wasn’t going to post it, punishing, I guess, the 32.6K readers that were on the edge of their seat waiting for it – at the expense of those that I didn’t really want to have it. But since you let the cat out of the bag, perhaps it is instructive to see a different approach:

(?s)<tr>.*</tr>.*?<tr>\K.+?(?=</tr>.*?\z)

It seems to work; maybe there are holes.

Terry R

@Alan-Kilborn said in Regex: Select only the first instance of search results / first match:

BTW, I solved it on my own as well, for my own “pleasure”.

I also had one, taking @PeterJones’s solution for the “first” instance and removing JUST 1 character. Maybe mine also has holes.
(?s)\A.*<tr>\s*\K.*?(\s*</tr>)
So turning a non-greedy regex into a greedy one. It firstly grabs everything, then backs up until the <tr>…</tr> sequence is true. Even the \A sequence could be removed IF the cursor were in the first position of the open file.
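
To see the lazy/greedy difference outside Notepad++, here is a minimal Python sketch. It is only an approximation: Python's re module has no \K, so the text that would follow \K is captured in group 1 instead, and the sample string is made up for illustration.

```python
import re

# Hypothetical three-row sample, not taken from the thread.
html = "<tr>\nfirst\n</tr>\n<tr>\nsecond\n</tr>\n<tr>\nthird\n</tr>\n"

# Lazy .*? after \A stops at the FIRST <tr>.
first = re.search(r"(?s)\A.*?<tr>\s*(.*?)(\s*</tr>)", html)

# Greedy .* after \A grabs everything, then backs up to the LAST <tr>.
last = re.search(r"(?s)\A.*<tr>\s*(.*?)(\s*</tr>)", html)

print(first.group(1))  # first
print(last.group(1))   # third
```

The only difference between the two patterns is the single ? after the leading .* sequence.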

Terry

PeterJones

@Terry-R and @Alan-Kilborn ,

Those are so much simpler than mine! Congrats! 🎉👏👍

Anyway, I am still glad I presented my solution, as it hopefully shows future readers a thought process that can arrive at a working regex, even if it’s not the simplest or most efficient.

Alan Kilborn

@PeterJones said in Regex: Select only the first instance of search results / first match:

…so much simpler…

Well, maybe.
But nothing is going to beat your discussion of your thought process.
An important factor in a good solution.

I’ve always thought of the ((?!UNWANTED).)* construct as somewhat “expensive”, but maybe that’s just because it “feels” complicated, but it would take a true regex genius like @guy038 to discuss that.

@Terry-R

Nice one as well!

Alan Kilborn

@Terry-R

I was experimenting with your regex a bit and I noticed that not only did it match the text inside the final <tr></tr> pair, but it also matched the </tr> tag as well?

Peter’s and my regexes only matched what was inside; not sure if you were solving something Vasile wanted or not with that – not going back to read/revisit it! – but I took the liberty of tweaking yours a bit so it matches what ours does:

(?s)\A.*<tr>\K.+?(?=</tr>)

and that appears to be the shortest matching regex thus far.

Terry R

@Alan-Kilborn said in Regex: Select only the first instance of search results / first match:

I was experimenting with your regex a bit and I noticed that not only did it match the text inside the final <tr></tr> pair, but it also matched the </tr> tag as well?

As I said it was from @PeterJones solution for the first instance. Thus in his post:

FIND = (?s)\A.*?<tr>\s*\K.*?(\s*</tr>)
REPLACE = new contents$1
MODE = regular expression
REPLACE ALL
then I get

So the replacement text would have been new contents$1, again same as the first instance solution. Sorry forgot to mention that.
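
A rough Python equivalent of that replacement, as a sketch only. Because Python's re module lacks \K, the prefix before \K is captured too and written back, so the thread's $1 becomes group 2 here. The sample text is invented.

```python
import re

# Hypothetical two-row table, not taken from the thread.
html = "<table>\n<tr>\nold one\n</tr>\n<tr>\nold two\n</tr>\n</table>\n"

# Group 1: everything up to and including the first <tr> plus whitespace
#          (the part that \K would discard from the match in Notepad++).
# Group 2: the trailing whitespace and </tr> (the thread's $1).
pattern = re.compile(r"(?s)(\A.*?<tr>\s*).*?(\s*</tr>)")

# count=1 limits the substitution to a single replacement.
result = pattern.sub(r"\g<1>new contents\g<2>", html, count=1)
print(result)
```

Only the first row's contents change; the second row is left alone.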

Terry

Vasile Caraus

So, conclusion. I collected all the regexes from the last conversation:

Select and replace the first instance:

SEARCH: (?s)\A.*?<tr>\s*\K.*?(\s*</tr>)(?=$)
REPLACE BY: NEW CONTENT $1

or

SEARCH: (?s)\A.*?<tr>\s*\K.*?(\s*</tr>)
REPLACE BY: NEW CONTENT $1

Select and replace the last instance:

SEARCH: (?s)<tr>.*</tr>.*?<tr>\K.+?(?=</tr>.*?\z)
REPLACE BY: \r NEW CONTENT $1 \r

or

SEARCH: (?s)\A.*<tr>\K.+?(?=</tr>)
REPLACE BY: \r NEW CONTENT $1 \r

WORKS. Thanks a lot friends.

Alan Kilborn

This all seems rather “special case”.
This <tr> and </tr> junk…

To be generic, that is, a roadmap for other interested parties to use, why not specify it like this:

Match only the first occurrence in a file of a regular expression RE:

(?s)\A.*?\KRE

Match the last occurrence of a regular expression RE:

(?s)\A.*(RE).*?\K\1

Of course, clearly the RE has to be something a bit more specific than (example) .., but these seem to mostly work to achieve the goal.
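
For anyone wanting to try the generic idea in a script, here is a small Python sketch. The function names are made up for this example, and since Python's re module has no \K, the RE is wrapped in a capture group and its span is returned instead. It assumes the RE contains no global inline flags of its own.

```python
import re

def first_occurrence(re_src, text):
    # Analogue of (?s)\A.*?\KRE : lazy prefix, so the leftmost RE wins.
    m = re.search(r"(?s)\A.*?(" + re_src + ")", text)
    return m.span(1) if m else None

def last_occurrence(re_src, text):
    # Analogue of (?s)\A.*\KRE : greedy prefix, so the rightmost RE wins.
    m = re.search(r"(?s)\A.*(" + re_src + ")", text)
    return m.span(1) if m else None

text = "cat dog cat bird cat"
print(first_occurrence("cat", text))  # (0, 3)
print(last_occurrence("cat", text))   # (17, 20)
```

Note that the greedy form returns the match with the rightmost starting position, which can differ from the last item of re.finditer when matches could overlap.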

guy038

Hello, @vasile-caraus, @Terry-R, @alan-kilborn, @peterjones and All,

IMPORTANT : I wrote this post, after reading posts from the banner 4 YEARS LATER till the @peterjones’s post, below :

https://community.notepad-plus-plus.org/post/62964

But I am going to add a second post, after reading the most recent solutions ! Sorry for my incomplete work !

First, @vasile-caraus, I totally agree with @alan-kilborn’s comment on your attitude ! Not very fair and nice to @Terry-r, who was trying to help you :-((

Seemingly, you know quite well, by now, the power of regexes, regarding text manipulations. And if you had seriously studied some regex tutorials, you would not have spoken about that regex (?s)\z.*?<tr>\s*\K.*?(\s*</tr>), which is complete nonsense !

For instance, from the two pages of the Regular-expressions.info site, below, you would have understood, at once, that the \z syntax always comes at the very end of a regex or, possibly, before an alternation symbol | !!

https://www.regular-expressions.info/anchors.html

https://www.regular-expressions.info/refanchors.html

Now, I slightly simplified the @peterjones’s search regex, which searches for the first element <tr> ••••• </tr>, of an HTML page :

SEARCH (?s-i)\A.*?<tr>\K.*?(?=</tr>)

Then, depending on your replacement expression :

If the expression is Here is the NEW text, you’ll get the simple text :

<tr>Here is the NEW text</tr>

If the expression is \r\nHere is the NEW text\r\n, the output text will be :

<tr>
Here is the NEW text
</tr>

Tick the Wrap around option
Click on the Replace All button, exclusively !

Now, to search for the last element <tr> ••••• </tr>, of an HTML page, use the following regex :

SEARCH (?s-i)<tr>\K((?!<tr>).)*?(?=</tr>((?!<tr>).)*?\z)

Note that I use exactly the scheme proposed by @Peterjones :


- find from <tr> to </tr> ( NOT included )          =>    (?s-i)<tr>\K •••••••••• (?=</tr> •••••••••• )
                                                                           ^                 ^    ^
                                                                           |                 |    |
- WITHOUT any contained <tr>                        =>    ((?!<tr>).)*? ---•                 |    |
                                                                                             |    |
- FOLLOWED by anything that’s NOT a <tr>            =>    ((?!<tr>).)*? ---------------------•    |
                                                                                                  |
- until the VERY END of the file                    =>    \z -------------------------------------•
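
The role of the ((?!<tr>).)*? guard can be checked with a short Python sketch. This is an approximation under stated assumptions: \K is replaced by a capture group, \z by Python's \Z end-of-string anchor, the -i flag is dropped (Python is case-sensitive by default), and the one-line sample is invented.

```python
import re

# Hypothetical compact sample with three rows.
rows = "<tr>A</tr><tr>B</tr><tr>C</tr></table>"

# Without the guard, a plain lazy .*? may cross later <tr> tags on its way
# to the end of the text, so the FIRST block already satisfies the lookahead.
naive = re.search(r"(?s)<tr>(.*?)(?=</tr>.*?\Z)", rows)

# With the guard, nothing between </tr> and the end may start a <tr>,
# so only the LAST block can match.
guarded = re.search(r"(?s)<tr>((?:(?!<tr>).)*?)(?=</tr>(?:(?!<tr>).)*?\Z)", rows)

print(naive.group(1))    # A
print(guarded.group(1))  # C
```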

To All :

You could ask me : why the regex to search for the last <tr> ••••• </tr> block is more complicated than the one to search for the first one ?

This is because of the general direction used by the regex engine : from LEFT to RIGHT !

Indeed, when we search for (?s-i)\A.*?<tr>, part of the first regex, the range of any char (?s).* with the lazy quantifier ? is then extended to the first occurrence of the string <tr> and means that, necessarily, this range cannot contain any <tr> inside !
Similarly, the regex (?s).*?(?=</tr>) would search for any range of any char, possibly empty, till the nearest string </tr>, meaning, implicitly, that this range of chars cannot contain a </tr> string
Whereas, when searching the last <tr> ••••• </tr> block, as our reference is the anchor \z ( very end of current file ), we must build up the regex, using a kind of back-propagation method :
- Starting from the very end of file
- Moving back, through characters without any <tr> string
- Till a </tr> string
- Moving back, again, through characters without any <tr> string
- Till a <tr> string

Of course, I assume that any <tr> correctly ends with </tr> !

Test these two regexes against this sample, derived from Peter’s one, which contains 4 blocks <tr> •••• </tr> :

<html><body>
<table>
<tr>
get rid of stuff, in case of \A anchor, including <embedded/> <tags/>
</tr>
<tr>
keep stuff including <embedded/> <tags/>
</tr>
<tr>
keep stuff including <embedded/> <tags/>
</tr>
<tr>
get rid of stuff, in case of \z anchor, including <embedded/> <tags/>
</tr>
</table>
</body>
</html>

The first regex, with the \A syntax, should replace the first block only, and the last regex, with the \z syntax, should replace the fourth and last <tr> block
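
That expectation can be checked against the sample with a short Python sketch. As before, this is a translation and not the Notepad++ regexes themselves: capture groups stand in for \K, \Z stands in for \z, and the -i flag is omitted.

```python
import re

# The sample from this post, as a raw string so \A and \z stay literal text.
sample = r"""<html><body>
<table>
<tr>
get rid of stuff, in case of \A anchor, including <embedded/> <tags/>
</tr>
<tr>
keep stuff including <embedded/> <tags/>
</tr>
<tr>
keep stuff including <embedded/> <tags/>
</tr>
<tr>
get rid of stuff, in case of \z anchor, including <embedded/> <tags/>
</tr>
</table>
</body>
</html>
"""

# First block: lazy prefix anchored at the start of the text.
first = re.search(r"(?s)\A.*?<tr>(.*?)(?=</tr>)", sample)

# Last block: no <tr> allowed inside the block or between </tr> and the end.
last = re.search(r"(?s)<tr>((?:(?!<tr>).)*?)(?=</tr>(?:(?!<tr>).)*?\Z)", sample)

print(repr(first.group(1)))
print(repr(last.group(1)))
```

Group 1 of the first search holds the contents of the first block, and group 1 of the second search holds the contents of the fourth one.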

Best Regards,

guy038

P.S. :

@vasile-caraus, note that I’m willing, as probably are all the people involved in this discussion, to help you if you have difficulty understanding a specific part of a regex tutorial that you have decided to study. A different perspective will certainly be very useful to you … and others ;-))

guy038

Hi, @vasile-caraus, @Terry-R, @alan-kilborn, @peterjones and All,

My God !! Of course, the @terry-r’s regex is just magic and so simple ! Congratulations, Terry ;-)) How could we not think of it ??

If I adapt Terry’s concept to the regexes of my previous post, everything becomes crystal clear :

SEARCH (?s-i)\A.*?<tr>\K.*?(?=</tr>) to search ( and replace ) the first <tr> ••••• </tr> block

SEARCH (?s-i)\A.*<tr>\K.*?(?=</tr>) to search ( and replace ) the last <tr> ••••• </tr> block

As usual, tick the Regular expression and Wrap around options and click on the Replace All button, exclusively

@vasile-caraus, this demonstrates, in a masterful way, that things can be skillfully solved by other people than me and moreover… by @terry-r !!

Now, @alan-kilborn you said :

Match the last occurrence of a regular expression RE:

(?s)\A.*(RE).*?\K\1

But, unless I’m mistaken, doesn’t this regex, below, do the same search ?

(?s)\A.*\KRE
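
A quick way to compare the two forms is a Python sketch like the one below. It is an approximation with several assumptions: the helper names are invented, RE is restricted to a literal string, and capture groups replace \K because Python's re module does not support it.

```python
import re

def last_simple(literal, text):
    # Analogue of (?s)\A.*\KRE
    m = re.search(r"(?s)\A.*(" + re.escape(literal) + ")", text)
    return m.span(1) if m else None

def last_backref(literal, text):
    # Analogue of (?s)\A.*(RE).*?\K\1 : group 1 must be followed by a
    # second copy of the same text, which is what gets reported.
    m = re.search(r"(?s)\A.*(" + re.escape(literal) + r").*?(\1)", text)
    return m.span(2) if m else None

text = "x1 x2 x1 x3"
print(last_simple("x1", text))   # (6, 8)
print(last_backref("x1", text))  # (6, 8)

# With a single occurrence, the back-reference form has no second copy to find.
print(last_simple("x1", "only x1 here"))   # (5, 7)
print(last_backref("x1", "only x1 here"))  # None
```

In this translation both forms agree when the literal occurs at least twice, and the shorter form also handles the case of a single occurrence.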

Best regards,

guy038

Terry R

@guy038 said in Regex: Select only the first instance of search results / first match:

Hi, @vasile-caraus, @Terry-R, @alan-kilborn, @peterjones and All,
My God !! Of course, the @terry-r’s regex is just magic and so simple !

I feel like I’m being rewarded for something I ~~stole~~ borrowed now. ;-)) All I did was point out the marvellous creation of @PeterJones and how by the absence of a single character it turns one thing into another.

But hey, I’m happy that collectively we can show there are many answers, all work in various ways.

Terry

Alan Kilborn

@guy038 said in Regex: Select only the first instance of search results / first match:

But, unless I’m mistaken, doesn’t this regex, below, do the same search ?
(?s)\A.*\KRE

Yes, indeed.
That’s what I get for dabbling in the area of another master! :-)

Vasile Caraus

@guy038 thanks a lot !

dr ramaanand

@Vasile-Caraus The regular expression (?s)\A.*?\Kstring(?:.*?)?> helps find the very first occurrence of a string and if you want to find the first occurrence of a tag, say TAG_2, AFTER the first occurrence of another tag, say TAG_1, my generic regex becomes :

(?s-i)\A.*?<TAG_1(?: .*?)?>.*?\K<TAG_2(?: .*?)?> as per @guy038
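
As an illustration, here is a hedged Python sketch of the same idea. The tags h1 and p are placeholders for TAG_1 and TAG_2, the sample HTML is invented, \K is replaced by a capture group, and the -i flag is dropped because Python's re module does not accept it in that global form.

```python
import re

# Hypothetical sample: one <p> before the heading, two after it.
html = '<p>intro</p><h1 class="t">Title</h1><p id="a">one</p><p>two</p>'

# Lazy prefix up to the first <h1 ...>, then lazily on to the first <p ...>
# that follows it; group 1 holds that opening tag.
m = re.search(r"(?s)\A.*?<h1(?: .*?)?>.*?(<p(?: .*?)?>)", html)

print(m.group(1))  # <p id="a">
print(m.span(1))   # (36, 46)
```

The <p> that appears before the heading is skipped, because the search for TAG_2 only starts once TAG_1 has been matched.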

dr ramaanand

On testing the above, I observed that both the above regular expressions work only for tags or strings that begin with a < and end with a > - so if you are searching for a string between inverted commas, to find the first string, you should use the regular expression (?s)\A.*?\K"string(?:.*?)?"
