Regex: Select only the first instance of search results / first match
-
@Alan-Kilborn said in Regex: Select only the first instance of search results / first match:
BTW, I solved it on my own as well, for my own “pleasure”.
I also had one, based on @PeterJones’ solution for the “first” instance, with JUST 1 character removed. Maybe mine also has holes.
(?s)\A.*<tr>\s*\K.*?(\s*</tr>)
So it turns a non-greedy regex into a greedy one: it first grabs everything, then backs up until the <tr>…</tr> sequence matches. Even the \A sequence could be removed IF the cursor were at the very start of the open file.
Terry
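The lazy/greedy difference Terry describes can be checked outside Notepad++. This is a minimal Python sketch, not the Notepad++ (Boost) regex itself: Python's `re` has no `\K`, so the prefix stays inside the match and the wanted text is read from capture group 1. The sample text and the `block_content` helper are hypothetical.

```python
import re

# Hypothetical sample with three <tr> blocks.
SAMPLE = "<table>\n<tr>\none\n</tr>\n<tr>\ntwo\n</tr>\n<tr>\nthree\n</tr>\n</table>\n"

# No \K in Python's re: the prefix is matched normally and the
# content we care about is captured in group 1 instead.
FIRST = re.compile(r"(?s)\A.*?<tr>\s*(.*?)\s*</tr>")  # lazy prefix stops at the first <tr>
LAST = re.compile(r"(?s)\A.*<tr>\s*(.*?)\s*</tr>")    # greedy prefix backs up to the last <tr>

def block_content(pattern, text):
    """Return the content of the <tr> block selected by pattern, or None."""
    m = pattern.search(text)
    return m.group(1) if m else None

print(block_content(FIRST, SAMPLE))  # one
print(block_content(LAST, SAMPLE))   # three
```

Removing the single `?` after `\A.*` is the whole difference between the two patterns.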
-
@Terry-R and @Alan-Kilborn ,
Those are so much simpler than mine! Congrats! 🎉👏👍
Anyway, I am still glad I presented my solution, as it hopefully shows future readers a thought process that can arrive at a working regex, even if it’s not the simplest or most efficient.
-
@PeterJones said in Regex: Select only the first instance of search results / first match:
…so much simpler…
Well, maybe.
But nothing is going to beat your discussion of your thought process.
An important factor in a good solution.
I’ve always thought of the ((?!UNWANTED).)* construct as somewhat “expensive”, but maybe that’s just because it “feels” complicated; it would take a true regex genius like @guy038 to discuss that.
Nice one as well!
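To show what the ((?!UNWANTED).)* construct actually matches, here is a small Python sketch with UNWANTED taken to be <tr> (an illustrative choice). Each character is consumed only after the negative lookahead confirms that UNWANTED does not start at that position, so the matched run can never contain UNWANTED. Python's `re` supports negative lookahead; the sketch uses a non-capturing group (?:...) where the thread uses a capturing one, which does not change what is matched.

```python
import re

# Tempered token: take one char at a time, but never at a position where "<tr>" starts.
TEMPERED = re.compile(r"(?s)((?:(?!<tr>).)*)")

text = "abc</tr>def<tr>ghi"
run = TEMPERED.match(text).group(1)
print(run)  # abc</tr>def  (stops right before "<tr>"; "</tr>" is allowed)
```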
-
I was experimenting with your regex a bit and I noticed that not only did it match the text inside the final <tr></tr> pair, but it also matched the </tr> tag as well?
Peter’s and my regexes only matched what was inside; not sure if you were solving something Vasile wanted or not with that – not going back to read/revisit it! – but I took the liberty of tweaking yours a bit so it matches what ours does:
(?s)\A.*<tr>\K.+?(?=</tr>)
and that appears to be the shortest matching regex thus far.
-
@Alan-Kilborn said in Regex: Select only the first instance of search results / first match:
I was experimenting with your regex a bit and I noticed that not only did it match the text inside the final <tr></tr> pair, but it also matched the </tr> tag as well?
As I said, it was based on @PeterJones’ solution for the first instance. In his post:
FIND = (?s)\A.*?<tr>\s*\K.*?(\s*</tr>)
REPLACE = new contents$1
MODE = regular expression
REPLACE ALL
So the replacement text would have been new contents$1, again the same as for the first-instance solution. Sorry, I forgot to mention that.
Terry
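The two ways of handling the closing </tr> (capturing it and restoring it with $1, as Terry does, or only checking for it with a (?=</tr>) lookahead, as in Alan's tweak) can be compared in a Python sketch. This assumes Python's `re` as a stand-in for Notepad++'s engine: because there is no `\K`, the prefix is captured too and written back, so Notepad++'s `$1` corresponds to `\g<3>` in the first pattern. The sample text is hypothetical.

```python
import re

SAMPLE = "<tr>\n  old A\n</tr>\n<tr>\n  old B\n</tr>\n"

# Terry's style (last block): (\s*</tr>) is consumed, so the replacement must put it back.
# Group 1 = prefix that \K would have kept, group 3 = what Notepad++ calls $1.
CONSUMING = r"(?s)(\A.*<tr>\s*)(.*?)(\s*</tr>)"
out1 = re.sub(CONSUMING, r"\g<1>new contents\g<3>", SAMPLE)

# Alan's style: (?=</tr>) checks for </tr> without consuming it,
# so only the \K prefix has to be restored.
LOOKAHEAD = r"(?s)(\A.*<tr>)(.+?)(?=</tr>)"
out2 = re.sub(LOOKAHEAD, r"\g<1>new contents", SAMPLE)

print(repr(out1))  # the whitespace around the old content is kept
print(repr(out2))  # the whitespace is replaced along with the content
```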
-
So, in conclusion, here are all the regexes I collected from the last conversation:
Select and replace the first instance:
SEARCH:
(?s)\A.*?<tr>\s*\K.*?(\s*</tr>)(?=$)
REPLACE BY: NEW CONTENT $1
or
SEARCH:
(?s)\A.*?<tr>\s*\K.*?(\s*</tr>)
REPLACE BY: NEW CONTENT $1
Select and replace the last instance:
SEARCH:
(?s)<tr>.*</tr>.*?<tr>\K.+?(?=</tr>.*?\z)
REPLACE BY: \r NEW CONTENT $1 \r
or
SEARCH:
(?s)\A.*<tr>\K.+?(?=</tr>)
REPLACE BY: \r NEW CONTENT $1 \r
They all WORK. Thanks a lot, friends.
-
This all seems rather “special case”.
This <tr> and </tr> junk… To be generic, that is, to provide a roadmap for other interested parties, why not specify it like this:
Match only the first occurrence in a file of a regular expression RE:
(?s)\A.*?\KRE
Match the last occurrence of a regular expression RE:
(?s)\A.*(RE).*?\K\1
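A minimal Python sketch of the generic "first / last occurrence of RE" idea, assuming Python's `re` in place of Notepad++'s engine. Since `re` has no `\K`, RE is wrapped in capture group 1 and read back from there. For the "last" case it uses the greedy-prefix shape (?s)\A.*\KRE rather than the back-reference form above. The helper names are hypothetical, and RE is assumed to contain no numbered backreferences or inline flags of its own (wrapping it in a group would shift or break them).

```python
import re

def first_occurrence(rx, text):
    # Like (?s)\A.*?\KRE: the lazy prefix stops at the earliest position where RE matches.
    m = re.search(r"(?s)\A.*?(" + rx + ")", text)
    return m.group(1) if m else None

def last_occurrence(rx, text):
    # Like (?s)\A.*\KRE: the greedy prefix backs up to the latest position where RE matches.
    m = re.search(r"(?s)\A.*(" + rx + ")", text)
    return m.group(1) if m else None

text = "id=12; id=345; id=6"
print(first_occurrence(r"id=\d+", text))  # id=12
print(last_occurrence(r"id=\d+", text))   # id=6
```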
Of course, the RE has to be something a bit more specific than (for example) .., but these seem to mostly achieve the goal.
-
Hello, @vasile-caraus, @Terry-R, @alan-kilborn, @peterjones and All,
IMPORTANT: I wrote this post after reading the posts from the “4 YEARS LATER” banner up to @peterjones’s post below:
https://community.notepad-plus-plus.org/post/62964
But I am going to add a second post after reading the most recent solutions! Sorry for my incomplete work!
First, @vasile-caraus, I totally agree with @alan-kilborn’s comment on your attitude! It was not very fair or nice to @Terry-r, who was trying to help you :-((
Seemingly, by now, you know quite well how powerful regexes are for text manipulation. And if you had seriously studied some regex tutorials, you would not have proposed this regex:
(?s)\z.*?<tr>\s*\K.*?(\s*</tr>)
which is complete nonsense! For instance, from the two pages of the Regular-expressions.info site below, you would have understood at once that the \z syntax, which matches only at the very end of the file, must come at the very end of a regex (or of the group containing it) or, possibly, just before an alternation symbol |, since no character can be matched after it!!
https://www.regular-expressions.info/anchors.html
https://www.regular-expressions.info/refanchors.html
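A quick Python check of why a regex starting with \z can never match here. In Python's `re`, `\Z` matches only at the absolute end of the string, which is the role `\z` plays in Notepad++ (note that Python's `\Z` differs from Perl's `\Z`). The sample text is hypothetical.

```python
import re

text = "<tr>\nrow\n</tr>\n"

# End anchor first: once at the very end, no "<tr>" can follow, so there is never a match.
bad = re.search(r"(?s)\Z.*?<tr>\s*(.*?)(\s*</tr>)", text)
print(bad)  # None

# Start anchor first, as intended: begins at the very start of the text.
good = re.search(r"(?s)\A.*?<tr>\s*(.*?)(\s*</tr>)", text)
print(good.group(1))  # row
```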
Now, I slightly simplified @peterjones’s search regex, which searches for the first <tr> ••••• </tr> element of an HTML page:
SEARCH (?s-i)\A.*?<tr>\K.*?(?=</tr>)
In return, if your replacement regex is:
- The expression Here is the NEW text, you’ll get the simple text <tr>Here is the NEW text</tr>
- The expression \r\nHere is the NEW text\r\n, the output text will be:
<tr>
Here is the NEW text
</tr>
- Tick the Wrap around option
- Click on the Replace All button, exclusively!
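The two replacement variants above can be reproduced in a Python sketch, assuming Python's `re` as a stand-in: `re` has no `\K`, so the prefix is captured in group 1 and written back with `\g<1>`, and since `re` is case-sensitive by default the `-i` part is simply left out. The sample uses `\n` line endings instead of the `\r\n` a Windows file would typically contain.

```python
import re

# Hypothetical sample (\n line endings for simplicity).
SAMPLE = "<table>\n<tr>\nold first\n</tr>\n<tr>\nold second\n</tr>\n</table>\n"

# Stand-in for (?s-i)\A.*?<tr>\K.*?(?=</tr>)
FIRST_BLOCK = r"(?s)(\A.*?<tr>).*?(?=</tr>)"

# Replacement: Here is the NEW text
plain = re.sub(FIRST_BLOCK, r"\g<1>Here is the NEW text", SAMPLE)
# Replacement with line breaks around the new text (re.sub turns \n in the template into a newline)
with_breaks = re.sub(FIRST_BLOCK, r"\g<1>\nHere is the NEW text\n", SAMPLE)

print(plain)
print(with_breaks)
```

Because of the `\A` anchor, `re.sub` (like Replace All) can only ever replace the first block.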
Now, to search for the last <tr> ••••• </tr> element of an HTML page, use the following regex:
SEARCH (?s-i)<tr>\K((?!<tr>).)*?(?=</tr>((?!<tr>).)*?\z)
Note that I use exactly the scheme proposed by @Peterjones:
- find from <tr> to </tr> (NOT included)   =>  (?s-i)<tr>\K •••••••••• (?=</tr> •••••••••• )
- WITHOUT any contained <tr>               =>  ((?!<tr>).)*?  (the first ••••••••••)
- FOLLOWED by anything that’s NOT a <tr>   =>  ((?!<tr>).)*?  (the second ••••••••••)
- until the VERY END of the file           =>  \z  (at the end of the lookahead)
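The last-block regex can be tried in Python with three assumed substitutions: `\K` becomes a plain `<tr>` prefix outside group 1, `\z` becomes Python's `\Z` (absolute end of string), and the tempered tokens use non-capturing groups. The sample is hypothetical; this is a sketch of the technique, not a drop-in for Notepad++.

```python
import re

# Stand-in for (?s-i)<tr>\K((?!<tr>).)*?(?=</tr>((?!<tr>).)*?\z)
LAST_BLOCK = re.compile(r"(?s)<tr>((?:(?!<tr>).)*?)(?=</tr>(?:(?!<tr>).)*?\Z)")

SAMPLE = (
    "<table>\n"
    "<tr>\nfirst\n</tr>\n"
    "<tr>\nsecond\n</tr>\n"
    "<tr>\nlast\n</tr>\n"
    "</table>\n"
)

m = LAST_BLOCK.search(SAMPLE)
print(repr(m.group(1)))  # '\nlast\n'

# Since <tr> is consumed here (no \K), the replacement writes it back.
print(LAST_BLOCK.sub("<tr>NEW", SAMPLE))
```

Only the last block qualifies: from any earlier <tr>, the run after its </tr> would have to cross another <tr> to reach the end, which the tempered token forbids.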
To All:
You could ask me: why is the regex that searches for the last <tr> ••••• </tr> block more complicated than the one that searches for the first one?
This is because of the general direction used by the regex engine: from LEFT to RIGHT!
- Indeed, when we search for (?s-i)\A.*?<tr>, part of the first regex, the range of any chars (?s).* with the lazy quantifier ? extends only up to the first occurrence of the string <tr>, which necessarily means that this range cannot contain any <tr> inside!
- Similarly, the regex (?s).*?(?=</tr>) searches for any range of chars, possibly empty, up to the nearest string </tr>, implicitly meaning that this range of chars cannot contain a </tr> string.
- Whereas, when searching for the last <tr> ••••• </tr> block, as our reference is the \z anchor (very end of the current file), we must build up the regex using a kind of back-propagation method:
   - Starting from the very end of the file
   - Moving back through characters without any <tr> string
   - Till a </tr> string
   - Moving back, again, through characters without any <tr> string
   - Till a <tr> string
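The back-propagation steps above can be mirrored procedurally, without a regex, by scanning backwards with Python's `str.rfind`. This is an illustrative sketch (the helper name is hypothetical) and, like the regex, it assumes every <tr> is closed by a </tr>.

```python
def last_block_content(text):
    # From the very end of the file, move back to the last </tr>.
    close = text.rfind("</tr>")
    if close == -1:
        return None
    # The stretch after that </tr> must not contain any <tr>.
    if "<tr>" in text[close:]:
        return None
    # Move back again, to the nearest <tr> before that </tr>.
    opening = text.rfind("<tr>", 0, close)
    if opening == -1:
        return None
    # The block content lies strictly between the two tags.
    return text[opening + len("<tr>"):close]

print(last_block_content("<tr>a</tr><tr>b</tr>\n"))  # b
```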
Of course, I assume that any <tr> is correctly closed by a </tr>!
Test these two regexes against this sample, derived from Peter’s, which contains 4 <tr> •••• </tr> blocks:
<html>
<body>
<table>
<tr>
get rid of stuff, in case of \A anchor, including <embedded/> <tags/>
</tr>
<tr>
keep stuff including <embedded/> <tags/>
</tr>
<tr>
keep stuff including <embedded/> <tags/>
</tr>
<tr>
get rid of stuff, in case of \z anchor, including <embedded/> <tags/>
</tr>
</table>
</body>
</html>
The first regex, with the \A syntax, should replace only the first block, and the last regex, with the \z syntax, should replace the fourth and last <tr> block.
Best Regards,
guy038
P.S.:
@vasile-caraus, note that I am willing, as are probably all the people involved in this discussion, to help you if you have difficulty understanding a specific part of a regex tutorial that you have decided to study. A different perspective will certainly be very useful to you … and to others ;-))
-
Hi, @vasile-caraus, @Terry-R, @alan-kilborn, @peterjones and All,
My God !! Of course, the @terry-r’s regex is just magic and so simple ! Congratulations, Terry ;-)) How could we not think of it ??
If I apply Terry’s concept to the regexes of my previous post, everything becomes crystal clear:
SEARCH (?s-i)\A.*?<tr>\K.*?(?=</tr>) to search (and replace) the first <tr> ••••• </tr> block
SEARCH (?s-i)\A.*<tr>\K.*?(?=</tr>) to search (and replace) the last <tr> ••••• </tr> block
As usual, tick the Regular expression and Wrap around options, and click on the Replace All button, exclusively.
@vasile-caraus, this demonstrates, in a masterful way, that things can be skillfully solved by people other than me and, moreover… by @terry-r!!
Now, @alan-kilborn, you said:
Match the last occurrence of a regular expression RE:
(?s)\A.*(RE).*?\K\1
But, unless I’m mistaken, doesn’t this regex, below, do the same search?
(?s)\A.*\KRE
Best regards,
guy038
-
@guy038 said in Regex: Select only the first instance of search results / first match:
Hi, @vasile-caraus, @Terry-R, @alan-kilborn, @peterjones and All,
My God !! Of course, the @terry-r’s regex is just magic and so simple !
I feel like I’m being rewarded for something I stole (er, borrowed) now. ;-)) All I did was point out the marvellous creation of @PeterJones and how the absence of a single character turns one thing into another.
But hey, I’m happy that collectively we can show there are many answers, all of which work in various ways.
Terry
-
@guy038 said in Regex: Select only the first instance of search results / first match:
But, unless I’m mistaken, doesn’t this regex, below, do the same search ?
(?s)\A.*\KRE
Yes, indeed.
That’s what I get for dabbling in another master’s area! :-)
-
@guy038, thanks a lot!
-
@Vasile-Caraus, the regular expression (?s)\A.*?\K<string(?:.*?)?> helps find the very first occurrence of a string. And if you want to find the first occurrence of a tag, say TAG_2, AFTER the first occurrence of another tag, say TAG_1, my generic regex becomes, as per @guy038:
(?s-i)\A.*?<TAG_1(?: .*?)?>.*?\K<TAG_2(?: .*?)?>
-
On testing the above, I observed that both of the above regular expressions work only for tags or strings that begin with a < and end with a >. So, if you are searching for a string between inverted commas, to find the first such string you should use the regular expression (?s)\A.*?\K"string(?:.*?)?"
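Both generic regexes from these last two posts can be sketched in Python, assuming `re` as a stand-in for Notepad++: the part after `\K` is moved into capture group 1 because `re` has no `\K`. The helper names and the sample strings are hypothetical; "div" and "p" simply stand in for TAG_1 and TAG_2.

```python
import re

def first_after(tag1, tag2, text):
    # Stand-in for (?s-i)\A.*?<TAG_1(?: .*?)?>.*?\K<TAG_2(?: .*?)?>
    pattern = (r"(?s)\A.*?<" + re.escape(tag1) + r"(?: .*?)?>.*?"
               + r"(<" + re.escape(tag2) + r"(?: .*?)?>)")
    m = re.search(pattern, text)
    return m.group(1) if m else None

def first_quoted(word, text):
    # Stand-in for (?s)\A.*?\K"string(?:.*?)?", with "string" replaced by word
    m = re.search(r'(?s)\A.*?("' + re.escape(word) + r'(?:.*?)?")', text)
    return m.group(1) if m else None

html = '<p id="a"><div class="x"><span></span><p id="b"><p id="c">'
print(first_after("div", "p", html))  # <p id="b">

code = 'a = "other"; b = "string two"; c = "string three"'
print(first_quoted("string", code))   # "string two"
```

The (?: .*?)? part requires a space right after the tag name, so a search for <p> does not stop at a <pre> tag.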