Regex: Select only the first instance of search results / first match

guy038

Hi Vasile,

At first, I thought that the regex (?s).*?\KText_1|.*\KText_2 would give you the exact matches that you asked for:

And If I want to match (in the last formulas) the first instance of Text_1 and the last instance of Text_2?

Unfortunately, when using only the search functionality, this regex matches every string Text_1, then the last string Text_2! And I was not able to find the right regex, one that would match, in the current file, the first instance of Text_1, then the last instance of Text_2 :-((

However, the regex (?s).*?\KText_1.*Text_2 allows us to select, in one go, the whole range between these two specific boundaries, boundaries included!
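
As an illustration outside Notepad++, the same first-to-last selection can be sketched with Python's `re` module. Python's `re` has no `\K`, so this sketch simply relies on `re.search` finding the leftmost `Text_1` and on the greedy `.*` running to the last `Text_2`. The sample string is made up for the demonstration.

```python
import re

# Hypothetical sample text with two occurrences of each marker.
sample = "aaa Text_1 bbb Text_2 ccc Text_1 ddd Text_2 eee"

# (?s) lets '.' cross line breaks. The greedy '.*' first runs to the end of
# the text and then backtracks, so the match ends at the LAST 'Text_2'.
match = re.search(r"(?s)Text_1.*Text_2", sample)

selected = match.group(0)
print(selected)
```

Making the quantifier lazy (`.*?`) would instead stop at the first `Text_2` after the first `Text_1`.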

Best Regards,

guy038

Vasile Caraus

hello again. I have many of <tr></tr> tags on a html page. I want to select with regex only the first instance of the <tr> tags. I made a regex, but this formula selects both <tr> tags. I want only the first one, not the second one with Other Code

FIND: \b<tr>[\s\S]+</tr>\b

<tr>
<td class="right">On December 15, 2012, in <a href="https://mywebsite.com/index.html" title="See all articles here" class="external" rel="category tag">Expert-Expert</a>, by Michael Ende</td>
</tr>

and more

<tr>
Other Code
</tr>

Terry R

@Vasile-Caraus said in Regex: Select only the first instance of search results / first match:

FIND: \b<tr>[\s\S]+</tr>\b

The simplest change I can see is to put a ? behind the + character as your regex is greedy. I presume it is currently going to the last </tr> in the file.

Also, as far as I can see, the \s\S combination means every character, including CR and LF ones. The whole thing could be rewritten as (?s)\b<tr>.+?</tf>\b.
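
The greedy versus lazy difference described here can be sketched in Python's `re` module. This is only an illustration under assumptions: the two-row sample is hypothetical, and the `\b` anchors are left out on purpose so that the sketch isolates the effect of the `?` after `+`.

```python
import re

# Hypothetical two-row sample, modelled on the HTML in the question.
html = "<tr>\nfirst row\n</tr>\n\nand more\n\n<tr>\nOther Code\n</tr>\n"

# Greedy: '+' takes as much as it can, so the match runs to the last </tr>.
greedy = re.search(r"<tr>[\s\S]+</tr>", html).group(0)

# Lazy: '+?' takes as little as it can, so the match stops at the first </tr>.
lazy = re.search(r"<tr>[\s\S]+?</tr>", html).group(0)

# With (?s), '.' also matches line breaks, so it can replace [\s\S].
dotall = re.search(r"(?s)<tr>.+?</tr>", html).group(0)

print(greedy.count("</tr>"), lazy.count("</tr>"), dotall == lazy)
```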

I’m not on a PC to currently check my answer so apologies if I have it slightly wrong.

Terry

Vasile Caraus

@Terry-R said in Regex: Select only the first instance of search results / first match:

(?s)\b<tr>.+?</tf>\b

your (?s)\b<tr>.+?</tr>\b is not working :(

I also tried something different, also not working :( (?:^(?ms)(<tr>).*?(</tr>))

Vasile Caraus

I also tried another combination, not working (?-s)(\b(?!^<tr>(.+)</tr>)

:((

<tr>
<td class="right">On December 15, 2012, in <a href="https://mywebsite.com/index.html" title="See all articles here" class="external" rel="category tag">Expert-Expert</a>, by Michael Ende</td>
</tr>

code

<tr>
Other Code
</tr>

Terry R

@Vasile-Caraus said in Regex: Select only the first instance of search results / first match:

your (?s)\b<tr>.+?</tr>\b is not working :(

What does it do? Does it select anything? And sorry for the typo with the tf, which I see you caught.
The (?s) is necessary to cross lines. As you had \b I also included them although they could both be removed as a test.

Terry

Vasile Caraus

I believe only @guy038 can find a good answer :)

Alan Kilborn

@Vasile-Caraus said in Regex: Select only the first instance of search results / first match:

I believe only @guy038 can find a good answer

Nice kick in the teeth for Terry, who was trying to help you…
:-) notwithstanding.

Vasile Caraus

@Terry-R I want to select everything from <tr> to </tr> but only one instance, the first instance, because I have many tags starting with <tr> and close with </tr>

Alan Kilborn

@Vasile-Caraus

Wouldn’t (?s)\A.*?<tr>.*?</tr> work to get only the first?
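
A rough Python sketch of why the `\A` anchor limits the search to one match. The sample text is hypothetical, and Python's `re` is used only as a stand-in for the Notepad++ engine. Since `\A` can only succeed at the very start of the text, even an "all matches" loop finds a single hit.

```python
import re

# Hypothetical sample with two rows.
html = "<table>\n<tr>\nfirst\n</tr>\n<tr>\nsecond\n</tr>\n</table>\n"

# Without \A, every row matches.
all_rows = re.findall(r"(?s)<tr>.*?</tr>", html)

# With \A, the match must begin at the start of the text, so only one
# match (ending at the first </tr>) is possible.
anchored = [m.group(0) for m in re.finditer(r"(?s)\A.*?<tr>.*?</tr>", html)]

print(len(all_rows), len(anchored))
```

Note that the anchored match also includes whatever precedes the first `<tr>`, because `.*?` has to bridge the gap from the start of the text.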

PeterJones

@Vasile-Caraus ,

I would normally say that you should have started a new topic, rather than reviving one from 4.5 years ago. But since 4.5 years later, you still haven’t learned the lesson that Guy taught you in 2016, maybe you should be in the same single topic – but it would be better to actually learn from the dozens of different regular expressions that we have provided for you over the last 4.5 years. This forum is not a regular expression help forum – this is a Notepad++ discussion forum, where regex are only a small part of the power of Notepad++.

In your new question, you state ,

I have many of <tr></tr> tags on a html page.

That phrasing, in English, implies you have only one HTML page you are doing this to. If that’s really the case, then it’s really simple: take the simple regex you guessed, and hit FIND once and REPLACE once, and you are done. Or you could have just gone to the beginning of the document, and done a single search for <tr> and then manually done the replacement, which is even easier.

But I doubt that’s your real situation. The only reason it would make sense to ask this question is if you were really doing a Find In Files > Replace All, in order to make this single change in multiple HTML files.

As Guy explained in 2016, if you only want to replace one instance per file (the first instance), you can do that by consuming the rest of the file in the single regex. That will work for small files, but if your files are too large, it will not work, because regex has only a certain amount of capture memory.
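
A minimal Python sketch of the "consume the rest of the file" idea. This is a paraphrase for illustration, not the exact regex from 2016, and the sample text is made up. Because the first match swallows everything up to the end of the text, a replace-all operation has nothing left to match a second time.

```python
import re

# Hypothetical one-line sample with two rows.
html = "a<tr>1</tr>b<tr>2</tr>c"

# Group 1 captures everything after the first row, right up to the end
# of the text, and the replacement puts it back unchanged.
pattern = re.compile(r"(?s)<tr>.*?</tr>(.*)")
result = pattern.sub(r"<tr>NEW</tr>\1", html)

print(result)
```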

Fortunately, since then, assuming you have updated Notepad++, the developers have fixed the \A anchor, so the beginning-of-file check works – as @Alan-Kilborn showed in his recent reply.

I will modify his (and the regex I was going to supply, which consumed all), to give an example replacement.

If I have the simple file:

<html><body>
<table>
<tr>
get rid of stuff including <embedded/> <tags/>
</tr>
<tr>
keep stuff including <embedded/> <tags/>
</tr>
</table>
</body>
</html>

and I run

FIND = (?s)\A.*?<tr>\s*\K.*?(\s*</tr>)
REPLACE = new contents$1
MODE = regular expression
REPLACE ALL

then I get

<html><body>
<table>
<tr>
new contents
</tr>
<tr>
keep stuff including <embedded/> <tags/>
</tr>
</table>
</body>
</html>

This should work, even on long files. It should work the same if you’re using Find in Files instead of the single-file Replace dialog.

Compared to @Alan-Kilborn’s regex, I added the feature that it uses the \K reset to automatically keep everything up to the first <tr>. I also kept the spaces after the <tr> and the spaces before the </tr> (where “spaces” include any space character, even newlines), so that way if you have <tr>blah</tr> all on one line, your replacement will stay all on one line, but if you have the three-line version like you showed, it will stay as three lines.
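
For readers who want to try the whitespace-preserving behaviour outside Notepad++, here is a rough Python sketch. Python's `re` does not support `\K`, so a capture group stands in for "everything before `\K`" and is written back by the replacement. The function name `replace_first_row` is hypothetical.

```python
import re

def replace_first_row(html: str, new_contents: str) -> str:
    """Replace the contents of the first <tr>...</tr>, keeping its whitespace."""
    # Group 1 stands in for the text before \K (up to and including the
    # whitespace after <tr>); group 2 is the whitespace plus closing tag.
    pattern = r"(?s)\A(.*?<tr>\s*).*?(\s*</tr>)"
    return re.sub(
        pattern,
        lambda m: m.group(1) + new_contents + m.group(2),
        html,
    )

one_line = replace_first_row("<tr>blah</tr>", "new contents")
three_lines = replace_first_row("<tr>\nblah\n</tr>", "new contents")

print(one_line)
print(three_lines)
```

A function is used as the replacement so that backslashes in `new_contents` are not interpreted as group references.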

Because I captured the final spaces and </tr>, I had to include $1 in the replacement to re-instate that part of the text. But it could have been done with positive lookahead instead, meaning you wouldn’t need the group text in the replacement. TIMTOWTDI.
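
The lookahead alternative might look roughly like the following Python sketch. It is an assumption about what such a variant could be, not a regex taken from the post. In Python a leading capture group is still needed because `re` lacks `\K`, but the closing tag no longer has to be captured and re-inserted.

```python
import re

# Hypothetical two-row sample.
html = "<tr>\nget rid of stuff\n</tr>\n<tr>\nkeep stuff\n</tr>\n"

# The lookahead (?=...) only checks that whitespace and </tr> follow;
# it consumes nothing, so that text is never part of the match.
pattern = r"(?s)\A(.*?<tr>\s*).*?(?=\s*</tr>)"
result = re.sub(pattern, r"\1new contents", html)

print(result)
```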

Now that we’ve given you an answer that works for the situation you described, please take the following advice: Please remember that this isn’t a “give me a regex forum”. This isn’t even paid support, where we are obligated to help you. This is a community to discuss Notepad++; we will answer the occasional Notepad++-related regex question, especially for new users who have never been exposed to regular expressions. But we expect people who have been around the forum for many years to participate in helping others, not just in getting free regex creation service. Please learn from the four and a half years of regular expression advice we have been providing you. Many, many times, we have linked you to the regular expression documentation. I will give you my boilerplate one more time, just in case you missed it the last however many times I’ve posted it here. But please understand that if you continue to show a disregard for our previous advice, and if you continue to just “request” that we craft regex for you, rather than truly participating in the forum, you will find fewer and fewer here who are willing to help you, and you might start noticing downvotes on your questions.

---

Do you want regex search/replace help? Then please be patient and polite, show some effort, and be willing to learn; answer questions and requests for clarification that are made of you. All example text should be marked as literal text using the </> toolbar button or manual Markdown syntax. To make regex in red (and so they keep their special characters like *), use backticks, like `^.*?blah.*?\z`. Screenshots can be pasted from the clipboard to your post using Ctrl+V to show graphical items, but any text should be included as literal text in your post so we can easily copy/paste your data. Show the data you have and the text you want to get from that data; include examples of things that should match and be transformed, and things that don’t match and should be left alone; show edge cases and make sure your examples are as varied as your real data. Show the regex you already tried, and why you thought it should work; tell us what’s wrong with what you do get. Read the official NPP Searching / Regex docs and the forum’s Regular Expression FAQ. If you follow these guidelines, you’re much more likely to get helpful replies that solve your problem in the shortest number of tries.

Vasile Caraus

@PeterJones said in Regex: Select only the first instance of search results / first match:

<tr>
get rid of stuff including <embedded/> <tags/>
</tr>

how about for the last instance? I should use \z, shouldn’t I?

Like this: (?s)\z.*?<tr>\s*\K.*?(\s*</tr>)

but this does not select the last instance. What did I do wrong?

PeterJones

@Vasile-Caraus ,

Think about the order of events. If \z means the end of the file, then \z.*?<tr> means match zero or more characters after the end of the file followed by <tr>. You cannot have zero or more characters followed by <tr> after the end of the file.
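
The same point can be checked with a small Python sketch. One assumption to note: Python's `re` module spells the absolute end-of-text anchor as `\Z`, whereas the Notepad++ regex in the question uses `\z`. The sample text is hypothetical.

```python
import re

# Hypothetical two-row sample.
html = "<tr>\nfirst\n</tr>\n<tr>\nlast\n</tr>\n"

# Once the end-of-text anchor has matched, there are no characters left,
# so nothing that requires a following '<tr>' can succeed.
after_end = re.search(r"(?s)\Z.*?<tr>", html)

# The anchor on its own does match, but only as an empty match at the end.
end_only = re.search(r"\Z", html)

print(after_end, end_only.start() == len(html))
```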

Vasile Caraus

@PeterJones then how would it be correct?

PeterJones

@Vasile-Caraus ,

You’ve already gotten the freebie from me on this question. You need to put more thought and effort into it.

Did you know, when I am helping anyone in this forum by coming up with a regular expression for their problem, that I don’t automatically know the solution off the top of my head? Do you know what I do? I break the problem down into little pieces, then translate each of those pieces into regex syntax, then I try the regex out; if it doesn’t work, I try to figure out why, and see if I can tweak my initial guess until it does work.

This same process will work for you, if you give it a try. I will even give you a boost, by stating your problem in the little pieces I would initially try:

find <tr> to </tr>, without any contained <tr>
followed by anything that’s not a <tr>
until the end of the file
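
One possible translation of these three pieces into a regex, sketched in Python rather than Notepad++. This is only an illustration and not necessarily the expression the post had in mind. The names `NO_TR` and `LAST_ROW` are hypothetical, and Python's `\Z` is used for the end of the text.

```python
import re

# "Any run of characters in which no new <tr> begins."
NO_TR = r"(?:(?!<tr>).)*"

# Piece 1: <tr> to </tr> without any contained <tr> (lazy, captured).
# Piece 2: followed by anything that is not a <tr>.
# Piece 3: until the end of the text.
LAST_ROW = re.compile(r"(?s)(<tr>" + NO_TR + r"?</tr>)" + NO_TR + r"\Z")

# Hypothetical sample with two rows.
html = "<table>\n<tr>\nfirst\n</tr>\n<tr>\nlast\n</tr>\n</table>\n"

match = LAST_ROW.search(html)
last_row = match.group(1)
print(last_row)
```

The search cannot succeed at an earlier `<tr>`, because the text after that row still contains another `<tr>`, which piece 2 refuses to cross.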

Also, although it’s not highly Notepad++ related: if you are working on a large website, where you’re going to be frequently changing the template (I assume you are changing content in boilerplate that surrounds your HTML), then I highly recommend a system with a templating language, often implemented as a CMS (content management system), so that you don’t have to be modifying the same code multiple times. In the long run, that will be more efficient than coming up with a new regex for every template change.

I hope you take to heart the advice I’ve given you here.

Vasile Caraus

@PeterJones it is easy for a developer to understand code, but is not easy for a painter to understand the code… :)

Alan Kilborn

@Vasile-Caraus

it is easy for a developer to understand code, but is not easy for a painter to understand the code…

If you can’t stand the heat, get out of the kitchen.

Q: What’s the meaning of the phrase ‘If you can’t stand the heat, get out of the kitchen’?

A: Don’t persist with a task if the pressure of it is too much for you. The implication being that, if you can’t cope, you should leave the work to someone who can.

It’s getting embarrassing for you.
It’s almost an uncomfortable thing to witness.
:-(

PeterJones

@Vasile-Caraus said in Regex: Select only the first instance of search results / first match:

@PeterJones it is easy for a developer to understand code, but is not easy for a painter to understand the code… :)

The whole point of CMS is to make it easier for painters to make a website, almost literally. If you cannot learn the CMS, then you should probably hire an expert.

Conversely, to your statement, if I go to my painter friend and ask him to paint me a picture to hang on the wall, he might do it for free one time, but after that, he’ll tell me “either learn to paint yourself, or pay me”.

You know enough about regex to come up with some guesses, so you already know a lot of the code, and you have been pointed to the documentation to be able to learn more. Try to piece together the bits you know in the right order for the problem you have. I have given you hints as to the right order. But at some point, you’re going to have to choose to learn, or find someone you can pay to do the development for your website.

Because expecting us to be your personal free regular-expression writers for 4.5 years, without ever contributing anything else to this forum, is … Never mind.

Good luck.

Terry R

@Vasile-Caraus said in Regex: Select only the first instance of search results / first match:

your (?s)\b<tr>.+?</tr>\b is not working :(

Now that I’m on a PC I can see the original post (recent) from you stating your regex was “finding both <tr>…</tr>” was most likely false. Copying the examples I see that by using the \b, it actually prevented the regex from capturing the example text.
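
The effect of `\b` here can be reproduced with a short Python sketch. The sample is hypothetical and Python's `re` stands in for the Notepad++ engine, although word boundaries behave the same way in both for this case.

```python
import re

# Hypothetical sample: the row starts at the beginning of a line.
html = "<tr>\nOther Code\n</tr>\n"

# \b needs a word character on exactly one side. '<' is not a word
# character, and neither is the line break (or start of text) before it,
# so \b can never succeed immediately before this <tr>.
with_boundaries = re.search(r"(?s)\b<tr>.+?</tr>\b", html)
without_boundaries = re.search(r"(?s)<tr>.+?</tr>", html)

print(with_boundaries, without_boundaries is not None)
```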

So my first solution to add a ? to your regex would have fixed the issue IF your regex was actually capturing too much. My later statement to remove the \b was in fact a correct move and would/should have resolved the issue. Except your statement about capturing the “first” only “<tr>…</tr>” was a bit troubling as you didn’t have the \A anchor as @PeterJones stated was needed. I assumed (wrongly possibly) that you possibly had a large html file in which you wanted to find (and replace?) the first “<tr>…</tr>” and if using ONLY the “Find Next” or “Replace” button when cursor was at start of an open file would have found the first set.

I guess I have learnt a valuable lesson: don’t provide answers when not able to independently confirm the OP’s statements. I was on an Android tablet late in the evening and about to go to bed when I saw this post and assumed it was an easy fix given the statement about it selecting too much text.

Terry

PS thanks Alan for coming to my rescue ;-))