Match everything except the text and <br> tags

guy038

Ah, of course: if you add a <div class="left"> line right after the first <div style="..... line, the previous regex will not work!

So, given this INPUT text, pasted in a new tab:

<div style="margin-bottom:-15px;width: 100%;background-color:#EBF4FB;">
<div class="left">
<p class=MsoNormal><b><span style='font-size:13.5pt;line-height:115%;
    font-family:"Verdana","sans-serif";color:red'>SYNONYMS </span></b>
</p>
<div class="left">

Simply replace the previous search regex with this new version:

(?s)\A.+?\R\s*\K<div style="margin-bottom:-15px;width: 100%;background-color:#EBF4FB;">\s*.*?\s*<div class="left">

Note the difference: between #EBF4FB;">\s* and \s*<div class="left">, I changed the .+? part to .*?

I also slightly changed the position of the \K feature.

As expected, this new regex will match the two consecutive lines:

<div style="margin-bottom:-15px;width: 100%;background-color:#EBF4FB;">
<div class="left">

BR

guy038

dr ramaanand

@guy038 This RegEx: (?s)\A.+?\R\K\s*<div style="margin-bottom:-15px;width: 100%;background-color:#EBF4FB;">.+(?=\s*+<div class="left">) would have stopped searching just before the second occurrence of <div class="left"> if the sample to be searched was like this:-

<div style="margin-bottom:-15px;width: 100%;background-color:#EBF4FB;">
<div class="left">
<div class="left">

guy038

@dr-ramaanand,

Yes, your regex does match the same amount of text as my version, but my regex seems simpler and more logical!

BR

guy038

dr ramaanand

@guy038 Okay, thank you very much!

dr ramaanand

@guy038 your last RegEx finds the first occurrence of <div class="left"> even if there is some other text above it. Lovely!

guy038

Hi, @dr-ramaanand and All,

Again, I did not check all the possibilities before posting. Sorry for the NOISE !

So, the right regex to use should be :

(?s)\A.*?\s*\K<div style="margin-bottom:-15px;width: 100%;background-color:#EBF4FB;">\s*.*?\s*<div class="left">

This time, it will work if you paste this text in a new tab:

<div style="margin-bottom:-15px;width: 100%;background-color:#EBF4FB;">
<div class="left">
<div class="left">

But it will also work if you paste the following text in a new tab:


First non-blank line
second line

Third line before the block to match

<div style="margin-bottom:-15px;width: 100%;background-color:#EBF4FB;">
<div class="left">
<div class="left">
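For readers who want to try the same idea outside Notepad++, here is a minimal Python sketch. It is an adaptation, not the regex above verbatim: Python's re module does not support \K, so the part to report is wrapped in a capture group instead. The names STYLE_DIV, LEFT_DIV and find_block are made up for this illustration.

```python
import re

# Literal tags taken from the sample text (names are hypothetical helpers).
STYLE_DIV = '<div style="margin-bottom:-15px;width: 100%;background-color:#EBF4FB;">'
LEFT_DIV = '<div class="left">'

# Python's re has no \K, so a capture group marks the text to report.
# re.DOTALL plays the role of the (?s) modifier.
pattern = re.compile(
    r"\A.*?\s*(" + re.escape(STYLE_DIV) + r"\s*.*?\s*" + re.escape(LEFT_DIV) + r")",
    re.DOTALL,
)

def find_block(text):
    """Return the text from the first style div up to the first left div after it."""
    m = pattern.search(text)
    return m.group(1) if m else None

sample = (
    "\nFirst non-blank line\nsecond line\n\n"
    "Third line before the block to match\n\n"
    + STYLE_DIV + "\n" + LEFT_DIV + "\n" + LEFT_DIV + "\n"
)

print(find_block(sample))
```

On this sample, the function returns the style div line followed by the first <div class="left"> line only; the second <div class="left"> is left out because .*? is lazy.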

Best Regards,

guy038

dr ramaanand

@guy038 I am not sure if I am allowed to do it (as the solution was provided by you), so I am requesting you to post the last Regular Expression you provided with the sample to be edited with a new heading, “How to find the first occurrence of a tag ?” so that people can search and find it online. Thank you!

guy038

Hello, @dr-ramaanand and All,

You said in your previous post :

… so I am requesting you to post the last Regular Expression you provided with the sample to be edited with a new heading, “How to find the first occurrence of a tag ?” so that people can search and find it online. Thank you!

But, actually, my regex finds the first occurrence of the <div class="left"> tag, AFTER a first occurrence of the <div style="margin-bottom:-15px;width: 100%;background-color:#EBF4FB;"> tag !

So, to my mind, the correct way to match the first occurrence of a specific tag in the current file is to use the generic regex:

(?s-i)\A.*?\K<TAG Name(?: .*?)?>

Just replace the generic TAG Name value with a valid HTML tag name.

Note that, for the comment tag, you should replace the generic TAG Name in the above regex with the literal string !--.*?--

Similarly, the correct way to match the last occurrence of a specific tag in the current file is to use the generic regex:

(?s-i)\A.*\K<TAG Name(?: .*?)?>
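The lazy versus greedy difference between these two regexes can be sketched in Python. This is only an illustration under assumptions: Python's re lacks \K, so a capture group stands in for it, and the helper names first_tag and last_tag as well as the sample HTML are invented here.

```python
import re

def first_tag(html, name):
    # Lazy .*? stops at the earliest position where the tag can match.
    m = re.search(r"\A.*?(<" + re.escape(name) + r"(?: .*?)?>)", html, re.DOTALL)
    return m.group(1) if m else None

def last_tag(html, name):
    # Greedy .* grabs everything, then backtracks to the last possible tag.
    m = re.search(r"\A.*(<" + re.escape(name) + r"(?: .*?)?>)", html, re.DOTALL)
    return m.group(1) if m else None

# Hypothetical sample, not taken from the forum page.
html = '<p class="a">one</p>\n<p>two</p>\n<p id="z">three</p>'

print(first_tag(html, "p"))  # <p class="a">
print(last_tag(html, "p"))   # <p id="z">
```

Python's matching is case-sensitive by default, which corresponds to the -i part of the (?s-i) modifiers.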

BR

guy038

dr ramaanand

@guy038 said in Match everything except the text and <br> tags:

(?s-i)\A.*\K<TAG Name(?: .*?)?>

I think that that should be (?s-i)\A.*\K<TAG Name(?:.*?)?> with no spaces anywhere in the middle

guy038

Hi, @dr-ramaanand and All,

In order to use a valid INPUT text to do some tests, just open the main page of our forum. Then hit the Ctrl + U shortcut to open the HTML source page of our forum and paste its contents in a new tab

My generic regex tries to match the syntax <TAG......, till the nearest > character and must be valid for any kind of tag.

Thus, I prefer to insert a space char to verify that the tag is a valid one. Indeed, this regex will match either a bare tag like <head> or a tag with attributes, for example the <span style="color:blue"> tag in <span style="color:blue">blue</span>

If you replace the TAG Name in the generic regex (?s-i)\A.*?\K<TAG Name(?: .*?)?>, which matches the first tag named TAG in the current file, with the tag names from these examples, you get the regexes:

(?s-i)\A.*?\K<head(?: .*?)?>
(?s-i)\A.*?\K<span(?: .*?)?>

Just test them against the HTML source code of our forum.
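To see what the space char buys, here is a small Python comparison of the two variants. It is a sketch under assumptions: a capture group replaces \K (which Python's re does not have), and the two-line HTML sample is artificial, written only to put a <header> tag before a <head> tag.

```python
import re

# Variant with the space: the tag name must be followed by a space or by ">".
with_space = re.compile(r"\A.*?(<head(?: .*?)?>)", re.DOTALL)
# Variant without the space: anything may follow the tag name.
without_space = re.compile(r"\A.*?(<head(?:.*?)?>)", re.DOTALL)

# Artificial sample: a <header> tag placed before a <head> tag.
html = '<header class="top">\n<head lang="en">\n'

print(with_space.search(html).group(1))     # <head lang="en">
print(without_space.search(html).group(1))  # <header class="top">
```

Without the space, a search for head is fooled by any tag whose name merely starts with head.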

Now, let’s suppose, for example, that you want to find the first <input ...> tag AFTER the first <img ...> tag in the HTML source code of our forum:

Then, from my previous post, you would have to use the following regex :

(?s-i)\A.*?<img(?: .*?)?>.*?\K<input(?: .*?)?>

which matches, as expected, the following line :

<input autocomplete="off" type="text" class="form-control hidden" name="term" placeholder="Search"/>
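The same chained search can be tried in Python as a rough sketch. Again, a capture group stands in for \K, and the four-line HTML sample below is hypothetical: only the matched <input ...> line is copied from the forum source quoted above, the other lines are invented to show that an earlier <input> is skipped.

```python
import re

# First <input ...> tag located after the first <img ...> tag.
pattern = re.compile(r"\A.*?<img(?: .*?)?>.*?(<input(?: .*?)?>)", re.DOTALL)

# Hypothetical sample: one input before the image, two after it.
html = (
    '<input type="hidden" name="early"/>\n'
    '<img src="logo.png" alt="logo"/>\n'
    '<input autocomplete="off" type="text" class="form-control hidden" '
    'name="term" placeholder="Search"/>\n'
    '<input type="submit"/>\n'
)

m = pattern.search(html)
result = m.group(1) if m else None
print(result)
```

The input placed before the image is ignored because the regex must first consume an <img ...> tag, and the lazy .*? then stops at the nearest following <input ...> tag.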

BR

guy038

P.S. : You also replied in an old post, regarding this extra space char. However, I’ll not reply because this topic is old and not exactly related to the present discussion !

dr ramaanand

@guy038 Okay, thank you!