Regex: Add html tags in the lines that doesn't have html tags

Robin Cruise

I have this paragraph. It also contains one line, I need someone to take me home., that doesn’t have HTML tags. I need to find this line (and no others) and wrap it in <p class="mb-40px"> and </p> tags.

<!-- START -->

<p class="mb-40px">I may go to cinema</p>

I need someone to take me home.

<p class="mb-40px">I can love you now</p>

<!-- FINAL -->

OUTPUT:

<!-- START -->

<p class="mb-40px">I may go to cinema</p>

<p class="mb-40px">I need someone to take me home.</p>

<p class="mb-40px">I can love you now</p>

<!-- FINAL -->

I don’t know why my regex doesn’t work.

FIND: ^(?!)(.*?)((?!).)*$

REPLACE: \2\

Robin Cruise

OK, so I believe I took a step forward. It seems to work.

FIND: ^(?!)(([a-zA-Z-].+))((?!).)*$

REPLACE BY: \2

Now I have to restrict this regex to the section between <!-- START --> and <!-- FINAL -->.

I will use this generic formula:

(?s)(?-i:REGION-START.+?">|\G(?!^))((?!REGION-FINAL).)*?\KFIND REGEX

will become:

FIND: (?s)(?-i:<\!-- START -->.+?">|\G(?!^))((?!<\!-- FINAL -->).)*?\K^(?!)(([a-zA-Z-].+))((?!).)*$

REPLACE: \2

In this case, it does not work very well. Something is wrong with this final regex. Maybe @guy038 has a better idea.

guy038

Hello @robin-cruise and All,

No need to use the generic formula !

Here is my general method :

From beginning of current line, I try to find a line which does not contain :
- A string <!-- START --> at any position of current line
 AND
- A string <!-- FINAL --> at any position of current line
 AND
 (
- A tag <p class="mb-40px"> at any position of current line
 OR
- A tag </p> at any position of current line
 )
Then I select all characters, of current line, which come :
- After a possible <p class="mb-40px"> tag
- Before a possible </p> tag

So, given this INPUT text, below, with 3 lines to change :

<!-- START -->

<p class="mb-40px">I may go to cinema</p>

I need someone to take me home.

<p class="mb-40px">I may go to cinema</p>

I need someone to take me home.</p>

<p class="mb-40px">I may go to cinema</p>

<p class="mb-40px">I need someone to take me home.

<p class="mb-40px">I can love you now</p>

<!-- FINAL -->

I use the following regex S/R :

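SEARCH (?-is)^(?!.*<!-- START -->)(?!.*<!-- FINAL -->)(?:(?!.*<p class="mb-40px">)|(?!.*</p>))(?:<p class="mb-40px">)?(?|(.+)</p>|(.+))

REPLACE <p class="mb-40px">\1</p>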

And, after a click on the Replace All button, I get the expected OUTPUT text :

<!-- START -->

<p class="mb-40px">I may go to cinema</p>

<p class="mb-40px">I need someone to take me home.</p>

<p class="mb-40px">I may go to cinema</p>

<p class="mb-40px">I need someone to take me home.</p>

<p class="mb-40px">I may go to cinema</p>

<p class="mb-40px">I need someone to take me home.</p>

<p class="mb-40px">I can love you now</p>

<!-- FINAL -->
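
For anyone who would rather script this than run it inside Notepad++, here is a minimal Python sketch of the same line-by-line idea. It is only an approximation: Python's re module has no (?|...) branch reset group, so two ordinary groups are used and the replacement picks whichever one matched. The function name wrap_untagged_lines is hypothetical, the tag and boundary strings are taken from the sample text above, and LF line endings are assumed.

```python
import re

OPEN_TAG = '<p class="mb-40px">'
CLOSE_TAG = "</p>"

# Assumption: boundaries and tags are exactly the strings from the sample text.
LINE_RE = re.compile(
    r'^(?!.*<!-- START -->)(?!.*<!-- FINAL -->)'   # skip the two boundary lines
    r'(?:(?!.*<p class="mb-40px">)|(?!.*</p>))'    # at least one tag is missing
    r'(?:<p class="mb-40px">)?'                    # skip an opening tag if present
    r'(?:(.+)</p>|(.+))',                          # text before </p>, or rest of line
    re.MULTILINE,
)


def wrap_untagged_lines(text):
    """Wrap every line that lacks the opening and/or closing tag."""
    def fix(match):
        # Exactly one of the two groups is set for any match.
        body = match.group(1) or match.group(2)
        return OPEN_TAG + body + CLOSE_TAG

    return LINE_RE.sub(fix, text)
```

Empty lines are left alone because .+ needs at least one character, and lines that already carry both tags fail the look-aheads at the start of the line.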

Notes :

First, after the usual modifiers, the boundaries which must not be matched (?!.*<!-- START -->)(?!.*<!-- FINAL -->)
Then, either, each tag which must not be matched, within a non-capturing group and the alternative (?:(?!.*<p class="mb-40px">)|(?!.*</p>))
Now, after a possible (?:<p class="mb-40px">)?, in a non-capturing group, too, the regex selects, either :
- All chars before the </p> tag
 OR
- All remaining chars of current line

Remark :

Note the special syntax of this non-capturing group (?|(.+)</p>|(.+)). This allows all groups to be defined at the same level. Thus, you just need the \1 syntax in the replacement part
If I had used a normal non-capturing group (?:(.+)</p>|(.+)), two groups 1 and 2 would have been defined ! So the correct replacement regex would have been <p class="mb-40px">\1\2</p>, as these two groups are mutually exclusive !
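
To make the "mutually exclusive groups" point concrete, here is a small Python illustration. Python's re module does not support the (?|...) branch reset syntax, so this only demonstrates the normal non-capturing group case described above; the helper name captured_groups and the sample strings are hypothetical.

```python
import re

# Normal non-capturing group: two separate capture groups, 1 and 2.
PATTERN = re.compile(r'(?:(.+)</p>|(.+))')


def captured_groups(line):
    """Return the (group 1, group 2) pair captured for a whole line."""
    return PATTERN.fullmatch(line).groups()


# Only one group is ever set, the other one is None:
#   captured_groups("home.</p>")  -> ('home.', None)
#   captured_groups("home.")      -> (None, 'home.')

# That is why \1\2 works in the replacement: the unset group contributes
# an empty string (Python 3.5 and later).
wrapped = PATTERN.sub(r'<p class="mb-40px">\1\2</p>', "home.")
```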

Best Regards,

guy038

Robin Cruise

@guy038 super, thanks.

What should the generic regex be in this case? (I cannot figure out the last part.)

(?-is)^(?!.*REGION-START)(?!.*REGION-FINAL)(?:(?!.*<p class="mb-40px">)|(?!.*</p>))(?:<p class="mb-40px">)?(?|(.+)</p>|(.+))

guy038

Hi, @robin-cruise,

You cannot use the generic regex, discussed in the topic :

https://community.notepad-plus-plus.org/topic/22690/generic-regex-replacing-in-a-specific-zone-of-text

in order to solve your present goal. Why ?

Well, because that generic regex supposes :

First, to match a BSR region, followed with any range of chars, possibly null, different from the ESR region, and, after a \K feature, match the FR region
Then, match, from current caret position, any range of chars, possibly null, different from the ESR region, and, after a \K feature, match the FR region

But, in your present case, the INPUT lines to modify, like I need someone to take me home., do not contain the BSR and/or the ESR region. So, how do you think to get these absent regions, in the search regex ??

Best Regards,

guy038

Robin Cruise

@guy038

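SEARCH: (?-is)^(?!.*<!-- START -->)(?!.*<!-- FINAL -->)(?:(?!.*<p class="mb-40px">)|(?!.*</p>))(?:<p class="mb-40px">)?(?|(.+)</p>|(.+))

REPLACE: <p class="mb-40px">\1</p>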

Your regex seems to be very good, except for one thing: if I also have the code below on my HTML pages, it gets changed too.

So, I only need to change the lines in the section between the START and FINAL markers.

<html lang="en">
<head>
  <!-- Meta Tags -->
  <meta charset="utf-8"/>

<script type="application/ld+json">
{
  "@context": "https://schema.org/", 
  "@type": "Product", 
  "name": "10 media farces of big days",
  "image": "icon.jpg",
  "description": "horses of Letea Delta Danube successfully saved,",
  "brand": {
    "@type": "Brand",
    "name": "something"
  },
  "sku": "NFL",
  "gtin8": "NFL",
  "offers": {
    "@type": "Offer",
    "url": "https://something.html",
    "priceCurrency": "RON",
    "price": "0",
    "priceValidUntil": "2022-02-15",
    "availability": "https://schema.org/OnlineOnly"
  },
  "aggregateRating": {
    "@type": "AggregateRating",
    "ratingValue": "5",
    "bestRating": "5",
    "ratingCount": "6"
  },
  "review": {
    "@type": "Review",
    "reviewRating": {
      "@type": "Rating",
      "ratingValue": "5",
      "bestRating": "5"
    },
    "author": {"@type": "Person", "name": "omehing"},
    "publisher": {"@type": "Organization", "name": "omehing"}
  }
}
</script>

guy038

Hi, @robin-cruise,

Once and for all, Robin, please, post a complete / exact file, which represents all your data that you need to change !

We cannot work this way, in the future, if you do not provide real examples because regex things are very close to real text !

BR

guy038

Robin Cruise

@guy038

yes, but also I cannot copy/paste the entire html page. It is a very large html code.

guy038

Hi, @Robin-cruise

If you don’t mind, just send me your file by e-mail !

Here is my temporary mail address :

BR

guy038

guy038

Hello @robin-cruise and All,

Ah… OK. Thanks for your attached HTML file with your mail. It’s always easier with a real example ;-))

Now, as you just have one <!-- ARTICOL START -->.......<!-- ARTICOL FINAL --> zone in your HTML file, the simple thing to do is :

In search, to look for :
- Any char from the very start of file till the complete <!-- ARTICOL START --> line
OR
- Any char from the <!-- ARTICOL FINAL --> line till the very end of your file
OR ( Scan of lines between the <!-- ARTICOL START --> and <!-- ARTICOL FINAL --> boundaries )
- A possible <p class="mb-40px"> tag, beginning the current line
- Followed with a single-line range of characters :
 - Till a </p> tag, ending the current line
- OR
 - Till the end of current line
In replacement, to rewrite :
- ( If scan within the <!-- ARTICOL START -->.........<!-- ARTICOL FINAL --> zone, so when the group 2 is defined )
 - First, the <p class="mb-40px"> tag, if absent in the INPUT file ( group 1 not defined )
 - Then all the contents of current line ( $0 )
 - And, finally, the </p> tag, if absent in the INPUT file ( group 3 not defined )
OR
- The two ranges of chars, before the <!-- ARTICOL START -->, included and after the <!-- ARTICOL FINAL --> boundaries ( which occur when the group 2 is not defined )

For instance, from this INPUT file, below :

<!DOCTYPE html>
....
bla bla
....
blah bla

<!-- ARTICOL START -->

<p class="mb-40px">I need someone to take me home.</p>

I need someone to take me home.

I need someone to take me home.</p>

<p class="mb-40px">I need someone to take me home.

<!-- ARTICOL FINAL -->

bla bla
....
blah bla
....
</html>

The following regex S/R :

SEARCH (?s-i)^.+<!-- ARTICOL START -->\R|<!-- ARTICOL FINAL -->.+|(?-s)^(<p class="mb-40px">)?(?|(.+)(</p>)|(.+))$

REPLACE ?2(?1:<p class="mb-40px">)$0(?3:</p>):$0

Should give you the expected results :

<!DOCTYPE html>
....
bla bla
....
blah bla

<!-- ARTICOL START -->

<p class="mb-40px">I need someone to take me home.</p>

<p class="mb-40px">I need someone to take me home.</p>

<p class="mb-40px">I need someone to take me home.</p>

<p class="mb-40px">I need someone to take me home.</p>

<!-- ARTICOL FINAL -->

bla bla
....
blah bla
....
</html>

The message Replace All: 6 occurrences were replaced is displayed in the status bar :

One for the part between <!DOCTYPE html> and <!-- ARTICOL START -->
One for each non-empty line between <!-- ARTICOL START --> and <!-- ARTICOL FINAL --> ( 4 lines )
One for the part between <!-- ARTICOL FINAL --> and the very end of file

Note that this final solution does not need any look-ahead structure nor the \G syntax or other goodies !!
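
If the same job has to be done outside Notepad++, the three-way alternation can be approximated in Python as sketched below. This is a rough equivalent, not the Notepad++ regex itself: Python's re module has neither the (?|...) branch reset group nor conditional replacement strings, so a replacement function decides what to add. The names ZONE_RE and wrap_lines_in_zone are hypothetical, and the sketch assumes LF line endings and that both <!-- ARTICOL START --> and <!-- ARTICOL FINAL --> occur once in the file.

```python
import re

OPEN_TAG = '<p class="mb-40px">'
CLOSE_TAG = "</p>"

ZONE_RE = re.compile(
    r'(?s:\A.+<!-- ARTICOL START -->\n)'     # everything up to the START line
    r'|(?s:<!-- ARTICOL FINAL -->.+)'        # the FINAL line and everything after
    r'|^(<p class="mb-40px">)?'              # group 1: optional opening tag
    r'(?:(.+)(</p>)|(.+))$',                 # groups 2+3: text and </p>, or group 4: text
    re.MULTILINE,
)


def wrap_lines_in_zone(text):
    """Return (new_text, number_of_matches), like Replace All would."""
    def fix(match):
        if match.group(2) is None and match.group(4) is None:
            return match.group(0)            # head or tail of the file: unchanged
        prefix = "" if match.group(1) else OPEN_TAG
        suffix = "" if match.group(3) else CLOSE_TAG
        return prefix + match.group(0) + suffix

    return ZONE_RE.subn(fix, text)
```

On the INPUT file above, the returned count should correspond to the six matches listed: the head, the four non-empty lines of the zone, and the tail. If the START marker were missing, the first alternative would never match and every non-empty line before FINAL would get wrapped, so it is worth checking for the markers first.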

Best Regards,

guy038

Robin Cruise

@guy038 said in Regex: Add html tags in the lines that doesn't have html tags:

SEARCH (?s-i)^.+<!-- ARTICOL START -->\R|<!-- ARTICOL FINAL -->.+|(?-s)^(<p class="mb-40px">)?(?|(.+)(</p>)|(.+))$
REPLACE ?2(?1:<p class="mb-40px">)$0(?3:</p>):$0

great answer, thank you @guy038