why does this parsing not work? Replace the content of html tags between comments section
- 
Hello, I read this topic and the answer of @guy038 over HERE. I have a similar request. I have this HTML code:
<!-- Form Start --> <div class="55df" > <div class="gg44"> <h4 class="header-text">United Romaketh</h4> <img src="https://sensy.com/33.png" alt="Alternate Text" /> </div> <!-- Identify --> <div class="45de"> <div class="col"> <div class="body-text"> <h4>Marcus 33</h4> <p>Can you tell me?</p> </div> <div class="yy"> <!-- Yesd. --> <form action="https://www.gre.com/" method="post" class="similar"> <!-- Stones --> <input type="hidden" name="business" value="erer@gmail.com"> <!-- button simple --> <input type="hidden" name="cmd" value="_buttons"> <!-- contribution --> <input type="hidden" name="item_name" value="Maxim"> <input type="hidden" name="item_number" value="Maxim"> <select name="amount"><option value="3.00">€3.00</option><option value="5.00">€5.00</option><option value="10.00">€10.00</option><option value="25.00">€25.00</option><option value="50.00">€50.00</option></select> <input type="hidden" name="currency" value="DOL"> <!-- Display the button. --> <input class="paypal-img" type="image" src="https://www.pitt.com/hh.gif" border="0" name="submit" title="yy" alt="button" /> </form> </div> </div> <div class="col"> <div class="body-text"> <h4>I am here</h4> <p>My text here</p> </div> <div class="gono"> <form action="https://www.concate.com/donate" method="post" target="_top"> <input type="hidden" name="444" value="7Z7JBUL" /> <input class="r-img" type="image" src="https://www.rer.com/" border="0" name="submit" title="d" alt="sdsd" /> <img alt="" border="0" src="https://www.dd.com/pixel.gif" width="1" height="1" /> </form> </div> </div> </div> </div> <div class="Home Store"> <h4>I am here</h4> <p>my dauther loves me</p> <div class="text-sdsg"> Love Me tender </div> </div> <!-- Form Final -->
The problem: I must replace all <p></p> with <p class="STAR-ONE"></p> in the section from <!-- Form Start --> to <!-- Form Final -->. My regex seems OK, but it does not replace correctly. Instead of <p class="STAR-ONE"></p> it gives me <p><p class="STAR-ONE"></p>. I really don't know where the mistake in my regex is.
Search: (?:.*?<!-- Form Start -->|\G).*?\K(<p>).*?(?=</p>.*?<!-- Form Final -->)
Replace by: \1<p class="STAR-ONE">\3
- 
@Robin-Cruise said in why does this parsing not work? Replace the content of html tags between comments section:
(?:.*?<!-- Form Start -->|\G).*?\K(<p>).*?(?=</p>.*?<!-- Form Final -->)
I found the answer, thanks:
FIND: (?:.*?<!-- Form Start -->|\G).*?\K<p>(.*?)(?=</p>.*?<!-- Form Final -->)
Replace by: \2<p class="STAR-ONE">\1\3
CHECK Wrap around
CHECK Regular expression
CHECK . matches newline
- 
Hello, @robin-cruise and All,
Firstly, I just cannot understand your last regex S/R :
SEARCH (?:.*?<!-- Form Start -->|\G).*?\K<p>(.*?)(?=</p>.*?<!-- Form Final -->)
REPLACE \2<p class="STAR-ONE">\1\3
Indeed, in the replacement, you have three groups, 1, 2 and 3, but your search regex contains only one group, (.*?) !? I verified that, after each replace operation, the groups 2 and 3 are always empty !
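To see why the first attempt produced <p><p class="STAR-ONE"></p>, here is a simplified sketch using Python's re module rather than Notepad++'s Boost engine. It drops the region restriction, \G and \K (which re does not support) and leaves out the \2 and \3 references, because Python raises an error for undefined groups instead of treating them as empty. The sample string comes from the example in this thread; the variable names are only illustrative.

```python
import re

text = "<p>Can you tell me?</p>"

# First attempt: the only group is the <p> tag itself, and the paragraph
# text is matched but never captured, so it is lost in the replacement.
broken = re.sub(r'(<p>).*?(?=</p>)', r'\1<p class="STAR-ONE">', text)

# Corrected idea: capture the paragraph text and write it back after the new tag.
fixed = re.sub(r'<p>(.*?)(?=</p>)', r'<p class="STAR-ONE">\1', text)

print(broken)
print(fixed)
```

In the first call, group 1 holds the literal <p>, so writing \1 back duplicates the tag. In the second call, group 1 holds the text between the tags, which is what the replacement needs to preserve.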
Secondly, you should have expressed your initial goal as : "I have some <p>Some Text</p> zones and I would like to change them into <p class="STAR-ONE">SAME Text</p>". It's a little bit clearer !
Thirdly :
- You could have added the (?s) modifier, at the beginning of your regex, in order to not care about the . matches newline option !
- You could have used the more accurate ^\h* syntax, instead of .*?, at the beginning of the non-capturing group
- You could have added the negative look-around (?!\A), right before the \G assertion. Indeed, as the Wrap around option is set, the regex engine starts the replacement process from the very beginning of file, whatever the current caret location, and the (?!\A) syntax ensures that the regex engine will not use the second alternative \G but will look, instead, for a <!-- Form Start --> line, first !
 
Fourthly, your regex cannot work with, for instance, the text below, where I duplicated your initial example and, in between, I inserted the same section, without the boundaries <!-- Form Start --> and <!-- Form Final --> :
<!-- Form Start --> <div class="55df" > <div class="gg44"> <h4 class="header-text">United Romaketh</h4> <img src="https://sensy.com/33.png" alt="Alternate Text" /> </div> <!-- Identify --> <div class="45de"> <div class="col"> <div class="body-text"> <h4>Marcus 33</h4> <p>Can you tell me?</p> </div> <div class="yy"> <!-- Yesd. --> <form action="https://www.gre.com/" method="post" class="similar"> <!-- Stones --> <input type="hidden" name="business" value="erer@gmail.com"> <!-- button simple --> <input type="hidden" name="cmd" value="_buttons"> <!-- contribution --> <input type="hidden" name="item_name" value="Maxim"> <input type="hidden" name="item_number" value="Maxim"> <select name="amount"><option value="3.00">€3.00</option><option value="5.00">€5.00</option><option value="10.00">€10.00</option><option value="25.00">€25.00</option><option value="50.00">€50.00</option></select> <input type="hidden" name="currency" value="DOL"> <!-- Display the button. 
--> <input class="paypal-img" type="image" src="https://www.pitt.com/hh.gif" border="0" name="submit" title="yy" alt="button" /> </form> </div> </div> <div class="col"> <div class="body-text"> <h4>I am here</h4> <p>My text here</p> </div> <div class="gono"> <form action="https://www.concate.com/donate" method="post" target="_top"> <input type="hidden" name="444" value="7Z7JBUL" /> <input class="r-img" type="image" src="https://www.rer.com/" border="0" name="submit" title="d" alt="sdsd" /> <img alt="" border="0" src="https://www.dd.com/pixel.gif" width="1" height="1" /> </form> </div> </div> </div> </div> <div class="Home Store"> <h4>I am here</h4> <p>my dauther loves me</p> <div class="text-sdsg"> Love Me tender </div> </div> <!-- Form Final --> <!-- ------------------------------------------------------------------------------------------------------------------------------- --> <div class="55df" > <div class="gg44"> <h4 class="header-text">United Romaketh</h4> <img src="https://sensy.com/33.png" alt="Alternate Text" /> </div> <!-- Identify --> <div class="45de"> <div class="col"> <div class="body-text"> <h4>Marcus 33</h4> <p>Can you tell me?</p> </div> <div class="yy"> <!-- Yesd. --> <form action="https://www.gre.com/" method="post" class="similar"> <!-- Stones --> <input type="hidden" name="business" value="erer@gmail.com"> <!-- button simple --> <input type="hidden" name="cmd" value="_buttons"> <!-- contribution --> <input type="hidden" name="item_name" value="Maxim"> <input type="hidden" name="item_number" value="Maxim"> <select name="amount"><option value="3.00">€3.00</option><option value="5.00">€5.00</option><option value="10.00">€10.00</option><option value="25.00">€25.00</option><option value="50.00">€50.00</option></select> <input type="hidden" name="currency" value="DOL"> <!-- Display the button. 
--> <input class="paypal-img" type="image" src="https://www.pitt.com/hh.gif" border="0" name="submit" title="yy" alt="button" /> </form> </div> </div> <div class="col"> <div class="body-text"> <h4>I am here</h4> <p>My text here</p> </div> <div class="gono"> <form action="https://www.concate.com/donate" method="post" target="_top"> <input type="hidden" name="444" value="7Z7JBUL" /> <input class="r-img" type="image" src="https://www.rer.com/" border="0" name="submit" title="d" alt="sdsd" /> <img alt="" border="0" src="https://www.dd.com/pixel.gif" width="1" height="1" /> </form> </div> </div> </div> </div> <div class="Home Store"> <h4>I am here</h4> <p>my dauther loves me</p> <div class="text-sdsg"> Love Me tender </div> </div> <!-- ------------------------------------------------------------------------------------------------------------------------------- --> <!-- Form Start --> <div class="55df" > <div class="gg44"> <h4 class="header-text">United Romaketh</h4> <img src="https://sensy.com/33.png" alt="Alternate Text" /> </div> <!-- Identify --> <div class="45de"> <div class="col"> <div class="body-text"> <h4>Marcus 33</h4> <p>Can you tell me?</p> </div> <div class="yy"> <!-- Yesd. --> <form action="https://www.gre.com/" method="post" class="similar"> <!-- Stones --> <input type="hidden" name="business" value="erer@gmail.com"> <!-- button simple --> <input type="hidden" name="cmd" value="_buttons"> <!-- contribution --> <input type="hidden" name="item_name" value="Maxim"> <input type="hidden" name="item_number" value="Maxim"> <select name="amount"><option value="3.00">€3.00</option><option value="5.00">€5.00</option><option value="10.00">€10.00</option><option value="25.00">€25.00</option><option value="50.00">€50.00</option></select> <input type="hidden" name="currency" value="DOL"> <!-- Display the button. 
--> <input class="paypal-img" type="image" src="https://www.pitt.com/hh.gif" border="0" name="submit" title="yy" alt="button" /> </form> </div> </div> <div class="col"> <div class="body-text"> <h4>I am here</h4> <p>My text here</p> </div> <div class="gono"> <form action="https://www.concate.com/donate" method="post" target="_top"> <input type="hidden" name="444" value="7Z7JBUL" /> <input class="r-img" type="image" src="https://www.rer.com/" border="0" name="submit" title="d" alt="sdsd" /> <img alt="" border="0" src="https://www.dd.com/pixel.gif" width="1" height="1" /> </form> </div> </div> </div> </div> <div class="Home Store"> <h4>I am here</h4> <p>my dauther loves me</p> <div class="text-sdsg"> Love Me tender </div> </div> <!-- Form Final -->
Finally, we should use, from this post, the generic regex S/R, below :
SEARCH (?-i:BSR|(?!\A)\G)(?s-i:(?!ESR).)*?\K(?-i:FR)
REPLACE RR
with :
- BSR ( Begin Search-region Regex ) = ^\h*<!-- Form Start -->
- ESR ( End Search-region Regex ) = ^\h*<!-- Form Final -->
- FR ( Find Regex ) = <p>
- RR ( Replacement Regex ) = <p class="STAR-ONE">
This leads to the effective regex :
- SEARCH (?-i:^\h*<!-- Form Start -->|(?!\A)\G)(?s-i:(?!^\h*<!-- Form Final -->).)*?\K(?-i:<p>)
- REPLACE <p class="STAR-ONE">
But, as the -i modifier is used everywhere and as the (?s) dot is used for a single . only, we can even simplify the S/R as :
- SEARCH (?s-i)(?:^\h*<!-- Form Start -->|(?!\A)\G)(?:(?!^\h*<!-- Form Final -->).)*?\K<p>
- REPLACE <p class="STAR-ONE">
which correctly matches 6 occurrences in my above example ( = 3 x 2 zones <!-- Form Start -->•••••<!-- Form Final --> ) !
Best Regards, guy038
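For readers who want to run the same region-limited replacement outside Notepad++, the following is a minimal sketch in Python. It is not the \G / \K technique above (the standard re module supports neither); instead it matches each whole region and rewrites the <p> tags inside it. The function name tag_paragraphs and the short sample are hypothetical, and the sketch assumes every <!-- Form Start --> has a matching <!-- Form Final -->.

```python
import re

START = "<!-- Form Start -->"
END = "<!-- Form Final -->"

def tag_paragraphs(html: str) -> str:
    """Add class="STAR-ONE" to every bare <p> inside a Start/Final region."""
    # Lazy match from each start marker to the nearest end marker.
    region = re.compile(re.escape(START) + r".*?" + re.escape(END), re.DOTALL)
    # Rewrite only the text of each matched region; the rest is left untouched.
    return region.sub(
        lambda m: m.group(0).replace("<p>", '<p class="STAR-ONE">'), html
    )

sample = (
    "<!-- Form Start --><p>inside</p><!-- Form Final -->\n"
    "<p>outside</p>\n"
    "<!-- Form Start --><p>inside again</p><!-- Form Final -->"
)
result = tag_paragraphs(sample)
print(result)
```

As in the three-part test text above, the paragraph that sits between the two regions is left unchanged.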
- 
- 
Thank you. Also, maybe @guy038 can help me with a similar problem. I have these 4 lines:
<link rel="canonical" href="https://website.com/en/camera.html" />
and
<div class="somers"><a href="https://othersite/fffffon.html" class="flags bg" hreflang="bg" title="bk"></a>
<a href="https://roberta.com/test-lofet.html" class="flags sk" hreflang="sk" title="sk"></a>
<a href="https://cameleon.com/america.html" class="flags uk" hreflang="uk" title="uk"></a>
I want to copy https://website.com/en/camera.html from the canonical tag and replace the 3 links on the other lines with it. My regex changes only the first of the three links, I don't know why :(
Search: (?s)<link rel="canonical" href="(.*?)"\h/>.*?<a href="\K.*?(?="\hclass="flags)
Replace by: \1
The pattern I follow is:
FIND: (?s)PART-A(.*?)PART-B.*?SECOND-A\K.*?(?=SECOND-2)
REPLACE BY: \1
The desired output:
<div class="somers"><a href="https://website.com/en/camera.html" class="flags bg" hreflang="bg" title="bk"></a>
<a href="https://website.com/en/camera.html" class="flags sk" hreflang="sk" title="sk"></a>
<a href="https://website.com/en/camera.html" class="flags uk" hreflang="uk" title="uk"></a>
- 
Hi @robin-cruise,
- Can you specify if the line <link rel="canonical" href="https://website.com/en/camera.html" /> occurs only once, in each HTML file ?
- Does this line always come before the different <a href="•••••••••••••••" class="flags expressions ?
TIA, Cheers guy038
- 
- 
Hello @guy038. Yes, <link rel="canonical" href="https://website.com/en/camera.html" /> occurs only once in each HTML file. And yes, that line always comes before the different <a href="•••••••••••••••" class="flags expressions. The canonical line is near the beginning of the file.
All those <a href="•••••••••••••••" class="flags are at the end of the files, in the footer.
- 
 can you help me @guy038 ? 
- 
 Hello @robin-cruise , Sorry, I spent a lot of time with the @xaviermdq’s problem ! Refer here ! I won’t be long. I’ve already imagined something which should work ! BR guy038 
- 
Hi, @robin-cruise and All,
The general problem is how to modify some lines with an expression ( https://website.com/en/camera.html ), located before these lines ? Somehow, we need to rewrite the address, in the <link••••• /> tag, somewhere after the lines to modify !
So I propose to decompose the problem into two smaller ones :
- Firstly, we store in a comment, at the very end of each HTML file, the address located in the <link ••••• /> tag, with this regex S/R :
SEARCH (?-is)<link rel="canonical" href="(.+?)"(?s).+\K
REPLACE \r\n<!-- \1 -->
- Secondly :
  - We replace the address, in each <a href="••••••••••" class="flags••••••••••></a> tag found, with the stored address in the last comment of the file, at the very end of file
  - We delete this temporary comment, as well
With this regex S/R :
SEARCH (?-is)<a href="\K.*?(?="\h+class="flags(?s).+<!-- (.+) -->\z)|(?-s)<!--.+\z
REPLACE ?1\1
Best Regards, guy038
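The same goal can be sketched in a scripting language, where the canonical address can simply be held in a variable instead of a temporary comment. The Python below is a rough illustration only, not the Notepad++ method above: the function name apply_canonical is hypothetical, and it assumes the tag layout shown in this thread (one canonical <link> per file, and href placed directly before class="flags in each <a> tag).

```python
import re

def apply_canonical(html: str) -> str:
    """Copy the canonical address into every <a href="..." class="flags..."> tag."""
    found = re.search(r'<link rel="canonical" href="([^"]+)"', html)
    if found is None:
        return html  # no canonical tag: leave the file unchanged
    canonical = found.group(1)
    # Keep the text around the address (groups 1 and 2), swap only the address.
    return re.sub(
        r'(<a href=")[^"]*("\s+class="flags)',
        lambda m: m.group(1) + canonical + m.group(2),
        html,
    )

page = (
    '<link rel="canonical" href="https://website.com/en/camera.html" />\n'
    '<div class="somers"><a href="https://othersite/fffffon.html" class="flags bg" hreflang="bg" title="bk"></a>\n'
    '<a href="https://roberta.com/test-lofet.html" class="flags sk" hreflang="sk" title="sk"></a>\n'
)
updated = apply_canonical(page)
print(updated)
```

Building the replacement with a function rather than a template string avoids any special handling of backslashes in the address.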
- 
 thanks @guy038 
- 
I found a solution that works with PowerShell; it replaces every matching link with the address from the canonical link tag:
$sourcedir = "C:\Folder1\"
$resultsdir = "C:\Folder1\"
Get-ChildItem -Path $sourcedir -Filter *.html | ForEach-Object {
    # Read the whole file as a single string
    $content = Get-Content -Path $_.FullName -Raw
    # Extract the address from the canonical link tag
    $replaceValue = (Select-String -InputObject $content -Pattern '(?<=<link rel="canonical" href=").+(?=" />)').Matches.Value
    # Replace every https address ending in .html (the dot is escaped so it matches a literal ".")
    $content = $content -replace 'https://.+\.html',$replaceValue
    # Write the result under the same file name
    Set-Content -Path $resultsdir\$($_.name) $content
}
