Hello, @hellena-crainicu and All,
Before giving, in the second part of this post, the specific solution for @hellena-crainicu, here is a simple example to show you the difficulties I had to face !
Let’s start with this text :
---000000---DEF---
START---12345---6789---1111199999---DEF---STOP
---GHI---000
00000---
START---123---PQR---456---789STOP
---00000000---00---GHI---0000---
START987---AAA---654
---ZZZ---321---STOP
---0000---000000000
And let’s suppose that we want to rewrite all numbers, between the boundaries START and STOP, each on a new line
If, in addition, we want to add a line-break after the START opening boundary, we need to slightly modify the generic regex discussed before by @peterjones, as we now search for two independent strings simultaneously. This leads to the regex S/R :
(A) SEARCH (?sx-i)(?: (START) | (?!\A)\G ) (?: (?!STOP). )*? (\d+ (\R)? )
(A) REPLACE (?1\1\r\n\r\n)\2(?3:\r\n)
which would change the initial text into :
---000000---DEF---
START

12345
6789
1111199999
---DEF---STOP
---GHI---000
00000---
START

123
456
789
STOP
---00000000---00---GHI---0000---
START

987
654
321
---STOP
---0000---000000000
As you can see :
The START boundary is clearly defined
The different numbers, located between START and STOP are correctly rewritten one per line and extra stuff is deleted
However, between the last number and the closing boundary STOP, some extra characters are still not deleted :-(
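To check this behaviour outside Notepad++, here is a minimal Python sketch of the S/R (A). It is only an approximation, under stated assumptions : Python's re module has neither \G nor \R, so the hypothetical helper rewrite_a chains the matches with Pattern.match(text, pos), which succeeds only exactly at pos, and \R is written \r?\n. It also emits \n instead of \r\n and supposes that at least one number follows each START.

```python
import re

# Continuation part of regex (A): lazily skip chars that do not begin STOP,
# then capture a number and an optional line-break.
# Assumption: \R is approximated by \r?\n (Python's `re` has no \R).
STEP_A = re.compile(r"(?s)(?:(?!STOP).)*?(\d+(\r?\n)?)")


def rewrite_a(text):
    """Hypothetical helper imitating S/R (A); not Notepad++'s own engine."""
    out, pos = [], 0
    while (start := text.find("START", pos)) != -1:
        # Text before START is copied as is; START gets the two
        # line-breaks of the (?1\1\r\n\r\n) part of the replacement.
        out.append(text[pos:start] + "START\n\n")
        pos = start + len("START")
        # match() succeeds only exactly at pos: this plays the role of \G.
        while (m := STEP_A.match(text, pos)):
            number, eol = m.groups()
            # \2(?3:\r\n): add a line-break only if none was captured.
            out.append(number if eol else number + "\n")
            pos = m.end()
    out.append(text[pos:])
    return "".join(out)


line = "START---12345---6789---1111199999---DEF---STOP\n"
print(rewrite_a(line))
```

The chain stops after 1111199999 because the step needs a number and finds none before STOP, which is why the ---DEF--- part survives.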
No problem, we may modify this S/R so that it also searches for STOP, as a second alternative inside a non-capturing group, giving :
(B) SEARCH (?sx-i)(?: (START) | (?!\A)\G ) (?: (?!STOP). )*? (?: ( \d+ (\R)? ) | (STOP) )
(B) REPLACE (?1\1\r\n\r\n)(?2\2(?3:\r\n))(?4\r\n\4)
And we will take the opportunity to add a line-break, right before the closing section STOP
Thus, we obtain :
---000000---DEF---
START

12345
6789
1111199999

STOP000
00000
123
456
789

STOP00000000
00
0000
987
654
321

STOP0000
000000000
Unfortunately, the runs of 0 digits are also processed like the other numbers, although they are not part of a START ••••• STOP region, and every START boundary after the first one is swallowed :-((
Indeed, after a match ending with STOP, the next search begins at that very location, so the \G alternative succeeds again and, as the (?s) modifier lets the dot match line-breaks too, the regex goes on through the following lines ! So, how to tell the regex engine to jump directly to the next START boundary ?
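This overrun can be observed with a small Python sketch, given as a hedged illustration only. Assumptions : the short sample string is made up for the demonstration, \R is written \r?\n, and the \G anchor is imitated by Pattern.match(text, pos), as Python's re module does not provide it.

```python
import re

# Continuation part of regex (B): a number OR the closing STOP.
# Assumption: \R is approximated by \r?\n (Python's `re` has no \R).
STEP_B = re.compile(r"(?s)(?:(?!STOP).)*?(?:(\d+(\r?\n)?)|(STOP))")

# Made-up sample: two regions separated by a line that should stay untouched.
text = "START---123STOP\n---GHI---000\nSTART---456STOP\n"

pos = text.index("START") + len("START")
consumed = []
# match() anchored at pos stands in for \G: each step must begin
# exactly where the previous one ended.
while (m := STEP_B.match(text, pos)):
    consumed.append(m.group(0))
    pos = m.end()

for chunk in consumed:
    print(repr(chunk))
```

Nothing breaks the chain after the first STOP : the third step consumes the outside line up to 000, and the fourth one goes through the second START as if it were ordinary text.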
I had the idea to search only for the beginning of the STOP string, for instance the string ST, and to add a negative look-ahead (?!OP), tested only once, right after the START string or the location of the previous match
So :
First, the extra chars before STOP, together with the string ST, are replaced with the string \r\nST
Now, the regex engine is located right before the OP string of the word STOP. However, due to the look-ahead (?!OP), no match can begin at this location : the engine has to advance by at least one position for the condition (?!OP) to be true. But, as such a match would not start where the previous match ended, the \G assertion fails there !
Thus, the string OP and the text after it are not modified, and the next match necessarily begins with another START string, i.e. the beginning of another allowed region !
(C) SEARCH (?sx-i)(?: (START) | (?!\A)\G ) (?!OP) (?: (?!STOP). )*? (?: ( \d+ (\R)? ) | (ST) )
(C) REPLACE (?1\1\r\n\r\n)(?2\2(?3:\r\n))(?4\r\n\4)
After replacement, we get :
---000000---DEF---
START

12345
6789
1111199999

STOP
---GHI---000
00000---
START

123
456
789

STOP
---00000000---00---GHI---0000---
START

987
654
321

STOP
---0000---000000000
This time, it is easy to see that the parts of text :
Before the first START boundary
After a STOP boundary and before a START boundary
After the last STOP boundary
are not modified at all by the replacement, as expected !
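For those who want to experiment outside Notepad++, the S/R (C) can be imitated with the following Python sketch. It is an approximation under stated assumptions, not the real engine : \G is replaced by Pattern.match(text, pos), \R by \r?\n, the output uses \n instead of \r\n, and the helper name rewrite_c is hypothetical.

```python
import re

TEXT = """\
---000000---DEF---
START---12345---6789---1111199999---DEF---STOP
---GHI---000
00000---
START---123---PQR---456---789STOP
---00000000---00---GHI---0000---
START987---AAA---654
---ZZZ---321---STOP
---0000---000000000
"""

# Continuation part of regex (C): (?!OP) first, then a number OR the string ST.
# Assumption: \R is approximated by \r?\n (Python's `re` has no \R).
STEP_C = re.compile(r"(?s)(?!OP)(?:(?!STOP).)*?(?:(\d+(\r?\n)?)|(ST))")


def rewrite_c(text):
    """Hypothetical helper imitating S/R (C); assumes well-formed regions."""
    out, pos = [], 0
    while (start := text.find("START", pos)) != -1:
        out.append(text[pos:start] + "START\n\n")      # (?1\1\r\n\r\n)
        pos = start + len("START")
        # match() anchored at pos stands in for \G. Right after ST the
        # look-ahead (?!OP) fails, so the chain ends by itself.
        while (m := STEP_C.match(text, pos)):
            number, eol, st = m.groups()
            if number is not None:
                out.append(number if eol else number + "\n")  # (?2\2(?3:\r\n))
            else:
                out.append("\n" + st)                         # (?4\r\n\4)
            pos = m.end()
    out.append(text[pos:])
    return "".join(out)


print(rewrite_c(TEXT))
```

In this sketch, the loop leaves each region exactly as the regex does : once ST is rewritten, the position lies right before OP, the step cannot match there and the search resumes at the next START only.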
Now, @hellena-crainicu, as promised, here is the regex S/R to achieve what you want :
SEARCH (?s)(?:^\h*(<!-- MAIN START -->)(?:\h*\R)+|(?!\A)\G)(?!->)(?:(?!<!-- MAIN FINAL -->).)*?(?:^\h*(<p class=".+?</p>(\R)?)|^(?:\h*\R)*\h*(<!-- MAIN FINAL -))
REPLACE (?1\1\r\n\r\n)\2(?3:\r\n)\4
You may test it against this sample text, below, containing two sections <!-- MAIN START --> ••••• <!-- MAIN FINAL -->, embedded into three other sections !
<div align="center">
<table width="33" border="0">
<tr>
<td>
<h1 class="tre" itemprop="sfe">Text here</h1>
</td>
</tr>
<tr>
<td class="rest">Something, by Author</td>
</tr>
</table>
<h2 class="blast2"><img src="sfa.jpg" alt="hip" />
<map name="goon" id="m2_34">
<p class="my_2">I love myself</p>
<area shape="rect" coords="45,74,582" href="#plata" alt="" />
</map>
</h2>
<p class="my_2">Why this text text?</p>
<p class="my_3">test text text</p>
<p class="my_2">test text text</p>
<p class="my_3">test text text</p>
</div>
<p align="justify" class="justify_em">Yes</p>
<!-- MAIN START -->
<div align="center">
<table width="33" border="0">
<tr>
<td>
<h1 class="tre" itemprop="sfe">Text here</h1>
</td>
</tr>
<tr>
<td class="rest">Something, by Author</td>
</tr>
</table>
<h2 class="blast2"><img src="sfa.jpg" alt="hip" />
<map name="goon" id="m2_34">
<p class="my_2">I love myself</p>
<area shape="rect" coords="45,74,582" href="#plata" alt="" />
</map>
</h2>
<p class="my_2">Why this text text?</p>
<p class="my_3">test text text</p>
<p class="my_2">test text text</p>
<p class="my_3">test text text</p>
</div>
<p align="justify" class="justify_em">Yes</p>
<!-- MAIN FINAL -->
<div align="center">
<table width="33" border="0">
<tr>
<td>
<h1 class="tre" itemprop="sfe">Text here</h1>
</td>
</tr>
<tr>
<td class="rest">Something, by Author</td>
</tr>
</table>
<h2 class="blast2"><img src="sfa.jpg" alt="hip" />
<map name="goon" id="m2_34">
<p class="my_2">I love myself</p>
<area shape="rect" coords="45,74,582" href="#plata" alt="" />
</map>
</h2>
<p class="my_2">Why this text text?</p>
<p class="my_3">test text text</p>
<p class="my_2">test text text</p>
<p class="my_3">test text text</p>
</div>
<p align="justify" class="justify_em">Yes</p>
<!-- MAIN START -->
<div align="center">
<table width="33" border="0">
<tr>
<td>
<h1 class="tre" itemprop="sfe">Text here</h1>
</td>
</tr>
<tr>
<td class="rest">Something, by Author</td>
</tr>
</table>
<h2 class="blast2"><img src="sfa.jpg" alt="hip" />
<map name="goon" id="m2_34">
<p class="my_2">I love myself</p>
<area shape="rect" coords="45,74,582" href="#plata" alt="" />
</map>
</h2>
<p class="my_2">Why this text text?</p>
<p class="my_3">test text text</p>
<p class="my_2">test text text</p>
<p class="my_3">test text text</p>
</div>
<p align="justify" class="justify_em">Yes</p>
<!-- MAIN FINAL -->
<div align="center">
<table width="33" border="0">
<tr>
<td>
<h1 class="tre" itemprop="sfe">Text here</h1>
</td>
</tr>
<tr>
<td class="rest">Something, by Author</td>
</tr>
</table>
<h2 class="blast2"><img src="sfa.jpg" alt="hip" />
<map name="goon" id="m2_34">
<p class="my_2">I love myself</p>
<area shape="rect" coords="45,74,582" href="#plata" alt="" />
</map>
</h2>
<p class="my_2">Why this text text?</p>
<p class="my_3">test text text</p>
<p class="my_2">test text text</p>
<p class="my_3">test text text</p>
</div>
<p align="justify" class="justify_em">Yes</p>
You should get the expected text :
<div align="center">
<table width="33" border="0">
<tr>
<td>
<h1 class="tre" itemprop="sfe">Text here</h1>
</td>
</tr>
<tr>
<td class="rest">Something, by Author</td>
</tr>
</table>
<h2 class="blast2"><img src="sfa.jpg" alt="hip" />
<map name="goon" id="m2_34">
<p class="my_2">I love myself</p>
<area shape="rect" coords="45,74,582" href="#plata" alt="" />
</map>
</h2>
<p class="my_2">Why this text text?</p>
<p class="my_3">test text text</p>
<p class="my_2">test text text</p>
<p class="my_3">test text text</p>
</div>
<p align="justify" class="justify_em">Yes</p>
<!-- MAIN START -->

<p class="my_2">I love myself</p>
<p class="my_2">Why this text text?</p>
<p class="my_3">test text text</p>
<p class="my_2">test text text</p>
<p class="my_3">test text text</p>

<!-- MAIN FINAL -->
<div align="center">
<table width="33" border="0">
<tr>
<td>
<h1 class="tre" itemprop="sfe">Text here</h1>
</td>
</tr>
<tr>
<td class="rest">Something, by Author</td>
</tr>
</table>
<h2 class="blast2"><img src="sfa.jpg" alt="hip" />
<map name="goon" id="m2_34">
<p class="my_2">I love myself</p>
<area shape="rect" coords="45,74,582" href="#plata" alt="" />
</map>
</h2>
<p class="my_2">Why this text text?</p>
<p class="my_3">test text text</p>
<p class="my_2">test text text</p>
<p class="my_3">test text text</p>
</div>
<p align="justify" class="justify_em">Yes</p>
<!-- MAIN START -->

<p class="my_2">I love myself</p>
<p class="my_2">Why this text text?</p>
<p class="my_3">test text text</p>
<p class="my_2">test text text</p>
<p class="my_3">test text text</p>

<!-- MAIN FINAL -->
<div align="center">
<table width="33" border="0">
<tr>
<td>
<h1 class="tre" itemprop="sfe">Text here</h1>
</td>
</tr>
<tr>
<td class="rest">Something, by Author</td>
</tr>
</table>
<h2 class="blast2"><img src="sfa.jpg" alt="hip" />
<map name="goon" id="m2_34">
<p class="my_2">I love myself</p>
<area shape="rect" coords="45,74,582" href="#plata" alt="" />
</map>
</h2>
<p class="my_2">Why this text text?</p>
<p class="my_3">test text text</p>
<p class="my_2">test text text</p>
<p class="my_3">test text text</p>
</div>
<p align="justify" class="justify_em">Yes</p>
Using the free-spacing mode, the search regex can be re-expressed as :
(?xs-i) # FREE-SPACING mode, regex DOT match ANY character and search is SENSITIVE to CASE
(?: # START of the 1st NON-CAPTURING group
^\h* # Any LEADING BLANK characters, followed with ...
(<!--[ ]MAIN[ ]START[ ]-->) # The string '<!-- MAIN START -->', STORED as group 1
(?:\h*\R)+ # And followed with BLANK or EMPTY lines, in a NON-CAPTURING group
| # OR
(?!\A)\G # The EMPTY location RIGHT AFTER a previous MATCH
) # END of the 1st NON-CAPTURING group
(?!->) # If the TWO NEXT chars are DIFFERENT from the string '->'
(?: # START of the 2nd NON-CAPTURING group
(?!<!--[ ]MAIN[ ]FINAL[ ]-->). # If CURRENT character is NOT the BEGINNING of the string '<!-- MAIN FINAL -->'
) # END of the 2nd NON-CAPTURING group
*? # The SHORTEST, possibly EMPTY, range of ANY character, till... : See •, below
(?: # START of the 3rd NON-CAPTURING group
^\h* # • Any LEADING BLANK characters, followed with ...
( # START of group 2
<p[ ]class=".+?</p> # The SHORTEST, NON EMPTY, range of characters between the strings '<p class="' and '</p>'
(\R)? # And followed with an OPTIONAL line-break, STORED as group 3
) # END of group 2
| # OR
^(?:\h*\R)*\h* # • An OPTIONAL range of BLANK or EMPTY lines, followed with OPTIONAL HORIZONTAL BLANK chars
(<!--[ ]MAIN[ ]FINAL[ ]-) # And followed with the string '<!-- MAIN FINAL -', STORED as group 4
) # END of the 3rd NON-CAPTURING group
Notes :
In this mode, any literal space char must be escaped with the \ character or written [ ] !
Following the same method as described previously, we only search for the string <!-- MAIN FINAL -, i.e. the closing boundary without its last two chars ->, and these two chars are placed in the negative look-ahead (?!->)
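As a cross-check outside Notepad++, the same method can be sketched in Python. This is a hedged approximation and not the regex engine of Notepad++ : \G is imitated by Pattern.match(text, pos), \h and \R are written [ \t] and \r?\n, line-breaks are emitted as \n, the reduced sample is made up from lines of the test text, and the helper name keep_paragraphs is hypothetical.

```python
import re

START_TAG = "<!-- MAIN START -->"

# Opening alternative: the START comment plus the blank or empty lines after it.
OPEN = re.compile(r"^[ \t]*<!-- MAIN START -->(?:[ \t]*\r?\n)+", re.M)

# Continuation: (?!->), lazy skip up to FINAL, then a <p class=...> line
# OR the string '<!-- MAIN FINAL -' (closing boundary minus its last two chars).
STEP = re.compile(
    r'(?!->)(?:(?!<!-- MAIN FINAL -->).)*?'
    r'(?:^[ \t]*(<p class=".+?</p>(\r?\n)?)'
    r'|^(?:[ \t]*\r?\n)*[ \t]*(<!-- MAIN FINAL -))',
    re.M | re.S,
)


def keep_paragraphs(text):
    """Hypothetical helper; assumes every START has a matching FINAL."""
    out, pos = [], 0
    while (opening := OPEN.search(text, pos)):
        out.append(text[pos:opening.start()] + START_TAG + "\n\n")  # (?1\1\r\n\r\n)
        pos = opening.end()
        # match() anchored at pos stands in for \G; it fails right before '->'.
        while (m := STEP.match(text, pos)):
            para, eol, final = m.groups()
            # \2(?3:\r\n)\4 : paragraph, line-break if none captured, FINAL part.
            out.append((para or "") + ("" if eol else "\n") + (final or ""))
            pos = m.end()
    out.append(text[pos:])
    return "".join(out)


SAMPLE = """\
<p class="my_1">outside, kept</p>
<!-- MAIN START -->
<div align="center">
<p class="my_2">I love myself</p>
<area shape="rect" />
<p class="my_3">test text text</p>
</div>
<p align="justify" class="justify_em">Yes</p>
<!-- MAIN FINAL -->
<p class="my_2">outside, kept too</p>
"""

print(keep_paragraphs(SAMPLE))
```

Only the <p class="..."> lines located between the two comments are kept, and the paragraphs outside the section are left alone, because the chain of anchored matches cannot continue past the string <!-- MAIN FINAL -.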
Best Regards,
guy038