<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[Generic Regex : How to use the couple of &quot;Backtracking Control&quot; verbs (*SKIP)(*FAIL) or (*SKIP)(*F) in regexes]]></title><description><![CDATA[<p dir="auto">Hi, <strong>All</strong>,</p>
<p dir="auto">In this post, I will present the <strong><code>(*SKIP)(*FAIL)</code></strong> ( or <strong><code>(*SKIP)(*F)</code></strong> ) <strong>powerful</strong> feature, used within regexes and made of these two <strong>Backtracking Control</strong> verbs.</p>
<p dir="auto">Some <strong>general</strong> considerations :</p>
<p dir="auto"><strong>Backtracking control</strong> verbs can be described as <strong>zero-width</strong> assertions, which consume no text and are absolutely <strong>invisible</strong> as long as the regex engine moves <strong>forward</strong> in the <strong>current</strong> regex pattern.</p>
<p dir="auto">Now, if we consider the <strong>general</strong> pattern <strong><code>Regex_A(*VERB)Regex_B</code></strong>, note that <strong>backtracking</strong> is allowed, both, inside the <strong><code>Regex_A</code></strong> pattern and/or inside the <strong><code>Regex_B</code></strong> pattern. However, the regex engine <strong>cannot</strong> backtrack from the <strong><code>Regex_B</code></strong> part to the <strong><code>Regex_A</code></strong> part, crossing <strong>backward</strong> the <strong><code>(*VERB)</code></strong> assertion !</p>
<p dir="auto">I will describe <strong>six</strong> different examples, using the <strong><code>(*SKIP)(*FAIL)</code></strong> functionality. And, regarding the <strong>first</strong> example, I’ll also show the <strong>difference</strong> introduced if we <strong>omit</strong>, on purpose, either the <strong><code>(*SKIP)</code></strong> or the <strong><code>(*FAIL)</code></strong> <strong>Backtracking Control</strong> verb.</p>
<p dir="auto">The associated <strong>generic</strong> regex is :</p>
<p dir="auto"><strong>Regex A<code>(*SKIP)(*F)|</code>Regex B</strong></p>
<p dir="auto">From the different examples, below, you should be able to <strong>guess</strong> the contents of, both, the <strong><code>Regex A</code></strong> and <strong><code>Regex B</code></strong> parts !</p>
<hr />
<p dir="auto">The <strong>first</strong> example is <strong>derived</strong> from this article on the <strong><code>rexegg.com</code></strong> site :</p>
<p dir="auto"><a href="https://rexegg.com/backtracking-control-verbs.php#skipfail" rel="nofollow ugc">https://rexegg.com/backtracking-control-verbs.php#skipfail</a></p>
<p dir="auto">Suppose that we want to match <strong>single</strong> words, as long as they are <em>NOT</em> <strong>embedded</strong> between <strong>two curly</strong> braces <strong><code>{...}</code></strong>, and surround them with two <strong>slashes</strong>. We can use this S/R :</p>
<ul>
<li>
<p dir="auto">FIND <strong><code>{\w+}(*SKIP)(*FAIL)|\w+</code></strong></p>
</li>
<li>
<p dir="auto">REPLACE <strong><code>/$0/</code></strong></p>
</li>
</ul>
<p dir="auto">Against this <em>INPUT</em> text :</p>
<pre><code class="language-diff">This {is} a simple text to {see} how this {regex} works
</code></pre>
<p dir="auto">This S/R would return the following <em>OUTPUT</em> text :</p>
<pre><code class="language-diff">/This/ {is} /a/ /simple/ /text/ /to/ {see} /how/ /this/ {regex} /works/
</code></pre>
<p dir="auto">Here is how this <strong>pattern</strong> works :</p>
<ul>
<li>
<p dir="auto">First, the regex part <strong><code>{\w+}</code></strong> tries to match any <strong>single</strong> word between a set of <strong>curly braces</strong>.</p>
</li>
<li>
<p dir="auto">As the first word <strong><code>This</code></strong> is not within <strong>curly</strong> braces, it skips the <strong>left</strong> branch of the <strong><code>|</code></strong> alternation and, obviously, matches the <strong>right</strong> branch <strong><code>\w+</code></strong>  =&gt; After replacement, we get the string <strong><code>/This/</code></strong>.</p>
</li>
<li>
<p dir="auto">Now, the regex engine position is <strong>right</strong> after the <strong><code>/</code></strong>. As the <strong>space</strong> character cannot match any branch of the regex, the regex engine advances to the <strong>next</strong> position, right before the string <strong><code>{is}</code></strong>.</p>
</li>
<li>
<p dir="auto">Again, it tries to match the <strong>left</strong> branch <strong><code>{\w+}</code></strong> which, indeed, is the <strong>successful</strong> match <strong><code>{is}</code></strong>.</p>
</li>
<li>
<p dir="auto">But it encounters the <strong><code>(*SKIP)</code></strong> verb which <strong>always</strong> matches and, then, the <strong><code>(*FAIL)</code></strong> verb which <strong>always</strong> fails.</p>
</li>
<li>
<p dir="auto">So, the combination of the two <strong>backtracking control</strong> verbs <strong><code>(*SKIP)(*FAIL)</code></strong>  :</p>
<ul>
<li>
<p dir="auto">Does <strong>not</strong> allow any <strong>backtracking</strong> process</p>
</li>
<li>
<p dir="auto"><strong>Cancels</strong> the <strong>current</strong> match <strong><code>{is}</code></strong>, so far =&gt; The contents <strong><code>{is}</code></strong>, that we want to avoid, has been <strong>skipped</strong>.</p>
</li>
<li>
<p dir="auto"><strong>Resets</strong> the regex engine position to the location, in the text, reached when <strong><code>(*SKIP)</code></strong> was encountered, instead of just advancing by one character.</p>
</li>
</ul>
</li>
<li>
<p dir="auto">So, the position of the regex engine is now <strong>right</strong> after the <strong>closing</strong> curly brace of the string <strong><code>{is}</code></strong></p>
</li>
<li>
<p dir="auto">Again, the next <strong>space</strong> character <strong>cannot</strong> be matched at all and the regex engine advances to the <strong>next</strong> position right before the article <strong><code>a</code></strong></p>
</li>
<li>
<p dir="auto">As this word <strong><code>a</code></strong> is not within <strong>curly</strong> braces, the regex engine skips the <strong>left</strong> branch of the <strong><code>|</code></strong> alternation and, obviously, matches the <strong>right</strong> branch <strong><code>\w+</code></strong>  =&gt; After replacement, we get the string <strong><code>/a/</code></strong></p>
</li>
</ul>
<p dir="auto">And so on …</p>
<hr />
<p dir="auto">Note that the <strong>only</strong> possibility for the pattern to <strong>succeed</strong>, against the example text, is that the <strong>left</strong> branch search <strong>fails</strong> before <strong><code>(*SKIP)</code></strong>, in order to allow the <strong>right</strong> branch to be tested. That is the case, for instance, at the <strong>very beginning</strong> against the first word <strong><code>This</code></strong>.</p>
<p dir="auto">But, as soon as the <strong>left</strong> branch can be <strong>matched</strong>, the combination <strong><code>(*SKIP)(*FAIL)</code></strong> just <strong>discards</strong> the match so far and <strong>restarts</strong> the regex engine at the <strong>position</strong>, within the text to examine, reached when <strong><code>(*SKIP)</code></strong> was encountered.</p>
<hr />
<p dir="auto">Note also that, generally, the <strong><code>(*SKIP)(*FAIL)</code></strong> syntax is not <strong>mandatory</strong>. We could have achieved the <strong>same</strong> result with any of the <strong>three</strong> following S/R :</p>
<ul>
<li>
<p dir="auto">FIND <strong><code>(?&lt;=[^{])\w++(?=[^}])</code></strong></p>
</li>
<li>
<p dir="auto">REPLACE <strong><code>/$0/</code></strong></p>
</li>
</ul>
<p dir="auto">Or :</p>
<ul>
<li>
<p dir="auto">FIND  <strong><code>{\w+}|(\w+)</code></strong></p>
</li>
<li>
<p dir="auto">REPLACE  <strong><code>?1/$0/:$0</code></strong></p>
</li>
</ul>
<p dir="auto">Or :</p>
<ul>
<li>
<p dir="auto">FIND  <strong><code>(?:{\w+}.*?)?\K\w+</code></strong></p>
</li>
<li>
<p dir="auto">REPLACE <strong><code>/$0/</code></strong></p>
</li>
</ul>
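<p dir="auto">The <strong>three</strong> S/R, above, <strong>do</strong> give the <strong>same</strong> <em>OUTPUT</em> text as the regex with the <strong><code>(*SKIP)(*FAIL)</code></strong> syntax ( for the <strong>first</strong> one, as long as the first word is <em>NOT</em> at the very beginning of the file and the last word is <em>NOT</em> at the very end, because each look-around needs a <strong>neighbouring</strong> character to test ), i.e. :</p>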
<pre><code class="language-diff">/This/ {is} /a/ /simple/ /text/ /to/ {see} /how/ /this/ {regex} /works/
</code></pre>
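<p dir="auto">As a side note, the second S/R above is easy to try outside Notepad++. Below is a minimal Python sketch of it, assuming the standard <code>re</code> module, which does <em>NOT</em> support backtracking control verbs. The names <code>PATTERN</code> and <code>wrap_unbraced_words</code> are hypothetical and only serve the illustration :</p>
<pre><code class="language-python">import re

# Emulation of  {\w+}(*SKIP)(*FAIL)|\w+  for an engine without backtracking verbs :
# the left branch consumes what must be skipped, the right branch captures
# what must be kept (group 1).
PATTERN = re.compile(r'\{\w+\}|(\w+)')

def wrap_unbraced_words(text):
    """Surround with slashes every word that is not enclosed in {...}."""
    def repl(m):
        # Group 1 is set only when the right branch matched
        if m.group(1) is not None:
            return '/' + m.group(1) + '/'
        return m.group(0)   # braced word : rewritten unchanged
    return PATTERN.sub(repl, text)

print(wrap_unbraced_words('This {is} a simple text to {see} how this {regex} works'))
</code></pre>
<p dir="auto">The callback plays the role of the conditional replacement <code>?1/$0/:$0</code> : a braced word is matched, but written back as is.</p>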
<hr />
<p dir="auto">Now, let’s go back to our <strong>first</strong> syntax and get rid of the <strong><code>(*SKIP)</code></strong> <strong>backtracking control</strong> verb :</p>
<ul>
<li>
<p dir="auto">FIND <strong><code>{\w+}(*FAIL)|\w+</code></strong></p>
</li>
<li>
<p dir="auto">REPLACE <strong><code>/$0/</code></strong></p>
</li>
</ul>
<p dir="auto">And let’s try to imagine how this <strong>new</strong> pattern works :</p>
<ul>
<li>
<p dir="auto">First, the regex part <strong><code>{\w+}</code></strong> tries to match any <strong>single</strong> word between a set of <strong>curly braces</strong>.</p>
</li>
<li>
<p dir="auto">As the first word <strong><code>This</code></strong> is not within <strong>curly</strong> braces, it skips the <strong>left</strong> branch of the <strong><code>|</code></strong> alternation and, obviously, matches the <strong>right</strong> branch <strong><code>\w+</code></strong>  =&gt; After replacement, we get the string <strong><code>/This/</code></strong>.</p>
</li>
<li>
<p dir="auto">Now, the regex engine position is <strong>right</strong> after the <strong><code>/</code></strong>. As the <strong>space</strong> character cannot match any branch of the regex, the regex engine advances to the <strong>next</strong> position, right before the string <strong><code>{is}</code></strong>.</p>
</li>
<li>
<p dir="auto">Again, it tries to match the <strong>left</strong> branch <strong><code>{\w+}</code></strong> which, indeed, is the <strong>successful</strong> match <strong><code>{is}</code></strong>.</p>
</li>
<li>
<p dir="auto">Then, it encounters the <strong><code>(*FAIL)</code></strong> verb which <strong>always</strong> fails.</p>
</li>
<li>
<p dir="auto">Because of the <strong><code>(*FAIL)</code></strong> control verb, the current match <strong><code>{is}</code></strong> is <strong>discarded</strong> and, as the <strong>right</strong> branch of the alternation <strong>cannot</strong> match either, the whole regex fails and the position of the regex engine advances to the <strong>next</strong> position, right <strong>after</strong> the <strong>opening</strong> curly brace of the string <strong><code>{is}</code></strong>.</p>
</li>
<li>
<p dir="auto">This time, the word <strong><code>is</code></strong> can match the <strong>second</strong> branch of the alternative <strong><code>\w+</code></strong>  =&gt; So, after replacement, we get the string <strong><code>{/is/}</code></strong>.</p>
</li>
</ul>
<p dir="auto">And so on …</p>
<p dir="auto">As <strong>expected</strong>, it would produce this kind of <em>OUTPUT</em> :</p>
<pre><code class="language-diff">/This/ {/is/} /a/ /simple/ /text/ /to/ {/see/} /how/ /this/ {/regex/} /works/
</code></pre>
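<p dir="auto">The PCRE documentation describes <code>(*FAIL)</code> as equivalent to the empty negative look-ahead <code>(?!)</code>. Assuming Python’s standard <code>re</code> module, which accepts <code>(?!)</code> but not the verbs, the behaviour described above can be sketched as follows ( <code>FAIL_ONLY</code> and <code>wrap_all_words</code> are illustrative names ) :</p>
<pre><code class="language-python">import re

# (?!) always fails, like (*FAIL) : the left branch can never succeed, so the
# engine falls back on the right branch and then moves on one position at a time.
FAIL_ONLY = re.compile(r'\{\w+\}(?!)|\w+')

def wrap_all_words(text):
    return FAIL_ONLY.sub(lambda m: '/' + m.group(0) + '/', text)

print(wrap_all_words('This {is} a simple text to {see} how this {regex} works'))
</code></pre>
<p dir="auto">Without <code>(*SKIP)</code>, nothing prevents the next attempt from starting right after the opening brace, hence the <code>{/is/}</code> result.</p>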
<hr />
<p dir="auto">Again, let’s go back to our <strong>first</strong> syntax and get rid of the <strong><code>(*FAIL)</code></strong> <strong>backtracking control</strong> verb :</p>
<ul>
<li>
<p dir="auto">FIND <strong><code>{\w+}(*SKIP)|\w+</code></strong></p>
</li>
<li>
<p dir="auto">REPLACE <strong><code>/$0/</code></strong></p>
</li>
</ul>
<p dir="auto">And let’s try to imagine how this <strong>other</strong> pattern works :</p>
<ul>
<li>
<p dir="auto">First, the regex part <strong><code>{\w+}</code></strong> tries to match any <strong>single</strong> word between a set of <strong>curly braces</strong>.</p>
</li>
<li>
<p dir="auto">As the first word <strong><code>This</code></strong> is not within <strong>curly</strong> braces, it skips the <strong>left</strong> branch of the <strong><code>|</code></strong> alternation and, obviously, matches the <strong>right</strong> branch <strong><code>\w+</code></strong>  =&gt; After replacement, we get the string <strong><code>/This/</code></strong>.</p>
</li>
<li>
<p dir="auto">Now, the regex engine position is <strong>right</strong> after the <strong><code>/</code></strong>. As the <strong>space</strong> character <strong>cannot</strong> match any branch of the regex, the regex engine advances to the <strong>next</strong> position, right before the string <strong><code>{is}</code></strong>.</p>
</li>
<li>
<p dir="auto">Again, it tries to match the <strong>left</strong> branch <strong><code>{\w+}</code></strong> which, indeed, is the <strong>successful</strong> match <strong><code>{is}</code></strong>.</p>
</li>
<li>
<p dir="auto">Then, it encounters the <strong><code>(*SKIP)</code></strong> verb which <strong>always</strong> matches.</p>
</li>
<li>
<p dir="auto">So far, as no <strong><code>(*FAIL)</code></strong> occurs, the part <strong><code>{\w+}(*SKIP)</code></strong> is a <strong>true</strong> match which, after replacement, returns the string <strong><code>/{is}/</code></strong></p>
</li>
</ul>
<p dir="auto">( Note that if a <strong>successful</strong> match has <em>NOT</em> been found, as the <strong><code>(*SKIP)</code></strong> verb does <em>NOT</em> allow any backtracking, the regex engine would have <strong>discarded</strong> the current match attempt and would have <strong>advanced</strong> to the position right after the string <strong><code>{is}</code></strong> )</p>
<ul>
<li>
<p dir="auto">Now, the regex engine position is <strong>right</strong> after the last <strong><code>/</code></strong>. As the <strong>space</strong> character cannot match any branch of the regex, the regex engine advances to the <strong>next</strong> position, right before the string <strong><code>a</code></strong></p>
</li>
<li>
<p dir="auto">As this word is not within <strong>curly</strong> braces, it will simply match the <strong>right</strong> branch of the alternation <strong><code>\w+</code></strong> which, after replacement, gives the string <strong><code>/a/</code></strong></p>
</li>
</ul>
<p dir="auto">And so on …</p>
<p dir="auto">So the <em>OUTPUT</em> text becomes :</p>
<pre><code class="language-diff">/This/ /{is}/ /a/ /simple/ /text/ /to/ /{see}/ /how/ /this/ /{regex}/ /works/
</code></pre>
<p dir="auto"><strong>Two</strong> remarks :</p>
<ul>
<li>
<p dir="auto">Given the present <em>INPUT</em> text, both the <strong><code>{\w+}</code></strong> and <strong><code>\w+</code></strong> branches produce <strong>true</strong> matches and nothing can fail after <strong><code>(*SKIP)</code></strong>, so this verb never comes into play. Thus, the <strong>search</strong> regex could have been expressed, without any <strong>backtracking control</strong> verb, simply as :</p>
<ul>
<li>FIND <strong><code>{\w+}|\w+</code></strong></li>
</ul>
</li>
<li>
<p dir="auto">See the subtle <strong>difference</strong> with the preceding example, regarding the words embedded in <strong>curly</strong> braces ( <strong><code>{/is/}</code></strong> vs <strong><code>/{is}/</code></strong> ! )</p>
</li>
</ul>
<hr />
<p dir="auto">Now, let’s show a <strong>second</strong> example of the <strong><code>(*SKIP)(*F)</code></strong> feature with a table, where the <strong>third</strong> delimiter <strong><code>|</code></strong> has been changed into the <strong><code>#</code></strong> symbol :</p>
<pre><code class="language-z">| abc | def # ghi | jkl | mno |
| abc | def # ghi | jkl | mno |
</code></pre>
<p dir="auto">Then, the following <strong>regex</strong> S/R would change any <strong>word</strong>, located after the <strong><code>#</code></strong> character, with the <strong>all-uppercase</strong> same word  :</p>
<ul>
<li>
<p dir="auto">FIND <strong><code>(?-s).*#(*SKIP)(*F)|\w+</code></strong></p>
</li>
<li>
<p dir="auto">REPLACE <strong><code>\U$0</code></strong></p>
</li>
</ul>
<p dir="auto">We get this <em>OUTPUT</em> text :</p>
<pre><code class="language-z">| abc | def # GHI | JKL | MNO |
| abc | def # GHI | JKL | MNO |
</code></pre>
<p dir="auto">However, note that we can also use the following regex S/R, <strong>without</strong> any <strong>backtracking control</strong> verb, for the <strong>same</strong> result !</p>
<ul>
<li>
<p dir="auto">FIND <strong><code>(?-s)(?!.*#)\w+</code></strong></p>
</li>
<li>
<p dir="auto">REPLACE  <strong><code>\U$0</code></strong></p>
</li>
</ul>
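<p dir="auto">For readers who want to test this look-ahead variant outside Notepad++, here is a small Python sketch. It assumes the standard <code>re</code> module, where <code>.</code> does not match line-breaks by default ( so <code>(?-s)</code> is not needed ) and where a callback calling <code>upper()</code> stands in for <code>\U$0</code>. The names <code>AFTER_HASH</code> and <code>upper_after_hash</code> are hypothetical :</p>
<pre><code class="language-python">import re

# A word is kept only if no '#' remains after it on the same line.
AFTER_HASH = re.compile(r'(?!.*#)\w+')

def upper_after_hash(text):
    return AFTER_HASH.sub(lambda m: m.group(0).upper(), text)

table = '| abc | def # ghi | jkl | mno |\n| abc | def # ghi | jkl | mno |'
print(upper_after_hash(table))
</code></pre>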
<hr />
<p dir="auto">Let’s go on with this <strong>third</strong> example, which runs an S/R from right after a <strong>specific</strong> and <strong>unique</strong> character till the <strong>very end</strong> of file.</p>
<p dir="auto">The following S/R will change all words to <strong>uppercase</strong>, <strong>after</strong> a <strong>unique</strong> <strong><code>#</code></strong> symbol, located near the <strong>middle</strong> of the file, till the <strong>very end</strong> of file.</p>
<ul>
<li>
<p dir="auto">FIND <strong><code>(?s)\A^.*#(*SKIP)(*F)|\w+</code></strong></p>
</li>
<li>
<p dir="auto">REPLACE <strong><code>\U$0</code></strong></p>
</li>
</ul>
<p dir="auto">So, from this <em>INPUT</em> text, placed in a <strong>new</strong> tab ( <em>IMPORTANT</em> )</p>
<pre><code class="language-diff">    This program is free software: you can redistribute it and/or modify
    it under the terms of the GNU General Public License as published by
    the Free Software Foundation, either version 3 of the License, or
    (at your option) any later version.

    This program is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    #
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
    GNU General Public License for more details.

    You should have received a copy of the GNU General Public License
    along with this program.  If not, see &lt;https://www.gnu.org/licenses/&gt;.
</code></pre>
<p dir="auto">We get the <em>OUTPUT</em> text, below :</p>
<pre><code class="language-diff">    This program is free software: you can redistribute it and/or modify
    it under the terms of the GNU General Public License as published by
    the Free Software Foundation, either version 3 of the License, or
    (at your option) any later version.

    This program is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    #
    MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.  SEE THE
    GNU GENERAL PUBLIC LICENSE FOR MORE DETAILS.

    YOU SHOULD HAVE RECEIVED A COPY OF THE GNU GENERAL PUBLIC LICENSE
    ALONG WITH THIS PROGRAM.  IF NOT, SEE &lt;HTTPS://WWW.GNU.ORG/LICENSES/&gt;.
</code></pre>
<p dir="auto">Again, note that the <strong>shorter</strong> search syntax, <strong><code>(?s)\A^.*#\K|\w+</code></strong>, <strong>without</strong> any <strong>backtracking control</strong> verb, would produce the <strong>same</strong> result !</p>
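<p dir="auto">The same idea ( leave everything up to the <code>#</code> symbol alone, then process the rest ) can be sketched in Python without any regex verb. This is not the S/R above but a plain string split, using <code>str.rpartition</code> because the greedy <code>.*#</code> part stops at the <strong>last</strong> <code>#</code> of the file. The function name <code>upper_after_last_hash</code> is hypothetical :</p>
<pre><code class="language-python">import re

def upper_after_last_hash(text):
    """Uppercase every word located after the last '#' of the text."""
    head, hash_sign, tail = text.rpartition('#')
    # With no '#' at all, rpartition returns ('', '', text) : every word is
    # then upper-cased, as the right branch \w+ would do on its own.
    return head + hash_sign + re.sub(r'\w+', lambda m: m.group(0).upper(), tail)

print(upper_after_last_hash('implied warranty of\n#\nMERCHANTABILITY or FITNESS'))
</code></pre>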
<hr />
<p dir="auto">This <strong>fourth</strong> example is a <strong>derived</strong> example from this article on the <strong><code>stackoverflow.com</code></strong> site :</p>
<p dir="auto"><a href="https://stackoverflow.com/questions/24534782/" rel="nofollow ugc">https://stackoverflow.com/questions/24534782/</a></p>
<p dir="auto">It matches <strong>any</strong> line of text which does <em>NOT</em> contain the <strong>upper-case</strong> <strong><code>ABC</code></strong> string, anywhere in the <strong>current</strong> line, but <em>DOES</em> contain an <strong>upper-case</strong> <strong><code>DEF</code></strong> string which is followed, further on, by the <strong>upper-case</strong> <strong><code>XYZ</code></strong> string. This can be achieved with :</p>
<p dir="auto">MARK <strong><code>(?-is).*ABC(*SKIP)(*F)|^.*DEF(?=.*XYZ).*</code></strong></p>
<p dir="auto">If we consider the following <em>INPUT</em> text :</p>
<pre><code class="language-diff">test
DEF test XYZ
DEF test
test XYZ
DEFXYZ

ABC test
ABC DEF test XYZ
ABC DEF test
ABC test XYZ
ABC DEFXYZ

test ABC
DEF test ABC XYZ
DEF test ABC
test ABC XYZ
DEFABCXYZ

test ABC
DEF test XYZ ABC
DEF test ABC
test XYZ ABC
DEFXYZABC

test OK
DEF test XYZ OK
DEF test OK
test XYZ OK
DEFXYZ OK

ABC test OK
ABC DEF test XYZ OK
ABC DEF test OK
ABC test XYZ OK
ABC DEFXYZ OK

test ABC OK
DEF test ABC XYZ OK
DEF test ABC OK
test ABC XYZ OK
DEFABCXYZ OK

test ABC OK
DEF test XYZ ABC OK
DEF test ABC OK
test XYZ ABC OK
DEFXYZABC OK

this test
this DEF test XYZ
this DEF test
this test XYZ
this DEFXYZ

this ABC test
this ABC DEF test XYZ
this ABC DEF test
this ABC test XYZ
This ABC DEFXYZ

this test ABC
this DEF test ABC XYZ
this DEF test ABC
this test ABC XYZ
This DEFABCXYZ

this test ABC
this DEF test XYZ ABC
this DEF test ABC
this test XYZ ABC
this DEFXYZABC

this test OK
this DEF test XYZ OK
this DEF test OK
this test XYZ OK
this DEFXYZ OK

this ABC test OK
this ABC DEF test XYZ OK
this ABC DEF test OK
this ABC test XYZ OK
this ABCDEFXYZ OK

this test ABC OK
this DEF test ABC OK XYZ
this DEF test ABC OK
this test ABC XYZ OK
this DEFABCXYZ OK

this test ABC OK
this DEF test XYZ ABC OK
this DEF test ABC OK
this test XYZ ABC OK
this DEFXYZABC OK
</code></pre>
<p dir="auto">The <strong>search</strong> regex would <strong>mark</strong> all these lines of our <em>INPUT</em> text :</p>
<pre><code class="language-diff">DEF test XYZ
DEFXYZ
DEF test XYZ OK
DEFXYZ OK
this DEF test XYZ
this DEFXYZ
this DEF test XYZ OK
this DEFXYZ OK
</code></pre>
<p dir="auto">And, as <strong>expected</strong>, all these lines  :</p>
<ul>
<li>
<p dir="auto">Do <em>NOT</em> contain any string <strong><code>ABC</code></strong></p>
</li>
<li>
<p dir="auto"><em>DO</em> contain a string <strong><code>DEF</code></strong></p>
</li>
<li>
<p dir="auto"><em>DO</em> contain a string <strong><code>XYZ</code></strong>, located <strong>after</strong> the string <strong><code>DEF</code></strong></p>
</li>
</ul>
<p dir="auto">Once again, note that the regex <strong><code>(?-is)^((?!ABC).)*DEF(?=.*XYZ)((?!ABC).)*$</code></strong>, without any <strong><code>(*SKIP)(*F)</code></strong> syntax, would also mark the <strong>same</strong> results. But I admit that it’s a bit <strong>harder</strong> to understand !</p>
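<p dir="auto">This verb-free regex also runs as is in Python, which makes a quick check possible. The sketch below assumes the standard <code>re</code> module and drops the leading <code>(?-is)</code>, as Python is case-sensitive and keeps <code>.</code> on one line by default. The names <code>NO_ABC_DEF_THEN_XYZ</code> and <code>marked_lines</code> are illustrative only, and the sample is a short extract of the <em>INPUT</em> text :</p>
<pre><code class="language-python">import re

# Verb-free regex quoted above, tested line by line.
NO_ABC_DEF_THEN_XYZ = re.compile(r'^((?!ABC).)*DEF(?=.*XYZ)((?!ABC).)*$')

def marked_lines(text):
    """Return the lines that the regex would mark."""
    return [line for line in text.splitlines() if NO_ABC_DEF_THEN_XYZ.match(line)]

sample = '\n'.join([
    'test',
    'DEF test XYZ',
    'DEF test',
    'ABC DEF test XYZ',
    'DEF test ABC XYZ',
    'DEF test XYZ ABC',
    'this DEFXYZ OK',
])
print(marked_lines(sample))
</code></pre>
<p dir="auto">Only the lines <code>DEF test XYZ</code> and <code>this DEFXYZ OK</code> of this extract should be returned.</p>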
<hr />
<p dir="auto">In a nutshell, from the examples above, we can express the <strong><code>(*SKIP)(*F)</code></strong> feature as follows :</p>
<pre><code class="language-diff">    ( Regex A ) (*SKIP)(*F) |  ( Regex B )
                            |
    &lt;---- Part SKIPPED ----&gt;|&lt;- Part MATCHED -&gt;
</code></pre>
<p dir="auto">When the regex to match is simply <strong>duplicated</strong> at the <strong>left</strong> of the <strong><code>(*SKIP)</code></strong> <strong>Backtracking Control</strong> verb, this diagram becomes :</p>
<pre><code class="language-diff">    ( Regex A ) (*SKIP)(*F) |  ( Regex A )
                            |
    &lt;---- Part SKIPPED ----&gt;|&lt;- Part MATCHED -&gt;
</code></pre>
<p dir="auto">Or, more interesting, the template :</p>
<pre><code class="language-diff">    &lt;------------------------------------------ Regex A ------------------------------------------&gt;              |  &lt;----- Regex B -----&gt;
                                                                                                                 |
    ( Cond 1 | Cond 2 | .... |  Cond N ) (  MAIN Search Regex ) ( Cond 1 | Cond 2 | .... | Cond M ) (*SKIP)(*F)  |  ( MAIN Search Regex )
                                                                                                                 |
    &lt;--------------------------------------------- What is SKIPPED ---------------------------------------------&gt;|&lt;-- What is MATCHED --&gt;
</code></pre>
<p dir="auto">Meaning that :</p>
<ul>
<li>
<p dir="auto">If the <strong>main</strong> regex to match is <strong>preceded</strong> by one of the conditions <strong><code>1</code></strong> to <strong><code>N</code></strong> and <strong>followed</strong> by one of the conditions <strong><code>1</code></strong> to <strong><code>M</code></strong>, which are <em>NOT</em> wanted, this <strong>whole</strong> match is <strong>discarded</strong></p>
</li>
<li>
<p dir="auto">In <em>ALL</em> other cases, the <strong>main</strong> regex, in the <strong>right</strong> branch of the alternation is <strong>always</strong> matched !</p>
</li>
</ul>
<p dir="auto">Note that this <strong>last</strong> template, with <strong>multiple</strong> conditions to avoid, can <strong>hardly</strong> be realized <strong>without</strong> the use of the <strong><code>(*SKIP)(*F)</code></strong> part !!</p>
<p dir="auto">Continuation and end on <strong>next</strong> post !</p>
]]></description><link>https://community.notepad-plus-plus.org/topic/26812/generic-regex-how-to-use-the-couple-of-backtracking-control-verbs-skip-fail-or-skip-f-in-regexes</link><generator>RSS for Node</generator><lastBuildDate>Fri, 08 May 2026 08:51:49 GMT</lastBuildDate><atom:link href="https://community.notepad-plus-plus.org/topic/26812.rss" rel="self" type="application/rss+xml"/><pubDate>Sat, 26 Apr 2025 11:34:19 GMT</pubDate><ttl>60</ttl><item><title><![CDATA[Reply to Generic Regex : How to use the couple of &quot;Backtracking Control&quot; verbs (*SKIP)(*FAIL) or (*SKIP)(*F) in regexes on Mon, 28 Apr 2025 11:13:29 GMT]]></title><description><![CDATA[<p dir="auto">I was feeling deja vu as I was reading this blog, and the feeling was not misplaced.<br />
These verbs were discussed previously as part of the more generic discussion here <a href="https://community.notepad-plus-plus.org/topic/19632">FAQ: Regex “Backtracking Control Verbs”</a>; follow the link and then use the “find” function of your browser to jump to the text <code>_______________ (*SKIP)(*FAIL) _______________</code>.</p>
]]></description><link>https://community.notepad-plus-plus.org/post/101289</link><guid isPermaLink="true">https://community.notepad-plus-plus.org/post/101289</guid><dc:creator><![CDATA[Alan Kilborn]]></dc:creator><pubDate>Mon, 28 Apr 2025 11:13:29 GMT</pubDate></item><item><title><![CDATA[Reply to Generic Regex : How to use the couple of &quot;Backtracking Control&quot; verbs (*SKIP)(*FAIL) or (*SKIP)(*F) in regexes on Sat, 26 Apr 2025 16:07:06 GMT]]></title><description><![CDATA[<p dir="auto">Hello <a class="plugin-mentions-user plugin-mentions-a" href="/user/alan-kilborn" aria-label="Profile: alan-kilborn">@<bdi>alan-kilborn</bdi></a> and <strong>All</strong>,</p>
<p dir="auto">Sorry for the <strong>typo</strong>. The correct link is :</p>
<p dir="auto"><a href="https://stackoverflow.com/questions/24534782/" rel="nofollow ugc">https://stackoverflow.com/questions/24534782/</a></p>
<p dir="auto">I have also edited my <strong>previous</strong> post</p>
<p dir="auto">BR</p>
<p dir="auto">guy038</p>
]]></description><link>https://community.notepad-plus-plus.org/post/101270</link><guid isPermaLink="true">https://community.notepad-plus-plus.org/post/101270</guid><dc:creator><![CDATA[guy038]]></dc:creator><pubDate>Sat, 26 Apr 2025 16:07:06 GMT</pubDate></item><item><title><![CDATA[Reply to Generic Regex : How to use the couple of &quot;Backtracking Control&quot; verbs (*SKIP)(*FAIL) or (*SKIP)(*F) in regexes on Sat, 26 Apr 2025 15:29:04 GMT]]></title><description><![CDATA[<p dir="auto"><a class="plugin-mentions-user plugin-mentions-a" href="/user/guy038" aria-label="Profile: guy038">@<bdi>guy038</bdi></a> said :</p>
<blockquote>
<p dir="auto"><a href="https://stackoverflow.com/questions/27534782/" rel="nofollow ugc">https://stackoverflow.com/questions/27534782/</a></p>
</blockquote>
<p dir="auto">This results in “Page not found” for me.</p>
]]></description><link>https://community.notepad-plus-plus.org/post/101269</link><guid isPermaLink="true">https://community.notepad-plus-plus.org/post/101269</guid><dc:creator><![CDATA[Alan Kilborn]]></dc:creator><pubDate>Sat, 26 Apr 2025 15:29:04 GMT</pubDate></item><item><title><![CDATA[Reply to Generic Regex : How to use the couple of &quot;Backtracking Control&quot; verbs (*SKIP)(*FAIL) or (*SKIP)(*F) in regexes on Sun, 27 Apr 2025 00:49:00 GMT]]></title><description><![CDATA[<p dir="auto">Hi, <strong>All</strong>,</p>
<p dir="auto">For instance, let’s imagine this <strong>fifth</strong> example with the regex :</p>
<ul>
<li>MARK <strong><code>(?-i)(?:before_1|before_2)\u+(?:after_1|after_2)(*SKIP)(*F)|\u+</code></strong></li>
</ul>
<p dir="auto">From this regex, we deduce that :</p>
<ul>
<li>
<p dir="auto">If an <strong>upper-case</strong> word <strong><code>\u+</code></strong> is, <strong>both</strong> :</p>
<ul>
<li>
<p dir="auto"><strong>Preceded</strong> by the string <strong><code>before_1</code></strong> <em>OR</em> the string <strong><code>before_2</code></strong></p>
</li>
<li>
<p dir="auto"><em>AND</em></p>
</li>
<li>
<p dir="auto"><strong>Followed</strong> by the string <strong><code>after_1</code></strong> <em>OR</em> the string <strong><code>after_2</code></strong></p>
</li>
</ul>
</li>
</ul>
<p dir="auto">=&gt; This match will be <strong>discarded</strong>, due to the combination <strong><code>(*SKIP)(*F)</code></strong></p>
<ul>
<li>In <strong>all</strong> other cases, the <strong>right</strong> branch of the alternation <strong><code>\u+</code></strong> is used and an <strong>upper-case</strong> word will be <strong>matched</strong> !</li>
</ul>
<p dir="auto">You may verify my <strong>hypotheses</strong>, against the <em>INPUT</em> text, below, pasted in a <strong>new</strong> tab</p>
<pre><code class="language-diff">before_XYZafter_
before_XYZafter_1
before_XYZafter_2

before_1XYZafter
before_1XYZafter_1
before_1XYZafter_2

before_2XYZafter
before_2XYZafter_1
before_2XYZafter_2
</code></pre>
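<p dir="auto">To check these hypotheses outside Notepad++, here is a rough Python sketch. It assumes the standard <code>re</code> module, which knows neither the verbs nor <code>\u</code> : the verbs are emulated with a capturing group on the right branch and <code>[A-Z]</code> stands in for <code>\u</code> ( an ASCII-only simplification ). The helper <code>kept_matches</code> and the constants <code>SKIP</code> and <code>KEEP</code> are hypothetical names :</p>
<pre><code class="language-python">import re

def kept_matches(skip_regex, keep_regex, text):
    """Emulate  skip_regex(*SKIP)(*F)|keep_regex  : list what the right branch matches."""
    # Assumes that neither regex contains capturing groups of its own,
    # so that group 1 is always the right branch.
    combined = re.compile('(?:' + skip_regex + ')|(' + keep_regex + ')')
    return [m.group(1) for m in combined.finditer(text) if m.group(1) is not None]

SKIP = r'(?:before_1|before_2)[A-Z]+(?:after_1|after_2)'
KEEP = r'[A-Z]+'

lines = [
    'before_XYZafter_', 'before_XYZafter_1', 'before_XYZafter_2',
    'before_1XYZafter', 'before_1XYZafter_1', 'before_1XYZafter_2',
    'before_2XYZafter', 'before_2XYZafter_1', 'before_2XYZafter_2',
]
for line in lines:
    print(line, kept_matches(SKIP, KEEP, line))
</code></pre>
<p dir="auto">With this emulation, only the four lines which combine <code>before_1</code> or <code>before_2</code> with <code>after_1</code> or <code>after_2</code> should return an empty list.</p>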
<hr />
<p dir="auto">Finally, this <strong>sixth</strong> example takes into account both <strong>recursion</strong> and the <strong><code>(*SKIP)(*F)</code></strong> syntax. Again, it’s a <strong>derived</strong> example from this article, on the <strong><code>stackoverflow.com</code></strong> site  :</p>
<p dir="auto"><a href="https://stackoverflow.com/questions/70216280" rel="nofollow ugc">https://stackoverflow.com/questions/70216280</a></p>
<p dir="auto">The regex below matches the <strong>excess</strong> parentheses of <strong>any</strong> line with <strong>unbalanced</strong> levels of parentheses. This kind of search needs the use of <strong>recursion</strong> to be achieved !</p>
<p dir="auto">MARK <strong><code>(\((?:[^()\r\n]++|(?1))*\))(*SKIP)(*F)|[()]</code></strong></p>
<p dir="auto">Some explanations :</p>
<ul>
<li>
<p dir="auto">Leaving the <strong>recursion</strong> aside, the <strong><code>Group 1</code></strong> contents is the string <strong><code>\((?:[^()\r\n]++)*\)</code></strong> which represents a correct <strong>balanced</strong> level of parentheses ( i.e. <strong>possessive</strong> runs of characters, <strong>different</strong> from parentheses and line-breaks, <em>surrounded</em> with a couple of <strong>parentheses</strong> ).</p>
</li>
<li>
<p dir="auto">The <strong>recursion</strong> is then realized by the insertion of the <strong>group 1</strong>, so the <strong><code>(?1)</code></strong> syntax, as an alternative, <strong>within</strong> the contents of the <strong><code>Group 1</code></strong> <strong>itself</strong>.</p>
</li>
<li>
<p dir="auto">If <strong>correct</strong> sets of parentheses have been found in current line, the match is then <strong>discarded</strong>, because of the <strong><code>(*SKIP)(*F)</code></strong> part.</p>
</li>
<li>
<p dir="auto">If an <strong>incorrect</strong> set is found, the <strong>right</strong> branch of the alternation <strong><code>[()]</code></strong>, after the <strong><code>(*SKIP)(*F)</code></strong> part, will match any <strong>parenthesis</strong>.</p>
</li>
</ul>
<p dir="auto">After running this regex against the <em>INPUT</em> text below that you’ll paste in a <strong>new</strong> tab :</p>
<pre><code class="language-diff">(abc)
((abc)
(ab(c)))
((a)bc)
(((((a)(b)(c))
(a(b)c)
((a))bc)
(ab(c))
(a((b)c)
(a(bc))
((ab)c)
((a)(b)(c))
((a((bc))
((ab))))c)
</code></pre>
<p dir="auto">You should find <strong><code>12</code></strong> matches in the <strong><code>7</code></strong> lines below :</p>
<pre><code class="language-diff">((abc)
(ab(c)))
(((((a)(b)(c))
((a))bc)
(a((b)c)
((a((bc))
((ab))))c)
</code></pre>
<p dir="auto">Of course, for any <strong>marked</strong> line, with <strong>unbalanced</strong> levels of parentheses, you must study <strong>where</strong> the <strong>excess</strong> parentheses are and which ones must be <strong>removed</strong> in order to get <strong>correct</strong> sets of parentheses !</p>
<p dir="auto">For example :</p>
<ul>
<li>
<p dir="auto">The <strong>unbalanced</strong> expression <strong><code>(a((b)c)</code></strong> can be interpreted, either, as the <strong>correct</strong> sets <strong><code>a((b)c)</code></strong> or <strong><code>(a(b)c)</code></strong> !</p>
</li>
<li>
<p dir="auto">The <strong>unbalanced</strong> expression <strong><code>((ab))))c)</code></strong> can be interpreted, either, as the <strong>correct</strong> sets <strong><code>((ab))c</code></strong> or <strong><code>((ab)c)</code></strong></p>
</li>
</ul>
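<p dir="auto">The <strong><code>12</code></strong> matches in <strong><code>7</code></strong> lines can be cross-checked without any regex. Python’s standard <code>re</code> module has no recursion like <code>(?1)</code>, so the sketch below swaps in a different technique, a plain stack-based scan, which should flag the same parentheses on this sample. The function name <code>unbalanced_positions</code> is hypothetical :</p>
<pre><code class="language-python">def unbalanced_positions(line):
    """Return the offsets of the parentheses that have no partner in the line."""
    open_stack = []      # offsets of '(' still waiting for their ')'
    stray_closing = []   # offsets of ')' met while nothing was open
    for offset, char in enumerate(line):
        if char == '(':
            open_stack.append(offset)
        elif char == ')':
            if open_stack:
                open_stack.pop()
            else:
                stray_closing.append(offset)
    return sorted(open_stack + stray_closing)

sample = [
    '(abc)', '((abc)', '(ab(c)))', '((a)bc)', '(((((a)(b)(c))', '(a(b)c)', '((a))bc)',
    '(ab(c))', '(a((b)c)', '(a(bc))', '((ab)c)', '((a)(b)(c))', '((a((bc))', '((ab))))c)',
]
flagged = [line for line in sample if unbalanced_positions(line)]
total = sum(len(unbalanced_positions(line)) for line in sample)
print(len(flagged), total)
</code></pre>
<p dir="auto">The returned offsets only tell which parentheses are in excess. Deciding which ones to remove remains a manual choice, as in the two examples above.</p>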
<p dir="auto">Best Regards,</p>
<p dir="auto">guy038</p>
<p dir="auto"><strong>P.S.</strong> : Yet, another example of the <strong><code>(*SKIP)(*F)</code></strong> technique in this article, on the <strong><code>stackoverflow.com</code></strong> site  :</p>
<p dir="auto"><a href="https://stackoverflow.com/questions/53066132" rel="nofollow ugc">https://stackoverflow.com/questions/53066132</a></p>
<p dir="auto">FIND <strong><code>(?i)\b(?:county coast|at the|grant pass)\b(*SKIP)(*F)|\b(?:coast|the|pass)\b</code></strong></p>
<p dir="auto">Globally, this regex searches, <strong>whatever</strong> the case, for :</p>
<ul>
<li>Any word <strong>coast</strong>, if <em>NOT</em> preceded by the word <strong>county</strong></li>
</ul>
<p dir="auto">Or</p>
<ul>
<li>Any word <strong>the</strong>, if <em>NOT</em> preceded by the word <strong>at</strong></li>
</ul>
<p dir="auto">Or</p>
<ul>
<li>Any word <strong>pass</strong>, if <em>NOT</em> preceded by the word <strong>grant</strong></li>
</ul>
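<p dir="auto">A quick way to try this last regex in Python, assuming the standard <code>re</code> module and the same capture-group emulation of <code>(*SKIP)(*F)</code> ( the phrases to skip in the left branch, the words to keep in group 1 ). The names <code>WORDS</code> and <code>kept_words</code>, as well as the test sentence, are made up for the illustration :</p>
<pre><code class="language-python">import re

# Left branch : the phrases to skip. Right branch (group 1) : the words to keep.
WORDS = re.compile(
    r'\b(?:county coast|at the|grant pass)\b|\b(coast|the|pass)\b',
    re.IGNORECASE,
)

def kept_words(text):
    return [m.group(1) for m in WORDS.finditer(text) if m.group(1) is not None]

print(kept_words('The county coast road meets the coast at the Grant Pass; we pass by.'))
</code></pre>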
]]></description><link>https://community.notepad-plus-plus.org/post/101266</link><guid isPermaLink="true">https://community.notepad-plus-plus.org/post/101266</guid><dc:creator><![CDATA[guy038]]></dc:creator><pubDate>Sun, 27 Apr 2025 00:49:00 GMT</pubDate></item></channel></rss>