Hi, All,
For instance, let’s imagine this fifth example with the regex :
MARK
(?-i)(?:before_1|before_2)\u+(?:after_1|after_2)(*SKIP)(*F)|\u+
From this regex, we deduce that :
If an upper-case word \u+ is, both :
Preceded by the string before_1 or the string before_2
AND
Followed by the string after_1 or the string after_2
=> This match will be discarded, due to the combination (*SKIP)(*F)
In all other cases, the right branch of the alternation \u+ is used and an upper-case word will be matched !
You may verify my hypotheses against the INPUT text below, pasted in a new tab :
before_XYZafter_
before_XYZafter_1
before_XYZafter_2
before_1XYZafter
before_1XYZafter_1
before_1XYZafter_2
before_2XYZafter
before_2XYZafter_1
before_2XYZafter_2
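For readers who want to check this logic outside Notepad++, here is a minimal Python sketch. Note that the standard `re` module supports neither `(*SKIP)(*F)` nor `\u`, so this is NOT the regex above : it swaps in the classical "match what you don't want, capture what you do want" technique, and `[A-Z]+` stands in for `\u+` ( ASCII letters only ). The names `PATTERN`, `SAMPLE` and `kept_matches` are mine, for illustration only.

```python
import re

# Left branch: the context to discard (matched, but NOT captured).
# Right branch: the upper-case word we want, captured in group 1.
# This emulates (*SKIP)(*F): a match with an empty group 1 is thrown away.
PATTERN = re.compile(r"(?:before_1|before_2)[A-Z]+(?:after_1|after_2)|([A-Z]+)")

SAMPLE = """\
before_XYZafter_
before_XYZafter_1
before_XYZafter_2
before_1XYZafter
before_1XYZafter_1
before_1XYZafter_2
before_2XYZafter
before_2XYZafter_1
before_2XYZafter_2
"""

def kept_matches(text):
    """Return the upper-case words that survive the 'discard' branch."""
    return [m.group(1) for m in PATTERN.finditer(text) if m.group(1) is not None]

for line in SAMPLE.splitlines():
    print(line, "->", kept_matches(line))
```

On this sample, only the four lines where XYZ sits between before_1 / before_2 AND after_1 / after_2 should give an empty list.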
Finally, this sixth example takes into account both recursion and the (*SKIP)(*F) syntax. Again, it is derived from this question, on stackoverflow.com :
https://stackoverflow.com/questions/70216280
The regex below tries to detect any line with unbalanced parentheses, by matching each excess parenthesis. This kind of search requires recursion !
MARK (\((?:[^()\r\n]++|(?1))*\))(*SKIP)(*F)|[()]
Some explanations :
Without the recursion, the Group 1 contents would be the string \((?:[^()\r\n]++)*\) which represents one correct, balanced level of parentheses ( i.e. runs of characters, different from parentheses and line-breaks, matched possessively and surrounded by a pair of parentheses )
The recursion is then obtained by inserting a call to Group 1, with the (?1) syntax, as an alternative within the contents of Group 1 itself
If a correct set of parentheses is found in the current line, that match is discarded, due to (*SKIP)(*F), and the search resumes right after it
If an incorrect set is found, the right branch of the alternation [()], after the (*SKIP)(*F) part, will match the excess parenthesis
After running this regex against the INPUT text below that you’ll paste in a new tab :
(abc)
((abc)
(ab(c)))
((a)bc)
(((((a)(b)(c))
(a(b)c)
((a))bc)
(ab(c))
(a((b)c)
(a(bc))
((ab)c)
((a)(b)(c))
((a((bc))
((ab))))c)
You should find 12 matches in the 7 lines below :
((abc)
(ab(c)))
(((((a)(b)(c))
((a))bc)
(a((b)c)
((a((bc))
((ab))))c)
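If you want to cross-check these 12 matches without a recursive regex engine, here is a small Python sketch. It does NOT run the regex above : it swaps in a plain stack-based scan, which, on this sample at least, should flag the same parentheses ( every ( never closed and every ) never opened, line by line ). The names `unbalanced_positions` and `SAMPLE` are hypothetical, chosen for illustration.

```python
def unbalanced_positions(line):
    """Return the indexes of the parentheses in `line` that have no partner."""
    open_stack = []  # indexes of '(' still waiting for their ')'
    stray = []       # indexes of ')' met while no '(' was pending
    for i, ch in enumerate(line):
        if ch == "(":
            open_stack.append(i)
        elif ch == ")":
            if open_stack:
                open_stack.pop()   # this ')' closes the most recent '('
            else:
                stray.append(i)    # nothing to close: excess ')'
    return sorted(open_stack + stray)

SAMPLE = [
    "(abc)", "((abc)", "(ab(c)))", "((a)bc)", "(((((a)(b)(c))",
    "(a(b)c)", "((a))bc)", "(ab(c))", "(a((b)c)", "(a(bc))",
    "((ab)c)", "((a)(b)(c))", "((a((bc))", "((ab))))c)",
]

flagged = [(line, unbalanced_positions(line)) for line in SAMPLE]
flagged = [(line, pos) for line, pos in flagged if pos]
total = sum(len(pos) for _, pos in flagged)

for line, pos in flagged:
    print(line, "->", pos)
print(total, "excess parentheses in", len(flagged), "lines")
```

The reported indexes are 0-based offsets inside each line, so [0, 1, 2] for (((((a)(b)(c)) means that the first three opening parentheses are in excess.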
Of course, for any marked line, so any line with unbalanced parentheses, you must study where the excess parentheses are, and which ones to remove, in order to get correct sets of parentheses !
For example :
The unbalanced expression (a((b)c) can be interpreted as either of the correct sets a((b)c) or (a(b)c) !
The unbalanced expression ((ab))))c) can be interpreted as either of the correct sets ((ab))c or ((ab)c) !
Best Regards,
guy038
P.S. : Yet another example of the (*SKIP)(*F) technique, from this question, on stackoverflow.com :
https://stackoverflow.com/questions/53066132
FIND (?i)\b(?:county coast|at the|grant pass)\b(*SKIP)(*F)|\b(?:coast|the|pass)\b
Globally, this regex searches, whatever the case, for :
Any word coast, if NOT preceded by the word county
Or
Any word the, if NOT preceded by the word at
Or
Any word pass, if NOT preceded by the word grant
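The same behaviour can be tried quickly in Python. As the standard `re` module has no `(*SKIP)(*F)`, this sketch assumes the usual workaround : the unwanted phrases are matched without being captured, and only the matches with a non-empty group 1 are kept. The sample sentence and the name `wanted_words` are invented for the test and are not part of the original question.

```python
import re

# Left branch: phrases to skip. Right branch: the words we really want (group 1).
PATTERN = re.compile(
    r"\b(?:county coast|at the|grant pass)\b|\b(coast|the|pass)\b",
    re.IGNORECASE,
)

def wanted_words(text):
    """Return coast / the / pass, unless they belong to a skipped phrase."""
    return [m.group(1) for m in PATTERN.finditer(text) if m.group(1)]

# Hypothetical test sentence
SAMPLE = "The county coast is far from the coast; at the pass, not Grant Pass."
print(wanted_words(SAMPLE))
```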