Generic Regex: Logic Gates for Regular Expressions
PeterJones last edited by PeterJones
There are times when you want specific logic in your regular expression, to combine two or more conditions in certain ways. For example, you might want to match only lines that contain `cake` or `cookie`, but that don’t also contain `biscuit`; logically, this would be expressed as “match (`cake` OR `cookie`) AND (NOT `biscuit`)”.
The following expressions use positive lookaheads to say “it must match” and negative lookaheads to say “it must not match” (logical “NOT”). Starting each lookahead with `.*` means the term can be found anywhere on the line (or file), in any order. But a lookahead doesn’t consume anything, so on their own these expressions are “zero width”.
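To make “zero width” and “any order” concrete, here is a minimal sketch. It uses Python's `re` module as a stand-in, on the assumption that it handles these lookaheads the same way as the Notepad++ regex engine; the sample strings are made up for illustration.

```python
import re

# Two positive lookaheads, each starting with .* so the term may sit anywhere.
pattern = re.compile(r"(?=.*cookie)(?=.*cake)")

# The order of the terms in the text does not matter.
m1 = pattern.search("cake and cookie")
m2 = pattern.search("cookie and cake")

# Both succeed at offset 0 and consume nothing (start == end).
print(m1.span(), m2.span())   # (0, 0) (0, 0)
```

Because the match is empty, nothing would be highlighted or selected by such an expression on its own.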
If ☐ `. matches newline` is not checked (☐), or if `(?-s)` is enabled in your regex, then those expressions will mean “a single line matches the logic expression chosen”. If ☑ `. matches newline` is checked (☑), or `(?s)` is enabled in your regex, then those expressions will mean “the whole file matches the logic expression chosen”.
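The line-versus-file difference can be sketched as follows. This is only an approximation in Python, assuming that `re.DOTALL` plays the role of `(?s)` / ☑ `. matches newline`, and that Python's default (dot stops at a newline) plays the role of `(?-s)`. The two-lookahead expression requires both terms; the sample text is hypothetical.

```python
import re

text = "aaa on line one\nbbb on line two\n"
expr = r"(?=.*aaa)(?=.*bbb)"   # both terms required (two stacked lookaheads)

# Default: "." stops at a newline, so both terms must sit on the same line.
per_line = re.search(expr, text)

# re.DOTALL: "." crosses newlines, so the terms may sit anywhere in the text.
whole_file = re.search(expr, text, re.DOTALL)

print(per_line)            # None
print(whole_file.span())   # (0, 0)
```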
All these expressions require Search Mode = ☑ Regular Expression to work.
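| logic | two-term expression | n-term expression | notes |
|---|---|---|---|
| OR | `(?=.*aaa\|.*bbb)` | | must match at least one |
| AND | | | must match all |
| XOR | `(?=.*aaa)(?!.*bbb)\|(?!.*aaa)(?=.*bbb)` | too complicated | match one or the other, but not both |
| NOR | | | matches neither one nor the other |
| NOR | | | second syntax for the same concept |
| NAND | `(?!(?=.*aaa)(?=.*bbb))` | | may match zero or one of the terms, but not both |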
In case it’s not obvious, `aaa`, `bbb`, and `nnn` are meant as placeholders for the terms or regex sub-expressions that you’re actually trying to combine.
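As a rough sketch of how the placeholders generalize to any number of terms, the hypothetical helper `gate` below builds a zero-width expression from a list of terms. The OR and NAND shapes follow the two-term expressions quoted elsewhere in this post; the AND and NOR shapes are assumptions derived from them (AND is the part that NAND negates, NOR is one negative lookahead per term). XOR is left out, since the n-term form is described as too complicated. Python's `re` is used only as a stand-in for the Notepad++ engine.

```python
import re

def lookaheads(terms, positive=True):
    """One lookahead per term, e.g. (?=.*aaa)(?=.*bbb) or (?!.*aaa)(?!.*bbb)."""
    op = "=" if positive else "!"
    return "".join(f"(?{op}.*{t})" for t in terms)

def gate(logic, terms):
    """Hypothetical helper: build a zero-width logic expression for n terms."""
    if logic == "OR":     # at least one term is present
        return "(?=" + "|".join(".*" + t for t in terms) + ")"
    if logic == "AND":    # every term is present (assumed form)
        return lookaheads(terms)
    if logic == "NOR":    # no term is present (assumed form)
        return lookaheads(terms, positive=False)
    if logic == "NAND":   # anything except all terms together
        return "(?!" + lookaheads(terms) + ")"
    raise ValueError(f"unsupported logic: {logic}")

terms = ["aaa", "bbb", "ccc"]
print(gate("OR", terms))     # (?=.*aaa|.*bbb|.*ccc)
print(gate("NAND", terms))   # (?!(?=.*aaa)(?=.*bbb)(?=.*ccc))

# re.match anchors at the start, so the negative forms cannot "slide" forward.
print(bool(re.match(gate("AND", terms), "ccc bbb aaa")))   # True
```

The terms are inserted as regex sub-expressions, so a literal containing special characters would need `re.escape` first.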
If you actually want to select the whole line, you cannot just use the zero-width forms above. Instead, you would use `(?-s)(?:×××)^.*$` (where the ××× is the two-term or n-term expression): the `(?-s)` will confine the match to a single line (by making the `.` wildcard not match newlines), the `(?:×××)` will add the logic expression that you derived from the table, and the `^.*$` will cause it to actually match the whole line, rather than the zero-width sequence at the start of the line.
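The difference between the bare expression and the whole-line wrapper can be sketched like this. It is an approximation in Python: `re.MULTILINE` is assumed to give `^` and `$` the per-line meaning they have in Notepad++, and the leading `(?-s)` is left out because Python's dot already stops at newlines by default. The logic expression and sample text are illustrative only.

```python
import re

text = "aaa other bbb\naaa other ccc\n"
logic = r"(?=.*aaa)(?=.*bbb)"        # example logic expression (both terms)

# Bare expression: it matches, but selects nothing.
bare = re.search(logic, text)
print(repr(bare.group()))            # ''

# Wrapped as (?:×××)^.*$ so the match covers the whole line.
whole = re.search(r"(?:" + logic + r")^.*$", text, re.MULTILINE)
print(whole.group())                 # aaa other bbb
```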
Similarly, if you want to use Find in Files to figure out which files match the logic expression, you can use `(?s)\A(×××).`: the `(?s)` will make `.` match newlines, so the `.*` in each of the logic terms can match anywhere in the file; then the `\A` will match only at the start of the file; the `(×××)` will do the logic on the whole file; and the `.` will mean that it actually marks/matches the first character in each of the files it finds (which will make it actually report something in the Find Results after running Find in Files).
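A small simulation of the file-level search, assuming the pieces described above are assembled as `(?s)\A(×××).`. The file names and contents are hypothetical, and Python's `re` stands in for the Find in Files engine.

```python
import re

# Hypothetical file contents keyed by file name.
files = {
    "one.txt": "aaa first\nsecond bbb\n",
    "two.txt": "aaa first\nsecond ccc\n",
}

logic = r"(?=.*aaa)(?=.*bbb)"                      # example logic: both terms
pattern = re.compile(r"(?s)\A(" + logic + r").")   # assumed assembly of the pieces

hits = {name: pattern.search(body) for name, body in files.items()}
matching = [name for name, m in hits.items() if m]

print(matching)                  # ['one.txt']
print(hits["one.txt"].span())    # (0, 1)
```

The one-character span is what gives each matching file a single visible hit.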
Using the `(?-s)(?:×××)^.*$` form given above, if you chose the “two-term expression” and had the literals `aaa` and `bbb` as your search terms, the following “truth table” shows a T for the conditions that would match:
text          | OR | AND | XOR | NOR | NOR | NAND
--------------|----+-----+-----+-----+-----+------
aaa other aaa | T  |     | T   |     |     | T
aaa other bbb | T  | T   |     |     |     |
aaa other ccc | T  |     | T   |     |     | T
bbb other aaa | T  | T   |     |     |     |
bbb other bbb | T  |     | T   |     |     | T
bbb other ccc | T  |     | T   |     |     | T
ccc other aaa | T  |     | T   |     |     | T
ccc other bbb | T  |     | T   |     |     | T
ccc other ccc |    |     |     | T   | T   | T
In other words, the text `aaa other aaa` will be matched by the OR expression `(?=.*aaa|.*bbb)`, or by the XOR expression `(?=.*aaa)(?!.*bbb)|(?!.*aaa)(?=.*bbb)`, or by the NAND expression `(?!(?=.*aaa)(?=.*bbb))`. (The other rows in the truth table are interpreted similarly.)
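The truth table can be re-derived with a short script. This is a sketch under assumptions: the OR, XOR and NAND expressions are the ones quoted above, while the AND and the two NOR forms (labelled `NOR1` and `NOR2` here, hypothetical names) are derived from them and may not be the exact forms the table was built with. Python's `re` with `re.MULTILINE` stands in for Notepad++.

```python
import re

# Two-term expressions for the literals aaa and bbb.
gates = {
    "OR":   r"(?=.*aaa|.*bbb)",                        # quoted in the text
    "AND":  r"(?=.*aaa)(?=.*bbb)",                     # assumed form
    "XOR":  r"(?=.*aaa)(?!.*bbb)|(?!.*aaa)(?=.*bbb)",  # quoted in the text
    "NOR1": r"(?!.*aaa)(?!.*bbb)",                     # assumed form
    "NOR2": r"(?!.*aaa|.*bbb)",                        # assumed form
    "NAND": r"(?!(?=.*aaa)(?=.*bbb))",                 # quoted in the text
}

words = ["aaa", "bbb", "ccc"]
rows = [f"{a} other {b}" for a in words for b in words]

def matches(expr, line):
    # Whole-line wrapper: ^ pins the logic to the start of the line.
    return re.search(r"(?:" + expr + r")^.*$", line, re.MULTILINE) is not None

for line in rows:
    marks = " | ".join("T" if matches(g, line) else " " for g in gates.values())
    print(f"{line} | {marks}")
```

The `^` after the logic group matters for the negative forms: without it, a NOR or NAND lookahead could succeed at some later offset in a line that should not match.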
For a more concrete example, take this text:
I baked a cake.
The Brits call some cookies "biscuits".
I ate a cookie.
The fridge contains one cake and one cookie.
The expression `(?-s)(?:(?=.*cake|.*cookie)(?!.*biscuit))^.*$` says “(`cake` OR `cookie`) AND NOT `biscuit`”, which was my first example of what you might want to implement. In this text, the first, third, and fourth lines would match, but the second line would not match (because it contains `biscuit`). This also gives an example of how to combine several of the logic conditions in one expression.
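The same check can be reproduced outside Notepad++. The sketch below runs the expression over the four sample lines with Python's `re`, assuming it behaves like the Notepad++ engine for these constructs; `(?-s)` is dropped because Python's dot does not match newlines by default, and `re.MULTILINE` is assumed to supply the per-line `^` and `$`.

```python
import re

text = (
    "I baked a cake.\n"
    'The Brits call some cookies "biscuits".\n'
    "I ate a cookie.\n"
    "The fridge contains one cake and one cookie.\n"
)

# (cake OR cookie) AND NOT biscuit, wrapped so that the whole line is matched.
pattern = re.compile(r"(?:(?=.*cake|.*cookie)(?!.*biscuit))^.*$", re.MULTILINE)

# No capturing groups are used, so findall returns the full matched lines.
for line in pattern.findall(text):
    print(line)
```

This prints the first, third, and fourth lines, and skips the second.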
And @Alan-Kilborn came up with the following example:
Say you want to match some particular combination of … `Ted` on a line. Here are examples of how to do that:
| Logic | Expression to use | Match entire line when… |
|---|---|---|
| OR | | … `Ted` (or both) is present, in either order |
| AND | | … `Ted` are present, in either order |
| XOR | | … `Ted` is present, but not when both are present |
| NOR | | … `Ted` are present (form 1) |
| NOR | | … `Ted` are present (form 2) |
| NAND | | neither are present or one is present, but not when both are present |
Originally developed in “developing generic regex sequences” and the subsequent discussion.
For other generic expressions, see FAQ Desk: Generic Regular Expression (regex) Formulas
Lycan Thrope last edited by
Thank you for this synopsis. This will take a little while to digest, but at least there is a succinct description we neophytes can come and learn the way you explain things in detail. Thanks again.