Generic Regex: Logic Gates for Regular Expressions
There are times when you want specific logic in your regular expression, to combine two or more conditions in certain ways. For example, you might want to match only lines that contain `cake` or `cookie`, but that don’t also contain `biscuit`; logically, this would be expressed as “match (`cake` OR `cookie`) AND (NOT `biscuit`)”.

The following expressions use positive lookaheads for saying “it must match” and negative lookaheads for saying “it must not match” (logical “NOT”). By using lookaheads with
`.*` at the start, each term can be found anywhere on the line (or file), in any order. A lookahead also doesn’t consume anything, so on their own these expressions are “zero width”.

If `☐ . matches newline` is not checked (☐), or if `(?-s)` is enabled in your regex, then those expressions will mean “a single line matches the logic expression chosen”. If `☑ . matches newline` is checked (☑), or if `(?s)` is enabled in your regex, then those expressions will mean “the whole file matches the logic expression chosen”.

All these expressions require `Search Mode = ☑ Regular Expression` to work.

logic | two-term expression                     | n-term expression                    | notes
------|-----------------------------------------+--------------------------------------+------
OR    | `(?=.*aaa|.*bbb)`                       | `(?=.*aaa|.*bbb|...|.*nnn)`          | must match at least one
AND   | `(?=.*aaa)(?=.*bbb)`                    | `(?=.*aaa)(?=.*bbb)...(?=.*nnn)`     | must match all
XOR   | `(?=.*aaa)(?!.*bbb)|(?!.*aaa)(?=.*bbb)` | too complicated                      | match one or the other, but not both
NOR   | `(?!.*aaa)(?!.*bbb)`                    | `(?!.*aaa)(?!.*bbb)...(?!.*nnn)`     | matches neither one nor the other
NOR   | `(?!.*(aaa|bbb))`                       | `(?!.*(aaa|bbb|...|nnn))`            | second syntax for the same concept
NAND  | `(?!(?=.*aaa)(?=.*bbb))`                | `(?!(?=.*aaa)(?=.*bbb)...(?=.*nnn))` | may match zero or one of the terms, but not both

In case it’s not obvious,
`aaa`, `bbb`, and `nnn` are meant as placeholders for the terms or regex sub-expressions that you’re actually trying to combine.

If you actually want to select the whole line, you cannot just use the zero-width forms above. Instead, you would use
`(?-s)(?:×××)^.*$` (where the `×××` is the two-term or n-term expression): the `(?-s)` will confine the match to a single line (by making the `.` wildcard not match newlines), the `(?:×××)` will add the logic expression that you derived from the table, and the `^.*$` will cause it to actually match the whole line, rather than the zero-width sequence at the start of the line.

Similarly, if you want to use Find in Files to figure out which files match the logic expression, you can use
`(?s)\A^(×××).`, where `(?s)` will make `.` match newlines, so the `.*` in each of the logic terms can match anywhere in the file; the `\A` will match only at the start of the file; the `(×××)` will do the logic on the whole file; and the final `.` will mean that it actually marks/matches the first character in each of the files it finds (which will make it actually report something in the Find Results after running Find in Files).

Examples
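As a first illustration, here is a minimal Python sketch of the Find in Files form `(?s)\A^(×××).` described above. It is only an approximation: Python's `re` module is not the regex engine Notepad++ uses, and the file names and contents below are hypothetical, held in memory rather than read from disk. The `file_matches` helper is likewise an illustrative name, not part of any tool.

```python
import re

def file_matches(gate, content):
    """Return True if the whole-file form (?s)\\A^(gate). matches the content."""
    # (?s) lets "." cross newlines, \A pins the test to the start of the file,
    # and the trailing "." consumes the first character so the match is not empty.
    pattern = re.compile(r"(?s)\A^(" + gate + r").")
    return pattern.search(content) is not None

# Hypothetical in-memory "files", used only for illustration.
FILES = {
    "recipes.txt": "I baked a cake.\nI ate a cookie.\n",
    "british.txt": "I baked a cake.\nThe Brits call some cookies \"biscuits\".\n",
    "other.txt": "Nothing sweet in here.\n",
}

# (cake OR cookie) AND (NOT biscuit), evaluated over each whole file.
gate = r"(?=.*cake|.*cookie)(?!.*biscuit)"
hits = [name for name, content in FILES.items() if file_matches(gate, content)]
print(hits)
```

Because the logic is applied to the whole file, `british.txt` is rejected even though `cake` and `biscuit` sit on different lines.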
With the `(?-s)(?:×××)^.*$` given above, if you chose the “two-term expression” and had the literal `aaa` and `bbb` as your search terms, the following “truth table” will show a T on the conditions that would match:

text          | OR | AND | XOR | NOR | NOR | NAND
--------------|----+-----+-----+-----+-----+------
aaa other aaa | T  |     |  T  |     |     |  T
aaa other bbb | T  |  T  |     |     |     |
aaa other ccc | T  |     |  T  |     |     |  T
bbb other aaa | T  |  T  |     |     |     |
bbb other bbb | T  |     |  T  |     |     |  T
bbb other ccc | T  |     |  T  |     |     |  T
ccc other aaa | T  |     |  T  |     |     |  T
ccc other bbb | T  |     |  T  |     |     |  T
ccc other ccc |    |     |     |  T  |  T  |  T
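The truth table can be checked mechanically. The following is a rough sketch in Python, under the assumption that Python's `re` module treats these lookaheads the same way as the Notepad++ engine; the leading `(?-s)` is left out because Python's `.` already excludes newlines unless `re.DOTALL` is set. The names `GATES`, `whole_line`, and `truth_row` are made up for this illustration.

```python
import re

# Two-term gate expressions, copied from the logic table.
GATES = {
    "OR":    r"(?=.*aaa|.*bbb)",
    "AND":   r"(?=.*aaa)(?=.*bbb)",
    "XOR":   r"(?=.*aaa)(?!.*bbb)|(?!.*aaa)(?=.*bbb)",
    "NOR-1": r"(?!.*aaa)(?!.*bbb)",
    "NOR-2": r"(?!.*(aaa|bbb))",
    "NAND":  r"(?!(?=.*aaa)(?=.*bbb))",
}

LINES = [
    "aaa other aaa", "aaa other bbb", "aaa other ccc",
    "bbb other aaa", "bbb other bbb", "bbb other ccc",
    "ccc other aaa", "ccc other bbb", "ccc other ccc",
]

def whole_line(gate):
    """Wrap a gate in the whole-line form (?:gate)^.*$ from the text."""
    return re.compile(r"(?:" + gate + r")^.*$")

def truth_row(line):
    """Map each gate name to True/False for one line of text."""
    return {name: whole_line(gate).match(line) is not None
            for name, gate in GATES.items()}

for line in LINES:
    row = truth_row(line)
    matched = ", ".join(name for name, hit in row.items() if hit)
    print(f"{line}: {matched}")
```

The `(?:...)` wrapper matters most for XOR: without it, the `|` in the XOR expression would split the whole pattern, and `^.*$` would only attach to the second alternative.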
In other words, the text `aaa other aaa` will be matched by the OR expression `(?=.*aaa|.*bbb)`, or by the XOR expression `(?=.*aaa)(?!.*bbb)|(?!.*aaa)(?=.*bbb)`, or by the NAND expression `(?!(?=.*aaa)(?=.*bbb))`. (Similar interpretation for the other rows in the truth table.)

For a more concrete example, consider this text:

I baked a cake.
The Brits call some cookies "biscuits".
I ate a cookie.
The fridge contains one cake and one cookie.
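As a quick check of this sample text, the sketch below applies the whole-line form of “(`cake` OR `cookie`) AND (NOT `biscuit`)” from the introduction. It is written in Python as an approximation of what the Notepad++ search would select; `re.MULTILINE` is used so that `^` and `$` anchor at each line, and `(?-s)` is omitted because Python's `.` does not match newlines by default.

```python
import re

TEXT = (
    "I baked a cake.\n"
    'The Brits call some cookies "biscuits".\n'
    "I ate a cookie.\n"
    "The fridge contains one cake and one cookie."
)

# (cake OR cookie) AND (NOT biscuit), then ^.*$ to consume the whole line.
PATTERN = re.compile(
    r"(?:(?=.*cake|.*cookie)(?!.*biscuit))^.*$",
    re.MULTILINE,
)

matched_lines = [m.group(0) for m in PATTERN.finditer(TEXT)]
for line in matched_lines:
    print(line)
```

Only the line mentioning `biscuits` is skipped, since the negative lookahead rejects it even though it also contains `cookie`.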
The expression `(?-s)(?:(?=.*cake|.*cookie)(?!.*biscuit))^.*$` says “(`cake` OR `cookie`) AND NOT `biscuit`”, which was my first example of what you might want to implement. In this text, the first, third, and fourth lines would match, but the second line would not match (because it contains `biscuit`). This also gives an example of how to combine multiple logic conditions together in one expression.

And @Alan-Kilborn came up with the following example:
Say you want to match some particular combination of `Bob` and `Ted` on a line. Here are examples of how to do that:

Logic | Expression to use                                            | Match entire line when…
------|--------------------------------------------------------------+------------------------
OR    | `(?-s)(?:(?=.*Bob|.*Ted))^.*(?:\R|\z)`                       | `Bob` or `Ted` (or both) is present, in either order
AND   | `(?-s)(?:(?=.*Bob)(?=.*Ted))^.*(?:\R|\z)`                    | both `Bob` and `Ted` are present, in either order
XOR   | `(?-s)(?:(?=.*Bob)(?!.*Ted)|(?!.*Bob)(?=.*Ted))^.*(?:\R|\z)` | `Bob` or `Ted` is present, but not when both are present
NOR-1 | `(?-s)(?:(?!.*Bob)(?!.*Ted))^.*(?:\R|\z)`                    | neither `Bob` nor `Ted` is present (form 1)
NOR-2 | `(?-s)(?:(?!.*(Bob|Ted)))^.*(?:\R|\z)`                       | neither `Bob` nor `Ted` is present (form 2)
NAND  | `(?-s)(?:(?!(?=.*Bob)(?=.*Ted)))^.*(?:\R|\z)`                | neither is present or one is present, but not when both are present

References
Originally developed in “developing generic regex sequences” and subsequent discussion
For other generic expressions, see FAQ Desk: Generic Regular Expression (regex) Formulas
@peterjones,
Thank you for this synopsis. This will take a little while to digest, but at least there is a succinct description that we neophytes can come to and learn from, the way you explain things in detail. Thanks again.