Generic Regex: Logic Gates for Regular Expressions
-
There are times when you want specific logic in your regular expression, to combine two or more conditions in certain ways. For example, you might want to match only lines that contain `cake` or `cookie`, but that don’t also contain `biscuit`; logically, this would be expressed as “match (`cake` OR `cookie`) AND (NOT `biscuit`)”.

The following expressions use positive lookaheads for saying “it must match” and negative lookaheads for saying “it must not match” (logical “NOT”). Because each lookahead starts with `.*`, its term can be found anywhere on the line (or file), in any order. Lookaheads also don’t consume anything, so on their own these expressions are “zero width”.

If `☐ . matches newline` is not checked (☐), or if `(?-s)` is enabled in your regex, then those expressions will mean “a single line matches the logic expression chosen”. If `☑ . matches newline` is checked (☑), or if `(?s)` is enabled in your regex, then those expressions will mean “the whole file matches the logic expression chosen”.

All these expressions require `Search Mode = ☑ Regular Expression` to work.

    logic  two-term expression                    n-term expression                   notes
    -----  -------------------------------------  ----------------------------------  -----
    OR     (?=.*aaa|.*bbb)                        (?=.*aaa|.*bbb|...|.*nnn)           must match at least one
    AND    (?=.*aaa)(?=.*bbb)                     (?=.*aaa)(?=.*bbb)...(?=.*nnn)      must match all
    XOR    (?=.*aaa)(?!.*bbb)|(?!.*aaa)(?=.*bbb)  too complicated                     match one or the other, but not both
    NOR    (?!.*aaa)(?!.*bbb)                     (?!.*aaa)(?!.*bbb)...(?!.*nnn)      matches neither one nor the other
    NOR    (?!.*(aaa|bbb))                        (?!.*(aaa|bbb|...|nnn))             second syntax for the same concept
    NAND   (?!(?=.*aaa)(?=.*bbb))                 (?!(?=.*aaa)(?=.*bbb)...(?=.*nnn))  may match zero or one of the terms, but not both

In case it’s not obvious, `aaa` and `bbb` and `nnn` are meant as placeholders for the terms or regex sub-expressions that you’re actually trying to combine.

If you actually want to select the whole line, you cannot just use the zero-width forms above. Instead, you would use `(?-s)(?:×××)^.*$` (where the ××× is the two-term or n-term expression): the `(?-s)` will keep the match on a single line (by making the `.` wildcard not match newlines), the `(?:×××)` will add the logic expression that you derived from the table, and the `^.*$` will cause it to actually match the whole line, rather than the zero-width sequence at the start of the line.

Similarly, if you want to use Find in Files to figure out which files match the logic expression, you can use `(?s)\A^(×××).`, where `(?s)` will make `.` match newlines, so the `.*` in each of the logic terms can match anywhere in the file; then the `\A` will match only at the start of the file; the `(×××)` will do the logic on the whole file; and the final `.` will mean that it actually marks/matches the first character in each of the files it finds (which will make it actually report something in the Find Results after running Find in Files).

Examples
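The table and the two wrapper forms described above can also be tried outside Notepad++. The following is a minimal sketch in Python, not part of the original FAQ. It assumes Python’s `re` module, which is a different engine from the one Notepad++ uses: it rejects a bare `(?-s)`, so that prefix is dropped (`.` already stops at newlines by default). The names `LOGIC`, `whole_line`, `matching_gates`, and `file_matches` are hypothetical helpers invented for this illustration.

```python
import re

# Two-term zero-width expressions copied from the table; {a} and {b} stand in
# for the aaa and bbb placeholders and are inserted verbatim (not escaped).
LOGIC = {
    "OR":    "(?=.*{a}|.*{b})",
    "AND":   "(?=.*{a})(?=.*{b})",
    "XOR":   "(?=.*{a})(?!.*{b})|(?!.*{a})(?=.*{b})",
    "NOR-1": "(?!.*{a})(?!.*{b})",
    "NOR-2": "(?!.*({a}|{b}))",
    "NAND":  "(?!(?=.*{a})(?=.*{b}))",
}


def whole_line(gate, a, b):
    """Build the (?:xxx)^.*$ whole-line form for one gate."""
    logic = LOGIC[gate].format(a=a, b=b)
    # Assumption: Python's re rejects a bare (?-s), so it is omitted here;
    # '.' already stops at newlines unless re.DOTALL is set.
    return re.compile("(?:" + logic + ")^.*$")


def matching_gates(line, a, b):
    """Return the names of the gates whose whole-line form matches `line`."""
    return [g for g in LOGIC if whole_line(g, a, b).match(line)]


def file_matches(gate, a, b, text):
    """Whole-text test modelled on the (?s)\\A^(xxx). form."""
    logic = LOGIC[gate].format(a=a, b=b)
    return re.search(r"(?s)\A^(" + logic + ").", text) is not None


if __name__ == "__main__":
    # Each line is tested on its own, without its trailing newline.
    for line in ["aaa other aaa", "aaa other bbb", "ccc other ccc"]:
        print(line, "->", matching_gates(line, "aaa", "bbb"))
```

Testing each line separately sidesteps differences in how engines treat `^` and `$` around line endings. The `file_matches` helper shows the other reading: with `(?s)` active, a text that has `aaa` on one line and `bbb` on another satisfies AND as a whole, even though no single line does.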
With the `(?-s)(?:×××)^.*$` given above, if you chose the “two-term expression” and had the literal `aaa` and `bbb` as your search terms, the following “truth table” shows a T for the conditions that would match:

    text          | OR | AND | XOR | NOR | NOR | NAND
    --------------|----+-----+-----+-----+-----+------
    aaa other aaa | T  |     |  T  |     |     |  T
    aaa other bbb | T  |  T  |     |     |     |
    aaa other ccc | T  |     |  T  |     |     |  T
    bbb other aaa | T  |  T  |     |     |     |
    bbb other bbb | T  |     |  T  |     |     |  T
    bbb other ccc | T  |     |  T  |     |     |  T
    ccc other aaa | T  |     |  T  |     |     |  T
    ccc other bbb | T  |     |  T  |     |     |  T
    ccc other ccc |    |     |     |  T  |  T  |  T

In other words, the text `aaa other aaa` will be matched by the OR expression `(?=.*aaa|.*bbb)`, or by the XOR expression `(?=.*aaa)(?!.*bbb)|(?!.*aaa)(?=.*bbb)`, or by the NAND expression `(?!(?=.*aaa)(?=.*bbb))`. (Similar interpretation for the other rows in the truth table.)

For a more concrete example, take this text:

    I baked a cake.
    The Brits call some cookies "biscuits".
    I ate a cookie.
    The fridge contains one cake and one cookie.

The expression `(?-s)(?:(?=.*cake|.*cookie)(?!.*biscuit))^.*$` says “(`cake` OR `cookie`) AND NOT `biscuit`”, which was my first example of what you might want to implement. In this text, the first, third, and fourth lines would match, but the second line would not match (because it contains `biscuit`). This also gives an example of how to combine several of the logic conditions in one expression.

And @Alan-Kilborn came up with the following example:

Say you want to match some particular combination of `Bob` and `Ted` on a line; here are examples of how to do that:

    Logic  Expression to use                                           Match entire line when…
    -----  ----------------------------------------------------------  -----------------------
    OR     (?-s)(?:(?=.*Bob|.*Ted))^.*(?:\R|\z)                        Bob or Ted (or both) is present, in either order
    AND    (?-s)(?:(?=.*Bob)(?=.*Ted))^.*(?:\R|\z)                     both Bob and Ted are present, in either order
    XOR    (?-s)(?:(?=.*Bob)(?!.*Ted)|(?!.*Bob)(?=.*Ted))^.*(?:\R|\z)  Bob or Ted is present, but not when both are present
    NOR-1  (?-s)(?:(?!.*Bob)(?!.*Ted))^.*(?:\R|\z)                     neither Bob nor Ted is present (form 1)
    NOR-2  (?-s)(?:(?!.*(Bob|Ted)))^.*(?:\R|\z)                        neither Bob nor Ted is present (form 2)
    NAND   (?-s)(?:(?!(?=.*Bob)(?=.*Ted)))^.*(?:\R|\z)                 neither is present or one is present, but not when both are present

References
Originally developed in the topic “developing generic regex sequences” and the subsequent discussion.
For other generic expressions, see “FAQ Desk: Generic Regular Expression (regex) Formulas”.
-
@peterjones,
Thank you for this synopsis. This will take a little while to digest, but at least there is now a succinct description that we neophytes can come to and learn from, explained in detail the way you do. Thanks again.