    Generic Regex: Logic Gates for Regular Expressions

      PeterJones

      There are times that you want specific logic in your regular expression, to handle combining two or more conditions in certain ways. For example, you might want to match only lines that contain cake or cookie, but that don’t also contain biscuit; logically, this would be expressed as “match (cake OR cookie) AND (NOT biscuit)”.

      The following expressions use positive lookaheads for saying “it must match” and negative lookaheads for saying “it must not match” (logical “NOT”). Because each lookahead starts with .*, its term can be found anywhere on the line (or file), and the terms can appear in any order. A lookahead also doesn’t consume anything, so on their own these expressions are “zero width”.

      If ☐ . matches newline is not checked, or if (?-s) is enabled in your regex, then those expressions will mean “a single line matches the logic expression chosen”. If ☑ . matches newline is checked, or if (?s) is enabled in your regex, then those expressions will mean “the whole file matches the logic expression chosen”.

      All these expressions require Search Mode = ☑ Regular Expression to work.

      logic | two-term expression                   | n-term expression                  | notes
      ------|---------------------------------------+------------------------------------+--------------------------------------------------
      OR    | (?=.*aaa|.*bbb)                       | (?=.*aaa|.*bbb|...|.*nnn)          | must match at least one
      AND   | (?=.*aaa)(?=.*bbb)                    | (?=.*aaa)(?=.*bbb)...(?=.*nnn)     | must match all
      XOR   | (?=.*aaa)(?!.*bbb)|(?!.*aaa)(?=.*bbb) | too complicated                    | match one or the other, but not both
      NOR   | (?!.*aaa)(?!.*bbb)                    | (?!.*aaa)(?!.*bbb)...(?!.*nnn)     | matches neither one nor the other
      NOR   | (?!.*(aaa|bbb))                       | (?!.*(aaa|bbb|...|nnn))            | second syntax for the same concept
      NAND  | (?!(?=.*aaa)(?=.*bbb))                | (?!(?=.*aaa)(?=.*bbb)...(?=.*nnn)) | may match zero or one of the terms, but not both

      In case it’s not obvious, aaa and bbb and nnn are meant as placeholders for the terms or regex sub-expressions that you’re actually trying to combine.
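To make the n-term column concrete, here is a minimal sketch that assembles those expressions from a list of terms. It is written in Python, whose `re` module is a different engine from the one in Notepad++, so treat it only as a way to experiment with the logic. The helper name `logic_gate` is hypothetical, and XOR is left out because the table gives no n-term form for it.

```python
import re

def logic_gate(gate, terms):
    """Build a zero-width logic expression (hypothetical helper, not part of Notepad++)."""
    if gate == "OR":
        return "(?=" + "|".join(".*" + t for t in terms) + ")"
    if gate == "AND":
        return "".join("(?=.*" + t + ")" for t in terms)
    if gate == "NOR":
        return "".join("(?!.*" + t + ")" for t in terms)
    if gate == "NAND":
        return "(?!" + "".join("(?=.*" + t + ")" for t in terms) + ")"
    raise ValueError("unsupported gate: " + gate)

terms = ["aaa", "bbb", "nnn"]
for gate in ("OR", "AND", "NOR", "NAND"):
    print(gate, logic_gate(gate, terms))

# The results are zero-width, so re.match only reports whether the logic holds
# at the start of the text.
print(re.match(logic_gate("AND", ["aaa", "bbb"]), "bbb x aaa") is not None)  # True
print(re.match(logic_gate("NOR", ["aaa", "bbb"]), "bbb x aaa") is not None)  # False
```

If a term itself contains a top-level `|`, it would need its own `(?:...)` group before being dropped into these templates.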

      If you actually want to select the whole line, you cannot just use the zero-width forms above. Instead, use (?-s)(?:×××)^.*$ (where ××× is the two-term or n-term expression): the (?-s) confines the match to a single line (by making the . wildcard not match newlines), the (?:×××) holds the logic expression that you derived from the table, and the ^.*$ makes it actually match the whole line, rather than the zero-width sequence at the start of the line.
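The difference between the zero-width form and the whole-line form can be tried out with a short Python sketch. Two assumptions apply: Python's `re` is not the Notepad++ engine, and the sketch omits the leading `(?-s)` because `.` already excludes newlines by default in `re` (which only accepts flag removal in the scoped `(?-s:...)` form). `re.MULTILINE` is needed so that `^` and `$` work per line. The sample text is made up.

```python
import re

logic = "(?=.*aaa)(?=.*bbb)"  # two-term AND from the table

# On its own, the logic expression matches an empty string at the start of the line.
zero_width = re.search(logic, "bbb then aaa")
print(repr(zero_width.group(0)))  # ''

# Wrapped as (?:xxx)^.*$ it selects each whole line that satisfies the logic.
whole_line = re.compile("(?:" + logic + ")^.*$", re.MULTILINE)
text = "aaa only\nbbb then aaa\nneither\naaa and bbb"
matched = [m.group(0) for m in whole_line.finditer(text)]
print(matched)  # ['bbb then aaa', 'aaa and bbb']
```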

      Similarly, if you want to use Find in Files to figure out which files match the logic expression, you can use (?s)\A^(×××)., where the (?s) makes . match newlines, so the .* in each of the logic terms can match anywhere in the file; the \A matches only at the start of the file; the (×××) applies the logic to the whole file; and the final . means that it actually matches the first character of each file it finds (which makes it report something in the Find Results after running Find in Files).
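As a rough illustration of the whole-file form, the following Python sketch applies `(?s)\A^(×××).` to a few in-memory strings standing in for files. The file names, their contents, and the helper `file_matches` are all hypothetical, and Python's `re` is only an approximation of what Find in Files does.

```python
import re

def file_matches(logic, contents):
    """Return True when the whole text satisfies the zero-width logic expression."""
    # (?s) lets . cross newlines, \A pins the test to the start of the text,
    # and the final . consumes the first character so there is something to report.
    return re.search(r"(?s)\A^(" + logic + ").", contents) is not None

# Made-up stand-ins for files on disk.
files = {
    "a.txt": "first line has aaa\nsecond line has bbb\n",
    "b.txt": "only aaa here\n",
    "c.txt": "nothing relevant\n",
}

and_logic = "(?=.*aaa)(?=.*bbb)"
hits = [name for name, body in files.items() if file_matches(and_logic, body)]
print(hits)  # ['a.txt']
```

Only `a.txt` is reported, even though its two terms sit on different lines, because the `.*` in each lookahead can now span the whole text.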

      Examples

      With the (?-s)(?:×××)^.*$ given above, if you chose the “two-term expression” and had the literal aaa and bbb as your search terms, the following “truth table” will show a T on the conditions that would match:

      text          | OR | AND | XOR | NOR | NOR | NAND
      --------------|----+-----+-----+-----+-----+------
      aaa other aaa | T  |     | T   |     |     | T
      aaa other bbb | T  | T   |     |     |     |
      aaa other ccc | T  |     | T   |     |     | T
      bbb other aaa | T  | T   |     |     |     | 
      bbb other bbb | T  |     | T   |     |     | T
      bbb other ccc | T  |     | T   |     |     | T
      ccc other aaa | T  |     | T   |     |     | T
      ccc other bbb | T  |     | T   |     |     | T
      ccc other ccc |    |     |     | T   | T   | T
      

      In other words, the text aaa other aaa will be matched by the OR expression (?=.*aaa|.*bbb), or by the XOR expression (?=.*aaa)(?!.*bbb)|(?!.*aaa)(?=.*bbb), or by the NAND expression (?!(?=.*aaa)(?=.*bbb)). (Similar interpretation for the other rows in the truth table.)
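The truth table can be reproduced with a small Python sketch that wraps each two-term expression as `(?:×××)^.*$` and tries it against all nine texts. This assumes Python's `re` behaves like the Notepad++ engine for these particular constructs. The labels `NOR-1` and `NOR-2` and the helper `truth_row` are names chosen for this sketch. Each text is a single line, so no multi-line flag is needed.

```python
import re

GATES = {
    "OR":    "(?=.*aaa|.*bbb)",
    "AND":   "(?=.*aaa)(?=.*bbb)",
    "XOR":   "(?=.*aaa)(?!.*bbb)|(?!.*aaa)(?=.*bbb)",
    "NOR-1": "(?!.*aaa)(?!.*bbb)",
    "NOR-2": "(?!.*(aaa|bbb))",
    "NAND":  "(?!(?=.*aaa)(?=.*bbb))",
}

words = ("aaa", "bbb", "ccc")
rows = [f"{a} other {b}" for a in words for b in words]

def truth_row(text):
    """Map each gate name to whether its whole-line form matches the text."""
    return {name: bool(re.search("(?:" + expr + ")^.*$", text))
            for name, expr in GATES.items()}

for text in rows:
    marks = " | ".join("T" if hit else " " for hit in truth_row(text).values())
    print(f"{text} | {marks}")
```

The `(?:...)` wrapper matters most for XOR: without it, the top-level `|` would split the whole pattern, and `^.*$` would apply to the second alternative only.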

      For a more concrete example,

      I baked a cake.
      The Brits call some cookies "biscuits".
      I ate a cookie.
      The fridge contains one cake and one cookie.
      

      The expression (?-s)(?:(?=.*cake|.*cookie)(?!.*biscuit))^.*$ says “(cake OR cookie) AND NOT biscuit”, which was my first example of what you might want to implement. In this text, the first, third, and fourth lines would match, but the second line would not match (because it contains biscuit). This also gives an example of how to combine several of the logic conditions together in one expression.
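The cake/cookie/biscuit example can be checked outside the editor with a few lines of Python. As a sketch under assumptions: Python's `re` stands in for the Notepad++ engine, the leading `(?-s)` is dropped because `.` does not match newlines by default in `re`, and `re.MULTILINE` makes `^` and `$` work on every line.

```python
import re

text = '''I baked a cake.
The Brits call some cookies "biscuits".
I ate a cookie.
The fridge contains one cake and one cookie.'''

# (cake OR cookie) AND NOT biscuit, wrapped so that the whole line is matched.
pattern = re.compile(r"(?:(?=.*cake|.*cookie)(?!.*biscuit))^.*$", re.MULTILINE)

kept = pattern.findall(text)
for line in kept:
    print(line)
# I baked a cake.
# I ate a cookie.
# The fridge contains one cake and one cookie.
```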

      And @Alan-Kilborn came up with the following example:
      Say you want to match some particular combination of Bob and Ted on a line; here are examples of how to do that:

      Logic | Expression to use                                          | Match entire line when…
      ------|------------------------------------------------------------+------------------------------------------------------------------------
      OR    | (?-s)(?:(?=.*Bob|.*Ted))^.*(?:\R|\z)                       | Bob or Ted (or both) is present, in either order
      AND   | (?-s)(?:(?=.*Bob)(?=.*Ted))^.*(?:\R|\z)                    | both Bob and Ted are present, in either order
      XOR   | (?-s)(?:(?=.*Bob)(?!.*Ted)|(?!.*Bob)(?=.*Ted))^.*(?:\R|\z) | Bob or Ted is present, but not when both are present
      NOR-1 | (?-s)(?:(?!.*Bob)(?!.*Ted))^.*(?:\R|\z)                    | neither Bob nor Ted is present (form 1)
      NOR-2 | (?-s)(?:(?!.*(Bob|Ted)))^.*(?:\R|\z)                       | neither Bob nor Ted is present (form 2)
      NAND  | (?-s)(?:(?!(?=.*Bob)(?=.*Ted)))^.*(?:\R|\z)                | neither is present, or only one is present, but not when both are present
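These expressions end in (?:\R|\z) instead of $, so the match also takes in the line ending (or stops at the end of the file). One visible effect is that replacing the match with nothing removes the line entirely rather than leaving an empty line behind. The Python sketch below imitates the XOR row. It rests on assumptions: Python's `re` has no `\R`, so `\r?\n` is swapped in (this does not cover a bare `\r`), `\Z` is used for end of text, and the sample text is made up.

```python
import re

# XOR row, with (?:\r?\n|\Z) standing in for the original (?:\R|\z).
xor_line = re.compile(
    r"(?:(?=.*Bob)(?!.*Ted)|(?!.*Bob)(?=.*Ted))^.*(?:\r?\n|\Z)",
    re.MULTILINE,
)

text = "Bob alone\nBob and Ted\nTed alone\nnobody\n"

# Replacing each match with nothing deletes the matching lines, terminator included.
cleaned = xor_line.sub("", text)
print(repr(cleaned))  # 'Bob and Ted\nnobody\n'

# A final line without a trailing newline is still matched, via \Z.
print(repr(xor_line.sub("", "nobody\nTed")))  # 'nobody\n'
```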

      References

      Originally developed in “developing generic regex sequences” and subsequent discussion

      For other generic expressions, see FAQ Desk: Generic Regular Expression (regex) Formulas

        Lycan Thrope @PeterJones

        @peterjones ,
        Thank you for this synopsis. This will take a little while to digest, but at least there is a succinct description we neophytes can come and learn the way you explain things in detail. Thanks again.

        The Community of users of the Notepad++ text editor.