Developing generic regex sequences



  • @PeterJones said in Developing generic regex sequences:

    • FIND = (?i-s)\A(^(?!DNM).*$\R*)+\z

    Per @guy038’s improvement in the other thread, the $ shouldn’t really be outside of the DNM, so let’s just rework the generic to

    • FIND = (?i-s)\A(^(?!DNM).*\R?)+\z


  • I had been thinking about this one for a while; I thought I had mentioned at least the OR vs AND in another recent discussion, but “OR” and “AND” are hard words to search for, so I couldn’t find it. ;-)

    This recent discussion gave me the impetus to flesh it out into a generic table.

    Logic Gates for Regular Expressions

    OR      (?=.*aaa|.*bbb)
    AND     (?=.*aaa)(?=.*bbb)
    XOR     (?=.*aaa)(?!.*bbb)|(?!.*aaa)(?=.*bbb)
    NOR     (?!.*aaa)(?!.*bbb)
    NOR     (?!.*(aaa|bbb))
    NAND    (?!(?=.*aaa)(?=.*bbb))
    

    Depending on the “. matches newline” setting, these expressions mean either “this line matches” or “this file matches”.

    For example, if you wrap each of those inside (?-s)(×××)^.*$ (ie, use the expression above instead of ×××), it will select each of the lines marked T (TRUE), depending on which expression you use

    text          | OR | AND | XOR | NOR | NOR | NAND
    --------------|----+-----+-----+-----+-----+------
    aaa other aaa | T  |     | T   |     |     | T
    aaa other bbb | T  | T   |     |     |     |
    aaa other ccc | T  |     | T   |     |     | T
    bbb other aaa | T  | T   |     |     |     | 
    bbb other bbb | T  |     | T   |     |     | T
    bbb other ccc | T  |     | T   |     |     | T
    ccc other aaa | T  |     | T   |     |     | T
    ccc other bbb | T  |     | T   |     |     | T
    ccc other ccc |    |     |     | T   | T   | T
    

    Similarly, wrapped as (?s)\A^(×××). it will select/match the first character of every file that matches the logic expression
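For anyone who wants to check the truth table outside Notepad++, here is a minimal sketch in Python. It is an approximation, not the Boost engine Notepad++ uses: Python's `re` has no global `(?-s)`, so the per-line wrapper is imitated with `re.fullmatch` on one line at a time, and the per-file wrapper with a leading `(?s)`. The helper names (`line_matches`, `file_matches`, `true_gates`) are made up for illustration.

```python
import re

# Two-term gates copied from the table above (NOR is listed in both forms).
GATES = {
    "OR":    r"(?=.*aaa|.*bbb)",
    "AND":   r"(?=.*aaa)(?=.*bbb)",
    "XOR":   r"(?=.*aaa)(?!.*bbb)|(?!.*aaa)(?=.*bbb)",
    "NOR-1": r"(?!.*aaa)(?!.*bbb)",
    "NOR-2": r"(?!.*(aaa|bbb))",
    "NAND":  r"(?!(?=.*aaa)(?=.*bbb))",
}

# The nine sample lines from the truth table.
WORDS = ("aaa", "bbb", "ccc")
LINES = [f"{a} other {b}" for a in WORDS for b in WORDS]


def line_matches(gate, line):
    """Per-line check, roughly (?-s)(xxx)^.*$ applied to a single line."""
    # The gate is wrapped in (?:...) so XOR's top-level "|" stays contained.
    return re.fullmatch("(?:" + gate + ").*", line) is not None


def file_matches(gate, text):
    """Per-file check, roughly (?s)\\A^(xxx). so that "." crosses newlines."""
    return re.match(r"(?s)\A(?:" + gate + ").", text) is not None


def true_gates(line):
    """Names of the gates that are TRUE for this line, in table order."""
    return [name for name, gate in GATES.items() if line_matches(gate, line)]


if __name__ == "__main__":
    for line in LINES:
        print(line, true_gates(line))
```

Running it prints, for each sample line, the gates that would be marked T in the table. Note that `file_matches` needs at least one character in the text, because the wrapper ends in a literal `.`.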



  • @PeterJones

    Hmm, I’m disturbed that some of your aaa in the first code block appears in italics – how does this happen in a code block?

    Usually we see it if someone tries to do regular expressions without a code block, then the * turn some parts of the text into italics.



  • @Alan-Kilborn said in Developing generic regex sequences:

    I’m disturbed that some of your aaa in the first code block appears in italics – how does this happen in a code block?

    It appears NodeBB isn’t treating all code blocks the same. But while it was italicizing, it fortunately wasn’t taking any characters away, so those were the expressions I meant to convey.

    Giving the explicit txt filetype for the block:

    OR      (?=.*aaa|.*bbb)
    AND     (?=.*aaa)(?=.*bbb)
    XOR     (?=.*aaa)(?!.*bbb)|(?!.*aaa)(?=.*bbb)
    NOR     (?!.*aaa)(?!.*bbb)
    NOR     (?!.*(aaa|bbb))
    NAND    (?!(?=.*aaa)(?=.*bbb))
    


  • Updating with n-term rather than just two-term:

    logic  two-term expression                    n-term expression                   notes
    -----  -------------------------------------  ----------------------------------  -----
    OR     (?=.*aaa|.*bbb)                        (?=.*aaa|.*bbb|...|.*nnn)           must match at least one
    AND    (?=.*aaa)(?=.*bbb)                     (?=.*aaa)(?=.*bbb)...(?=.*nnn)      must match all
    XOR    (?=.*aaa)(?!.*bbb)|(?!.*aaa)(?=.*bbb)  too complicated                     match one or the other, but not both
    NOR    (?!.*aaa)(?!.*bbb)                     (?!.*aaa)(?!.*bbb)...(?!.*nnn)      matches neither one nor the other
    NOR    (?!.*(aaa|bbb))                        (?!.*(aaa|bbb|...|nnn))             second syntax for the same concept
    NAND   (?!(?=.*aaa)(?=.*bbb))                 (?!(?=.*aaa)(?=.*bbb)...(?=.*nnn))  may match zero or one of the terms, but not both
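If you would rather generate the n-term forms than type them, something like the following Python sketch would do it. The function name `gate` is hypothetical, and XOR is left out on purpose since the table marks the n-term version as too complicated. Terms are passed through `re.escape` so that literal text with regex metacharacters stays literal; that is an assumption about how you want the terms treated.

```python
import re


def gate(kind, terms):
    """Build the n-term lookahead expression for OR, AND, NOR or NAND."""
    escaped = [re.escape(t) for t in terms]
    if kind == "OR":
        # (?=.*aaa|.*bbb|...|.*nnn)
        return "(?=" + "|".join(".*" + t for t in escaped) + ")"
    if kind == "AND":
        # (?=.*aaa)(?=.*bbb)...(?=.*nnn)
        return "".join("(?=.*" + t + ")" for t in escaped)
    if kind == "NOR":
        # (?!.*aaa)(?!.*bbb)...(?!.*nnn)
        return "".join("(?!.*" + t + ")" for t in escaped)
    if kind == "NAND":
        # (?!(?=.*aaa)(?=.*bbb)...(?=.*nnn))
        return "(?!" + "".join("(?=.*" + t + ")" for t in escaped) + ")"
    raise ValueError("unsupported gate: " + kind)


if __name__ == "__main__":
    for kind in ("OR", "AND", "NOR", "NAND"):
        print(kind, gate(kind, ["aaa", "bbb", "nnn"]))
```

The generated strings can be pasted into the FIND box inside whichever wrapper (per-line or per-file) you need.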


  • Hi @peterjones, @alan-kilborn and All,

    I gave a similar answer to @vijay-s ( refer here ), but Peter BRILLIANTLY beat me to it and gave us a complete panel of the look-aheads to use in order to simulate the main logical combinations !


    Now, Peter, I think it would be worth, in the general case, to add a ^ anchor, right in front of all these formulas !

    Best Regards,

    guy038



  • @guy038 said in Developing generic regex sequences:

    Now, Peter, I think it would be worth, in the general case, to add a ^ anchor, right in front of all these formulas !

    The logic itself is independent of what you anchor it to. My two usage examples in my first post about “Logic Gates for Regular Expressions” show that you can use these standalone generic expressions anchored either per-line or per-file, depending on what you wrap around them. With the generic form, you could even stick one of these after some other match on the line, saying “after some prefix, match aaa or bbb” or similar. Hence, I didn’t want to specify the anchors in my generic expressions.
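To make the “after some prefix” idea concrete, here is a small Python sketch. The prefix `ID:` and the function name `tail_has_aaa_or_bbb` are invented for the example; the point is only that the lookahead starts scanning from wherever it sits in the expression, so text before the prefix is not considered.

```python
import re

# The generic OR gate, unanchored.
OR_GATE = r"(?=.*aaa|.*bbb)"

# Hypothetical prefix "ID:" placed in front of the gate: the lookahead now
# only sees what follows the prefix on the same line.
PATTERN = re.compile("ID:(?:" + OR_GATE + ").*")


def tail_has_aaa_or_bbb(line):
    """True when 'ID:' occurs and aaa or bbb appears somewhere after it."""
    return PATTERN.search(line) is not None


if __name__ == "__main__":
    for sample in ("ID: x bbb", "aaa ID: ccc", "x bbb"):
        print(repr(sample), tail_has_aaa_or_bbb(sample))
```

With a `^` hard-wired into the generic expression, this kind of mid-line use would not be possible, which is the reason for leaving the anchors to the wrapper.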



  • @PeterJones said in Developing generic regex sequences:

    Updating with n-term rather than just two-term:

    Nice use of a table in a posting here, as well. :-)
    Seriously, valuable information here. Kudos.



  • So as I often do, I dug in a bit deeper to what Peter presented.
    My conclusion is that pointing novices at regular expressions here and expecting them to solve their own related problems may not be super-successful.
    It isn’t that all the needed info isn’t here – it is – it just may require some base knowledge to be applicable, without readers saying “Huh?”.

    So maybe some really concrete examples help. In that light, my contribution will be how to match entire lines meeting the logic criteria that Peter brought to the table.

    Say you want to match some particular combination of Bob and Ted on a line – here’s information on doing that:

    Logic  Expression to use                                            Match entire line when…
    -----  -----------------------------------------------------------  -----------------------
    OR     (?-s)(?:(?=.*Bob|.*Ted))^.*(?:\R|\z)                         Bob or Ted (or both) is present, in either order
    AND    (?-s)(?:(?=.*Bob)(?=.*Ted))^.*(?:\R|\z)                      both Bob and Ted are present, in either order
    XOR    (?-s)(?:(?=.*Bob)(?!.*Ted)|(?!.*Bob)(?=.*Ted))^.*(?:\R|\z)   Bob or Ted is present, but not when both are present
    NOR-1  (?-s)(?:(?!.*Bob)(?!.*Ted))^.*(?:\R|\z)                      neither Bob nor Ted is present (form 1)
    NOR-2  (?-s)(?:(?!.*(Bob|Ted)))^.*(?:\R|\z)                         neither Bob nor Ted is present (form 2)
    NAND   (?-s)(?:(?!(?=.*Bob)(?=.*Ted)))^.*(?:\R|\z)                  neither is present or only one is present, but not when both are present

    I took a little liberty with Peter’s original “notes” table column; changed it up a bit. Also, obviously I only did a “two term” example.

    Maybe I’m off-base and this doesn’t provide additional insight on exactly how to use Peter’s info, but hopefully it does.
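For readers who want to experiment with the Bob/Ted expressions outside Notepad++, here is a rough Python port. Two things are assumptions on my part because Python's `re` differs from Notepad++'s Boost engine: the leading `(?-s)` is dropped (Python's `.` already stops at newlines by default), and `(?:\R|\z)` is replaced by `(?:\r?\n|\Z)`. The sample text and the name `matched_lines` are made up for illustration.

```python
import re

# Python stand-in for Boost's (?:\R|\z): a line ending or the end of the text.
EOL = r"(?:\r?\n|\Z)"

# Same structure as the table: gate first, then ^.* and the line ending.
EXPRESSIONS = {
    "OR":    r"(?:(?=.*Bob|.*Ted))^.*" + EOL,
    "AND":   r"(?:(?=.*Bob)(?=.*Ted))^.*" + EOL,
    "XOR":   r"(?:(?=.*Bob)(?!.*Ted)|(?!.*Bob)(?=.*Ted))^.*" + EOL,
    "NOR-1": r"(?:(?!.*Bob)(?!.*Ted))^.*" + EOL,
    "NOR-2": r"(?:(?!.*(Bob|Ted)))^.*" + EOL,
    "NAND":  r"(?:(?!(?=.*Bob)(?=.*Ted)))^.*" + EOL,
}

# Sample text without a trailing newline, so no empty final line is matched.
TEXT = "Bob only\nTed and Bob\nnobody here\nTed only"


def matched_lines(logic, text):
    """Return the lines (without line endings) selected by one expression."""
    rx = re.compile(EXPRESSIONS[logic], re.MULTILINE)
    return [m.group().rstrip("\r\n") for m in rx.finditer(text)]


if __name__ == "__main__":
    for logic in EXPRESSIONS:
        print(logic, matched_lines(logic, TEXT))
```

One thing to watch for: if the text ends with a newline, the NOR and NAND forms can also produce an empty match at the very end, since the empty last “line” contains neither name.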



  • @Alan-Kilborn said in Developing generic regex sequences:

    My conclusion is that pointing novices at regular expressions here and expecting them to solve their own related problems may not be super-successful.

    That’s why I posted here, rather than separately. This thread is for “developing” the generic expressions, with lots of back and forth. The “final version” will be published to its own separate thread. (I probably shouldn’t’ve posted a link back to here from the inspiration thread, because this one wasn’t ready yet)

    I think your table is a good practical example of how to use it.

