Developing generic regex sequences

  • P
    PeterJones
    last edited by Mar 25, 2021, 8:46 PM

    Regular Community Forum contributors,

    Let’s use this thread when we want to spin off from a normal question to develop a generic regex sequence.

    After some debugging and back-and-forth here, a “truly generic” regex can be spun off into its own blog post, which describes it and shows an example of how to use the generic to get specific results. That spun-off blog could then have a format similar to @guy038’s generic “change matching data between a given starting and ending sequence”, but standalone rather than part of a thread.

    • P
      PeterJones @PeterJones
      last edited by Mar 25, 2021, 8:51 PM

      As my first, derived from this recent post

      Find a file that never contains a line that matches a specific expression

      • FIND = (?i-s)\A(^(?!DNM).*$\R*)+\z
      • DNM means “does not match”, and is any regular expression which should not be found as a complete line anywhere in the file

      As far as I could tell (in that other thread), this would work to look for a whole line that matches DNM and then fail the search because of it.

      For my small tests, this formula worked. But I am sure there are gotchas I’ve missed.

      For example, I haven’t tried, but I bet that larger files will not work, because of how much memory gets gobbled by this expression. Is there a way to improve this so that it becomes large-file agnostic?
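To experiment with this outside Notepad++, here is a minimal Python sketch of the same idea. It is only an approximation: Python's `re` module is not the engine Notepad++ uses and has no `\R`, so `\n` and `\Z` stand in for `\R` and `\z`, and `\n` line endings are assumed. Both function names are hypothetical. The second function is a different technique, not an improvement to the regex itself: it tests one line at a time, which is one possible way around the memory concern.

```python
import re


def lacks_matching_line(text: str, dnm: str) -> bool:
    """True when no line of `text` starts with a match for `dnm` (whole text at once)."""
    # Rough translation of (?i-s)\A(^(?!DNM).*\R?)+\z for Python's re module.
    pattern = re.compile(
        r"\A(?:^(?!" + dnm + r").*\n?)+\Z",
        re.IGNORECASE | re.MULTILINE,
    )
    return pattern.match(text) is not None


def lacks_matching_line_streaming(lines, dnm: str) -> bool:
    """Same test, one line at a time, so only a single line is examined at once."""
    rx = re.compile(dnm, re.IGNORECASE)
    # rx.match anchors at the start of each line, like ^(?!DNM) does above.
    return not any(rx.match(line) for line in lines)
```

As in the reworked generic, DNM carries its own `$` when a complete-line match is wanted, for example `beta$`. An open file object can be passed as `lines`, since iterating over it yields one line at a time.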

      • P
        PeterJones @PeterJones
        last edited by Mar 25, 2021, 8:55 PM

        @PeterJones said in Developing generic regex sequences:

        can be spun off into its own blog post

        or FAQ Desk entry, or post in a FAQ desk entry… however we end up wanting to structure the “publish the generic regex sequence” entries.

        My thoughts on that:

        • I don’t want yet another category in the forum. Either Blogs or FAQ Desk should be sufficient
        • If it’s in Blogs, I would lobby for each generic sequence to get its own Topic
        • If it’s in FAQ Desk,
          • I could accept a single Topic with multiple Replies, one reply per description. But that would quickly get overwhelming, I think.
          • But even better, I think is a single Topic with a link to each of the individual blogs; after someone posts a generic expression Blog, then Guy or another moderator could edit the FAQ Desk entry to point to the new Blog entry
        • E
          Ekopalypse
          last edited by Mar 26, 2021, 11:15 AM

          Yes, I agree that a blog that is a kind of index for the other, own blogs, is the most suitable for this.

          • P
            PeterJones @PeterJones
            last edited by PeterJones Mar 26, 2021, 5:35 PM Mar 26, 2021, 5:35 PM

            @PeterJones said in Developing generic regex sequences:

            • FIND = (?i-s)\A(^(?!DNM).*$\R*)+\z

            Per @guy038’s improvement in the other thread, the $ shouldn’t really be outside of the DNM, so let’s just rework the generic to

            • FIND = (?i-s)\A(^(?!DNM).*\R?)+\z
            • P
              PeterJones @PeterJones
              last edited by PeterJones Jun 17, 2021, 6:52 PM Jun 17, 2021, 6:50 PM

              I had been thinking about this one for a while; I thought I had mentioned at least the OR vs AND in another recent discussion, but “OR” and “AND” are hard words to search for, so I couldn’t find it. ;-)

              This recent discussion gave me the impetus to flesh it out into a generic table.

              Logic Gates for Regular Expressions

              OR      (?=.*aaa|.*bbb)
              AND     (?=.*aaa)(?=.*bbb)
              XOR     (?=.*aaa)(?!.*bbb)|(?!.*aaa)(?=.*bbb)
              NOR     (?!.*aaa)(?!.*bbb)
              NOR     (?!.*(aaa|bbb))
              NAND    (?!(?=.*aaa)(?=.*bbb))
              

              Depending on the “. matches newline” setting, these expressions mean either “this line” matches or “this file” matches.

              For example, if you wrap each of those inside (?-s)(×××)^.*$ (ie, use the expression above instead of ×××), it will select each of the lines marked T (TRUE), depending on which expression you use

              text          | OR | AND | XOR | NOR | NOR | NAND
              --------------|----+-----+-----+-----+-----+------
              aaa other aaa | T  |     | T   |     |     | T
              aaa other bbb | T  | T   |     |     |     |
              aaa other ccc | T  |     | T   |     |     | T
              bbb other aaa | T  | T   |     |     |     | 
              bbb other bbb | T  |     | T   |     |     | T
              bbb other ccc | T  |     | T   |     |     | T
              ccc other aaa | T  |     | T   |     |     | T
              ccc other bbb | T  |     | T   |     |     | T
              ccc other ccc |    |     |     | T   | T   | T
              

              Similarly, wrapped as (?s)\A^(×××). (the trailing dot is part of the expression), it will select/match the first character of every file that matches the logic expression.
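The truth table and both wrappers above can be checked with a short Python sketch. This assumes Python's `re` module rather than the Notepad++ engine, so `re.MULTILINE` and `re.DOTALL` stand in for the `(?-s)` and `(?s)` prefixes. The names `GATES`, `NOR2`, `matching_lines` and `file_matches` are made up for illustration.

```python
import re

# The six two-term expressions from the post; NOR2 is the second NOR syntax.
GATES = {
    "OR":   r"(?=.*aaa|.*bbb)",
    "AND":  r"(?=.*aaa)(?=.*bbb)",
    "XOR":  r"(?=.*aaa)(?!.*bbb)|(?!.*aaa)(?=.*bbb)",
    "NOR":  r"(?!.*aaa)(?!.*bbb)",
    "NOR2": r"(?!.*(aaa|bbb))",
    "NAND": r"(?!(?=.*aaa)(?=.*bbb))",
}


def matching_lines(gate, text):
    """Per-line use: the dot stops at newlines, so each line is judged alone."""
    rx = re.compile(r"(?:" + GATES[gate] + r")^.*$", re.MULTILINE)
    return [m.group(0) for m in rx.finditer(text)]


def file_matches(gate, text):
    """Per-file use: with DOTALL the same lookaheads scan the whole text."""
    rx = re.compile(r"\A(?:" + GATES[gate] + r").", re.DOTALL)
    return rx.match(text) is not None


# The nine test lines from the table, in the same order.
WORDS = ("aaa", "bbb", "ccc")
TEXT = "\n".join(f"{a} other {b}" for a in WORDS for b in WORDS)

if __name__ == "__main__":
    for name in GATES:
        print(name, matching_lines(name, TEXT))
```

The non-capturing group around the gate matters for XOR, whose top-level `|` would otherwise swallow the wrapper.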

              • A
                Alan Kilborn @PeterJones
                last edited by Jun 17, 2021, 7:38 PM

                @PeterJones

                Hmm, I’m disturbed that some of your aaa in the first code block appears in italics – how does this happen in a code block?

                Usually we see it if someone tries to do regular expressions without a code block, then the * turn some parts of the text into italics.

                • P
                  PeterJones @Alan Kilborn
                  last edited by Jun 17, 2021, 7:43 PM

                  @Alan-Kilborn said in Developing generic regex sequences:

                  I’m disturbed that some of your aaa in the first code block appears in italics – how does this happen in a code block?

                  It appears NodeBB isn’t treating all code blocks the same. But while it was italicizing, it fortunately wasn’t taking any characters away, so those were the expressions I meant to convey.

                  Giving the explicit txt filetype for the block:

                  OR      (?=.*aaa|.*bbb)
                  AND     (?=.*aaa)(?=.*bbb)
                  XOR     (?=.*aaa)(?!.*bbb)|(?!.*aaa)(?=.*bbb)
                  NOR     (?!.*aaa)(?!.*bbb)
                  NOR     (?!.*(aaa|bbb))
                  NAND    (?!(?=.*aaa)(?=.*bbb))
                  
                  • P
                    PeterJones @PeterJones
                    last edited by Jun 18, 2021, 4:24 PM

                    Updating with n-term rather than just two-term:

                    logic  two-term expression                    n-term expression                   notes
                    OR     (?=.*aaa|.*bbb)                        (?=.*aaa|.*bbb|...|.*nnn)           must match at least one
                    AND    (?=.*aaa)(?=.*bbb)                     (?=.*aaa)(?=.*bbb)...(?=.*nnn)      must match all
                    XOR    (?=.*aaa)(?!.*bbb)|(?!.*aaa)(?=.*bbb)  too complicated                     match one or the other, but not both
                    NOR    (?!.*aaa)(?!.*bbb)                     (?!.*aaa)(?!.*bbb)...(?!.*nnn)      matches neither one nor the other
                    NOR    (?!.*(aaa|bbb))                        (?!.*(aaa|bbb|...|nnn))             second syntax for the same concept
                    NAND   (?!(?=.*aaa)(?=.*bbb))                 (?!(?=.*aaa)(?=.*bbb)...(?=.*nnn))  may match zero or one of the terms, but not both
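Since the n-term forms follow a regular pattern, they can be generated from a list of terms. Below is a small Python sketch of that; `build_gate` is a hypothetical helper, not something from Notepad++. It assumes each term is a simple regex fragment (a term with its own top-level `|` would need wrapping in `(?:...)` by the caller), and it leaves XOR out, as the table does.

```python
import re


def build_gate(kind: str, terms: list[str]) -> str:
    """Return the n-term lookahead expression for OR, AND, NOR or NAND."""
    if kind == "OR":
        return "(?=" + "|".join(".*" + t for t in terms) + ")"
    if kind == "AND":
        return "".join(f"(?=.*{t})" for t in terms)
    if kind == "NOR":
        return "".join(f"(?!.*{t})" for t in terms)
    if kind == "NAND":
        # NAND is the AND chain wrapped in one negative lookahead.
        return "(?!" + build_gate("AND", terms) + ")"
    raise ValueError(f"unsupported gate: {kind}")


if __name__ == "__main__":
    # Example: whole lines that mention all three terms, in any order.
    rx = re.compile("^" + build_gate("AND", ["aaa", "bbb", "nnn"]) + ".*$", re.MULTILINE)
    print(rx.findall("nnn bbb aaa\naaa bbb only"))
```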
                    • G
                      guy038
                      last edited by Jun 18, 2021, 4:39 PM

                      Hi @peterjones, @alan-kilborn and All,

                      I gave a similar answer to @vijay-s (refer here), but Peter BRILLIANTLY beat me to it and gave us a complete set of look-aheads for simulating the main logical combinations!


                      Now, Peter, I think it would be worthwhile, in the general case, to add a ^ anchor right in front of all these formulas!

                      Best Regards,

                      guy038

                      • P
                        PeterJones @guy038
                        last edited by Jun 18, 2021, 4:48 PM

                        @guy038 said in Developing generic regex sequences:

                        Now, Peter, I think it would be worthwhile, in the general case, to add a ^ anchor right in front of all these formulas!

                        The logic itself is independent of what you anchor it in. My two usage examples in my first post about “Logic Gates for Regular Expressions” show that you can use these standalone generics anchored either per-line or per-file, depending on what you wrap around them. With the generic form, you could even stick these after some other match on the line, saying “after some prefix, match aaa or bbb” or similar. Hence, I didn’t want to specify the anchors in my generic expressions.
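As a rough illustration of the “after some prefix” case, here is a Python sketch with the OR gate placed after a made-up prefix. The `ID\d+:` prefix and the sample lines are invented for the example, and Python's `re` module is assumed rather than the Notepad++ engine. Because a lookahead only inspects text to the right of where it sits, an `aaa` that appears before the prefix does not count.

```python
import re

OR_GATE = r"(?=.*aaa|.*bbb)"

# Hypothetical prefix: "ID", some digits, a colon. The gate is tested right after it.
AFTER_PREFIX = re.compile(r"ID\d+:" + OR_GATE + r".*")


def hits(text):
    """Return every 'prefix + rest of line' where aaa or bbb follows the prefix."""
    return [m.group(0) for m in AFTER_PREFIX.finditer(text)]


SAMPLE = "\n".join([
    "ID1: has aaa",
    "ID2: has ccc",
    "aaa ID3: has ccc",   # aaa is before the prefix, so the gate does not see it
    "ccc ID4: has bbb",
])

if __name__ == "__main__":
    print(hits(SAMPLE))
```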

                        • A
                          Alan Kilborn @PeterJones
                          last edited by Jun 18, 2021, 6:02 PM

                          @PeterJones said in Developing generic regex sequences:

                          Updating with n-term rather than just two-term:

                          Nice use of a table in a posting here, as well. :-)
                          Seriously, valuable information here. Kudos.

                          • A
                            Alan Kilborn
                            last edited by Alan Kilborn Jun 18, 2021, 8:36 PM Jun 18, 2021, 8:34 PM

                            So as I often do, I dug in a bit deeper to what Peter presented.
                            My conclusion is that pointing novices at regular expressions here and expecting them to solve their own related problems may not be super-successful.
                            It isn’t that all the needed info isn’t here – it is – it just may require some base knowledge to be applicable, without readers saying “Huh?”.

                            So maybe some really concrete examples help. In that light, my contribution will be how to match entire lines meeting the logic criteria that Peter brought to the table.

                            Say you want to match some particular combination of Bob and Ted on a line – here’s information on doing that:

                            Logic  Expression to use                                           Match entire line when…
                            OR     (?-s)(?:(?=.*Bob|.*Ted))^.*(?:\R|\z)                        Bob or Ted (or both) is present, in either order
                            AND    (?-s)(?:(?=.*Bob)(?=.*Ted))^.*(?:\R|\z)                     both Bob and Ted are present, in either order
                            XOR    (?-s)(?:(?=.*Bob)(?!.*Ted)|(?!.*Bob)(?=.*Ted))^.*(?:\R|\z)  Bob or Ted is present, but not when both are present
                            NOR-1  (?-s)(?:(?!.*Bob)(?!.*Ted))^.*(?:\R|\z)                     neither Bob/Ted are present (form 1)
                            NOR-2  (?-s)(?:(?!.*(Bob|Ted)))^.*(?:\R|\z)                        neither Bob/Ted are present (form 2)
                            NAND   (?-s)(?:(?!(?=.*Bob)(?=.*Ted)))^.*(?:\R|\z)                 neither are present or one is present, but not when both are present

                            I took a little liberty with Peter’s original “notes” table column; changed it up a bit. Also, obviously I only did a “two term” example.

                            Maybe I’m off-base and this doesn’t provide additional insight on exactly how to use Peter’s info, but hopefully it does.

                            • P
                              PeterJones @Alan Kilborn
                              last edited by Jun 18, 2021, 8:39 PM

                              @Alan-Kilborn said in Developing generic regex sequences:

                              My conclusion is that pointing novices at regular expressions here and expecting them to solve their own related problems may not be super-successful.

                              That’s why I posted here, rather than separately. This thread is for “developing” the generic expressions, with lots of back and forth. The “final version” will be published to its own separate thread. (I probably shouldn’t’ve posted a link back to here from the inspiration thread, because this one wasn’t ready yet)

                              I think your table is a good practical example of how to use it.

                              • A
                                Alan Kilborn
                                last edited by Alan Kilborn Oct 4, 2021, 3:25 PM Oct 4, 2021, 3:24 PM

                                So a note on “practicality” here…
                                Recently I had cause to implement some “OR” searches as described above.
                                I pulled up this thread for the “formula”, put my specific use-case data in, and pressed Find All in Current Document, and, well, …, waited, a loooong time for results to come back.
                                It turns out that the regexes specified above are fine for “small” data, but are rather inefficient for “bigger” data, or at least the size/type of data I had.

                                Here’s an example:
                                The original “match entire line OR regex” above is (?-s)(?:(?=.*Bob|.*Ted))^.*(?:\R|\z)
                                For my data, that one took between one and two minutes to run.
                                If I change the regex to (?-s)^(?=.*?(?:Bob|Ted)).+, that one runs so quickly that it is hard to time, except to say maybe it takes a second or so.

                                Probably all of the regexes I presented in my table above could be better optimized. :-(
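For anyone who wants to compare the two OR forms on their own data, here is a hedged Python sketch. It uses Python's `re` module, not the Notepad++ engine, with `\n` and `\Z` standing in for `\R` and `\z`, so the timings will not carry over directly. One plausible reason for the gap in a backtracking engine is that the first form runs its lookahead at every character position before `^` can reject the position, while the second tests `^` first; that explanation is my assumption, not something measured in Notepad++. The sample data is synthetic.

```python
import re
import timeit

# Lookahead first, anchor second (the original table form, translated).
SLOW = re.compile(r"(?:(?=.*Bob|.*Ted))^.*(?:\n|\Z)", re.MULTILINE)
# Anchor first, one lazy scan for either name (the faster form).
FAST = re.compile(r"^(?=.*?(?:Bob|Ted)).+", re.MULTILINE)


def lines_hit(rx, text):
    """Matched lines, with any trailing newline stripped for comparison."""
    return [m.group(0).rstrip("\n") for m in rx.finditer(text)]


if __name__ == "__main__":
    # Synthetic data: many long lines without either name, then one hit.
    sample = "\n".join(["x" * 200] * 500 + ["Bob"])
    for name, rx in (("slow", SLOW), ("fast", FAST)):
        seconds = timeit.timeit(lambda: lines_hit(rx, sample), number=1)
        print(name, seconds)
```

Both patterns select the same lines on this kind of data. The slow one also includes the line ending in its match, which is why `lines_hit` strips it.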

                                • P
                                  PeterJones @PeterJones
                                  last edited by Mar 4, 2022, 2:32 PM

                                  A year later, I finally got around to making the “table of contents” post in the FAQ: “FAQ Desk: Generic Regular Expression (regex) Formulas”

                                  For now, it’s linking to the in-thread versions of these generic expressions… but I highly encourage the developers of the expression to spin off a new blog post for each generic regex.

                                  • C
                                    C Bacca
                                    last edited by Jun 4, 2022, 1:44 PM

                                    Hi all,
                                    I’m wondering if this needs to be added to the Regex FAQ or another thread. @PeterJones @guy038

                                    There are some good regex tester sites out there. Here’s one and a search for others. It really helps in debugging regular expressions. Generally you add in test data and the regex, and the site will highlight the strings it matches.

                                    Build and test regular expressions regex. Make a free account here to save your regexes. https://regex101.com/
                                    Search for more: https://search.brave.com/search?q=free+account+test+regular+expression&source=web

                                    I hope this is helpful!

                                    • C
                                      C Bacca @C Bacca
                                      last edited by Jun 4, 2022, 1:52 PM

                                      @c-bacca Ok this search is case sensitive.

                                      Also, here’s an example with example data. https://regex101.com/r/Hfly86/1

                                      • P
                                        PeterJones @C Bacca
                                        last edited by Jun 4, 2022, 2:21 PM

                                        @c-bacca said in Developing generic regex sequences:

                                        I’m wondering if this needs to be added to the Regex FAQ or another thread.

                                        Why would it need to be added where it already exists?

                                        Or were you not aware that we have two different regex FAQ entries?

                                        The second was the one discussed in this original topic: a table of contents of “generic” regex at https://community.notepad-plus-plus.org/topic/22673/faq-desk-generic-regular-expression-regex-formulas

                                        But the much earlier regex FAQ explains where to get regex help, including links to the Notepad++ regex documentation, plus links to a lot more “regex tester” sites than you mentioned: https://community.notepad-plus-plus.org/topic/15765/faq-desk-where-to-find-regular-expressions-regex-documentation

                                        • A
                                          Alan Kilborn @C Bacca
                                          last edited by Jun 4, 2022, 7:28 PM

                                          @c-bacca said in Developing generic regex sequences:

                                          There are some good regex tester sites out there. Here’s one and a search for others. It really helps in debugging regular expressions. Generally you add in test data and the regex, and the site will highlight the strings it matches.
                                          Build and test regular expressions regex. Make a free account here to save your regexes. https://regex101.com/

                                          I suppose, but I don’t believe it uses the same regular expression engine as Notepad++, so it is of limited usefulness if you are going to use Notepad++ to do your regular expression searches and replacements.
