Developing generic regex sequences

  • Regulars Community Forum contributors,

    Let’s use this thread when we want to spin off from a normal question to develop a generic regex sequence.

    After some debugging and back-and-forth here, a “truly generic” regex can be spun off into its own blog post, which describes it and shows an example of how to use the generic to get specific results. The spun-off blog could then have a format similar to @guy038’s generic “change matching data between a given starting and ending sequence”, but standalone rather than part of a thread.

  • As my first example, derived from this recent post:

    Find a file that never contains a line that matches a specific expression

    • FIND = (?i-s)\A(^(?!DNM).*$\R*)+\z
    • DNM means “does not match”, and is any regular expression which should not be found as a complete line anywhere in the file

    As far as I could tell (in that other thread), this would work to look for a whole line that matches DNM and then fail the search because of it.

    For my small tests, this formula worked. But I am sure there are gotchas I’ve missed.

    For example, I haven’t tried, but I bet that larger files will not work, because of how much memory gets gobbled by this expression. Is there a way to improve this so that it becomes large-file agnostic?
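    To experiment with the FIND expression above outside the editor, here is a minimal Python sketch. It is an approximation, not the same engine: Python's `re` has no `\R` or `\z`, so `\n` and `\Z` stand in for them, and input with `\n` line endings is assumed. The function name `file_never_matches` is hypothetical. DNM is passed with its own `$` so that it is tested as a complete line.

```python
import re

def file_never_matches(text: str, dnm: str) -> bool:
    """Return True if the whole text matches FIND, i.e. no line is rejected by DNM."""
    # Approximation of (?i-s)\A(^(?!DNM).*$\R*)+\z for Python's re module:
    #   \R is replaced by \n (assumes \n line endings), \z by \Z.
    # re.MULTILINE lets ^ and $ anchor at every line; DOTALL is off by default,
    # which corresponds to the -s part of (?i-s).
    pattern = re.compile(
        r"\A(?:^(?!" + dnm + r").*$\n*)+\Z",
        re.IGNORECASE | re.MULTILINE,
    )
    return pattern.search(text) is not None

sample = "alpha\nbeta\ngamma\n"
print(file_never_matches(sample, r"delta$"))  # no line is "delta"
print(file_never_matches(sample, r"beta$"))   # the second line is "beta"
```

    The sketch only mirrors the logic of the expression on small inputs; it says nothing about how the editor's engine behaves on large files.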

  • @PeterJones said in Developing generic regex sequences:

    can be spun off into its own blog post

    or FAQ Desk entry, or post in a FAQ desk entry… however we end up wanting to structure the “publish the generic regex sequence” entries.

    My thoughts on that:

    • I don’t want yet another category in the forum. Either Blogs or FAQ Desk should be sufficient
    • If it’s in Blogs, I would lobby for each generic sequence to get its own Topic
    • If it’s in FAQ Desk,
      • I could accept a single Topic with multiple Replies, one reply per description. But that would quickly get overwhelming, I think.
      • But even better, I think, is a single Topic with a link to each of the individual blogs: after someone posts a generic expression Blog, Guy or another moderator could edit the FAQ Desk entry to point to the new Blog entry

  • Yes, I agree that a blog that serves as a kind of index for the other, individual blogs is the most suitable for this.

  • @PeterJones said in Developing generic regex sequences:

    • FIND = (?i-s)\A(^(?!DNM).*$\R*)+\z

    Per @guy038’s improvement in the other thread, the $ shouldn’t really be outside of the DNM, so let’s just rework the generic to

    • FIND = (?i-s)\A(^(?!DNM).*\R?)+\z
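    A small Python sketch of why the `$` belongs inside DNM in the reworked generic. As an approximation, `\n` and `\Z` replace `\R` and `\z` (which Python's `re` lacks), `\n` line endings are assumed, and the helper name `build_find` is hypothetical. Without a trailing `$`, DNM is only tested against the start of each line; with it, DNM must match the complete line.

```python
import re

def build_find(dnm: str) -> re.Pattern:
    # Hypothetical helper: substitutes DNM into an approximation of
    # (?i-s)\A(^(?!DNM).*\R?)+\z, with \n standing in for \R and \Z for \z.
    return re.compile(
        r"\A(?:^(?!" + dnm + r").*\n?)+\Z",
        re.IGNORECASE | re.MULTILINE,
    )

text = "alpha\nbetamax\ngamma\n"

# FIND fails (search returns None) when some line is rejected by DNM.
prefix_hit = build_find(r"beta").search(text) is None       # "beta" matches the start of "betamax"
whole_line_hit = build_find(r"beta$").search(text) is None  # no line is exactly "beta"

print(prefix_hit, whole_line_hit)
```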
