
best way to stop matching of escaped parentheses/brackets in regular expression.

  • M
    Mark Olson
    last edited by Mar 31, 2023, 11:20 PM

    As I’ve noted elsewhere, I really like the fact that NPP does bracket matching in strings and regexes that is separate from the bracket matching outside strings and regexes.
    However, I’m less happy about the fact that escaped parentheses and brackets are still matched. Obviously I recognize that this would be a rather irritating feature to implement and would presumably need to be implemented on a per-lexer basis, but does anyone have thoughts about the best way to achieve this?

    I will probably submit a feature request in Scintilla or Lexilla, but I was hoping some of the veterans here could give their thoughts.
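
As a rough illustration of the behaviour being asked for, here is a minimal Python sketch of escape-aware bracket matching. It is not how Scintilla or Lexilla work; `match_bracket` and `is_escaped` are hypothetical names, and it assumes a single flat context (say, the body of one regex literal) in which a bracket preceded by an odd number of backslashes counts as escaped.

```python
# Hypothetical helper names; a sketch, not Scintilla/Lexilla code.
PAIRS = {"(": ")", "[": "]", "{": "}"}
CLOSERS = {close: open_ for open_, close in PAIRS.items()}


def is_escaped(text, i):
    """True if text[i] is preceded by an odd number of backslashes."""
    backslashes = 0
    j = i - 1
    while j >= 0 and text[j] == "\\":
        backslashes += 1
        j -= 1
    return backslashes % 2 == 1


def match_bracket(text, pos):
    """Return the index of the bracket matching text[pos], or None.

    Escaped brackets are never matched and never count toward nesting.
    Only brackets of the same kind as text[pos] are tracked.
    """
    ch = text[pos]
    if is_escaped(text, pos):
        return None
    if ch in PAIRS:
        partner, step = PAIRS[ch], 1      # opener: scan forward
    elif ch in CLOSERS:
        partner, step = CLOSERS[ch], -1   # closer: scan backward
    else:
        return None
    depth = 0
    i = pos
    while 0 <= i < len(text):
        c = text[i]
        if not is_escaped(text, i):
            if c == ch:
                depth += 1
            elif c == partner:
                depth -= 1
                if depth == 0:
                    return i
        i += step
    return None


pattern = r"a\(b(c\)d)e"
print(match_bracket(pattern, 4))  # unescaped "(" at index 4
print(match_bracket(pattern, 2))  # escaped "(" at index 2
```

The sketch deliberately ignores the harder part of the problem, which is deciding which context a given bracket belongs to in the first place.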

    cb64e333-4e7a-416d-afc5-0237674a0c02-image.png

    • E
      Ekopalypse @Mark Olson
      last edited by Apr 1, 2023, 8:47 AM

      @Mark-Olson

      I think that bracket matching should only work on the grammar
      level of the main language, but yes, that would mean that every
      lexer would have to implement this. Of course, only if a grammar defines it.
      But that would also mean that in the above example the match group would not be highlighted.
      If that were wanted, the lexer would have the problem that it
      must understand the context in which each bracket is to be interpreted.
      What if a language allows introducing a DSL (domain-specific language) that brings its own syntax?
      The more I think about it the more difficult a general solution seems … but,
      then someone comes along, has a simple solution and gives the others the lie …
      A lot of text to say … no idea if this is easily possible.

      • G
        guy038
        last edited by guy038 Apr 2, 2023, 12:37 PM Apr 1, 2023, 5:58 PM

        Hello, @mark-olson, @ekopalypse and All,

        There is a solution with recursive regexes! Here are 3 kinds of recursive regex which find paired groups of NON-escaped parentheses!

        • Regex A looks, from the cursor position, for the largest group of paired NON-escaped parentheses

        • Regex B looks, from the cursor position, for the largest group of paired NON-escaped parentheses, together with the surrounding text that contains no NON-escaped parentheses

        • Regex C looks, from the cursor position, for the largest range of characters containing one or several group(s) of paired NON-escaped parentheses, each of them surrounded by text that contains no NON-escaped parentheses

        • Regex A : (?x) (?<!\\) \( (?: (?: \\ [()] | [^()] ) | (?0) )* (?<!\\) \) # Regex A

        • Regex B : (?x) (?: \\ [()] | [^()] )* ( (?<!\\) \( (?: (?: \\ [()] | [^()] ) | (?1) )* (?<!\\) \) ) (?: \\ [()] | [^()] )* # Regex B

        • Regex C : (?x) (?: (?: \\ [()] | [^()] )* ( (?<!\\) \( (?: (?: \\ [()] | [^()] ) | (?1) )* (?<!\\) \) ) (?: \\ [()] | [^()] )* )+ # Regex C

        Important: sometimes an opening parenthesis has no partner, and the regex engine needs to move further on in order to find the next paired group of parentheses to match!


        To test these regexes:

        • Paste the text below in a new tab

        • Put the cursor on the last line, right before the word This

        • Run, successively, the regexes A, B and C

        C     -------------------------------------------------------------------------- -----------------------------------------------------
        B     ------------------------|------------------------------------------------- -----------------------------------------------------
        A         --------------       --------------------------------------                     ---------------------------------------
             x    1 2        1 0       1            2                     1 0           x         1 2       1                           0
        This ( is ( ( a very ) ) small ( test \( to ( verify \( if \) all ) ) these \)  ( regexes ( ( match ) NON-escaped \) parentheses) ONLY
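
For anyone who wants to cross-check the A row outside Notepad++, here is a minimal Python sketch. Python's built-in `re` module has no recursion, so this swaps in a plain stack-based scan instead of Regex A itself. The names `top_level_groups` and `is_escaped` are hypothetical, and a parenthesis is assumed to be escaped exactly when the character right before it is a backslash, mirroring the `(?<!\\)` lookbehind.

```python
# Stack-based scan, used here in place of the recursive regex.
# Assumption: "escaped" means "immediately preceded by a backslash".

def is_escaped(text, i):
    """True if the character at index i is immediately preceded by a backslash."""
    return i > 0 and text[i - 1] == "\\"


def top_level_groups(text, start=0):
    """Return (open, close) index pairs of the outermost balanced groups of
    non-escaped parentheses found at or after `start`."""
    stack = []   # indices of non-escaped "(" still waiting for a partner
    pairs = []   # matched (open, close) pairs
    for i in range(start, len(text)):
        ch = text[i]
        if ch not in "()" or is_escaped(text, i):
            continue
        if ch == "(":
            stack.append(i)
        elif stack:
            # A ")" with no pending "(" is simply ignored.
            pairs.append((stack.pop(), i))
    # Keep only the pairs that are not enclosed by another matched pair.
    outer = []
    for open_i, close_i in sorted(pairs):
        if not outer or open_i > outer[-1][1]:
            outer.append((open_i, close_i))
    return outer


sample = (r"This ( is ( ( a very ) ) small ( test \( to ( verify \( if \) all ) ) "
          r"these \)  ( regexes ( ( match ) NON-escaped \) parentheses) ONLY")

for open_i, close_i in top_level_groups(sample):
    print(sample[open_i:close_i + 1])
# ( ( a very ) )
# ( test \( to ( verify \( if \) all ) )
# ( ( match ) NON-escaped \) parentheses)
```

The two opening parentheses marked x in the diagram never find a partner, so they stay on the stack and do not start a group.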
        

        In the new tab, you may freely spread your text over many lines without any problem, as shown below :

        This 
        ( is (
         ( a very ) )
         small ( tes
         t \( to 
         ( verify \( 
        if 
        \)
         all ) ) these \)
          ( regexes ( 
         ( match ) NON-
        escaped \)
         parentheses
        ) ONLY
        

        The regexes will still work ! Just one restriction : You cannot, of course, split an escaped parenthesis in two parts, like below :

        these \
        )  ( regexes
        

        Best Regards,

        guy038

        P.S. : Of course, if you change the starting position of the search, these recursive regular expressions will certainly find very different results in value and scope !

        • M
          Mark Olson @guy038
          last edited by Apr 1, 2023, 6:57 PM

          @guy038
          Thanks! That’s a really cool solution. I think that Regex A will probably be most useful to me, so I bound it to a macro. This doesn’t really replace a lexer feature like I described, but it will still be helpful for sure.

          • L
            lossmark70 @guy038
            last edited by Apr 15, 2023, 11:23 AM

            Thanks for the awesome information.
