    best way to stop matching of escaped parentheses/brackets in regular expression.

    Tags: regex, lexer, brackets
    • Mark Olson

      As I’ve noted elsewhere, I really like the fact that NPP does bracket matching in strings and regexes that is separate from the bracket matching outside strings and regexes.
      However, I’m less happy about the fact that escaped parentheses and brackets are still matched. Obviously I recognize that this would be a rather irritating feature to implement and would presumably need to be implemented on a per-lexer basis, but does anyone have thoughts about the best way to achieve this?

      I will probably submit a feature request to Scintilla or Lexilla, but I was hoping some of the veterans here could give their thoughts.

      [Image: cb64e333-4e7a-416d-afc5-0237674a0c02-image.png]
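
To make the request concrete, here is a minimal Python sketch of bracket matching that skips escaped brackets. It is not how Notepad++, Scintilla or Lexilla implement matching; the names `is_escaped` and `match_bracket` are hypothetical, and the rule "a bracket is escaped when an odd number of backslashes precedes it" is an assumption that a real lexer would have to adapt per language.

```python
# Hypothetical sketch: find the partner of a bracket, ignoring escaped brackets.
PAIRS = {"(": ")", "[": "]", "{": "}"}
CLOSERS = {close: open_ for open_, close in PAIRS.items()}


def is_escaped(text, i):
    """Return True if text[i] is preceded by an odd number of backslashes."""
    count = 0
    j = i - 1
    while j >= 0 and text[j] == "\\":
        count += 1
        j -= 1
    return count % 2 == 1


def match_bracket(text, pos):
    """Return the index of the bracket matching text[pos], or None."""
    ch = text[pos]
    if is_escaped(text, pos):
        return None  # an escaped bracket never takes part in matching
    if ch in PAIRS:
        partner, step = PAIRS[ch], 1      # opener: scan forward
    elif ch in CLOSERS:
        partner, step = CLOSERS[ch], -1   # closer: scan backward
    else:
        return None
    depth = 0
    i = pos
    while 0 <= i < len(text):
        c = text[i]
        if c in (ch, partner) and not is_escaped(text, i):
            depth += 1 if c == ch else -1
            if depth == 0:
                return i
        i += step
    return None  # no unescaped partner found


print(match_bracket(r"a (b \) c) d", 2))  # → 9
```

Only brackets of the same kind as the one under the cursor are counted, so the sketch says nothing about the harder question of which context (string, regex, embedded language) a bracket belongs to.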

      • Ekopalypse @Mark Olson

        @Mark-Olson

        I think bracket matching should only work at the grammar
        level of the main language, but yes, that would mean every
        lexer would have to implement it (and only where the grammar defines it).
        It would also mean that, in the example above, the match group would not be highlighted.
        If that highlighting is wanted, the lexer has the problem that it
        must understand the context in which each bracket is to be interpreted.
        What if a language allows introducing a DSL (domain-specific language) that brings its own syntax?
        The more I think about it, the more difficult a general solution seems … but
        then someone comes along with a simple solution and proves everyone else wrong …
        A lot of text to say: no idea whether this is easily possible.

        • guy038

          Hello, @mark-olson, @ekopalypse and All,

          There is a solution using recursive regexes! Here are three kinds of recursive regex that find paired groups of NON-escaped parentheses:

          • Regex A looks, from the cursor position, for the largest group of NON-escaped paired parentheses

          • Regex B looks, from the cursor position, for the largest group of NON-escaped paired parentheses, together with the surrounding text that contains no NON-escaped parentheses

          • Regex C looks, from the cursor position, for the largest range of characters containing one or more groups of NON-escaped paired parentheses, each of them surrounded by text that contains no NON-escaped parentheses

          • Regex A : (?x) (?<!\\) \( (?: (?: \\ [()] | [^()] ) | (?0) )* (?<!\\) \) # Regex A

          • Regex B : (?x) (?: \\ [()] | [^()] )* ( (?<!\\) \( (?: (?: \\ [()] | [^()] ) | (?1) )* (?<!\\) \) ) (?: \\ [()] | [^()] )* # Regex B

          • Regex C : (?x) (?: (?: \\ [()] | [^()] )* ( (?<!\\) \( (?: (?: \\ [()] | [^()] ) | (?1) )* (?<!\\) \) ) (?: \\ [()] | [^()] )* )+ # Regex C

          Important: sometimes the regex engine has to move further ahead from the cursor position before it finds a paired group of parentheses to match!
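
For readers who want to check what Regex A should find outside Notepad++, here is a rough Python sketch. Python's standard `re` module does not support recursion such as `(?0)`, so the sketch uses a small stack-based scanner instead of the regex itself; `outermost_groups` and `SAMPLE` are hypothetical names. Like the `(?<!\\)` lookbehind in the regexes, it treats a parenthesis as escaped when the character directly before it is a backslash.

```python
def outermost_groups(text, start=0):
    """Return the outermost balanced groups of non-escaped parentheses.

    Scanning begins at `start`, which plays the role of the cursor position.
    """
    stack = []  # indices of non-escaped "(" that are still open
    pairs = []  # (open_index, close_index) of every matched pair
    for i in range(start, len(text)):
        ch = text[i]
        if ch not in "()":
            continue
        if i > 0 and text[i - 1] == "\\":
            continue  # escaped parenthesis: ordinary text
        if ch == "(":
            stack.append(i)
        elif stack:  # a ")" with nothing open is simply ignored
            pairs.append((stack.pop(), i))
    # Keep only the pairs that are not nested inside an earlier pair.
    groups = []
    last_close = -1
    for open_index, close_index in sorted(pairs):
        if open_index > last_close:
            groups.append(text[open_index:close_index + 1])
            last_close = close_index
    return groups


# The one-line test text from this post.
SAMPLE = (r"This ( is ( ( a very ) ) small ( test \( to ( verify \( if \) all ) ) "
          r"these \)  ( regexes ( ( match ) NON-escaped \) parentheses) ONLY")

for group in outermost_groups(SAMPLE):
    print(group)
# ( ( a very ) )
# ( test \( to ( verify \( if \) all ) )
# ( ( match ) NON-escaped \) parentheses)
```

The three printed groups correspond to three successive matches of Regex A on the test text. Passing a different `start` changes the result: starting at index 11, for example, the first group becomes `( a very )`.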


          To test these regexes:

          • Paste the text below in a new tab

          • Put the cursor on the last line, right before the word This

          • Run the regexes A, B and C in succession

          C     -------------------------------------------------------------------------- -----------------------------------------------------
          B     ------------------------|------------------------------------------------- -----------------------------------------------------
          A         --------------       --------------------------------------                     ---------------------------------------
               x    1 2        1 0       1            2                     1 0           x         1 2       1                           0
          This ( is ( ( a very ) ) small ( test \( to ( verify \( if \) all ) ) these \)  ( regexes ( ( match ) NON-escaped \) parentheses) ONLY
          

          In the new tab, you can spread the text over many lines without any problem, as shown below:

          This 
          ( is (
           ( a very ) )
           small ( tes
           t \( to 
           ( verify \( 
          if 
          \)
           all ) ) these \)
            ( regexes ( 
           ( match ) NON-
          escaped \)
           parentheses
          ) ONLY
          

          The regexes will still work! There is just one restriction: you cannot, of course, split an escaped parenthesis across two lines, as below:

          these \
          )  ( regexes
          

          Best Regards,

          guy038

          P.S.: Of course, if you change the starting position of the search, these recursive regular expressions may find matches that differ greatly in both content and extent!

          • Mark Olson @guy038

            @guy038
            Thanks! That’s a really cool solution. I think that Regex A will probably be most useful to me, so I bound it to a macro. This doesn’t really replace a lexer feature like I described, but it will still be helpful for sure.

            • lossmark70 @guy038

              @guy038

              thanks for the awesome information.
