best way to stop matching of escaped parentheses/brackets in regular expression.
-
As I’ve noted elsewhere, I really like the fact that NPP does bracket matching in strings and regexes that is separate from the bracket matching outside strings and regexes.
However, I’m less happy about the fact that escaped parentheses and brackets are still matched. Obviously I recognize that this would be a rather irritating feature to implement and would presumably need to be implemented on a per-lexer basis, but does anyone have thoughts about the best way to achieve this? I will probably submit a feature request in Scintilla or Lexilla, but I was hoping some of the veterans here could give their thoughts.
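To make the idea concrete, here is a minimal Python sketch of what escape-aware matching could look like. It is not Scintilla or Lexilla code; the names `is_escaped` and `find_partner` are hypothetical, and the rule "a bracket is escaped when an odd number of backslashes precedes it" is an assumption that a real lexer might well define differently.

```python
# Hypothetical sketch: NOT Scintilla/Lexilla code, only the matching rule itself.
PAIRS = {"(": ")", "[": "]", "{": "}"}
REVERSED = {close: open_ for open_, close in PAIRS.items()}


def is_escaped(text, i):
    """Assumption: text[i] is escaped when an odd number of backslashes precedes it."""
    count = 0
    j = i - 1
    while j >= 0 and text[j] == "\\":
        count += 1
        j -= 1
    return count % 2 == 1


def find_partner(text, pos):
    """Return the index of the bracket matching text[pos], or None if there is none."""
    ch = text[pos]
    if ch in PAIRS:
        partner, step = PAIRS[ch], 1       # opening bracket: scan forward
    elif ch in REVERSED:
        partner, step = REVERSED[ch], -1   # closing bracket: scan backward
    else:
        return None
    if is_escaped(text, pos):
        return None                        # escaped brackets never take part in matching
    depth = 0
    i = pos
    while 0 <= i < len(text):
        c = text[i]
        if c in (ch, partner) and not is_escaped(text, i):
            depth += 1 if c == ch else -1
            if depth == 0:
                return i
        i += step
    return None


sample = r"a ( b \) c ) d"
print(find_partner(sample, 2))   # 11
print(find_partner(sample, 7))   # None
```

Like the existing matcher, this only counts brackets of the same kind as the one under the caret; the only change is the extra `is_escaped` check.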
-
I think that bracket matching should only work on the grammar level of the main language, but yes, that would mean that every lexer would have to implement this, and of course only if the grammar defines it.
But that would also mean that in the above example the match group would not be highlighted.
If that highlighting is wanted, then the lexer has the problem that it must understand the context in which a bracket appears.
What if a language allows you to introduce a DSL (domain-specific language) that brings its own syntax?
The more I think about it, the more difficult a general solution seems … but then someone comes along with a simple solution and proves the others wrong …
A lot of text to say … no idea if this is easily possible. -
Hello, @mark-olson, @ekopalypse and All,
There is a solution with recursive regexes! Here are 3 kinds of recursive regex that find paired groups of NON-escaped parentheses:
- Regex A looks, from the cursor position, for the largest group of NON-escaped paired parentheses
- Regex B looks, from the cursor position, for the largest group of NON-escaped paired parentheses, surrounded by text other than NON-escaped parentheses
- Regex C looks, from the cursor position, for the largest range of characters containing one or several group(s) of NON-escaped paired parentheses, each of them surrounded by text other than NON-escaped parentheses
- Regex A:
(?x) (?<!\\) \( (?: (?: \\ [()] | [^()] ) | (?0) )* (?<!\\) \) # Regex A
- Regex B:
(?x) (?: \\ [()] | [^()] )* ( (?<!\\) \( (?: (?: \\ [()] | [^()] ) | (?1) )* (?<!\\) \) ) (?: \\ [()] | [^()] )* # Regex B
- Regex C:
(?x) (?: (?: \\ [()] | [^()] )* ( (?<!\\) \( (?: (?: \\ [()] | [^()] ) | (?1) )* (?<!\\) \) ) (?: \\ [()] | [^()] )* )+ # Regex C
Important: Sometimes the regex engine needs to move further ahead in the text in order to find a new paired group of parentheses to match!
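For anyone who wants to cross-check what Regex A should select without a recursive regex engine (Python's built-in `re` module does not support recursion such as `(?0)`), here is a rough sketch that emulates it with a plain scan. The helper names `is_escaped`, `balanced_end` and `find_groups` are made up for this illustration, and, like the `(?<!\\)` look-behind, it assumes a parenthesis counts as escaped whenever the character right before it is a backslash.

```python
def is_escaped(text, i):
    """Same rule as the (?<!\\) look-behind: escaped if the previous char is a backslash."""
    return i > 0 and text[i - 1] == "\\"


def balanced_end(text, start):
    """Return the index just past the ')' that closes the '(' at start, or None."""
    depth = 0
    for i in range(start, len(text)):
        if text[i] in "()" and not is_escaped(text, i):
            depth += 1 if text[i] == "(" else -1
            if depth == 0:
                return i + 1
    return None  # this '(' is never closed, so no match can start here


def find_groups(text, pos=0):
    """Yield (start, end) spans, emulating successive Regex A searches from pos."""
    while pos < len(text):
        if text[pos] == "(" and not is_escaped(text, pos):
            end = balanced_end(text, pos)
            if end is not None:
                yield pos, end
                pos = end      # the next search resumes after this match
                continue
        pos += 1               # unbalanced or escaped: move further ahead


sample = (r"This ( is ( ( a very ) ) small ( test \( to ( verify \( if \) all ) ) "
          r"these \) ( regexes ( ( match ) NON-escaped \) parentheses) ONLY")

for start, end in find_groups(sample):
    print(sample[start:end])
# ( ( a very ) )
# ( test \( to ( verify \( if \) all ) )
# ( ( match ) NON-escaped \) parentheses)
```

The `pos` argument plays the role of the cursor position: for instance, `find_groups(sample, 11)` starts inside the first group and therefore yields the inner `( a very )` first.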
To test these regexes:
- Paste the text below in a new tab
- Put the cursor on the last line, right before the word This
- Run, successively, the regexes A, B and C
C         ------------------------------------------------------------------------- -----------------------------------------------------
B         ------------------------|------------------------------------------------ -----------------------------------------------------
A             --------------       --------------------------------------                    ---------------------------------------
         x    1 2        1 0       1            2                     1 0          x         1 2       1                           0
    This ( is ( ( a very ) ) small ( test \( to ( verify \( if \) all ) ) these \) ( regexes ( ( match ) NON-escaped \) parentheses) ONLY
In the new tab, you can spread the text over several lines without any problem, as shown below:
This ( is ( ( a very ) ) small ( tes
t \( to ( verify \( if \) all ) ) these \) ( regexes ( ( match ) NON-
escaped \) parentheses ) ONLY
The regexes will still work! Just one restriction: you cannot, of course, split an escaped parenthesis in two parts, like below:
these \
) ( regexes
Best Regards,
guy038
P.S.: Of course, if you change the starting position of the search, these recursive regular expressions will certainly find very different results, both in content and in extent!
-
-
@guy038
Thanks! That’s a really cool solution. I think that Regex A will probably be most useful to me, so I bound it to a macro. This doesn’t really replace a lexer feature like I described, but it will still be helpful for sure. -
@guy038
Thanks for the awesome information.