Can't figure out what is wrong when selecting text within parentheses.
-
Dear all,
Some help needed.
Text:
The odds ratio is 1.4 (See Reference 12) and NTT is 5 (See Reference 13).
I want to select each of the parenthesized references:
(See Reference 12)
(See Reference 13)
I use the RegExp:
\(See .+\)
But the result is that everything between the first opening parenthesis and the last closing parenthesis is selected:
(See Reference 12) and NTT is 5 (See Reference 13)
instead of the two separate selections above.
I have examined the syntax and cannot find out why.
Thanks in advance.
-
Your regex as specified is “greedy”. Use
.+?
instead of
.+
-
Oh, yes, it works.
So
.+?
makes the match lazy; now I understand what lazy and greedy mean.
But another issue now comes up.
Text:
In patients with recurrent cellulitis due to S. aureus, attempting decolonization is reasonable; this is discussed further separately. (See “Methicillin-resistant Staphylococcus aureus (MRSA) in adults: Prevention and control”, section on ‘Decolonization’ and “Methicillin-resistant Staphylococcus aureus in children: Prevention and control”.)
I intend to select everything in the outermost parentheses, but when I use the lazy syntax, it selects less when there is another pair of parentheses inside.
So when I use the code
.+?
the selection ends unexpectedly at the first closing parenthesis:
(See “Methicillin-resistant Staphylococcus aureus (MRSA)
This would not happen if I used the greedy one on this text, but the greedy one selects too much in other situations.
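The behaviour described in this post can be reproduced outside the editor. Here is a minimal sketch using Python's standard `re` module (an assumption on my part, since the thread is about the editor's search dialog); `text2` is a shortened stand-in for the cellulitis sentence, not the full quote.

```python
import re

# Sample text from the first post
text1 = "The odds ratio is 1.4 (See Reference 12) and NTT is 5 (See Reference 13)."
# Shortened stand-in for the second sample (assumption: same nesting, less text)
text2 = 'This is discussed separately. (See "S. aureus (MRSA) in adults" and "S. aureus in children".)'

# Greedy: .+ runs on to the last ")" of the line
greedy = re.findall(r"\(See .+\)", text1)
# Lazy: .+? stops at the first ")" it can reach
lazy = re.findall(r"\(See .+?\)", text1)
# Lazy with nested parentheses: stops at the ")" that closes "(MRSA"
lazy_nested = re.findall(r"\(See .+?\)", text2)

print(greedy)
print(lazy)
print(lazy_nested)
```

Neither quantifier knows anything about nesting: greedy takes the last closing parenthesis on the line, lazy takes the first one, and the wanted one is sometimes in between.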
-
Okay, so not so much my strong suit, but for that kind of processing you need something called a “recursive regular expression”. You can google that and do some reading, but here’s a link that deals with nested parenthesis processing with a regular expression: http://stackoverflow.com/questions/133601/can-regular-expressions-be-used-to-match-nested-patterns
From that I derived the following regex that seems to do what you need, as long as all the parentheses are balanced:
(?=\(See)(\((?>[^()]+|(?1))*\))
That’s my shot at it; if you need something more complicated, you should now have the tools (after you read and learn) to get where you need to go yourself. :-D
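As a rough cross-check outside the editor: Python's standard `re` module does not support recursive patterns, so the sketch below swaps in a plain depth counter instead of the recursive regex above. The function name `find_see_groups` and the shortened sample text are made up for illustration, and like the regex it assumes the parentheses are balanced.

```python
def find_see_groups(text):
    """Return every balanced '(See ...)' span, nested parentheses included."""
    results = []
    pos = 0
    while True:
        start = text.find("(See", pos)
        if start == -1:
            break
        depth = 0
        end = None
        # Walk forward, tracking how many parentheses are currently open
        for i in range(start, len(text)):
            if text[i] == "(":
                depth += 1
            elif text[i] == ")":
                depth -= 1
                if depth == 0:
                    end = i + 1
                    break
        if end is None:
            # Unbalanced opening parenthesis: skip it and keep scanning
            pos = start + 1
            continue
        results.append(text[start:end])
        pos = end
    return results


# Shortened, hypothetical sample combining both situations from the thread
sample = ('Discussed separately. (See "S. aureus (MRSA) in adults" and '
          '"S. aureus in children".) The odds ratio is 1.4 (See Reference 12).')
groups = find_see_groups(sample)
print(groups)
```

The counter does by hand what the `(?1)` recursion does inside the regex: every inner opening parenthesis has to be closed before the outer one can end the match.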
-
Hello David,
Scott is right about it. You need a recursive regex pattern.
The simplest recursive regex that I’ve found is :
SEARCH
\(([^()\r\n]|(?0))*\)
This regex matches the longest range of text, on a single line, that contains well-balanced parentheses and is enclosed by an outer pair of parentheses, which is included in the match.
Just test it against the text below :
(This)--()sen(tence)con(tains(a lot)of)paren(theses)and the ((regex))matches((the())longest)range((of))well((()))balanced(((((((parentheses), enclosed )inside two) final )parentheses
Notes :
- The regex first tries to match an opening round bracket \(
- The part [^()\r\n] matches any character other than a parenthesis or an EOL character
- The part (?0) is a reference to the whole regex \(([^()\r\n]|(?0))*\), that is to say, a second form (.....)
- As this reference (?0) is located inside the group to which it refers (i.e. the whole regex), the regex automatically becomes a recursive regex
- The two sub-regexes [^()\r\n] and (?0) are the two branches of an alternation, which can be repeated from 0 to n times, because of the * quantifier
- Finally, the regex matches a closing round bracket \)
Remark :
If your text and parentheses may span several lines, prefer the recursive regex below :
SEARCH
\(([^()]|(?0))*\)
Best regards
guy038
P.S. :
Consider, for instance, the regex
((\d+)[a-z])([aeiouy])(?2)\3
In this regex :
- The first group contains the regex (\d+)[a-z]
- The second group contains the regex \d+ (an integer)
- The third group contains the regex [aeiouy] (a vowel)
- The reference (?2), located outside the group to which it refers, \d+, is called, in that case, a subroutine call (instead of a recursive subpattern), and we could have replaced (?2) by the pattern of group 2, i.e. \d+
- Finally, the back-reference \3 refers to the text actually matched by the regex [aeiouy]
This regex matches expressions like :
- 123ai4567i
- 78zu12345u
- 999ha999a
but would fail to match :
- 123ai4567e
- 78zu12345y
As I said above, the two regexes
((\d+)[a-z])([aeiouy])(?2)\3
and
((\d+)[a-z])([aeiouy])\d+\3
are strictly equivalent !
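The sample strings above can be checked with a few lines of Python. This is only a sketch: Python's standard `re` module has no `(?2)` subroutine call, so it uses the expanded form with `\d+` written out, which the post states is equivalent.

```python
import re

# Expanded form of ((\d+)[a-z])([aeiouy])(?2)\3, with (?2) written out as \d+
pattern = re.compile(r"((\d+)[a-z])([aeiouy])\d+\3")

samples = ["123ai4567i", "78zu12345u", "999ha999a", "123ai4567e", "78zu12345y"]
# fullmatch requires the whole string to fit the pattern
results = {s: bool(pattern.fullmatch(s)) for s in samples}

for s, ok in results.items():
    print(s, "matches" if ok else "does not match")
```

The last two strings fail only because of `\3`: the final vowel is not the same vowel that group 3 captured earlier.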
Beware of the main difference between the regexes (\d)(?1), which is equivalent to (\d)\d, and (\d)\1 :
- The regex (\d)(?1) would match any two-digit integer from 00 to 99, because (?1) re-runs the pattern \d
- The regex (\d)\1 would match only a two-digit integer made of the same digit twice, because \1 repeats the text that group 1 matched
Test these two regexes against the following list :
10 11 13 27 34 40 44 63 66 98 99
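For readers without the editor at hand, the same test can be sketched in Python. Again this is an approximation: the standard `re` module does not accept `(?1)`, so the first pattern is written in its expanded form `(\d)\d`.

```python
import re

numbers = "10 11 13 27 34 40 44 63 66 98 99".split()

# (\d)(?1) re-runs the pattern \d, so it is written here as (\d)\d
any_two_digits = re.compile(r"(\d)\d")
# (\d)\1 repeats the text captured by group 1
same_digit_twice = re.compile(r"(\d)\1")

matched_any = [n for n in numbers if any_two_digits.fullmatch(n)]
matched_same = [n for n in numbers if same_digit_twice.fullmatch(n)]

print(matched_any)
print(matched_same)
```

The first pattern accepts every entry in the list, while the second keeps only 11, 44, 66 and 99.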
-
Dear Scott and Guy
Thank you for your help and detailed explanation.
Scott’s regex works for me, as I need "See " at the beginning of the parentheses.
I tried to modify Guy’s regex to work for me but could not work it out.
Thanks, I will continue to study it.