
    Can't figure out what is wrong when selecting text within parentheses

    • David Chiu

      Dear all,

      Some help needed.

      Text:
      The odds ratio is 1.4 (See Reference 12) and NTT is 5 (See Reference 13).

      I want to select each parenthesized reference separately:

      (See Reference 12)
      (See Reference 13)

      I use the RegExp: \(See .+\)

      But the result is that everything from the first opening parenthesis to the last closing parenthesis is selected as a single match:

      (See Reference 12) and NTT is 5 (See Reference 13)

      instead of the two separate matches listed above.

      I have examined the syntax and cannot find out why.

      Thanks in advance.

      • Scott Sumner @David Chiu

        @David-Chiu

        Your regex as specified is “greedy”. Use .+? instead of .+
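A minimal sketch of the difference, written in Python on the assumption that its standard `re` module treats greedy and lazy quantifiers the same way as the Notepad++ engine for this text:

```python
import re

text = "The odds ratio is 1.4 (See Reference 12) and NTT is 5 (See Reference 13)."

# Greedy: .+ takes as much as it can, so the match runs to the LAST ')'.
greedy = re.findall(r"\(See .+\)", text)

# Lazy: .+? takes as little as it can, so each match stops at the FIRST ')'.
lazy = re.findall(r"\(See .+?\)", text)

print(greedy)  # one long match spanning both references
print(lazy)    # two separate matches, one per reference
```

The greedy form returns a single span covering both references, while the lazy form returns `(See Reference 12)` and `(See Reference 13)` separately.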

        • David Chiu

          Oh, yes, it works.

          So .+? makes the match lazy; now I understand what lazy and greedy mean.

          But another issue now comes up.
          Text:
          In patients with recurrent cellulitis due to S. aureus, attempting decolonization is reasonable; this is discussed further separately. (See “Methicillin-resistant Staphylococcus aureus (MRSA) in adults: Prevention and control”, section on ‘Decolonization’ and “Methicillin-resistant Staphylococcus aureus in children: Prevention and control”.)

          I intended to select everything inside the outermost parentheses, but when I use the lazy syntax, it selects less when there is another pair of parentheses inside.

          So when I use \(See .+?\), the selection ends unexpectedly at the first closing parenthesis:

          (See “Methicillin-resistant Staphylococcus aureus (MRSA)

          This would not happen if I used the greedy version on this text, but the greedy version selects too much in other situations.

          • Scott Sumner @David Chiu

            @David-Chiu

            Okay, so not so much my strong suit, but for that kind of processing you need something called a “recursive regular expression”. You can google that and do some reading, but here’s a link that deals with nested parenthesis processing with a regular expression: http://stackoverflow.com/questions/133601/can-regular-expressions-be-used-to-match-nested-patterns

            From that I derived the following regex that seems to do what you need, as long as all the parentheses are balanced:
            (?=\(See)(\((?>[^()]+|(?1))*\))

            That’s my shot at it; if you need something more complicated, you should now have the tools (after you read and learn) to get where you need to go yourself.    :-D
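For engines without recursive patterns (Python's standard `re` module is one), the same selection can be sketched with a plain depth counter. This is not the regex above, only an illustration of what "balanced" means; `find_see_spans` and its `opener` parameter are hypothetical names, and the sample text is shortened from the one in the question.

```python
def find_see_spans(text, opener="(See"):
    """Return each balanced parenthesized span that starts with `opener`.

    Assumes `opener` itself begins with '(' so that counting starts at depth 1.
    """
    spans = []
    pos = text.find(opener)
    while pos != -1:
        depth = 0
        end = None
        for i in range(pos, len(text)):
            ch = text[i]
            if ch == "(":
                depth += 1
            elif ch == ")":
                depth -= 1
                if depth == 0:      # the opener's own parenthesis is closed
                    end = i + 1
                    break
        if end is None:
            # No matching ')' for this opener: skip it and try the next one.
            pos = text.find(opener, pos + 1)
        else:
            spans.append(text[pos:end])
            pos = text.find(opener, end)
    return spans


sample = ('This is discussed separately. (See "Staphylococcus aureus (MRSA) '
          'in adults", section on X.) The odds ratio is 1.4 (See Reference 12).')
for span in find_see_spans(sample):
    print(span)
```

The inner `(MRSA)` raises the depth to 2 and lowers it back to 1, so the span only ends at the parenthesis that closes `(See`.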

            • guy038

              Hello David,

              Scott is right about it. You need a recursive regex pattern.

              The simplest recursive regex that I’ve found is:

              SEARCH \(([^()\r\n]|(?0))*\)

              This regex matches the longest range of text, on a single line, that contains well-balanced parentheses and is enclosed by an outer pair of parentheses, which is included in the match.

              Just test it against the text below:

              (This)--()sen(tence)con(tains(a lot)of)paren(theses)and the ((regex))matches((the())longest)range((of))well((()))balanced(((((((parentheses), enclosed )inside two) final )parentheses
              

              Notes:

              • The regex first tries to match an opening round bracket \(

              • The part [^()\r\n] matches any single character other than a parenthesis or an EOL character

              • The part (?0) is a reference to the whole regex \(([^()\r\n]|(?0))*\), that is to say, a nested (.....) form

              • As this reference (?0) is located inside the group to which it refers (i.e. the whole regex), the regex automatically becomes recursive

              • The two sub-regexes [^()\r\n] and (?0) are the two branches of an alternation, which can be repeated from 0 to n times because of the * quantifier

              • Finally, the regex matches a closing round bracket \)

              Remark:

              If your text and parentheses may span several lines, prefer the recursive regex below:

              SEARCH \(([^()]|(?0))*\)
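To make the recursion concrete, here is a rough Python sketch that mirrors the multi-line regex step by step with an ordinary recursive function. Python's standard `re` module has no `(?0)`, so this is an analogy rather than the regex itself, and `match_balanced` and `find_balanced` are hypothetical names.

```python
def match_balanced(text, i):
    # Mirrors \(([^()]|(?0))*\) anchored at index i.
    # Returns the index just past the closing ')' or None if there is no match.
    if i >= len(text) or text[i] != "(":   # \(  : must start with '('
        return None
    i += 1
    while i < len(text):
        if text[i] == ")":                 # \)  : closing bracket ends the match
            return i + 1
        if text[i] == "(":                 # (?0): recurse into a nested group
            end = match_balanced(text, i)
            if end is None:
                return None                # nested group never closes
            i = end
        else:                              # [^()]: any non-parenthesis character
            i += 1
    return None                            # ran out of text before ')'


def find_balanced(text):
    # Scan left to right, as a regex search would, collecting every match.
    spans, i = [], 0
    while i < len(text):
        end = match_balanced(text, i)
        if end is None:
            i += 1
        else:
            spans.append(text[i:end])
            i = end
    return spans


print(find_balanced("a (b (c) d) e (f"))
```

Each nested `(` triggers a fresh call, which is the role `(?0)` plays in the pattern; an opening bracket that is never closed simply produces no match at that position.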

              Best regards

              guy038

              P.S. :

              If you consider, for instance, the regex ((\d+)[a-z])([aeiouy])(?2)\3 :

              • The first group contains the regex (\d+)[a-z]

              • The second group contains the regex \d+ ( an integer )

              • The third group contains the regex [aeiouy] ( a vowel )

              • The reference (?2), located outside the group to which it refers (\d+), is called, in that case, a subroutine call (instead of a recursive subpattern), and we could have replaced (?2) with the pattern of group 2, i.e. \d+

              • Finally, the back-reference \3 refers to the text actually matched by group 3, i.e. the vowel matched by [aeiouy]

              This regex matches expressions like :

              • 123ai4567i
              • 78zu12345u
              • 999ha999a

              but would fail to match :

              • 123ai4567e
              • 78zu12345y

              As noted above, the two regexes ((\d+)[a-z])([aeiouy])(?2)\3 and ((\d+)[a-z])([aeiouy])\d+\3 are strictly equivalent!
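The examples can be checked with a short Python sketch. Python's standard `re` module does not support the `(?2)` subroutine call, so the sketch uses the expanded form with `\d+` written out, relying on the equivalence stated above.

```python
import re

# Expanded form: (?2) replaced by group 2's pattern, \d+.
pattern = re.compile(r"((\d+)[a-z])([aeiouy])\d+\3")

should_match = ["123ai4567i", "78zu12345u", "999ha999a"]
should_fail = ["123ai4567e", "78zu12345y"]

# fullmatch: the whole string must fit the pattern.
matched = [s for s in should_match if pattern.fullmatch(s)]
# search: not even a substring may fit the pattern.
rejected = [s for s in should_fail if not pattern.search(s)]

print(matched)
print(rejected)
```

The last two strings fail only because the final vowel differs from the one captured by group 3, which is what the back-reference `\3` enforces.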


              Beware of the main difference between the regexes (\d)(?1) (= (\d)\d) and (\d)\1:

              • The regex (\d)(?1) matches any two-digit number from 00 to 99

              • The regex (\d)\1 matches any two-digit number made of the same digit twice

              Test these two regexes against the following list:

              10
              11
              13
              27
              34
              40
              44
              63
              66
              98
              99
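The same test can be sketched in Python. Since the standard `re` module lacks `(?1)`, the subroutine call is written in its expanded form `(\d)\d`.

```python
import re

numbers = ["10", "11", "13", "27", "34", "40", "44", "63", "66", "98", "99"]

any_two_digits = re.compile(r"(\d)\d")     # expanded form of (\d)(?1): pattern is reused
same_digit_twice = re.compile(r"(\d)\1")   # back-reference: captured text is reused

all_hits = [n for n in numbers if any_two_digits.fullmatch(n)]
repeat_hits = [n for n in numbers if same_digit_twice.fullmatch(n)]

print(all_hits)     # every entry in the list
print(repeat_hits)  # only 11, 44, 66 and 99
```

A subroutine call reuses the pattern, so any digit is accepted; a back-reference reuses the matched text, so only a repeated digit is accepted.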
              
              • David Chiu

                Dear Scott and Guy,

                Thank you for your help and detailed explanations.
                Scott’s regex works for me, as I need "See " at the beginning of the parentheses.
                I tried to modify Guy’s regex to work for me but could not work it out.

                Thanks, I will continue to study it.

                The Community of users of the Notepad++ text editor.