    Can't figure out what is wrong in selecting text within the parentheses.

    • David Chiu

      Dear all,

      Some help needed.

      Text:
      The odds ratio is 1.4 (See Reference 12) and NTT is 5 (See Reference 13).

      I want to select the parenthesized references, i.e. (See Reference 12) and (See Reference 13), as two separate matches.

      I used the regex: \(See .+\)

      But the result is that everything from the first opening parenthesis to the last closing parenthesis is selected:

      (See Reference 12) and NTT is 5 (See Reference 13)
      instead of
      (See Reference 12)
      (See Reference 13)

      I have examined the syntax and cannot find out why.

      Thanks in advance.

      • Scott Sumner @David Chiu

        @David-Chiu

        Your regex as specified is “greedy”. Use .+? instead of .+
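The difference between the greedy and lazy quantifiers can be sketched outside Notepad++ as well. The snippet below is only an illustration using Python's standard `re` module (an assumption; Notepad++ uses its own regex engine), applied to the sample sentence from the question.

```python
import re

# Sample sentence from the question.
text = "The odds ratio is 1.4 (See Reference 12) and NTT is 5 (See Reference 13)."

# Greedy: .+ grabs as much as possible, so the match runs to the LAST ")".
greedy = re.findall(r"\(See .+\)", text)

# Lazy: .+? grabs as little as possible, so each match stops at the FIRST ")".
lazy = re.findall(r"\(See .+?\)", text)

print(greedy)  # ['(See Reference 12) and NTT is 5 (See Reference 13)']
print(lazy)    # ['(See Reference 12)', '(See Reference 13)']
```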

        • David Chiu

          Oh, yes, it works.

          So .+? makes the match lazy; now I understand what lazy and greedy mean.

          But another issue has now come up.
          Text:
          In patients with recurrent cellulitis due to S. aureus, attempting decolonization is reasonable; this is discussed further separately. (See “Methicillin-resistant Staphylococcus aureus (MRSA) in adults: Prevention and control”, section on ‘Decolonization’ and “Methicillin-resistant Staphylococcus aureus in children: Prevention and control”.)

          I intended to select everything in the outermost parentheses, but when I use the lazy syntax, it selects less when there is another pair of parentheses inside.

          So when I use .+? , the selection ends unexpectedly at the first closing parenthesis, right after “MRSA”:

          (See “Methicillin-resistant Staphylococcus aureus (MRSA)

          This would not happen if I used the greedy version on this text, but that would select too much in other situations.

          • Scott Sumner @David Chiu

            @David-Chiu

            Okay, so not so much my strong suit, but for that kind of processing you need something called a “recursive regular expression”. You can google that and do some reading, but here’s a link that deals with nested parenthesis processing with a regular expression: http://stackoverflow.com/questions/133601/can-regular-expressions-be-used-to-match-nested-patterns

            From that I derived the following regex that seems to do what you need, as long as all the parentheses are balanced:
            (?=\(See)(\((?>[^()]+|(?1))*\))

            That’s my shot at it; if you need something more complicated, you should now have the tools (after you read and learn) to get where you need to go yourself.    :-D
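For readers who want to check the idea outside Notepad++: Python's standard `re` module does not support recursive patterns, so the sketch below swaps in a different technique, an explicit depth counter, to find balanced spans that start with "(See". The function name `find_see_groups` and its `prefix` parameter are hypothetical, chosen only for this illustration.

```python
def find_see_groups(text, prefix="(See"):
    """Return balanced parenthesized spans of `text` that begin with `prefix`.

    Hypothetical helper: it uses a depth counter, not a recursive regex.
    """
    results = []
    start = text.find(prefix)
    while start != -1:
        depth = 0
        end = None
        for i in range(start, len(text)):
            if text[i] == "(":
                depth += 1
            elif text[i] == ")":
                depth -= 1
                if depth == 0:      # the outermost parenthesis just closed
                    end = i + 1
                    break
        if end is None:
            # Unbalanced from this start: try the next occurrence of prefix.
            start = text.find(prefix, start + 1)
        else:
            results.append(text[start:end])
            start = text.find(prefix, end)
    return results


sample = "a (See X (Y) Z) b (See W) c (not this)"
print(find_see_groups(sample))  # ['(See X (Y) Z)', '(See W)']
```

Like the regex above, this only gives sensible results when the parentheses are balanced.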

            • guy038

              Hello David,

              Scott is right about it. You need a recursive regex pattern.

              The simplest recursive regex that I’ve found is :

              SEARCH \(([^()\r\n]|(?0))*\)

              This regex matches the longest range of text, on a single line, that contains well-balanced parentheses and is enclosed by an outer pair of parentheses, which are included in the match

              Just test it against the text, below :

              (This)--()sen(tence)con(tains(a lot)of)paren(theses)and the ((regex))matches((the())longest)range((of))well((()))balanced(((((((parentheses), enclosed )inside two) final )parentheses
              

              Notes :

              • The regex first tries to match an opening round bracket \(

              • The part [^()\r\n] matches any character other than a parenthesis or an EOL character

              • The part (?0) is a reference to the whole regex \(([^()\r\n]|(?0))*\), that is to say, a nested group of the form (.....)

              • As this reference (?0) is located inside the group to which it refers ( i.e. the whole regex ), this regex automatically becomes a recursive regex

              • The two sub-regexes [^()\r\n] and (?0) are the two branches of an alternation, which can be repeated from 0 to n times, because of the * quantifier

              • Finally, the regex matches an ending round bracket \)

              Remark :

              If your text and parentheses may be on several lines, prefer the recursive regex, below :

              SEARCH \(([^()]|(?0))*\)
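The structure of this recursive regex can be mirrored by a small recursive function. The sketch below is an illustration in plain Python (whose standard `re` module has no `(?0)`), not the way Notepad++ evaluates the pattern; the names `match_balanced`, `find_balanced` and the `same_line` flag are hypothetical. With `same_line=True` it follows the first regex, with `same_line=False` the multi-line variant.

```python
def match_balanced(text, i, same_line=True):
    """If text[i] opens a balanced group, return the index just past its
    closing parenthesis; otherwise return None."""
    if i >= len(text) or text[i] != "(":
        return None                       # \(  must match first
    j = i + 1
    while j < len(text):
        ch = text[j]
        if ch == ")":
            return j + 1                  # \)  closes the current group
        if ch == "(":
            j = match_balanced(text, j, same_line)   # (?0): recurse
            if j is None:
                return None               # nested group is unbalanced
        elif same_line and ch in "\r\n":
            return None                   # EOL is excluded by [^()\r\n]
        else:
            j += 1                        # [^()\r\n] or [^()]
    return None                           # ran out of text before ")"


def find_balanced(text, same_line=True):
    """Scan left to right and collect every outermost balanced group."""
    results, i = [], 0
    while i < len(text):
        end = match_balanced(text, i, same_line)
        if end is None:
            i += 1
        else:
            results.append(text[i:end])
            i = end
    return results


print(find_balanced("(This)--()sen(tence)con(tains(a lot)of)"))
# ['(This)', '()', '(tence)', '(tains(a lot)of)']
```

An opening parenthesis that is never closed is skipped, and the scan resumes at the next character, so `find_balanced("((a) b")` yields only `(a)`.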

              Best regards

              guy038

              P.S. :

              If you consider, for instance, the regex ((\d+)[a-z])([aeiouy])(?2)\3 :

              • The first group contains the regex (\d+)[a-z]

              • The second group contains the regex \d+ ( an integer )

              • The third group contains the regex [aeiouy] ( a vowel )

              • The reference (?2), located outside the group to which it refers ( \d+ ), is called, in that case, a subroutine call ( instead of a recursive subpattern ) and we could have replaced (?2) by the pattern of group 2, i.e. \d+

              • Finally, the back-reference \3 refers to the text actually matched by group 3, i.e. the vowel matched by [aeiouy]

              This regex matches expressions like :

              • 123ai4567i
              • 78zu12345u
              • 999ha999a

              but would fail to match :

              • 123ai4567e
              • 78zu12345y

              As I said above, the two regexes ((\d+)[a-z])([aeiouy])(?2)\3 and ((\d+)[a-z])([aeiouy])\d+\3 are strictly equivalent !
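The examples can be checked with a short script. Python's standard `re` module has no subroutine call `(?2)`, so this sketch assumes the expanded form with `\d+`, which the post describes as matching the same strings.

```python
import re

# Expanded form: the subroutine call (?2) is written out as \d+.
pattern = re.compile(r"((\d+)[a-z])([aeiouy])\d+\3")

should_match = ["123ai4567i", "78zu12345u", "999ha999a"]
should_fail = ["123ai4567e", "78zu12345y"]

for s in should_match + should_fail:
    # fullmatch requires the whole string to fit the pattern.
    print(s, bool(pattern.fullmatch(s)))
```

The last two strings fail because `\3` must repeat the exact vowel captured by group 3, whereas `\d+` may match any run of digits.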


              Beware of the main difference between the regexes (\d)(?1) ( = (\d)\d ) and (\d)\1 :

              • The regex (\d)(?1) would match any two-digit integer, from 00 to 99

              • The regex (\d)\1 would match any two-digit integer made of the same digit twice

              Test these two regexes against the following list :

              10
              11
              13
              27
              34
              40
              44
              63
              66
              98
              99
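A quick way to run this test is sketched below in Python. Since the standard `re` module does not support `(?1)`, the first regex is written in its expanded form `(\d)\d`, as given in the post.

```python
import re

numbers = "10 11 13 27 34 40 44 63 66 98 99".split()

# (\d)(?1) expanded to (\d)\d: the pattern \d is reused, any digit will do.
any_two_digits = [n for n in numbers if re.fullmatch(r"(\d)\d", n)]

# (\d)\1: the back-reference must repeat the digit that was captured.
same_digit_twice = [n for n in numbers if re.fullmatch(r"(\d)\1", n)]

print(any_two_digits)    # all eleven numbers
print(same_digit_twice)  # ['11', '44', '66', '99']
```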
              
              • David Chiu

                Dear Scott and Guy,

                Thank you for your help and detailed explanations.
                Scott's regex works for me, as I need "See " at the beginning of the parentheses.
                I tried to modify Guy's regex to do the same, but have not worked it out yet.

                Thanks, I will continue to study it.
