    (?(backreference)true-regex|false-regex)

    • wing yang

      123-456-7890 //line1
      (123)456-7890 //line2
      (123)-456-7890 //line3
      (123-456-7890 //line4
      1234567890 //line5
      123 456 7890 //line6

      (()?\d{3}(?(1))|-)\d{3}-\d{4}
      I expected that it matchs line1 or lin2, but line4

      Is there some error ?

      • PeterJones @wing yang

        Here is the same example text and regex, inside a code block, so that people can see the actual data and regex

        123-456-7890      //line1
        (123)456-7890     //line2
        (123)-456-7890    //line3
        (123-456-7890     //line4
        1234567890        //line5
        123 456 7890      //line6

        (\()?\d{3}(?(1)\)|-)\d{3}-\d{4}
        

        ----

        Useful References

        • Please Read Before Posting
        • Template for Search/Replace Questions
        • FAQ: Where to find regular expressions (regex) documentation
        • Notepad++ Online User Manual: Searching/Regex
        • Alan Kilborn @wing yang

          @wing-yang said in (?(backreference)true-regex|false-regex):

          I expected that it matchs line1 or lin2, but line4

          I think you mean: “I expected that it matches line1 or line2, but not line4” ?

          I guess you’d have to say what it is about line4 that makes you not want to match it?

          • MAPJe71

            Why not use (\(\d{3}\)|\d{3}-)\d{3}-\d{4} ?

              • wing yang @Alan Kilborn

                @Alan-Kilborn
                Yes, but not line4.
                thank you

                • wing yang @MAPJe71

                  @MAPJe71
                  I want to exercise backreference
                  thank you

                  • PeterJones @wing yang

                    @wing-yang ,

                    I want to exercise backreference

                    Your original expression was doing the backreference correctly. The problem is that the expression was not anchored: on line 4, after failing to match the parenthesized version starting at the (, the engine retried one character later and matched the non-parenthesis side of the conditional starting at the 1. If you anchor to the beginning of the line, ^(\()?\d{3}(?(1)\)|-)\d{3}-\d{4}, then the expression matches only lines 1 and 2.
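A minimal sketch of the point above, written in Python rather than run inside Notepad++. It assumes Python's built-in `re` module treats this pattern the same way as the Boost engine used by Notepad++ (both support the `(?(1)...|...)` conditional); the `lines` list and the variable names are illustrative only. It records which lines each version matches and where the unanchored match on line 4 begins.

```python
import re

lines = [
    "123-456-7890",    # line1
    "(123)456-7890",   # line2
    "(123)-456-7890",  # line3
    "(123-456-7890",   # line4
    "1234567890",      # line5
    "123 456 7890",    # line6
]

unanchored = re.compile(r"(\()?\d{3}(?(1)\)|-)\d{3}-\d{4}")
anchored = re.compile(r"^(\()?\d{3}(?(1)\)|-)\d{3}-\d{4}")

# 1-based line numbers on which each pattern finds a match somewhere in the line
unanchored_hits = [i for i, s in enumerate(lines, 1) if unanchored.search(s)]
anchored_hits = [i for i, s in enumerate(lines, 1) if anchored.search(s)]

# The unanchored match on line 4 skips the "(" and starts at the digit after it
line4_match = unanchored.search(lines[3])

print("unanchored:", unanchored_hits)
print("anchored:  ", anchored_hits)
print("line4 match starts at index", line4_match.start(), "->", line4_match.group(0))
```

The start index on line 4 is the useful detail: the match does not include the opening parenthesis at all, which is why the conditional is free to take its "no group 1" branch.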

                      • wing yang @PeterJones

                        @PeterJones
                        123-456-7890 //line1
                        (123)456-7890 //line2
                        (123)-456-7890 //line3
                        (123-456-7890 //line4
                        1234567890 //line5
                        123 456 7890 //line6
                        blankblank(123-456-7890 //line7
                        blankblank123-456-7890 //line8

                        ^(()?\d{3}(?(1))|-)\d{3}-\d{4} miss line8

                        • PeterJones @wing yang

                          @wing-yang ,

                          Have you not noticed that when you type a regular expression here, not all the characters show up? Do you not realize that this makes it harder to answer your questions? Do you really want us to have to do extra work just to understand your question, let alone answer it? Have you not read the “Please Read Before Posting” and other FAQ entries that tell you how to format forum posts, especially FAQ: Template for Search&Replace Questions, which explains in excruciating detail how to ask successful search-and-replace questions?

                          When you are supplying data or regex, you can select the text and hit the </> button, which will put a line of ``` before and after your selected text. It will look like this while you are editing:

                          ```
                          123-456-7890 //line1
                          (123)456-7890 //line2
                          (123)-456-7890 //line3
                          (123-456-7890 //line4
                          1234567890 //line5
                          123 456 7890 //line6
                          blankblank(123-456-7890 //line7
                          blankblank123-456-7890 //line8
                          
                          ^(\()?\d{3}(?(1)\)|-)\d{3}-\d{4} miss line8
                          ```
                          

                          And then it will render like this:

                          123-456-7890 //line1
                          (123)456-7890 //line2
                          (123)-456-7890 //line3
                          (123-456-7890 //line4
                          1234567890 //line5
                          123 456 7890 //line6
                          blankblank(123-456-7890 //line7
                          blankblank123-456-7890 //line8
                          
                          ^(\()?\d{3}(?(1)\)|-)\d{3}-\d{4} miss line8
                          

                          Notice that now, we can read your whole regular expression that you tried, without the characters being missing.

                          Now, back to your problem statement:

                          ^(\()?\d{3}(?(1)\)|-)\d{3}-\d{4} miss line8
                          

                          Of course it misses line 8, because you changed the definition of what you want it to match. We’ve given you enough to answer your original question. At some point, you have to try to understand what you’ve been given, and try to modify it for your own needs. But I will give you one last freebie.

                          All your example data had your text that you wanted to match starting at the beginning of the line, so we included a ^ anchor to make it start at the beginning of the line, because that was one way to prevent line 4 from matching. Now line 8 does not start the phone number at the beginning of the line. Take out that character, so that you have the regex (\()?\d{3}(?(1)\)|-)\d{3}-\d{4}, and it will match line 8.

                          At that point, you are going to complain that “(\()?\d{3}(?(1)\)|-)\d{3}-\d{4} is what I originally had: it also matches line 4 and line 7 (and I don’t want that)”. And this is because it tried to match (###) and couldn’t, so then it went to the alternation which wants to match ###- which it could do.

                          To fix that, since we can no longer anchor to the beginning of the line, we need to instead tell it that ( cannot come before, using a negative lookbehind: (?<!\() says that “the character before the match cannot be a literal (”.

                          The final regular expression I am going to give you is thus

                          • (?<!\()(\()?\d{3}(?(1)\)|-)\d{3}-\d{4}

                          Per the screenshot, it matches lines 1, 2, and 8 without matching the other lines.

                          ef86e154-5363-4324-a121-929f2787c6e1-image.png
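For readers without the screenshot, the same check can be sketched in Python. This assumes Python's `re` module agrees with Notepad++'s Boost engine on this pattern (both support negative lookbehind and group-number conditionals); the `lines` list and the `hits` dictionary are illustrative names, not part of the thread.

```python
import re

lines = [
    "123-456-7890",             # line1
    "(123)456-7890",            # line2
    "(123)-456-7890",           # line3
    "(123-456-7890",            # line4
    "1234567890",               # line5
    "123 456 7890",             # line6
    "blankblank(123-456-7890",  # line7
    "blankblank123-456-7890",   # line8
]

# (?<!\() rejects any starting position that comes right after a literal "("
pattern = re.compile(r"(?<!\()(\()?\d{3}(?(1)\)|-)\d{3}-\d{4}")

# Map 1-based line number -> matched text, for lines that match
hits = {}
for number, text in enumerate(lines, 1):
    m = pattern.search(text)
    if m is not None:
        hits[number] = m.group(0)

for number, matched in hits.items():
    print(f"line{number}: {matched}")
```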

                          From here, you can use the references below to start learning more about the individual components of the regex yourself. And please note that we are not a generic “help me learn regex” site: we focus on Notepad++, and while regexes are a part of Notepad++, they are only a part, and we will stop answering regex questions if it gets in the way of actually talking about Notepad++ itself.

                          ----

                          Useful References

                          • Please Read Before Posting
                          • Template for Search/Replace Questions
                          • FAQ: Where to find regular expressions (regex) documentation
                          • Notepad++ Online User Manual: Searching/Regex

                          ----

                          Please note: This Community Forum is not a data transformation service; you should not expect to be able to always say “I have data like X and want it to look like Y” and have us do all the work for you. If you are new to the Forum, and new to regular expressions, we will often give help on the first one or two data-transformation questions, especially if they are well-asked and you show a willingness to learn; and we will point you to the documentation where you can learn how to do the data transformations for yourself in the future. But if you repeatedly ask us to do your work for you, you will find that the patience of usually-helpful Community members wears thin. The best way to learn regular expressions is by experimenting with them yourself, and getting a feel for how they work; having us spoon-feed you the answers without you putting in the effort doesn’t help you in the long term and is uninteresting and annoying for us.

                          • wing yang @PeterJones

                            @PeterJones
                            I see. I appreciate your help.

                            • guy038

                              Hello, @wing-yang, @peterjones, @mapje71 and All,

                              Just back from holidays, in Brittany : one of the few temperate regions in the world !! We can fully rest, during the nights, with temperatures always under 20° !


                              @wing-yang, when you said to @mapje71 :

                              I want to exercise backreference
                              thank you

                              Your statement is not quite exact! What you actually want to exercise is the conditional regex feature, whose general form is:

                              (?(...) THEN part [ | ELSE part ] )

                              where the condition (...) can be, either :

                              • (N) : A group number, previously created

                              • (<Name>) or ('Name') : A named group, previously created

                              • (?=...) or (?!...) : A positive or negative look-ahead

                              • (?<=...) or (?<!...) : A positive or negative look-behind

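                              • (R) : True when the engine is currently inside a recursive call to the whole regex

                              • (Rn) : True when the engine is currently inside a recursive call to the unnamed group number n

                              • (R&Name) : True when the engine is currently inside a recursive call to the named group Name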

                              And, of course, the ?(1) part of your regex refers to the optional group (\()?, created before it, which gives two cases:

                              • (\(){1} : group 1 is defined and contains an opening parenthesis => the THEN part is used

                              • (\(){0} : group 1 is not defined, as it did not take part in the match => the ELSE part is used
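The two cases above can be sketched in Python, under the assumption that its `re` module is an acceptable stand-in for the Boost engine here. Note that Python spells named groups as `(?P<name>...)` and the named condition as `(?(name)...)`, which differs from the `(<Name>)` form listed above; the group name `paren` and the helper `branch_taken` are hypothetical names used only for this illustration.

```python
import re

# Condition by group number: group 1 is the optional "(".
by_number = re.compile(r"(\()?\d{3}(?(1)\)|-)")
# Condition by group name, in Python's own spelling of named groups.
by_name = re.compile(r"(?P<paren>\()?\d{3}(?(paren)\)|-)")


def branch_taken(pattern, text):
    """Report which side of the conditional a full match of `text` used."""
    m = pattern.fullmatch(text)
    if m is None:
        return "no match"
    # Group 1 is None when the optional group did not take part in the match.
    return "THEN" if m.group(1) is not None else "ELSE"


for sample in ["(123)", "123-", "(123-", "123)"]:
    print(sample, branch_taken(by_number, sample), branch_taken(by_name, sample))
```

The last two samples mix the two forms (an opening parenthesis followed by a dash, or no opening parenthesis followed by a closing one), so neither branch can complete the match.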


                              Now, the last formulation of @peterjones, expressed with the free-spacing mode :

                              (?x) (?<!\() (\()? \d{3} (?(1) \) | -) \d{3}-\d{4}

                              works nicely against this test list :

                              123-456-7890               //  Line_01   OK
                              (123)456-7890              //  Line_02   OK
                              (123)-456-7890             //  Line_03
                              (123-456-7890              //  Line_04
                              1234567890                 //  Line_05
                              123 456 7890               //  Line_06
                              blankblank(123-456-7890    //  Line_07
                              blankblank123-456-7890     //  Line_08   OK
                              blankblank((123)-456-7890  //  Line_09
                              blankblank(123)456-7890    //  Line_10   OK
                              blankblank)123-456-7890    //  Line_11   OK
                                (123)456-7890            //  Line_12   OK
                                123-456-7890             //  Line_13   OK
                              blankblank(123))-456-7890  //  Line_14
                              blankblank((123))-456-7890 //  Line_15
                              blankblank((123))-456-7890 //  Line_16
                                (123))-456-7890          //  Line_17
                                ((123))-456-7890         //  Line_18
                                ((123))-456-7890         //  Line_19
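The OK markers in the list above can be re-checked with a short Python sketch. It assumes Python's `re.VERBOSE` flag is a fair equivalent of the `(?x)` free-spacing mode and that `re` handles this pattern like Notepad++'s Boost engine; the trailing `// Line_nn` comments are left out of the data, and the names `tests` and `ok_lines` are illustrative.

```python
import re

pattern = re.compile(
    r"""
    (?<!\()         # the match must not start right after a literal "("
    (\()?           # group 1: optional opening parenthesis
    \d{3}           # three digits
    (?(1) \) | -)   # ")" if group 1 took part in the match, otherwise "-"
    \d{3}-\d{4}     # remaining seven digits
    """,
    re.VERBOSE,
)

tests = [
    "123-456-7890",                # Line_01
    "(123)456-7890",               # Line_02
    "(123)-456-7890",              # Line_03
    "(123-456-7890",               # Line_04
    "1234567890",                  # Line_05
    "123 456 7890",                # Line_06
    "blankblank(123-456-7890",     # Line_07
    "blankblank123-456-7890",      # Line_08
    "blankblank((123)-456-7890",   # Line_09
    "blankblank(123)456-7890",     # Line_10
    "blankblank)123-456-7890",     # Line_11
    "  (123)456-7890",             # Line_12
    "  123-456-7890",              # Line_13
    "blankblank(123))-456-7890",   # Line_14
    "blankblank((123))-456-7890",  # Line_15
    "blankblank((123))-456-7890",  # Line_16
    "  (123))-456-7890",           # Line_17
    "  ((123))-456-7890",          # Line_18
    "  ((123))-456-7890",          # Line_19
]

# 1-based numbers of the lines that contain a match
ok_lines = [i for i, s in enumerate(tests, 1) if pattern.search(s)]
print(ok_lines)
```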
                              

                              Note that if you don’t want to use a conditional statement, we can improve the @mapje71 solution and get a regex that gives the same results on this test list:

                              (?x) ^ [^(\r\n]* (?: \( (\d{3}) \) | (?1) - ) (?1) - \d{4}

                              where :

                              • ^ [^(\r\n]* is a leading range of characters, possibly empty, none of which is an opening parenthesis or a line-break char

                              • (?: \( (\d{3}) \) | (?1) - ) represents a non-capturing group, containing the two alternatives

                              • The (?1) syntax represents a subroutine call to the regex part \d{3}
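Python's built-in `re` module has no `(?1)` subroutine-call syntax, so the sketch below writes `\d{3}` out in full wherever the regex above calls `(?1)`. That expansion is an assumption made only to get a runnable example; it should behave the same way because the subroutine simply re-runs `\d{3}`. The sketch compares the result with the conditional version on a few of the test lines; `samples`, `hits_alt` and `hits_cond` are illustrative names.

```python
import re

# Alternation version, with each (?1) call expanded to \d{3}
alternation = re.compile(
    r"""
    ^ [^(\r\n]*        # leading text with no "(" and no line break
    (?: \( \d{3} \)    # either "(ddd)"
      | \d{3} -        # or "ddd-"
    )
    \d{3} - \d{4}
    """,
    re.VERBOSE,
)

# Conditional version with the negative lookbehind, for comparison
conditional = re.compile(r"(?<!\()(\()?\d{3}(?(1)\)|-)\d{3}-\d{4}")

samples = [
    "123-456-7890",
    "(123)456-7890",
    "(123)-456-7890",
    "(123-456-7890",
    "blankblank(123-456-7890",
    "blankblank123-456-7890",
    "blankblank((123)-456-7890",
    "blankblank(123)456-7890",
    "blankblank)123-456-7890",
    "  (123))-456-7890",
]

# 1-based positions in `samples` where each pattern finds a match
hits_alt = [i for i, s in enumerate(samples, 1) if alternation.search(s)]
hits_cond = [i for i, s in enumerate(samples, 1) if conditional.search(s)]
print(hits_alt)
print(hits_cond)
```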

                              Best Regards,

                              guy038

                              • guy038

                                Hi, @wing-yang, @peterjones, @mapje71 and All,

                                In the last part of my previous post, I proposed a regex which does not use any conditional statement :

                                (?x) ^ [^(\r\n]* (?: \( (\d{3}) \) | (?1) - ) (?1) - \d{4}

                                However, this regex can easily be improved by using @peterjones’s look-behind structure instead of the negated character class:

                                (?x) (?<!\() (?: \( (\d{3}) \) | (?1) - ) (?1) - \d{4}

                                which makes it possible to match several occurrences on a single line, as below:

                                abc (123)456-7890 def 123-456-7890 ghi
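A short Python sketch of the difference on that sample line. As before, Python's `re` has no `(?1)` subroutine call, so `\d{3}` is written out in full (an assumption made for the sake of a runnable example), and the variable names are illustrative.

```python
import re

text = "abc (123)456-7890 def 123-456-7890 ghi"

# Look-behind version: can start a match anywhere not preceded by "("
lookbehind = re.compile(r"(?<!\()(?:\(\d{3}\)|\d{3}-)\d{3}-\d{4}")
# Earlier version: tied to the start of the line by ^ and the leading class
anchored = re.compile(r"^[^(\r\n]*(?:\(\d{3}\)|\d{3}-)\d{3}-\d{4}")

found = lookbehind.findall(text)
anchored_found = anchored.findall(text)

print(found)
print(anchored_found)
```

The anchored form can only match once per line, and its match also swallows the leading text, whereas the look-behind form returns each phone number on its own.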
                                

                                BR

                                guy038

                                The Community of users of the Notepad++ text editor.