(?(backreference)true-regex|false-regex)
-
Here is the same example text and regex, inside a code block, so that people can see the actual data and regex:
```
123-456-7890 //line1
(123)456-7890 //line2
(123)-456-7890 //line3
(123-456-7890 //line4
1234567890 //line5
123 456 7890 //line6
(\()?\d{3}(?(1)\)|-)\d{3}-\d{4}
```
-
@wing-yang said in (?(backreference)true-regex|false-regex):
I expected that it matchs line1 or lin2, but line4
I think you mean: “I expected that it matches line1 or line2, but not line4” ?
I guess you’d have to say what it is about line4 that makes you not want to match it?
-
Why not use
(\(\d{3}\)|\d{3}-)\d{3}-\d{4}
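For anyone who wants to check this outside Notepad++, here is a small Python sketch. Python's `re` module is only a stand-in here (Notepad++ uses a different engine), but it accepts both the numbered-group conditional and this plain alternation with the same syntax. The sketch runs the alternation above and the conditional regex from the first post over the six sample lines; `matching_lines` is a hypothetical helper name made up for this illustration.

```python
import re

# The six sample lines from the first post.
LINES = [
    "123-456-7890 //line1",
    "(123)456-7890 //line2",
    "(123)-456-7890 //line3",
    "(123-456-7890 //line4",
    "1234567890 //line5",
    "123 456 7890 //line6",
]

CONDITIONAL = r"(\()?\d{3}(?(1)\)|-)\d{3}-\d{4}"  # regex from the first post
ALTERNATION = r"(\(\d{3}\)|\d{3}-)\d{3}-\d{4}"     # alternation suggested above


def matching_lines(pattern, lines):
    """Return the 1-based numbers of the lines in which `pattern` matches anywhere."""
    rx = re.compile(pattern)
    return [n for n, line in enumerate(lines, start=1) if rx.search(line)]


print(matching_lines(CONDITIONAL, LINES))  # [1, 2, 4]
print(matching_lines(ALTERNATION, LINES))  # [1, 2, 4]
```

Both forms accept the same lines, including line 4, because neither of them is anchored.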
-
@Alan-Kilborn
Yes, but not line4.
Thank you.
-
@MAPJe71
I want to exercise backreference
Thank you.
-
I want to exercise backreference
Your original expression was doing the backreference correctly. The problem is that the expression was not anchored, so line 4 was matching the non-parenthesis side of the conditional starting on the `1`, after failing to match the parenthesis version starting on the `(`. If you anchor to the beginning of the line, `^(\()?\d{3}(?(1)\)|-)\d{3}-\d{4}`, then the expression matches just lines 1 and 2.
-
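To see what the anchor in the reply above changes, this Python sketch inspects where the unanchored regex matches on line 4 and checks that the anchored one rejects it. It uses Python's `re` module as a stand-in for the Notepad++ regex engine, assuming both behave the same way for this simple pattern; `LINE1`, `LINE2` and `LINE4` are just illustrative names.

```python
import re

LINE1 = "123-456-7890 //line1"
LINE2 = "(123)456-7890 //line2"
LINE4 = "(123-456-7890 //line4"

unanchored = re.compile(r"(\()?\d{3}(?(1)\)|-)\d{3}-\d{4}")
anchored = re.compile(r"^(\()?\d{3}(?(1)\)|-)\d{3}-\d{4}")

# Unanchored: the attempt at index 0 (the "(") fails, so the engine retries
# at index 1, where group 1 is unset and the ELSE branch "-" applies.
m = unanchored.search(LINE4)
print(m.start(), repr(m.group(0)), m.group(1))  # 1 '123-456-7890' None

# Anchored: a match may only start at index 0, so line 4 is rejected
# while lines 1 and 2 still match.
print(anchored.search(LINE4))  # None
print(bool(anchored.search(LINE1)), bool(anchored.search(LINE2)))  # True True
```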
@PeterJones
123-456-7890 //line1
(123)456-7890 //line2
(123)-456-7890 //line3
(123-456-7890 //line4
1234567890 //line5
123 456 7890 //line6
blankblank(123-456-7890 //line7
blankblank123-456-7890 //line8
^(()?\d{3}(?(1))|-)\d{3}-\d{4} miss line8
-
Have you not noticed that when you type a regular expression here, not all the characters show up? Do you not realize that this makes it harder to answer your questions? Do you really want us to have to do extra work just to understand your question, let alone answer it? Have you not read the “Please Read Before Posting” and other FAQ entries that tell you how to format forum posts, especially FAQ: Template for Search&Replace Questions, which explains in excruciating detail how to ask successful search-and-replace questions?
When you are supplying data or regex, you can select the text and hit the `</>` button, which will put a line of three backticks (```) before and after your selected text. It will look like this while you are editing:
````
```
123-456-7890 //line1
(123)456-7890 //line2
(123)-456-7890 //line3
(123-456-7890 //line4
1234567890 //line5
123 456 7890 //line6
blankblank(123-456-7890 //line7
blankblank123-456-7890 //line8
^(\()?\d{3}(?(1)\)|-)\d{3}-\d{4} miss line8
```
````
And then it will render like this:
```
123-456-7890 //line1
(123)456-7890 //line2
(123)-456-7890 //line3
(123-456-7890 //line4
1234567890 //line5
123 456 7890 //line6
blankblank(123-456-7890 //line7
blankblank123-456-7890 //line8
^(\()?\d{3}(?(1)\)|-)\d{3}-\d{4} miss line8
```
Notice that now, we can read your whole regular expression that you tried, without the characters being missing.
Now, back to your problem statement:
^(\()?\d{3}(?(1)\)|-)\d{3}-\d{4} miss line8
Of course it misses line 8, because you changed the definition of what you want it to match. We’ve given you enough to answer your original question. At some point, you have to try to understand what you’ve been given, and try to modify it for your own needs. But I will give you one last freebie.
All your example data had the text that you wanted to match starting at the beginning of the line, so we included a `^` anchor to make it start at the beginning of the line, because that was one way to prevent line 4 from matching. Now line 8 does not start the phone number at the beginning of the line. Take out that character, so that you have the regex `(\()?\d{3}(?(1)\)|-)\d{3}-\d{4}`, and it will match line 8.
At that point, you are going to complain that “`(\()?\d{3}(?(1)\)|-)\d{3}-\d{4}` is what I originally had: it also matches line 4 and line 7 (and I don’t want that)”. And this is because it tried to match `(###)` starting at the `(` and couldn’t, so the engine moved one character to the right; there, group 1 is unset, so the conditional takes its ELSE branch, which wants to match `###-`, which it could do.
To fix that, since we can no longer anchor to the beginning of the line, we need to instead tell it that `(` cannot come before, using a negative lookbehind: `(?<!\()` says that “the character before the match cannot be a literal `(`”.
The final regular expression I am going to give you is thus
(?<!\()(\()?\d{3}(?(1)\)|-)\d{3}-\d{4}
In testing, it matches lines 1, 2, and 8 without matching the other lines.
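Here is a quick Python check of the three stages discussed in this reply, using the eight sample lines exactly as posted (`blankblank` included). Python's `re` module stands in for Notepad++'s engine; it supports both the lookbehind and the numbered-group conditional used here. The `PATTERNS` dictionary and the `matching_lines` helper are illustrative names only.

```python
import re

# The eight sample lines from the post above ("blankblank" copied literally).
LINES = [
    "123-456-7890 //line1",
    "(123)456-7890 //line2",
    "(123)-456-7890 //line3",
    "(123-456-7890 //line4",
    "1234567890 //line5",
    "123 456 7890 //line6",
    "blankblank(123-456-7890 //line7",
    "blankblank123-456-7890 //line8",
]

PATTERNS = {
    "anchored": r"^(\()?\d{3}(?(1)\)|-)\d{3}-\d{4}",          # misses line 8
    "unanchored": r"(\()?\d{3}(?(1)\)|-)\d{3}-\d{4}",          # also hits 4 and 7
    "lookbehind": r"(?<!\()(\()?\d{3}(?(1)\)|-)\d{3}-\d{4}",   # final version
}


def matching_lines(pattern, lines):
    """Return the 1-based numbers of the lines in which `pattern` finds a match."""
    rx = re.compile(pattern)
    return [n for n, line in enumerate(lines, start=1) if rx.search(line)]


for name, pattern in PATTERNS.items():
    print(name, matching_lines(pattern, LINES))
# anchored [1, 2]
# unanchored [1, 2, 4, 7, 8]
# lookbehind [1, 2, 8]
```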
From here, you can use the references below to start learning more about the individual components of the regex yourself. And please note that we are not a generic “help me learn regex” site: we focus on Notepad++, and while regexes are a part of Notepad++, they are only a part, and we will stop answering regex questions if it gets in the way of actually talking about Notepad++ itself.
----
Useful References
- Please Read Before Posting
- Template for Search/Replace Questions
- FAQ: Where to find regular expressions (regex) documentation
- Notepad++ Online User Manual: Searching/Regex
----
Please note: This Community Forum is not a data transformation service; you should not expect to be able to always say “I have data like X and want it to look like Y” and have us do all the work for you. If you are new to the Forum, and new to regular expressions, we will often give help on the first one or two data-transformation questions, especially if they are well-asked and you show a willingness to learn; and we will point you to the documentation where you can learn how to do the data transformations for yourself in the future. But if you repeatedly ask us to do your work for you, you will find that the patience of usually-helpful Community members wears thin. The best way to learn regular expressions is by experimenting with them yourself, and getting a feel for how they work; having us spoon-feed you the answers without you putting in the effort doesn’t help you in the long term and is uninteresting and annoying for us.
-
@PeterJones
I see. I appreciate your help.
-
Hello, @wing-yang, @peterjones, @mapje71 and All,
Just back from holidays in Brittany: one of the few temperate regions in the world! We can fully rest during the nights, with temperatures always under 20°C!
@wing-yang, when you said to @mapje71:
I want to exercise backreference
thank you
Your statement was not totally exact! Indeed, you want to exercise the conditional regex feature, whose syntax is:
(?(...) THEN part [ | ELSE part ] )
where the condition `(...)` can be either:
- `(N)`: a group number, previously created (true if group N took part in the match)
- `(<Name>)` or `('Name')`: a named group, previously created (true if that group took part in the match)
- `(?=...)` or `(?!...)`: a positive or negative look-ahead
- `(?<=...)` or `(?<!...)`: a positive or negative look-behind
- `(R)`: a recursion test, true when the engine is inside a recursive call
- `(Rn)`: true when the engine is inside a recursion into the unnamed group number n
- `(R&Name)`: true when the engine is inside a recursion into the named group Name
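Python's standard `re` module, used here only as an illustration, supports just the first two kinds of condition in this list (a group number or a group name). It spells a named group `(?P<name>...)` and tests it with `(?(name)...)`, whereas the list above uses the `(?(<Name>)...)` form. Look-around and recursion conditions are not available there. Under that limitation, a sketch of the named-group form of the same phone-number regex looks like this (the group name `open` is an arbitrary choice):

```python
import re

# Named-group form of (\()?\d{3}(?(1)\)|-)\d{3}-\d{4}.
# "open" is an arbitrary group name chosen for this example.
PHONE = re.compile(r"(?P<open>\()?\d{3}(?(open)\)|-)\d{3}-\d{4}")

for text in ["(123)456-7890", "123-456-7890", "(123-456-7890"]:
    # fullmatch: the whole string must be a phone number
    print(text, "->", PHONE.fullmatch(text) is not None)
# (123)456-7890 -> True
# 123-456-7890 -> True
# (123-456-7890 -> False
```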
And, of course, the `(?(1)...)` condition of your regex refers to the optional group `(\()?`, created before, which describes two cases:
- `(\(){1}`: group 1 is defined and contains an opening parenthesis => the THEN part is used
- `(\(){0}`: group 1 is not defined, as it did not take part in the match => the ELSE part is used
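These two cases can be observed directly in a small Python sketch. It uses the standard `re` module and assumes it behaves like Notepad++ for this simple numbered-group conditional. When the optional group matched once, `group(1)` is the `(` and the THEN part applies; when it matched zero times, `group(1)` is `None` and the ELSE part applies.

```python
import re

PHONE = re.compile(r"(\()?\d{3}(?(1)\)|-)\d{3}-\d{4}")

for text in ["(123)456-7890", "123-456-7890"]:
    m = PHONE.fullmatch(text)
    # group(1) is "(" when (\() matched once, and None when it matched zero times
    branch = "THEN" if m.group(1) is not None else "ELSE"
    print(f"{text!r}: group 1 = {m.group(1)!r} -> {branch} part")
# '(123)456-7890': group 1 = '(' -> THEN part
# '123-456-7890': group 1 = None -> ELSE part
```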
Now, the last formulation of @peterjones, expressed in free-spacing mode:
(?x) (?<!\() (\()? \d{3} (?(1) \) | -) \d{3}-\d{4}
works nicely against this test list:
```
123-456-7890 // Line_01 OK
(123)456-7890 // Line_02 OK
(123)-456-7890 // Line_03
(123-456-7890 // Line_04
1234567890 // Line_05
123 456 7890 // Line_06
blankblank(123-456-7890 // Line_07
blankblank123-456-7890 // Line_08 OK
blankblank((123)-456-7890 // Line_09
blankblank(123)456-7890 // Line_10 OK
blankblank)123-456-7890 // Line_11 OK
(123)456-7890 // Line_12 OK
123-456-7890 // Line_13 OK
blankblank(123))-456-7890 // Line_14
blankblank((123))-456-7890 // Line_15
blankblank((123))-456-7890 // Line_16
(123))-456-7890 // Line_17
((123))-456-7890 // Line_18
((123))-456-7890 // Line_19
```
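To double-check the `OK` markers, here is a Python sketch that runs the free-spacing regex over the test list above and compares the matching line numbers with the lines marked `OK`. It assumes Python's `re` module as a stand-in, where the inline `(?x)` flag has the same free-spacing meaning (spaces in the pattern are ignored).

```python
import re

# The test list from the post, copied as-is ("OK" marks the expected matches).
TEST_LIST = """\
123-456-7890 // Line_01 OK
(123)456-7890 // Line_02 OK
(123)-456-7890 // Line_03
(123-456-7890 // Line_04
1234567890 // Line_05
123 456 7890 // Line_06
blankblank(123-456-7890 // Line_07
blankblank123-456-7890 // Line_08 OK
blankblank((123)-456-7890 // Line_09
blankblank(123)456-7890 // Line_10 OK
blankblank)123-456-7890 // Line_11 OK
(123)456-7890 // Line_12 OK
123-456-7890 // Line_13 OK
blankblank(123))-456-7890 // Line_14
blankblank((123))-456-7890 // Line_15
blankblank((123))-456-7890 // Line_16
(123))-456-7890 // Line_17
((123))-456-7890 // Line_18
((123))-456-7890 // Line_19"""

# Free-spacing form: the leading (?x) makes the engine ignore the spaces.
PHONE = re.compile(r"(?x) (?<!\() (\()? \d{3} (?(1) \) | -) \d{3}-\d{4}")

lines = TEST_LIST.splitlines()
matched = [n for n, line in enumerate(lines, start=1) if PHONE.search(line)]
expected = [n for n, line in enumerate(lines, start=1) if line.endswith(" OK")]
print(matched)              # [1, 2, 8, 10, 11, 12, 13]
print(matched == expected)  # True
```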
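Note that, if you don’t want to use a conditional statement, we can improve the @mapje71 solution and get this regex, which selects the same lines of the test list (although each match now starts at the beginning of the line, not at the phone number):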
(?x) ^ [^(\r\n]* (?: \( (\d{3}) \) | (?1) - ) (?1) - \d{4}
where:
- `^ [^(\r\n]*` is a leading run of characters, possibly empty, none of which is an opening parenthesis or a line-break character
- `(?: \( (\d{3}) \) | (?1) - )` is a non-capturing group containing the two alternatives
- the `(?1)` syntax is a subroutine call to the pattern of group 1, i.e. `\d{3}`
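Python's standard `re` module has no `(?1)` subroutine calls, so this sketch writes each `(?1)` out by hand as `\d{3}` (the pattern that the call re-runs) and compares the result with the conditional regex on a selection of lines from the test list. It also shows that the matches differ in extent: the `^`-anchored version includes everything from the start of the line. The sample selection and names are illustrative only.

```python
import re

# Each (?1) from the post is expanded by hand to \d{3}, since re lacks subroutine calls.
NO_CONDITIONAL = re.compile(r"(?x) ^ [^(\r\n]* (?: \( (\d{3}) \) | \d{3} - ) \d{3} - \d{4}")
CONDITIONAL = re.compile(r"(?x) (?<!\() (\()? \d{3} (?(1) \) | -) \d{3}-\d{4}")

SAMPLES = [
    "123-456-7890 // Line_01",
    "(123)-456-7890 // Line_03",
    "(123-456-7890 // Line_04",
    "blankblank123-456-7890 // Line_08",
    "blankblank((123)-456-7890 // Line_09",
    "blankblank(123)456-7890 // Line_10",
    "blankblank)123-456-7890 // Line_11",
    "blankblank(123))-456-7890 // Line_14",
]

for line in SAMPLES:
    a = NO_CONDITIONAL.search(line)
    b = CONDITIONAL.search(line)
    # Print the full text matched by each regex (None when there is no match).
    print(line, "|", a.group(0) if a else None, "|", b.group(0) if b else None)
```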
Best Regards,
guy038
-
Hi, @wing-yang, @peterjones, @mapje71 and All,
In the last part of my previous post, I proposed a regex which does not use any conditional statement:
(?x) ^ [^(\r\n]* (?: \( (\d{3}) \) | (?1) - ) (?1) - \d{4}
However, this regex can easily be improved by using @peterjones’s look-behind structure instead of the negated character class:
(?x) (?<!\() (?: \( (\d{3}) \) | (?1) - ) (?1) - \d{4}
which allows several occurrences to be matched on a single line, as below:
abc (123)456-7890 def 123-456-7890 ghi
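Here is a Python sketch of the difference, with `(?1)` expanded to `\d{3}` by hand because Python's `re` module (used here as a stand-in) has no subroutine calls. The look-behind version finds both numbers on the sample line. The `^`-anchored version can only produce one match per line, and that match starts at the beginning of the line.

```python
import re

# (?1) expanded by hand to \d{3}, since Python's re has no subroutine calls.
LOOKBEHIND = re.compile(r"(?x) (?<!\() (?: \( (\d{3}) \) | \d{3} - ) \d{3} - \d{4}")
ANCHORED = re.compile(r"(?x) ^ [^(\r\n]* (?: \( (\d{3}) \) | \d{3} - ) \d{3} - \d{4}")

line = "abc (123)456-7890 def 123-456-7890 ghi"

# finditer + group(0): findall would return only the contents of group 1.
print([m.group(0) for m in LOOKBEHIND.finditer(line)])
# ['(123)456-7890', '123-456-7890']
print([m.group(0) for m in ANCHORED.finditer(line)])
# ['abc (123)456-7890']
```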
BR
guy038