How to find numbers in multiline in Notepad++
- 
 @gelle_marrisa The regex I provided was tested on the text quoted just above it. All it does is match either of the two 6-character strings in the quote. I provided it on the assumption that you were trying to learn techniques to help solve your overall problem; it was not intended as a complete solution. If you want help with a complete solution, you will need to read, with care, attention and seriousness, my remarks about the importance of being able to determine the start and the end of records in your data.
- 
 @neil-schipper 
 Let's forget the slash value: if there is only 111|22 in the 2nd or 3rd line, can we get the desired result? \d{10}\R\d{3}/\d{2}\R\d{3} shows a pattern error. I am a noob; I am not sure whether \d{3}[/|]\d{2} should be used as a complete pattern or merged with one of the patterns mentioned above.
- 
 To convert this:

12
abcd : 115/22
xyz : 333
product : blablabla 4567
code : 01010
serial : 34

56
abcd : 116|22
xyz : 333
product : blablabla 45678
code : 01010
serial : 78

90
abcd : 117|22
xyz : 333
product : blablabla 456789
code : 01010
serial : 12

into this:

12|115|22|333|4567|01010|34
56|116|22|333|45678|01010|78
90|117|22|333|456789|01010|12

You can use:
F: (\d+)\D+(\d+)\D+(\d+)\D+(\d+)\D+(\d+)\D+(\d+)\D+(\d+)\D+
R: $1|$2|$3|$4|$5|$6|$7\r\n
Set the cursor to the left of the first number of the first record.
Execute Replace All.
It will only work on your whole file if every record has exactly 7 numbers.
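To check the F/R above outside Notepad++, here is a minimal Python sketch. It assumes the one-field-per-line layout shown above, uses Python's \1…\7 group syntax in place of $1…$7 and \n in place of \r\n; the names DATA, FIND and REPLACE are mine, not part of the post. Because the pattern ends in \D+, the last record also needs at least one non-digit after its seventh number, which is why the sample ends with a line break.

```python
import re

# Assumed sample: three records, one field per line, a blank line between records.
DATA = (
    "12\nabcd : 115/22\nxyz : 333\nproduct : blablabla 4567\ncode : 01010\nserial : 34\n\n"
    "56\nabcd : 116|22\nxyz : 333\nproduct : blablabla 45678\ncode : 01010\nserial : 78\n\n"
    "90\nabcd : 117|22\nxyz : 333\nproduct : blablabla 456789\ncode : 01010\nserial : 12\n"
)

# Same search as the F: field: seven (number, run of non-digits) pairs.
FIND = re.compile(r"(\d+)\D+" * 7)
# Same replacement as the R: field, with Python's \N group syntax and \n endings.
REPLACE = r"\1|\2|\3|\4|\5|\6|\7\n"

result = FIND.sub(REPLACE, DATA)
print(result, end="")
```

If any record has one number more or less, every later match is shifted by that amount, which is the 7-numbers-per-record limitation stated above.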
- 
 Hello, @neil-schipper, @gelle_marrisa, @peterjones and All, Another solution, which does not depend on the number of lines in a section, would be:
- 
SEARCH \D+((^\r\n)+|\z)|\D+
- 
REPLACE ?1\r\n:|
- 
Tick the Wrap around option
- 
Click on the Replace All button
 Of course, I assume that each section is separated from the next by at least one truly empty line. So, from this INPUT text:

12
abcd : 115/22
xyz : 333
product : blablabla 4567
code : 01010
serial : 34




56
abcd : 116|22
xyz :

90
abcd :
product : blablabla 456789
code :
serial : 12

you would obtain this expected text:

12|115|22|333|4567|01010|34
56|116|22
90|456789|12
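A Python sketch of this S/R, assuming Python's re behaves like the Boost engine used by Notepad++ for these constructs. Python's \Z matches only at the very end of the string, which is what \z does in the post, so it stands in for it. re.sub templates have no conditional syntax, so a replacement function (my own, hypothetical) emulates ?1\r\n:|. The sample reproduces the INPUT text above with Windows line endings; its exact blank-line layout is an assumption.

```python
import re

# Assumed sample: the INPUT text above, four empty lines after "serial : 34",
# one after "xyz :", and a final line break after the last 12.
DATA = "\r\n".join([
    "12", "abcd : 115/22", "xyz : 333", "product : blablabla 4567",
    "code : 01010", "serial : 34", "", "", "", "",
    "56", "abcd : 116|22", "xyz :", "",
    "90", "abcd :", "product : blablabla 456789", "code :", "serial : 12", "",
])

# Python's \Z matches only at the very end of the string, like \z in the post.
SEARCH = re.compile(r"\D+((^\r\n)+|\Z)|\D+", re.MULTILINE)


def replace(m):
    # Emulates REPLACE ?1\r\n:| : group 1 only takes part when the first
    # alternative matched, i.e. empty line(s) or the end of the text follow.
    return "\r\n" if m.group(1) is not None else "|"


result = SEARCH.sub(replace, DATA)
print(result.replace("\r\n", "\n"), end="")
```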
 Note that if we try to factorize the search regex as below:
- 
SEARCH \D+(((^\r\n)+|\z)|)
- 
REPLACE ?2\r\n:|
 This regex S/R does not work properly and gives this output:

12|115|22|333|4567|01010|34|56|116|22|90|456789|12

So why, in this new regex, does the case \D+(^\r\n)+ never occur? For instance, after the number 34, which ends the first section of my example? Well, we have this range of chars: 34\r\n\r\n\r\n\r\n\r\n56. So:
- 
First, the regex \D+ matches \r\n\r\n\r\n\r\n\r\n, but some backtracking would be needed for the first alternative \D+(^\r\n)+ to match this same range
- 
As the whole regex contains other alternatives, the regex engine, before backtracking, tries a match attempt with the second alternative. However, the regex \D+\z cannot match at this position!
- 
Finally, the regex engine tries the last, empty, alternative \D+(), which, of course, matches the range \r\n\r\n\r\n\r\n\r\n
 This explains why the gap between two sections is never detected with this second version of the regex S/R. Best Regards, guy038
- 
- 
 Hello, @neil-schipper, @gelle_marrisa, @peterjones and All, My reasoning at the end of my previous post, about the second form of the regex, \D+(((^\r\n)+|\z)|), is not exact! Indeed, I said: "As the whole regex contains other alternatives, the regex engine, before backtracking, tries a match attempt with the second alternative". But, in that case, the correct search regex of my previous post, \D+((^\r\n)+|\z)|\D+, which also contains an alternation, should show the same behavior and always choose the second alternative \D+?! I've tried to find an explanation, without any success :-( Maybe one of you will be able to find a plausible one!
 In brief, even simplifying the first version by omitting the \z case, and given this INPUT text, with a blank line after the last number 12:

12
abcd : 115/22
xyz : 333
product : blablabla 4567
code : 01010
serial : 34




56
abcd : 116|22
xyz :

90
abcd :
product : blablabla 456789
code :
serial : 12


why does the regex S/R:
- 
SEARCH \D+(^\r\n)+|\D+
- 
REPLACE ?1\r\n:|
 give:

12|115|22|333|4567|01010|34
56|116|22
90|456789|12|

while this second, equivalent S/R:
- 
SEARCH \D+((^\r\n)+|)
- 
REPLACE ?2\r\n:|
 gives this result:

12|115|22|333|4567|01010|34|56|116|22|90|456789|12|

??? BR guy038 P.S.: The problem does not come from the empty alternative. For instance, the regex abc(def|) does find either the string abcdef or just abc!
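One possible way to see the difference, offered as a hedged sketch rather than a definitive answer: in a backtracking engine, the most recently opened choice is retried first. In \D+((^\r\n)+|), the inner alternatives come after the greedy \D+, so they are all tried while \D+ still holds its longest match, and the empty branch succeeds right there; \D+ is never asked to give characters back. In \D+(^\r\n)+|\D+, the choice between the two top-level branches was opened first, so the engine retries every shorter length of the first branch's \D+ before falling back to the second branch, and one of those shorter lengths leaves (^\r\n)+ sitting on an empty line. The Python run below shows this, assuming Python's re orders its attempts the same way Boost does; the sample layout and the helper conditional() are mine.

```python
import re

# Assumed sample: the INPUT text above with Windows line endings, four empty
# lines after "serial : 34", one after "xyz :", and the file ending with the
# line break after the last 12 (so the last line is empty).
DATA = "\r\n".join([
    "12", "abcd : 115/22", "xyz : 333", "product : blablabla 4567",
    "code : 01010", "serial : 34", "", "", "", "",
    "56", "abcd : 116|22", "xyz :", "",
    "90", "abcd :", "product : blablabla 456789", "code :", "serial : 12", "",
])

# Top-level alternation: branch 2 is tried only after branch 1 has retried
# every shorter length of its own greedy \D+.
TOP_LEVEL = re.compile(r"\D+(^\r\n)+|\D+", re.MULTILINE)
# Alternation after \D+: its branches are tried at the longest \D+ first,
# and the empty branch succeeds there, so \D+ never gives anything back.
FACTORED = re.compile(r"\D+((^\r\n)+|)", re.MULTILINE)


def conditional(group):
    # Emulates the replacement ?N\r\n:| : a line break if group N took part, else "|".
    return lambda m: "\r\n" if m.group(group) is not None else "|"


top = TOP_LEVEL.sub(conditional(1), DATA)      # like REPLACE ?1\r\n:|
factored = FACTORED.sub(conditional(2), DATA)  # like REPLACE ?2\r\n:|
print(top.replace("\r\n", "\n"))
print(factored)
```

The same reasoning would apply to the \z versions: with the empty branch placed inside the group, it wins at the longest \D+ before any backtracking happens.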
- 
- 
 @guy038 said in How to find numbers in multiline in Notepad++: "solution which does not depend on the number of lines of a section". Very nice solution. I can see its applicability and am glad to know it, so thanks for sharing. I won't be much help in the follow-up discussion. I'm not even clear on what motivated you to go in this direction: "if we try to factorize the search regex expression". However, in trying to understand one building block of your newer regex, which includes a null in an OR subexpression, I encountered something confusing. I wanted to know: "does a captured null return true or false?" So I ran 'Replace All' with F = (), R = ?1dog:cat on a few cases. In the case of a new empty file, there are 0 matches. This seems wrong, although I wouldn't be surprised if a more experienced regex person said it's correct and expected (because maybe the docs say "no text ==> no matches", or maybe "a zero-length null only occurs before or after a character"). In the case of a file with the single character 'p' there are 2 matches and we get dogpdog, which seems reasonable.
- 
 Hello, @Neil-Schipper and All, I had never done this test:
SEARCH ()
REPLACE (?1dog:cat)
Interesting! You said: "In the case of a new empty file, there are 0 matches. This seems wrong,…" Well, your assertion is a bit philosophical: does an empty file contain a single empty string (or an infinity of them!)? Note that, in regex mode, searching for () (an empty group 1) does show the ^ zero length match calltip when applied to a new empty tab or a zero-byte file! However, as you said, even a simple replacement with a dummy string, for instance Test, does not occur and no text is inserted! Now, if I type the phrase This is a test in a new tab and I use the regex S/R:
SEARCH ()
REPLACE ?1:|:x
I get, after clicking on the Replace All button with the Wrap around option ticked, the text: |T|h|i|s| |i|s| |a| |t|e|s|t| And, to my mind, all this is quite logical:
- 
Group 1 is defined and contains an empty string
- 
Technically, an empty string does exist between two characters, as well as before the first char and after the last one. So each occurrence is changed into the | char
 Note that we can obtain the same result with this other regex S/R:
SEARCH (.{0})
REPLACE ?1:|:x
and also with the simpler forms:
SEARCH ()
REPLACE |
or
SEARCH .{0}
REPLACE |
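A small Python sketch of these empty-group experiments (the helper names are mine). Python's re agrees with the results above for non-empty text. Note one difference, though: Python does report one empty match in an empty string, whereas Notepad++, as described above, inserts nothing in an empty document, so that corner case seems to depend on the tool rather than on the regex itself.

```python
import re


def empty_group_matches(text):
    # Number of places where the empty group () matches in text.
    return len(re.findall(r"()", text))


def dog_or_cat(text):
    # Emulates REPLACE ?1dog:cat : whenever () matches, group 1 takes part (with "").
    return re.sub(r"()", lambda m: "dog" if m.group(1) is not None else "cat", text)


print(empty_group_matches("p"))                # 2: before and after "p"
print(dog_or_cat("p"))                         # dogpdog
print(re.sub(r"()", "|", "This is a test"))    # |T|h|i|s| |i|s| |a| |t|e|s|t|
print(re.sub(r".{0}", "|", "This is a test"))  # same result
print(empty_group_matches(""))                 # 1 in Python
```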
 As we're speaking about empty groups, I would like to mention a particular but important point when using conditional structures in regex mode. Let's consider this list:

Ted=First Name
25=Age
Mary=First Name
75=Age
Elisabeth=First Name
47=Age
Bob=First Name
62=Age

Let's introduce the conditional regex structure (?(1)Age|First Name), which means: if group 1 has been previously defined in the search regex, search for the string Age, else search for the string First Name. If we build the regex (?-si)^(\d*).*=(?(1)Age|First Name)$, you could say:
- 
If a line begins with a number, the part \d* matches this number, the part .* matches an empty string, = matches the equal sign and the conditional block (?(1)Age|First Name) matches the string Age, as group 1 contains the number and is defined
- 
If a line does not begin with a number, the part \d* matches an empty string, the part .* matches the first name, = matches the equal sign and the conditional block (?(1)Age|First Name) matches the string First Name, as group 1 is not defined and empty
 However, running this regex against our text, it matches only the lines relative to the age and not all the lines. Why? Well, what does the (\d*) group, after the ^ assertion, really represent?
- 
If a line begins with some digits, no problem: group 1 is defined and contains the number
- 
Now, if a line does not begin with digits, group 1 is ALSO defined, but contains an empty string
 Thus, in all cases group 1 is defined, breaking the intended behaviour of the conditional part (?(1)Age|First Name). To get a functional overall regex, you need to change this non-optional group 1 (\d*) into an optional group with non-optional contents, thanks to the syntax (\d+)?. Then the search regex becomes: (?-si)^(\d+)?.*=(?(1)Age|First Name)$ This time:
- 
If a line begins with a number, the optional part (\d+)? matches this number, and group 1 is clearly defined and contains this number
- 
But, if a line does not begin with a number, the optional part (\d+)? matches nothing and group 1 is not defined at all!
 You can verify that this final regex finds, as expected, all the lines of our text! Remark: Of course, we could have simply used the regex (?-si)^(\d+=Age|.+=First Name)$, without any conditional block!
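The two searches can be compared in Python, whose re also supports the (?(1)yes|no) conditional. This is a sketch under two assumptions: the (?-si) prefix is left out because Python's defaults (case-sensitive, . not matching line breaks) already give that behaviour, and re.MULTILINE is used so that ^ and $ work per line, as they do in Notepad++. The helper matched_lines is mine.

```python
import re

TEXT = "\n".join([
    "Ted=First Name", "25=Age", "Mary=First Name", "75=Age",
    "Elisabeth=First Name", "47=Age", "Bob=First Name", "62=Age",
])

# (\d*) always takes part, possibly with an empty string, so the
# conditional (?(1)...) always requires "Age".
ALWAYS_SET = re.compile(r"^(\d*).*=(?(1)Age|First Name)$", re.MULTILINE)
# (\d+)? is skipped on name lines, which leaves group 1 unset there.
OPTIONAL = re.compile(r"^(\d+)?.*=(?(1)Age|First Name)$", re.MULTILINE)


def matched_lines(pattern):
    # Every full line the pattern matches in TEXT.
    return [m.group(0) for m in pattern.finditer(TEXT)]


print(matched_lines(ALWAYS_SET))  # only the four "=Age" lines
print(matched_lines(OPTIONAL))    # all eight lines
```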
 This reasoning can be applied to conditional replacements as well! For instance, given this text:

Ted
25
Mary
75
Elisabeth
47
Bob
62

the following regex S/R:
SEARCH (?-s)^(\d+)?.*$
REPLACE (?1Age:First Name) : $&
would give:

First Name : Ted
Age : 25
First Name : Mary
Age : 75
First Name : Elisabeth
Age : 47
First Name : Bob
Age : 62
- 
If a number begins a line, group 1 is defined and the string Age, followed by \x20:\x20, is inserted right before the number
- 
If a number does not begin a line, group 1 is not defined at all. So the string First Name, followed by \x20:\x20, is inserted, this time, right before the first name
 And you'll verify that the similar version with the non-optional group 1 (\d*):
SEARCH (?-s)^(\d*).*$
REPLACE (?1Age:First Name) : $&
gives wrong results, with the string "Age : " ALWAYS inserted :-(( Best Regards, guy038
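The same check for the two conditional replacements, as a Python sketch. Python's re.sub has no (?1yes:no) replacement syntax, so a hypothetical label() function stands in for (?1Age:First Name) : $&, testing whether group 1 took part in the match.

```python
import re

TEXT = "\n".join(["Ted", "25", "Mary", "75", "Elisabeth", "47", "Bob", "62"])


def label(m):
    # Emulates (?1Age:First Name) : $&
    prefix = "Age" if m.group(1) is not None else "First Name"
    return f"{prefix} : {m.group(0)}"


# Optional group with non-optional contents: unset on name lines.
optional = re.sub(r"^(\d+)?.*$", label, TEXT, flags=re.MULTILINE)
# Non-optional group (\d*): always set, so "Age" is always chosen.
always_set = re.sub(r"^(\d*).*$", label, TEXT, flags=re.MULTILINE)

print(optional)
print(always_set)
```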
- 
- 
 Good write-up, @guy038. It's good to know there's a way to have a group conditionally defined, as you showed: "To get a functional overall regex, you need to change this non-optional group 1 (\d*) into an optional group, with a non-optional contents…, thanks to the syntax (\d+)?" At first it seemed like this property of (spec+)? was an anomaly being exploited, or maybe an afterthought by the regex authors, but upon reflection there is some sense to it. In cases where (spec) has no match:
- 
with (spec*) the (little man in the) machine says, "you asked for a capture group containing zero or more matches, so I'm giving you a capture group that contains null text; and a thing that contains something must surely be defined."
- 
with (spec+)? the (little man in the) machine says, "you asked for zero or one capture groups containing matched text, so I give you zero such groups; and a thing of which there are zero (in compsci) has no memory allocated and no address, i.e., is undefined."
 After realizing this, I wondered if some sticky situations might arise using this technique when there's a sequence of these conditionally defined groups (ConDefGrps for short). Consider an expression in which all capture groups (CaptGrps) are also ConDefGrps, and say the first ConDefGrp doesn't match, so its CaptGrp isn't defined, but the second one does. Since this latter one is the first CaptGrp that "comes to life", wouldn't its reference be 1, so that any conditional test on it (whether later in the same expression or in the substitution statement) would actually be testing for the existence of that second-appearing, first-defined CaptGrp? So I set up a test to check this. Consider a scheme in which a valid code consists of zero or more 1's, then 2's, then 3's, in that order, with at least one element present. An expression that only matches lines completely filled by a valid code is: ^(?=\S)([1]+)?([2]+)?([3]+)?$ but that's not so interesting. Here's an F/R pair that always captures the whole line, whether it contains valid codes or not, and then writes it back with information about each group's existence appended:
F: ^([1]+)?([2]+)?([3]+)?.*$
R: $0 - groups (?{1}1:.)(?{2}2:.)(?{3}3:.)
When applied to this test data:

1
2
3

112
1222222223
2233111111
4
4123
12z3
1111111222223333
31
32
1133

we obtain:

1 - groups 1..
2 - groups .2.
3 - groups ..3
 - groups ...
112 - groups 12.
1222222223 - groups 123
2233111111 - groups .23
4 - groups ...
4123 - groups ...
12z3 - groups 12.
1111111222223333 - groups 123
31 - groups ..3
32 - groups ..3
1133 - groups 1.3

What the above demonstrates is that when a ConDefGrp is encountered in an expression, even though it may remain "undefined" (and return False in an existence test), it still consumes a group number allocated in the normal fashion. Thus, one need not worry that including multiple ConDefGrps might lead to ambiguity in the mapping of groups to group numbers.
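Neil's experiment can be replayed in Python as a sketch; the names LINES, FIND, VALID and annotate are mine, and the replacement function stands in for the (?{N}N:.) conditionals. The line 2233111111 is the telling one: group 1 never takes part, yet 22 and 33 still land in groups 2 and 3.

```python
import re

LINES = ["1", "2", "3", "", "112", "1222222223", "2233111111", "4", "4123",
         "12z3", "1111111222223333", "31", "32", "1133"]
DATA = "\n".join(LINES)

# F: field; MULTILINE makes ^ and $ work per line, as in Notepad++.
FIND = re.compile(r"^([1]+)?([2]+)?([3]+)?.*$", re.MULTILINE)
# Lines made only of a valid code (applied to one line at a time).
VALID = re.compile(r"^(?=\S)([1]+)?([2]+)?([3]+)?$")


def annotate(m):
    # Emulates $0 - groups (?{1}1:.)(?{2}2:.)(?{3}3:.)
    flags = "".join(str(n) if m.group(n) is not None else "." for n in (1, 2, 3))
    return f"{m.group(0)} - groups {flags}"


print(FIND.sub(annotate, DATA))
print([line for line in LINES if VALID.match(line)])
```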
- 
- 
 Hi, @neil-schipper and All, To summarize:
- 
With the syntax ^(1+)?•••••, group 1 must contain some 1's. So, if no 1 can be found in the text, group 1 cannot be defined and is simply not used, as it is optional (? quantifier)
- 
With the syntax ^(1*)•••••, group 1 may or may not contain some 1's. So, if no 1 can be found in the text, group 1 is still defined, with empty contents (* quantifier)
- 
With the syntax ^(1)*•••••, group 1 must contain one 1. So, if no 1 can be found in the text, group 1 cannot be defined and is simply not used, as it is optional (* quantifier)

 So, given the text:

000000 |
111111 |
222222 |
333333 |
111222 |
111133 |
223333 |
112233 |

the regex S/R:
SEARCH (?-s)^(1+)?(2+)?(3+)?.+
REPLACE $0 groups (?{1}1:.)(?{2}2:.)(?{3}3:.)
gives:

000000 | groups ...
111111 | groups 1..
222222 | groups .2.
333333 | groups ..3
111222 | groups 12.
111133 | groups 1.3
223333 | groups .23
112233 | groups 123

The regex S/R:
SEARCH (?-s)^(1*)(2*)(3*).+
REPLACE $0 groups (?{1}1:.)(?{2}2:.)(?{3}3:.)
gives:

000000 | groups 123
111111 | groups 123
222222 | groups 123
333333 | groups 123
111222 | groups 123
111133 | groups 123
223333 | groups 123
112233 | groups 123

And the regex S/R:
SEARCH (?-s)^(1)*(2)*(3)*.+
REPLACE $0 groups (?{1}1:.)(?{2}2:.)(?{3}3:.)
gives:

000000 | groups ...
111111 | groups 1..
222222 | groups .2.
333333 | groups ..3
111222 | groups 12.
111133 | groups 1.3
223333 | groups .23
112233 | groups 123

BR guy038
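The three variants side by side in Python, as a sketch: group_flags is a hypothetical helper that returns the same digit-or-dot string the (?{N}N:.) replacement builds. re.match already anchors at the start of each line, and Python, like the Notepad++ results above, reports a repeated group such as (1)* as unset when it iterates zero times.

```python
import re

LINES = ["000000 |", "111111 |", "222222 |", "333333 |",
         "111222 |", "111133 |", "223333 |", "112233 |"]

VARIANTS = {
    "(1+)?": r"^(1+)?(2+)?(3+)?.+",  # optional group, non-optional contents
    "(1*)": r"^(1*)(2*)(3*).+",      # group always takes part, maybe empty
    "(1)*": r"^(1)*(2)*(3)*.+",      # repeated group, unset after 0 iterations
}


def group_flags(pattern, line):
    # Same string as (?{1}1:.)(?{2}2:.)(?{3}3:.): digit if the group took part, "." if not.
    m = re.match(pattern, line)
    return "".join(str(n) if m.group(n) is not None else "." for n in (1, 2, 3))


for name, pattern in VARIANTS.items():
    print(name, [group_flags(pattern, line) for line in LINES])
```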
- 
- 
 So this is a good discussion thread, but the choice to use literal 1, 2, 3 in the examples IMO wasn't the best for utmost clarity. :-)



