regex: compare lines and find out different numbers
-
Hello. I have these lines in my HTML pages (1978 pages):
xxx (22)</a>
yyy (21)</a>
zzz (13)</a>
Somewhere in my files there is a mistake in those final numbers, but I’m not sure where. The numbers may be different on some pages, but I need them to be the same on all pages.
Can anyone give me an idea of how to find the HTML pages that contain a line with a number other than the ones above?
-
Hello, Vasile Caraus and All,
I am certainly missing something obvious, because the regexes needed to achieve what you want do not seem that difficult ;-))
For instance, the regex :
(?-s)^.{4}\((?!(22|21|13))..\).+
would match all the contents of any line which has :
- An opening round parenthesis, at position 5
- Then a two-digit number, different from 22, 21 and 13
- And, finally, a closing round parenthesis, at position 8
On the other hand, the regex :
(?-s)^.{4}\((?!(22|21|13))\K..(?=\))
would match any two-digit number, enclosed in parentheses, which is different from the values 22, 21 and 13
Best Regards,
guy038
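For anyone who wants to try the lookahead idea from the first regex outside Notepad++, here is a minimal Python sketch. It assumes the same layout as the three sample lines (the opening parenthesis at position 5). Python's `re` module has no `\K`, so the number is captured in a group instead; the function name `odd_numbers` is only illustrative.

```python
import re

# "(" at position 5, then two characters that are not 22, 21 or 13, then ")".
# The two characters are captured, because Python's re does not support \K.
ODD_NUMBER = re.compile(r"^.{4}\((?!(?:22|21|13))(..)\)")

def odd_numbers(text):
    """Return (line_number, number) for every line with an unexpected number."""
    hits = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        match = ODD_NUMBER.match(line)
        if match:
            hits.append((line_number, match.group(1)))
    return hits

sample = "xxx (22)</a>\nyyy (99)</a>\nzzz (13)</a>"
print(odd_numbers(sample))  # [(2, '99')]
```

Like the regex it mirrors, this only checks that a number belongs to the allowed set, not that it sits on the right line.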
-
-
Hello Guy. What would the formula be if I have something like this?
Please see this link, because Akismet does not let me post the code:
https://regex101.com/r/vRXKWj/3/
So the order of the numbers must be taken into account, and the regex should match only where a number does not match the default number.
-
Hi, Vasile Caraus and All,
I don’t understand exactly what you need ! Please provide a better explanation ! So, just one assumption :
Given the text, below :
# A BLOCK of 6 lines, with 6 DEFAULT values ( 22, 21, 13, 23, 24 and 15 )
<li><a href="xxx.html" title="xxx">xxx (22)</a></li>
<li><a href="yyy.html" title="yyy">yyy (21)</a></li>
<li><a href="zzz.html" title="zzz">zzz (13)</a></li>
<li><a href="xxx.html" title="xxx">xxx (23)</a></li>
<li><a href="yyy.html" title="yyy">yyy (24)</a></li>
<li><a href="zzz.html" title="zzz">zzz (15)</a></li>
...
...
...
# Then another BLOCK of 6 lines, downwards :
...
<li><a href="ccc.html" title="ccc">ccc (22)</a></li>
<li><a href="ddd.html" title="ddd">ddd (21)</a></li>
<li><a href="eee.html" title="eee">eee (00)</a></li>   <== A
<li><a href="fff.html" title="fff">fff (23)</a></li>
<li><a href="ggg.html" title="ggg">ggg (24)</a></li>
<li><a href="hhh.html" title="hhh">hhh (57)</a></li>   <== B
...
...
...
# And a last BLOCK of 6 lines, downwards :
...
<li><a href="iii.html" title="iii">iii (20)</a></li>   <== C
<li><a href="jjj.html" title="jjj">jjj (21)</a></li>
<li><a href="kkk.html" title="kkk">kkk (13)</a></li>
<li><a href="lll.html" title="lll">lll (33)</a></li>   <== D
<li><a href="mmm.html" title="mmm">mmm (34)</a></li>   <== E
<li><a href="nnn.html" title="nnn">nnn (15)</a></li>
...
...
...
You would like the 5 lines, from A to E, to be matched by the regex engine, because their numbers do not correspond to the default numbers at the same location in the first block ! Wouldn’t you ?
See you later
Cheers,
guy038
-
Yes, that’s right.
-
Hi, @vasile-caraus,
OK ! But what do you expect when the 5 lines, from A to E, are found ?
Do you want a regex S/R to replace the erroneous values of the second and third blocks with the corresponding default values of the first block of 6 lines ?
BR
guy038
-
If the numbers are different, the regex should match only those lines. Otherwise, there is nothing to find.
-
Hello, @vasile-caraus, and All,
I already did numerous tests, but I’m still not satisfied ! Let’s carry on our discussion :
Assuming the text, below :
....
# A BLOCK of 6 lines, with 6 DEFAULT values ( 22, 21, 13, 23, 24 and 15 )
...
<li><a href="xxx.html" title="xxx">xxx (22)</a></li>
<li><a href="yyy.html" title="yyy">yyy (21)</a></li>
<li><a href="zzz.html" title="zzz">zzz (13)</a></li>
<li><a href="xxx.html" title="xxx">xxx (23)</a></li>
<li><a href="yyy.html" title="yyy">yyy (24)</a></li>
<li><a href="zzz.html" title="zzz">zzz (15)</a></li>
...
# Then, another BLOCK of 6 lines, downwards :
...
<li><a href="ccc.html" title="ccc">ccc (22)</a></li>
<li><a href="ddd.html" title="ddd">ddd (21)</a></li>
<li><a href="eee.html" title="eee">eee (00)</a></li>   <== F
<li><a href="fff.html" title="fff">fff (23)</a></li>
<li><a href="ggg.html" title="ggg">ggg (24)</a></li>
<li><a href="hhh.html" title="hhh">hhh (21)</a></li>   <== G
...
Do you want the regex to match all the line contents in :
- Only case F, where the value is different from any of the 6 default values
- Both cases F and G, where G has the value 21, which corresponds to the second line of the default block above, and not the sixth !
BR
guy038
-
-
Both, F and G.
Something like this, with the default numbers:
<li><a href=*.html" title=.*(?!\b(22|21|13|23|24|25\b).)*
And if the number on F or G is not the same as my default number, the regex should match that line.
-
Hello, @vasile-caraus, and All;
Unfortunately, I could not find an automatic way, because you need both a condition on values and a condition on locations, which would preferably need a Python or Lua script.
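To give an idea of what such a script could look like, here is a minimal Python sketch. It is only an illustration under stated assumptions: each block is taken to be a run of consecutive `<li>` lines, any other line ends the block, and the default values are the six from the sample. The names `DEFAULTS`, `ITEM` and `mismatches` are made up for this example.

```python
import re

# Expected value for each position inside a block (taken from the sample text).
DEFAULTS = ["22", "21", "13", "23", "24", "15"]

# A list item ending with a number between parentheses.
ITEM = re.compile(r"^<li>.*\((\d+)\)</a></li>")

def mismatches(text, defaults=DEFAULTS):
    """Return (line_number, found, expected) for each number that differs
    from the default value at the same position in its block."""
    hits = []
    position = 0  # index inside the current run of consecutive <li> lines
    for line_number, line in enumerate(text.splitlines(), start=1):
        match = ITEM.match(line.strip())
        if not match:
            position = 0  # assumption: any other line ends the block
            continue
        expected = defaults[position % len(defaults)]
        if match.group(1) != expected:
            hits.append((line_number, match.group(1), expected))
        position += 1
    return hits

sample = """\
<li><a href="ccc.html" title="ccc">ccc (22)</a></li>
<li><a href="ddd.html" title="ddd">ddd (21)</a></li>
<li><a href="eee.html" title="eee">eee (00)</a></li>
<li><a href="fff.html" title="fff">fff (23)</a></li>
<li><a href="ggg.html" title="ggg">ggg (24)</a></li>
<li><a href="hhh.html" title="hhh">hhh (21)</a></li>
"""
print(mismatches(sample))  # [(3, '00', '13'), (6, '21', '15')]
```

Both the value that appears nowhere in the defaults (00) and the value that is valid but in the wrong place (21 on the sixth line) are reported. To check many pages, the same function could be applied to the text of each file, for example read with `pathlib.Path.read_text`.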
However, here is, below, a possible work-around, which produces correct results !
So, assuming the original sample text, below :
This is the CORRECT block of 6 lines, with 6 DEFAULT values ( 22, 21, 13, 23, 24 and 15 )
<li><a href="xxx.html" title="xxx">xxx (22)</a></li>
<li><a href="yyy.html" title="yyy">yyy (21)</a></li>
<li><a href="zzz.html" title="zzz">zzz (13)</a></li>
<li><a href="xxx.html" title="xxx">xxx (23)</a></li>
<li><a href="yyy.html" title="yyy">yyy (24)</a></li>
<li><a href="zzz.html" title="zzz">zzz (15)</a></li>
...
A 2nd BLOCK of 6 lines, downwards :
...
<li><a href="ccc.html" title="ccc">ccc (22)</a></li>
<li><a href="ddd.html" title="ddd">ddd (21)</a></li>
<li><a href="eee.html" title="eee">eee (00)</a></li>
<li><a href="fff.html" title="fff">fff (23)</a></li>
<li><a href="ggg.html" title="ggg">ggg (24)</a></li>
<li><a href="hhh.html" title="hhh">hhh (57)</a></li>
...
A 3rd BLOCK of 6 lines, downwards :
...
<li><a href="iii.html" title="iii">iii (20)</a></li>
<li><a href="jjj.html" title="jjj">jjj (21)</a></li>
<li><a href="kkk.html" title="kkk">kkk (13)</a></li>
<li><a href="lll.html" title="lll">lll (21)</a></li>
<li><a href="mmm.html" title="mmm">mmm (34)</a></li>
<li><a href="nnn.html" title="nnn">nnn (15)</a></li>
...
A 4th BLOCK of 6 lines, downwards :
...
<li><a href="ooo.html" title="ooo">ooo (22)</a></li>
<li><a href="ppp.html" title="ppp">ppp (99)</a></li>
<li><a href="qqq.html" title="qqq">qqq (15)</a></li>
<li><a href="rrr.html" title="rrr">rrr (23)</a></li>
<li><a href="sss.html" title="sss">sss (24)</a></li>
<li><a href="ttt.html" title="ttt">ttt (15)</a></li>
...
A 5th BLOCK of 6 lines, downwards :
...
<li><a href="uuu.html" title="uuu">uuu (07)</a></li>
<li><a href="vvv.html" title="vvv">vvv (13)</a></li>
<li><a href="www.html" title="www">www (21)</a></li>
<li><a href="xxx.html" title="xxx">xxx (15)</a></li>
<li><a href="yyy.html" title="yyy">yyy (23)</a></li>
<li><a href="zzz.html" title="zzz">zzz (15)</a></li>
...
I thought, to begin with, of prefixing each line of these 6-line blocks with its corresponding default value, with a regex S/R ( I used the # symbol as a separator which, I hope, does not already exist in your file ! )
SEARCH
(?-s)^(<li>.+\R)(<li>.+\R)(<li>.+\R)(<li>.+\R)(<li>.+\R)(<li>.+\R)
REPLACE
22#${1}21#${2}13#${3}23#${4}24#${5}15#${6}
So, we get the following text :
This is the CORRECT block of 6 lines, with 6 DEFAULT values ( 22, 21, 13, 23, 24 and 15 )
22#<li><a href="xxx.html" title="xxx">xxx (22)</a></li>
21#<li><a href="yyy.html" title="yyy">yyy (21)</a></li>
13#<li><a href="zzz.html" title="zzz">zzz (13)</a></li>
23#<li><a href="xxx.html" title="xxx">xxx (23)</a></li>
24#<li><a href="yyy.html" title="yyy">yyy (24)</a></li>
15#<li><a href="zzz.html" title="zzz">zzz (15)</a></li>
...
A 2nd BLOCK of 6 lines, downwards :
...
22#<li><a href="ccc.html" title="ccc">ccc (22)</a></li>
21#<li><a href="ddd.html" title="ddd">ddd (21)</a></li>
13#<li><a href="eee.html" title="eee">eee (00)</a></li>
23#<li><a href="fff.html" title="fff">fff (23)</a></li>
24#<li><a href="ggg.html" title="ggg">ggg (24)</a></li>
15#<li><a href="hhh.html" title="hhh">hhh (57)</a></li>
...
A 3rd BLOCK of 6 lines, downwards :
...
22#<li><a href="iii.html" title="iii">iii (20)</a></li>
21#<li><a href="jjj.html" title="jjj">jjj (21)</a></li>
13#<li><a href="kkk.html" title="kkk">kkk (13)</a></li>
23#<li><a href="lll.html" title="lll">lll (21)</a></li>
24#<li><a href="mmm.html" title="mmm">mmm (34)</a></li>
15#<li><a href="nnn.html" title="nnn">nnn (15)</a></li>
...
A 4th BLOCK of 6 lines, downwards :
...
22#<li><a href="ooo.html" title="ooo">ooo (22)</a></li>
21#<li><a href="ppp.html" title="ppp">ppp (99)</a></li>
13#<li><a href="qqq.html" title="qqq">qqq (15)</a></li>
23#<li><a href="rrr.html" title="rrr">rrr (23)</a></li>
24#<li><a href="sss.html" title="sss">sss (24)</a></li>
15#<li><a href="ttt.html" title="ttt">ttt (15)</a></li>
...
A 5th BLOCK of 6 lines, downwards :
...
22#<li><a href="uuu.html" title="uuu">uuu (07)</a></li>
21#<li><a href="vvv.html" title="vvv">vvv (13)</a></li>
13#<li><a href="www.html" title="www">www (21)</a></li>
23#<li><a href="xxx.html" title="xxx">xxx (15)</a></li>
24#<li><a href="yyy.html" title="yyy">yyy (23)</a></li>
15#<li><a href="zzz.html" title="zzz">zzz (15)</a></li>
...
Now, it’s obvious that the simple regex
^(.+)#(?!.+\(\1\)).+
will match any line with a number, between parentheses, different from the number at the beginning of the current line, located before the # separator !
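The prefix-then-compare trick can also be checked quickly in Python, whose `re` module accepts the same back-reference inside a negative lookahead. The sketch below is only an illustration: the helper `prefix` and the list `DEFAULTS` are assumptions made for this example, and the data is the 4th block of the sample.

```python
import re

DEFAULTS = ["22", "21", "13", "23", "24", "15"]

# Same regex as above: the value before '#' must reappear between parentheses,
# otherwise the whole line matches.
WRONG = re.compile(r"^(.+)#(?!.+\(\1\)).+")

def prefix(lines, defaults=DEFAULTS):
    """Prefix each line of a 6-line block with its default value and '#'."""
    return [f"{default}#{line}" for default, line in zip(defaults, lines)]

block = [
    '<li><a href="ooo.html" title="ooo">ooo (22)</a></li>',
    '<li><a href="ppp.html" title="ppp">ppp (99)</a></li>',
    '<li><a href="qqq.html" title="qqq">qqq (15)</a></li>',
    '<li><a href="rrr.html" title="rrr">rrr (23)</a></li>',
    '<li><a href="sss.html" title="sss">sss (24)</a></li>',
    '<li><a href="ttt.html" title="ttt">ttt (15)</a></li>',
]

flagged = [line for line in prefix(block) if WRONG.match(line)]
for line in flagged:
    print(line)
# 21#<li><a href="ppp.html" title="ppp">ppp (99)</a></li>
# 13#<li><a href="qqq.html" title="qqq">qqq (15)</a></li>
```

Note that the line with 15 in third position is flagged even though 15 is one of the default values, which is exactly the condition on locations that a plain alternation cannot express.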
If you prefer to replace all the erroneous values with the right ones, you may use the following regex S/R
SEARCH
^(.+)#(?!.+\(\1\))(.+\().+(\).+)|^.+#
REPLACE
\2\1\3
And, of course, you’ll get the different blocks, with the identical default values between parentheses :
This is the CORRECT block of 6 lines, with 6 DEFAULT values ( 22, 21, 13, 23, 24 and 15 )
<li><a href="xxx.html" title="xxx">xxx (22)</a></li>
<li><a href="yyy.html" title="yyy">yyy (21)</a></li>
<li><a href="zzz.html" title="zzz">zzz (13)</a></li>
<li><a href="xxx.html" title="xxx">xxx (23)</a></li>
<li><a href="yyy.html" title="yyy">yyy (24)</a></li>
<li><a href="zzz.html" title="zzz">zzz (15)</a></li>
...
A 2nd BLOCK of 6 lines, downwards :
...
<li><a href="ccc.html" title="ccc">ccc (22)</a></li>
<li><a href="ddd.html" title="ddd">ddd (21)</a></li>
<li><a href="eee.html" title="eee">eee (13)</a></li>
<li><a href="fff.html" title="fff">fff (23)</a></li>
<li><a href="ggg.html" title="ggg">ggg (24)</a></li>
<li><a href="hhh.html" title="hhh">hhh (15)</a></li>
...
A 3rd BLOCK of 6 lines, downwards :
...
<li><a href="iii.html" title="iii">iii (22)</a></li>
<li><a href="jjj.html" title="jjj">jjj (21)</a></li>
<li><a href="kkk.html" title="kkk">kkk (13)</a></li>
<li><a href="lll.html" title="lll">lll (23)</a></li>
<li><a href="mmm.html" title="mmm">mmm (24)</a></li>
<li><a href="nnn.html" title="nnn">nnn (15)</a></li>
...
A 4th BLOCK of 6 lines, downwards :
...
<li><a href="ooo.html" title="ooo">ooo (22)</a></li>
<li><a href="ppp.html" title="ppp">ppp (21)</a></li>
<li><a href="qqq.html" title="qqq">qqq (13)</a></li>
<li><a href="rrr.html" title="rrr">rrr (23)</a></li>
<li><a href="sss.html" title="sss">sss (24)</a></li>
<li><a href="ttt.html" title="ttt">ttt (15)</a></li>
...
A 5th BLOCK of 6 lines, downwards :
...
<li><a href="uuu.html" title="uuu">uuu (22)</a></li>
<li><a href="vvv.html" title="vvv">vvv (21)</a></li>
<li><a href="www.html" title="www">www (13)</a></li>
<li><a href="xxx.html" title="xxx">xxx (23)</a></li>
<li><a href="yyy.html" title="yyy">yyy (24)</a></li>
<li><a href="zzz.html" title="zzz">zzz (15)</a></li>
...
Cheers,
guy038
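As a side note, the correcting S/R above seems to carry over to Python almost unchanged. The sketch below assumes Python 3.5 or later, where a group that did not take part in the match is replaced by an empty string in `re.sub`, and it adds `re.MULTILINE` so that `^` matches at the start of every line. The three input lines are the first half of the prefixed 4th block.

```python
import re

# First alternative: a wrong value, rebuilt as "text before" + default + "text after".
# Second alternative: a correct line, where only the "NN#" prefix is removed.
FIX = re.compile(r"^(.+)#(?!.+\(\1\))(.+\().+(\).+)|^.+#", re.MULTILINE)

prefixed = """\
22#<li><a href="ooo.html" title="ooo">ooo (22)</a></li>
21#<li><a href="ppp.html" title="ppp">ppp (99)</a></li>
13#<li><a href="qqq.html" title="qqq">qqq (15)</a></li>"""

fixed = FIX.sub(r"\2\1\3", prefixed)
print(fixed)
# <li><a href="ooo.html" title="ooo">ooo (22)</a></li>
# <li><a href="ppp.html" title="ppp">ppp (21)</a></li>
# <li><a href="qqq.html" title="qqq">qqq (13)</a></li>
```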
-
Thanks Guy, your solution is OK, but complex. I just found another solution:
<li><a href=".*\.html" title=".*">.* (?:(?!\b(22|9|15|23|4|15)\b).)*<\/a><\/li>$
Check this out: https://regex101.com/r/vRXKWj/4/