Regex: compare lines and find the numbers that differ
-
Hello. I have these lines in my HTML pages (1978 pages):
xxx (22)</a>
yyy (21)</a>
zzz (13)</a>
Somewhere in my files there is a mistake in those final numbers, but I'm not sure where. Some numbers may be different in some pages, but I need them to be the same in all pages.
Can anyone give me an idea how to find the HTML pages that contain a line with numbers other than the ones above?
-
Hello, Vasile Caraus and All,
I'm certainly missing something obvious, because regexes that achieve what you want don't seem that difficult ;-))
For instance, the regex:
(?-s)^.{4}\((?!(22|21|13))..\).+
would match the whole contents of any line that has:
- An opening round parenthesis at position 5
- Then a two-digit number different from 22, 21 and 13
- And, finally, a closing round parenthesis at position 8
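For anyone who wants to try the same check outside the editor, here is a minimal Python sketch, assuming Python's re module rather than the editor's own engine. Python's dot never matches a line break by default, so the (?-s) modifier is simply dropped, and re.MULTILINE lets ^ match at the start of every line. The www (07) line is a made-up example.

```python
import re

# Same idea as the first regex: an opening parenthesis at position 5,
# two characters that are not 22, 21 or 13, then a closing parenthesis.
# Python's "." never matches a newline by default, so (?-s) is dropped;
# re.MULTILINE makes ^ match at the start of each line.
BAD_LINE = re.compile(r"^.{4}\((?!22|21|13)..\).+", re.MULTILINE)

# Hypothetical sample: the three lines from the question plus one bad line.
sample = "xxx (22)</a>\nyyy (21)</a>\nzzz (13)</a>\nwww (07)</a>\n"

for match in BAD_LINE.finditer(sample):
    print(match.group(0))  # prints: www (07)</a>
```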
On the other hand, the regex:
(?-s)^.{4}\((?!(22|21|13))\K..(?=\))
would match only the two-digit number between the parentheses, whenever it is different from 22, 21 and 13.
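Python's re module has no \K, so a sketch of the second regex in Python (again an assumption, outside the editor) can use a capturing group instead and read back only the number. The yyy (99) line is invented for the example.

```python
import re

# Python's re module does not support \K, so a capturing group stands in
# for it: group 1 holds only the two characters inside the parentheses.
BAD_NUMBER = re.compile(r"^.{4}\((?!22|21|13)(..)(?=\))", re.MULTILINE)

# Hypothetical sample with one wrong value.
sample = "xxx (22)</a>\nyyy (99)</a>\nzzz (13)</a>\n"

# With exactly one group in the pattern, findall returns that group's text.
print(BAD_NUMBER.findall(sample))  # ['99']
```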
Best Regards,
guy038
-
Hello Guy, but what would the regex be in a case like this?
Please see this link; Akismet won't let me post the code:
https://regex101.com/r/vRXKWj/3/
So, the order of the numbers must be taken into account: the regex should match only where a number differs from the default number at that position.
-
Hi, Vasile Caraus and All,
I don't understand exactly what you need! Please give a better explanation! So, just one assumption.
Given the text below:
# A BLOCK of 6 lines, with 6 DEFAULT values ( 22, 21, 13, 23, 24 and 15 )
<li><a href="xxx.html" title="xxx">xxx (22)</a></li>
<li><a href="yyy.html" title="yyy">yyy (21)</a></li>
<li><a href="zzz.html" title="zzz">zzz (13)</a></li>
<li><a href="xxx.html" title="xxx">xxx (23)</a></li>
<li><a href="yyy.html" title="yyy">yyy (24)</a></li>
<li><a href="zzz.html" title="zzz">zzz (15)</a></li>
...
...
...
# Then another BLOCK of 6 lines, downwards:
...
<li><a href="ccc.html" title="ccc">ccc (22)</a></li>
<li><a href="ddd.html" title="ddd">ddd (21)</a></li>
<li><a href="eee.html" title="eee">eee (00)</a></li>    <== A
<li><a href="fff.html" title="fff">fff (23)</a></li>
<li><a href="ggg.html" title="ggg">ggg (24)</a></li>
<li><a href="hhh.html" title="hhh">hhh (57)</a></li>    <== B
...
...
...
# And a last BLOCK of 6 lines, downwards:
...
<li><a href="iii.html" title="iii">iii (20)</a></li>    <== C
<li><a href="jjj.html" title="jjj">jjj (21)</a></li>
<li><a href="kkk.html" title="kkk">kkk (13)</a></li>
<li><a href="lll.html" title="lll">lll (33)</a></li>    <== D
<li><a href="mmm.html" title="mmm">mmm (34)</a></li>    <== E
<li><a href="nnn.html" title="nnn">nnn (15)</a></li>
...
...
...
You would like the 5 lines, from A to E, to be matched by the regex engine, because their numbers do not correspond to the default numbers for their respective positions in each block, wouldn't you?
See you later,
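The position-based rule described here can be sketched in Python as follows. The helper name mismatches and the NUMBER pattern are hypothetical; the defaults are the six values of the first block, and the sample is the last block above, so the result should correspond to lines C, D and E.

```python
import re

# Default values, in block order, taken from the first block of the sample.
DEFAULTS = ["22", "21", "13", "23", "24", "15"]

# Hypothetical pattern: captures the number between parentheses before </a>.
NUMBER = re.compile(r"\((\d+)\)</a>")


def mismatches(block_lines, defaults=DEFAULTS):
    """Return (position, found, expected) for each line of a block whose
    number differs from the default value at the same position."""
    result = []
    for pos, (line, expected) in enumerate(zip(block_lines, defaults), start=1):
        m = NUMBER.search(line)
        found = m.group(1) if m else None
        if found != expected:
            result.append((pos, found, expected))
    return result


third_block = [
    '<li><a href="iii.html" title="iii">iii (20)</a></li>',
    '<li><a href="jjj.html" title="jjj">jjj (21)</a></li>',
    '<li><a href="kkk.html" title="kkk">kkk (13)</a></li>',
    '<li><a href="lll.html" title="lll">lll (33)</a></li>',
    '<li><a href="mmm.html" title="mmm">mmm (34)</a></li>',
    '<li><a href="nnn.html" title="nnn">nnn (15)</a></li>',
]
print(mismatches(third_block))
# [(1, '20', '22'), (4, '33', '23'), (5, '34', '24')]
```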
Cheers,
guy038
-
Yes, that's right.
-
Hi, @vasile-caraus,
OK! But what do you expect when the 5 lines, from A to E, are found? Do you want a regex S/R to replace the erroneous values of the second and third blocks with the corresponding default values of the first block of 6 lines?
BR
guy038
-
If the numbers differ, the regex should match only those lines. Otherwise, it should find nothing.
-
Hello, @vasile-caraus, and All,
I've already done numerous tests, but I'm still not satisfied! Let's carry on our discussion.
Assuming the text below:
...
# A BLOCK of 6 lines, with 6 DEFAULT values ( 22, 21, 13, 23, 24 and 15 )
...
<li><a href="xxx.html" title="xxx">xxx (22)</a></li>
<li><a href="yyy.html" title="yyy">yyy (21)</a></li>
<li><a href="zzz.html" title="zzz">zzz (13)</a></li>
<li><a href="xxx.html" title="xxx">xxx (23)</a></li>
<li><a href="yyy.html" title="yyy">yyy (24)</a></li>
<li><a href="zzz.html" title="zzz">zzz (15)</a></li>
...
# Then, another BLOCK of 6 lines, downwards:
...
<li><a href="ccc.html" title="ccc">ccc (22)</a></li>
<li><a href="ddd.html" title="ddd">ddd (21)</a></li>
<li><a href="eee.html" title="eee">eee (00)</a></li>    <== F
<li><a href="fff.html" title="fff">fff (23)</a></li>
<li><a href="ggg.html" title="ggg">ggg (24)</a></li>
<li><a href="hhh.html" title="hhh">hhh (21)</a></li>    <== G
...
Do you want the regex to match the whole line contents in:
- Only case F, where the value is different from all 6 default values, or
- Both cases F and G, where G has the value 21, which corresponds to the second line of the default block above, and not the sixth?
BR
guy038
-
Both F and G.
Something like this, with the default numbers:
<li><a href=*.html" title=.*(?!\b(22|21|13|23|24|25\b).)*
And if F or G is not the same number as my default number for that position, the regex should match that line.
-
Hello, @vasile-caraus, and All,
Unfortunately, I couldn't find an automatic way, because you need both a condition on values and a condition on locations, which would preferably require a Python or Lua script.
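As a rough idea of what such a script could look like, here is a minimal Python sketch. It is not from this thread and rests on assumptions: every run of consecutive lines starting with <li> is made of blocks of 6 that must carry the default values in order, the files are UTF-8, and the function names check_text and scan_folder are hypothetical.

```python
import re
from pathlib import Path

# Assumption: the default values, in block order (first block of the sample).
DEFAULTS = ["22", "21", "13", "23", "24", "15"]
# Captures the number between parentheses just before </a>.
NUMBER = re.compile(r"\((\d+)\)</a>")


def check_text(text, defaults=DEFAULTS):
    """Yield (line_number, found, expected) for every value that differs
    from the default at its position inside a run of <li> lines."""
    pos = 0  # position of the current line inside its run of <li> lines
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.lstrip().startswith("<li>"):
            pos = 0  # any other line ends the current run
            continue
        # Long runs are cut into consecutive blocks of len(defaults) lines.
        expected = defaults[pos % len(defaults)]
        m = NUMBER.search(line)
        if m and m.group(1) != expected:
            yield lineno, m.group(1), expected
        pos += 1


def scan_folder(folder):
    """Print file, line number, found and expected value for each mismatch."""
    for path in sorted(Path(folder).rglob("*.html")):
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, found, expected in check_text(text):
            print(f"{path}:{lineno}: found ({found}), expected ({expected})")
```

Calling scan_folder with the folder holding the pages would then list every HTML file containing an out-of-place number, including a case like G, where 21 sits in the sixth position instead of the second.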
However, below is a possible workaround, which produces correct results!
So, assuming the original sample text below:
This is the CORRECT block of 6 lines, with 6 DEFAULT values ( 22, 21, 13, 23, 24 and 15 )
<li><a href="xxx.html" title="xxx">xxx (22)</a></li>
<li><a href="yyy.html" title="yyy">yyy (21)</a></li>
<li><a href="zzz.html" title="zzz">zzz (13)</a></li>
<li><a href="xxx.html" title="xxx">xxx (23)</a></li>
<li><a href="yyy.html" title="yyy">yyy (24)</a></li>
<li><a href="zzz.html" title="zzz">zzz (15)</a></li>
...
A 2nd BLOCK of 6 lines, downwards:
...
<li><a href="ccc.html" title="ccc">ccc (22)</a></li>
<li><a href="ddd.html" title="ddd">ddd (21)</a></li>
<li><a href="eee.html" title="eee">eee (00)</a></li>
<li><a href="fff.html" title="fff">fff (23)</a></li>
<li><a href="ggg.html" title="ggg">ggg (24)</a></li>
<li><a href="hhh.html" title="hhh">hhh (57)</a></li>
...
A 3rd BLOCK of 6 lines, downwards:
...
<li><a href="iii.html" title="iii">iii (20)</a></li>
<li><a href="jjj.html" title="jjj">jjj (21)</a></li>
<li><a href="kkk.html" title="kkk">kkk (13)</a></li>
<li><a href="lll.html" title="lll">lll (21)</a></li>
<li><a href="mmm.html" title="mmm">mmm (34)</a></li>
<li><a href="nnn.html" title="nnn">nnn (15)</a></li>
...
A 4th BLOCK of 6 lines, downwards:
...
<li><a href="ooo.html" title="ooo">ooo (22)</a></li>
<li><a href="ppp.html" title="ppp">ppp (99)</a></li>
<li><a href="qqq.html" title="qqq">qqq (15)</a></li>
<li><a href="rrr.html" title="rrr">rrr (23)</a></li>
<li><a href="sss.html" title="sss">sss (24)</a></li>
<li><a href="ttt.html" title="ttt">ttt (15)</a></li>
...
A 5th BLOCK of 6 lines, downwards:
...
<li><a href="uuu.html" title="uuu">uuu (07)</a></li>
<li><a href="vvv.html" title="vvv">vvv (13)</a></li>
<li><a href="www.html" title="www">www (21)</a></li>
<li><a href="xxx.html" title="xxx">xxx (15)</a></li>
<li><a href="yyy.html" title="yyy">yyy (23)</a></li>
<li><a href="zzz.html" title="zzz">zzz (15)</a></li>
...
To begin with, I thought of prefixing every line of these 6-line blocks with its corresponding default value, using a regex S/R (I used the # symbol as a separator, which, I hope, doesn't already exist in your file!):
SEARCH
(?-s)^(<li>.+\R)(<li>.+\R)(<li>.+\R)(<li>.+\R)(<li>.+\R)(<li>.+\R)
REPLACE
22#${1}21#${2}13#${3}23#${4}24#${5}15#${6}
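For comparison, the same prefixing step might look like this in Python (a sketch, assuming Python's re module): re has no \R, so \r?\n stands in for it, and the ${1} syntax becomes \g<1>. Like the SEARCH regex above, it only catches a block whose sixth line also ends with a line break.

```python
import re

# One capturing group per <li> line, line break included. Python's re has
# no \R, so \r?\n stands in for it (covers both LF and CRLF endings).
SIX_LINES = re.compile(r"^" + r"(<li>.+\r?\n)" * 6, re.MULTILINE)
# Python writes ${1} as \g<1>; each default value is glued to its line.
PREFIX = r"22#\g<1>21#\g<2>13#\g<3>23#\g<4>24#\g<5>15#\g<6>"

sample = (
    '<li><a href="ccc.html" title="ccc">ccc (22)</a></li>\n'
    '<li><a href="ddd.html" title="ddd">ddd (21)</a></li>\n'
    '<li><a href="eee.html" title="eee">eee (00)</a></li>\n'
    '<li><a href="fff.html" title="fff">fff (23)</a></li>\n'
    '<li><a href="ggg.html" title="ggg">ggg (24)</a></li>\n'
    '<li><a href="hhh.html" title="hhh">hhh (57)</a></li>\n'
)
prefixed = SIX_LINES.sub(PREFIX, sample)
print(prefixed)
```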
So, we get the following text:
This is the CORRECT block of 6 lines, with 6 DEFAULT values ( 22, 21, 13, 23, 24 and 15 )
22#<li><a href="xxx.html" title="xxx">xxx (22)</a></li>
21#<li><a href="yyy.html" title="yyy">yyy (21)</a></li>
13#<li><a href="zzz.html" title="zzz">zzz (13)</a></li>
23#<li><a href="xxx.html" title="xxx">xxx (23)</a></li>
24#<li><a href="yyy.html" title="yyy">yyy (24)</a></li>
15#<li><a href="zzz.html" title="zzz">zzz (15)</a></li>
...
A 2nd BLOCK of 6 lines, downwards:
...
22#<li><a href="ccc.html" title="ccc">ccc (22)</a></li>
21#<li><a href="ddd.html" title="ddd">ddd (21)</a></li>
13#<li><a href="eee.html" title="eee">eee (00)</a></li>
23#<li><a href="fff.html" title="fff">fff (23)</a></li>
24#<li><a href="ggg.html" title="ggg">ggg (24)</a></li>
15#<li><a href="hhh.html" title="hhh">hhh (57)</a></li>
...
A 3rd BLOCK of 6 lines, downwards:
...
22#<li><a href="iii.html" title="iii">iii (20)</a></li>
21#<li><a href="jjj.html" title="jjj">jjj (21)</a></li>
13#<li><a href="kkk.html" title="kkk">kkk (13)</a></li>
23#<li><a href="lll.html" title="lll">lll (21)</a></li>
24#<li><a href="mmm.html" title="mmm">mmm (34)</a></li>
15#<li><a href="nnn.html" title="nnn">nnn (15)</a></li>
...
A 4th BLOCK of 6 lines, downwards:
...
22#<li><a href="ooo.html" title="ooo">ooo (22)</a></li>
21#<li><a href="ppp.html" title="ppp">ppp (99)</a></li>
13#<li><a href="qqq.html" title="qqq">qqq (15)</a></li>
23#<li><a href="rrr.html" title="rrr">rrr (23)</a></li>
24#<li><a href="sss.html" title="sss">sss (24)</a></li>
15#<li><a href="ttt.html" title="ttt">ttt (15)</a></li>
...
A 5th BLOCK of 6 lines, downwards:
...
22#<li><a href="uuu.html" title="uuu">uuu (07)</a></li>
21#<li><a href="vvv.html" title="vvv">vvv (13)</a></li>
13#<li><a href="www.html" title="www">www (21)</a></li>
23#<li><a href="xxx.html" title="xxx">xxx (15)</a></li>
24#<li><a href="yyy.html" title="yyy">yyy (23)</a></li>
15#<li><a href="zzz.html" title="zzz">zzz (15)</a></li>
...
Now, it's obvious that the simple regex:
^(.+)#(?!.+\(\1\)).+
will match any line whose number between parentheses differs from the number at the beginning of that line, before the # separator!
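The same detection, sketched in Python with the prefixed lines of the 2nd block as input: Python's re also supports a backreference such as \1 inside a negative lookahead. Only the eee (00) and hhh (57) lines should be printed.

```python
import re

# (.+) captures the default value before "#"; the negative lookahead then
# rejects the line if "(<that value>)" appears anywhere after the separator.
WRONG = re.compile(r"^(.+)#(?!.+\(\1\)).+", re.MULTILINE)

prefixed = (
    '22#<li><a href="ccc.html" title="ccc">ccc (22)</a></li>\n'
    '21#<li><a href="ddd.html" title="ddd">ddd (21)</a></li>\n'
    '13#<li><a href="eee.html" title="eee">eee (00)</a></li>\n'
    '23#<li><a href="fff.html" title="fff">fff (23)</a></li>\n'
    '24#<li><a href="ggg.html" title="ggg">ggg (24)</a></li>\n'
    '15#<li><a href="hhh.html" title="hhh">hhh (57)</a></li>\n'
)
for m in WRONG.finditer(prefixed):
    print(m.group(0))
```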
If you prefer to replace all the erroneous values with the right ones, you may use the following regex S/R:
SEARCH
^(.+)#(?!.+\(\1\))(.+\().+(\).+)|^.+#
REPLACE
\2\1\3
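In Python, the same S/R could be sketched as below, assuming Python 3.5 or later, where groups that did not take part in the match are replaced by an empty string. The second alternative relies on exactly that to strip the prefix from correct lines.

```python
import re

# Alternative 1 rebuilds a wrong line as: group 2 (text up to "("),
# group 1 (the default value), group 3 (")" and the rest of the line).
# Alternative 2 only strips the "NN#" prefix from lines that were correct;
# its unmatched groups 1-3 are replaced by "" (Python 3.5+).
FIX = re.compile(r"^(.+)#(?!.+\(\1\))(.+\().+(\).+)|^.+#", re.MULTILINE)

prefixed = (
    '22#<li><a href="ccc.html" title="ccc">ccc (22)</a></li>\n'
    '13#<li><a href="eee.html" title="eee">eee (00)</a></li>\n'
    '15#<li><a href="hhh.html" title="hhh">hhh (57)</a></li>\n'
)
fixed = FIX.sub(r"\2\1\3", prefixed)
print(fixed)
```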
And, of course, you'll get all the blocks with identical default values between parentheses:
This is the CORRECT block of 6 lines, with 6 DEFAULT values ( 22, 21, 13, 23, 24 and 15 )
<li><a href="xxx.html" title="xxx">xxx (22)</a></li>
<li><a href="yyy.html" title="yyy">yyy (21)</a></li>
<li><a href="zzz.html" title="zzz">zzz (13)</a></li>
<li><a href="xxx.html" title="xxx">xxx (23)</a></li>
<li><a href="yyy.html" title="yyy">yyy (24)</a></li>
<li><a href="zzz.html" title="zzz">zzz (15)</a></li>
...
A 2nd BLOCK of 6 lines, downwards:
...
<li><a href="ccc.html" title="ccc">ccc (22)</a></li>
<li><a href="ddd.html" title="ddd">ddd (21)</a></li>
<li><a href="eee.html" title="eee">eee (13)</a></li>
<li><a href="fff.html" title="fff">fff (23)</a></li>
<li><a href="ggg.html" title="ggg">ggg (24)</a></li>
<li><a href="hhh.html" title="hhh">hhh (15)</a></li>
...
A 3rd BLOCK of 6 lines, downwards:
...
<li><a href="iii.html" title="iii">iii (22)</a></li>
<li><a href="jjj.html" title="jjj">jjj (21)</a></li>
<li><a href="kkk.html" title="kkk">kkk (13)</a></li>
<li><a href="lll.html" title="lll">lll (23)</a></li>
<li><a href="mmm.html" title="mmm">mmm (24)</a></li>
<li><a href="nnn.html" title="nnn">nnn (15)</a></li>
...
A 4th BLOCK of 6 lines, downwards:
...
<li><a href="ooo.html" title="ooo">ooo (22)</a></li>
<li><a href="ppp.html" title="ppp">ppp (21)</a></li>
<li><a href="qqq.html" title="qqq">qqq (13)</a></li>
<li><a href="rrr.html" title="rrr">rrr (23)</a></li>
<li><a href="sss.html" title="sss">sss (24)</a></li>
<li><a href="ttt.html" title="ttt">ttt (15)</a></li>
...
A 5th BLOCK of 6 lines, downwards:
...
<li><a href="uuu.html" title="uuu">uuu (22)</a></li>
<li><a href="vvv.html" title="vvv">vvv (21)</a></li>
<li><a href="www.html" title="www">www (13)</a></li>
<li><a href="xxx.html" title="xxx">xxx (23)</a></li>
<li><a href="yyy.html" title="yyy">yyy (24)</a></li>
<li><a href="zzz.html" title="zzz">zzz (15)</a></li>
...
Cheers,
guy038
-
Thanks Guy, your solution works, but it's complex. I just found another solution:
<li><a href=".*\.html" title=".*">.* (?:(?!\b(22|9|15|23|4|15)\b).)*<\/a><\/li>$
Check this out: https://regex101.com/r/vRXKWj/4/
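As far as a quick test shows, a regex like this checks only whether a number belongs to the list, not whether it sits at the right position. The Python sketch below is an assumption in two ways: it swaps the thread's default values (22, 21, 13, 23, 24 and 15) in for the list in the regex above, and it adds a hypothetical position-by-position check for comparison. On the F/G block, the membership test flags only the eee (00) line (F), while the positional check also flags hhh (21) (G).

```python
import re

# Membership-only test in the spirit of the regex above, using the thread's
# default values (an assumption; the posted regex lists 22|9|15|23|4|15).
ANY_DEFAULT = re.compile(
    r'<li><a href=".*\.html" title=".*">.* (?:(?!\b(22|21|13|23|24|15)\b).)*<\/a><\/li>$'
)

block = [
    '<li><a href="ccc.html" title="ccc">ccc (22)</a></li>',
    '<li><a href="ddd.html" title="ddd">ddd (21)</a></li>',
    '<li><a href="eee.html" title="eee">eee (00)</a></li>',  # case F
    '<li><a href="fff.html" title="fff">fff (23)</a></li>',
    '<li><a href="ggg.html" title="ggg">ggg (24)</a></li>',
    '<li><a href="hhh.html" title="hhh">hhh (21)</a></li>',  # case G
]
flagged = [i for i, line in enumerate(block, start=1) if ANY_DEFAULT.search(line)]
print(flagged)  # [3]

# Hypothetical positional check: compare each number with the default
# value expected at the same position in the block.
DEFAULTS = ["22", "21", "13", "23", "24", "15"]
NUMBER = re.compile(r"\((\d+)\)</a>")
by_position = [
    i for i, (line, expected) in enumerate(zip(block, DEFAULTS), start=1)
    if NUMBER.search(line).group(1) != expected
]
print(by_position)  # [3, 6]
```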