    regex: compare lines and find out different numbers

    • Vasile Caraus

      Hello. I have these lines in my HTML pages (1978 pages):

      xxx (22)</a>
      yyy (21)</a>
      zzz (13)</a>

      Somewhere in my files there is a mistake in those final numbers, but I’m not sure where. Some numbers may be different on some pages, but I need them to be the same on all pages.

      Can anyone give me an idea how to find the HTML pages that contain a line with a number other than the ones above?

      • guy038

        Hello, Vasile Caraus and All,

        I am certainly missing something obvious, because the regexes to achieve what you want do not seem that difficult ;-))

        For instance, the regex :

        (?-s)^.{4}\((?!(22|21|13))..\).+ would match all the contents of any line which has:

        • An opening round parenthesis, at position 5

        • Then a two-digit number, different from 22, 21 and 13

        • And, finally, a closing round parenthesis, at position 8


        On the other hand, the regex :

        (?-s)^.{4}\((?!(22|21|13))\K..(?=\)) would match any two-digit number, enclosed in parentheses, which is different from the values 22, 21 and 13
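A minimal Python sketch of the same negative-lookahead idea, assuming the lines look exactly like the short sample above (three characters, a space, then the number in parentheses). Python's `re` module has no `\K`, so a capture group stands in for it here; `ALLOWED`, `PATTERN` and `odd_numbers` are names made up for this illustration.

```python
import re

# Assumption: the opening parenthesis sits at column 5, as in the sample lines.
ALLOWED = ("22", "21", "13")

# Same idea as the regex above, with a capture group instead of \K.
PATTERN = re.compile(
    r"^.{4}\((?!(?:%s))(..)\)" % "|".join(ALLOWED),
    re.MULTILINE,
)

def odd_numbers(text):
    """Return the two-character values in parentheses that are not allowed."""
    return PATTERN.findall(text)

sample = "xxx (22)</a>\nyyy (21)</a>\nzzz (13)</a>\nzzz (14)</a>\n"
print(odd_numbers(sample))
```

Only the last sample line carries a value outside the allowed set, so it is the only one reported.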

        Best Regards,

        guy038

        • Vasile Caraus

          Hello Guy, but what would the formula be if I have something like this?

          Please see this link; Akismet does not let me post the code:

          https://regex101.com/r/vRXKWj/3/

          So, the order of the numbers must be taken into account, and the regex should match only where a number does not match the default number.

          • guy038

            Hi, Vasile Caraus and All,

            I don’t understand exactly what you need! Please provide a better explanation! So, just one assumption:

            Given the text, below :

            
            # A BLOCK of 6 lines, with 6 DEFAULT values ( 22, 21, 13, 23, 24 and 15 )
            
            <li><a href="xxx.html" title="xxx">xxx (22)</a></li>
            <li><a href="yyy.html" title="yyy">yyy (21)</a></li>
            <li><a href="zzz.html" title="zzz">zzz (13)</a></li>
            <li><a href="xxx.html" title="xxx">xxx (23)</a></li>
            <li><a href="yyy.html" title="yyy">yyy (24)</a></li>
            <li><a href="zzz.html" title="zzz">zzz (15)</a></li>
            ...
            ...
            ...
            # Then an other BLOCK of 6 lines, downwards :
            ...
            <li><a href="ccc.html" title="ccc">ccc (22)</a></li>
            <li><a href="ddd.html" title="ddd">ddd (21)</a></li>
            <li><a href="eee.html" title="eee">eee (00)</a></li>  <==  A
            <li><a href="fff.html" title="fff">fff (23)</a></li>
            <li><a href="ggg.html" title="ggg">ggg (24)</a></li>
            <li><a href="hhh.html" title="hhh">hhh (57)</a></li>  <==  B
            ...
            ...
            ...
            # And a last BLOCK of 6 lines, downwards :
            ...
            <li><a href="iii.html" title="iii">iii (20)</a></li>  <==  C
            <li><a href="jjj.html" title="jjj">jjj (21)</a></li>
            <li><a href="kkk.html" title="kkk">kkk (13)</a></li>
            <li><a href="lll.html" title="lll">lll (33)</a></li>  <==  D
            <li><a href="mmm.html" title="mmm">mmm (34)</a></li>  <==  E
            <li><a href="nnn.html" title="nnn">nnn (15)</a></li>
            ...
            ...
            ...
            

            You would like the 5 lines, from A to E, to be matched by the regex engine, because their numbers do not correspond to the default numbers at the same position in the first block! Wouldn’t you?

            See you later

            Cheers,

            guy038

            • Vasile Caraus

              yes. That’s right

              • guy038

                Hi, @vasile-caraus,

                OK! But what do you expect once the 5 lines, from A to E, are found?

                Do you want a regex S/R to replace the erroneous values of the second and third blocks with the corresponding default values of the first block of 6 lines?

                BR

                guy038

                • Vasile Caraus

                  If the numbers are different, the regex should match only those lines. Otherwise, there is nothing to find.

                  • guy038

                    Hello, @vasile-caraus, and All,

                    I already did numerous tests, but I’m still not satisfied ! Let’s carry on our discussion :

                    Assuming the text, below :

                    ....
                    
                    # A BLOCK of 6 lines, with 6 DEFAULT values ( 22, 21, 13, 23, 24 and 15 )
                    ...
                    <li><a href="xxx.html" title="xxx">xxx (22)</a></li>
                    <li><a href="yyy.html" title="yyy">yyy (21)</a></li>
                    <li><a href="zzz.html" title="zzz">zzz (13)</a></li>
                    <li><a href="xxx.html" title="xxx">xxx (23)</a></li>
                    <li><a href="yyy.html" title="yyy">yyy (24)</a></li>
                    <li><a href="zzz.html" title="zzz">zzz (15)</a></li>
                    ...
                    # Then, an other BLOCK of 6 lines, downwards :
                    ...
                    <li><a href="ccc.html" title="ccc">ccc (22)</a></li>
                    <li><a href="ddd.html" title="ddd">ddd (21)</a></li>
                    <li><a href="eee.html" title="eee">eee (00)</a></li>  <==  F
                    <li><a href="fff.html" title="fff">fff (23)</a></li>
                    <li><a href="ggg.html" title="ggg">ggg (24)</a></li>
                    <li><a href="hhh.html" title="hhh">hhh (21)</a></li>  <==  G
                    ...
                    

                    Do you want the regex to match the whole line contents in:

                    • Only case F, where the value is different from all 6 default values

                    • Both cases F and G, where G has the value 21, which corresponds to the second line of the default block above, and not the sixth?

                    BR

                    guy038

                    • Vasile Caraus

                      Both, F and G.

                      Something like this, with the default numbers: <li><a href=*.html" title=.*(?!\b(22|21|13|23|24|25\b).)* And if F or G does not have the same number as my default number at that position, the regex should match that line.

                      • guy038

                        Hello, @vasile-caraus, and All;

                        Unfortunately, I could not find an automatic way, because you need both a condition on values and a condition on locations, which would preferably need a Python or Lua script.
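For what it is worth, a script of that kind could look roughly like the following Python sketch. It assumes that the `<li>` lines always come in runs of 6 consecutive lines, and that the value sits in parentheses just before `</a></li>`; `DEFAULTS`, `LI_LINE` and `find_mismatches` are names made up for this illustration, not part of any existing tool.

```python
import re

# Assumption: the 6 default values, in block order, as given in this thread.
DEFAULTS = ["22", "21", "13", "23", "24", "15"]

# Assumption: every line of interest looks like  <li>... (NN)</a></li>
LI_LINE = re.compile(r"^<li>.*\((\d+)\)</a></li>\s*$")

def find_mismatches(text):
    """Return (line_number, found, expected) for each value at the wrong place."""
    mismatches = []
    position = 0  # index of the current line inside its run of <li> lines
    for number, line in enumerate(text.splitlines(), start=1):
        match = LI_LINE.match(line)
        if match is None:
            position = 0  # any other line ends the current run
            continue
        expected = DEFAULTS[position % len(DEFAULTS)]
        if match.group(1) != expected:
            mismatches.append((number, match.group(1), expected))
        position += 1
    return mismatches

sample = "\n".join([
    "...",
    '<li><a href="ccc.html" title="ccc">ccc (22)</a></li>',
    '<li><a href="ddd.html" title="ddd">ddd (21)</a></li>',
    '<li><a href="eee.html" title="eee">eee (00)</a></li>',
    '<li><a href="fff.html" title="fff">fff (23)</a></li>',
    '<li><a href="ggg.html" title="ggg">ggg (24)</a></li>',
    '<li><a href="hhh.html" title="hhh">hhh (21)</a></li>',
])
for hit in find_mismatches(sample):
    print(hit)
```

On this sample it reports line 4 (00 instead of 13) and line 7 (21 instead of 15), that is, both the F and G cases. To check many pages, the same function could be called on the contents of each file in turn.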

                        However, here is a possible work-around, which produces correct results!

                        So, assuming the original sample text, below :

                         This is the CORRECT block of 6 lines, with 6 DEFAULT values ( 22, 21, 13, 23, 24 and 15 )
                        
                        <li><a href="xxx.html" title="xxx">xxx (22)</a></li>
                        <li><a href="yyy.html" title="yyy">yyy (21)</a></li>
                        <li><a href="zzz.html" title="zzz">zzz (13)</a></li>
                        <li><a href="xxx.html" title="xxx">xxx (23)</a></li>
                        <li><a href="yyy.html" title="yyy">yyy (24)</a></li>
                        <li><a href="zzz.html" title="zzz">zzz (15)</a></li>
                        ...
                         A 2nd BLOCK of 6 lines, downwards :
                        ...
                        <li><a href="ccc.html" title="ccc">ccc (22)</a></li>
                        <li><a href="ddd.html" title="ddd">ddd (21)</a></li>
                        <li><a href="eee.html" title="eee">eee (00)</a></li>
                        <li><a href="fff.html" title="fff">fff (23)</a></li>
                        <li><a href="ggg.html" title="ggg">ggg (24)</a></li>
                        <li><a href="hhh.html" title="hhh">hhh (57)</a></li>
                        ...
                         A 3rd BLOCK of 6 lines, downwards :
                        ...
                        <li><a href="iii.html" title="iii">iii (20)</a></li>
                        <li><a href="jjj.html" title="jjj">jjj (21)</a></li>
                        <li><a href="kkk.html" title="kkk">kkk (13)</a></li>
                        <li><a href="lll.html" title="lll">lll (21)</a></li>
                        <li><a href="mmm.html" title="mmm">mmm (34)</a></li>
                        <li><a href="nnn.html" title="nnn">nnn (15)</a></li>
                        ...
                         A 4th BLOCK of 6 lines, downwards :
                        ...
                        <li><a href="ooo.html" title="ooo">ooo (22)</a></li>
                        <li><a href="ppp.html" title="ppp">ppp (99)</a></li>
                        <li><a href="qqq.html" title="qqq">qqq (15)</a></li>
                        <li><a href="rrr.html" title="rrr">rrr (23)</a></li>
                        <li><a href="sss.html" title="sss">sss (24)</a></li>
                        <li><a href="ttt.html" title="ttt">ttt (15)</a></li>
                        ...
                        A 5th BLOCK of 6 lines, downwards :
                        ...
                        <li><a href="uuu.html" title="uuu">uuu (07)</a></li>
                        <li><a href="vvv.html" title="vvv">vvv (13)</a></li>
                        <li><a href="www.html" title="www">www (21)</a></li>
                        <li><a href="xxx.html" title="xxx">xxx (15)</a></li>
                        <li><a href="yyy.html" title="yyy">yyy (23)</a></li>
                        <li><a href="zzz.html" title="zzz">zzz (15)</a></li>
                        ...
                        

                        To begin with, I thought of prefixing every line of these 6-line blocks with its corresponding default value, using a regex S/R (I used the # symbol as a separator, which, I hope, does not already exist in your file!)

                        SEARCH (?-s)^(<li>.+\R)(<li>.+\R)(<li>.+\R)(<li>.+\R)(<li>.+\R)(<li>.+\R)

                        REPLACE 22#${1}21#${2}13#${3}23#${4}24#${5}15#${6}
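Outside Notepad++, the same prefixing step might be sketched in Python as follows. This is only an approximation of the S/R above: Python's `re` has no `\R`, so `\r?\n` is assumed for the line break, and `BLOCK`, `PREFIXED` and `prefix_defaults` are illustrative names.

```python
import re

# Assumption: \r?\n replaces \R, which Python's re module does not support.
# Six consecutive <li> lines, each captured with its line break.
BLOCK = re.compile(r"^" + r"(<li>.+\r?\n)" * 6, re.MULTILINE)

# Same default values and '#' separator as in the REPLACE field above.
# \g<1> is used instead of \1 so that the digits that follow are not
# read as part of the group number.
PREFIXED = r"22#\g<1>21#\g<2>13#\g<3>23#\g<4>24#\g<5>15#\g<6>"

def prefix_defaults(text):
    """Prefix each line of every 6-line <li> block with its default value."""
    return BLOCK.sub(PREFIXED, text)

sample = "".join(line + "\n" for line in [
    '<li><a href="ccc.html" title="ccc">ccc (22)</a></li>',
    '<li><a href="ddd.html" title="ddd">ddd (21)</a></li>',
    '<li><a href="eee.html" title="eee">eee (00)</a></li>',
    '<li><a href="fff.html" title="fff">fff (23)</a></li>',
    '<li><a href="ggg.html" title="ggg">ggg (24)</a></li>',
    '<li><a href="hhh.html" title="hhh">hhh (57)</a></li>',
])
print(prefix_defaults(sample))
```

As with the original search, the sixth line must end with a line break for the block to match, and a run of fewer than 6 `<li>` lines is left untouched.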

                        So, we get the following text :

                         This is the CORRECT block of 6 lines, with 6 DEFAULT values ( 22, 21, 13, 23, 24 and 15 )
                        
                        22#<li><a href="xxx.html" title="xxx">xxx (22)</a></li>
                        21#<li><a href="yyy.html" title="yyy">yyy (21)</a></li>
                        13#<li><a href="zzz.html" title="zzz">zzz (13)</a></li>
                        23#<li><a href="xxx.html" title="xxx">xxx (23)</a></li>
                        24#<li><a href="yyy.html" title="yyy">yyy (24)</a></li>
                        15#<li><a href="zzz.html" title="zzz">zzz (15)</a></li>
                        ...
                         A 2nd BLOCK of 6 lines, downwards :
                        ...
                        22#<li><a href="ccc.html" title="ccc">ccc (22)</a></li>
                        21#<li><a href="ddd.html" title="ddd">ddd (21)</a></li>
                        13#<li><a href="eee.html" title="eee">eee (00)</a></li>
                        23#<li><a href="fff.html" title="fff">fff (23)</a></li>
                        24#<li><a href="ggg.html" title="ggg">ggg (24)</a></li>
                        15#<li><a href="hhh.html" title="hhh">hhh (57)</a></li>
                        ...
                         A 3rd BLOCK of 6 lines, downwards :
                        ...
                        22#<li><a href="iii.html" title="iii">iii (20)</a></li>
                        21#<li><a href="jjj.html" title="jjj">jjj (21)</a></li>
                        13#<li><a href="kkk.html" title="kkk">kkk (13)</a></li>
                        23#<li><a href="lll.html" title="lll">lll (21)</a></li>
                        24#<li><a href="mmm.html" title="mmm">mmm (34)</a></li>
                        15#<li><a href="nnn.html" title="nnn">nnn (15)</a></li>
                        ...
                         A 4th BLOCK of 6 lines, downwards :
                        ...
                        22#<li><a href="ooo.html" title="ooo">ooo (22)</a></li>
                        21#<li><a href="ppp.html" title="ppp">ppp (99)</a></li>
                        13#<li><a href="qqq.html" title="qqq">qqq (15)</a></li>
                        23#<li><a href="rrr.html" title="rrr">rrr (23)</a></li>
                        24#<li><a href="sss.html" title="sss">sss (24)</a></li>
                        15#<li><a href="ttt.html" title="ttt">ttt (15)</a></li>
                        ...
                        A 5th BLOCK of 6 lines, downwards :
                        ...
                        22#<li><a href="uuu.html" title="uuu">uuu (07)</a></li>
                        21#<li><a href="vvv.html" title="vvv">vvv (13)</a></li>
                        13#<li><a href="www.html" title="www">www (21)</a></li>
                        23#<li><a href="xxx.html" title="xxx">xxx (15)</a></li>
                        24#<li><a href="yyy.html" title="yyy">yyy (23)</a></li>
                        15#<li><a href="zzz.html" title="zzz">zzz (15)</a></li>
                        ...
                        

                        Now, it’s obvious that the simple regex ^(.+)#(?!.+\(\1\)).+ will match any line whose number, between parentheses, is different from the number located at the beginning of the current line, before the # separator!
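The same pattern can be tried in Python, where a backreference inside a lookahead behaves the same way. A small sketch, assuming the text has already been prefixed with `NN#` as shown above (`MISMATCH` and `flag_lines` are illustrative names):

```python
import re

# Same pattern as above; re.MULTILINE makes ^ match at the start of every line.
MISMATCH = re.compile(r"^(.+)#(?!.+\(\1\)).+", re.MULTILINE)

def flag_lines(prefixed_text):
    """Return the prefixed lines whose value in parentheses differs from the prefix."""
    return [m.group(0) for m in MISMATCH.finditer(prefixed_text)]

sample = "\n".join([
    '22#<li><a href="ccc.html" title="ccc">ccc (22)</a></li>',
    '21#<li><a href="ddd.html" title="ddd">ddd (21)</a></li>',
    '13#<li><a href="eee.html" title="eee">eee (00)</a></li>',
])
print(flag_lines(sample))
```

Only the third line is returned, because (00) does not agree with its 13# prefix.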


                        If you prefer to replace all the erroneous values with the right ones, you may use the following regex S/R

                        SEARCH ^(.+)#(?!.+\(\1\))(.+\().+(\).+)|^.+#

                        REPLACE \2\1\3
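A rough Python equivalent of this S/R, for anyone who wants to run it as a script. It assumes Python 3.5 or later, where groups that did not take part in a match expand to an empty string in the replacement; `FIX` and `restore_defaults` are illustrative names.

```python
import re

# First alternative: a line whose value disagrees with its prefix.
#   group 1 = default value, group 2 = text up to "(", group 3 = ")" and the rest.
# Second alternative: just the "NN#" prefix of a line that is already correct.
FIX = re.compile(r"^(.+)#(?!.+\(\1\))(.+\().+(\).+)|^.+#", re.MULTILINE)

def restore_defaults(prefixed_text):
    """Put the default value back in wrong lines and strip every prefix."""
    # For correct lines groups 1 to 3 are unset, so only the prefix is removed.
    return FIX.sub(r"\2\1\3", prefixed_text)

sample = "\n".join([
    '22#<li><a href="ccc.html" title="ccc">ccc (22)</a></li>',
    '13#<li><a href="eee.html" title="eee">eee (00)</a></li>',
])
print(restore_defaults(sample))
```

The first line only loses its 22# prefix, while the second comes back with (13) in place of (00).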

                        And, of course, you’ll get all the blocks with identical default values between parentheses:

                         This is the CORRECT block of 6 lines, with 6 DEFAULT values ( 22, 21, 13, 23, 24 and 15 )
                        
                        <li><a href="xxx.html" title="xxx">xxx (22)</a></li>
                        <li><a href="yyy.html" title="yyy">yyy (21)</a></li>
                        <li><a href="zzz.html" title="zzz">zzz (13)</a></li>
                        <li><a href="xxx.html" title="xxx">xxx (23)</a></li>
                        <li><a href="yyy.html" title="yyy">yyy (24)</a></li>
                        <li><a href="zzz.html" title="zzz">zzz (15)</a></li>
                        ...
                         A 2nd BLOCK of 6 lines, downwards :
                        ...
                        <li><a href="ccc.html" title="ccc">ccc (22)</a></li>
                        <li><a href="ddd.html" title="ddd">ddd (21)</a></li>
                        <li><a href="eee.html" title="eee">eee (13)</a></li>
                        <li><a href="fff.html" title="fff">fff (23)</a></li>
                        <li><a href="ggg.html" title="ggg">ggg (24)</a></li>
                        <li><a href="hhh.html" title="hhh">hhh (15)</a></li>
                        ...
                         A 3rd BLOCK of 6 lines, downwards :
                        ...
                        <li><a href="iii.html" title="iii">iii (22)</a></li>
                        <li><a href="jjj.html" title="jjj">jjj (21)</a></li>
                        <li><a href="kkk.html" title="kkk">kkk (13)</a></li>
                        <li><a href="lll.html" title="lll">lll (23)</a></li>
                        <li><a href="mmm.html" title="mmm">mmm (24)</a></li>
                        <li><a href="nnn.html" title="nnn">nnn (15)</a></li>
                        ...
                         A 4th BLOCK of 6 lines, downwards :
                        ...
                        <li><a href="ooo.html" title="ooo">ooo (22)</a></li>
                        <li><a href="ppp.html" title="ppp">ppp (21)</a></li>
                        <li><a href="qqq.html" title="qqq">qqq (13)</a></li>
                        <li><a href="rrr.html" title="rrr">rrr (23)</a></li>
                        <li><a href="sss.html" title="sss">sss (24)</a></li>
                        <li><a href="ttt.html" title="ttt">ttt (15)</a></li>
                        ...
                        A 5th BLOCK of 6 lines, downwards :
                        ...
                        <li><a href="uuu.html" title="uuu">uuu (22)</a></li>
                        <li><a href="vvv.html" title="vvv">vvv (21)</a></li>
                        <li><a href="www.html" title="www">www (13)</a></li>
                        <li><a href="xxx.html" title="xxx">xxx (23)</a></li>
                        <li><a href="yyy.html" title="yyy">yyy (24)</a></li>
                        <li><a href="zzz.html" title="zzz">zzz (15)</a></li>
                        ...
                        

                        Cheers,

                        guy038

                        • Vasile Caraus

                          Thanks Guy, your solution is OK, but complex. I just found another solution:

                          <li><a href=".*\.html" title=".*">.* (?:(?!\b(22|9|15|23|4|15)\b).)*<\/a><\/li>$

                          Check this out: https://regex101.com/r/vRXKWj/4/
