    Regex tests of the build 618 of the 'ComparePlus' plugin

    • guy038

      Hello, @pnedev and All,

      Pavel, I downloaded your penultimate build 618 of the ComparePlus plugin in order to test the new Ignore Regex... option

      You decided that this option should belong to the main menu. I’m sorry to tell you that it’s not very practical to use :-( Indeed, if you already chose a regex, you need to :

      • First, un-check this option

      • Secondly, check again the option to get the pop-up window and, then, modify the regex !

      So, I think that it should be implemented as AlexVerschoot did in his build 506, i.e. with this option included in the Settings dialog, where you may check or uncheck it and easily create or modify the regex !

      https://github.com/pnedev/compare-plugin/pull/230

      Note that the x64 version of the old build 506 does not work properly with recent versions of N++, such as the v8.4.4 - x64 release. However, with N++ v8.1.9.2 - x64, the build 506-x64 of the ComparePlus plugin seems functional !


      There’s a main difference, regarding the regex behavior, between you and Alex. You both use opposite logics ! He refers to the parts to consider, in each line, whereas you refer to the parts to ignore, in each line

      I noticed that you both use the implicit modifier (?-is) and that you must not insert these modifiers at the beginning of the regex, else an error occurs !

      Now, if we slightly simplify AlexVerschoot’s regex as X[0-9]?[0-9], initially given at :

      https://github.com/pnedev/compare-plugin/pull/230

      The EQUIVALENT ignore regex, in your 618 build, should be ^.+(?=X)|^.+. I did try this regex with success. However, in this example, it’s obvious that AlexVerschoot’s regex, focusing on the parts to compare, is easier to build than your ignore regex, focusing on the parts to ignore !

      To test it, here are the two texts to compare :

      • Test_2.txt
      N001 X0
      N002 Y12
      N003 X8
      N004 Z8
      
      • Test_1.txt
      N001 X0
      N002 Y12
      N003 Y8
      N004 X6
      N005 Z8
      

      The two tabs are ordered Test_2.txt then Test_1.txt which is the current file, right before the comparison process
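
A minimal sketch of why the two regexes behave the same way on these sample lines. It uses Python's `re` module as a stand-in for the plugin's C++ ECMAScript engine, and it assumes that "match" means only the matched text is kept for comparison while "ignore" means the matched text is dropped before comparison. The helper names `key_match` and `key_ignore` are hypothetical and are not part of ComparePlus.

```python
import re

MATCH_RE = re.compile(r"X[0-9]?[0-9]")    # Alex's style: parts to consider
IGNORE_RE = re.compile(r"^.+(?=X)|^.+")   # build 618 style: parts to ignore

def key_match(line):
    """Keep only the text matched by the regex (assumed 'match' semantics)."""
    return "".join(MATCH_RE.findall(line))

def key_ignore(line):
    """Drop the text matched by the regex (assumed 'ignore' semantics)."""
    return IGNORE_RE.sub("", line)

TEST_2 = ["N001 X0", "N002 Y12", "N003 X8", "N004 Z8"]
TEST_1 = ["N001 X0", "N002 Y12", "N003 Y8", "N004 X6", "N005 Z8"]

for line in TEST_2 + TEST_1:
    # Print the comparison key produced by each approach for every sample line
    print(repr(line), "->", repr(key_match(line)), repr(key_ignore(line)))
```

The equivalence only holds for lines shaped like these samples: a line that begins with X, or that contains several X parts, would give different keys with the two regexes.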


      All these elements made me think about a nice improvement that you could easily manage :

      • You would get rid of the option Ignore regex... in the main menu

      AND

      • In the Settings dialog, you would add these three lines :

        • ☐ Enable regex comparison

        •     ◎ Match      ◎ Ignore

        • The Input field, to type in the regex

      Following this way, you would get the best :

      • If the Match box is ticked, the regex would refer to the parts to consider, during comparison, for each line ( Alex behavior )

      • If the Ignore box is ticked, the regex would refer to the part to ignore, during comparison, for each line ( Your behavior )

      As the two options ◎ are mutually exclusive, I suppose that it should not be a problem to code ?


      Of course, these are just suggestions ! Good luck, Pavel, for your final development time !

      Best Regards,

      guy038

      P.S. :

      Pavel, in the ComparePlus sub-folder, I presently have these two DLLs :

      • libgit2.dll ( v0.24.3.0 )

      • sqlite3.dll ( v3.15.0.0 )

      Are these versions still OK or an upgrade is necessary ?

      Notepad++ v8.4.4   (64-bit)
      Build time : Jul 15 2022 - 17:54:42
      Path : E:\844_x64\notepad++.exe
      Command Line : 
      Admin mode : OFF
      Local Conf mode : ON
      Cloud Config : OFF
      OS Name : Windows 10 Pro (64-bit) 
      OS Version : 21H2
      OS Build : 19044.1826
      Current ANSI codepage : 1252
      Plugins : 
          mimeTools (2.8)
          NppConverter (4.4)
          NppExport (0.4)
          ComparePlus (1)
      
      • pnedev

        Hello @guy038 ,

        Thank you very much for your feedback on the new Ignore regex... functionality and for the extensive analysis “Guy’s style” ;)
        I highly appreciate your suggestions and deep knowledge in the field of Regular expressions.

        You are completely right of course about the opposite logic between my implementation and Alex’s. This is also what I mentioned in my reply to his Compare plugin’s PR.

        The reasoning behind my choice to implement Ignore regex instead of Match regex is to keep the consistency in the plugin's menu (with all the different Ignore options). Besides, IMO in most “regex” cases the ignored part should be “smaller” and something simpler, for example filtering the timestamp at the beginning of log file lines, skipping some line numbers, or dropping some starting columns from CSV files. I think this should be a much more common case than other specific ones, but I might as well be wrong. Do you think such ignore regexes would be difficult to compose?

        I would really like to keep Ignore regex... in the menu and not move it to Settings... dialog partly because of the consistency in the Ignore menu and also because this possibility should be more easily visible and more accessible (and also not “polluted” by other rarely changed settings). It is kind-of more ‘functional’ switch than the more permanent Settings.
        On the other hand I could easily change the behavior so on every Ignore regex click the regex edit dialog appears and it is disabled if you click Cancel or it will have also an enable switch in the dialog. But again, it shouldn’t be changed that often to be that cumbersome.
        Please tell me what you think about that.

        I will not have the time to implement your suggestion of Ignore vs. Match regex setting so I’ll keep it only as Ignore for now. Maybe in the future versions of ComparePlus I will add such setting.

        I would also like to ask you as a regex expert what regex type is better to be used in ComparePlus. The C++ library variants are ECMAScript (the currently used one), Basic POSIX, Extended POSIX, Awk, Grep or Egrep.

        Thank you once again.

        BR

        • pnedev @guy038

          @guy038 said in Regex tests of the build 618 of the 'ComparePlus' plugin:

          Are these versions still OK or an upgrade is necessary ?

          Sorry, forgot about that question.
          Those are OK to use, no problem but I would advise you to use the latest ones from the current ComparePlus dev.

          BR

          • guy038

            Hi, @pnedev,

            I’ve already had a quick overview of the questions / propositions provided in your post. I’ll go on tomorrow and answer you shortly !

            BR

            guy038

            • Yaron @pnedev

              Hello Pavel and Guy,

              @guy038 said in Regex tests of the build 618 of the 'ComparePlus' plugin:

              First, un-check this option

              Secondly, check again the option to get the pop-up window and, then, modify the regex !

              Good point.

              @pnedev said in Regex tests of the build 618 of the 'ComparePlus' plugin:

              On the other hand I could easily change the behavior so on every Ignore regex click the regex edit dialog appears and it is disabled if you click Cancel or it will have also an enable switch in the dialog.

              This seems to be the best solution.
              How about “Apply” instead of “OK” to enable?

              Thank you.

              • guy038

                Hi, @pnedev, @yaron and All,

                Pavel, I now see better why you prefer to keep this option in the main menu. Indeed, it rather acts as a switch and rightly belongs to the section containing all the other Ignore options !

                Now, if you’re going to do a comparison, based on a regex, it’s very likely that you won’t get the right regex, at the first try, isn’t it ?

                So, my idea, about the behavior of the Ignore Regex... option, is :

                • Whether the Ignore Regex... option, in the main menu, is checked or not, a left mouse click on this option will always open the ComparePlus Ignore regex window

                • That ComparePlus Ignore regex window would have two buttons Enable and Disable :

                  • If the Ignore Regex... option, in main menu, is presently disabled ( no check mark ) :

                    • A left mouse click on the Enable button would valid the regex AND enable the Ignore Regex.., with its check mark

                    • A left mouse click on the Disable button would still valid the regex BUT would keep the Ignore Regex.. option disabled, with no check mark

                  • If the Ignore Regex... option, in main menu, is presently enabled ( check mark ) :

                    • A left mouse click on the Enable button would valid the regex AND keep the Ignore Regex.. enabled, with its check mark

                    • A left mouse click on the Disable button would still valid the regex BUT would disable the Ignore Regex.. option, with no check mark
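
The four cases above reduce to two rules: either button keeps the typed regex, and only the check mark differs. A minimal sketch of that state model follows, assuming a hypothetical `IgnoreRegexOption` class; it illustrates the proposal only and is not ComparePlus code.

```python
from dataclasses import dataclass

@dataclass
class IgnoreRegexOption:
    """Hypothetical model of the menu item state (not ComparePlus code)."""
    enabled: bool = False   # check mark in the main menu
    pattern: str = ""       # text shown in the dialog's input field

    def click_enable(self, typed):
        # Keep the typed regex and set (or keep) the check mark
        self.pattern = typed
        self.enabled = True

    def click_disable(self, typed):
        # Still keep the typed regex, but clear (or leave off) the check mark
        self.pattern = typed
        self.enabled = False

option = IgnoreRegexOption()
option.click_enable(r"^([^,\r\n]+?,){2}")
option.click_disable(option.pattern)
print(option)   # the regex stays available for the next call of the dialog
```

Because the previous state never affects the outcome, the disabled and enabled starting cases of the list collapse into the same two methods.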


                Regarding the regex engine to use with the ComparePlus plugin, I would say that the present C++ ECMAScript implementation seems better than all the others that you listed. Indeed, all the others ( Basic POSIX, Extended POSIX, Awk, Grep and Egrep ), for instance, do not recognize the look-arounds feature !

                However, the current C++ ECMAScript regex library is not the best one either. For instance, it does not handle the look-behind feature ! I tried your build 618 with the two regexes ^.+(?=X) and (?<=3).+. The second regex, with the look-behind, didn’t work and gave the error window :

                PluginManager:errorPluginCommand Exception
                regex_error(error_syntax)
                

                To verify my assertion, refer to this site :

                https://cplusplus.com/reference/regex/ECMAScript/


                Now, Pavel, don’t be annoyed about it. Just wait and see if some users need more regex features in order to create the appropriate ignore regex !

                Of course, later, why not use the powerful Boost regex library, already embedded in Notepad++ itself ?

                A good one, as well, would be the .NET regex library. Refer to the Microsoft site :

                https://docs.microsoft.com/fr-fr/dotnet/standard/base-types/regular-expression-language-quick-reference

                https://download.microsoft.com/download/D/2/4/D240EBF6-A9BA-4E4F-A63F-AEB6DA0B921C/Regular expressions quick reference.pdf

                But I do understand that the change of the regex library, within your plugin, may not be so easy to implement ! It’s up to you to go on that way !


                Now, regarding the way to compose an ignore regex, I don’t think, finally, that it would be very difficult ! To prove this, here is an example with two small CSV files, containing nine fields :

                • Test_1.txt
                abc,123,456,def,fgh,ijk,789,xyz,012
                
                abc,123,456,def,fgh,ijk,789,xyz,012
                
                abc,123,456,def,fgh,ijk,789,xyz,012
                
                xyz,123,456,def,fgh,ijk,789,xyz,012
                
                abc,123,456,def,fgh,ijk,789,xyz,012
                
                xyz,123,456,def,fgh,ijk,789,xyz,012
                
                abc,123,456,def,fgh,ijk,789,xyz,999
                
                xyz,123,456,def,fgh,ijk,789,xyz,999
                
                • Test_2.txt
                abc,123,456,fgh,def,ijk,789,xyz,012
                
                xyz,123,456,def,fgh,ijk,789,xyz,012
                
                abc,123,456,def,fgh,ijk,789,xyz,999
                
                xyz,123,000,def,fgh,ijk,789,xyz,012
                
                abc,123,456,def,fgh,ijk,789,xyz,012
                
                xyz,123,def,456,fgh,ijk,789,xyz,012
                
                abc,123,456,def,ijk,fgh,789,xyz,999
                
                xyz,000,456,def,fgh,ijk,000,xyz,999
                
                • First, I would like to mention that, if we delete all the blank lines, in test_1.txt and test_2.txt, it’s really not easy to get an idea of the comparison process, as it considers added and removed lines as well as changed lines ! Thanks to the blank lines, we get only some changed lines !

                • Secondly, for a clean view of all the changes, I disabled the Detect Moves option for this test


                From this point, I did some tests with different regexes, typed in the Ignore Regex... option

                • By default, if the Ignore Regex... option is disabled, all the lines are totally compared :

                  • So, the lines 1, 3, 5, 7, 11, 13 and 15 are changed between the two files
                • When the Ignore Regex... option contains the regex ^([^,\r\n]+?,){2}, the comparison ignores the first two fields, in each file :

                  • So, the lines 5, 15 and 1, 7, 11, 13, only, are changed between the two files

                • When the Ignore Regex... option contains the regex (,[^,\r\n]+?){3}$, the comparison ignores the last three fields, in each file :

                  • So, the lines 3, 15 and 1, 7, 11, 13, only, are changed between the two files

                • When the Ignore Regex... option contains the regex ^([^,\r\n]+?,){2}|(,[^,\r\n]+?){3}$, the comparison ignores the first two fields OR the last three fields, in each file. Thus, the comparison takes into account everything which is not the first two AND not the last three :

                  • So, the lines 1, 7, 11 and 13, only, are changed between the two files

                • Now, if we would like to ignore the middle four fields, in each file :

                  • We cannot use look-behinds, both because the look-behind would have a non-fixed length and because look-behinds are not allowed with the present C++ ECMAScript regex library of the ComparePlus plugin

                  • We cannot use the \K feature either. Actually, the regex ^([^,\r\n]+?,){2}\K([^,\r\n]+?,){4} does not work at all and is simply equivalent to the default comparison, without any Ignore Regex... option !

                However, Pavel, there is still a solution which uses a look-ahead ;-))

                • When the Ignore Regex... option contains the regex (,[^,\r\n]+?){4}(?=(,[^,\r\n]+?){3}$), the comparison ignores the four fields, IF they are followed by the last three fields :

                  • So, the lines 3, 5 and 15, only, are changed between the two files

                You could say : what happens if we choose an Ignore Regex... which represents the totality of each line ? In this specific case, the range to compare becomes the empty range of each line !!

                • When the Ignore Regex... option contains the regex ^([^,\r\n]+?,){8}[^,\r\n]+$, the comparison ignores the whole contents of each line, in each file :

                  • So, the dialog Files 'test_1.txt' and 'test_2.txt' match / Close compared files? occurs ( Logical ! )
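
The experiments above can be replayed with a short sketch. It uses Python's `re` module as a stand-in for the plugin's C++ ECMAScript engine (these particular regexes only use features both engines share), and it assumes that ignoring means removing the matched text from each line before a plain line-by-line comparison. The real plugin also aligns added, removed and moved lines, which this sketch does not attempt. The function `changed_lines` and the constant names are hypothetical.

```python
import re

TEST_1 = [
    "abc,123,456,def,fgh,ijk,789,xyz,012",
    "abc,123,456,def,fgh,ijk,789,xyz,012",
    "abc,123,456,def,fgh,ijk,789,xyz,012",
    "xyz,123,456,def,fgh,ijk,789,xyz,012",
    "abc,123,456,def,fgh,ijk,789,xyz,012",
    "xyz,123,456,def,fgh,ijk,789,xyz,012",
    "abc,123,456,def,fgh,ijk,789,xyz,999",
    "xyz,123,456,def,fgh,ijk,789,xyz,999",
]
TEST_2 = [
    "abc,123,456,fgh,def,ijk,789,xyz,012",
    "xyz,123,456,def,fgh,ijk,789,xyz,012",
    "abc,123,456,def,fgh,ijk,789,xyz,999",
    "xyz,123,000,def,fgh,ijk,789,xyz,012",
    "abc,123,456,def,fgh,ijk,789,xyz,012",
    "xyz,123,def,456,fgh,ijk,789,xyz,012",
    "abc,123,456,def,ijk,fgh,789,xyz,999",
    "xyz,000,456,def,fgh,ijk,000,xyz,999",
]

FIRST_TWO = r"^([^,\r\n]+?,){2}"
LAST_THREE = r"(,[^,\r\n]+?){3}$"
MIDDLE_FOUR = r"(,[^,\r\n]+?){4}(?=(,[^,\r\n]+?){3}$)"
WHOLE_LINE = r"^([^,\r\n]+?,){8}[^,\r\n]+$"

def changed_lines(ignore_regex=None):
    """Return the editor line numbers (1, 3, 5, ...) whose contents differ.

    Data lines sit on odd line numbers because a blank line separates them.
    """
    pattern = re.compile(ignore_regex) if ignore_regex else None
    changed = []
    for index, (left, right) in enumerate(zip(TEST_1, TEST_2)):
        if pattern is not None:
            # Assumed 'ignore' semantics: drop the matched text, compare the rest
            left, right = pattern.sub("", left), pattern.sub("", right)
        if left != right:
            changed.append(2 * index + 1)
    return changed

print(changed_lines())                               # [1, 3, 5, 7, 11, 13, 15]
print(changed_lines(FIRST_TWO))                      # [1, 5, 7, 11, 13, 15]
print(changed_lines(LAST_THREE))                     # [1, 3, 7, 11, 13, 15]
print(changed_lines(FIRST_TWO + "|" + LAST_THREE))   # [1, 7, 11, 13]
print(changed_lines(MIDDLE_FOUR))                    # [3, 5, 15]
print(changed_lines(WHOLE_LINE))                     # []
```

Under these assumptions the sketch reproduces the line numbers reported for each regex, including the empty result when the whole line is ignored.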

                As you can see, Pavel, no need to worry : you’ll probably never have to change the regex engine, within the ComparePlus plugin ! There is always a valid regex solution to use

                Best Regards,

                guy038

                P.S. :

                • When the Ignore Regex... option contains the regex ^([^,\r\n]+?,){8}[^,\r\n]+\R, it does not work either and gives the default compare results ( without the Ignore Regex... option ). The \R syntax seems forbidden, too

                • I also noticed that, when you type in a non-valid regex in the ComparePlus Ignore regex window, you won’t get any error message AND the default comparison process is run, although the Ignore Regex... option remains checked !

                • pnedev @guy038

                  @guy038 ,

                  Thank you very much for your excellent and thorough analysis and feedback, it is much appreciated.

                  @guy038 said in Regex tests of the build 618 of the 'ComparePlus' plugin:

                  So, my idea, about the behavior of the Ignore Regex… option, is : …

                  Exactly what I meant as improvement based on your previous post with one exception:
                  Why on Disable should we remember the entered regex?
                  I thought I should disregard the entered regex in that case although it doesn’t really matter. It just seems counter-intuitive to me.
                  If people prefer it that way I’m OK with it.

                  Thank you for the info regarding different regex engines. I’ll keep standard C++ library ECMAScript then. If in the future a need for more sophisticated engine (Boost for example) arises then I’ll consider implementing it.

                  About the excellent CSV test example… honestly your regex entries look like magic to me :) I really don’t have enough knowledge in that field. What I can say is that I’m really glad that with enough know-how one could have so many possibilities.
                  And you are definitely a virtuoso!

                  @guy038 said in Regex tests of the build 618 of the 'ComparePlus' plugin:

                  I also noticed that, when you type in a non-valid regex in the ComparePlus Ignore regex window, you won’t get any error message AND the default comparison process is run, although the Ignore Regex… option remains checked !

                  That is something I should look into but since the regex entry is not validated prior the comparison itself the behavior now is what it is. I’ll see what I can do, thanks.

                  BR

                  P.S. I thought that it is good to mention here that the Ignore Regex is implemented to be on a line-by-line basis. I’m ‘telling’ that because I saw in your regex entries the ‘\r\n’ sequence that reminds me of a line-end check.

                  • guy038

                    Hi, @pnedev and All,

                    In my last post, I said , at the end :

                    • I also noticed that, when you type in a non-valid regex in the ComparePlus Ignore regex window, you won’t get any error message AND the default comparison process is run, although the Ignore Regex... option remains checked !

                    I was totally wrong about it :-((. I did additional tests and, for instance :

                    • The valid regex ^([^,\r\n]+?,){8}[^,\r\n]+\R, matching all line contents, leads to the dialog Files 'test_1.txt' and 'test_2.txt' match / Close compared files? ( Logical )

                    • The invalid regex ^(([^,\r\n]+?,){8}[^,\r\n]+\R, containing one extra opening parenthesis, near the beginning of the regex, is correctly detected and outputs the error window :

                    PluginManager:errorPluginCommand Exception
                    regex_error(error_paren): The expression contains mismatched ( and ).
                    
                    • But the valid regex ^0([^,\r\n]+?,){8}[^,\r\n]+\R, which cannot be found, in any line of Test_1.txt and Test_2.txt, of course, means that the default comparison is run, with the Ignore Regex... option still checked ! It’s the normal behavior and we cannot do anything about it ;-))
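
The distinction between an invalid regex and a valid regex that simply never matches can be sketched as follows. This is an illustration in Python, not the plugin's C++ code: `check_ignore_regex` is a hypothetical helper, and because Python's `re` module rejects the `\R` escape, the patterns below are the ones from this post with the trailing `\R` dropped.

```python
import re

SAMPLE_LINES = [
    "abc,123,456,def,fgh,ijk,789,xyz,012",
    "xyz,123,456,def,fgh,ijk,789,xyz,999",
]

def check_ignore_regex(pattern, lines):
    """Classify a pattern as 'invalid', 'no effect' or 'active' (hypothetical helper)."""
    try:
        compiled = re.compile(pattern)
    except re.error:
        # Syntax error, e.g. mismatched parentheses: report it to the user
        return "invalid"
    if any(compiled.search(line) for line in lines):
        return "active"
    # Valid regex that matches nothing: the comparison is the default one
    return "no effect"

print(check_ignore_regex(r"^([^,\r\n]+?,){8}[^,\r\n]+", SAMPLE_LINES))    # active
print(check_ignore_regex(r"^(([^,\r\n]+?,){8}[^,\r\n]+", SAMPLE_LINES))   # invalid
print(check_ignore_regex(r"^0([^,\r\n]+?,){8}[^,\r\n]+", SAMPLE_LINES))   # no effect
```

Only the first kind of failure can be caught when the regex is entered; the third case depends on the files being compared.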

                    Best Regards,

                    guy038

                    • pnedev @guy038

                      @guy038 ,

                      Thanks for the clarification.
                      I’ll do some changes and write back.

                      BR

                      • pnedev

                        Hello @guy038 ,

                        Could you please try briefly build https://ci.appveyor.com/project/pnedev/compare-plugin/builds/44268759 ?

                        It has implemented the Enable / Disable behavior we discussed (on Disable the entered regex value is not saved because it doesn’t seem intuitive to me) and also checks the validity of the regex on Enable .

                        Thank you.

                        BR

                        • guy038

                          Hello, @pnedev and All,

                          Wonderful !! It works fine ;-))

                          So, given my previous example with the two CSV test files, for instance :

                          • I click on the Ignore Regex... option, not checked, => I get the ComparePlus Ignore Regex window which is empty

                          • I type in the regex ^([^,\r\n]+?,){2} which should ignore the first two fields of these CSV files

                          • I click on the Enable button => The ComparePlus Ignore Regex window disappears, and the Ignore Regex... option is now checked

                          • I run the comparison and, as expected, no orange highlighting can be observed in the first two fields of each file

                          Then :

                          • I click again on the Ignore Regex... option, which is checked => I get the ComparePlus Ignore Regex window, which kept the regex

                          • I click on the Disable button => The ComparePlus Ignore Regex window disappears, and the Ignore Regex... option is again not checked

                          • I run the comparison and, as expected, I get the default comparison process ( with the Ignore Regex disabled ). Refer to line 3 in each file !


                          These two successive lists of operations show that we can easily compare the result of any Ignore Regex with the default comparison case !

                          The nice thing is that, from one call of the Ignore Regex... option to another call, you keep the current regex typed, making easy any regex modification with a further click on the Enable button

                          That’s what I meant when I wrongly spoke, in a previous post, of keeping the regex valid ! I wanted to say that the regex should stay in the entry field, in all cases !


                          As a summary, I would say that your new Ignore Regex... option is, from now on, fully functional, and will certainly help a lot of users ;-))

                          Best Regards,

                          guy038

                          • pnedev @guy038

                            @guy038 ,

                            Thanks again for the feedback and for the help with the regex functionality.

                            BR
