    Regex tests of the build 618 of the 'ComparePlus' plugin

    Notepad++ & Plugin Development
    • pnedev @guy038

      @guy038 said in Regex tests of the build 618 of the 'ComparePlus' plugin:

      Are these versions still OK or an upgrade is necessary ?

      Sorry, forgot about that question.
      Those are OK to use, no problem but I would advise you to use the latest ones from the current ComparePlus dev.

      BR

      • guy038

        Hi, @pnedev,

        I’ve already had a quick look at the questions / proposals in your post. I’ll go on tomorrow and answer you shortly !

        BR

        guy038

        • Yaron @pnedev

          Hello Pavel and Guy,

          @guy038 said in Regex tests of the build 618 of the 'ComparePlus' plugin:

          First, un-check this option

          Secondly, check again the option to get the pop-up window and, then, modify the regex !

          Good point.

          @pnedev said in Regex tests of the build 618 of the 'ComparePlus' plugin:

          On the other hand I could easily change the behavior so on every Ignore regex click the regex edit dialog appears and it is disabled if you click Cancel or it will have also an enable switch in the dialog.

          This seems to be the best solution.
          How about “Apply” instead of “OK” to enable?

          Thank you.

          • guy038

            Hi, @pnedev, @yaron and All,

            Pavel, I now see better why you prefer to keep this option in the main menu. Indeed, it rather acts as a switch and rightly belongs to the section containing all the other Ignore options !

            Now, if you’re going to do a comparison based on a regex, it’s very likely that you won’t get the right regex on the first try, right ?

            So, my idea, about the behavior of the Ignore Regex... option, is :

            • Whether the Ignore Regex... option, in the main menu, is checked or not, a left mouse click on this option will always open the ComparePlus Ignore regex window

            • That ComparePlus Ignore regex window would have two buttons Enable and Disable :

              • If the Ignore Regex... option, in main menu, is presently disabled ( no check mark ) :

                • A left mouse click on the Enable button would validate the regex AND enable the Ignore Regex... option, with its check mark

                • A left mouse click on the Disable button would still validate the regex BUT would keep the Ignore Regex... option disabled, with no check mark

              • If the Ignore Regex... option, in main menu, is presently enabled ( check mark ) :

                • A left mouse click on the Enable button would validate the regex AND keep the Ignore Regex... option enabled, with its check mark

                • A left mouse click on the Disable button would still validate the regex BUT would disable the Ignore Regex... option, with no check mark
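The four cases above boil down to a very small state machine. Here is a minimal Python sketch of it, which is only an illustration and not the plugin's code: the class and method names are hypothetical, and it assumes that "valid the regex" means "keep the typed regex in the entry field", which is how the point is clarified later in this thread.

```python
from dataclasses import dataclass


@dataclass
class IgnoreRegexOption:
    """Hypothetical model of the proposed Ignore Regex... behavior."""

    checked: bool = False  # check mark shown in the main menu
    pattern: str = ""      # regex kept in the dialog's entry field

    def enable(self, pattern: str) -> None:
        # Enable button: keep the regex AND set the check mark.
        self.pattern = pattern
        self.checked = True

    def disable(self, pattern: str) -> None:
        # Disable button: still keep the regex BUT clear the check mark.
        self.pattern = pattern
        self.checked = False


option = IgnoreRegexOption()
option.enable(r"^([^,\r\n]+?,){2}")
print(option.checked, option.pattern)
option.disable(option.pattern)
print(option.checked, option.pattern)
```

Whatever the previous state, the button alone decides the check mark, so the "presently enabled" and "presently disabled" cases do not need separate code paths.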


            Regarding the regex engine to use with the ComparePlus plugin, I would say that the present C++ ECMAScript implementation seems the best of all those that you provided. Indeed, all the others ( Basic POSIX, Extended POSIX, Awk, Grep and Egrep ) do not support look-arounds at all !

            However, the current C++ ECMAScript regex library is not the best one either. For instance, it does not handle look-behinds ! I tried your build 618 with the two regexes ^.+(?=X) and (?<=3).+. The second regex, with the look-behind, didn’t work and produced this error window :

            PluginManager:errorPluginCommand Exception
            regex_error(error_syntax)
            

            To verify my assertion, refer to this site :

            https://cplusplus.com/reference/regex/ECMAScript/


            Now, Pavel, don’t be annoyed about it. Just wait and see if some users need more regex features in order to create the appropriate ignore regex !

            Of course, later, why not use the powerful Boost regex library, already embedded in Notepad++ itself ?

            A good one, as well, would be the .NET regex library. Refer to the Microsoft site :

            https://docs.microsoft.com/fr-fr/dotnet/standard/base-types/regular-expression-language-quick-reference

            https://download.microsoft.com/download/D/2/4/D240EBF6-A9BA-4E4F-A63F-AEB6DA0B921C/Regular expressions quick reference.pdf

            But I do understand that the change of the regex library, within your plugin, may not be so easy to implement ! It’s up to you to go on that way !


            Now, regarding the way to compose an ignore regex, I don’t think, finally, that it would be very difficult ! To prove this, here is an example with two small CSV files, containing nine fields :

            • Test_1.txt
            abc,123,456,def,fgh,ijk,789,xyz,012
            
            abc,123,456,def,fgh,ijk,789,xyz,012
            
            abc,123,456,def,fgh,ijk,789,xyz,012
            
            xyz,123,456,def,fgh,ijk,789,xyz,012
            
            abc,123,456,def,fgh,ijk,789,xyz,012
            
            xyz,123,456,def,fgh,ijk,789,xyz,012
            
            abc,123,456,def,fgh,ijk,789,xyz,999
            
            xyz,123,456,def,fgh,ijk,789,xyz,999
            
            • Test_2.txt
            abc,123,456,fgh,def,ijk,789,xyz,012
            
            xyz,123,456,def,fgh,ijk,789,xyz,012
            
            abc,123,456,def,fgh,ijk,789,xyz,999
            
            xyz,123,000,def,fgh,ijk,789,xyz,012
            
            abc,123,456,def,fgh,ijk,789,xyz,012
            
            xyz,123,def,456,fgh,ijk,789,xyz,012
            
            abc,123,456,def,ijk,fgh,789,xyz,999
            
            xyz,000,456,def,fgh,ijk,000,xyz,999
            
            • First, I would like to mention that, if we delete all the blank lines, in test_1.txt and test_2.txt, it’s really not easy to get an idea of the comparison process, as it considers added and removed lines as well as changed lines ! Thanks to the blank lines, we get only some changed lines !

            • Secondly, for a clean view of all the changes, I disabled the Detect Moves option for this test


            From this point, I did some tests with different regexes, typed in the Ignore Regex... option

            • By default, if the Ignore Regex... option is disabled, all the lines are totally compared :

              • So, the lines 1, 3, 5, 7, 11, 13 and 15 are changed between the two files
            • When the Ignore Regex... option contains the regex ^([^,\r\n]+?,){2}, the comparison ignores the first two fields, in each file :

              • So, the lines 5, 15 and 1, 7, 11, 13, only, are changed between the two files

            • When the Ignore Regex... option contains the regex (,[^,\r\n]+?){3}$, the comparison ignores the last three fields, in each file :

              • So, the lines 3, 15 and 1, 7, 11, 13, only, are changed between the two files

            • When the Ignore Regex... option contains the regex ^([^,\r\n]+?,){2}|(,[^,\r\n]+?){3}$, the comparison ignores the first two fields OR the last three fields, in each file. Thus, the comparison takes into account everything which is not the first two AND not the last three :

              • So, the lines 1, 7, 11 and 13, only, are changed between the two files

            • Now, if we would like to ignore the middle four fields, in each file :

              • We cannot use look-behinds, both because the look-behind would have a non-fixed length and because look-behinds are not supported by the present C++ ECMAScript regex library of the ComparePlus plugin

              • We cannot use the \K feature either. Actually, the regex ^([^,\r\n]+?,){2}\K([^,\r\n]+?,){4} does not work at all and is simply equivalent to the default comparison, without any Ignore Regex... option !
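For anyone who wants to replay the first four results outside Notepad++, here is a rough Python sketch. It rests on two assumptions that are not confirmed anywhere in this thread: that the plugin behaves as if every match of the ignore regex were deleted from each line before the two lines are compared, and that Python's `re` module is an acceptable stand-in for the C++ ECMAScript engine for these particular patterns. The helper name `changed_lines` is made up for the example.

```python
import re

# The two test files from the post, without the blank separator lines.
TEST_1 = [
    "abc,123,456,def,fgh,ijk,789,xyz,012",
    "abc,123,456,def,fgh,ijk,789,xyz,012",
    "abc,123,456,def,fgh,ijk,789,xyz,012",
    "xyz,123,456,def,fgh,ijk,789,xyz,012",
    "abc,123,456,def,fgh,ijk,789,xyz,012",
    "xyz,123,456,def,fgh,ijk,789,xyz,012",
    "abc,123,456,def,fgh,ijk,789,xyz,999",
    "xyz,123,456,def,fgh,ijk,789,xyz,999",
]
TEST_2 = [
    "abc,123,456,fgh,def,ijk,789,xyz,012",
    "xyz,123,456,def,fgh,ijk,789,xyz,012",
    "abc,123,456,def,fgh,ijk,789,xyz,999",
    "xyz,123,000,def,fgh,ijk,789,xyz,012",
    "abc,123,456,def,fgh,ijk,789,xyz,012",
    "xyz,123,def,456,fgh,ijk,789,xyz,012",
    "abc,123,456,def,ijk,fgh,789,xyz,999",
    "xyz,000,456,def,fgh,ijk,000,xyz,999",
]

FIRST_TWO = r"^([^,\r\n]+?,){2}"
LAST_THREE = r"(,[^,\r\n]+?){3}$"
FIRST_TWO_OR_LAST_THREE = FIRST_TWO + "|" + LAST_THREE


def changed_lines(ignore_regex=None):
    """Return the editor line numbers (1, 3, 5, ...) that still differ."""
    changed = []
    for index, (left, right) in enumerate(zip(TEST_1, TEST_2)):
        if ignore_regex is not None:
            # Assumption: ignored ranges are simply dropped before comparing.
            left = re.sub(ignore_regex, "", left)
            right = re.sub(ignore_regex, "", right)
        if left != right:
            # The blank lines occupy the even line numbers in the editor.
            changed.append(2 * index + 1)
    return changed


print(changed_lines())                         # [1, 3, 5, 7, 11, 13, 15]
print(changed_lines(FIRST_TWO))                # [1, 5, 7, 11, 13, 15]
print(changed_lines(LAST_THREE))               # [1, 3, 7, 11, 13, 15]
print(changed_lines(FIRST_TWO_OR_LAST_THREE))  # [1, 7, 11, 13]
```

Under those assumptions the four lists agree with the line numbers reported above. The \K regex is left out on purpose, because Python's `re` rejects \K as a bad escape, so it cannot tell us anything about the plugin's engine.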

            However, Pavel, there is still a solution which uses a look-ahead ;-))

            • When the Ignore Regex... option contains the regex (,[^,\r\n]+?){4}(?=(,[^,\r\n]+?){3}$), the comparison ignores the four fields, IF they are followed by the last three fields :

              • So, the lines 3, 5 and 15, only, are changed between the two files
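To see what this look-ahead regex leaves behind, here is a small Python sketch. It assumes, as a simplification not confirmed by the plugin author, that ignoring a range is equivalent to deleting the match before comparing, and it uses Python's `re` only as a stand-in for the plugin's C++ ECMAScript engine.

```python
import re

# Four fields (each with its leading comma), only if exactly three
# more fields follow up to the end of the line.
MIDDLE_FOUR = r"(,[^,\r\n]+?){4}(?=(,[^,\r\n]+?){3}$)"


def without_middle(line):
    """Drop fields 3 to 6 of a nine-field CSV line."""
    return re.sub(MIDDLE_FOUR, "", line)


# Line 1 of each file: the only difference is inside the middle fields.
print(without_middle("abc,123,456,def,fgh,ijk,789,xyz,012"))  # abc,123,789,xyz,012
print(without_middle("abc,123,456,fgh,def,ijk,789,xyz,012"))  # abc,123,789,xyz,012

# Line 3 of each file: the first field differs, so the lines still differ.
print(without_middle("xyz,123,456,def,fgh,ijk,789,xyz,012"))  # xyz,123,789,xyz,012
```

The look-ahead is what pins the match to fields 3 to 6: starting one field earlier would leave four fields before the end of the line instead of three, so that attempt fails and the engine moves on to the next comma.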

            You could say : what happens if we choose an Ignore Regex... which represents the totality of each line ? In this specific case, the range to compare becomes the empty range of each line !!

            • When the Ignore Regex... option contains the regex ^([^,\r\n]+?,){8}[^,\r\n]+$, the comparison ignores all line contents, in each file :

              • So, the dialog Files 'test_1.txt' and 'test_2.txt' match / Close compared files? appears ( Logical ! )
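A quick way to convince yourself of the "empty range" idea, again as a Python sketch that assumes ignored text is simply removed before the comparison (an assumption, not a statement about the plugin's internals):

```python
import re

WHOLE_LINE = r"^([^,\r\n]+?,){8}[^,\r\n]+$"

first = re.sub(WHOLE_LINE, "", "abc,123,456,def,fgh,ijk,789,xyz,012")
second = re.sub(WHOLE_LINE, "", "xyz,000,456,def,fgh,ijk,000,xyz,999")

# Both lines shrink to the empty string, so they compare as equal.
print(repr(first), repr(second), first == second)  # '' '' True
```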

            As you can see, Pavel, no need to worry : you’ll probably never have to change the regex engine, within the ComparePlus plugin ! There is always a valid regex solution to use

            Best Regards,

            guy038

            P.S. :

            • When the Ignore Regex... option contains the regex ^([^,\r\n]+?,){8}[^,\r\n]+\R, it does not work either and gives the default compare results ( without the Ignore Regex... option ). The \R syntax seems to be unsupported, too

            • I also noticed that, when you type in a non-valid regex in the ComparePlus Ignore regex window, you won’t get any error message AND the default comparison process is run, although the Ignore Regex... option remains checked !

            • pnedev @guy038

              @guy038 ,

              Thank you very much for your excellent and thorough analysis and feedback, it is much appreciated.

              @guy038 said in Regex tests of the build 618 of the 'ComparePlus' plugin:

              So, my idea, about the behavior of the Ignore Regex… option, is : …

              Exactly what I meant as improvement based on your previous post with one exception:
              Why on Disable should we remember the entered regex?
              I thought I should disregard the entered regex in that case although it doesn’t really matter. It just seems counter-intuitive to me.
              If people prefer it that way I’m OK with it.

              Thank you for the info regarding the different regex engines. I’ll keep the standard C++ library ECMAScript engine then. If in the future a need for a more sophisticated engine (Boost for example) arises, then I’ll consider implementing it.

              About the excellent CSV test example… honestly your regex entries look like magic to me :) I really don’t have enough knowledge in that field. What I can say is that I’m really glad that with enough know-how one could have so many possibilities.
              And you are definitely a virtuoso!

              @guy038 said in Regex tests of the build 618 of the 'ComparePlus' plugin:

              I also noticed that, when you type in a non-valid regex in the ComparePlus Ignore regex window, you won’t get any error message AND the default comparison process is run, although the Ignore Regex… option remains checked !

              That is something I should look into, but since the regex entry is not validated prior to the comparison itself, the behavior now is what it is. I’ll see what I can do, thanks.

              BR

              P.S. I thought it would be good to mention here that the Ignore Regex is implemented on a line-by-line basis. I mention it because I saw in your regex entries the ‘\r\n’ sequence, which reminds me of a line-end check.
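A small Python sketch of what this remark implies. It assumes that each line reaches the regex without its end-of-line characters, which the post suggests but does not spell out, and it uses Python's `re` as a stand-in engine. On a single line the \r\n guard inside the character class changes nothing; it only matters when a regex runs over a whole multi-line text.

```python
import re

WITH_EOL_GUARD = r"^([^,\r\n]+?,){2}"
WITHOUT_EOL_GUARD = r"^([^,]+?,){2}"

# Line-by-line use: both patterns remove the same first two fields.
line = "abc,123,456,def,fgh,ijk,789,xyz,012"
print(re.sub(WITH_EOL_GUARD, "", line))     # 456,def,fgh,ijk,789,xyz,012
print(re.sub(WITHOUT_EOL_GUARD, "", line))  # 456,def,fgh,ijk,789,xyz,012

# Whole-text use: without the guard, a "field" can swallow a line break.
whole_text = "abc\r\n123,456,def"
print(re.search(WITH_EOL_GUARD, whole_text))                    # None
print(repr(re.search(WITHOUT_EOL_GUARD, whole_text).group(0)))  # 'abc\r\n123,456,'
```

So the guard is harmless in the plugin and simply a habit carried over from searching whole documents.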

              • guy038

                Hi, @pnedev and All,

                In my last post, I said , at the end :

                • I also noticed that, when you type in a non-valid regex in the ComparePlus Ignore regex window, you won’t get any error message AND the default comparison process is run, although the Ignore Regex... option remains checked !

                I was totally wrong about it :-((. I did additional tests and, for instance :

                • The valid regex ^([^,\r\n]+?,){8}[^,\r\n]+\R, matching all line contents, leads to the dialog Files 'test_1.txt' and 'test_2.txt' match / Close compared files? ( Logical )

                • The invalid regex ^(([^,\r\n]+?,){8}[^,\r\n]+\R, containing one extra opening parenthesis near the beginning of the regex, is correctly detected and outputs the error window :

                PluginManager:errorPluginCommand Exception
                regex_error(error_paren): The expression contains mismatched ( and ).
                
                • But the valid regex ^0([^,\r\n]+?,){8}[^,\r\n]+\R, which cannot be found, in any line of Test_1.txt and Test_2.txt, of course, means that the default comparison is run, with the Ignore Regex... option still checked ! It’s the normal behavior and we cannot do anything about it ;-))
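The difference between an invalid regex and a valid regex that never matches can be sketched in Python as follows. This is only an illustration: the `validate` helper is hypothetical, Python's `re` stands in for the plugin's C++ engine, and the patterns end with $ instead of \R because Python's `re` does not accept \R.

```python
import re


def validate(pattern):
    """Return None if the pattern compiles, otherwise the error text."""
    try:
        re.compile(pattern)
    except re.error as exc:
        return str(exc)
    return None


LINE = "abc,123,456,def,fgh,ijk,789,xyz,012"

# One extra opening parenthesis: rejected before any comparison happens.
print(validate(r"^(([^,\r\n]+?,){8}[^,\r\n]+$") is not None)  # True

# Syntactically valid, but no line starts with 0: nothing gets ignored,
# so the result is the same as the default comparison.
NEVER_MATCHES = r"^0([^,\r\n]+?,){8}[^,\r\n]+$"
print(validate(NEVER_MATCHES) is None)           # True
print(re.sub(NEVER_MATCHES, "", LINE) == LINE)   # True
```

Only the first case can be reported as an error. The second one is a perfectly legal regex, so a tool has no way to know that the user expected it to match.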

                Best Regards,

                guy038

                • pnedev @guy038

                  @guy038 ,

                  Thanks for the clarification.
                  I’ll do some changes and write back.

                  BR

                  • pnedev

                    Hello @guy038 ,

                    Could you please briefly try build https://ci.appveyor.com/project/pnedev/compare-plugin/builds/44268759 ?

                    It has implemented the Enable / Disable behavior we discussed (on Disable the entered regex value is not saved because it doesn’t seem intuitive to me) and also checks the validity of the regex on Enable .

                    Thank you.

                    BR

                    • guy038

                      Hello, @pnedev and All,

                      Wonderful !! It works fine ;-))

                      So, given my previous example with the two CSV test files, for instance :

                      • I click on the Ignore Regex... option, not checked, => I get the ComparePlus Ignore Regex window which is empty

                      • I type in the regex ^([^,\r\n]+?,){2} which should ignore the first two fields of these CSV files

                      • I click on the Enable button => The ComparePlus Ignore Regex window disappears, and the Ignore Regex... option is now checked

                      • I run the comparison and, as expected, no orange highlighting can be observed in the first two fields of each file

                      Then :

                      • I click again on the Ignore Regex... option, which is checked => I get the ComparePlus Ignore Regex window, which kept the regex

                      • I click on the Disable button => The ComparePlus Ignore Regex window disappears, and the Ignore Regex... option is again not checked

                      • I run the comparison and, as expected, I get the default comparison process ( with the Ignore Regex disabled ). Refer to line 3 in each file !


                      These two successive lists of operations show that we can easily compare the result of any Ignore Regex with the default comparison case !

                      The nice thing is that, from one call of the Ignore Regex... option to another call, you keep the current regex typed, which makes any regex modification easy with a further click on the Enable button

                      That’s what I meant when I wrongly spoke, in a previous post, of keeping the regex valid ! I wanted to say that the regex should stay in the entry field, in all cases !


                      As a summary, I would say that your new Ignore Regex... option is, from now on, fully functional, and will certainly help a lot of users ;-))

                      Best Regards,

                      guy038

                      • pnedev @guy038

                        @guy038 ,

                        Thanks again for the feedback and for the help with the regex functionality.

                        BR
