Regex tests of the build 618 of the 'ComparePlus' plugin
-
Hello, @pnedev and All,

Pavel, I downloaded your penultimate build 618 of the ComparePlus plugin in order to test the new Ignore Regex... option.

You decided that this option should belong to the main menu. I’m sorry to tell you that it’s not very practical to use :-( Indeed, if you have already chosen a regex, you need to:
- First, uncheck this option
- Second, check the option again to get the pop-up window and, then, modify the regex!
So, I think that it should be implemented as AlexVerschoot did in his build 506, i.e. this option is included in the Settings dialog, where you may check or uncheck it and easily create or modify the regex! https://github.com/pnedev/compare-plugin/pull/230
Note that the x64 version of the old build 506 does not work properly with the latest versions of N++, such as the v8.4.4 x64 release. However, with N++ v8.1.9.2 x64, the build 506 x64 of the ComparePlus plugin seems functional!
There’s a main difference, regarding the regex behavior, between you and Alex: you both use opposite logic!! He refers to the parts to consider in each line, whereas you refer to the part to ignore in each line.
I noticed that you both implicitly use the (?-is) modifiers (case-sensitive matching, and . does not match line breaks) and that you must not insert these modifiers at the beginning of the regex, else an error occurs! Now, let’s slightly simplify AlexVerschoot’s regex as X[0-9]?[0-9], initially given at: https://github.com/pnedev/compare-plugin/pull/230

The EQUIVALENT ignore regex, in your 618 build, should be ^.+(?=X)|^.+ (the first alternative drops everything before the last X of a line; otherwise, the second alternative drops the whole line). I tried this regex with success. However, in this example, it’s obvious that AlexVerschoot’s regex, focusing on the parts to compare, is easier to build than your ignore regex, focusing on the parts to ignore! To test it, here are the two texts to compare:
- Test_2.txt

    N001 X0
    N002 Y12
    N003 X8
    N004 Z8

- Test_1.txt

    N001 X0
    N002 Y12
    N003 Y8
    N004 X6
    N005 Z8

The two tabs are ordered Test_2.txt, then Test_1.txt, the latter being the current file right before the comparison process.
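To check the claimed equivalence outside the plugin, here is a small C++ sketch using std::regex with its default ECMAScript grammar. The helper names are hypothetical, and it assumes (not confirmed for either build) that "match" logic concatenates every match on a line while "ignore" logic simply deletes every match.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper modelling the "match" logic of Alex's build 506:
// only the text matched by the regex takes part in the comparison.
// Assumption: when a line holds several matches, they are concatenated.
std::string keptByMatchRegex(const std::string& line, const std::regex& re)
{
    std::string kept;
    for (std::sregex_iterator it(line.begin(), line.end(), re), end; it != end; ++it)
        kept += it->str();
    return kept;
}

// Hypothetical helper modelling the "ignore" logic of build 618:
// every match is removed and the rest of the line is compared.
std::string keptByIgnoreRegex(const std::string& line, const std::regex& re)
{
    return std::regex_replace(line, re, "");
}

// Both regexes use std::regex's default grammar, ECMAScript.
const std::regex alexMatch("X[0-9]?[0-9]");
const std::regex guyIgnore("^.+(?=X)|^.+");
```

On every line of the two test files, both helpers leave the same text, e.g. "X8" for "N003 X8" and an empty string for "N002 Y12". The equivalence is only claimed for these lines; a line starting with X, for instance, would be dropped entirely by the ignore regex.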
All these elements made me think about a nice improvement that you could easily manage:

- You would get rid of the Ignore Regex... option in the main menu

AND

- In the Settings dialog, you would add these three lines:
    - ☐ Enable regex comparison
    - ◎ Match ◎ Ignore
    - The input field, to type in the regex

Following this way, you would get the best of both:

- If the Match box is ticked, the regex would refer to the parts to consider, during comparison, for each line (Alex’s behavior)
- If the Ignore box is ticked, the regex would refer to the part to ignore, during comparison, for each line (your behavior)

As the two ◎ options are mutually exclusive, I suppose that it should not be a problem to code?
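Regarding the coding question above, two mutually exclusive ◎ options map naturally onto an enum. This C++ sketch (all names hypothetical, not ComparePlus code) stores the three proposed settings and returns the part of a line that would be compared; it assumes Ignore mode deletes every match and Match mode concatenates every match.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical settings for the proposed Settings dialog lines:
// "Enable regex comparison", "Match / Ignore" and the regex input field.
// An enum holds exactly one value, so Match and Ignore cannot both be set.
enum class RegexMode { Match, Ignore };

struct RegexCompareSetting {
    bool enabled = false;
    RegexMode mode = RegexMode::Ignore;
    std::string pattern;
};

// Returns the part of a line that would take part in the comparison.
std::string partToCompare(const std::string& line, const RegexCompareSetting& s)
{
    if (!s.enabled || s.pattern.empty())
        return line;                              // default: compare the whole line
    const std::regex re(s.pattern);               // ECMAScript grammar by default
    if (s.mode == RegexMode::Ignore)
        return std::regex_replace(line, re, "");  // drop every match
    std::string kept;                             // Match: keep only the matches
    for (std::sregex_iterator it(line.begin(), line.end(), re), end; it != end; ++it)
        kept += it->str();
    return kept;
}
```

With mode = Match and pattern X[0-9]?[0-9], or mode = Ignore and pattern ^.+(?=X)|^.+, the line "N004 X6" yields "X6" in both cases. A real implementation would compile the regex once per comparison rather than once per line.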
Of course, these are just suggestions! Good luck, Pavel, for your final development time!
Best Regards,
guy038
P.S. :
Pavel, in the ComparePlus sub-folder, I currently have these two DLLs:
- libgit2.dll (v0.24.3.0)
- sqlite3.dll (v3.15.0.0)

Are these versions still OK, or is an upgrade necessary?
Notepad++ v8.4.4 (64-bit)
Build time : Jul 15 2022 - 17:54:42
Path : E:\844_x64\notepad++.exe
Command Line :
Admin mode : OFF
Local Conf mode : ON
Cloud Config : OFF
OS Name : Windows 10 Pro (64-bit)
OS Version : 21H2
OS Build : 19044.1826
Current ANSI codepage : 1252
Plugins : mimeTools (2.8) NppConverter (4.4) NppExport (0.4) ComparePlus (1)
-
Hello @guy038,

Thank you very much for your feedback on the new Ignore regex... functionality and for the extensive analysis, “Guy’s style” ;)

I highly appreciate your suggestions and deep knowledge in the field of regular expressions.

You are completely right, of course, about the opposite logic between my implementation and Alex’s. This is also what I mentioned in my reply to his Compare plugin’s PR.
The reasoning behind my choice to implement Ignore regex instead of Match regex is to keep the consistency in the plugin’s menu (with all the different Ignore options). Besides, IMO, in most “regex” cases the ignored part should be “smaller” and simpler: for example, filtering the timestamps at the beginning of log-file lines, or some line numbers, or dropping some starting columns from CSV files. I think this should be a much more common case than other, more specific ones, but I might as well be wrong. Do you think such ignore regexes would be difficult to compose?

I would really like to keep Ignore regex... in the menu and not move it to the Settings... dialog, partly because of the consistency in the Ignore menu, and also because this possibility should be more easily visible and accessible (and not “polluted” by other, rarely changed settings). It is more of a ‘functional’ switch than the more permanent Settings.
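To give an idea of how short such ignore regexes can be, here is a C++ sketch using std::regex: one pattern drops a leading timestamp from log lines, the other drops the first two CSV columns. The timestamp format, the sample lines and the helper name sameAfterIgnoring are hypothetical, and the sketch assumes every match is simply deleted before the lines are compared.

```cpp
#include <cassert>
#include <regex>
#include <string>

// True when two lines are equal once every match of `ignore` is removed
// (assumption: this mimics how an ignore regex is applied to each line).
bool sameAfterIgnoring(const std::string& a, const std::string& b, const std::regex& ignore)
{
    return std::regex_replace(a, ignore, "") == std::regex_replace(b, ignore, "");
}

// Hypothetical "YYYY-MM-DD hh:mm:ss " prefix at the start of each log line.
const std::regex leadingTimestamp(R"(^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} )");
// First two columns of a CSV line, commas included.
const std::regex firstTwoColumns("^([^,]*,){2}");
```

For example, "2022-07-15 17:54:42 Plugin loaded" and "2022-07-16 08:01:05 Plugin loaded" compare as equal with leadingTimestamp, while a different message after the timestamp still shows up as a change.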
On the other hand, I could easily change the behavior so that, on every Ignore regex click, the regex edit dialog appears and the option is disabled if you click Cancel, or the dialog will also have an enable switch. But again, the regex shouldn’t need changing so often that this becomes really cumbersome.
Please tell me what you think about that.

I will not have the time to implement your suggestion of an Ignore vs. Match regex setting, so I’ll keep it only as Ignore for now. Maybe in future versions of ComparePlus I will add such a setting.

I would also like to ask you, as a regex expert, which regex type is better to use in ComparePlus. The C++ library variants are ECMAScript (the currently used one), Basic POSIX, Extended POSIX, Awk, Grep or Egrep.
Thank you once again.
BR
-
@guy038 said in Regex tests of the build 618 of the 'ComparePlus' plugin:
Are these versions still OK or an upgrade is necessary ?
Sorry, I forgot about that question.
Those are OK to use, no problem, but I would advise you to use the latest ones from the current ComparePlus dev.

BR
-
Hi, @pnedev,
I’ve already had a quick overview of the questions / proposals in your post. I’ll go on tomorrow and answer you shortly!
BR
guy038
-
Hello Pavel and Guy,
@guy038 said in Regex tests of the build 618 of the 'ComparePlus' plugin:
First, un-check this option
Secondly, check again the option to get the pop-up window and, then, modify the regex !
Good point.
@pnedev said in Regex tests of the build 618 of the 'ComparePlus' plugin:
On the other hand I could easily change the behavior so on every Ignore regex click the regex edit dialog appears and it is disabled if you click Cancel or it will have also an enable switch in the dialog.
This seems to be the best solution.
How about “Apply” instead of “OK” to enable?

Thank you.
-
Pavel, I now better see why you prefer to keep this option in the main menu. Indeed, it rather acts as a switch and correctly belongs to the section containing all the other Ignore options!

Now, if you’re going to do a comparison based on a regex, it’s very likely that you won’t get the right regex at the first try, isn’t it?
So, my idea about the behavior of the Ignore Regex... option is:

- Whether the Ignore Regex... option, in the main menu, is checked or not, a left mouse click on this option would always open the ComparePlus Ignore regex window
- That ComparePlus Ignore regex window would have two buttons, Enable and Disable:
    - If the Ignore Regex... option, in the main menu, is currently disabled (no check mark):
        - A left mouse click on the Enable button would validate the regex AND enable the Ignore Regex... option, with its check mark
        - A left mouse click on the Disable button would still validate the regex BUT would keep the Ignore Regex... option disabled, with no check mark
    - If the Ignore Regex... option, in the main menu, is currently enabled (check mark):
        - A left mouse click on the Enable button would validate the regex AND keep the Ignore Regex... option enabled, with its check mark
        - A left mouse click on the Disable button would still validate the regex BUT would disable the Ignore Regex... option, with no check mark
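The proposal above needs very little state. In this minimal C++ sketch (struct and method names are hypothetical, not ComparePlus code), whatever the current check mark, Enable always ends checked and Disable always ends unchecked, and both keep the typed regex, so the four cases collapse into two handlers.

```cpp
#include <cassert>
#include <string>

// Hypothetical model of the proposal: clicking "Ignore Regex..." always opens
// the dialog pre-filled with `regex`; the button pressed only decides the
// final check mark, whatever it was before, and the typed regex is kept.
struct IgnoreRegexOption {
    bool checked = false;  // check mark of "Ignore Regex..." in the main menu
    std::string regex;     // content of the "ComparePlus Ignore regex" window

    void onEnable(const std::string& typed)  { regex = typed; checked = true; }
    void onDisable(const std::string& typed) { regex = typed; checked = false; }
};
```

Keeping the regex in both handlers is what makes repeated trial-and-error cheap: the next click reopens the window with the last attempt already filled in.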
Regarding the regex engine to use with the ComparePlus plugin, I would say that the present C++ ECMAScript implementation seems the best of all those you listed. Indeed, all the others (Basic POSIX, Extended POSIX, Awk, Grep and Egrep), for instance, do not recognize the look-around feature!

However, the current C++ ECMAScript regex library is not the best one, either. For instance, it does not handle the look-behind feature! I tried your build 618 with the two regexes ^.+(?=X) and (?<=3).+ . The second regex, with the look-behind, didn’t work and gave the error window:

PluginManager:errorPluginCommand Exception regex_error(error_syntax)

To verify my assertion, refer to this site:
https://cplusplus.com/reference/regex/ECMAScript/
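The point above can be checked without the plugin: this minimal C++ sketch (hypothetical helper name) only asks std::regex, with the ECMAScript grammar, whether it accepts a pattern. The thread reports error_syntax for the look-behind; other standard library implementations may throw a std::regex_error with a different error code, so the sketch only tests whether an exception is thrown.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Returns true if std::regex (ECMAScript grammar) accepts the pattern.
bool compilesAsEcmaScript(const std::string& pattern)
{
    try {
        std::regex re(pattern, std::regex::ECMAScript);
        (void)re;  // only construction matters here
        return true;
    } catch (const std::regex_error&) {
        return false;
    }
}
```

Expected results, consistent with the tests reported in this thread: ^.+(?=X) (look-ahead) is accepted, while (?<=3).+ (look-behind) and a leading (?-is) (inline modifiers) are rejected.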
Now, Pavel, don’t be annoyed about it. Just wait and see whether some users need more regex features in order to create the appropriate ignore regex! Of course, later, why not use the powerful Boost regex library, already embedded in Notepad++ itself? A good one, as well, would be the .NET regex library. Refer to the Microsoft site:

But I do understand that changing the regex library within your plugin may not be so easy to implement! It’s up to you whether to go that way!
Now, regarding the way to compose an ignore regex, I finally don’t think that it would be very difficult! To prove this, here is an example with two small CSV files, each line containing nine fields:

- Test_1.txt

    abc,123,456,def,fgh,ijk,789,xyz,012

    abc,123,456,def,fgh,ijk,789,xyz,012

    abc,123,456,def,fgh,ijk,789,xyz,012

    xyz,123,456,def,fgh,ijk,789,xyz,012

    abc,123,456,def,fgh,ijk,789,xyz,012

    xyz,123,456,def,fgh,ijk,789,xyz,012

    abc,123,456,def,fgh,ijk,789,xyz,999

    xyz,123,456,def,fgh,ijk,789,xyz,999

- Test_2.txt

    abc,123,456,fgh,def,ijk,789,xyz,012

    xyz,123,456,def,fgh,ijk,789,xyz,012

    abc,123,456,def,fgh,ijk,789,xyz,999

    xyz,123,000,def,fgh,ijk,789,xyz,012

    abc,123,456,def,fgh,ijk,789,xyz,012

    xyz,123,def,456,fgh,ijk,789,xyz,012

    abc,123,456,def,ijk,fgh,789,xyz,999

    xyz,000,456,def,fgh,ijk,000,xyz,999
- First, I would like to mention that, if we delete all the blank lines in test_1.txt and test_2.txt, it’s really not easy to follow the comparison process, as it then reports added and removed lines as well as changed lines! Thanks to the blank lines, we only get changed lines!
- Secondly, for a clean view of all the changes, I disabled the Detect Moves option for this test

From this point, I did some tests with different regexes, typed in the Ignore Regex... option:
- By default, if the Ignore Regex... option is disabled, the lines are compared in full:
    - So, the lines 1, 3, 5, 7, 11, 13 and 15 are changed between the two files
- When the Ignore Regex... option contains the regex ^([^,\r\n]+?,){2}, the comparison ignores the first two fields in each file:
    - So, only the lines 1, 5, 7, 11, 13 and 15 are changed between the two files
- When the Ignore Regex... option contains the regex (,[^,\r\n]+?){3}$, the comparison ignores the last three fields in each file:
    - So, only the lines 1, 3, 7, 11, 13 and 15 are changed between the two files
- When the Ignore Regex... option contains the regex ^([^,\r\n]+?,){2}|(,[^,\r\n]+?){3}$, the comparison ignores the first two fields OR the last three fields in each file. Thus, the comparison takes into account everything which is neither among the first two fields NOR among the last three:
    - So, only the lines 1, 7, 11 and 13 are changed between the two files
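These results can be cross-checked outside Notepad++ with a small C++ sketch. It assumes (hypothetically) that ComparePlus deletes every match of the ignore regex from each line with std::regex (ECMAScript) and pairs the lines one to one, which seems reasonable here since the blank lines avoid added/removed lines and Detect Moves is off. The changedLines helper is illustrative only.

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// The eight records of each file. In the real files a blank line follows
// each record, so record i (0-based) sits on line 2*i + 1.
const std::vector<std::string> test1 = {
    "abc,123,456,def,fgh,ijk,789,xyz,012", "abc,123,456,def,fgh,ijk,789,xyz,012",
    "abc,123,456,def,fgh,ijk,789,xyz,012", "xyz,123,456,def,fgh,ijk,789,xyz,012",
    "abc,123,456,def,fgh,ijk,789,xyz,012", "xyz,123,456,def,fgh,ijk,789,xyz,012",
    "abc,123,456,def,fgh,ijk,789,xyz,999", "xyz,123,456,def,fgh,ijk,789,xyz,999"};
const std::vector<std::string> test2 = {
    "abc,123,456,fgh,def,ijk,789,xyz,012", "xyz,123,456,def,fgh,ijk,789,xyz,012",
    "abc,123,456,def,fgh,ijk,789,xyz,999", "xyz,123,000,def,fgh,ijk,789,xyz,012",
    "abc,123,456,def,fgh,ijk,789,xyz,012", "xyz,123,def,456,fgh,ijk,789,xyz,012",
    "abc,123,456,def,ijk,fgh,789,xyz,999", "xyz,000,456,def,fgh,ijk,000,xyz,999"};

// Hypothetical model of the comparison: every match of the ignore regex is
// deleted from both lines and what is left is compared.
// An empty pattern stands for "Ignore Regex... unchecked".
std::vector<int> changedLines(const std::string& ignorePattern)
{
    const bool useRegex = !ignorePattern.empty();
    std::regex re;
    if (useRegex)
        re.assign(ignorePattern);  // ECMAScript grammar by default
    auto compared = [&](const std::string& line) {
        return useRegex ? std::regex_replace(line, re, "") : line;
    };
    std::vector<int> changed;
    for (std::size_t i = 0; i < test1.size(); ++i)
        if (compared(test1[i]) != compared(test2[i]))
            changed.push_back(static_cast<int>(2 * i + 1));
    return changed;
}
```

Under these assumptions, changedLines returns {1, 3, 5, 7, 11, 13, 15} with no regex, {1, 5, 7, 11, 13, 15} for the first-two-fields regex, {1, 3, 7, 11, 13, 15} for the last-three-fields regex and {1, 7, 11, 13} for their alternation. Note that the alternation only works because every match on a line is removed, not just the first one.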
Now, what if we would like to ignore the middle four fields (fields 3 to 6) in each file?

- We cannot use look-behinds, both because the look-behind would have a non-fixed length and because it’s not allowed with the present C++ ECMAScript regex library of the ComparePlus plugin
- We cannot use the \K feature, either. Actually, the regex ^([^,\r\n]+?,){2}\K([^,\r\n]+?,){4} does not work at all and simply gives the same result as the default comparison, without any Ignore Regex... option!

However, Pavel, there is still a solution which uses a look-ahead ;-))
- When the Ignore Regex... option contains the regex (,[^,\r\n]+?){4}(?=(,[^,\r\n]+?){3}$), the comparison ignores four fields IF they are followed by the last three fields:
    - So, only the lines 3, 5 and 15 are changed between the two files
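To see why the look-ahead works where a look-behind cannot be used, this C++ sketch shows what the regex matches on one record of Test_1.txt. The search runs from the left, so the first comma from which four fields are followed by exactly three more fields and the end of the line is the comma before 456; the look-ahead checks those last three fields without consuming them. The struct IgnoreResult and the helper applyIgnore are hypothetical.

```cpp
#include <cassert>
#include <regex>
#include <string>

// What a single ignore regex removes from a line, and what is left to compare.
struct IgnoreResult {
    std::string ignored;   // first match (empty if none)
    std::string compared;  // line with every match removed
};

IgnoreResult applyIgnore(const std::string& line, const std::regex& re)
{
    std::smatch m;
    if (!std::regex_search(line, m, re))
        return {"", line};  // no match: the whole line is compared
    return {m.str(), std::regex_replace(line, re, "")};
}

// Four ",field" groups, accepted only if exactly three more follow up to the end.
const std::regex middleFour(R"re((,[^,\r\n]+?){4}(?=(,[^,\r\n]+?){3}$))re");
```

For "abc,123,456,def,fgh,ijk,789,xyz,012", the ignored part is ",456,def,fgh,ijk" and the compared text is "abc,123,789,xyz,012".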
You could ask: what happens if we choose an Ignore Regex... which represents the totality of each line? In this specific case, the range to compare becomes the empty range of each line!!

- When the Ignore Regex... option contains the regex ^([^,\r\n]+?,){8}[^,\r\n]+$, the comparison ignores all the line contents in each file:
    - So, the dialog Files 'test_1.txt' and 'test_2.txt' match / Close compared files? appears (Logical!)
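A quick way to confirm that this regex consumes every record entirely is std::regex_match, which only succeeds when the whole string matches. This is a minimal sketch with a hypothetical helper; blank lines are skipped because they are already empty.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// True when the ignore regex covers every non-empty line entirely, so only
// empty strings are left to compare and both files necessarily "match".
bool ignoresWholeLines(const std::vector<std::string>& lines, const std::regex& re)
{
    for (const std::string& line : lines)
        if (!line.empty() && !std::regex_match(line, re))
            return false;
    return true;
}

// Eight "field," groups followed by a ninth field up to the end of the line.
const std::regex allNineFields(R"re(^([^,\r\n]+?,){8}[^,\r\n]+$)re");
```

Any record with exactly nine non-empty fields is fully covered, while a shorter line such as "abc,123" is not.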
As you can see, Pavel, no need to worry: you’ll probably never have to change the regex engine within the ComparePlus plugin! There is always a valid regex solution to use.

Best Regards,
guy038
P.S. :
- When the Ignore Regex... option contains the regex ^([^,\r\n]+?,){8}[^,\r\n]+\R, it does not work either and gives the default compare results (as without the Ignore Regex... option). The \R syntax seems to be forbidden, too
- I also noticed that, when you type an invalid regex in the ComparePlus Ignore regex window, you don’t get any error message AND the default comparison process is run, although the Ignore Regex... option remains checked!
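Regarding the \R test in the first P.S. item above: assuming, as stated later in this thread, that the Ignore Regex works line by line, and assuming further that each line reaches the regex without its end-of-line characters, this C++ sketch shows the consequence. The \r\n inside a negated class such as [^,\r\n] is harmless, but a pattern that requires a line break after the last field can never match, so nothing is ignored. It uses \r?\n rather than \R, whose exact handling by std::regex is not covered here; the helper name is hypothetical.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Assumption: the ignore regex sees one line at a time, without its
// end-of-line characters.
bool matchesLineText(const std::string& lineWithoutEol, const std::string& pattern)
{
    return std::regex_search(lineWithoutEol, std::regex(pattern));
}
```

With the record "abc,123,456,def,fgh,ijk,789,xyz,012", the patterns ending in $ match (with or without \r\n in the classes), whereas the one ending in \r?\n does not.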
-
@guy038 ,
Thank you very much for your excellent and thorough analysis and feedback, it is much appreciated.
@guy038 said in Regex tests of the build 618 of the 'ComparePlus' plugin:
So, my idea, about the behavior of the Ignore Regex… option, is : …
Exactly what I meant as an improvement based on your previous post, with one exception:
Why, on Disable, should we remember the entered regex?
I thought I should disregard the entered regex in that case, although it doesn’t really matter. It just seems counter-intuitive to me.
If people prefer it that way, I’m OK with it.

Thank you for the info regarding the different regex engines. I’ll keep the standard C++ library ECMAScript then. If in the future a need for a more sophisticated engine (Boost, for example) arises, then I’ll consider implementing it.
About the excellent CSV test example... honestly, your regex entries look like magic to me :) I really don’t have enough knowledge in that field. What I can say is that I’m really glad that, with enough know-how, one could have so many possibilities.
And you are definitely a virtuoso!

@guy038 said in Regex tests of the build 618 of the 'ComparePlus' plugin:
I also noticed that, when you type in a non-valid regex in the ComparePlus Ignore regex window, you won’t get any error message AND the default comparison process is run, although the Ignore Regex… option remains checked !
That is something I should look into, but since the regex entry is not validated prior to the comparison itself, the behavior now is what it is. I’ll see what I can do, thanks.
BR
P.S. I thought it would be good to mention here that the Ignore Regex is implemented on a line-by-line basis. I’m ‘telling’ that because I saw in your regex entries the ‘\r\n’ sequence, which reminds me of a line-end check.
-
Hi, @pnedev and All,
In my last post, I said, at the end:

- I also noticed that, when you type an invalid regex in the ComparePlus Ignore regex window, you don’t get any error message AND the default comparison process is run, although the Ignore Regex... option remains checked!

I was totally wrong about it :-((. I did additional tests and, for instance:

- The valid regex ^([^,\r\n]+?,){8}[^,\r\n]+\R, matching all line contents, leads to the dialog Files 'test_1.txt' and 'test_2.txt' match / Close compared files? (Logical)
- The invalid regex ^(([^,\r\n]+?,){8}[^,\r\n]+\R, containing one more opening parenthesis near the beginning of the regex, is correctly detected and outputs the error window:
    PluginManager:errorPluginCommand Exception regex_error(error_paren): The expression contains mismatched ( and ).
- But the valid regex ^0([^,\r\n]+?,){8}[^,\r\n]+\R, which, of course, cannot be found in any line of Test_1.txt and Test_2.txt, means that the default comparison is run, with the Ignore Regex... option still checked! It’s the normal behavior and we cannot do anything about it ;-))
Best Regards,
guy038
-
Hello @guy038 ,
Could you please briefly try build https://ci.appveyor.com/project/pnedev/compare-plugin/builds/44268759 ?
It implements the Enable/Disable behavior we discussed (on Disable, the entered regex value is not saved, because that doesn’t seem intuitive to me) and also checks the validity of the regex on Enable.

Thank you.
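A minimal sketch of the behavior described for this build, in C++ with std::regex; the struct and function names are hypothetical and this is not the plugin’s actual code. Enable compiles the typed regex first and, on std::regex_error, returns the library’s message and leaves the state untouched; Disable only clears the check mark, so the text typed in the dialog is not saved.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical state behind the "Ignore Regex..." menu item.
struct IgnoreRegexState {
    bool checked = false;  // check mark in the main menu
    std::string regex;     // regex stored and shown again in the dialog
};

// Hypothetical "Enable" handler: compile the typed regex first. If it is
// invalid, return the library's message (the dialog could display it) and
// leave the state untouched; otherwise store it and check the menu item.
std::string onEnableClicked(IgnoreRegexState& state, const std::string& typed)
{
    try {
        const std::regex re(typed);  // ECMAScript grammar by default
        (void)re;                    // only the validity check matters here
    } catch (const std::regex_error& e) {
        return e.what();
    }
    state.regex = typed;
    state.checked = true;
    return {};
}

// Hypothetical "Disable" handler: uncheck the item without saving the text
// typed in the dialog; the previously stored regex is kept.
void onDisableClicked(IgnoreRegexState& state)
{
    state.checked = false;
}
```

Validating at Enable time means a typo such as an unbalanced parenthesis is reported immediately, instead of surfacing later as an exception during the comparison.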
BR
-
Hello, @pnedev and All,
Wonderful !! It works fine ;-))
So, given my previous example with the two CSV test files, for instance:

- I click on the Ignore Regex... option, not checked => I get the ComparePlus Ignore Regex window, which is empty
- I type in the regex ^([^,\r\n]+?,){2}, which should ignore the first two fields of these CSV files
- I click on the Enable button => The ComparePlus Ignore Regex window disappears, and the Ignore Regex... option is now checked
- I run the comparison and, as expected, no orange highlighting can be observed in the first two fields of each file

Then:

- I click again on the Ignore Regex... option, which is checked => I get the ComparePlus Ignore Regex window, which kept the regex
- I click on the Disable button => The ComparePlus Ignore Regex window disappears, and the Ignore Regex... option is unchecked again
- I run the comparison and, as expected, I get the default comparison process (with the Ignore Regex disabled). Refer to line 3 in each file!
These two successive lists of operations show that we can easily compare the result of any Ignore Regex with the default comparison case!

The nice thing is that, from one call of the Ignore Regex... option to another, you keep the current regex typed, making any regex modification easy with a further click on the Enable button.

That’s what I meant when I wrongly spoke, in a previous post, of keeping the regex valid! I wanted to say that the regex should stay in the entry field, in all cases!
As a summary, I would say that your new Ignore Regex... option is, from now on, fully functional, and will certainly help a lot of users ;-))

Best Regards,
guy038
-