Regex tests of the build 618 of the 'ComparePlus' plugin
-
@guy038 said in Regex tests of the build 618 of the 'ComparePlus' plugin:
Are these versions still OK or an upgrade is necessary ?
Sorry, forgot about that question.
Those are OK to use, no problem, but I would advise you to use the latest ones from the current ComparePlus dev.
BR
-
Hi, @pnedev,
I’ve already had a quick look at the questions / proposals in your post. I’ll go on tomorrow and answer you shortly!
BR
guy038
-
Hello Pavel and Guy,
@guy038 said in Regex tests of the build 618 of the 'ComparePlus' plugin:
First, un-check this option
Secondly, check again the option to get the pop-up window and, then, modify the regex !
Good point.
@pnedev said in Regex tests of the build 618 of the 'ComparePlus' plugin:
On the other hand I could easily change the behavior so on every Ignore regex click the regex edit dialog appears and it is disabled if you click Cancel or it will have also an enable switch in the dialog.
This seems to be the best solution.
How about “Apply” instead of “OK” to enable?
Thank you.
-
Pavel, I now better see why you prefer to keep this option in the main menu. Indeed, it acts as a switch and rightly belongs to the section containing all the other Ignore options!
Now, if you’re going to do a comparison based on a regex, it’s very likely that you won’t get the right regex on the first try, right?
So, my idea about the behavior of the Ignore Regex... option is:

- Whatever the state of the Ignore Regex... option in the main menu (checked or not), a left mouse click on this option would always open the ComparePlus Ignore regex window.

- That ComparePlus Ignore regex window would have two buttons, Enable and Disable:

  - If the Ignore Regex... option in the main menu is presently disabled (no check mark):
    - A left mouse click on the Enable button would validate the regex AND enable the Ignore Regex... option, with its check mark.
    - A left mouse click on the Disable button would still validate the regex BUT would keep the Ignore Regex... option disabled, with no check mark.

  - If the Ignore Regex... option in the main menu is presently enabled (check mark):
    - A left mouse click on the Enable button would validate the regex AND keep the Ignore Regex... option enabled, with its check mark.
    - A left mouse click on the Disable button would still validate the regex BUT would disable the Ignore Regex... option, with no check mark.
Regarding the regex engine to use with the ComparePlus plugin, I would say that the present C++ ECMAScript implementation seems the best of all the ones you provided. Indeed, all the others (Basic POSIX, Extended POSIX, Awk, Grep and Egrep) do not recognize look-arounds, for instance!

However, the current C++ ECMAScript regex library is not the best one either. For instance, it does not handle look-behinds! I tried your build 618 with the two regexes ^.+(?=X) and (?<=3).+ and the second one, with the look-behind, didn’t work and gave the error window:

PluginManager:error
PluginCommand Exception regex_error(error_syntax)

To verify my assertion, refer to this site:

https://cplusplus.com/reference/regex/ECMAScript/
Now, Pavel, don’t be annoyed about it. Just wait and see if some users need more regex features in order to create the appropriate ignore regex! Of course, later, why not use the powerful Boost regex library, already embedded in Notepad++ itself? A good one, as well, would be the .NET regex library. Refer to the Microsoft site:
But I do understand that changing the regex library within your plugin may not be so easy to implement! It’s up to you whether to go that way!
Now, regarding the way to compose an ignore regex, I don’t think, after all, that it would be very difficult! To prove this, here is an example with two small CSV files, whose lines contain nine fields, each data line being followed by a blank line:

- Test_1.txt

abc,123,456,def,fgh,ijk,789,xyz,012

abc,123,456,def,fgh,ijk,789,xyz,012

abc,123,456,def,fgh,ijk,789,xyz,012

xyz,123,456,def,fgh,ijk,789,xyz,012

abc,123,456,def,fgh,ijk,789,xyz,012

xyz,123,456,def,fgh,ijk,789,xyz,012

abc,123,456,def,fgh,ijk,789,xyz,999

xyz,123,456,def,fgh,ijk,789,xyz,999

- Test_2.txt

abc,123,456,fgh,def,ijk,789,xyz,012

xyz,123,456,def,fgh,ijk,789,xyz,012

abc,123,456,def,fgh,ijk,789,xyz,999

xyz,123,000,def,fgh,ijk,789,xyz,012

abc,123,456,def,fgh,ijk,789,xyz,012

xyz,123,def,456,fgh,ijk,789,xyz,012

abc,123,456,def,ijk,fgh,789,xyz,999

xyz,000,456,def,fgh,ijk,000,xyz,999
- First, I would like to mention that, if we delete all the blank lines in test_1.txt and test_2.txt, it’s really not easy to get an idea of the comparison process, as it then reports added and removed lines as well as changed lines! Thanks to the blank lines, we get only changed lines!
- Secondly, for a clean view of all the changes, I disabled the Detect Moves option for this test.
From this point, I did some tests with different regexes, typed in the Ignore Regex... option:

By default, if the Ignore Regex... option is disabled, the lines are compared in full. So, lines 1, 3, 5, 7, 11, 13 and 15 are changed between the two files.
When the Ignore Regex... option contains the regex:

^([^,\r\n]+?,){2}

the comparison ignores the first two fields in each file. So, only lines 1, 5, 7, 11, 13 and 15 are changed between the two files.
When the Ignore Regex... option contains the regex:

(,[^,\r\n]+?){3}$

the comparison ignores the last three fields in each file. So, only lines 1, 3, 7, 11, 13 and 15 are changed between the two files.
When the Ignore Regex... option contains the regex:

^([^,\r\n]+?,){2}|(,[^,\r\n]+?){3}$

the comparison ignores the first two fields OR the last three fields in each file. Thus, the comparison takes into account everything which is neither in the first two fields nor in the last three. So, only lines 1, 7, 11 and 13 are changed between the two files.
Now, if we would like to ignore the middle four fields in each file:

- We cannot use look-behinds, both because the look-behind would have a non-fixed length and because look-behinds are not allowed with the present C++ ECMAScript regex library of the ComparePlus plugin.
- We cannot use the \K feature either. Actually, the regex ^([^,\r\n]+?,){2}\K([^,\r\n]+?,){4} does not work at all and is simply equivalent to the default comparison, without any Ignore Regex... option!

However, Pavel, there is still a solution which uses a look-ahead ;-))
When the Ignore Regex... option contains the regex:

(,[^,\r\n]+?){4}(?=(,[^,\r\n]+?){3}$)

the comparison ignores four fields IF they are followed by the last three fields, that is, the middle four fields. So, only lines 3, 5 and 15 are changed between the two files.
You could ask: what happens if we choose an Ignore Regex... which matches the whole of each line? In this specific case, the range to compare becomes the empty range of each line!!

When the Ignore Regex... option contains the regex:

^([^,\r\n]+?,){8}[^,\r\n]+$

the comparison ignores all line contents in each file. So, the dialog "Files 'test_1.txt' and 'test_2.txt' match / Close compared files?" appears (logical!).
As you can see, Pavel, no need to worry: you’ll probably never have to change the regex engine within the ComparePlus plugin! There is always a valid regex solution to use.
Best Regards,
guy038
P.S.:

- When the Ignore Regex... option contains the regex ^([^,\r\n]+?,){8}[^,\r\n]+\R it does not work either, and gives the default compare results (as without the Ignore Regex... option). The \R syntax seems forbidden, too.
- I also noticed that, when you type an invalid regex in the ComparePlus Ignore regex window, you don’t get any error message AND the default comparison process is run, although the Ignore Regex... option remains checked!
-
@guy038,
Thank you very much for your excellent and thorough analysis and feedback, it is much appreciated.
@guy038 said in Regex tests of the build 618 of the 'ComparePlus' plugin:
So, my idea, about the behavior of the Ignore Regex… option, is : …
This is exactly the improvement I meant, based on your previous post, with one exception:
Why, on Disable, should we remember the entered regex?
I thought I should disregard the entered regex in that case, although it doesn’t really matter. It just seems counter-intuitive to me.
If people prefer it that way, I’m OK with it.
Thank you for the info regarding the different regex engines. I’ll keep the standard C++ library ECMAScript engine then. If in the future a need for a more sophisticated engine (Boost, for example) arises, I’ll consider implementing it.
About the excellent CSV test example... honestly, your regex entries look like magic to me :) I really don’t have enough knowledge in that field. What I can say is that I’m really glad that, with enough know-how, one can have so many possibilities.
And you are definitely a virtuoso!
@guy038 said in Regex tests of the build 618 of the 'ComparePlus' plugin:
I also noticed that, when you type in a non-valid regex in the ComparePlus Ignore regex window, you won’t get any error message AND the default comparison process is run, although the Ignore Regex… option remains checked !
That is something I should look into, but since the regex entry is not validated prior to the comparison itself, the behavior for now is what it is. I’ll see what I can do, thanks.
BR
P.S. I thought it would be good to mention here that Ignore Regex is implemented on a line-by-line basis. I’m ‘telling’ you that because I saw the ‘\r\n’ sequence in your regex entries, which reminds me of a line-end check.
-
Hi, @pnedev and All,
In my last post, I said, at the end:
- I also noticed that, when you type an invalid regex in the ComparePlus Ignore regex window, you won’t get any error message AND the default comparison process is run, although the Ignore Regex... option remains checked!
I was totally wrong about it :-((. I did additional tests and, for instance:
- The valid regex ^([^,\r\n]+?,){8}[^,\r\n]+\R which matches all line contents, leads to the dialog "Files 'test_1.txt' and 'test_2.txt' match / Close compared files?" (logical).
- The invalid regex ^(([^,\r\n]+?,){8}[^,\r\n]+\R (one more opening parenthesis near the beginning) is correctly detected and outputs the error window:
  PluginManager:error
  PluginCommand Exception regex_error(error_paren): The expression contains mismatched ( and ).
- But the valid regex ^0([^,\r\n]+?,){8}[^,\r\n]+\R which, of course, cannot be found in any line of Test_1.txt and Test_2.txt, means that the default comparison is run, with the Ignore Regex... option still checked! It’s the normal behavior and we cannot do anything about it ;-))
Best Regards,
guy038
-
Hello @guy038,
Could you please briefly try build https://ci.appveyor.com/project/pnedev/compare-plugin/builds/44268759 ?
It implements the Enable / Disable behavior we discussed (on Disable, the entered regex value is not saved because that doesn’t seem intuitive to me) and also checks the validity of the regex on Enable.
Thank you.
BR
-
Hello, @pnedev and All,
Wonderful !! It works fine ;-))
So, given my previous example with the two CSV test files, for instance:

- I click on the Ignore Regex... option, not checked => I get the ComparePlus Ignore Regex window, which is empty.
- I type in the regex ^([^,\r\n]+?,){2} which should ignore the first two fields of these CSV files.
- I click on the Enable button => The ComparePlus Ignore Regex window disappears, and the Ignore Regex... option is now checked.
- I run the comparison and, as expected, no orange highlighting can be observed in the first two fields of each file.
Then:

- I click again on the Ignore Regex... option, which is checked => I get the ComparePlus Ignore Regex window, which has kept the regex.
- I click on the Disable button => The ComparePlus Ignore Regex window disappears, and the Ignore Regex... option is unchecked again.
- I run the comparison and, as expected, I get the default comparison process (with Ignore Regex disabled). Refer to line 3 in each file!
These two successive lists of operations show that we can easily compare the result of any Ignore Regex with the default comparison case! The nice thing is that, from one call of the Ignore Regex... option to the next, the currently typed regex is kept, making any regex modification easy with a further click on the Enable button.
That’s what I meant when I wrongly spoke, in a previous post, of keeping the regex valid! I wanted to say that the regex should stay in the entry field in all cases!
As a summary, I would say that your new Ignore Regex... option is, from now on, fully functional, and will certainly help a lot of users ;-))
Best Regards,
guy038
-