regex replace performance regression
-
Hello @guy038, the file is available on GitHub: https://github.com/notepad-plus-plus/notepad-plus-plus/issues/10860
Another user hasn’t reproduced my issue, so I’m wondering whether my issue is real. I tested with the portable version, of course, but Local Conf mode was OFF, so it may be linked to my configuration. The regexp seems silly, but it’s just for the test!
-
Hello, @cmeriaux and All,
I used a file which contains your initial file five times, so 1 empty line at the beginning + 47,520 lines (5 * 9,504). I used this protocol:
- A recent Win 10 laptop with an SSD, connected to the power supply, and a cell phone for timing!
- Tests with both the x32 and x64 versions, in both User and Administrator modes
- Tested N++ versions: v7.9.2, v7.9.5, v8.1.5 and v8.1.9.2
- I did at least 3 tries for each case!
Each time, the Replace All action changed 290,640 occurrences.
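For anyone wanting to rebuild a similar test file, here is a minimal Python sketch. It assumes the original 9,504-line file is plain UTF-8 text with LF line endings; the helper `build_test_text` and the stand-in content are hypothetical. It puts one empty line first, then five copies of the file, i.e. the 1 + 47,520 lines described above.

```python
def build_test_text(source_text: str, copies: int = 5) -> str:
    """Return one empty line followed by `copies` copies of the lines of `source_text`."""
    lines = source_text.splitlines()
    out = [""] + lines * copies  # leading empty line, then the repeated file
    return "\n".join(out) + "\n"

# Hypothetical stand-in for the original 9,504-line file
original = "".join(f"line {i}\n" for i in range(1, 9505))
test_text = build_test_text(original)
print(len(test_text.splitlines()))  # 47521 (1 empty line + 5 * 9,504)
```

The result can then be saved with `pathlib.Path("test.txt").write_text(test_text, encoding="utf-8")` and opened in Notepad++.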
Practically:
- I used the Regular expression mode and the Wrap around option
- I left the Match case option unticked
- Ctrl + Home (back to the first, empty, line)
- Ctrl + H (Replace dialog)
- Alt + A (Replace All operation) + start timing
After the results:
- Esc to close the Replace dialog
- Ctrl + Z to undo the results
and so on…
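The protocol above (Replace All, note the time, undo, repeat at least 3 times) can be mimicked outside Notepad++ with a small Python timing loop. This is only a sketch: it uses Python's `re` module, not the regex engine Notepad++ uses, so its times are not comparable with the tables below, and the pattern/replacement in the usage line are placeholders, since the regex of the original issue isn't repeated in this thread. The function name `time_replace_all` is hypothetical. Each try starts from the untouched text, which plays the role of Ctrl + Z.

```python
import re
import time

def time_replace_all(text: str, pattern: str, repl: str, tries: int = 3) -> tuple[list[float], int]:
    """Run a regex "Replace All" on `text` `tries` times and time each run.

    Every try starts again from the original, unmodified `text` (the Ctrl + Z step),
    so all tries do exactly the same work. Returns the timings (in seconds)
    and the number of replacements made by one run.
    """
    regex = re.compile(pattern)
    timings: list[float] = []
    count = 0
    for _ in range(tries):
        start = time.perf_counter()
        _, count = regex.subn(repl, text)  # the "Replace All" step
        timings.append(time.perf_counter() - start)
    return timings, count

# Placeholder pattern/replacement, not the ones from the original issue
timings, count = time_replace_all("some sample text\n" * 1000, r"\w", "X")
print(count, f"best of {len(timings)}: {min(timings):.4f} s")
```

Keeping the best (or median) of several tries, as done manually above, reduces the effect of background activity on a single measurement.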
I got this table:
•===============•=============•==========•=================•==========•=================•====================•
|    Archi-     |   Version   |         User Mode          |     Administrator Mode     |                    |
|               |             |----------•-----------------•----------•-----------------•--------------------•
|    tecture    |  Notepad++  |   Time   |  Ratio x32/x64  |   Time   |  Ratio x32/x64  |  Ratio User/Admin  |
•===============•=============•==========•=================•==========•=================•====================•
|  Win XP x32   |    7.9.2    |  14.4 s  |       -/-       |   -/-    |       -/-       |        -/-         |
•===============•=============•==========•=================•==========•=================•====================•
|  Win 10 x32   |    7.9.2    |  17.0 s  |                 |  17.0 s  |                 |        1.00        |
•---------------•-------------•----------•      2.58       •----------•      2.58       •--------------------•
|  Win 10 x64   |    7.9.2    |   6.6 s  |                 |   6.6 s  |                 |        1.00        |
•===============•=============•==========•=================•==========•=================•====================•
|  Win 10 x32   |    7.9.5    |  16.5 s  |                 |  16.4 s  |                 |        1.00        |
•---------------•-------------•----------•      2.46       •----------•      2.48       •--------------------•
|  Win 10 x64   |    7.9.5    |   6.7 s  |                 |   6.6 s  |                 |        1.02        |
•===============•=============•==========•=================•==========•=================•====================•
|  Win 10 x32   |    8.1.5    |  16.7 s  |                 |  16.65 s |                 |        1.00        |
•---------------•-------------•----------•      2.49       •----------•      2.50       •--------------------•
|  Win 10 x64   |    8.1.5    |   6.7 s  |                 |  6.65 s  |                 |        1.01        |
•===============•=============•==========•=================•==========•=================•====================•
|  Win 10 x32   |   8.1.9.2   |  16.9 s  |                 |  16.85 s |                 |        1.00        |
•---------------•-------------•----------•      2.52       •----------•      2.53       •--------------------•
|  Win 10 x64   |   8.1.9.2   |   6.7 s  |                 |  6.65 s  |                 |        1.01        |
•===============•=============•==========•=================•==========•=================•====================•
Note that, for fun, I added the Win XP x32 case, with 7.9.2, which is the last available version running on Win XP. Not too bad, is it? (For an old 1.70 GHz single-core machine with 1 GB of memory!)
Interpretation of the results:
- First, there is no significant difference between User and Admin mode!
- Secondly, each x64 version is, overall, about 2.5 times faster than its corresponding x32 one!
- Thirdly, the small differences among the x32 versions, on the one hand, and among the x64 versions, on the other hand, are not significant and simply reflect measurement uncertainty!
In the end, regarding this test, no significant difference between versions could be observed within each category (x32 and x64).
Best Regards,
guy038
Wednesday or Thursday, I’ll run another of @scott-sumner’s tests, with deletion of some lines!
-
-
@guy038
Is 32-bit faster than 64-bit for 7.9.2, or is it just a mistake in the row order?
-
Thanks @guy038 for the full and interesting report. The other conclusion is that my original issue is on my side.
Cheers
-
Hello, @cmeriaux, @Arkadiuszmichalski and All,
Forget the results of my previous post. You’ll find below an updated version, without the typo regarding the v7.9.2 versions, and with new results for the v8.0 version!
I must add that, regarding my customized preferences for this test, I used:
- Alternate icons (General)
- Enable Multi-Editing, Enable Smooth font and Enable scrolling beyond last line (Editing)
- Disable Smart Highlighting (Highlighting)
- Use Monospaced font in Find dialog (Searching)
- Disable session snapshot and periodic backup and Backup on Save: None (Backup)
- Auto-completion: Function completion (Auto-completion)
So, my final table is:
•===============•=============•==========•=================•==========•=================•====================•
|    Archi-     |   Version   |         User Mode          |     Administrator Mode     |                    |
|               |             |----------•-----------------•----------•-----------------•--------------------•
|    tecture    |  Notepad++  |   Time   |  Ratio x32/x64  |   Time   |  Ratio x32/x64  |  Ratio User/Admin  |
•===============•=============•==========•=================•==========•=================•====================•
|  Win XP x32   |    7.9.2    |  14.4 s  |       -/-       |   -/-    |       -/-       |        -/-         |
•===============•=============•==========•=================•==========•=================•====================•
|  Win 10 x32   |    7.9.2    |  17.3 s  |                 |  17.3 s  |                 |        1.00        |
•---------------•-------------•----------•      2.58       •----------•      2.60       •--------------------•
|  Win 10 x64   |    7.9.2    |   6.7 s  |                 |  6.65 s  |                 |        1.00        |
•===============•=============•==========•=================•==========•=================•====================•
|  Win 10 x32   |    7.9.5    |  16.5 s  |                 |  16.4 s  |                 |        1.00        |
•---------------•-------------•----------•      2.46       •----------•      2.48       •--------------------•
|  Win 10 x64   |    7.9.5    |   6.7 s  |                 |   6.6 s  |                 |        1.02        |
•===============•=============•==========•=================•==========•=================•====================•
|  Win 10 x32   |     8.0     |  63.0 s  |                 |  63.0 s  |                 |        1.00        |
•---------------•-------------•----------•      1.45       •----------•      1.47       •--------------------•
|  Win 10 x64   |     8.0     |  43.4 s  |                 |  43.0 s  |                 |        1.01        |
•===============•=============•==========•=================•==========•=================•====================•
|  Win 10 x32   |    8.1.5    |  16.7 s  |                 |  16.65 s |                 |        1.00        |
•---------------•-------------•----------•      2.49       •----------•      2.50       •--------------------•
|  Win 10 x64   |    8.1.5    |   6.7 s  |                 |  6.65 s  |                 |        1.01        |
•===============•=============•==========•=================•==========•=================•====================•
|  Win 10 x32   |   8.1.9.2   |  16.9 s  |                 |  16.85 s |                 |        1.00        |
•---------------•-------------•----------•      2.52       •----------•      2.53       •--------------------•
|  Win 10 x64   |   8.1.9.2   |   6.7 s  |                 |  6.65 s  |                 |        1.01        |
•===============•=============•==========•=================•==========•=================•====================•
Note that, for fun, I added the Win XP x32 case, with 7.9.2, which is the last available version running on Win XP. Not too bad, is it? (For an old 1.70 GHz single-core machine with 1 GB of memory!)
Interpretation of the results:
- Note that, in the 8.0 version, the new handling of accented characters in regex replacement was functional: unfortunately, the performance regression is obvious :-((
- But later, in the 8.1.5 version, because of this performance regression, the change had been reverted and the general performance was back again!
So, except for the special v8.0 case:
- Firstly, no significant difference exists between User and Admin mode!
- Secondly, each x64 version is, overall, about 2.5 times faster than its corresponding x32 one
- Thirdly, the small differences among the x32 versions, on the one hand, and among the x64 versions, on the other hand, are not significant and simply reflect measurement uncertainty!
In the end:
- Performance was degraded from the v8.0 version until the v8.1.3 version, while the handling of non-ASCII accented characters in replacement was enabled
- Otherwise, no significant difference could be observed within each category (x32 and x64)
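To show what the handling of non-ASCII accented characters changes in the replaced text (not how Notepad++ implements it, which this thread doesn't cover), here is a minimal Python sketch contrasting a Unicode-aware upper-case conversion with a byte-wise, ASCII-only one. Both function names are hypothetical, and the byte-wise version is only an illustration of ASCII-only behaviour.

```python
def unicode_upper(s: str) -> str:
    # Full Unicode case mapping: accented letters are upper-cased too
    return s.upper()

def ascii_only_upper(s: str) -> str:
    # Works on the UTF-8 bytes: bytes.upper() only changes ASCII a-z,
    # so the bytes of an accented letter such as "ê" are left unchanged
    return s.encode("utf-8").upper().decode("utf-8")

word = "forêt"
print(unicode_upper(word))     # FORÊT
print(ascii_only_upper(word))  # FORêT
```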
Best Regards,
guy038
Friday, I’ll run another of @scott-sumner’s tests, with deletion of some lines!
-
-
@guy038 said in regex replace performance regression:
Friday, I’ll run an other test, of @scott-sumner, with deletion of some lines !
Hmm. How, exactly, does one test Scott??
-
Hi, @alan-kilborn,
I downloaded one of Scott Sumner’s files, named data8279.txt, which was still available on GitHub about one week ago! But now I can’t even remember in which issue or pull request I saw it :-((
I can just tell you that it uses a regular expression to delete any line containing the word NotepadPP.
With the v8.1.9.2 (64-bit) version and the Match case option unticked, it deletes 203,236 occurrences out of a total of 300,000 lines, in 37.2 s. So, after replacement, 96,764 lines remain! If the Match case option is enabled, it’s rather similar: 37.1 s!
BR
guy038
-
@guy038 said in regex replace performance regression:
data8279.txt
I searched the GitHub issues for that: it’s in #8279, comment #743696790
-
Hello, @peterjones and All,
Peter, thanks for finding this issue comment!
Now I realize that Scott does a two-step operation:
- Firstly, he performs a mark operation, with the Bookmark line option ticked, on the word NotepadPP
- Secondly, he performs a Remove Bookmarked Lines operation
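The two-step approach can be sketched in Python as "collect the indexes of matching lines, then drop them", timing each step separately. This is an illustration under assumptions, not Notepad++'s implementation: the function name `mark_then_remove` and the sample lines are hypothetical, and `match_case=False` mirrors an unticked Match case option.

```python
import re
import time

def mark_then_remove(lines: list[str], word: str,
                     match_case: bool = False) -> tuple[list[str], int, tuple[float, float]]:
    """Two-step deletion: bookmark the lines containing `word`, then remove them.

    Returns the remaining lines, the number of bookmarked lines and the
    duration (in seconds) of each step.
    """
    flags = 0 if match_case else re.IGNORECASE
    finder = re.compile(re.escape(word), flags)

    t0 = time.perf_counter()
    # Step 1: Mark with "Bookmark line" ticked -> remember the line indexes
    bookmarked = {i for i, line in enumerate(lines) if finder.search(line)}
    t1 = time.perf_counter()
    # Step 2: Search > Bookmark > Remove Bookmarked Lines
    kept = [line for i, line in enumerate(lines) if i not in bookmarked]
    t2 = time.perf_counter()
    return kept, len(bookmarked), (t1 - t0, t2 - t1)

lines = ["NotepadPP rocks", "nothing here", "I like notepadpp", "other line"]
kept, marked, (t_mark, t_delete) = mark_then_remove(lines, "NotepadPP")
print(marked, kept)  # 2 ['nothing here', 'other line']
```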
So, not exactly what I meant before!
Be patient till Friday, as, like Tuesday, tomorrow is a nice sunny ski day for me. The second one since March 2019!
BR
guy038
-
-
BTW, Peter, could you tell me which criteria you used in the GitHub search to find the right issue? Thanks in advance!
BR
guy038
-
@guy038,
I went to the issues search, removed the “closed” condition, and searched for the name of the file:
https://github.com/notepad-plus-plus/notepad-plus-plus/issues?q=is%3Aissue+data8279.zip
(Originally, I tried going through the GitHub help files to see how to search comments for specific attachments; when I couldn’t find it, I decided to see if a simple plaintext search for the filename would work, hoping that either the name of the file was in plaintext in the comment, or that the search would also see it in the URL.)
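The search URL above is just the query `is:issue data8279.zip`, URL-encoded. A small Python sketch (the helper name `issue_search_url` is hypothetical) rebuilds it with the standard `urllib.parse.urlencode`, which turns `:` into `%3A` and spaces into `+`:

```python
from urllib.parse import urlencode

BASE = "https://github.com/notepad-plus-plus/notepad-plus-plus/issues"

def issue_search_url(terms: str) -> str:
    # Same query as the search URL above: issues only, no open/closed filter
    query = f"is:issue {terms}"
    return f"{BASE}?{urlencode({'q': query})}"

print(issue_search_url("data8279.zip"))
# https://github.com/notepad-plus-plus/notepad-plus-plus/issues?q=is%3Aissue+data8279.zip
```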
-
Hi, @cmeriaux, @peterjones, @alan-kilborn, @arkadiuszmichalski and All,
So, I’m continuing to test some text examples dealing with replacements, bookmarks and replacement modifiers! Note that I did not consider the Admin case, which gives nearly identical results!
First, I used @sasumner’s file data8279.txt (300,000 lines) and I performed two types of tests:
- A global replacement of (?-s)^.*NotepadPP.*\R with nothing, with the Wrap around option ticked (first table, below)
- A mark operation of the string NotepadPP, with the Bookmark line and Wrap around options ticked, but not the Match case one, followed by a Search > Bookmark > Remove Bookmarked Lines operation (second table, below)
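As a rough illustration of the first test type, here is the regex translated to Python's `re` syntax. This is an approximation, not the engine Notepad++ uses, and it assumes LF or CRLF line endings; the names `DELETE_LINE` and `delete_matching_lines` and the sample text are hypothetical, since data8279.txt itself isn't reproduced here.

```python
import re

# Approximate Python translation of the Notepad++ search (?-s)^.*NotepadPP.*\R,
# with "Match case" unticked and LF or CRLF line endings assumed:
#   (?-s) .  -> [^\r\n]   (a dot must not run past the end of the line)
#   ^        -> re.MULTILINE, so ^ matches at the start of every line
#   \R       -> (?:\r\n|\n|\r)
DELETE_LINE = re.compile(r"^[^\r\n]*NotepadPP[^\r\n]*(?:\r\n|\n|\r)",
                         re.MULTILINE | re.IGNORECASE)

def delete_matching_lines(text: str) -> tuple[str, int]:
    """Replace each matching line, line break included, with nothing."""
    new_text, count = DELETE_LINE.subn("", text)
    return new_text, count

sample = "a NotepadPP b\r\nkeep me\r\nnotepadpp\nlast line\n"
new_text, count = delete_matching_lines(sample)
print(count, repr(new_text))  # 2 'keep me\r\nlast line\n'
```

`[^\r\n]` is used instead of `.` because, in Python, a dot without `re.DOTALL` still matches `\r`.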
203,236 occurrences were deleted, or marked and then deleted!
•===============•=============•==========•=================•
|    Archi-     |   Version   |         User Mode          |
|               |             |----------•-----------------|
|    tecture    |  Notepad++  |   Time   |  Ratio x32/x64  |
•===============•=============•==========•=================•
|  Win XP x32   |    7.9.2    |  65.0 s  |       -/-       |
•===============•=============•==========•=================•
|  Win 10 x32   |    7.9.2    |  47.6 s  |                 |
•---------------•-------------•----------•      1.27       |
|  Win 10 x64   |    7.9.2    |  37.5 s  |                 |
•===============•=============•==========•=================•
|  Win 10 x32   |    7.9.5    |  47.4 s  |                 |
•---------------•-------------•----------•      1.27       |
|  Win 10 x64   |    7.9.5    |  37.4 s  |                 |
•===============•=============•==========•=================•
|  Win 10 x32   |     8.0     |  86.2 s  |                 |
•---------------•-------------•----------•      1.22       |
|  Win 10 x64   |     8.0     |  70.4 s  |                 |
•===============•=============•==========•=================•
|  Win 10 x32   |    8.1.5    |  47.6 s  |                 |
•---------------•-------------•----------•      1.27       |
|  Win 10 x64   |    8.1.5    |  37.5 s  |                 |
•===============•=============•==========•=================•
|  Win 10 x32   |   8.1.9.2   |  47.5 s  |                 |
•---------------•-------------•----------•      1.28       |
|  Win 10 x64   |   8.1.9.2   |  37.2 s  |                 |
•===============•=============•==========•=================•
•===============•=============•==================•=================•
|    Archi-     |   Version   |             User Mode              |
|               |             |------------------•-----------------|
|    tecture    |  Notepad++  |       Time       |  Ratio x32/x64  |
•===============•=============•==================•=================•
|  Win XP x32   |    7.9.2    | 10.0 s + 64.2 s  |       -/-       |
•===============•=============•==================•=================•
|  Win 10 x32   |    7.9.2    |  4.9 s + 49.1 s  |                 |
•---------------•-------------•------------------•      1.36       |
|  Win 10 x64   |    7.9.2    |  2.1 s + 37.5 s  |                 |
•===============•=============•==================•=================•
|  Win 10 x32   |    7.9.5    |  4.8 s + 49.1 s  |                 |
•---------------•-------------•------------------•      1.35       |
|  Win 10 x64   |    7.9.5    |  2.3 s + 37.6 s  |                 |
•===============•=============•==================•=================•
|  Win 10 x32   |     8.0     | 20.0 s + 49.1 s  |                 |
•---------------•-------------•------------------•      1.31       |
|  Win 10 x64   |     8.0     | 14.8 s + 37.8 s  |                 |
•===============•=============•==================•=================•
|  Win 10 x32   |    8.1.5    |  4.8 s + 49.3 s  |                 |
•---------------•-------------•------------------•      1.35       |
|  Win 10 x64   |    8.1.5    |  2.3 s + 37.7 s  |                 |
•===============•=============•==================•=================•
|  Win 10 x32   |   8.1.9.2   |  4.8 s + 49.2 s  |                 |
•---------------•-------------•------------------•      1.37       |
|  Win 10 x64   |   8.1.9.2   |  2.1 s + 37.4 s  |                 |
•===============•=============•==================•=================•
In the second table, I split the total time into two parts:
- Time to bookmark the lines
- Time to delete these lines
I added the two values together before calculating the x32/x64 ratio.
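The "add the two parts, then divide" rule used for the second table can be checked with a few lines of Python; the function name `x32_x64_ratio` is hypothetical and the numbers are taken from the table rows.

```python
def x32_x64_ratio(x32_parts: list[float], x64_parts: list[float]) -> float:
    """x32/x64 ratio of the total times, each total being the sum of its parts."""
    return sum(x32_parts) / sum(x64_parts)

# Win 10, v7.9.2: (4.9 s + 49.1 s) / (2.1 s + 37.5 s) = 54.0 / 39.6
print(round(x32_x64_ratio([4.9, 49.1], [2.1, 37.5]), 2))  # 1.36
```

Summing first means the long deletion step dominates the ratio; averaging the two per-step ratios instead (4.9 / 2.1 ≈ 2.33 and 49.1 / 37.5 ≈ 1.31) would give about 1.82.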
Interpretation of the results:
Apart from the special case of the v8.0 version, the results are very similar for the two tables:
- In the first case, the more complicated regex (?-s)^.*NotepadPP.*\R slightly decreases the ratio between the x32 and x64 versions
- In the second case, both the mark operation and the deletion of lines have an impact, but the ratio between the x32 and x64 versions is a bit better
- Note that, regarding the v8.0 version, in the second table, the performance regression comes only from the poor results of the mark operation!
I performed one last test, using the same Search and Replace regexes as in my initial issue:
https://github.com/notepad-plus-plus/notepad-plus-plus/issues/9636
So, the regex S/R:
SEARCH  \w
REPLACE \U$0
Then, I created a file containing 1,000 lines (every odd one) with the French text: C’est là, près de la forêt, dans un gîte, où régnait un grand capharnaüm, que l’aïeul ôta sa flûte et son bâton de son canoë.
And I added 1,000 English lines (every even one): Here is a example of text, containing the complete French set of accentuated characters, traditionally used.
After replacement, 184,000 occurrences have been modified:
•===============•=============•==========•=================•
|    Archi-     |   Version   |         User Mode          |
|               |             |----------•-----------------|
|    tecture    |  Notepad++  |   Time   |  Ratio x32/x64  |
•===============•=============•==========•=================•
|  Win XP x32   |    7.9.2    |  18.7 s  |       -/-       |
•===============•=============•==========•=================•
|  Win 10 x32   |    7.9.2    |  10.5 s  |                 |
•---------------•-------------•----------•      2.56       |
|  Win 10 x64   |    7.9.2    |   4.1 s  |                 |
•===============•=============•==========•=================•
|  Win 10 x32   |    7.9.5    |  10.3 s  |                 |
•---------------•-------------•----------•      2.51       |
|  Win 10 x64   |    7.9.5    |   4.1 s  |                 |
•===============•=============•==========•=================•
|  Win 10 x32   |     8.0     |  38.5 s  |                 |
•---------------•-------------•----------•      1.41       |
|  Win 10 x64   |     8.0     |  27.4 s  |                 |
•===============•=============•==========•=================•
|  Win 10 x32   |    8.1.5    |  10.4 s  |                 |
•---------------•-------------•----------•      2.54       |
|  Win 10 x64   |    8.1.5    |   4.1 s  |                 |
•===============•=============•==========•=================•
|  Win 10 x32   |   8.1.9.2   |  10.4 s  |                 |
•---------------•-------------•----------•      2.54       |
|  Win 10 x64   |   8.1.9.2   |   4.1 s  |                 |
•===============•=============•==========•=================•
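The 184,000 figure can be cross-checked in Python: the French sentence contains 94 word characters and the English one 90 (the curly apostrophe ’ is not a word character), so each pair of lines gives 184 matches of `\w`. This sketch uses Python's `re`, whose `\w` is Unicode-aware for `str` patterns; Python has no `\U` in replacement strings, so a function upper-cases each match. The names `build_text` and `upper_each_word_char` are hypothetical.

```python
import re

FRENCH = ("C’est là, près de la forêt, dans un gîte, où régnait un grand capharnaüm, "
          "que l’aïeul ôta sa flûte et son bâton de son canoë.")
ENGLISH = ("Here is a example of text, containing the complete French set of "
           "accentuated characters, traditionally used.")

def build_text(pairs: int = 1000) -> str:
    # Odd lines: French sentence, even lines: English sentence
    return "\n".join([FRENCH, ENGLISH] * pairs) + "\n"

def upper_each_word_char(text: str) -> tuple[str, int]:
    # SEARCH \w, REPLACE \U$0: $0 is the whole match, upper-cased here by a function
    return re.subn(r"\w", lambda m: m.group(0).upper(), text)

new_text, count = upper_each_word_char(build_text())
print(count)  # 184000
```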
Interpretation of the results:
Again, apart from the special case of the v8.0 version:
- The results, whatever the version, are quite similar within each category (x32 and x64)
- The x32/x64 ratio is similar to the one in my previous post (~2.52)!
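The ~2.52 figure is consistent with simply averaging the eight x32/x64 ratios of the previous post's final table, v8.0 excluded (User and Administrator modes). Assuming that's how it was obtained, a quick Python check, with the ratios of this last test alongside for comparison:

```python
from statistics import mean

# x32/x64 ratios of the previous post's final table, v8.0 excluded
# (v7.9.2, v7.9.5, v8.1.5, v8.1.9.2)
previous_user = [2.58, 2.46, 2.49, 2.52]
previous_admin = [2.60, 2.48, 2.50, 2.53]
# x32/x64 ratios of the \w -> \U$0 test, v8.0 excluded
current = [2.56, 2.51, 2.54, 2.54]

print(f"{mean(previous_user + previous_admin):.2f}")  # 2.52
print(f"{mean(current):.2f}")
```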
Best Regards,
guy038
-