regex replace performance regression
-
Hello @guy038, the file is available on GitHub: https://github.com/notepad-plus-plus/notepad-plus-plus/issues/10860
Another user hasn’t been able to reproduce my issue, so I’m wondering whether it is real. I tested with the portable version, of course, but Local Conf mode was OFF, so it may be linked to my configuration. The regexp seems stupid, but it’s just for the test !
-
Hello, @cmeriaux and All,
I used a file which contains five copies of your initial file, so 1 empty line at the beginning + `47,520` lines ( 5 * 9,504 ). I used this protocol :
- A recent `Win 10` laptop with `SSD`, connected to the power supply, and a cell phone for timing !
- Tests with both `x32` and `x64` versions and in both `User` and `Administrator` modes
- Tested N++ versions : `v7.9.2`, `v7.9.5`, `v8.1.5` and `v8.1.9.2`
- I did, at least, `3` tries for each case !
- The Replace All action changed, each time, `290,640` occurrences
Practically :
- I used the `Regular expression` mode and the `Wrap around` option
- I left the `Match case` option unticked
- `Ctrl + Home` ( Back to the first empty line )
- `Ctrl + H` ( Replace dialog )
- `Alt + A` ( Replace All operation ) + start timing
After results :
- `Esc` to close the Replace dialog
- `Ctrl + Z` to undo the results
and so on …
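For readers who want to repeat the "at least 3 tries per case" idea without a stopwatch, here is a minimal sketch in Python. It is only an illustration of the measurement loop: it times Python's own `re` engine, not the regex engine inside Notepad++, and the function name `time_replace_all` is hypothetical.

```python
import re
import time


def time_replace_all(pattern, replacement, text, tries=3):
    """Run the same regex replacement `tries` times; return each duration in seconds."""
    compiled = re.compile(pattern)
    durations = []
    for _ in range(tries):
        start = time.perf_counter()
        # Strings are immutable, so every try starts from the same input,
        # which plays the role of the Ctrl + Z step in the manual protocol.
        compiled.sub(replacement, text)
        durations.append(time.perf_counter() - start)
    return durations


# Hypothetical stand-in for the real test file.
sample_text = "some sample text\n" * 1000
runs = time_replace_all(r"\w", "x", sample_text)
print([f"{seconds:.4f} s" for seconds in runs])
```

Comparing the individual runs, rather than a single value, gives a rough idea of the measurement uncertainty.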
I got this table :
| Architecture | Notepad++ version | User mode : Time | User mode : Ratio x32/x64 | Administrator mode : Time | Administrator mode : Ratio x32/x64 | Ratio User/Admin |
|---|---|---|---|---|---|---|
| Win XP x32 | 7.9.2 | 14.4 s | -/- | -/- | -/- | -/- |
| Win 10 x32 | 7.9.2 | 17.0 s | 2.58 | 17.0 s | 2.58 | 1.00 |
| Win 10 x64 | 7.9.2 | 6.6 s | 2.58 | 6.6 s | 2.58 | 1.00 |
| Win 10 x32 | 7.9.5 | 16.5 s | 2.46 | 16.4 s | 2.48 | 1.00 |
| Win 10 x64 | 7.9.5 | 6.7 s | 2.46 | 6.6 s | 2.48 | 1.02 |
| Win 10 x32 | 8.1.5 | 16.7 s | 2.49 | 16.65 s | 2.50 | 1.00 |
| Win 10 x64 | 8.1.5 | 6.7 s | 2.49 | 6.65 s | 2.50 | 1.01 |
| Win 10 x32 | 8.1.9.2 | 16.9 s | 2.52 | 16.85 s | 2.53 | 1.00 |
| Win 10 x64 | 8.1.9.2 | 6.7 s | 2.52 | 6.65 s | 2.53 | 1.01 |
Note that, for fun, I added the `Win XP x32` case, with `7.9.2`, which is the last available version running on `Win XP`. Not too bad, is it ? ( For an old `1.70 GHz` mono-core, with `1 GB` of memory ! )
Interpretation of the results :
- First, there is no significant difference between `user` and `admin` mode !
- Secondly, each `x64` version is, overall, about `2.5` times faster than its corresponding `x32` one !
- Thirdly, the small differences among the `x32` versions, on the one hand, and among the `x64` versions, on the other hand, are not significant and simply reflect the measurement uncertainties !
In the end, regarding this test, no significant difference could be observed between versions, within each category ( `x32` and `x64` )
Best Regards,
guy038
Wednesday or Thursday, I’ll run another test of @scott-sumner, with deletion of some lines !
-
-
@guy038
Is 32-bit for 7.9.2 faster than 64-bit, or is it just an ordering mistake?
-
Thanks @guy038 for the full and interesting report. The other conclusion is that my original issue is located on my side.
Cheers
-
Hello, @cmeriaux, @Arkadiuszmichalski and All,
Forget the results of my previous post. You’ll find, below, an updated version, without the typo regarding the `v7.9.2` version and with new results for the `v8.0` version !
I must add that, regarding my customized preferences for this test, I used :
- `Alternate icons` ( General )
- `Enable Multi-Editing`, `Enable Smooth font` and `Enable scrolling beyond last line` ( Editing )
- `Disable smart Highlighting` ( Highlighting )
- `Use Monospaced font in Find dialog` ( Searching )
- `Disable session snapshot and periodic backup` and `Backup on Save : None` ( Backup )
- `Auto-completion : Function completion` ( Auto-completion )
So, my final list is :
| Architecture | Notepad++ version | User mode : Time | User mode : Ratio x32/x64 | Administrator mode : Time | Administrator mode : Ratio x32/x64 | Ratio User/Admin |
|---|---|---|---|---|---|---|
| Win XP x32 | 7.9.2 | 14.4 s | -/- | -/- | -/- | -/- |
| Win 10 x32 | 7.9.2 | 17.3 s | 2.58 | 17.3 s | 2.60 | 1.00 |
| Win 10 x64 | 7.9.2 | 6.7 s | 2.58 | 6.65 s | 2.60 | 1.00 |
| Win 10 x32 | 7.9.5 | 16.5 s | 2.46 | 16.4 s | 2.48 | 1.00 |
| Win 10 x64 | 7.9.5 | 6.7 s | 2.46 | 6.6 s | 2.48 | 1.02 |
| Win 10 x32 | 8.0 | 63.0 s | 1.45 | 63.0 s | 1.47 | 1.00 |
| Win 10 x64 | 8.0 | 43.4 s | 1.45 | 43.0 s | 1.47 | 1.01 |
| Win 10 x32 | 8.1.5 | 16.7 s | 2.49 | 16.65 s | 2.50 | 1.00 |
| Win 10 x64 | 8.1.5 | 6.7 s | 2.49 | 6.65 s | 2.50 | 1.01 |
| Win 10 x32 | 8.1.9.2 | 16.9 s | 2.52 | 16.85 s | 2.53 | 1.00 |
| Win 10 x64 | 8.1.9.2 | 6.7 s | 2.52 | 6.65 s | 2.53 | 1.01 |
Note that, for fun, I added the `Win XP x32` case, with `7.9.2`, which is the last available version running on `Win XP`. Not too bad, is it ? ( For an old `1.70 GHz` mono-core, with `1 GB` of memory ! )
Interpretation of the results :
- Note that, in the `8.0` version, the new handling of accented chars, in regex replacement, was functional : unfortunately, the performance regression is obvious :-((
- But later, in the `8.1.5` version, due to this performance regression, the change was reverted and the general performance was back again !
- So, except for the special `v8.0` case :
  - Firstly, there is no significant difference between `user` and `admin` mode !
  - Secondly, each `x64` version is, overall, about `2.5` times faster than its corresponding `x32` one
  - Thirdly, the small differences among the `x32` versions, on the one hand, and among the `x64` versions, on the other hand, are not significant and simply reflect the measurement uncertainties !
In the end :
- Performance was degraded from the `v8.0` version until the `v8.1.3` version, while the handling of `non-ASCII` accented characters, in replacement, was enabled
- Otherwise, no significant difference could be observed between versions, within each category ( `x32` and `x64` )
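To put a number on the `v8.0` regression, the following Python sketch recomputes the `x32/x64` ratios from the User mode times of the table above and compares `v8.0` with `v7.9.5`. The choice of `v7.9.5` as the baseline and the helper names `arch_ratio` and `slowdown` are assumptions made for this illustration only.

```python
# Times in seconds, copied from the User mode column of the table above.
user_times = {
    "7.9.2": {"x32": 17.3, "x64": 6.7},
    "7.9.5": {"x32": 16.5, "x64": 6.7},
    "8.0": {"x32": 63.0, "x64": 43.4},
    "8.1.5": {"x32": 16.7, "x64": 6.7},
    "8.1.9.2": {"x32": 16.9, "x64": 6.7},
}


def arch_ratio(times):
    """Ratio x32/x64 for one version."""
    return times["x32"] / times["x64"]


def slowdown(version, baseline, arch):
    """How many times slower `version` is than `baseline` on one architecture."""
    return user_times[version][arch] / user_times[baseline][arch]


for version, times in user_times.items():
    print(f"v{version}: x32/x64 = {arch_ratio(times):.2f}")

for arch in ("x32", "x64"):
    print(f"v8.0 vs v7.9.5 ({arch}): {slowdown('8.0', '7.9.5', arch):.1f} times slower")
```

With these figures, `v8.0` is several times slower than `v7.9.5` on both architectures, and the `x64` build is hit proportionally harder, which is why its `x32/x64` ratio drops.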
Best Regards,
guy038
Friday, I’ll run an other test, of @scott-sumner, with deletion of some lines !
-
-
@guy038 said in regex replace performance regression:
Friday, I’ll run an other test, of @scott-sumner, with deletion of some lines !
Hmm. How, exactly does one test Scott ??
-
Hi, @alan-kilborn,
I downloaded one of Scott Sumner’s files, named `data8279.txt`, which was still available, about one week ago, on `GitHub` ! But now, I can’t even remember in which issue or pull request I saw it :-((
I can just tell you that it uses a regular expression to delete any line containing the word `NotepadPP`
With the `v8.1.9.2 (64 bits)` version and the `Match case` option unticked, it deletes `203,236` occurrences, out of a total of `300,000` lines, in `37.2 s`. So, after replacement, `96,764` lines remain ! If the `Match case` option is enabled, the result is rather similar : `37.1 s` !
BR
guy038
-
@guy038 said in regex replace performance regression:
data8279.txt
I searched the github issues for that: it’s in #8279 comment#743696790
-
Hello, @peterjones and All,
Peter, thanks for finding this issue comment !
Now, I realize that Scott does a two-step operation :
- Firstly, he performs a mark operation, with the `Bookmark line` option ticked, on the word `NotepadPP`
- Secondly, he performs a `Remove Bookmarked Lines` operation
So, not exactly what I meant before !
Be patient until Friday because, like Tuesday, tomorrow is a nice sunny ski day for me. The second one since `March 2019` !
BR
guy038
-
-
BTW, Peter, could you tell me which criteria you used, in the `GitHub` search, to get the right issue ?
Thanks in advance !
BR
guy038
-
@guy038 ,
I went to the issues search, removed the “closed” condition, and searched for the name of the file
https://github.com/notepad-plus-plus/notepad-plus-plus/issues?q=is%3Aissue+data8279.zip
(originally, I tried going through GitHub help files to see how to search comments for specific attachments; when I couldn’t find it, I decided to see if the simple plaintext search for the filename would work, hoping that either the name of the file was in plaintext in the comment, or that when it searched, it could see in the URL as well.)
-
Hi, @cmeriaux, @peterjones, @alan-kilborn, @arkadiuszmichalski and All,
So, I’m going on testing some examples of text, dealing with replacements, bookmarks and replacement modifiers ! Note that I did not consider the Admin case, as it is nearly identical !
First, I used @sasumner’s file `data8279.txt` ( `300,000` lines ) and I performed two types of test :
- A global replacement of `(?-s)^.*NotepadPP.*\R` with `Nothing`, with the `Wrap around` option ticked ( First table, below )
- A mark operation of the string `NotepadPP`, with the `Bookmark line` and `Wrap around` options ticked, but not the `Match case` one, followed by a `Search > Bookmark > Remove Bookmarked Lines` operation ( Second table, below )
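The two test types above should remove the same lines. As a rough illustration, here is a Python sketch of both approaches on a tiny sample. Several things are assumptions: Python's `re` module has no `\R` and no global `(?-s)`, so the pattern uses `\r?\n` as a stand-in for the line break and relies on `.` not matching `\n` by default; the match is case-insensitive to mirror the unticked `Match case` option; the function names are hypothetical; and this is not how Notepad++ implements either operation.

```python
import re

# Rough Python stand-in for (?-s)^.*NotepadPP.*\R
DELETE_LINE = re.compile(r"^.*NotepadPP.*\r?\n", re.MULTILINE | re.IGNORECASE)


def delete_with_regex(text):
    """One step: replace every matching line, including its line break, with nothing."""
    return DELETE_LINE.sub("", text)


def mark_then_remove(text):
    """Two steps: bookmark the matching lines, then drop the bookmarked ones."""
    lines = text.splitlines(keepends=True)
    bookmarked = {i for i, line in enumerate(lines) if "notepadpp" in line.lower()}
    return "".join(line for i, line in enumerate(lines) if i not in bookmarked)


sample = "keep me\nuses NotepadPP here\nnotepadpp in lower case\nlast line\n"
print(delete_with_regex(sample) == mark_then_remove(sample))
```

Note that the regex version needs a line break after the match, so a matching last line without a trailing line break would survive it, whereas the two-step version would remove it.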
`203,236` occurrences were deleted, or were marked then deleted !

| Architecture | Notepad++ version | User mode : Time | Ratio x32/x64 |
|---|---|---|---|
| Win XP x32 | 7.9.2 | 65.0 s | -/- |
| Win 10 x32 | 7.9.2 | 47.6 s | 1.27 |
| Win 10 x64 | 7.9.2 | 37.5 s | 1.27 |
| Win 10 x32 | 7.9.5 | 47.4 s | 1.27 |
| Win 10 x64 | 7.9.5 | 37.4 s | 1.27 |
| Win 10 x32 | 8.0 | 86.2 s | 1.22 |
| Win 10 x64 | 8.0 | 70.4 s | 1.22 |
| Win 10 x32 | 8.1.5 | 47.6 s | 1.27 |
| Win 10 x64 | 8.1.5 | 37.5 s | 1.27 |
| Win 10 x32 | 8.1.9.2 | 47.5 s | 1.28 |
| Win 10 x64 | 8.1.9.2 | 37.2 s | 1.28 |

| Architecture | Notepad++ version | User mode : Time ( mark + delete ) | Ratio x32/x64 |
|---|---|---|---|
| Win XP x32 | 7.9.2 | 10.0 s + 64.2 s | -/- |
| Win 10 x32 | 7.9.2 | 4.9 s + 49.1 s | 1.36 |
| Win 10 x64 | 7.9.2 | 2.1 s + 37.5 s | 1.36 |
| Win 10 x32 | 7.9.5 | 4.8 s + 49.1 s | 1.35 |
| Win 10 x64 | 7.9.5 | 2.3 s + 37.6 s | 1.35 |
| Win 10 x32 | 8.0 | 20.0 s + 49.1 s | 1.31 |
| Win 10 x64 | 8.0 | 14.8 s + 37.8 s | 1.31 |
| Win 10 x32 | 8.1.5 | 4.8 s + 49.3 s | 1.35 |
| Win 10 x64 | 8.1.5 | 2.3 s + 37.7 s | 1.35 |
| Win 10 x32 | 8.1.9.2 | 4.8 s + 49.2 s | 1.37 |
| Win 10 x64 | 8.1.9.2 | 2.1 s + 37.4 s | 1.37 |
In the second table, I split the total time into two parts :
- Time to bookmark the lines
- Time to delete these lines
- I summed the two values before calculating the `x32/x64` ratio
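As a worked example of that sum-then-divide step, this short Python sketch recomputes the ratios of the second table from its ( mark, delete ) time pairs. The dictionary layout and the name `total_ratio` are only illustrative.

```python
# (mark time, delete time) in seconds, copied from the second table.
mark_delete_times = {
    "7.9.2": {"x32": (4.9, 49.1), "x64": (2.1, 37.5)},
    "7.9.5": {"x32": (4.8, 49.1), "x64": (2.3, 37.6)},
    "8.0": {"x32": (20.0, 49.1), "x64": (14.8, 37.8)},
    "8.1.5": {"x32": (4.8, 49.3), "x64": (2.3, 37.7)},
    "8.1.9.2": {"x32": (4.8, 49.2), "x64": (2.1, 37.4)},
}


def total_ratio(entry):
    """Sum mark + delete times per architecture, then divide x32 by x64."""
    return sum(entry["x32"]) / sum(entry["x64"])


for version, entry in mark_delete_times.items():
    print(f"v{version}: x32/x64 = {total_ratio(entry):.2f}")
```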
Interpretation of the results :
If we except the special case of the `v8.0` version, the results are very similar, for the two tables :
- In the first case, the more complicated regex `(?-s)^.*NotepadPP.*\R` decreases a bit the ratio between the `x32` and `x64` versions
- In the second case, both the `mark` operation and the `deletion` of lines have an impact, but the ratio between the `x32` and `x64` versions is a bit better
- Note that, regarding the `v8.0` version, in the second table, the performance regression comes from the bad results of the mark operation only !
I performed a last test, using the same Search and Replace regexes as in my initial issue :
https://github.com/notepad-plus-plus/notepad-plus-plus/issues/9636
So the regex S/R :
SEARCH `\w`
REPLACE `\U$0`
I then created a file containing `1,000` lines ( the odd-numbered ones ) with the French text :
C’est là, près de la forêt, dans un gîte, où régnait un grand capharnaüm, que l’aïeul ôta sa flûte et son bâton de son canoë.
And I added `1,000` English lines ( the even-numbered ones ) :
Here is a example of text, containing the complete French set of accentuated characters, traditionally used.
After replacement, `184,000` occurrences have been modified :

| Architecture | Notepad++ version | User mode : Time | Ratio x32/x64 |
|---|---|---|---|
| Win XP x32 | 7.9.2 | 18.7 s | -/- |
| Win 10 x32 | 7.9.2 | 10.5 s | 2.56 |
| Win 10 x64 | 7.9.2 | 4.1 s | 2.56 |
| Win 10 x32 | 7.9.5 | 10.3 s | 2.51 |
| Win 10 x64 | 7.9.5 | 4.1 s | 2.51 |
| Win 10 x32 | 8.0 | 38.5 s | 1.41 |
| Win 10 x64 | 8.0 | 27.4 s | 1.41 |
| Win 10 x32 | 8.1.5 | 10.4 s | 2.54 |
| Win 10 x64 | 8.1.5 | 4.1 s | 2.54 |
| Win 10 x32 | 8.1.9.2 | 10.4 s | 2.54 |
| Win 10 x64 | 8.1.9.2 | 4.1 s | 2.54 |
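The `184,000` figure can be cross-checked outside Notepad++. The sketch below rebuilds the test text in Python and counts the `\w` matches while uppercasing them. It assumes that Python's Unicode-aware `\w` treats the accented letters as word characters, like the Notepad++ search did in this test, and it emulates `\U$0` with `str.upper()`; the function names are hypothetical.

```python
import re
import unicodedata

FRENCH = ("C’est là, près de la forêt, dans un gîte, où régnait un grand "
          "capharnaüm, que l’aïeul ôta sa flûte et son bâton de son canoë.")
ENGLISH = ("Here is a example of text, containing the complete French set of "
           "accentuated characters, traditionally used.")


def build_test_text(pairs=1000):
    """Alternate the French line (odd lines) and the English line (even lines)."""
    # NFC keeps each accented letter as a single precomposed character.
    french = unicodedata.normalize("NFC", FRENCH)
    return "".join(f"{french}\n{ENGLISH}\n" for _ in range(pairs))


def uppercase_word_chars(text):
    """Emulate SEARCH \\w / REPLACE \\U$0; return (new_text, number_of_replacements)."""
    return re.subn(r"\w", lambda match: match.group(0).upper(), text)


text = build_test_text()
new_text, count = uppercase_word_chars(text)
print(count)
```

The French line contributes 94 word characters and the English line 90, so each pair of lines accounts for 184 replacements.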
Interpretation of the results :
Again, if we except the special case of the `v8.0` version :
- The results, whatever the version, are quite similar, within each category ( `x32` and `x64` )
- The `x32/x64` ratio is similar to the one of my previous post ( ~ `2.52` ) !
Best Regards,
guy038
-