regex replace performance regression
-
Since version 8.0, there has been a performance regression when doing a “Replace All” with the regular expression option enabled.
It was supposed to be fixed in v8.1.4, but the tests are not conclusive and I’m still suffering from this regression.
Am I the only one affected by this defect? Here is the GitHub issue: https://github.com/notepad-plus-plus/notepad-plus-plus/issues/10860
The target file is test.txt, with 8,000 lines, 1.5 MB, and 48,000 matches.
@guy038, as a master of regular expressions, have you noticed anything?
-
Hello, @cmeriaux and All,
Sorry, I’ve been busy lately organizing a photo collection, and I haven’t really followed the changes in Notepad++ since version 7.9.2 (the last one supported by Win XP). Now, with my new Win 10 Pro laptop, I’m going to install the 32-bit and 64-bit portable versions of three stable releases of Notepad++ (7.9.5, 8.1.5 and 8.1.9.2) and run some tests! Note that I’ll be away next weekend, so my tests could spill over into next week!
Last thing: I’m probably, unconsciously, at the origin of all these performance problems, because I’m the author of issue 9636 :-(
Best Regards
guy038
P.S.:
Regarding this topic (not exhaustive!):
N++ 7.9.2       Stable     January 01
N++ 7.9.3       Stable     February 15
N++ 7.9.4       Stable     March 15
N++ 7.9.5       Stable     March 23
  --> Pull Request 9707           March 27     https://github.com/notepad-plus-plus/notepad-plus-plus/pull/9707
  --> INITIAL Issue 9636          April 04     https://github.com/notepad-plus-plus/notepad-plus-plus/issues/9636
N++ 8.0.0       Stable     June 7
  --> Pull Request 10010          June 15      https://github.com/notepad-plus-plus/notepad-plus-plus/pull/10010
N++ 8.1         Stable     June 17
N++ 8.1.1       Stable     July 04
N++ 8.1.2       Stable     July 19
  --> Issue 10260                 July 26      https://github.com/notepad-plus-plus/notepad-plus-plus/issues/10260
N++ 8.1.3       Stable     August 13
  --> Issue 10398                 August 17    https://github.com/notepad-plus-plus/notepad-plus-plus/issues/10398
  --> Fix of issue 9636 reverted  August 19    https://github.com/notepad-plus-plus/notepad-plus-plus/commit/6844df039d54557a93a75752d651d5b9bb49f7ed
      "Fix #10398, fix #10296, fix #10260, close #10403 ( This commit revert 86c66bb due to the boost REGEX performance issue.)"
N++ 8.1.4       Stable     August 25
N++ 8.1.5       Stable     September 27
N++ (8.1.6)     Unstable   October 13
N++ (8.1.7)     Unstable   October 15
N++ (8.1.8)     Unstable   October 19
N++ (8.1.9)     Unstable   October 22
N++ (8.1.9.1)   Unstable   November 13
N++ 8.1.9.2     Stable     November 21
-
Hi, @cmeriaux,
Looking at your issue, you said :
1. create a file test.txt with 8000 lines, 1.5 Mo, with 48000 match
2. open replace panel, enable regular expression option, enable wrap around option
3. Replace all “u_(\w+)” with “u\1”
Regarding point 3, I’m wondering about the regexes! Is the syntax of both the search and replace regexes correct?
Now, regarding point 1, could you tell me the general structure of this file? Maybe you could e-mail me this test file at … (temporarily displayed!)
Best Regards,
guy038
-
Hello @guy038, the file is available on GitHub: https://github.com/notepad-plus-plus/notepad-plus-plus/issues/10860
Another user hasn’t reproduced my issue, so I’m wondering whether my issue is real. I tested with the portable version, of course, but Local Conf mode was OFF, so it may be linked to my configuration. The regexp seems stupid, but it’s just for the test!
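For anyone who wants to exercise the same pattern outside Notepad++, here is a minimal Python sketch of the issue’s Replace All (`u_(\w+)` replaced with `u\1`). It assumes Python’s `re` module, which is not the Boost.Regex engine Notepad++ uses, so its timings say nothing about the N++ regression. The sample line and the name `replace_all` are hypothetical stand-ins, since the real test.txt is only attached to issue 10860.

```python
import re
import time

# Hypothetical sample line (NOT the real test.txt from issue 10860):
# it holds 6 "u_<word>" tokens, so 8,000 copies give 48,000 matches.
SAMPLE_LINE = "u_alpha = u_beta + u_gamma; u_delta(u_eps, u_zeta)\n"


def replace_all(text):
    """Apply the issue's regex S/R: search u_(\\w+), replace with u\\1.

    Returns (new_text, number_of_replacements).
    """
    return re.subn(r"u_(\w+)", r"u\1", text)


if __name__ == "__main__":
    text = SAMPLE_LINE * 8000  # 8,000 lines, as in the issue
    start = time.perf_counter()
    new_text, count = replace_all(text)
    elapsed = time.perf_counter() - start
    print(f"{count} replacements in {elapsed:.3f} s")
    print(new_text.splitlines()[0])
```

The match count (48,000) lines up with the issue by construction; only relative timings between runs on the same machine would be meaningful.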
-
Hello, @cmeriaux and All,
I used a file containing five copies of your initial file, i.e. 1 empty line at the beginning + 47,520 lines (5 * 9,504). I used this protocol:
- A recent Win 10 laptop with an SSD, connected to the power supply, and a cell phone for timing!
- Tests with both the x32 and x64 versions, in both User and Administrator modes
- Tested N++ versions: v7.9.2, v7.9.5, v8.1.5 and v8.1.9.2
- At least 3 tries for each case!
- Each time, the Replace All action changed 290,640 occurrences
Practically:
- I used the Regular expression mode and the Wrap around option
- I left Match case unticked
- Ctrl + Home (back to the first empty line)
- Ctrl + H (Replace dialog)
- Alt + A (Replace All operation) + start timing
After the results:
- Esc to close the Replace dialog
- Ctrl + Z to undo the results
and so on…
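The manual protocol above (several tries per case, undo between runs) can be mirrored in a small script, for instance to compare builds or engines without a stopwatch. This is a minimal sketch, assuming Python’s `re` module as a stand-in for the Boost.Regex engine inside Notepad++, so its absolute times are not comparable with the table; `time_replace_all` is a hypothetical helper name.

```python
import re
import statistics
import time


def time_replace_all(text, pattern, repl, tries=3):
    """Time a regex 'Replace All' `tries` times on the same input.

    Python strings are immutable, so every try starts again from the
    original text; this plays the role of Ctrl + Z between manual runs.
    Returns (median_seconds, replacement_count).
    """
    regex = re.compile(pattern)
    timings = []
    count = 0
    for _ in range(tries):
        start = time.perf_counter()
        _, count = regex.subn(repl, text)
        timings.append(time.perf_counter() - start)
    # The median is less sensitive to one slow outlier than the mean.
    return statistics.median(timings), count


if __name__ == "__main__":
    median, n = time_replace_all("u_a u_b\n" * 1000, r"u_(\w+)", r"u\1")
    print(f"{n} replacements, median {median:.4f} s over 3 tries")
```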
I got this table:
•============•===========•==========•===============•==========•===============•==================•
| Archi-     | Version   |        User Mode         |    Administrator Mode    |                  |
|            |           |----------•---------------•----------•---------------|                  |
| tecture    | Notepad++ | Time     | Ratio x32/x64 | Time     | Ratio x32/x64 | Ratio User/Admin |
•============•===========•==========•===============•==========•===============•==================•
| Win XP x32 | 7.9.2     | 14.4 s   | -/-           | -/-      | -/-           | -/-              |
•============•===========•==========•===============•==========•===============•==================•
| Win 10 x32 | 7.9.2     | 17.0 s   |               | 17.0 s   |               | 1.00             |
•------------•-----------•----------• 2.58          •----------• 2.58          •------------------•
| Win 10 x64 | 7.9.2     | 6.6 s    |               | 6.6 s    |               | 1.00             |
•============•===========•==========•===============•==========•===============•==================•
| Win 10 x32 | 7.9.5     | 16.5 s   |               | 16.4 s   |               | 1.00             |
•------------•-----------•----------• 2.46          •----------• 2.48          •------------------•
| Win 10 x64 | 7.9.5     | 6.7 s    |               | 6.6 s    |               | 1.02             |
•============•===========•==========•===============•==========•===============•==================•
| Win 10 x32 | 8.1.5     | 16.7 s   |               | 16.65 s  |               | 1.00             |
•------------•-----------•----------• 2.49          •----------• 2.50          •------------------•
| Win 10 x64 | 8.1.5     | 6.7 s    |               | 6.65 s   |               | 1.01             |
•============•===========•==========•===============•==========•===============•==================•
| Win 10 x32 | 8.1.9.2   | 16.9 s   |               | 16.85 s  |               | 1.00             |
•------------•-----------•----------• 2.52          •----------• 2.53          •------------------•
| Win 10 x64 | 8.1.9.2   | 6.7 s    |               | 6.65 s   |               | 1.01             |
•============•===========•==========•===============•==========•===============•==================•
Note that, for fun, I added the Win XP x32 case, with 7.9.2, which is the last version available for Win XP. Not too bad, is it? (For an old 1.70 GHz single-core machine with 1 GB of memory!)
Interpretation of the results:
- Firstly, there is no significant difference between user and admin mode!
- Secondly, each x64 version is, overall, about 2.5 times faster than its corresponding x32 one!
- Thirdly, the small differences among the x32 versions, on the one hand, and among the x64 versions, on the other hand, are not significant and simply reflect measurement uncertainty!
In the end, regarding this test, no significant difference could be observed within each category (x32 and x64).
Best Regards,
guy038
On Wednesday or Thursday, I’ll run another test, from @scott-sumner, with deletion of some lines!
-
-
@guy038
Is 32-bit really faster than 64-bit for 7.9.2, or is it just an ordering mistake?
-
Thanks @guy038 for the complete and interesting report. The other conclusion is that the cause of my original issue is on my side.
Cheers
-
Hello, @cmeriaux, @Arkadiuszmichalski and All,
Forget the results of my previous post. Below is an updated version, without the typo regarding the v7.9.2 versions, and with new results for the v8.0 version!
I must add that, regarding my customized preferences for this test, I used:
- Alternate icons (General)
- Enable Multi-Editing, Enable smooth font and Enable scrolling beyond last line (Editing)
- Disable Smart Highlighting (Highlighting)
- Use Monospaced font in Find dialog (Searching)
- Disable session snapshot and periodic backup, and Backup on Save: None (Backup)
- Auto-completion: Function completion (Auto-completion)
So, my final table is:
•============•===========•==========•===============•==========•===============•==================•
| Archi-     | Version   |        User Mode         |    Administrator Mode    |                  |
|            |           |----------•---------------•----------•---------------|                  |
| tecture    | Notepad++ | Time     | Ratio x32/x64 | Time     | Ratio x32/x64 | Ratio User/Admin |
•============•===========•==========•===============•==========•===============•==================•
| Win XP x32 | 7.9.2     | 14.4 s   | -/-           | -/-      | -/-           | -/-              |
•============•===========•==========•===============•==========•===============•==================•
| Win 10 x32 | 7.9.2     | 17.3 s   |               | 17.3 s   |               | 1.00             |
•------------•-----------•----------• 2.58          •----------• 2.60          •------------------•
| Win 10 x64 | 7.9.2     | 6.7 s    |               | 6.65 s   |               | 1.00             |
•============•===========•==========•===============•==========•===============•==================•
| Win 10 x32 | 7.9.5     | 16.5 s   |               | 16.4 s   |               | 1.00             |
•------------•-----------•----------• 2.46          •----------• 2.48          •------------------•
| Win 10 x64 | 7.9.5     | 6.7 s    |               | 6.6 s    |               | 1.02             |
•============•===========•==========•===============•==========•===============•==================•
| Win 10 x32 | 8.0       | 63.0 s   |               | 63.0 s   |               | 1.00             |
•------------•-----------•----------• 1.45          •----------• 1.47          •------------------•
| Win 10 x64 | 8.0       | 43.4 s   |               | 43.0 s   |               | 1.01             |
•============•===========•==========•===============•==========•===============•==================•
| Win 10 x32 | 8.1.5     | 16.7 s   |               | 16.65 s  |               | 1.00             |
•------------•-----------•----------• 2.49          •----------• 2.50          •------------------•
| Win 10 x64 | 8.1.5     | 6.7 s    |               | 6.65 s   |               | 1.01             |
•============•===========•==========•===============•==========•===============•==================•
| Win 10 x32 | 8.1.9.2   | 16.9 s   |               | 16.85 s  |               | 1.00             |
•------------•-----------•----------• 2.52          •----------• 2.53          •------------------•
| Win 10 x64 | 8.1.9.2   | 6.7 s    |               | 6.65 s   |               | 1.01             |
•============•===========•==========•===============•==========•===============•==================•
Note that, for fun, I added the Win XP x32 case, with 7.9.2, which is the last version available for Win XP. Not too bad, is it? (For an old 1.70 GHz single-core machine with 1 GB of memory!)
Interpretation of the results:
- Note that, in the 8.0 version, the new handling of accented characters in regex replacement was functional: unfortunately, the performance regression is obvious :-((
- But later, as the 8.1.5 results show, the fix was reverted because of this performance regression, and the overall performance was back!
- So, except for the special v8.0 case:
  - Firstly, no significant difference exists between user and admin mode!
  - Secondly, each x64 version is, overall, about 2.5 times faster than its corresponding x32 one
  - Thirdly, the small differences among the x32 versions, on the one hand, and among the x64 versions, on the other hand, are not significant and simply reflect measurement uncertainty!
In the end:
- Performance was degraded from the v8.0 version until the v8.1.3 version, while the handling of non-ASCII accented characters in replacement was enabled
- Otherwise, no significant difference could be observed within each category (x32 and x64)
Best Regards,
guy038
On Friday, I’ll run another test, from @scott-sumner, with deletion of some lines!
-
-
@guy038 said in regex replace performance regression:
Friday, I’ll run an other test, of @scott-sumner, with deletion of some lines !
Hmm. How, exactly, does one test Scott??
-
Hi, @alan-kilborn,
I downloaded a file from Scott Sumner, named data8279.txt, which was still available on GitHub about a week ago! But now I can’t even remember in which issue or pull request I saw it :-(( I can just tell you that it uses a regex to delete any line containing the word NotepadPP.
With the v8.1.9.2 (64-bit) version and the Match case option unticked, it deletes 203,236 occurrences, out of a total of 300,000 lines, in 37.2 s. So, 96,764 lines remain after the replacement! If the Match case option is enabled, it’s quite similar: 37.1 s!
BR
guy038
-
@guy038 said in regex replace performance regression:
data8279.txt
I searched the GitHub issues for that: it’s in #8279, comment#743696790
-
Hello, @peterjones and All,
Peter, thanks for finding this issue comment!
Now, I realize that Scott does a two-step operation:
- Firstly, he performs a mark operation, with the Bookmark line option ticked, on the word NotepadPP
- Secondly, he performs a Remove Bookmarked Lines operation
So, not exactly what I meant before!
Be patient until Friday, as, like Tuesday, tomorrow is a nice sunny ski day for me. The second one since March 2019!
BR
guy038
-
-
BTW, Peter, could you tell me which criteria you used in the GitHub search to find the right issue? Thanks in advance!
BR
guy038
-
@guy038 ,
I went to the issues search, removed the “closed” condition, and searched for the name of the file
https://github.com/notepad-plus-plus/notepad-plus-plus/issues?q=is%3Aissue+data8279.zip
(originally, I tried going through GitHub help files to see how to search comments for specific attachments; when I couldn’t find it, I decided to see if the simple plaintext search for the filename would work, hoping that either the name of the file was in plaintext in the comment, or that when it searched, it could see in the URL as well.)
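Peter’s search can also be built programmatically, e.g. to script similar lookups. This is a small sketch using only the standard library’s `urllib.parse.quote_plus`; `github_issue_search_url` is a hypothetical helper, and the query string is ordinary GitHub issue-search syntax. Because the query has no `is:open` qualifier, closed issues are searched too.

```python
from urllib.parse import quote_plus


def github_issue_search_url(owner, repo, terms):
    """Build a GitHub issue-search URL for one repository.

    quote_plus() percent-encodes ':' as %3A and turns spaces into '+'.
    """
    return f"https://github.com/{owner}/{repo}/issues?q=" + quote_plus(terms)


if __name__ == "__main__":
    print(github_issue_search_url("notepad-plus-plus", "notepad-plus-plus",
                                  "is:issue data8279.zip"))
```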
-
Hi, @cmeriaux, @peterjones, @alan-kilborn, @arkadiuszmichalski and All,
So, I’m continuing to test some text examples dealing with replacements, bookmarks and replacement modifiers! Note that I did not consider the Admin case, as it is almost identical!
First, I used @sasumner’s file data8279.txt (300,000 lines) and I performed two types of test:
- A global replacement of (?-s)^.*NotepadPP.*\R with nothing, with the Wrap around option ticked (first table, below)
- A mark operation on the string NotepadPP, with the Bookmark line and Wrap around options ticked, but not the Match case one, followed by a Search > Bookmark > Remove Bookmarked Lines operation (second table, below)
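Both operations can be approximated outside Notepad++ to check that they remove the same lines. This is a minimal Python sketch, assuming LF or CRLF line endings and case-insensitive matching (Match case unticked). Python’s `re` is not Boost.Regex and has no `\R`, so `\R` is approximated by `(?:\r\n|\n|\r)` and the `(?-s)` dot by `[^\r\n]`; the function names are hypothetical.

```python
import re

# Approximation of (?-s)^.*NotepadPP.*\R:
#   (?-s) "." excludes line breaks   -> [^\r\n]*
#   \R any line break                -> (?:\r\n|\n|\r)
# MULTILINE makes ^ match after "\n" (so LF/CRLF files are assumed).
DELETE_LINE = re.compile(r"^[^\r\n]*NotepadPP[^\r\n]*(?:\r\n|\n|\r)",
                         re.MULTILINE | re.IGNORECASE)


def delete_by_regex(text):
    """Test 1: Replace All of the whole-line regex with nothing."""
    return DELETE_LINE.subn("", text)


def delete_by_bookmarks(text, word="NotepadPP"):
    """Test 2: bookmark every line containing `word`, then remove those lines."""
    lines = text.splitlines(keepends=True)
    needle = word.lower()  # Match case unticked
    bookmarked = {i for i, line in enumerate(lines) if needle in line.lower()}
    kept = [line for i, line in enumerate(lines) if i not in bookmarked]
    return "".join(kept), len(bookmarked)


if __name__ == "__main__":
    sample = "keep 1\nNotepadPP rocks\nkeep 2\nI like notepadpp\n"
    print(delete_by_regex(sample))
    print(delete_by_bookmarks(sample))
```

One difference worth knowing: the regex needs a line break after the match, so a last line without one that contains NotepadPP is kept by the regex version but removed by the bookmark version.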
203,236 occurrences were deleted, or marked then deleted!
•============•===========•==========•===============•
| Archi-     | Version   |        User Mode         |
|            |           |----------•---------------|
| tecture    | Notepad++ | Time     | Ratio x32/x64 |
•============•===========•==========•===============•
| Win XP x32 | 7.9.2     | 65.0 s   | -/-           |
•============•===========•==========•===============•
| Win 10 x32 | 7.9.2     | 47.6 s   |               |
•------------•-----------•----------• 1.27          •
| Win 10 x64 | 7.9.2     | 37.5 s   |               |
•============•===========•==========•===============•
| Win 10 x32 | 7.9.5     | 47.4 s   |               |
•------------•-----------•----------• 1.27          •
| Win 10 x64 | 7.9.5     | 37.4 s   |               |
•============•===========•==========•===============•
| Win 10 x32 | 8.0       | 86.2 s   |               |
•------------•-----------•----------• 1.22          •
| Win 10 x64 | 8.0       | 70.4 s   |               |
•============•===========•==========•===============•
| Win 10 x32 | 8.1.5     | 47.6 s   |               |
•------------•-----------•----------• 1.27          •
| Win 10 x64 | 8.1.5     | 37.5 s   |               |
•============•===========•==========•===============•
| Win 10 x32 | 8.1.9.2   | 47.5 s   |               |
•------------•-----------•----------• 1.28          •
| Win 10 x64 | 8.1.9.2   | 37.2 s   |               |
•============•===========•==========•===============•
•============•===========•==================•===============•
| Archi-     | Version   |            User Mode             |
|            |           |------------------•---------------|
| tecture    | Notepad++ | Time             | Ratio x32/x64 |
•============•===========•==================•===============•
| Win XP x32 | 7.9.2     | 10.0 s + 64.2 s  | -/-           |
•============•===========•==================•===============•
| Win 10 x32 | 7.9.2     | 4.9 s + 49.1 s   |               |
•------------•-----------•------------------• 1.36          •
| Win 10 x64 | 7.9.2     | 2.1 s + 37.5 s   |               |
•============•===========•==================•===============•
| Win 10 x32 | 7.9.5     | 4.8 s + 49.1 s   |               |
•------------•-----------•------------------• 1.35          •
| Win 10 x64 | 7.9.5     | 2.3 s + 37.6 s   |               |
•============•===========•==================•===============•
| Win 10 x32 | 8.0       | 20.0 s + 49.1 s  |               |
•------------•-----------•------------------• 1.31          •
| Win 10 x64 | 8.0       | 14.8 s + 37.8 s  |               |
•============•===========•==================•===============•
| Win 10 x32 | 8.1.5     | 4.8 s + 49.3 s   |               |
•------------•-----------•------------------• 1.35          •
| Win 10 x64 | 8.1.5     | 2.3 s + 37.7 s   |               |
•============•===========•==================•===============•
| Win 10 x32 | 8.1.9.2   | 4.8 s + 49.2 s   |               |
•------------•-----------•------------------• 1.37          •
| Win 10 x64 | 8.1.9.2   | 2.1 s + 37.4 s   |               |
•============•===========•==================•===============•
In the second table, I split the total time into two parts:
- Time to bookmark the lines
- Time to delete these lines
I summed the two values before calculating the x32/x64 ratio.
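The summation rule can be checked with a few lines of arithmetic. This sketch simply recomputes the published v7.9.2 ratio of the mark-then-delete test from its two phase times; `x32_x64_ratio` is a hypothetical helper name.

```python
def x32_x64_ratio(x32_times, x64_times):
    """Ratio of total x32 time to total x64 time.

    Each argument is a list of phase times in seconds (for the second
    table: bookmark time, then deletion time); the phases are summed first.
    """
    return sum(x32_times) / sum(x64_times)


if __name__ == "__main__":
    # v7.9.2: 4.9 s + 49.1 s (x32) versus 2.1 s + 37.5 s (x64)
    ratio = x32_x64_ratio([4.9, 49.1], [2.1, 37.5])
    print(f"{ratio:.2f}")  # 54.0 / 39.6 -> 1.36
```

A single-phase test is just the one-element case, e.g. `x32_x64_ratio([17.0], [6.6])` for the first Replace All table.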
Interpretation of the results:
Except for the special case of the v8.0 version, the results are very similar for the two tables:
- In the first case, the more complicated regex (?-s)^.*NotepadPP.*\R slightly decreases the ratio between the x32 and x64 versions
- In the second case, both the mark operation and the deletion of lines have an impact, but the ratio between the x32 and x64 versions is a bit better
- Note that, regarding the v8.0 version, in the second table, the performance regression comes only from the poor results of the mark operation!
I performed one last test, using the same search and replace regexes as in my initial issue:
https://github.com/notepad-plus-plus/notepad-plus-plus/issues/9636
So, the regex S/R:
SEARCH   \w
REPLACE  \U$0
Then, I created a file containing 1,000 lines (every odd one) with the French text: C’est là, près de la forêt, dans un gîte, où régnait un grand capharnaüm, que l’aïeul ôta sa flûte et son bâton de son canoë.
And I added 1,000 English lines (every even one): Here is a example of text, containing the complete French set of accentuated characters, traditionally used.
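As a rough cross-check of the occurrence count, the same S/R can be imitated in Python. This is a sketch under stated assumptions: Python’s `re` is not Boost.Regex, and its replacement strings have no `\U` case-conversion escape, so a replacement function upper-cases each match instead. On `str` patterns, Python’s `\w` is Unicode-aware, so accented letters are matched just like ASCII ones, which is the kind of non-ASCII handling discussed above. The helper name `upper_each_word_char` is hypothetical.

```python
import re

FRENCH = ("C’est là, près de la forêt, dans un gîte, où régnait un grand "
          "capharnaüm, que l’aïeul ôta sa flûte et son bâton de son canoë.")
ENGLISH = ("Here is a example of text, containing the complete French set "
           "of accentuated characters, traditionally used.")


def upper_each_word_char(text):
    """Rough analogue of SEARCH \\w / REPLACE \\U$0.

    Each single word character is replaced by its upper-case form.
    Returns (new_text, number_of_replacements).
    """
    return re.subn(r"\w", lambda m: m.group(0).upper(), text)


if __name__ == "__main__":
    # 1,000 French lines (odd) interleaved with 1,000 English lines (even)
    text = (FRENCH + "\n" + ENGLISH + "\n") * 1000
    result, count = upper_each_word_char(text)
    print(count)  # 184000
    print(result.splitlines()[0])
```

Each French line holds 94 word characters and each English line 90, so one pair gives 184 matches and 1,000 pairs give the 184,000 occurrences reported below.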
After replacement, 184,000 occurrences were modified:
•============•===========•==========•===============•
| Archi-     | Version   |        User Mode         |
|            |           |----------•---------------|
| tecture    | Notepad++ | Time     | Ratio x32/x64 |
•============•===========•==========•===============•
| Win XP x32 | 7.9.2     | 18.7 s   | -/-           |
•============•===========•==========•===============•
| Win 10 x32 | 7.9.2     | 10.5 s   |               |
•------------•-----------•----------• 2.56          •
| Win 10 x64 | 7.9.2     | 4.1 s    |               |
•============•===========•==========•===============•
| Win 10 x32 | 7.9.5     | 10.3 s   |               |
•------------•-----------•----------• 2.51          •
| Win 10 x64 | 7.9.5     | 4.1 s    |               |
•============•===========•==========•===============•
| Win 10 x32 | 8.0       | 38.5 s   |               |
•------------•-----------•----------• 1.41          •
| Win 10 x64 | 8.0       | 27.4 s   |               |
•============•===========•==========•===============•
| Win 10 x32 | 8.1.5     | 10.4 s   |               |
•------------•-----------•----------• 2.54          •
| Win 10 x64 | 8.1.5     | 4.1 s    |               |
•============•===========•==========•===============•
| Win 10 x32 | 8.1.9.2   | 10.4 s   |               |
•------------•-----------•----------• 2.54          •
| Win 10 x64 | 8.1.9.2   | 4.1 s    |               |
•============•===========•==========•===============•
Interpretation of the results:
Again, except for the special case of the v8.0 version:
- The results, whatever the version, are quite similar within each case (x32 and x64)
- The x32/x64 ratio is similar to the one in my previous post (~2.52)!
Best Regards,
guy038
-