    regex replace performance regression

    • cmeriaux
      cmeriaux last edited by

      There is a performance regression when doing a “replace all” with regular expression option since version 8.0.

      It was supposed to be fixed in v8.1.4, but the tests are not conclusive and I’m still suffering from this regression.
      Am I the only one affected by this defect ?

      Here is the github issue https://github.com/notepad-plus-plus/notepad-plus-plus/issues/10860

      The target file is a test.txt with 8000 lines, 1.5 MB, and 48000 matches.

      @guy038, as a master of regular expressions, have you noticed anything ?

      • guy038
        guy038 last edited by guy038

        Hello, @cmeriaux and All,

        Sorry, I’ve been busy lately organizing a photo collection and I haven’t really followed the changes of Notepad++ since version 7.9.2 ( the last one supported by Win XP )

        Now, with my new Win 10 Pro laptop, I’m going to install both the 32-bit and 64-bit portable builds of three stable Notepad++ releases, 7.9.5, 8.1.5 and 8.1.9.2, and I’m going to do some tests !

        Note that I’ll be away next weekend and that my tests could overflow into next week !

        One last thing : I am probably, and unwittingly, at the origin of all these performance problems, because I’m the author of issue 9636 :-(

        Best Regards

        guy038

        P.S. :

        Regarding this topic ( not exhaustive ! ) :

        
        N++ 7.9.2             Stable        January    01
        N++ 7.9.3             Stable        February   15
        N++ 7.9.4             Stable        March      15
        N++ 7.9.5             Stable        March      23
        
        -->  Pull Request      9707         March      27      https://github.com/notepad-plus-plus/notepad-plus-plus/pull/9707
        
        -->  INITIAL Issue     9636         April      04      https://github.com/notepad-plus-plus/notepad-plus-plus/issues/9636
        
        N++ 8.0.0             Stable        June        7
        
        -->  Pull request     10010         June       15      https://github.com/notepad-plus-plus/notepad-plus-plus/pull/10010
        
        N++ 8.1               Stable        June       17
        N++ 8.1.1             Stable        July       04
        N++ 8.1.2             Stable        July       19
        
        -->  Issue            10260         July       26      https://github.com/notepad-plus-plus/notepad-plus-plus/issues/10260
        
        N++ 8.1.3             Stable        August     13
        
        -->  Issue            10398         August     17      https://github.com/notepad-plus-plus/notepad-plus-plus/issues/10398
        
        -->  Fix of issue 9636 reverted     August     19      https://github.com/notepad-plus-plus/notepad-plus-plus/commit/6844df039d54557a93a75752d651d5b9bb49f7ed
        
                                                                   Fix #10398, fix #10296, fix #10260, close #10403  ( This commit revert 86c66bb due to the boost REGEX performance issue.)
        
        N++ 8.1.4             Stable        August     25
        N++ 8.1.5             Stable        September  27
        
        N++ ( 8.1.6 )         Unstable      October    13
        N++ ( 8.1.7 )         Unstable      October    15
        N++ ( 8.1.8 )         Unstable      October    19
        N++ ( 8.1.9 )         Unstable      October    22
        N++ ( 8.1.9.1 )       Unstable      November   13
        
        N++ 8.1.9.2           Stable        November   21
        
        • guy038
          guy038 last edited by guy038

          Hi, @cmeriaux,

          Looking at your issue, you said :

          1. create a file test.txt with 8000 lines, 1.5 Mo, with 48000 match
          2. open replace panel, enable regular expression option, enable wrap around option
          3. Replace all “u_(\w+)” with “u\1”

          • Regarding point 3, I’m wondering about the regexes ! Is the syntax of both the search and the replace regex correct ?

          • Now, regarding point 1, could you tell me the general structure of this file ? Maybe you could e-mail me this test file at … ( temporarily displayed ! )
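
For readers wondering what the search/replace pair quoted from the issue does, here is a minimal sketch in Python. It assumes that Python's `re` engine treats this simple pattern the same way as the Boost engine used by Notepad++ (a reasonable assumption for `\w+` and `\1`, but still an assumption); the sample string is invented for illustration.

```python
import re

# Search and replace expressions as quoted from the issue.
SEARCH = r"u_(\w+)"
REPLACE = r"u\1"

def replace_all(text):
    """Return (rewritten text, number of replacements made)."""
    return re.subn(SEARCH, REPLACE, text)

# Hypothetical sample line, not taken from the real test file.
sample = "u_count = u_total + u_8bit  # unchanged: v_count"
result, hits = replace_all(sample)
print(result)  # the underscore after each leading "u" is dropped
print(hits)
```

So the pair is syntactically valid: it simply removes the underscore that follows a `u`, keeping the rest of the word through group 1.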

          Best Regards,

          guy038

          • cmeriaux
            cmeriaux @guy038 last edited by cmeriaux

            Hello @guy038 the file is available on github https://github.com/notepad-plus-plus/notepad-plus-plus/issues/10860
            Another user couldn’t reproduce my issue, so I’m wondering whether it is real. I tested with a portable version, of course, but Local Conf mode was OFF, so it may be linked to my configuration.

            The regex looks silly, but it’s just for the test !

            • guy038
              guy038 last edited by guy038

              Hello, @cmeriaux and All,

              I used a file which contains five times your initial file, so 1 empty line at beginning + 47,520 lines ( 5 * 9,504 ). I used this protocol :

              • A recent Win 10 laptop with SSD, connected to the power supply and a cell phone for timing !

              • Tests with, both, x32 and x64 versions and, both, User and Administrator modes

              • Tested N++ versions : v7.9.2, v7.9.5, v8.1.5 and v8.1.9.2

              • I did, at least, 3 tries for each case !

              • The Replace All action changed, each time, 290,640 occurrences


              Practically :

              • I used the Regular expression mode and the Wrap around option

              • I left the Match case unticked

              • Ctrl + Home ( Back to the first empty line )

              • Ctrl + H ( Replace Dialog )

              • Alt + A ( Replace All operation ) + start timing

              After results :

              Esc to close the Replace dialog

              Ctrl + Z to undo the results

              and so on …
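
The same "several tries on identical input" protocol can be sketched programmatically. This is only an analogue: it times Python's `re` engine rather than Notepad++, so the absolute numbers are not comparable with the tables in this thread, and the input text is a hypothetical stand-in for the real test file.

```python
import re
import statistics
import time

def time_replace_all(pattern, replacement, text, tries=3):
    """Run the same Replace All several times on the same input.

    Returns (number of replacements, list of elapsed seconds per try).
    Each try starts again from the original text, which plays the role
    of the Ctrl + Z step in the manual protocol.
    """
    compiled = re.compile(pattern)
    timings = []
    count = 0
    for _ in range(tries):
        start = time.perf_counter()
        _, count = compiled.subn(replacement, text)
        timings.append(time.perf_counter() - start)
    return count, timings

# Hypothetical input, standing in for the file attached to the GitHub issue.
text = "u_alpha u_beta u_gamma\n" * 10_000
count, timings = time_replace_all(r"u_(\w+)", r"u\1", text)
print(count, "replacements, median", round(statistics.median(timings), 3), "s")
```

Taking the median of the tries limits the influence of one slow run, much like repeating the stopwatch measurement at least three times.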


              I got this table :

                  •===============•=============•==========•=================•==========•=================•
                  |    Archi-     |   Version   |          User Mode         |     Administrator Mode     |
                  |               |             |----------•-----------------•----------•-----------------•--------------------•
                  |    tecture    |  Notepad++  |   Time   |  Ratio x32/x64  |   Time   |  Ratio x32/x64  |  Ratio User/Admin  |
                  •===============•=============•==========•=================•==========•=================•====================•
                  |  Win XP  x32  |    7.9.2    |  14.4 s  |       -/-       |   -/-    |       -/-       |        -/-         |
                  •===============•=============•==========•=================•==========•=================•====================•
                  |  Win 10  x32  |    7.9.2    |  17.0 s  |                 |  17.0 s  |                 |        1.00        |
                  •---------------•-------------•----------•      2.58       •----------•      2.58       •--------------------•
                  |  Win 10  x64  |    7.9.2    |   6.6 s  |                 |   6.6 s  |                 |        1.00        |
                  •===============•=============•==========•=================•==========•=================•====================•
                  |  Win 10  x32  |    7.9.5    |  16.5 s  |                 |  16.4 s  |                 |        1.00        |
                  •---------------•-------------•----------•      2.46       •----------•      2.48       •--------------------•
                  |  Win 10  x64  |    7.9.5    |   6.7 s  |                 |   6.6 s  |                 |        1.02        |
                  •===============•=============•==========•=================•==========•=================•====================•
                  |  Win 10  x32  |    8.1.5    |  16.7 s  |                 |  16.65 s |                 |        1.00        |
                  •---------------•-------------•----------•      2.49       •----------•      2.50       •--------------------•
                  |  Win 10  x64  |    8.1.5    |   6.7 s  |                 |   6.65 s |                 |        1.01        |
                  •===============•=============•==========•=================•==========•=================•====================•
                  |  Win 10  x32  |   8.1.9.2   |  16.9 s  |                 |  16.85 s |                 |        1.00        |
                  •---------------•-------------•----------•      2.52       •----------•      2.53       •--------------------•
                  |  Win 10  x64  |   8.1.9.2   |   6.7 s  |                 |   6.65 s |                 |        1.01        |
                  •===============•=============•==========•=================•==========•=================•====================•
              

              Note that, for fun, I added the Win XP x32 case, with 7.9.2, which is the last version able to run on Win XP. Not too bad, is it ? ( For an old 1.70 GHz single-core machine, with 1 GB of memory ! )


              Interpretation of the results :

              • First, no significant change exists between user and admin mode !

              • Secondly, each x64 version is, overall, about 2.5 times faster than its corresponding x32 one !

              • Thirdly, the small differences among the x32 versions, on the one hand, and among the x64 versions, on the other hand, are not significant and simply reflect the measurement uncertainties !


              In the end, regarding this test, no significant difference could be observed, for each category ( x32 and x64 )

              Best Regards,

              guy038

              Wednesday or Thursday, I’ll run another test of @scott-sumner, with deletion of some lines !

              • ArkadiuszMichalski
                ArkadiuszMichalski @guy038 last edited by

                @guy038
                Is 32-bit really faster than 64-bit for 7.9.2, or is it just an ordering mistake?

                • cmeriaux
                  cmeriaux last edited by

                  Thanks @guy038 for the thorough and interesting report. The other conclusion is that my original issue lies on my side.
                  Cheers

                  • guy038
                    guy038 last edited by guy038

                    Hello, @cmeriaux, @Arkadiuszmichalski and All,

                    Forget the results of my previous post. You’ll find, below, an updated version, without the typo regarding versions for v7.9.2 and with new results for the v8.0 version !


                    I must add, that, regarding my customized preferences for this test, I used :

                    • Alternate icons ( General )

                    • Enable Multi-Editing, Enable Smooth font and Enable scrolling beyond last line ( Editing )

                    • Disable Smart Highlighting ( Highlighting )

                    • Use Monospaced font in Find dialog ( Searching )

                    • Disable session snapshot and periodic backup and Backup on Save : None ( Backup )

                    • Auto-completion : Function completion ( Auto-completion )


                    So, my final list is :

                        •===============•=============•==========•=================•==========•=================•
                        |    Archi-     |   Version   |          User Mode         |     Administrator Mode     |
                        |               |             |----------•-----------------•----------•-----------------•--------------------•
                        |    tecture    |  Notepad++  |   Time   |  Ratio x32/x64  |   Time   |  Ratio x32/x64  |  Ratio User/Admin  |
                        •===============•=============•==========•=================•==========•=================•====================•
                        |  Win XP  x32  |    7.9.2    |  14.4 s  |       -/-       |   -/-    |       -/-       |        -/-         |
                        •===============•=============•==========•=================•==========•=================•====================•
                        |  Win 10  x32  |    7.9.2    |  17.3 s  |                 |  17.3 s  |                 |        1.00        |
                        •---------------•-------------•----------•      2.58       •----------•      2.60       •--------------------•
                        |  Win 10  x64  |    7.9.2    |   6.7 s  |                 |   6.65 s |                 |        1.01        |
                        •===============•=============•==========•=================•==========•=================•====================•
                        |  Win 10  x32  |    7.9.5    |  16.5 s  |                 |  16.4 s  |                 |        1.00        |
                        •---------------•-------------•----------•      2.46       •----------•      2.48       •--------------------•
                        |  Win 10  x64  |    7.9.5    |   6.7 s  |                 |   6.6 s  |                 |        1.02        |
                        •===============•=============•==========•=================•==========•=================•====================•
                        |  Win 10  x32  |    8.0      |  63.0 s  |                 |  63.0 s  |                 |        1.00        |
                        •---------------•-------------•----------•      1.45       •----------•      1.47       •--------------------•
                        |  Win 10  x64  |    8.0      |  43.4 s  |                 |  43.0 s  |                 |        1.01        |
                        •===============•=============•==========•=================•==========•=================•====================•
                        |  Win 10  x32  |    8.1.5    |  16.7 s  |                 |  16.65 s |                 |        1.00        |
                        •---------------•-------------•----------•      2.49       •----------•      2.50       •--------------------•
                        |  Win 10  x64  |    8.1.5    |   6.7 s  |                 |   6.65 s |                 |        1.01        |
                        •===============•=============•==========•=================•==========•=================•====================•
                        |  Win 10  x32  |   8.1.9.2   |  16.9 s  |                 |  16.85 s |                 |        1.00        |
                        •---------------•-------------•----------•      2.52       •----------•      2.53       •--------------------•
                        |  Win 10  x64  |   8.1.9.2   |   6.7 s  |                 |   6.65 s |                 |        1.01        |
                        •===============•=============•==========•=================•==========•=================•====================•
                    

                    Note that, for fun, I added the Win XP x32 case, with 7.9.2, which is the last version able to run on Win XP. Not too bad, is it ? ( For an old 1.70 GHz single-core machine, with 1 GB of memory ! )


                    Interpretation of the results :

                    • Note that, in the 8.0 version, the new handling of accented characters, in regex replacement, was functional : unfortunately, the performance regression is obvious :-((

                    • But later, because of this performance regression, the fix was reverted ( see the August 19 commit in the timeline above ) and, in the 8.1.5 version, the general performance was back to normal !

                    • So, except for the special v8.0 case :

                      • Firstly, no significant change exists between user and admin mode !

                      • Secondly, each x64 version is, overall, about 2.5 times faster than its corresponding x32 one

                      • Thirdly, the small differences among the x32 versions, on the one hand, and among the x64 versions, on the other hand, are not significant and simply reflect the measurement uncertainties !


                    In the end :

                    • Performance was degraded from the v8.0 version through the v8.1.3 version, while the handling of non-ASCII accented characters, in replacement, was enabled

                    • Otherwise, no significant difference could be observed, within each category ( x32 and x64 )
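
The ratios in the table, and the size of the v8.0 regression, can be recomputed from the User Mode timings. The sketch below copies those timings from the table; the function names and the choice of v7.9.5 as the baseline are illustrative assumptions, not part of the original measurements.

```python
# Timings in seconds, copied from the User Mode column of the table above.
USER_MODE_SECONDS = {
    "7.9.2":   {"x32": 17.3, "x64": 6.7},
    "7.9.5":   {"x32": 16.5, "x64": 6.7},
    "8.0":     {"x32": 63.0, "x64": 43.4},
    "8.1.5":   {"x32": 16.7, "x64": 6.7},
    "8.1.9.2": {"x32": 16.9, "x64": 6.7},
}

def arch_ratio(version):
    """Ratio x32/x64 for one version, rounded as in the table."""
    t = USER_MODE_SECONDS[version]
    return round(t["x32"] / t["x64"], 2)

def slowdown(version, baseline="7.9.5"):
    """How many times slower `version` is than `baseline`, per architecture."""
    return {
        arch: round(USER_MODE_SECONDS[version][arch] / USER_MODE_SECONDS[baseline][arch], 2)
        for arch in ("x32", "x64")
    }

for version in USER_MODE_SECONDS:
    print(version, arch_ratio(version))
print("v8.0 vs v7.9.5:", slowdown("8.0"))
```

Seen this way, v8.0 is roughly 3.8 times slower than v7.9.5 on x32 and roughly 6.5 times slower on x64, which is why its x32/x64 ratio drops to about 1.45.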

                    Best Regards,

                    guy038

                    Friday, I’ll run an other test, of @scott-sumner, with deletion of some lines !

                    • Alan Kilborn
                      Alan Kilborn @guy038 last edited by

                      @guy038 said in regex replace performance regression:

                      Friday, I’ll run an other test, of @scott-sumner, with deletion of some lines !

                      Hmm. How, exactly does one test Scott ??

                      • guy038
                        guy038 last edited by guy038

                        Hi, @alan-kilborn,

                        I downloaded a file from Scott Sumner, named data8279.txt, which was still available on GitHub about one week ago ! But, now, I can’t even remember in which issue or pull request I saw it :-((

                        I can just tell you that it uses a regex to delete any line containing the word NotepadPP

                        With the v8.1.9.2 (64 bits) version and the Match case option unticked, it deletes 203,236 occurrences, out of a total of 300,000 lines, in 37.2 s. So, 96,764 lines remain after the replacement ! If the Match case option is enabled, the result is very similar : 37.1 s !

                        BR

                        guy038

                        • PeterJones
                          PeterJones @guy038 last edited by

                          @guy038 said in regex replace performance regression:

                          data8279.txt

                          I searched the github issues for that: it’s in #8279 comment#743696790

                          • guy038
                            guy038 last edited by guy038

                            Hello, @peterjones and All,

                            Peter, thanks for tracking down this issue comment !

                            Now, I realize that Scott does a two-step operation :

                            • Firstly, he performs a mark operation, with the Bookmark line option ticked, on word NotepadPP

                            • Secondly, he performs a Remove Bookmarked Lines operation

                            So, not exactly what I meant, before !

                            Be patient till Friday, as, like Tuesday, tomorrow is a nice sunny ski day for me. The second one since March 2019 !

                            BR

                            guy038

                            • guy038
                              guy038 last edited by guy038

                              @peterjones,

                              BTW, Peter, could you tell me which criteria you used, in GitHub search, to get the right issue ?

                              Thanks in advance !

                              BR

                              guy038

                              • PeterJones
                                PeterJones @guy038 last edited by

                                @guy038 ,

                                I went to the issues search, removed the “closed” condition, and searched for the name of the file

                                https://github.com/notepad-plus-plus/notepad-plus-plus/issues?q=is%3Aissue+data8279.zip

                                (originally, I tried going through GitHub help files to see how to search comments for specific attachments; when I couldn’t find it, I decided to see if the simple plaintext search for the filename would work, hoping that either the name of the file was in plaintext in the comment, or that when it searched, it could see in the URL as well.)
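
For anyone who wants to repeat this kind of search for another attachment name, the query string in the URL above can be rebuilt with the standard library. This is a small sketch; the helper name `issue_search_url` is made up for illustration, and it only reproduces the URL shape shown in this post.

```python
from urllib.parse import urlencode

ISSUES_URL = "https://github.com/notepad-plus-plus/notepad-plus-plus/issues"

def issue_search_url(terms):
    """Build an issues search URL without the default "is:open" filter."""
    query = "is:issue " + terms
    # urlencode escapes ":" as %3A and the space as "+"
    return ISSUES_URL + "?" + urlencode({"q": query})

print(issue_search_url("data8279.zip"))
```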

                                • guy038
                                  guy038 last edited by guy038

                                  Hi, @cmeriaux, @peterjones, @alan-kilborn, @arkadiuszmichalski and All,

                                  So, I’m carrying on with tests on some sample texts, dealing with replacements, bookmarks and replacement modifiers ! Note that I did not consider the Admin case, as its results are practically identical !

                                  First, I used the @sasumner’s file data8279.txt ( 300,000 lines ) and I performed two types of test :

                                  • A global replacement of (?-s)^.*NotepadPP.*\R with Nothing, with the Wrap around option ticked ( First table, below )

                                  • A mark operation of the string NotepadPP, with the Bookmark line and Wrap around options ticked, but not the Match case one, followed by a Search > Bookmark > Remove Bookmarked Lines operation ( Second table, below )

                                  203,236 occurrences were deleted or were marked then deleted !
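
To make the two approaches concrete, here is a rough Python analogue of both. It is a sketch under stated assumptions: Python's `re` has no `\R`, so the line break is spelled out, `re.MULTILINE` stands in for the way `^` matches at each line start in Notepad++, and `re.IGNORECASE` mirrors the unticked Match case option. The sample text is invented and uses plain `\n` line endings.

```python
import re

# "." already stops at "\n" in Python, which is what (?-s) asks for.
LINE_WITH_WORD = re.compile(
    r"^.*NotepadPP.*(?:\r\n|\n|\r)", re.MULTILINE | re.IGNORECASE
)

def delete_by_regex(text):
    """One-step approach: replace every matching line with nothing."""
    return LINE_WITH_WORD.subn("", text)

def mark_then_remove(text):
    """Two-step approach: bookmark the matching lines, then drop them."""
    lines = text.splitlines(keepends=True)
    marked = {i for i, line in enumerate(lines) if "notepadpp" in line.lower()}
    kept = [line for i, line in enumerate(lines) if i not in marked]
    return "".join(kept), len(marked)

# Hypothetical sample, much smaller than data8279.txt.
sample = "keep 1\nuses NotepadPP here\nkeep 2\nnotepadpp again\n"
print(delete_by_regex(sample))
print(mark_then_remove(sample))
```

On this sample both functions return the same text and the same count. One difference worth keeping in mind: a matching last line without a trailing line break is not removed by the regex version, because the pattern requires the line break.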

                                      •===============•=============•==========•=================•
                                      |    Archi-     |   Version   |          User Mode         |
                                      |               |             |----------•-----------------|
                                      |    tecture    |  Notepad++  |   Time   |  Ratio x32/x64  |
                                      •===============•=============•==========•=================•
                                      |  Win XP  x32  |    7.9.2    |  65.0 s  |       -/-       |
                                      •===============•=============•==========•=================•
                                      |  Win 10  x32  |    7.9.2    |  47.6 s  |                 |
                                      •---------------•-------------•----------•      1.27       |
                                      |  Win 10  x64  |    7.9.2    |  37.5 s  |                 |
                                      •===============•=============•==========•=================•
                                      |  Win 10  x32  |    7.9.5    |  47.4 s  |                 |
                                      •---------------•-------------•----------•      1.27       |
                                      |  Win 10  x64  |    7.9.5    |  37.4 s  |                 |
                                      •===============•=============•==========•=================•
                                      |  Win 10  x32  |    8.0      |  86.2 s  |                 |
                                      •---------------•-------------•----------•      1.22       |
                                      |  Win 10  x64  |    8.0      |  70.4 s  |                 |
                                      •===============•=============•==========•=================•
                                      |  Win 10  x32  |    8.1.5    |  47.6 s  |                 |
                                      •---------------•-------------•----------•      1.27       |
                                      |  Win 10  x64  |    8.1.5    |  37.5 s  |                 |
                                      •===============•=============•==========•=================•
                                      |  Win 10  x32  |   8.1.9.2   |  47.5 s  |                 |
                                      •---------------•-------------•----------•      1.28       |
                                      |  Win 10  x64  |   8.1.9.2   |  37.2 s  |                 |
                                      •===============•=============•==========•=================•
                                  

                                      •===============•=============•====================================•
                                      |    Archi-     |   Version   |              User Mode             |
                                      |               |             |------------------•-----------------|
                                      |    tecture    |  Notepad++  |       Time       |  Ratio x32/x64  |
                                      •===============•=============•==================•=================•
                                      |  Win XP  x32  |    7.9.2    | 10.0 s + 64.2 s  |       -/-       |
                                      •===============•=============•==================•=================•
                                      |  Win 10  x32  |    7.9.2    |  4.9 s + 49.1 s  |                 |
                                      •---------------•-------------•------------------•      1.36       |
                                      |  Win 10  x64  |    7.9.2    |  2.1 s + 37.5 s  |                 |
                                      •===============•=============•==================•=================•
                                      |  Win 10  x32  |    7.9.5    |  4.8 s + 49.1 s  |                 |
                                      •---------------•-------------•------------------•      1.35       |
                                      |  Win 10  x64  |    7.9.5    |  2.3 s + 37.6 s  |                 |
                                      •===============•=============•==================•=================•
                                      |  Win 10  x32  |    8.0      | 20.0 s + 49.1 s  |                 |
                                      •---------------•-------------•------------------•      1.31       |
                                      |  Win 10  x64  |    8.0      | 14.8 s + 37.8 s  |                 |
                                      •===============•=============•==================•=================•
                                      |  Win 10  x32  |    8.1.5    |  4.8 s + 49.3 s  |                 |
                                      •---------------•-------------•------------------•      1.35       |
                                      |  Win 10  x64  |    8.1.5    |  2.3 s + 37.7 s  |                 |
                                      •===============•=============•==================•=================•
                                      |  Win 10  x32  |   8.1.9.2   |  4.8 s + 49.2 s  |                 |
                                      •---------------•-------------•------------------•      1.37       |
                                      |  Win 10  x64  |   8.1.9.2   |  2.1 s + 37.4 s  |                 |
                                      •===============•=============•==================•=================•
                                  

                                  In the second table, I split the total time into two parts :

                                  • Time to bookmark the lines

                                  • Time to delete these lines

                                  I summed the two values before calculating the ratio x32/x64


                                  Interpretation of the results :

                                  If we set aside the special case of the v8.0 version, the results are very similar, for the two tables :

                                  • In the first case, the more complicated regex (?-s)^.*NotepadPP.*\R lowers the ratio between the x32 and x64 versions a bit

                                  • In the second case, both the mark operation and the deletion of lines have an impact, but the ratio between the x32 and x64 versions is a bit higher

                                  • Note that, regarding the v8.0 version, in the second table, the performance regression comes from the bad results of the mark operation only !
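
The last point can be checked with a little arithmetic on the second table. The sketch below copies the ( mark, delete ) timings for v7.9.5 and v8.0 from that table; the function names and the use of v7.9.5 as the baseline are assumptions made for illustration.

```python
# (mark seconds, delete seconds), copied from the second table above.
PHASES = {
    "7.9.5": {"x32": (4.8, 49.1), "x64": (2.3, 37.6)},
    "8.0":   {"x32": (20.0, 49.1), "x64": (14.8, 37.8)},
}

def total_ratio(version):
    """Ratio x32/x64 computed on the sum of both phases."""
    t = PHASES[version]
    return round(sum(t["x32"]) / sum(t["x64"]), 2)

def phase_slowdown(arch, version="8.0", baseline="7.9.5"):
    """Return (mark slowdown, delete slowdown) of `version` vs `baseline`."""
    mark, delete = PHASES[version][arch]
    base_mark, base_delete = PHASES[baseline][arch]
    return round(mark / base_mark, 2), round(delete / base_delete, 2)

print(total_ratio("7.9.5"), total_ratio("8.0"))
print("x32:", phase_slowdown("x32"))
print("x64:", phase_slowdown("x64"))
```

The delete phase is essentially unchanged in v8.0, while the mark phase is several times slower, which matches the observation above.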


                                  I performed a last test, using the same Search and Replace regexes as in my initial issue :

                                  https://github.com/notepad-plus-plus/notepad-plus-plus/issues/9636

                                  So the regex S/R :

                                  SEARCH \w

                                  REPLACE \U$0

                                  I, then, created a file containing 1,000 lines ( the odd-numbered ones ) with the French text :

                                  C’est là, près de la forêt, dans un gîte, où régnait un grand capharnaüm, que l’aïeul ôta sa flûte et son bâton de son canoë.

                                  And I added 1,000 English lines ( the even-numbered ones ) :

                                  Here is a example of text, containing the complete French set of accentuated characters, traditionally used.

                                  After replacement, 184,000 occurrences have been modified :

                                      •===============•=============•==========•=================•
                                      |    Archi-     |   Version   |          User Mode         |
                                      |               |             |----------•-----------------|
                                      |    tecture    |  Notepad++  |   Time   |  Ratio x32/x64  |
                                      •===============•=============•==========•=================•
                                      |  Win XP  x32  |    7.9.2    |  18.7 s  |       -/-       |
                                      •===============•=============•==========•=================•
                                      |  Win 10  x32  |    7.9.2    |  10.5 s  |                 |
                                      •---------------•-------------•----------•      2.56       |
                                      |  Win 10  x64  |    7.9.2    |   4.1 s  |                 |
                                      •===============•=============•==========•=================•
                                      |  Win 10  x32  |    7.9.5    |  10.3 s  |                 |
                                      •---------------•-------------•----------•      2.51       |
                                      |  Win 10  x64  |    7.9.5    |   4.1 s  |                 |
                                      •===============•=============•==========•=================•
                                      |  Win 10  x32  |    8.0      |  38.5 s  |                 |
                                      •---------------•-------------•----------•      1.41       |
                                      |  Win 10  x64  |    8.0      |  27.4 s  |                 |
                                      •===============•=============•==========•=================•
                                      |  Win 10  x32  |    8.1.5    |  10.4 s  |                 |
                                      •---------------•-------------•----------•      2.54       |
                                      |  Win 10  x64  |    8.1.5    |   4.1 s  |                 |
                                      •===============•=============•==========•=================•
                                      |  Win 10  x32  |   8.1.9.2   |  10.4 s  |                 |
                                      •---------------•-------------•----------•      2.54       |
                                      |  Win 10  x64  |   8.1.9.2   |   4.1 s  |                 |
                                      •===============•=============•==========•=================•
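
The 184,000 figure can be cross-checked outside Notepad++. In the sketch below, a Python callback plays the role of the `\U$0` replacement, since Python's `re` has no case-conversion escapes; it also assumes that Python's Unicode-aware `\w` counts the same characters as the Boost engine does for this text. The two sentences are copied verbatim from this post.

```python
import re
import unicodedata

FRENCH = ("C’est là, près de la forêt, dans un gîte, où régnait un grand "
          "capharnaüm, que l’aïeul ôta sa flûte et son bâton de son canoë.")
ENGLISH = ("Here is a example of text, containing the complete French set of "
           "accentuated characters, traditionally used.")

def build_text(pairs=1000):
    """Alternate French ( odd ) and English ( even ) lines."""
    text = (FRENCH + "\n" + ENGLISH + "\n") * pairs
    # Use precomposed accented letters so each one is a single \w match.
    return unicodedata.normalize("NFC", text)

def uppercase_word_chars(text):
    """Analogue of SEARCH \\w / REPLACE \\U$0: uppercase every word character."""
    return re.subn(r"\w", lambda m: m.group(0).upper(), text)

result, count = uppercase_word_chars(build_text())
print(count)
print(result.splitlines()[0])
```

Each French line contains 94 word characters and each English line 90, so 1,000 pairs give 184,000 replacements, in line with the count reported for this test.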
                                  

                                  Interpretation of the results :

                                  Again, if we set aside the special case of the v8.0 version :

                                  • The results, whatever the version, are quite similar, for each case ( x32 and x64 )

                                  • The ratio x32/x64 is similar to the one of my previous post ( ~ 2.52 ) !

                                  Best Regards,

                                  guy038
