    regex replace performance regression

    • cmeriaux @guy038

      Hello @guy038, the file is available on GitHub: https://github.com/notepad-plus-plus/notepad-plus-plus/issues/10860
      Another user could not reproduce my issue, so I’m wondering whether it is real. I tested with the portable version, of course, but Local Conf mode was OFF, so it may be linked to my configuration.

      The regexp looks silly, but it’s just for the test !

      • guy038

        Hello, @cmeriaux and All,

        I used a file containing five copies of your initial file, so 1 empty line at the beginning + 47,520 lines ( 5 × 9,504 ). I followed this protocol :

        • A recent Win 10 laptop with an SSD, plugged into the power supply, and a cell phone for timing !

        • Tests with both the x32 and x64 versions, in both User and Administrator modes

        • Tested N++ versions : v7.9.2, v7.9.5, v8.1.5 and v8.1.9.2

        • I ran at least 3 tries for each case !

        • Each time, the Replace All action changed 290,640 occurrences


        In practice :

        • I used the Regular expression mode and the Wrap around option

        • I left the Match case option unticked

        • Ctrl + Home ( back to the first empty line )

        • Ctrl + H ( Replace dialog )

        • Alt + A ( Replace All operation ) + start timing

        After each result :

        • Esc to close the Replace dialog

        • Ctrl + Z to undo the replacements

        and so on …


        I got this table :

            •===============•=============•==========•=================•==========•=================•
            |    Archi-     |   Version   |          User Mode         |     Administrator Mode     |
            |               |             |----------•-----------------•----------•-----------------•--------------------•
            |    tecture    |  Notepad++  |   Time   |  Ratio x32/x64  |   Time   |  Ratio x32/x64  |  Ratio User/Admin  |
            •===============•=============•==========•=================•==========•=================•====================•
            |  Win XP  x32  |    7.9.2    |  14.4 s  |       -/-       |   -/-    |       -/-       |        -/-         |
            •===============•=============•==========•=================•==========•=================•====================•
            |  Win 10  x32  |    7.9.2    |  17.0 s  |                 |  17.0 s  |                 |        1.00        |
            •---------------•-------------•----------•      2.58       •----------•      2.58       •--------------------•
            |  Win 10  x64  |    7.9.2    |   6.6 s  |                 |   6.6 s  |                 |        1.00        |
            •===============•=============•==========•=================•==========•=================•====================•
            |  Win 10  x32  |    7.9.5    |  16.5 s  |                 |  16.4 s  |                 |        1.00        |
            •---------------•-------------•----------•      2.46       •----------•      2.48       •--------------------•
            |  Win 10  x64  |    7.9.5    |   6.7 s  |                 |   6.6 s  |                 |        1.02        |
            •===============•=============•==========•=================•==========•=================•====================•
            |  Win 10  x32  |    8.1.5    |  16.7 s  |                 |  16.65 s |                 |        1.00        |
            •---------------•-------------•----------•      2.49       •----------•      2.50       •--------------------•
            |  Win 10  x64  |    8.1.5    |   6.7 s  |                 |   6.65 s |                 |        1.01        |
            •===============•=============•==========•=================•==========•=================•====================•
            |  Win 10  x32  |   8.1.9.2   |  16.9 s  |                 |  16.85 s |                 |        1.00        |
            •---------------•-------------•----------•      2.52       •----------•      2.53       •--------------------•
            |  Win 10  x64  |   8.1.9.2   |   6.7 s  |                 |   6.65 s |                 |        1.01        |
            •===============•=============•==========•=================•==========•=================•====================•
        

        Note that, for fun, I added the Win XP x32 case with v7.9.2, which is the last version that runs on Win XP. Not too bad, is it ? ( For an old 1.70 GHz single-core machine with 1 GB of memory ! )


        Interpretation of the results :

        • Firstly, no significant change exists between User and Administrator mode !

        • Secondly, each x64 version is, overall, about 2.5 times faster than its x32 counterpart !

        • Thirdly, the small differences among the x32 versions, on the one hand, and among the x64 versions, on the other hand, are not significant and simply reflect measurement uncertainty !


        In the end, for this test, no significant difference could be observed between versions within each category ( x32 and x64 )

        Best Regards,

        guy038

        Wednesday or Thursday, I’ll run another test of @scott-sumner, with deletion of some lines !

        • ArkadiuszMichalski @guy038

          @guy038
          Is 32-bit really faster than 64-bit for 7.9.2, or are the rows just in the wrong order?

          • cmeriaux

            Thanks, @guy038, for the thorough and interesting report. The other conclusion is that my original issue lies on my side.
            Cheers

            • guy038

              Hello, @cmeriaux, @Arkadiuszmichalski and All,

              Forget the results of my previous post. Below is an updated version, without the typo in the v7.9.2 rows and with new results for v8.0 !


              I must add that my customized preferences for this test were :

              • Alternate icons ( General )

              • Enable Multi-Editing, Enable Smooth font and Enable scrolling beyond last line ( Editing )

              • Disable Smart Highlighting ( Highlighting )

              • Use Monospaced font in Find dialog ( Searching )

              • Disable session snapshot and periodic backup and Backup on Save : None ( Backup )

              • Auto-completion : Function completion ( Auto-completion )


              So, my final table is :

                  •===============•=============•==========•=================•==========•=================•
                  |    Archi-     |   Version   |          User Mode         |     Administrator Mode     |
                  |               |             |----------•-----------------•----------•-----------------•--------------------•
                  |    tecture    |  Notepad++  |   Time   |  Ratio x32/x64  |   Time   |  Ratio x32/x64  |  Ratio User/Admin  |
                  •===============•=============•==========•=================•==========•=================•====================•
                  |  Win XP  x32  |    7.9.2    |  14.4 s  |       -/-       |   -/-    |       -/-       |        -/-         |
                  •===============•=============•==========•=================•==========•=================•====================•
                  |  Win 10  x32  |    7.9.2    |  17.3 s  |                 |  17.3 s  |                 |        1.00        |
                  •---------------•-------------•----------•      2.58       •----------•      2.60       •--------------------•
                  |  Win 10  x64  |    7.9.2    |   6.7 s  |                 |   6.65 s |                 |        1.00        |
                  •===============•=============•==========•=================•==========•=================•====================•
                  |  Win 10  x32  |    7.9.5    |  16.5 s  |                 |  16.4 s  |                 |        1.00        |
                  •---------------•-------------•----------•      2.46       •----------•      2.48       •--------------------•
                  |  Win 10  x64  |    7.9.5    |   6.7 s  |                 |   6.6 s  |                 |        1.02        |
                  •===============•=============•==========•=================•==========•=================•====================•
                  |  Win 10  x32  |    8.0      |  63.0 s  |                 |  63.0 s  |                 |        1.00        |
                  •---------------•-------------•----------•      1.45       •----------•      1.47       •--------------------•
                  |  Win 10  x64  |    8.0      |  43.4 s  |                 |  43.0 s  |                 |        1.01        |
                  •===============•=============•==========•=================•==========•=================•====================•
                  |  Win 10  x32  |    8.1.5    |  16.7 s  |                 |  16.65 s |                 |        1.00        |
                  •---------------•-------------•----------•      2.49       •----------•      2.50       •--------------------•
                  |  Win 10  x64  |    8.1.5    |   6.7 s  |                 |   6.65 s |                 |        1.01        |
                  •===============•=============•==========•=================•==========•=================•====================•
                  |  Win 10  x32  |   8.1.9.2   |  16.9 s  |                 |  16.85 s |                 |        1.00        |
                  •---------------•-------------•----------•      2.52       •----------•      2.53       •--------------------•
                  |  Win 10  x64  |   8.1.9.2   |   6.7 s  |                 |   6.65 s |                 |        1.01        |
                  •===============•=============•==========•=================•==========•=================•====================•
              

              Note that, for fun, I added the Win XP x32 case with v7.9.2, which is the last version that runs on Win XP. Not too bad, is it ? ( For an old 1.70 GHz single-core machine with 1 GB of memory ! )


              Interpretation of the results :

              • Note that, in v8.0, the new handling of accented characters in regex replacement was active : unfortunately, the performance regression is obvious :-((

              • But later, by v8.1.5, that change had been reverted because of this regression, and the general performance was back to normal !

              • So, except for the special v8.0 case :

                • Firstly, no significant change exists between User and Administrator mode !

                • Secondly, each x64 version is, overall, about 2.5 times faster than its x32 counterpart

                • Thirdly, the small differences among the x32 versions, on the one hand, and among the x64 versions, on the other hand, are not significant and simply reflect measurement uncertainty !


              In the end :

              • Performance was degraded from v8.0 through v8.1.3, while the handling of non-ASCII accented characters in replacements was enabled

              • Otherwise, no significant difference could be observed between versions within each category ( x32 and x64 )
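
As a cross-check of the updated table, here is a minimal Python sketch that recomputes the x32/x64 ratios, the v8.0 slowdown and the scatter of the x32 timings from the User-mode column. The dictionary layout and the names `USER_TIMES`, `arch_ratio`, `slowdown` and `relative_spread` are illustrative assumptions, not anything produced by Notepad++. v7.9.5 is used as the baseline only because it is the release tested just before v8.0.

```python
# User-mode timings in seconds, copied from the table: version -> (x32, x64).
# The structure and names here are illustrative, not part of the original test.
USER_TIMES = {
    "7.9.2": (17.3, 6.7),
    "7.9.5": (16.5, 6.7),
    "8.0": (63.0, 43.4),
    "8.1.5": (16.7, 6.7),
    "8.1.9.2": (16.9, 6.7),
}


def arch_ratio(version):
    """x32 time divided by x64 time for one version."""
    x32, x64 = USER_TIMES[version]
    return x32 / x64


def slowdown(version, baseline="7.9.5"):
    """How many times slower `version` is than `baseline`, as (x32, x64)."""
    x32, x64 = USER_TIMES[version]
    base32, base64 = USER_TIMES[baseline]
    return x32 / base32, x64 / base64


def relative_spread(times):
    """(max - min) / mean, a rough measure of scatter between measurements."""
    return (max(times) - min(times)) / (sum(times) / len(times))


for version in USER_TIMES:
    print(f"v{version}: x32/x64 = {arch_ratio(version):.2f}")

s32, s64 = slowdown("8.0")
print(f"v8.0 vs v7.9.5: x32 {s32:.1f}x slower, x64 {s64:.1f}x slower")

# Scatter of the x32 timings once the v8.0 outlier is left out.
normal_x32 = [times[0] for version, times in USER_TIMES.items() if version != "8.0"]
print(f"x32 spread without v8.0: {relative_spread(normal_x32):.1%}")
```

On these figures, v8.0 comes out roughly 3.8 times slower on x32 and 6.5 times slower on x64 than v7.9.5, while the remaining x32 timings differ by less than 5 %, which fits the reading that those differences are measurement noise.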

              Best Regards,

              guy038

              Friday, I’ll run another test, of @scott-sumner, with deletion of some lines !

              • Alan Kilborn @guy038

                @guy038 said in regex replace performance regression:

                Friday, I’ll run another test, of @scott-sumner, with deletion of some lines !

                Hmm. How, exactly, does one test Scott ??

                • guy038

                  Hi, @alan-kilborn,

                  I downloaded one of @scott-sumner’s files, named data8279.txt, which was still available on GitHub about a week ago ! But now I can’t even remember in which issue or pull request I saw it :-((

                  I can only tell you that it uses a regular expression to delete any line containing the word NotepadPP

                  With v8.1.9.2 ( 64-bit ) and the Match case option unticked, it deletes 203,236 lines out of a total of 300,000 in 37.2 s. So 96,764 lines remain after the replacement ! With the Match case option ticked, the result is much the same : 37.1 s !

                  BR

                  guy038

                  • PeterJones @guy038

                    @guy038 said in regex replace performance regression:

                    data8279.txt

                    I searched the GitHub issues for that: it’s in #8279, comment #743696790

                    • guy038

                      Hello, @peterjones and All,

                      Peter, thanks for managing to find this issue comment !

                      Now, I realize that Scott does a two-step operation :

                      • Firstly, he performs a mark operation, with the Bookmark line option ticked, on the word NotepadPP

                      • Secondly, he performs a Remove Bookmarked Lines operation

                      So, not exactly what I described before !

                      Please be patient until Friday because, like Tuesday, tomorrow will be a nice sunny ski day for me. The second one since March 2019 !

                      BR

                      guy038

                      • guy038

                        @peterjones,

                        BTW, Peter, could you tell me which criteria you used in the GitHub search to find the right issue ?

                        Thanks in advance !

                        BR

                        guy038

                        • PeterJones @guy038

                          @guy038 ,

                          I went to the issues search, removed the “closed” condition, and searched for the name of the file

                          https://github.com/notepad-plus-plus/notepad-plus-plus/issues?q=is%3Aissue+data8279.zip

                          (Originally, I tried going through the GitHub help files to see how to search comments for specific attachments; when I couldn’t find it, I decided to see if a simple plaintext search for the filename would work, hoping that either the name of the file was in plaintext in the comment, or that the search could see it in the URL as well.)

                          • guy038

                            Hi, @cmeriaux, @peterjones, @alan-kilborn, @arkadiuszmichalski and All,

                            So, I’m continuing to test some sample texts, dealing with replacements, bookmarks and replacement modifiers ! Note that I did not consider the Admin case, as it is practically identical !

                            First, I used @sasumner’s file data8279.txt ( 300,000 lines ) and performed two types of test :

                            • A global replacement of (?-s)^.*NotepadPP.*\R with nothing, with the Wrap around option ticked ( first table, below )

                            • A mark operation on the string NotepadPP, with the Bookmark line and Wrap around options ticked, but not the Match case one, followed by a Search > Bookmark > Remove Bookmarked Lines operation ( second table, below )

                            203,236 lines were either deleted directly or marked and then deleted !
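
To make the two approaches concrete, here is a rough Python sketch of both, run on a tiny made-up sample rather than on data8279.txt. It is an analogy only: Python's `re` module is not the Boost engine used by Notepad++, it has no `\R`, and the function names are hypothetical. The sketch assumes `\n` or `\r\n` line endings and a case-insensitive search (Match case unticked).

```python
import re

# Rough equivalent of (?-s)^.*NotepadPP.*\R for Python's `re`:
# [^\r\n]* keeps the match on one line and \r?\n stands in for \R.
LINE_WITH_WORD = re.compile(
    r"^[^\r\n]*NotepadPP[^\r\n]*\r?\n",
    re.MULTILINE | re.IGNORECASE,
)


def delete_by_regex(text):
    """One pass: remove every whole line containing the word."""
    return LINE_WITH_WORD.sub("", text)


def delete_by_bookmark(text):
    """Two passes: first mark the matching lines, then drop the marked ones."""
    lines = text.splitlines(keepends=True)
    bookmarked = {i for i, line in enumerate(lines) if "notepadpp" in line.lower()}
    return "".join(line for i, line in enumerate(lines) if i not in bookmarked)


# Made-up sample with Windows line endings, for illustration only.
sample = "first line\r\nopen with NotepadPP\r\nnotepadpp in lower case\r\nlast line\r\n"

print(repr(delete_by_regex(sample)))
print(repr(delete_by_bookmark(sample)))
```

Both functions leave the same two lines on this sample. One difference to keep in mind: the regex requires a line break after the match, so a final line without a trailing line break would survive the regex pass but not the bookmark pass.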

                                •===============•=============•==========•=================•
                                |    Archi-     |   Version   |          User Mode         |
                                |               |             |----------•-----------------|
                                |    tecture    |  Notepad++  |   Time   |  Ratio x32/x64  |
                                •===============•=============•==========•=================•
                                |  Win XP  x32  |    7.9.2    |  65.0 s  |       -/-       |
                                •===============•=============•==========•=================•
                                |  Win 10  x32  |    7.9.2    |  47.6 s  |                 |
                                •---------------•-------------•----------•      1.27       |
                                |  Win 10  x64  |    7.9.2    |  37.5 s  |                 |
                                •===============•=============•==========•=================•
                                |  Win 10  x32  |    7.9.5    |  47.4 s  |                 |
                                •---------------•-------------•----------•      1.27       |
                                |  Win 10  x64  |    7.9.5    |  37.4 s  |                 |
                                •===============•=============•==========•=================•
                                |  Win 10  x32  |    8.0      |  86.2 s  |                 |
                                •---------------•-------------•----------•      1.22       |
                                |  Win 10  x64  |    8.0      |  70.4 s  |                 |
                                •===============•=============•==========•=================•
                                |  Win 10  x32  |    8.1.5    |  47.6 s  |                 |
                                •---------------•-------------•----------•      1.27       |
                                |  Win 10  x64  |    8.1.5    |  37.5 s  |                 |
                                •===============•=============•==========•=================•
                                |  Win 10  x32  |   8.1.9.2   |  47.5 s  |                 |
                                •---------------•-------------•----------•      1.28       |
                                |  Win 10  x64  |   8.1.9.2   |  37.2 s  |                 |
                                •===============•=============•==========•=================•
                            

                                •===============•=============•====================================•
                                |    Archi-     |   Version   |              User Mode             |
                                |               |             |------------------•-----------------|
                                |    tecture    |  Notepad++  |       Time       |  Ratio x32/x64  |
                                •===============•=============•==================•=================•
                                |  Win XP  x32  |    7.9.2    | 10.0 s + 64.2 s  |       -/-       |
                                •===============•=============•==================•=================•
                                |  Win 10  x32  |    7.9.2    |  4.9 s + 49.1 s  |                 |
                                •---------------•-------------•------------------•      1.36       |
                                |  Win 10  x64  |    7.9.2    |  2.1 s + 37.5 s  |                 |
                                •===============•=============•==================•=================•
                                |  Win 10  x32  |    7.9.5    |  4.8 s + 49.1 s  |                 |
                                •---------------•-------------•------------------•      1.35       |
                                |  Win 10  x64  |    7.9.5    |  2.3 s + 37.6 s  |                 |
                                •===============•=============•==================•=================•
                                |  Win 10  x32  |    8.0      | 20.0 s + 49.1 s  |                 |
                                •---------------•-------------•------------------•      1.31       |
                                |  Win 10  x64  |    8.0      | 14.8 s + 37.8 s  |                 |
                                •===============•=============•==================•=================•
                                |  Win 10  x32  |    8.1.5    |  4.8 s + 49.3 s  |                 |
                                •---------------•-------------•------------------•      1.35       |
                                |  Win 10  x64  |    8.1.5    |  2.3 s + 37.7 s  |                 |
                                •===============•=============•==================•=================•
                                |  Win 10  x32  |   8.1.9.2   |  4.8 s + 49.2 s  |                 |
                                •---------------•-------------•------------------•      1.37       |
                                |  Win 10  x64  |   8.1.9.2   |  2.1 s + 37.4 s  |                 |
                                •===============•=============•==================•=================•
                            

                            In the second table, I split the total time into two parts :

                            • Time to bookmark the lines

                            • Time to delete these lines

                            I summed the two values before calculating the x32/x64 ratio.


                            Interpretation of the results :

                            If we set aside the special case of v8.0, the results are very similar for the two tables :

                            • In the first case, the more complicated regex (?-s)^.*NotepadPP.*\R reduces the ratio between the x32 and x64 versions a bit ( ~ 1.27 )

                            • In the second case, both the mark operation and the deletion of lines have an impact, but the ratio between the x32 and x64 versions is a bit higher ( ~ 1.36 )

                            • Note that, for v8.0, the performance regression in the second table comes solely from the slow mark operation !
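
The second table can be rechecked with a short Python sketch that sums the mark and delete times before taking the x32/x64 ratio, and that compares each phase of v8.0 against v7.9.5. The `PHASES` layout and the function names are illustrative assumptions, and v7.9.5 is an arbitrary choice of baseline.

```python
# Second table, in seconds: version -> {architecture: (mark_time, delete_time)}.
# The structure and names here are illustrative, not part of the original test.
PHASES = {
    "7.9.2": {"x32": (4.9, 49.1), "x64": (2.1, 37.5)},
    "7.9.5": {"x32": (4.8, 49.1), "x64": (2.3, 37.6)},
    "8.0": {"x32": (20.0, 49.1), "x64": (14.8, 37.8)},
    "8.1.5": {"x32": (4.8, 49.3), "x64": (2.3, 37.7)},
    "8.1.9.2": {"x32": (4.8, 49.2), "x64": (2.1, 37.4)},
}


def total_ratio(version):
    """x32/x64 ratio computed on the mark + delete totals."""
    row = PHASES[version]
    return sum(row["x32"]) / sum(row["x64"])


def phase_slowdown(version, arch, baseline="7.9.5"):
    """(mark, delete) slowdown factors of `version` relative to `baseline`."""
    mark, delete = PHASES[version][arch]
    base_mark, base_delete = PHASES[baseline][arch]
    return mark / base_mark, delete / base_delete


for version in PHASES:
    print(f"v{version}: x32/x64 on totals = {total_ratio(version):.2f}")

for arch in ("x32", "x64"):
    mark_x, delete_x = phase_slowdown("8.0", arch)
    print(f"v8.0 {arch}: mark {mark_x:.1f}x, delete {delete_x:.2f}x vs v7.9.5")
```

On these figures, the v8.0 mark phase is about 4.2 times ( x32 ) and 6.4 times ( x64 ) slower than in v7.9.5, whereas the delete phase is essentially unchanged, which matches the remark that only the mark operation regressed.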


                            I performed one last test, using the same search and replace regexes as in my initial issue :

                            https://github.com/notepad-plus-plus/notepad-plus-plus/issues/9636

                            So, the regex S/R is :

                            SEARCH \w

                            REPLACE \U$0

                            I then created a file containing 1,000 lines ( the odd-numbered ones ) with the French text :

                            C’est là, près de la forêt, dans un gîte, où régnait un grand capharnaüm, que l’aïeul ôta sa flûte et son bâton de son canoë.

                            And I added 1,000 English lines ( the even-numbered ones ) :

                            Here is a example of text, containing the complete French set of accentuated characters, traditionally used.

                            After replacement, 184,000 occurrences were modified :

                                •===============•=============•==========•=================•
                                |    Archi-     |   Version   |          User Mode         |
                                |               |             |----------•-----------------|
                                |    tecture    |  Notepad++  |   Time   |  Ratio x32/x64  |
                                •===============•=============•==========•=================•
                                |  Win XP  x32  |    7.9.2    |  18.7 s  |       -/-       |
                                •===============•=============•==========•=================•
                                |  Win 10  x32  |    7.9.2    |  10.5 s  |                 |
                                •---------------•-------------•----------•      2.56       |
                                |  Win 10  x64  |    7.9.2    |   4.1 s  |                 |
                                •===============•=============•==========•=================•
                                |  Win 10  x32  |    7.9.5    |  10.3 s  |                 |
                                •---------------•-------------•----------•      2.51       |
                                |  Win 10  x64  |    7.9.5    |   4.1 s  |                 |
                                •===============•=============•==========•=================•
                                |  Win 10  x32  |    8.0      |  38.5 s  |                 |
                                •---------------•-------------•----------•      1.41       |
                                |  Win 10  x64  |    8.0      |  27.4 s  |                 |
                                •===============•=============•==========•=================•
                                |  Win 10  x32  |    8.1.5    |  10.4 s  |                 |
                                •---------------•-------------•----------•      2.54       |
                                |  Win 10  x64  |    8.1.5    |   4.1 s  |                 |
                                •===============•=============•==========•=================•
                                |  Win 10  x32  |   8.1.9.2   |  10.4 s  |                 |
                                •---------------•-------------•----------•      2.54       |
                                |  Win 10  x64  |   8.1.9.2   |   4.1 s  |                 |
                                •===============•=============•==========•=================•
                            

                            Interpretation of the results :

                            Again, if we set aside the special case of v8.0 :

                            • The results, whatever the version, are quite similar within each case ( x32 and x64 )

                            • The x32/x64 ratio is similar to the one in my previous post ( ~ 2.52 ) !
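
The 184,000 figure for the \w test can be cross-checked with a small Python sketch that rebuilds the 2,000-line text and counts the word characters. This is an approximation of the Notepad++ test, not a reproduction: Python's `re` is a different engine, and it has no `\U` in replacement strings, so a callable does the upper-casing here. The two sentences are copied as posted.

```python
import re

FRENCH = (
    "C’est là, près de la forêt, dans un gîte, où régnait un grand capharnaüm, "
    "que l’aïeul ôta sa flûte et son bâton de son canoë."
)
ENGLISH = (
    "Here is a example of text, containing the complete French set of "
    "accentuated characters, traditionally used."
)

# 1,000 French lines on the odd rows and 1,000 English lines on the even rows.
text = "\n".join([FRENCH, ENGLISH] * 1000) + "\n"

# For str patterns, \w in Python's `re` is Unicode-aware, so accented letters count.
matches = re.findall(r"\w", text)
print(len(matches))

# Stand-in for SEARCH \w / REPLACE \U$0: upper-case every matched character.
upper = re.sub(r"\w", lambda m: m.group(0).upper(), text)
print(upper.splitlines()[0])
```

The count printed is 184000 ( 94 word characters per French line and 90 per English line ), which agrees with the number of occurrences reported for the replacement.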

                            Best Regards,

                            guy038
