    Find line above given text in document

    • CoisesC
      Coises @Coises
      last edited by Coises

      I wrote:

      I’ll have to see if I can test this with some large files

      I have a moderately large file (19,473 lines, 119,172,867 bytes) I sometimes use for testing. It does not contain any lines which begin with Access is denied.

      This expression:
      (?-s)^.*$(?=\RAccess is denied)
      results in the complexity error message; this one:
      (?-s)^.*+(?=\RAccess is denied)
      correctly returns zero matches.

      So the regular expression engine does not optimize the first expression to the equivalent second expression.
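
For readers who want to try the "line above a given text" search outside Notepad++, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the expression from this thread: Python's re module is a different engine from Boost::regex, it has no \R, and the names MARKER, LINE_ABOVE and lines_above are made up for the example. The character class [^\r\n]* cannot cross a line break, so no $ is needed before the lookahead.

```python
import re

# Hypothetical marker text, taken from the example in this thread.
MARKER = "Access is denied"

# Match a whole line only when the next line starts with MARKER.
# [^\r\n]* stays inside one line; \r?\n stands in for \R.
LINE_ABOVE = re.compile(
    r"^[^\r\n]*(?=\r?\n" + re.escape(MARKER) + ")",
    re.MULTILINE,
)

def lines_above(text):
    """Return every line that sits directly above a line starting with MARKER."""
    return LINE_ABOVE.findall(text)

sample = (
    "first\r\n"
    "Access is denied\r\n"
    "second\r\n"
    "third\r\n"
    "Access is denied: again\r\n"
)
print(lines_above(sample))  # ['first', 'third']
```

On a text that contains no such line, lines_above returns an empty list, which corresponds to the "zero matches" result reported above.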

      • Terry RT
        Terry R @Coises
        last edited by Terry R

        @Coises said in Find line above given text in document:

        So the regular expression engine does not optimize the first expression to the equivalent second expression.

        So although the .*$ doesn’t appear to allow for backtracking, maybe that’s what the engine thinks is possible, hence the error. In the second, the .*+ states “there CANNOT be any backtracking!”

        So if the OP changed the $ to a + that alone might be sufficient to allow for the regex to work.

        Terry

        PS actually the more I think about it, the .*$ will allow for backtracking. My (Our) thinking likely has to change. Although $ is a meta-character, it still doesn’t have the “power” to command the engine to not backtrack, whereas the + does. It might even be such that any character at the $ position isn’t regarded as an anchor to prevent backtracking when/if deemed (possibly) needed by the engine. I’d be interested in going back over some of these errors reported and seeing if it is possible to add the possessive modifier and re-test.
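
The difference between a greedy quantifier that may give characters back and a possessive one that may not can be seen in a small Python sketch. This only illustrates the general semantics and says nothing about how Boost::regex works internally. So that it runs on older Python versions, the possessive behaviour is emulated with the well-known lookahead-plus-backreference idiom (?=(a*))\1; Python 3.11 and later also accept possessive quantifiers directly.

```python
import re

subject = "aaa"

# Greedy: a* first takes all three characters, then gives one back
# so that the final "a" in the pattern can still match.
greedy = re.search(r"a*a", subject)

# Emulated possessive: the lookahead captures the longest run of "a",
# \1 consumes exactly that run, and nothing can be given back afterwards,
# so the final "a" has nothing left to match.
possessive_like = re.search(r"(?=(a*))\1a", subject)

print(greedy.group(0))    # aaa
print(possessive_like)    # None
```

In the same way, a trailing $ after .* is just another condition to satisfy: if whatever follows fails, the engine is still free to retry .* with fewer characters.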

        • CoisesC
          Coises @Terry R
          last edited by

          @Terry-R said in Find line above given text in document:

          Since we all seemed unsure of why it would generate such a message from what “seemed” to be a simple find expression I thought I would do a small amount of testing to see if the .* or the lookahead was likely to blame.

          A truly strange result happened when I tried removing the lookahead and using plain old Count in the Find dialog. With my 19,473 line, 119,172,867 byte file, entering either of these expressions:
          (?-s)^.*$
          (?-s)^.*+
          into the Find window and pressing Count causes Notepad++ to hang (“Not Responding”). I’ve waited over six minutes before force closing.

          So, I tried to Count one of those expressions in the search in my Columns++ plugin, because (depending on the cause) that can show a progress meter for slow operations, and I wanted to get an idea what was happening.

          It completed, with the correct answer (19,473, the number of lines) in around one second. The result is the same with either expression. (The original expressions behave as in Notepad++: the version with the dollar sign gets a complexity error, the version with a plus sign works.)

          Now given that both use Boost::regex, I have no idea why Notepad++ hangs.

          • CoisesC
            Coises @Coises
            last edited by Coises

            @Coises said in Find line above given text in document:

            A truly strange result happened when I tried removing the lookahead and using plain old Count in the Find dialog. With my 19,473 line, 119,172,867 byte file, entering either of these expressions:
            (?-s)^.*$
            (?-s)^.*+
            into the Find window and pressing Count causes Notepad++ to hang (“Not Responding”). I’ve waited over six minutes before force closing.

            Now given that both use Boost::regex, I have no idea why Notepad++ hangs.

            Ugh. Doesn’t hang in 8.6.6 portable. Which gives me a bad feeling it could be related to PR #16208.

            Edit to add: Reported in 8.7.9 announcement as a regression. Still studying the cause.

            • EkopalypseE
              Ekopalypse @guy038
              last edited by

              @guy038 said in Find line above given text in document:

              Oh, My God, I’ve been beaten by @ekopalypse :-((

              What’s that supposed to mean?? Lol - just kidding :-D

              • guy038G
                guy038
                last edited by guy038

                Hello, @ekopalypse,

                Maybe it’s a language barrier ! It was, in no way, offensive to you !

                I just wanted to say that your example seemed closer to the @benji2025 case and that it was impossible for me to compete with you ( 120,000,000 lines !) ;-)))

                BR

                guy038

                • guy038G
                  guy038
                  last edited by

                  Hi, @benji2025, @ekopalypse, @alan-kilborn, @coises and All,

                  I did additional tests :

                  First, if the Word wrap option is enabled, using the same regex as before, with the Match Case option checked, the BookMarking operation took about the same time : 23.2 seconds

                  Secondly, after the BookMarking operation, if you re-run the same regex, the whole operation is done in 16 seconds. This seems logical because N++ does not have to re-bookmark the already bookmarked lines !

                  Thirdly, if I do not check the Bookmark line option, in the Mark dialog, the Marking operation is a bit quicker : 19.4 seconds

                  Fourthly, if I select all the contents of the test file ( => the In selection box is automatically checked ) the operation is a bit slower : 23.8 seconds


                  Now, for all the tests below, I used these rules :

                  • The Word wrap option is unchecked

                  • In the Mark dialog, the Bookmark line option is checked and all the other box options are unchecked.

                  • Generally the Match case option is unchecked but may be checked on a few occasions.

                  • Before each search, I hit the Clear all marks button and place the caret at the very beginning of the test file.

                  • I did my tests twice : on my old XP machine, with N++ v7.9.2 and on my new W10 laptop, with N++ v8.7.6 ( @coises, I avoided, on purpose, using the v8.7.8 and v8.7.9 releases ! )

                  • Each time, I opened N++, from a command prompt window, with the command Notepad++ -nosession Benji.txt Test_Benji.txt, so with only these two files.

                  • For the W10 test, I simply used a USB key containing the portable N++ v8.7.6 release and the test file.
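
The rules above describe a manual timing procedure in the Mark dialog. A rough way to repeat this kind of comparison outside Notepad++ is sketched below in Python. Treat it as a stand-in built on assumptions: Python's re module is not the Boost engine that Notepad++ uses, the sample text is synthetic, and build_sample and time_pattern are hypothetical helper names. \R and TEST$ are written as \r\n and TEST\r?$ because Python's re has no \R and its $ does not match in front of \r. The IGNORECASE flag plays the role of an unchecked Match case option.

```python
import re
import time

def build_sample(rows=1000, every=10):
    """Build synthetic CRLF text in which a line TEST follows every `every`-th row."""
    lines = []
    for i in range(rows):
        lines.append(f"row {i}")
        if i % every == every - 1:
            lines.append("TEST")
    return "\r\n".join(lines) + "\r\n"

def time_pattern(pattern, text, flags=re.MULTILINE | re.IGNORECASE):
    """Return (number of matches, elapsed seconds) for one pattern."""
    compiled = re.compile(pattern, flags)
    start = time.perf_counter()
    count = sum(1 for _ in compiled.finditer(text))
    return count, time.perf_counter() - start

text = build_sample()
patterns = (
    r"^.*\r\n(?=TEST\r?$)",
    r"^.+\r\n(?=TEST\r?$)",
    r".+\r\n(?=TEST\r?$)",
)
for pattern in patterns:
    count, seconds = time_pattern(pattern, text)
    print(f"{pattern:<28} {count:>5} matches  {seconds:.4f} s")
```

Absolute timings from such a script are not comparable with the Notepad++ figures in this post; only the relative ordering of patterns on the same machine and engine is meaningful.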

                  -------------------------- NEC XP - N++ v7.9.2 -------------------------------------------------------------------------------------------------------
                  
                  (?-s)^.*\R(?=TEST$)             24   s                         Option 'Match Case' unchecked   
                  
                  (?-s)^.*\R(?=TEST$)             23.1 s                         Option 'Match Case' checked    ( The test in my **previous** post )
                  
                  (?-s)^.+\R(?=TEST$)             23.9 s                         Option 'Match Case' unchecked   
                  
                  (?-s)^.*+\R(?=TEST$)            19.9 s   ( Atomic )            Option 'Match Case' unchecked   
                  
                  (?-s)^.++\R(?=TEST$)            19.8 s   ( Atomic )            Option 'Match Case' unchecked   
                  
                  (?-s)^.++\r\n(?=TEST$)          18.6 s   ( Atomic )            Option 'Match Case' unchecked   
                  
                  (?-s)^.++\r\n(?=TEST$)          17.7 s   ( Atomic )            Option 'Match Case' checked     
                  
                  (?-is)^.++\r\n(?=TEST$)         69   s   ( Atomic )  ( ?! )    Option 'Match Case' unchecked   
                  
                  --- Without the ^ symbol ---
                  
                  (?-s).*\R(?=TEST$)             25.6 s                         Option 'Match Case' unchecked   
                  
                  (?-s).+\R(?=TEST$)             23.1 s                         Option 'Match Case' unchecked   
                  
                  (?-s).*+\R(?=TEST$)            21.5 s   ( Atomic )            Option 'Match Case' unchecked   
                  
                  (?-s).++\R(?=TEST$)            19.2 s   ( Atomic )            Option 'Match Case' unchecked   
                  
                  (?-s).++\r\n(?=TEST$)          18.1 s   ( Atomic )            Option 'Match Case' unchecked   
                  
                  (?-s).++\r\n(?=TEST$)          17.25 s  ( Atomic )            Option 'Match Case' checked     
                  
                  (?-is).++\r\n(?=TEST$)        237   s   ( Atomic )  ( ?! )    Option 'Match Case' unchecked   
                  
                  --- Without the (?-s)^ part ---
                  
                  .*\R(?=TEST$)                  25.1 s                         Option 'Match Case' unchecked   
                  
                  .+\R(?=TEST$)                  22.8 s                         Option 'Match Case' unchecked   
                  
                  .*+\R(?=TEST$)                 21.2 s   ( Atomic )            Option 'Match Case' unchecked   
                  
                  .++\R(?=TEST$)                 18.9 s   ( Atomic )            Option 'Match Case' unchecked   
                  
                  .++\r\n(?=TEST$)               17.8 s   ( Atomic )            Option 'Match Case' unchecked   
                  
                  .++\r\n(?=TEST$)               17   s   ( Atomic )            Option 'Match Case' checked     
                  
                  (?-i).++\r\n(?=TEST$)         236   s   ( Atomic )  ( ?! )    Option 'Match Case' unchecked   
                  
                  --- Using the @Terry-R solution ---
                  
                  (?-s).\R(?=TEST$)              77   s               ( ?! )    Option 'Match Case' unchecked   
                  
                  (?-s).\r\n(?=TEST$)            71   s               ( ?! )    Option 'Match Case' unchecked   
                  
                  
                  (?-s).{1}+\R(?=TEST$)          93   s   ( Atomic )  ( ?! )    Option 'Match Case' unchecked   
                  
                  (?-s).{1}+\r\n(?=TEST$)        84   s   ( Atomic )  ( ?! )    Option 'Match Case' unchecked   
                  
                  
                  .{1}+\R(?=TEST$)               88   s   ( Atomic )  ( ?! )    Option 'Match Case' unchecked   
                  
                  .{1}+\r\n(?=TEST$)             79   s   ( Atomic )  ( ?! )    Option 'Match Case' unchecked   
                  
                  --- After CONCATENATION of the line BEFORE the line TEST with the line TEST ---
                  
                  First, the regex \R(?=TEST$) is replaced with NOTHING ( 57 s ) => 142,908,950 bytes for 3,030,301 lines. Then :
                  
                  TEST$                          20.8 s                         Option 'Match Case' unchecked   
                  
                  TEST$                          13.6 s                         Option 'Match Case' checked     
                  
                  (?-i)TEST$                     13.8 s                         Option 'Match Case' unchecked   
                  
                  Last, the regex TEST$ is replaced with \r\n$0  ( 65 s )
                  
                  -------------------------- HP Win 10 - N++ 8.7.6 -----------------------------------------------------------------------------------------------------
                  
                  (?-s)^.*\R(?=TEST$)             2.3  s                         Option 'Match Case' unchecked   
                  
                  (?-s)^.*\R(?=TEST$)             2    s                         Option 'Match Case' checked    ( The test in my **previous** post )
                  
                  (?-s)^.+\R(?=TEST$)             2.3  s                         Option 'Match Case' unchecked   
                  
                  (?-s)^.*+\R(?=TEST$)            1.9  s   ( Atomic )            Option 'Match Case' unchecked   
                  
                  (?-s)^.++\R(?=TEST$)            1.97 s   ( Atomic )            Option 'Match Case' unchecked   
                  
                  (?-s)^.++\r\n(?=TEST$)          1.86 s   ( Atomic )            Option 'Match Case' unchecked   
                  
                  (?-s)^.++\r\n(?=TEST$)          1.5  s   ( Atomic )            Option 'Match Case' checked     
                  
                  (?-is)^.++\r\n(?=TEST$)         5.8  s   ( Atomic ) ( ! )      Option 'Match Case' unchecked   
                  
                  --- Without the ^ symbol ---
                  
                  (?-s).*\R(?=TEST$)              2.7  s                        Option 'Match Case' unchecked   
                  
                  (?-s).+\R(?=TEST$)              2.3  s                        Option 'Match Case' unchecked   
                  
                  (?-s).*+\R(?=TEST$)             2.3  s  ( Atomic )            Option 'Match Case' unchecked   
                  
                  (?-s).++\R(?=TEST$)             1.9  s  ( Atomic )            Option 'Match Case' unchecked   
                  
                  (?-s).++\r\n(?=TEST$)           1.8  s  ( Atomic )            Option 'Match Case' unchecked   
                  
                  (?-s).++\r\n(?=TEST$)           1.45 s  ( Atomic )            Option 'Match Case' checked     
                  
                  (?-is).++\r\n(?=TEST$)         23.9  s  ( Atomic )  ( !? )    Option 'Match Case' unchecked   
                  
                  --- Without the (?-s)^ part ---
                  
                  .*\R(?=TEST$)                   2.7  s                        Option 'Match Case' unchecked   
                  
                  .+\R(?=TEST$)                   2.23 s                        Option 'Match Case' unchecked   
                  
                  .*+\R(?=TEST$)                  2.23 s  ( Atomic )            Option 'Match Case' unchecked   
                  
                  .++\R(?=TEST$)                  1.8  s  ( Atomic )            Option 'Match Case' unchecked   
                  
                  .++\r\n(?=TEST$)                1.7  s  ( Atomic )            Option 'Match Case' unchecked   
                  
                  .++\r\n(?=TEST$)                1.38 s  ( Atomic )            Option 'Match Case' checked     
                  
                  (?-i).++\r\n(?=TEST$)          24    s  ( Atomic )  ( ?! )    Option 'Match Case' unchecked   
                  
                  --- Using the @Terry-R solution ---
                  
                  (?-s).\R(?=TEST$)               6.35 s              ( ! )     Option 'Match Case' unchecked   
                  
                  (?-s).\r\n(?=TEST$)             8.6  s              ( ! )     Option 'Match Case' unchecked   
                  
                  
                  (?-s).{1}+\R(?=TEST$)           8.2  s  ( Atomic )  ( ! )     Option 'Match Case' unchecked   
                  
                  (?-s).{1}+\r\n(?=TEST$)        10.6  s  ( Atomic )  ( ! )     Option 'Match Case' unchecked   
                  
                  
                  .{1}+\R(?=TEST$)                7.5  s  ( Atomic )  ( ! )     Option 'Match Case' unchecked   
                  
                  .{1}+\r\n(?=TEST$)              9.75 s  ( Atomic )  ( ! )     Option 'Match Case' unchecked   
                  
                  --- After CONCATENATION of the line BEFORE the line TEST with the line TEST ---
                  
                  First, the regex \R(?=TEST$) is replaced with NOTHING ( 26.2 s ) => 142,908,950 bytes for 3,030,301 lines. Then :
                  
                  TEST$                           2.3  s                        Option 'Match Case' unchecked   
                  
                  TEST$                           0.95 s                        Option 'Match Case' checked     
                  
                  (?-i)TEST$                      1    s                        Option 'Match Case' unchecked   
                  
                  Last, the regex TEST$ is replaced with \r\n$0  ( 25.3 s )
                  

                  Conclusion :

                  So, given the rules above, the best syntaxes seem to be, on my new Windows 10 machine :

                  • The regex .++\r\n(?=TEST$) in 1.38 seconds, with the Match Case option checked.

                  • The regex TEST$ in 0.95 seconds, AFTER an initial concatenation of the line before the line TEST with the line TEST.
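
The concatenation approach from the tables (join each line TEST to the line before it, search for TEST at the end of a line, then split the lines again) can be sketched in Python as follows. This is a minimal illustration under assumptions, not the exact Notepad++ procedure: the function names are hypothetical, \r?$ replaces $ because Python's $ does not match in front of \r, and the round trip is only exact when no line already ends in TEST and no two TEST lines are adjacent.

```python
import re

def join_before_test(text):
    # Step 1: remove the line break in front of every line that is exactly TEST.
    return re.sub(r"\r\n(?=TEST\r?$)", "", text, flags=re.MULTILINE)

def lines_ending_in_test(joined):
    # Step 2: capture the part of each joined line that precedes the trailing TEST.
    return re.findall(r"^(.*)TEST\r?$", joined, flags=re.MULTILINE)

def split_again(joined):
    # Step 3: put the line break back in front of each trailing TEST.
    return re.sub(r"TEST(?=\r?$)", "\r\nTEST", joined, flags=re.MULTILINE)

original = "alpha\r\nTEST\r\nbeta\r\ngamma\r\nTEST\r\n"
joined = join_before_test(original)

print(repr(joined))                     # 'alphaTEST\r\nbeta\r\ngammaTEST\r\n'
print(lines_ending_in_test(joined))     # ['alpha', 'gamma']
print(split_again(joined) == original)  # True
```

As the timings above show, the two replacement passes cost far more than the search itself, so this route only pays off if the joined file is searched many times.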

                  Best Regards,

                  guy038
