Poor performance removing blank lines

General Discussion
    endolith
    last edited by Apr 21, 2021, 6:36 PM

    With a 600,000 line .I preprocessor file:

    Edit→Line Operations→Remove Empty Lines? >3 minutes

    Ctrl+A→TextFX→TextFX Edit→Delete Blank Lines? 1 second

    :/

      Alan Kilborn @endolith
      last edited by Apr 21, 2021, 7:34 PM

      @endolith

      3 minutes does seem excessive.
      Maybe still beats the time it would take to do it by hand, though. :-)

      What’s your timing for a regex replacement operation with that data?

      find: ^\R
      repl: nothing
      search mode: Regular expression

        PeterJones @endolith
        last edited by Apr 21, 2021, 7:35 PM

        @endolith ,

        For me, with a 1M-line file (about 30% blank lines and 70% lines of 300 characters) in Notepad++ v7.9.5 32-bit, the builtin action took no more than 30 seconds, whereas TextFX took considerably longer (multiple minutes).

        The exact durations may depend on the density of text and maybe other factors.

        My guess is that, if there’s a difference in time for you and Notepad++ really is slower, it’s because ::removeEmptyLine() code invokes the regex engine, rather than looking at the lines manually.

        But again, my experiment showed that the TextFX took considerably longer.

          Alan Kilborn @PeterJones
          last edited by Apr 21, 2021, 7:44 PM

          @PeterJones said in Poor performance removing blank lines:

          it’s because ::removeEmptyLine() code invokes the regex engine, rather than looking at the lines manually.

          You raise an interesting point here.
          Which choice (of those two) would be faster/slower, on a fixed data set?

          It is also interesting that Notepad++ uses the regex ^$(\\r\\n|\\r|\\n) which seems like it would be more “effort” than ^\R, but that could be misleading as well.

          (Note that I don’t care one iota about the obsolete TextFX)

            PeterJones @Alan Kilborn
            last edited by Apr 21, 2021, 7:54 PM

            @Alan-Kilborn said in Poor performance removing blank lines:

            (Note that I don’t care one iota about the obsolete TextFX)

            The reason I cared enough to install it on a 32-bit NPP was that, if I had confirmed that TextFX took 1/180th of the time of the builtin, I was going to suggest to the developers that they look into the algorithm that TextFX used and see if they could borrow from it. But since it was slower in my experiments, there isn’t anything “magical” about its algorithm.

            I also tried

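            def delTrulyEmpty(contents, lineNumber, totalLines):
                # A line is "truly empty" if nothing remains once its EOL characters are stripped
                if contents.strip('\r\n') == "":
                    editor.deleteLine(lineNumber)
                    # The following line has moved into this slot, so return 0 to stay on it
                    return 0
            
            editor.beginUndoAction()
            editor.forEachLine(delTrulyEmpty)
            editor.endUndoAction()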
            

            … but that was the slowest so far.

            ^$(\\r\\n|\\r|\\n) which seems like it would be more “effort” than ^\R

            It probably depends on how \R is defined under the Boost regex engine’s hood.
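
To make the comparison concrete, here is a rough sketch in plain Python (not Notepad++ or Boost itself). It assumes the Boost.Regex documentation's description of \R as equivalent to `(?>\x0D\x0A?|[\x0A-\x0C\x85\x{2028}\x{2029}])`; treat that expansion as an assumption rather than a statement about the engine's internals. Python's `re` has no \R, so the expansion is written out by hand and the atomic group is replaced by a plain non-capturing group. The names `NPP_EOL`, `R_EQUIV`, `CANDIDATES` and `only_R` are made up for this illustration.

```python
import re

# Line-ending part of the Notepad++ pattern quoted in the thread,
# with the C++ string escaping removed: (\r\n|\r|\n)
NPP_EOL = re.compile('(?:\r\n|\r|\n)')

# ASSUMED expansion of \R (per the Boost.Regex docs as recalled), written
# with a plain group because atomic groups need Python 3.11 or later.
R_EQUIV = re.compile('(?:\r\n?|[\n\x0b\x0c\x85\u2028\u2029])')

# Candidate line terminators: CRLF, CR, LF, VT, FF, NEL, LS, PS
CANDIDATES = ['\r\n', '\r', '\n', '\x0b', '\x0c', '\x85', '\u2028', '\u2029']

# Terminators accepted by the \R expansion but not by the three-way alternation
only_R = [s for s in CANDIDATES
          if R_EQUIV.fullmatch(s) and not NPP_EOL.fullmatch(s)]

for s in CANDIDATES:
    print(repr(s), bool(NPP_EOL.fullmatch(s)), bool(R_EQUIV.fullmatch(s)))
```

If the assumed expansion is right, \R accepts a superset of what the three-way alternation accepts, so the two patterns are not strictly interchangeable, and neither is obviously cheaper from the pattern text alone.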

              Ekopalypse
              last edited by Apr 23, 2021, 2:52 PM

              This is quite fast

              editor.setText(''.join(x for x in editor.getText().splitlines(True) if x.strip() != ''))
              
                Alan Kilborn @Ekopalypse
                last edited by Apr 23, 2021, 3:27 PM

                @Ekopalypse said in Poor performance removing blank lines:

                This is quite fast

                It almost seems like we need a sample file and then some benchmarking, for all the solutions proposed. :-)

                Eko’s one-liner removes empty lines AND lines containing only whitespace. Since N++ makes a distinction (by having two separate menu commands) for those, maybe a one-liner for removing only empty lines is in order?

                I don’t know if it is totally correct, but I came up with this one:

                editor.setText('\r\n'.join(x for x in editor.getText().splitlines() if x != ''))
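
One caveat with joining on '\r\n' is that it rewrites every line ending as CRLF and drops the final terminator. A minimal sketch of an alternative that keeps each surviving line's own ending is shown below, as pure functions that can be tried outside Notepad++. The helper names `remove_empty_lines` and `remove_blank_lines` are hypothetical, and it assumes only CRLF, CR and LF count as line endings.

```python
import re

# One line plus its terminator (CRLF, CR or LF), or a final unterminated line.
_LINE = re.compile(r'[^\r\n]*(?:\r\n|\r|\n)|[^\r\n]+')

def remove_empty_lines(text):
    """Drop lines that contain nothing but a line terminator."""
    return ''.join(line for line in _LINE.findall(text)
                   if line.strip('\r\n') != '')

def remove_blank_lines(text):
    """Drop empty lines and lines that contain only whitespace."""
    return ''.join(line for line in _LINE.findall(text)
                   if line.strip() != '')

print(repr(remove_empty_lines('a\r\n\r\n  \r\nb\r\n')))
print(repr(remove_blank_lines('a\r\n\r\n  \r\nb\r\n')))
```

Inside PythonScript this would presumably be used the same way as the one-liners above, e.g. `editor.setText(remove_empty_lines(editor.getText()))`. Whether it is as fast as the `splitlines` versions on a large file has not been measured here.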
                
                  Ekopalypse @Alan Kilborn
                  posted Apr 23, 2021, 3:31 PM, last edited by Ekopalypse Apr 23, 2021, 3:32 PM

                  @Alan-Kilborn

                  how about

                  editor.setText('A line with some content\n\n'*1000000)
                  

                  ?

                    Alan Kilborn @Ekopalypse
                    last edited by Apr 23, 2021, 4:38 PM

                    @Ekopalypse

                    Well, I guess.
                    Sometimes there’s an art to data creation, though.
                    For instance, perhaps really long lines impact how fast something will run.
                    Perhaps the ratio of empty to non-empty lines makes a difference.
                    Perhaps…perhaps…perhaps…

                    I suppose it would have been best to have the OP’s data file, since it was the one that had the original complaint…
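
Absent the OP's file, a small harness with adjustable test data is one way to probe those "perhaps" questions. The sketch below is plain Python run outside Notepad++, so its timings say nothing direct about the C++ menu command or the Boost engine. It only illustrates comparing a regex strategy against a line-scan strategy while varying line count, blank ratio and line length. All function names and parameters are hypothetical, and the data is assumed to be LF-only so that the regex stays simple.

```python
import random
import re
import time

def make_sample(n_lines, blank_ratio, line_length, seed=0):
    """Build LF-terminated text; each line is empty with probability blank_ratio."""
    rng = random.Random(seed)
    lines = ['' if rng.random() < blank_ratio else 'x' * line_length
             for _ in range(n_lines)]
    return '\n'.join(lines) + '\n'

# An LF sitting at the very start of a line, i.e. an empty line (LF-only data).
_EMPTY_LINE = re.compile(r'^\n', re.MULTILINE)

def remove_with_regex(text):
    """Regex strategy: delete every terminator found at a line start."""
    return _EMPTY_LINE.sub('', text)

def remove_with_scan(text):
    """Scan strategy: keep every line that is more than a bare terminator."""
    return ''.join(line for line in text.splitlines(True) if line != '\n')

def time_it(func, text):
    """Return (elapsed seconds, result) for a single call of func(text)."""
    start = time.perf_counter()
    result = func(text)
    return time.perf_counter() - start, result

if __name__ == '__main__':
    sample = make_sample(n_lines=100000, blank_ratio=0.3, line_length=100)
    for func in (remove_with_regex, remove_with_scan):
        seconds, result = time_it(func, sample)
        print('%s: %.3f s, %d characters kept'
              % (func.__name__, seconds, len(result)))
```

Rerunning with different `blank_ratio` and `line_length` values (for instance very long lines, or almost all lines blank) shows whether the ranking of the two strategies depends on the shape of the data.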
