    Poor performance removing blank lines

    • endolith

      With a 600,000 line .I preprocessor file:

      Edit→Line Operations→Remove Empty Lines? >3 minutes

      Ctrl+A→TextFX→TextFX Edit→Delete Blank Lines? 1 second

      :/

      • Alan Kilborn

        @endolith

        3 minutes does seem excessive.
        Maybe still beats the time it would take to do it by hand, though. :-)

        What’s your timing for a regex replacement operation with that data?

        find: ^\R
        repl: nothing
        search mode: Regular expression

        • PeterJones

          @endolith ,

          For me, with a 1M-line file (about 30% blank lines, 70% lines of 300 characters) in Notepad++ v7.9.5 32-bit, the built-in action took no more than 30 seconds, whereas TextFX took considerably longer (multiple minutes).

          The exact durations may depend on the density of text and maybe other factors.

          My guess is that if there’s a difference in time for you, and if Notepad++ really is slower, that it’s because ::removeEmptyLine() code invokes the regex engine, rather than looking at the lines manually.

          But again, my experiment showed that the TextFX took considerably longer.

          • Alan Kilborn

            @PeterJones said in Poor performance removing blank lines:

            it’s because ::removeEmptyLine() code invokes the regex engine, rather than looking at the lines manually.

            You raise an interesting point here.
            Which choice (of those two) would be faster/slower, on a fixed data set?
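To make the two choices concrete, here is a minimal sketch of both on a plain Python string, outside Notepad++. The function names are hypothetical, and it assumes only LF and CRLF line endings, because Python's `re` module has no `\R` and the line break has to be spelled out. It shows that the two approaches can produce the same result; it says nothing about how Notepad++ or Boost perform internally.

```python
import re

# Assumption: only LF and CRLF line endings are handled here.
# Python's re has no \R, so the line break is written out explicitly.
EMPTY_LINE = re.compile(r"(?m)^(?:\r\n|\n)")


def remove_empty_regex(text):
    """Delete every line that consists solely of a line break, using a regex."""
    return EMPTY_LINE.sub("", text)


def remove_empty_scan(text):
    """Same result without a regex: inspect each line and keep the non-empty ones."""
    pieces = text.split("\n")
    last = pieces.pop()  # text after the final "\n" (may be "")
    # A piece that is "" (LF line) or "\r" (CRLF line) was an empty line.
    kept = [p + "\n" for p in pieces if p not in ("", "\r")]
    return "".join(kept) + last


sample = "first\r\n\r\nsecond\n\n\nthird\n"
print(repr(remove_empty_regex(sample)))
print(repr(remove_empty_scan(sample)))
```

Which of the two is faster on a given file is exactly the open question; both would need to be timed on the same data.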

            It is also interesting that Notepad++ uses the regex ^$(\\r\\n|\\r|\\n) which seems like it would be more “effort” than ^\R, but that could be misleading as well.

            (Note that I don’t care one iota about the obsolete TextFX)

            • PeterJones

              @Alan-Kilborn said in Poor performance removing blank lines:

              (Note that I don’t care one iota about the obsolete TextFX)

              The reason I cared enough to install it on a 32-bit NPP was that, if I had confirmed that TextFX took 1/180th of the time of the builtin, I was going to suggest to the developers that they look into the algorithm that TextFX used and see if they could borrow from it. But since it was slower in my experiments, there isn’t anything “magical” about their algorithm.

              I also tried

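              def delTrulyEmpty(contents, lineNumber, totalLines):
                  # A line is "truly empty" when only its line break remains.
                  if contents.strip('\r\n') == "":
                      editor.deleteLine(lineNumber)
                      # The next line has moved up into this line number, so
                      # return 0 to stay here; otherwise the second of two
                      # consecutive empty lines would be skipped.
                      return 0
              
              # Group all deletions into a single undo step.
              editor.beginUndoAction()
              editor.forEachLine(delTrulyEmpty)
              editor.endUndoAction()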
              

              … but that was the slowest so far.

              ^$(\\r\\n|\\r|\\n) which seems like it would be more “effort” than ^\R

              It probably depends on how \R is defined under the Boost regex engine’s hood.

              • Ekopalypse

                This is quite fast

                editor.setText(''.join(x for x in editor.getText().splitlines(True) if x.strip() != ''))
                
                • Alan Kilborn

                  @Ekopalypse said in Poor performance removing blank lines:

                  This is quite fast

                  It almost seems like we need a sample file and then some benchmarking, for all the solutions proposed. :-)
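A benchmark along those lines could start as small as the sketch below. It is only an illustration under stated assumptions: it runs in plain Python rather than inside Notepad++, so it times the string processing alone and not `editor.getText()`, `editor.setText()`, or anything Scintilla does. The names `strip_filter`, `regex_filter`, and `benchmark` are hypothetical, and the sample text is a scaled-down version of the `'A line with some content\n\n'` data suggested in this thread.

```python
import re
import timeit


def strip_filter(text):
    # String part of the one-liner posted above:
    # drops empty lines AND whitespace-only lines.
    return "".join(x for x in text.splitlines(True) if x.strip() != "")


def regex_filter(text):
    # Regex variant; assumes LF or CRLF line endings only.
    return re.sub(r"(?m)^(?:\r\n|\n)", "", text)


def benchmark(candidates, text, repeat=3):
    """Return {name: best wall-clock seconds over `repeat` single runs}."""
    return {
        name: min(timeit.repeat(lambda f=func: f(text), number=1, repeat=repeat))
        for name, func in candidates.items()
    }


# Scaled-down sample: 100,000 content lines, each followed by one empty line.
sample = "A line with some content\n\n" * 100000
results = benchmark({"strip_filter": strip_filter, "regex_filter": regex_filter}, sample)
for name, seconds in sorted(results.items(), key=lambda kv: kv[1]):
    print("%-14s %.4f s" % (name, seconds))
```

Taking the minimum over several runs reduces noise from other processes; the absolute numbers will differ by machine and Python version.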

                  Eko’s one-liner removes empty lines AND lines containing only whitespace. Since N++ makes a distinction (by having two separate menu commands) for those, maybe a one-liner for removing only empty lines is in order?

                  I don’t know if it is totally correct, but I came up with this one:

                  editor.setText('\r\n'.join(x for x in editor.getText().splitlines() if x != ''))
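On the "totally correct" question: joining with `'\r\n'` rewrites every line ending as CRLF and drops a trailing line break. A possible variant that keeps each line's original ending is sketched below as a plain function on a string; `remove_empty_lines` is a hypothetical name, not part of PythonScript.

```python
def remove_empty_lines(text):
    """Drop lines that contain nothing but their line break.

    splitlines(True) keeps each line's own ending, so the surviving lines
    are rejoined unchanged, and whitespace-only lines are kept.
    """
    return "".join(
        line for line in text.splitlines(True) if line.strip("\r\n") != ""
    )


print(repr(remove_empty_lines("a\r\n\r\n  \r\nb\n\nc")))
```

Inside Notepad++ this would presumably be used as `editor.setText(remove_empty_lines(editor.getText()))`, in the same way as the one-liners above.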
                  
                  • Ekopalypse

                    @Alan-Kilborn

                    how about

                    editor.setText('A line with some content\n\n'*1000000)
                    

                    ?

                    • Alan Kilborn

                      @Ekopalypse

                      Well, I guess.
                      Sometimes there’s an art to data creation, though.
                      For instance, perhaps really long lines impact how fast something will run.
                      Perhaps the ratio of empty to non-empty lines makes a difference.
                      Perhaps…perhaps…perhaps…

                      I suppose it would have been best to have the OP’s data file, since it was the one that had the original complaint…
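Without the original file, one way to explore those variables is a generator with a knob for each. The sketch below is an assumption-laden illustration: `make_sample` and its parameters are hypothetical, the defaults mirror the 30% blank / 300-character figures reported earlier in the thread, and every content line is the same filler text, which real data would not be.

```python
import random


def make_sample(total_lines, blank_ratio=0.3, line_length=300, eol="\r\n", seed=0):
    """Build test text with a chosen share of empty lines and a fixed line length."""
    rng = random.Random(seed)  # fixed seed so repeated runs see the same data
    content = "x" * line_length + eol
    return "".join(
        eol if rng.random() < blank_ratio else content
        for _ in range(total_lines)
    )


text = make_sample(1000)
blank_count = sum(1 for line in text.split("\r\n")[:-1] if line == "")
print("%d lines, %d of them empty" % (1000, blank_count))
```

Varying `blank_ratio` and `line_length` one at a time would show whether either factor actually changes the ranking of the proposed solutions.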
