    Poor performance removing blank lines

    9 Posts 4 Posters 828 Views
      endolith

      With a 600,000 line .I preprocessor file:

      Edit→Line Operations→Remove Empty Lines? >3 minutes

      Ctrl+A→TextFX→TextFX Edit→Delete Blank Lines? 1 second

      :/

        Alan Kilborn @endolith

        @endolith

        3 minutes does seem excessive.
        Maybe still beats the time it would take to do it by hand, though. :-)

        What’s your timing for a regex replacement operation with that data?

        find: ^\R
        repl: nothing
        search mode: Regular expression
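
        For anyone who wants to try the same idea outside Notepad++, here is a minimal sketch in plain Python. It is not the Boost engine that Notepad++ uses: Python's `re` has no `\R`, and its multiline `^` only matches after `\n`, so the pattern below assumes LF or CRLF line endings. The function name is made up for illustration.

        ```python
        import re

        # Assumption: line endings are LF or CRLF. Python's re has no \R, and its
        # multiline ^ only matches after "\n", so bare-CR files are not handled.
        EMPTY_LINE = re.compile(r"^\r?\n", re.MULTILINE)


        def remove_empty_lines_regex(text):
            """Delete every line that contains nothing but its line terminator."""
            return EMPTY_LINE.sub("", text)


        sample = "first\r\n\r\n\r\nsecond\r\n \r\nthird\r\n"
        print(repr(remove_empty_lines_regex(sample)))
        ```

        The line holding a single space is kept, which matches the "empty lines only" behaviour of the `^\R` search.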

          PeterJones @endolith

          @endolith ,

          For me, a 1M line file, about 30% blank and 70% with 300-character lines, in Notepad++ v7.9.5-32bit, the builtin action took no more than 30sec, whereas the TextFX took considerably longer (multiple minutes).

          The exact durations may depend on the density of text and maybe other factors.

          My guess is that, if there’s a difference in time for you and Notepad++ really is slower, it’s because ::removeEmptyLine() code invokes the regex engine, rather than looking at the lines manually.

          But again, my experiment showed that the TextFX took considerably longer.

            Alan Kilborn @PeterJones

            @PeterJones said in Poor performance removing blank lines:

            it’s because ::removeEmptyLine() code invokes the regex engine, rather than looking at the lines manually.

            You raise an interesting point here.
            Which choice (of those two) would be faster/slower, on a fixed data set?
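
            One rough way to get a feel for that question is to time both styles on the same string. The sketch below is plain Python, so it says nothing definite about the C++ code inside Notepad++ or about Boost; the function names and the test data are assumptions made up for the comparison, and only LF or CRLF endings are considered.

            ```python
            import re
            import timeit

            # Assumption: LF or CRLF line endings only.
            EMPTY_LINE = re.compile(r"^\r?\n", re.MULTILINE)


            def remove_with_regex(text):
                """Regex approach: delete each terminator that sits at the start of a line."""
                return EMPTY_LINE.sub("", text)


            def remove_with_scan(text):
                """Manual approach: walk the lines and keep those that are more than a terminator."""
                return "".join(line for line in text.splitlines(True)
                               if line != "\n" and line != "\r\n")


            def time_both(text, repeats=3):
                """Return the best wall-clock time in seconds for each approach."""
                return {
                    "regex": min(timeit.repeat(lambda: remove_with_regex(text),
                                               number=1, repeat=repeats)),
                    "scan": min(timeit.repeat(lambda: remove_with_scan(text),
                                              number=1, repeat=repeats)),
                }


            # Made-up data: 100,000 content lines, each followed by one empty line.
            data = "A line with some content\r\n\r\n" * 100000
            assert remove_with_regex(data) == remove_with_scan(data)
            print(time_both(data))
            ```

            The numbers will vary by machine and by data, so treat the output as a hint about relative cost rather than an answer.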

            It is also interesting that Notepad++ uses the regex ^$(\\r\\n|\\r|\\n) which seems like it would be more “effort” than ^\R, but that could be misleading as well.

            (Note that I don’t care one iota about the obsolete TextFX)

              PeterJones @Alan Kilborn

              @Alan-Kilborn said in Poor performance removing blank lines:

              (Note that I don’t care one iota about the obsolete TextFX)

              The reason I cared enough to install it on a 32-bit NPP was that, if I had confirmed that TextFX took 1/180th of the time of the builtin, I was going to suggest to the developers that they look into the algorithm that TextFX used and see if they could borrow from it. But since it was slower in my experiments, there isn’t anything “magical” about its algorithm.

              I also tried

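              def delTrulyEmpty(contents, lineNumber, totalLines):
                  if contents.strip('\r\n') == "":
                      editor.deleteLine(lineNumber)
                      # The next line has moved up into this slot, so return 0
                      # to make forEachLine process the same line number again.
                      return 0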
              
              editor.beginUndoAction()
              editor.forEachLine(delTrulyEmpty)
              editor.endUndoAction()
              

              … but that was the slowest so far.

              ^$(\\r\\n|\\r|\\n) which seems like it would be more “effort” than ^\R

              It probably depends on how \R is defined under the Boost regex engine’s hood.
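
              As far as I recall, the Boost.Regex Perl-syntax documentation describes `\R` as equivalent to `(?>\x0D\x0A?|[\x0A-\x0C\x85\x{2028}\x{2029}])`, which is a wider set than the three terminators in the Notepad++ pattern. The sketch below is plain Python 3 and only approximates that definition (no atomic group), so it illustrates the difference in coverage and says nothing about speed. Both pattern names are made up.

              ```python
              import re

              # The three terminators in the Notepad++ pattern quoted above.
              NPP_EOL = re.compile(r"\r\n|\r|\n")

              # Assumed approximation of \R: CR, CRLF, LF, VT, FF, NEL, LS, PS.
              R_LIKE = re.compile("\r\n?|[\n\x0b\x0c\x85\u2028\u2029]")

              candidates = ["\r\n", "\r", "\n", "\x0b", "\x0c", "\x85", "\u2028", "\u2029"]
              for ch in candidates:
                  # Columns: terminator, matched by the N++ alternation, matched by the \R-like set
                  print(repr(ch), bool(NPP_EOL.fullmatch(ch)), bool(R_LIKE.fullmatch(ch)))
              ```

              If that reading of the documentation is right, `^\R` is shorter to type but has more alternatives to test, so it is not obviously the cheaper pattern.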

              1 Reply Last reply Reply Quote 2
              • EkopalypseE
                Ekopalypse
                last edited by

                This is quite fast

                editor.setText(''.join(x for x in editor.getText().splitlines(True) if x.strip() != ''))
                
                  Alan Kilborn @Ekopalypse

                  @Ekopalypse said in Poor performance removing blank lines:

                  This is quite fast

                  It almost seems like we need a sample file and then some benchmarking, for all the solutions proposed. :-)

                  Eko’s one-liner removes empty lines AND lines containing only whitespace. Since N++ makes a distinction (by having two separate menu commands) for those, maybe a one-liner for removing only empty lines is in order?

                  I don’t know if it is totally correct, but I came up with this one:

                  editor.setText('\r\n'.join(x for x in editor.getText().splitlines() if x != ''))
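
                  One caveat with the one-liner above: it rewrites every line ending as `\r\n` and drops the final line terminator. A possible variant that keeps each line's original ending is sketched below as a plain-Python function so it can be tested outside Notepad++. The names `LINE` and `remove_empty_lines` are made up for illustration.

                  ```python
                  import re

                  # One line = optional content plus a CRLF, CR, or LF terminator,
                  # or trailing content with no terminator at all.
                  LINE = re.compile(r"[^\r\n]*(?:\r\n|\r|\n)|[^\r\n]+")


                  def remove_empty_lines(text):
                      """Drop lines that hold only a terminator; keep all others unchanged."""
                      return "".join(line for line in LINE.findall(text)
                                     if line.strip("\r\n") != "")


                  sample = "one\r\n\r\ntwo\n\n \nthree"
                  print(repr(remove_empty_lines(sample)))
                  ```

                  Inside PythonScript it would presumably be called as `editor.setText(remove_empty_lines(editor.getText()))`. Lines that contain only spaces or tabs are kept, in line with the separate menu command for those.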
                  
                    Ekopalypse @Alan Kilborn

                    @Alan-Kilborn

                    how about

                    editor.setText('A line with some content\n\n'*1000000)
                    

                    ?

                      Alan Kilborn @Ekopalypse

                      @Ekopalypse

                      Well, I guess.
                      Sometimes there’s an art to data creation, though.
                      For instance, perhaps really long lines impact how fast something will run.
                      Perhaps the ratio of empty to non-empty lines makes a difference.
                      Perhaps…perhaps…perhaps…

                      I suppose it would have been best to have the OP’s data file, since it was the one that had the original complaint…
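
                      Without the OP's file, one option is a small generator where line length and the share of empty lines are parameters, so each "perhaps" can be varied on its own. This is only a sketch: the function name, its parameters, and the fixed seed are all assumptions made up here.

                      ```python
                      import random


                      def make_sample(total_lines, blank_ratio, line_length, eol="\r\n", seed=0):
                          """Build test text in which roughly blank_ratio of the lines are empty.

                          Non-empty lines are line_length characters of filler. The seed is
                          fixed so that repeated runs produce the same text.
                          """
                          rng = random.Random(seed)
                          content = "x" * line_length + eol
                          return "".join(eol if rng.random() < blank_ratio else content
                                         for _ in range(total_lines))


                      # Small demo: 10 lines, about 30% empty, 20 characters per content line.
                      print(make_sample(10, 0.3, 20, eol="\n"))
                      ```

                      Something like `editor.setText(make_sample(1000000, 0.3, 300))` would roughly mirror the 1M-line, 30%-blank, 300-character case described earlier in the thread.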
