    Remove duplicate lines removes end of line marks and sorting moves them

    • mkupper

      This is with v8.6.4 (32-bit and 64-bit) and v8.3.3 (32-bit) meaning it’s not a recent regression.

      I have been getting hit by missing end of line marks and so today was prompted to understand what I was doing that leads to their loss. Before posting the issue to github I wanted some confirmation/feedback.

      I have a sorted set of lines and now want to merge in a new set of lines that also has duplicates. My practice has been to:

      • Select the original set of sorted lines and cut/paste them into a new tab.
      • Select the set of new lines I want to merge into the sorted list and cut/paste that below the lines in the new tab.
      • In the new tab I’d do Remove Duplicate Lines, Sort Ascending, and then cut/paste the results back into the first file. It turns out that both the Remove Duplicate Lines and Sort Ascending operations can remove end of line marks.
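
      For comparison, the merge workflow above can be sketched outside the editor. This is only an illustration under stated assumptions, not what Notepad++ does internally: `merge_sorted_unique` is a hypothetical helper name, the EOL is assumed to be a plain `\n`, and Python's default string ordering stands in for the editor's lexicographic sort. It treats the EOL as a line terminator, so every line in the result, including the last, keeps its end of line mark.

```python
def merge_sorted_unique(original: str, new: str, eol: str = "\n") -> str:
    """Merge two blocks of lines, drop duplicates, sort, keep a final EOL."""
    # splitlines() treats the EOL as a terminator, so a trailing EOL
    # does not produce an extra empty "line" at the end.
    lines = original.splitlines() + new.splitlines()
    # set() removes duplicates; sorted() orders by code point (an assumption
    # standing in for the editor's lexicographic sort).
    unique_sorted = sorted(set(lines))
    # Re-attach an EOL to every line, including the last one.
    return "".join(line + eol for line in unique_sorted)


if __name__ == "__main__":
    merged = merge_sorted_unique("Line 1\nLine 4\n", "Line 1\nLine 2\n")
    print(repr(merged))
```

      Because the result is sorted anyway, it does not matter here that a set discards the original order of the lines.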

      With this data:

      Line 1
      
      Line 1
      Line 4
      

      If I do Edit / Line Operations / Remove Duplicate Lines then I end up with

      Line 1
      
      Line 4
      

      That looks good except there is no end-of-line mark after Line 4. The blank line at line 2 in the data set triggers the issue. If, for example, the original data were:

      Line 1
      Line 2
      Line 1
      Line 4
      

      then we end up with

      Line 1
      Line 2
      Line 4
      

      and there is an end of line mark after Line 4.

      Sorting has a similar issue. If we have three lines that all have end of line marks:

      Line 1
      Line 3
      Line 2
      

      and run Edit / Line Operations / Sort Lines Lexicographically Ascending then I end up with:

      
      Line 1
      Line 2
      Line 3
      

      The sort added a blank line before Line 1 and there is no end of line mark at the end of Line 3.

      A third issue is that a sort also turns the data into a selection that does not include the end of line mark for the last line, even if it has an end of line mark.

      • Alan Kilborn @mkupper

        @mkupper

        I’m not at my PC to try it, but from the description some of this smells like it might be similar to or related to:

        • https://github.com/notepad-plus-plus/notepad-plus-plus/issues/8739
        • https://github.com/notepad-plus-plus/notepad-plus-plus/pull/13498

        mkupper wrote: “The sort added a blank line before Line 1 and there is no end of line mark at the end of Line 3.”

        The author of Notepad++ seems to not mind that behavior, or feels it is correct, as issue 8739 got stamped with a rejection.

        I find that to do what I feel is correct sorting, I must do a Select All (Ctrl+a) immediately before running a sort command. This was a “tip” in one of the comments to issue 8739.

        • PeterJones @mkupper

          @mkupper,

          I was in the process of confirming when Alan posted his reply… After reading the links he posted, I agree with his assessment that Don feels that treating the newline followed by end-of-file as a “blank” line is correct.

          While it’s obvious how that explains the sort, since sorting was the focus of the links, it also explains the seemingly contradictory delete-duplicate behavior: when you have line 2 as a blank, then it and the newline at the end of line 4 are “duplicate blank lines”, so the newline after line 4 is deleted. Based on this, my interpretation of Don’s definition of a line is “newline or start-of-file, followed by zero or more non-newline characters”, so a newline is a prefix on the current line, instead of a suffix on the current line.
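
          That interpretation can be checked against the reported results with a small model. This is a sketch of the interpretation only, not Notepad++'s actual code: the function names are hypothetical, the EOL is assumed to be `\n`, and the text is split on the EOL as a separator, so a file ending in an EOL gains a final empty “line”.

```python
def split_editor_lines(text, eol="\n"):
    # Assumption: the EOL is a separator, so text ending in an EOL
    # yields a final empty "line".
    return text.split(eol)


def remove_duplicates(text, eol="\n"):
    # Keep the first occurrence of each line, in original order.
    seen = set()
    kept = []
    for line in split_editor_lines(text, eol):
        if line not in seen:
            seen.add(line)
            kept.append(line)
    return eol.join(kept)


def sort_ascending(text, eol="\n"):
    # The final empty "line" sorts before everything else.
    return eol.join(sorted(split_editor_lines(text, eol)))


if __name__ == "__main__":
    print(repr(remove_duplicates("Line 1\n\nLine 1\nLine 4\n")))
    print(repr(remove_duplicates("Line 1\nLine 2\nLine 1\nLine 4\n")))
    print(repr(sort_ascending("Line 1\nLine 3\nLine 2\n")))
```

          Under these assumptions the model reproduces all three reports: the final EOL disappears only when an earlier blank line exists to make the trailing empty “line” a duplicate, and the sort moves that empty “line” to the top.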

          If Scintilla (and thus Notepad++) followed other quality text editors, like vim, and didn’t display a blank line when a file ends in EOL, I think he would have been more apt to have understood and accepted the issue and proposed fix. As it is, he’d likely just consider it “already decided” and refuse to do anything.

          Fortunately, losing the final end-of-line after such a sort or remove-dup isn’t an issue for me, because I use the EditorConfig plugin to always apply the final EOL whenever I save.
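
          EditorConfig has a standard property for this, `insert_final_newline`. The effect it has on save can be sketched as below; this is only an illustration of the behavior, not the plugin's code, and `ensure_final_eol` is a hypothetical name with the EOL assumed to be `\n`.

```python
def ensure_final_eol(text: str, eol: str = "\n") -> str:
    """Append an EOL if the text is non-empty and does not already end in one."""
    if text and not text.endswith(eol):
        return text + eol
    # Already terminated (or empty): leave the text unchanged.
    return text


if __name__ == "__main__":
    print(repr(ensure_final_eol("Line 1\n\nLine 4")))
```

          Applying it twice gives the same result as applying it once, so it is safe to run on every save.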

          • Alan Kilborn @mkupper

            It’s a somewhat hotly contested thing: Is a line a line without a line-ending at its end? For me, it’s a No… and, because Notepad++ doesn’t enforce that, I, like Peter, use the EditorConfig plugin, which forces the final line in a file to have a line-ending. An unfortunate side effect of that is that Notepad++ won’t sort “correctly”.
