Remove duplicate lines removes end of line marks and sorting moves them
-
This is with v8.6.4 (32-bit and 64-bit) and v8.3.3 (32-bit), meaning it’s not a recent regression.
I have been getting hit by missing end-of-line marks, so today I set out to understand what I was doing that leads to their loss. Before posting the issue to GitHub I wanted some confirmation and feedback.
I have a sorted set of lines and now want to merge in a new set of lines that also has duplicates. My practice has been to:
- Select the original set of sorted lines and cut/paste them into a new tab.
- Select the set of new lines I want to merge into the sorted list and cut/paste that below the lines in the new tab.
- In the new tab, do Remove Duplicate Lines, then Sort Ascending, and then cut/paste the results back into the first file.
It turns out that both the Remove Duplicate Lines and Sort Ascending operations can remove end-of-line marks.
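For comparison, the merge workflow above can be sketched outside Notepad++ under a "terminator" model of lines, where every line, including the last, ends with its own end-of-line mark. This is a minimal Python sketch, not anything Notepad++ provides; `merge_sorted_lines` and its `eol` parameter are hypothetical names chosen for illustration.

```python
def merge_sorted_lines(existing: str, new: str, eol: str = "\n") -> str:
    """Hypothetical helper: merge two blocks into sorted, unique lines.

    str.splitlines() treats a final end-of-line mark as the end of the
    last line, not as the start of an extra empty line, so no phantom
    blank line takes part in the dedupe or the sort.
    """
    lines = existing.splitlines() + new.splitlines()
    unique = sorted(set(lines))  # remove duplicates, then sort ascending
    # Re-terminate every line, so the last one keeps its end-of-line mark too
    return "".join(line + eol for line in unique)


old = "Apple\nCherry\n"
incoming = "Banana\nApple\n"
print(repr(merge_sorted_lines(old, incoming)))
# 'Apple\nBanana\nCherry\n'
```

Pass `eol="\r\n"` for Windows-style files. A line that is genuinely blank in the input still survives as one empty line sorted to the top, and Python's code-point ordering may not match Notepad++'s lexicographic sort for every input.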
With this data (the second line is blank):
    Line 1

    Line 1
    Line 4
If I do Edit / Line Operations / Remove Duplicate Lines, then I end up with:
    Line 1

    Line 4
That looks good except there is no end-of-line mark after Line 4. The blank line at line 2 in the data set triggers the issue. If, for example, the original data was:
    Line 1
    Line 2
    Line 1
    Line 4
then we end up with:
    Line 1
    Line 2
    Line 4
and there is an end-of-line mark after Line 4.
Sorting has a similar issue. If we have three lines that all have end-of-line marks:
    Line 1
    Line 3
    Line 2
and run Edit / Line Operations / Sort Lines Lexicographically Ascending, then I end up with:

    Line 1
    Line 2
    Line 3
The sort added a blank line before Line 1, and there is no end-of-line mark at the end of Line 3.
A third issue is that a sort also turns the data into a selection that does not include the end-of-line mark for the last line, even if that line has one.
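One way to reproduce the sort result above is to assume the editor splits the text on end-of-line marks as *separators*, so a document ending in an EOL yields a final empty "line". This is an assumed model of the behavior, not Notepad++'s actual code, and `sort_as_separator` is a hypothetical name. Under that model the empty string sorts first, and joining with separators leaves the last line without an EOL: exactly the two symptoms described.

```python
def sort_as_separator(text: str, eol: str = "\n") -> str:
    """Hypothetical model: sort lines, treating EOL marks as separators."""
    # "Line 1\nLine 3\nLine 2\n".split("\n") ends with an empty string,
    # i.e. the position after the last EOL counts as an empty line.
    lines = text.split(eol)
    lines.sort()  # the empty "line" sorts ahead of everything else
    # Joining puts EOLs only *between* lines, so the last one gets none.
    return eol.join(lines)


before = "Line 1\nLine 3\nLine 2\n"
print(repr(sort_as_separator(before)))
# '\nLine 1\nLine 2\nLine 3'
```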
-
I’m not at my PC to try it, but from the description some of this smells like it might be similar or related to:
- https://github.com/notepad-plus-plus/notepad-plus-plus/issues/8739
- https://github.com/notepad-plus-plus/notepad-plus-plus/pull/13498
> The sort added a blank line before Line 1 and there is no end of line mark at the end of Line 3.
The author of Notepad++ seems to not mind that behavior, or feels it is correct, as issue 8739 got stamped with a rejection.
I find that to do what I feel is correct sorting, I must do a Select All (Ctrl+a) immediately before running a sort command. This was a “tip” in one of the comments to issue 8739.
-
@mkupper ,
I was in the process of confirming when Alan posted his reply… After reading the links he posted, I agree with his assessment that Don feels that treating the newline followed by end-of-file as a “blank” line is correct.
While it’s obvious how that explains the sort, since sorting was the focus of the links, it also explains the seemingly contradictory delete-duplicates behavior: when line 2 is blank, then it and the empty “line” after the newline at the end of line 4 are “duplicate blank lines”, so the newline after line 4 is deleted. Based on this, my interpretation of Don’s definition of a line is “newline or start-of-file, followed by zero or more non-newline characters”, so a newline is a prefix on the current line instead of a suffix on it.
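The interpretation above can be checked with the same kind of model: split on EOL as a separator, keep the first occurrence of each line, and join again. `dedupe_as_separator` is a hypothetical name, and this models the described behavior rather than Notepad++'s source. It reproduces both of the original poster's results: with a blank line 2, the trailing empty "line" is a duplicate and the final EOL disappears; without it, the final EOL survives.

```python
def dedupe_as_separator(text: str, eol: str = "\n") -> str:
    """Hypothetical model: remove duplicate lines, EOL treated as a separator."""
    seen = set()
    kept = []
    # A trailing EOL produces a final empty "line" after the split.
    for line in text.split(eol):
        if line not in seen:  # keep only the first occurrence
            seen.add(line)
            kept.append(line)
    return eol.join(kept)


with_blank = "Line 1\n\nLine 1\nLine 4\n"
without_blank = "Line 1\nLine 2\nLine 1\nLine 4\n"
print(repr(dedupe_as_separator(with_blank)))     # 'Line 1\n\nLine 4'
print(repr(dedupe_as_separator(without_blank)))  # 'Line 1\nLine 2\nLine 4\n'
```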
If Scintilla (and thus Notepad++) behaved like other quality text editors, such as vim, and didn’t display a blank line when a file ends in EOL, I think he would have been more apt to understand and accept the issue and the proposed fix. As it is, he’d likely just consider it “already decided” and refuse to do anything.
Fortunately, losing the final end-of-line after such a sort or remove-dup isn’t an issue for me, because I use the EditorConfig plugin to always add the final EOL whenever I save.
-
It’s a somewhat hotly contested question: is a line still a line without a line-ending at its end? For me, the answer is no, and because Notepad++ doesn’t enforce that, I, like Peter, use the EditorConfig plugin, which forces the final line in a file to have a line-ending. An unfortunate side effect is that Notepad++ won’t sort “correctly”.
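The disagreement is really about whether an EOL *separates* lines or *terminates* them. POSIX takes the terminator view: it defines a line as zero or more non-newline characters plus a terminating newline. The sketch below counts lines in the same text three ways; the function names are hypothetical, and the separator count is only an assumed stand-in for the behavior discussed in this thread.

```python
def count_separator(text: str) -> int:
    # EOL separates lines: a final EOL starts one more (empty) line
    return len(text.split("\n"))


def count_posix(text: str) -> int:
    # EOL terminates lines: trailing text without an EOL is not a complete line
    return text.count("\n")


def count_splitlines(text: str) -> int:
    # Python's middle ground: an unterminated last line still counts,
    # but a final EOL does not create an extra empty line
    return len(text.splitlines())


for sample in ["Line 1\nLine 2\n", "Line 1\nLine 2"]:
    print(repr(sample), count_separator(sample), count_posix(sample), count_splitlines(sample))
# 'Line 1\nLine 2\n' 3 2 2
# 'Line 1\nLine 2' 2 1 2
```

With a final EOL, only the separator view sees a third, empty line, which is the phantom blank line that sort and remove-duplicates act on.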