Sort Lines Lexicographically did not work
-
@gitberry said in Sort Lines Lexicographically did not work:
So true! I don’t think there is a good reason. When it happens (ie received from an uncaring/uncareful source etc) and the sort doesn’t work…
If you are likely to get files of that nature from another source, suggest you “sanitize” them before beginning to work with them.
For example, do a line-ending conversion in Notepad++, which will unify the line-endings all to one type (whichever type you desire). After that, do your sort, or whatever other data manipulations you need to do.
(I guess Terry already said the same thing; sorry, didn’t see that first before crafting this reply)
-
@Terry-R said in Sort Lines Lexicographically did not work:
I don’t see why a file would contain a mixture of the 3 types, unless there had been an error in encoding or reading of the file. So AFAIK a file would ONLY ever contain 1 type of line ending, and that depends on the use/environment the file is being used in.
Amen, brother, amen.
However, Notepad++ (Scintilla) doesn’t enforce this.
And, by default, it doesn’t let you know that you have “screwed up” files when this situation happens to occur.
One way for it to occur is the aforementioned reception of files from another source.
Another way for it to happen is a regex replacement where users think that \n works to match a line-ending of any type. It does NOT; \R should be used instead for this purpose. But, again, Notepad++ lets you do it, so line-ending weirdness can happen from this.
A good way to “set it and forget it” to avoid this type of problem is by using the EditorConfig plugin. With that, you specify your desired line-ending type, and when files are saved in Notepad++, the plugin steps in and corrects any improper line-endings to your desired type.
An alternative way to monitor the situation is to turn on visible line-endings and then hope you notice a mismatch. However, looking at the “heavy” line-ending character representation is too visually overwhelming for me. YMMV.
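For anyone who wants to check a file outside of Notepad++, here is a minimal Python sketch of the same point. It is only an illustration: Python's `re` module has no `\R`, so the alternation `\r\n|\r|\n` stands in for it, and the function name `count_line_endings` and the sample text are made up.

```python
import re
from collections import Counter

# Stand-in for \R, limited to the three endings discussed in this thread.
# CRLF is listed first so it is not counted as a CR followed by an LF.
ANY_EOL = re.compile(r"\r\n|\r|\n")

def count_line_endings(text):
    """Return a Counter of the line-ending types found in text."""
    names = {"\r\n": "CRLF", "\r": "CR", "\n": "LF"}
    return Counter(names[m.group()] for m in ANY_EOL.finditer(text))

sample = "apple\r\nbanana\ncherry\rdate\r\n"
print(count_line_endings(sample))

# A bare \n pattern finds the LF inside each CRLF and the lone LF,
# but it never matches the lone CR after "cherry".
print(len(re.findall(r"\n", sample)))
```

If the counter reports more than one type, the file is in the mixed state described above.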
-
@gitberry said in Sort Lines Lexicographically did not work:
[ X ] SORT Line Ending Agnostic (will treat CR, CR-LF and LF equally as line endings when sorting)
There’s an open issue on the ISSUE-TRACKER for this; perhaps you wanna add your voice there so it can be heard by developers?
Personally, I don’t think this needs a setting, I think it should ignore line-endings when sorting.
But, for myself, I use the EditorConfig plugin so that I just don’t get into a situation where a sorting problem (and other problems that could occur from this) happens.
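To show what "ignore line-endings when sorting" could look like, here is a rough Python sketch. It is not how Notepad++ sorts; the function name `sort_lines_eol_agnostic` and the `eol` parameter are assumptions made for the example, and the sort is a plain code-point comparison.

```python
import re

# Treat CRLF, lone CR and lone LF equally as line separators.
ANY_EOL = re.compile(r"\r\n|\r|\n")

def sort_lines_eol_agnostic(text, eol="\r\n"):
    """Sort lines while ignoring their endings, then rejoin with one ending."""
    lines = ANY_EOL.split(text)
    # A trailing line ending leaves an empty last element; set it aside
    # so it does not sort to the top as a blank line.
    had_trailing_eol = len(lines) > 1 and lines[-1] == ""
    if had_trailing_eol:
        lines.pop()
    lines.sort()  # plain code-point order
    return eol.join(lines) + (eol if had_trailing_eol else "")

mixed = "pear\ncherry\r\napple\rbanana\r\n"
print(repr(sort_lines_eol_agnostic(mixed)))
```

A side effect of this approach is that the output always has a single line-ending type, which is the same "sanitize first" outcome suggested earlier in the thread.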
-
@Alan-Kilborn said in Sort Lines Lexicographically did not work:
For example, do a line-ending conversion in Notepad++, which will unify the line-endings all to one type (whichever type you desire). After that, do your sort, or whatever other data manipulations you need to do.
actually, if you open this menu on a file with a mixture of line endings, the original selection is greyed out; NP++ thinks everything is still unified. perhaps this is a bug?
-
@mathlete2 said in Sort Lines Lexicographically did not work:
actually, if you open this menu on a file with a mixture of line endings, the original selection is greyed out; NP++ thinks everything is still unified. perhaps this is a bug?
No. It just picked one (probably based on the first line’s ending). To unify, you need to trigger at least one conversion, so pick the wrong one (like LF), and then convert back to the right one (CRLF). This is why Alan phrased it as “do a line-ending conversion”, not just “pick the line ending you want”.
-
also, FWIW, you can get yourself into these situations if you do a RegEx replacement similar to the one below to separate objects into separate lines. visually, this gets you to the sortable state you want, but the EOL codes interfere with the actual sorting.
-
@mathlete2 said in Sort Lines Lexicographically did not work:
also, FWIW, you can get yourself into these situations if you do a RegEx replacement similar to the one below to separate objects into separate lines. visually, this gets you to the sortable state you want, but the EOL codes interfere with the actual sorting.
I’m guessing from your screenshot that you have the belief (like others have in the past) that using \n in your replacement will get you \r\n in files that have “Windows (CR LF)” type. It is NOT true. You get what you ask for; in this case you will get exactly \n …another way to end up with a mishmash of line ending types in your file, as maybe you found out…and yes, as the rest of this thread indicates, that will affect sorting.
It is never really a good idea to change a file’s line-ending type with a regular expression replacement. Best to use the status-bar menu already discussed.
Or, here’s a super-secret hack to unify line-ending characters:
Paste some data into a file with mixed line-endings using the Edit menu’s Paste command and you will observe that all line-endings become the same! Note that using Ctrl+v to paste does NOT get you the same effect!
-
@PeterJones well, if you select the “wrong” one that matches the other “wrong” ones already there, NP++ doesn’t add a second EOL character to those lines. So, “self conversions” are already possible to a certain extent, just not for the active selection
-
@Alan-Kilborn said in Sort Lines Lexicographically did not work:
I’m guessing from your screenshot that you have the belief (like others have in the past) that using \n in your replacement will get you \r\n in files that have “Windows (CR LF)” type
nope, just wasn’t aware that Windows used a weird EOL character coding until I came across this thread ;)
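For anyone else who hadn't run into this, a small Python sketch (not Notepad++ itself) of what a literal \n replacement does to a CRLF buffer. The sample data is invented, and the "split only on CRLF" step is a hypothetical stand-in for any tool that recognizes just one ending type, not a claim about how Notepad++ sorts internally.

```python
import re

# A "Windows (CR LF)" buffer with several items on a single line.
windows_text = "id=3;id=1;id=2\r\n"

# Put each item on its own line by replacing ';' with a literal \n.
split_up = re.sub(r";", "\n", windows_text)
print(repr(split_up))

# The new breaks are bare LF while the original ending is still CRLF,
# so the buffer now holds two ending types.

# Hypothetical tool that only recognizes CRLF: it still sees one long
# "line" (plus the empty piece after the final CRLF), so there is
# nothing for it to reorder.
print(split_up.split("\r\n"))
```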
-
Hello, @mathlete2 and All,
Regarding your regex, the trick is to capture the line-ending of each line in a group and place it, twice, in the replacement. So, this new regex S/R:
SEARCH
(\w+(\\[w+\\])*)(\R)
REPLACE
\1\3\3
A simple way to get uniform line-endings, with a regex, is to run:
SEARCH \R ( OK, whatever the effective line-ending of each line )
REPLACE \r\n ( case of Windows line-endings wanted )
REPLACE \n ( case of Unix line-endings wanted )
REPLACE \r ( case of Macintosh line-endings wanted )
Of course:
- Tick the Wrap around option
- Click on the Replace All button
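The same search/replace can be sketched outside Notepad++ in Python. This is an illustration under assumptions: Python's `re` module does not support `\R`, so `\r\n|\r|\n` is used in its place, and the function name `unify_line_endings` is made up.

```python
import re

# Stand-in for \R; CRLF comes first so it is replaced as one unit.
ANY_EOL = re.compile(r"\r\n|\r|\n")

def unify_line_endings(text, eol="\r\n"):
    """Replace every line ending in text with the requested one."""
    # A function is used as the replacement so that eol is inserted
    # literally, without any escape processing by re.sub.
    return ANY_EOL.sub(lambda match: eol, text)

mixed = "a\nb\rc\r\nd"
print(repr(unify_line_endings(mixed, "\r\n")))  # Windows
print(repr(unify_line_endings(mixed, "\n")))    # Unix
print(repr(unify_line_endings(mixed, "\r")))    # Macintosh
```

Because every existing ending is matched as a single unit, running the function a second time on its own output changes nothing.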
Best Regards,
guy038
-
@mathlete2 said in Sort Lines Lexicographically did not work:
…just wasn’t aware that Windows used a weird EOL character coding
LOL. I’m guessing again: You’re a young person! :-)
Us “old timers” stopped wondering about the weirdness of line-endings a long time ago.
-
@mathlete2 said in Sort Lines Lexicographically did not work:
well, if you select the “wrong” one that matches the other “wrong” ones already there, NP++ doesn’t add a second EOL character to those lines. So, “self conversions” are already possible to a certain extent, just not for the active selection
It certainly “knows” about which characters are possible line-ending characters, i.e., \n and \r, so yes, when doing a conversion it considers a line to be a group of characters followed by any conglomeration of line-ending characters. Meaning that it knows how to strip off everything there before applying fresh line-ends.
But to your larger point (I think), it certainly would be possible to have all 3 of those choices enabled at all times (if the developers so decided), so that if you have a mishmash, and N++ thinks you have a “Windows” file, and you want to clear up the mishmash and indeed end up with a “Windows” file, you really shouldn’t have to do TWO conversions.
If you find you have to do this often, perhaps making a macro out of @guy038's suggested operation(s) is a wise course of action (going against my advice to not use regex for this).
Or, better yet, check out the EditorConfig plugin, set it up for what you want, and (after a save of your file), never think about these “weird” line endings again.
-
@mathlete2 said in Sort Lines Lexicographically did not work:
nope, just wasn’t aware that Windows used a weird EOL character coding until I came across this thread ;)
Us Windows users might take exception to that description, LOL!
If you’re of similar age to me you will know about the manual typewriter which had a “carriage return lever” on the left which performed the \r and \n functions in one movement.
Terry
PS actually I’m not quite as old as the one in the picture, but I did get a hold of one similar that I intend on restoring sometime.
-
@guy038 ah, there we go! this is the way that the EOL conversion should be working. if it already is, there’s no need to disable the active scheme from the list - you just need a marker to indicate which one is used when the user hits Enter
-
Hi, @mathlete2, @alan-kilborn and All,
Following Alan’s idea of a Ctrl + V operation, I would say that, if you suppose your current file to contain a mix of different line-endings, simply use these three shortcuts, successively:
- Ctrl + A
- Ctrl + C
- Ctrl + V
And all the lines of your current file should adopt the line-ending indicated in the right part of the status bar! Et voilà!
BR
guy038
-
@Terry-R well, if you’re a NP++ user, you’re a Windows user ;)
pretty much all of my EOL character knowledge prior to this thread came from string extraction; I knew of \r, but hadn’t really seen it come up much
and yes, I’m old enough to remember using typewriters, but I don’t think ours had the lever - I only remember using Enter
-
@mathlete2 said in Sort Lines Lexicographically did not work:
but I don’t think ours had the lever - I only remember using Enter
So your typewriter likely would have been electric. The manual ones all relied on mechanical levers of one sort or another. The term “carriage return” (\r) arises from this lever on the typewriter, I believe. When pushed across to the right it moved the platen (roller holding the paper) to the right and advanced (turned) the roller 1 line or more (the \n).
Oh the memories of high school and learning the “asdf” and “;lkj”. Unfortunately I’m still a 2 finger typist, but oh what speed I can inflict, the fastest 2 fingers in the west! ;-))
Terry
-
Hello, @terry-r,
Terry, your image is… comforting !
What a concentrate of technology for the time and what apparent robustness !
A simple look at the machine’s feet shows that it was made to last and seemed to be built with common sense !
Not like this modern concept of programmed obsolescence :-(
Also, good restoration, deserved for this wonderful object ;-))
Best Regards,
guy038
P.S. : It was my quarter of an hour of nostalgia ;-))
-
@guy038 said in Sort Lines Lexicographically did not work:
Following Alan’s idea of a Ctrl + V operation,
I think you read what I said wrongly.
Ctrl+v does NOT do anything to line-endings in the remainder of the file.
It will, however, convert line-endings on the lines that are being pasted.
It is the Edit menu’s Paste command that does it to the ENTIRE file.
There doesn’t even have to be a line-ending in what you are pasting; it can be as simple as a single character in the clipboard, e.g. an a, and the entire-file line-ending conversion takes place upon (menu-invoked) Paste.
This is amazing, because from this you’d think they are one and the same:
But they are not equivalent. Give it a try!!
-
Hi, @alan-kilborn,
I still don’t fully understand your last post !
Do you mean that using the Edit > Paste menu option OR the Ctrl + V shortcut do not produce the same results?
With my old XP SP3 laptop (for a few more days still!), it seems identical!?
To my mind:
- After the Ctrl + C operation, lines placed in the clipboard still contain their initial different line-endings
- After the Ctrl + V operation, the clipboard replaces the current selection with its contents, using the current line-ending defined in the status bar
BR
guy038
-