File sorting

guy038

Yes, your new attempt, Alan, is the solution when working with UTF-8 encoded files, which may contain multi-byte encoded chars !

As for me, I was thinking about the opposite solution : to convert UTF-8 files to ANSI. However, with this solution, some characters may turn into question marks or be replaced by an approximate character, because they do not belong to the corresponding ANSI table of 256 characters !

For instance, in my previous list of rivers, the Turkish Kızılırmak river, containing the Latin lowercase dotless letter ı ( of code-point \x{0131} ), is changed into the approximate name Kizilirmak, after conversion to ANSI !
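
To illustrate the loss, here is a minimal Python sketch. It assumes that "ANSI" means the Windows-1252 code page, and the helper name to_ansi is only illustrative. Note that Python's "replace" error handler always produces a question mark; it does not reproduce the approximate i that the conversion described above gives.

```python
def to_ansi(text: str, codepage: str = "cp1252") -> str:
    """Round-trip text through a single-byte code page; unmappable chars become '?'."""
    return text.encode(codepage, errors="replace").decode(codepage)

# Every char of this name exists in cp1252, so it survives the round trip
print(to_ansi("Paraná"))

# The dotless ı ( U+0131 ) has no slot in cp1252, so it is lost
print(to_ansi("Kızılırmak"))
```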

Anyway, we just did our best to solve the OP’s problem ;-))

BR

guy038

freezer2022

@ Dave-Pruce said :

Is it possible to sort a file by line length??

Not natively, but there is a Notepad++ plugin for it: Linesort v1.1 (only for 32-bit Notepad++) :

https://webarchive.org/web/20200207125518/http://www.scout-soft.com/linesort/

mkupper

@guy038 You essentially did “programming” with a human computer doing the evaluations and flow control. :-)

That reminds me of the stories about the first "computers": originally this was a human job title, for people who computed but also had to do the flow control! As ways were found to do parts of the job, first by mechanical means and then by electronic ones, the resulting machines came to be known as computers.

guy038

Hello, All,

Thanks to @mkupper, who recently posted a comment, and exactly three years later, I am going to simplify the way to sort lines by length and, secondarily, by line contents, too !

Like in my previous post, I will use this list of rivers, below :

https://en.wikipedia.org/wiki/List_of_rivers_by_length

After removing some duplicates, we get an INPUT text of 238 river names :

Nile
White Nile
Kagera
Nyabarongo
Mwogo
Rukarara
Amazon
Ucayali
Tambo
Ene
Mantaro
Yangtze
Mississippi
Missouri
Jefferson
Beaverhead
Red Rock
Hell Roaring
Yenisei
Angara
Selenge
Ider
Yellow River
Ob
Irtysh
Río de la Plata
Paraná
Congo
Chambeshi
Amur
Argun
Kherlen
Lena
Mekong
Mackenzie
Slave
Peace
Finlay
Niger
Brahmaputra
Tsangpo
Murray
Darling
Culgoa
Balonne
Condamine
Tocantins
Araguaia
Volga
Indus
Sênggê Zangbo
Shatt al-Arab
Euphrates
Murat
Madeira
Mamoré
Caine
Rocha
Purús
Yukon
São Francisco
Syr Darya
Naryn
Salween
Saint Lawrence
Niagara
Detroit
Saint Clair
Saint Marys
Saint Louis
North
Nizhnyaya Tunguska
Danube
Breg
Zambezi
Vilyuy
Ganges
Hooghly
Padma
Amu Darya
Panj
Japurá
Nelson
Saskatchewan
Paraguay
Kolyma
Pilcomayo
Biya
Katun
Ishim
Juruá
Ural
Arkansas
Colorado
Olenyok
Dnieper
Aldan
Ubangi
Uele
Negro
Columbia
Zhujiang
Red
Ayeyarwady
Kasai
Ohio
Allegheny
Orinoco
Tarim
Xingu
Orange
Salado
Vitim
Tigris
Songhua
Tapajós
Don
Podkamennaya Tunguska
Pechora
Kama
Limpopo
Chulym
Guaporé
Indigirka
Snake
Senegal
Uruguay
Blue Nile
Churchill
Khatanga
Okavango
Volta
Beni
Platte
Tobol
Alazeya
Jubba
Shebelle
Içá
Magdalena
Han
Kura
Oka
Guaviare
Pecos
Murrumbidgee
Godavari
Río Grande
Belaya
Cooper
Barcoo
Marañón
Dniester
Benue
Ili
Warburton
Georgina
Sutlej
Yamuna
Vyatka
Fraser
Brazos
Liao
Lachlan
Yalong
Iguaçu
Olyokma
Northern Dvina
Sukhona
Krishna
Iriri
Narmada
Lomami
Ottawa
Lerma
Grande de Santiago
Elbe
Vltava
Zeya
Juruena
Rhine
Athabasca
Canadian
North Saskatchewan
Vistula
Bug
Vaal
Shire
Ogooué
Nen
Kızılırmak
Markha
Green
Milk
Chindwin
Sankuru
Wu
James
Kapuas
Desna
Helmand
Madre de Dios
Tietê
Vychegda
Sepik
Cimarron
Anadyr
Paraíba do Sul
Jialing
Liard
Cumberland
White
Huallaga
Kwango
Draa
Gambia
Tyung
Chenab
Yellowstone
Ghaghara
Huai
Aras
Chu
Seversky Donets
Bermejo
Fly
Kuskokwim
Tennessee
Oder
Warta
Aruwimi
Daugava
Gila
Loire
Essequibo
Khoper
Tagus
Flinders

At the end of the first line, we add space chars up to column 100
Then, with a zero-width column selection at column 100, spanning all the lines, we insert an exclamation mark ( ! ) at the end of every line of the list :

=> We get this temporary text ( I just listed the first lines and the last lines ) :

Nile                                                                                               !
White Nile                                                                                         !
Kagera                                                                                             !
Nyabarongo                                                                                         !
Mwogo                                                                                              !
Rukarara                                                                                           !
Amazon                                                                                             !
Ucayali                                                                                            !
Tambo                                                                                              !
Ene                                                                                                !
Mantaro                                                                                            !
Yangtze                                                                                            !
Mississippi                                                                                        !
Missouri                                                                                           !
......                                                                                             !
......                                                                                             !
......                                                                                             !
......                                                                                             !
Seversky Donets                                                                                    !
Bermejo                                                                                            !
Fly                                                                                                !
Kuskokwim                                                                                          !
Tennessee                                                                                          !
Oder                                                                                               !
Warta                                                                                              !
Aruwimi                                                                                            !
Daugava                                                                                            !
Gila                                                                                               !
Loire                                                                                              !
Essequibo                                                                                          !
Khoper                                                                                             !
Tagus                                                                                              !
Flinders                                                                                           !
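
For readers who want to reproduce this padding step outside Notepad++, here is a minimal Python sketch. It assumes the marker must land in column 100 ( 1-based ), so each name is padded to 99 chars first; the helper name pad_with_marker is hypothetical.

```python
def pad_with_marker(name: str, column: int = 100, marker: str = "!") -> str:
    """Pad name with spaces so that marker ends up in the given 1-based column."""
    return name.ljust(column - 1) + marker

for river in ["Nile", "White Nile", "Kagera"]:
    line = pad_with_marker(river)
    # Every padded line has the same total length
    print(len(line), repr(line[:12]), repr(line[-3:]))
```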

Now, we perform this regex S/R :
- SEARCH ^([\w -]+?)(\x20+)(?=!)
- REPLACE \2\1

=> Again, we get this temporary text ( I just listed the first lines and the last lines ) :

                                                                                               Nile!
                                                                                         White Nile!
                                                                                             Kagera!
                                                                                         Nyabarongo!
                                                                                              Mwogo!
                                                                                           Rukarara!
                                                                                             Amazon!
                                                                                            Ucayali!
                                                                                              Tambo!
                                                                                                Ene!
                                                                                            Mantaro!
                                                                                            Yangtze!
                                                                                        Mississippi!
                                                                                           Missouri!
                                                                                             ......!
                                                                                             ......!
                                                                                             ......!
                                                                                             ......!
                                                                                    Seversky Donets!
                                                                                            Bermejo!
                                                                                                Fly!
                                                                                          Kuskokwim!
                                                                                          Tennessee!
                                                                                               Oder!
                                                                                              Warta!
                                                                                            Aruwimi!
                                                                                            Daugava!
                                                                                               Gila!
                                                                                              Loire!
                                                                                          Essequibo!
                                                                                             Khoper!
                                                                                              Tagus!
                                                                                           Flinders!
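
The same swap can be tried in Python, whose re module accepts this pattern unchanged. This is only a sketch: the name right_align is mine, and it relies on \w matching accented letters, which is the default for Python 3 strings.

```python
import re

# Same pattern as the SEARCH field: lazy name, run of spaces, then a lookahead on '!'
SWAP = re.compile(r"^([\w -]+?)(\x20+)(?=!)")

def right_align(padded: str) -> str:
    """Move the run of spaces in front of the name ( group 2, then group 1 )."""
    return SWAP.sub(r"\2\1", padded)

print(repr(right_align("White Nile     !")))
print(repr(right_align("Içá   !")))
```

The lazy quantifier +? matters here: the name group grows only as far as needed for the remaining spaces to reach the exclamation mark, so inner spaces, as in White Nile, stay inside the name.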

Then, we run the Edit > Line Operations > Sort Lines Lexicographically Ascending option

==> Here is our sorted text ( I just listed the first lines and the last lines ) :

                                                                                                 Ob!
                                                                                                 Wu!
                                                                                                Bug!
                                                                                                Chu!
                                                                                                Don!
                                                                                                Ene!
                                                                                                Fly!
                                                                                                Han!
                                                                                                Ili!
                                                                                                Içá!
                                                                                                Nen!
                                                                                                Oka!
                                                                                                Red!
                                                                                               Amur!
                                                                                               Aras!
                                                                                             ......!
                                                                                             ......!
                                                                                             ......!
                                                                                             ......!
                                                                                       Saskatchewan!
                                                                                       Yellow River!
                                                                                      Madre de Dios!
                                                                                      Shatt al-Arab!
                                                                                      São Francisco!
                                                                                      Sênggê Zangbo!
                                                                                     Northern Dvina!
                                                                                     Paraíba do Sul!
                                                                                     Saint Lawrence!
                                                                                    Río de la Plata!
                                                                                    Seversky Donets!
                                                                                 Grande de Santiago!
                                                                                 Nizhnyaya Tunguska!
                                                                                 North Saskatchewan!
                                                                              Podkamennaya Tunguska!
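
Why does this work ? All the lines now have the same length and a space char ( \x20 ) sorts before any letter, so a line with more leading spaces, i.e. a shorter name, comes first, and lines with the same amount of padding are ordered by their contents. A small Python sketch of that idea, assuming a plain code-point comparison ( Notepad++ might order some accented letters differently ) :

```python
names = ["Nile", "Ob", "Amur", "Wu", "Don", "Bug"]

# Right-align every name to a common width, like the regex S/R did
width = max(len(n) for n in names)
aligned = [n.rjust(width) for n in names]

# Plain lexicographic sort: leading spaces compare lower than letters
result = [line.lstrip() for line in sorted(aligned)]
print(result)
```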

Finally, let’s run this last regex S/R
- SEARCH ^\x20+|!$
- REPLACE Leave EMPTY

=> What remains is our expected OUTPUT text, sorted by line length and then by line contents :

Ob
Wu
Bug
Chu
Don
Ene
Fly
Han
Ili
Içá
Nen
Oka
Red
Amur
Aras
Beni
Biya
Breg
Draa
Elbe
Gila
Huai
Ider
Kama
Kura
Lena
Liao
Milk
Nile
Oder
Ohio
Panj
Uele
Ural
Vaal
Zeya
Aldan
Argun
Benue
Caine
Congo
Desna
Green
Indus
Iriri
Ishim
James
Jubba
Juruá
Kasai
Katun
Lerma
Liard
Loire
Murat
Mwogo
Naryn
Negro
Niger
North
Padma
Peace
Pecos
Purús
Rhine
Rocha
Sepik
Shire
Slave
Snake
Tagus
Tambo
Tarim
Tietê
Tobol
Tyung
Vitim
Volga
Volta
Warta
White
Xingu
Yukon
Amazon
Anadyr
Angara
Barcoo
Belaya
Brazos
Chenab
Chulym
Cooper
Culgoa
Danube
Finlay
Fraser
Gambia
Ganges
Iguaçu
Irtysh
Japurá
Kagera
Kapuas
Khoper
Kolyma
Kwango
Lomami
Mamoré
Markha
Mekong
Murray
Nelson
Ogooué
Orange
Ottawa
Paraná
Platte
Salado
Sutlej
Tigris
Ubangi
Vilyuy
Vltava
Vyatka
Yalong
Yamuna
Alazeya
Aruwimi
Balonne
Bermejo
Darling
Daugava
Detroit
Dnieper
Guaporé
Helmand
Hooghly
Jialing
Juruena
Kherlen
Krishna
Lachlan
Limpopo
Madeira
Mantaro
Marañón
Narmada
Niagara
Olenyok
Olyokma
Orinoco
Pechora
Salween
Sankuru
Selenge
Senegal
Songhua
Sukhona
Tapajós
Tsangpo
Ucayali
Uruguay
Vistula
Yangtze
Yenisei
Zambezi
Araguaia
Arkansas
Canadian
Chindwin
Cimarron
Colorado
Columbia
Dniester
Flinders
Georgina
Ghaghara
Godavari
Guaviare
Huallaga
Khatanga
Missouri
Okavango
Paraguay
Red Rock
Rukarara
Shebelle
Vychegda
Zhujiang
Allegheny
Amu Darya
Athabasca
Blue Nile
Chambeshi
Churchill
Condamine
Essequibo
Euphrates
Indigirka
Jefferson
Kuskokwim
Mackenzie
Magdalena
Pilcomayo
Syr Darya
Tennessee
Tocantins
Warburton
Ayeyarwady
Beaverhead
Cumberland
Kızılırmak
Nyabarongo
Río Grande
White Nile
Brahmaputra
Mississippi
Saint Clair
Saint Louis
Saint Marys
Yellowstone
Hell Roaring
Murrumbidgee
Saskatchewan
Yellow River
Madre de Dios
Shatt al-Arab
São Francisco
Sênggê Zangbo
Northern Dvina
Paraíba do Sul
Saint Lawrence
Río de la Plata
Seversky Donets
Grande de Santiago
Nizhnyaya Tunguska
North Saskatchewan
Podkamennaya Tunguska
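
As a cross-check, the four steps can be replayed in Python and compared with a direct sort on the pair ( length, contents ). This is a sketch under assumptions: both function names are hypothetical, column 100 is taken as the marker column, and Python compares by code point, which may differ from Notepad++ for some accented letters.

```python
import re

def sort_by_length_via_padding(names, column=100):
    """Replay the four steps: pad, swap, sort, strip."""
    padded = [n.ljust(column - 1) + "!" for n in names]
    swapped = [re.sub(r"^([\w -]+?)(\x20+)(?=!)", r"\2\1", p) for p in padded]
    swapped.sort()
    return [re.sub(r"^\x20+|!$", "", s) for s in swapped]

def sort_by_length_direct(names):
    """Same ordering with a sort key: length first, then contents."""
    return sorted(names, key=lambda n: (len(n), n))

rivers = ["Nile", "White Nile", "Ob", "Içá", "Ili", "Red Rock", "Rukarara", "Wu"]
assert sort_by_length_via_padding(rivers) == sort_by_length_direct(rivers)
print(sort_by_length_direct(rivers))
```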

That’s all ! Neat, isn’t it ?

Best Regards,

guy038

Coises

@Thomas-Knoefel

I received a feature request related to this post. It doesn’t quite feel like a good fit for Columns++ to me, but I think your MultiReplace plugin can assist in making this possible in a reasonable number of steps.

I believe multi-replace can be set up to find ^.*$ and replace with set(string.len(MATCH).." "..MATCH).

Then Edit | Line operations | Sort Lines As Integers Ascending will sort the lines in order by length, and then ^\d+\x20 replaced with nothing would remove the lengths.
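
The three steps can be sketched in Python to show the idea; this is not MultiReplace code, and the function name is hypothetical. One caveat: Lua's string.len counts bytes, so, depending on how the plugin hands the text to Lua, accented letters may count for more than one, whereas this sketch counts characters.

```python
import re

def sort_by_length_prefix(lines):
    # Step 1: prefix each line with its length and a space
    prefixed = [f"{len(line)} {line}" for line in lines]
    # Step 2: sort on the leading integer ( Python's sort is stable, so ties keep input order )
    prefixed.sort(key=lambda s: int(s.split(" ", 1)[0]))
    # Step 3: remove the numeric prefix again
    return [re.sub(r"^\d+\x20", "", s) for s in prefixed]

print(sort_by_length_prefix(["abc", "a", "12 x", "ab"]))
```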

Mark Olson

With JsonTools v6.0 or higher: open the treeview for the document, go to REGEX mode, and enter the query @ = s_join(`\r\n`, sort_by(s_split(@, `\r\n`), s_len(@)))
Hopefully the syntax is reasonably easy to understand: split the file by \r\n, sort the list of lines by string length, then set the document’s text (@) to the result of string-joining the list back together with \r\n.

This converts

abcdefg
ab
abcdefgh
a
abcdefghi
abcde
abcd
abc

into

a
ab
abc
abcd
abcde
abcdefg
abcdefgh
abcdefghi
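
For anyone more familiar with Python than with the JsonTools query language, a rough equivalent of the split, sort and join steps is sketched below. The function name is made up for illustration, and the file is assumed to use \r\n line endings.

```python
def sort_lines_by_length(text: str, eol: str = "\r\n") -> str:
    """Split on eol, sort the lines by length, join them back together."""
    return eol.join(sorted(text.split(eol), key=len))

sample = "\r\n".join(
    ["abcdefg", "ab", "abcdefgh", "a", "abcdefghi", "abcde", "abcd", "abc"]
)
print(sort_lines_by_length(sample).split("\r\n"))
```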

Coises

In case anyone comes across this topic looking for a way to sort lines by length, Columns++ release 1.0.1 can do this.

Select Sort… from the Columns++ menu and let it enclose the entire document in a rectangular selection (or make your own selection first).

Use Whole lines, Ascending or Descending as desired, and Width. You can then sort on Entire column, unless you wish to use one of the other options.

The sort is based on the on-screen width of text in the current font. Columns++ is meant to deal with data in columns using tabs, including elastic tabstops and proportionally-spaced fonts; I found that using the width, rather than a count of characters, was the most consistent way to deal with all the variations in a way that makes intuitive sense for users. For files using monospaced fonts and no tabs, the results are the same as counting characters.
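
To see why width and character count can disagree once tabs are involved, here is a small Python illustration. It is not how Columns++ measures text: it merely assumes a monospaced font with tab stops every 8 columns, and both function names are invented for the example.

```python
def char_count(line: str) -> int:
    """Number of characters; a tab counts as one."""
    return len(line)

def monospace_width(line: str, tab_size: int = 8) -> int:
    """Rough width in a monospaced font: each tab advances to the next tab stop."""
    return len(line.expandtabs(tab_size))

lines = ["abcd", "a\tb"]
print(sorted(lines, key=char_count))       # the tabbed line has fewer characters
print(sorted(lines, key=monospace_width))  # but it is wider on screen
```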

Mahmoud Madkour

@Mark-Olson , your proposed solution seems so easy, but can you please elaborate a bit more:
1- how to open the file in tree view
2- how to go to REGEX mode to enter the query

many thanks

Mark Olson

@Mahmoud-Madkour
To open a tree view for a file in REGEX mode, just use the Regex search to JSON command from the JsonTools plugin menu.
Once the tree view is open, you can paste the query into the text box at the top right corner of the tree view, and click the Submit query button next to the text box.