File sorting
-
Yes, your new attempt, Alan, is the solution when working with UTF-8 encoded files, which may contain multi-byte encoded chars!
As for me, I was thinking about the opposite solution: converting the UTF-8 files to ANSI. However, with this solution, some characters may turn into question marks or be replaced by an approximate character, because they do not belong to the corresponding 256-character ANSI table!
For instance, in my previous list of rivers, the Turkish river Kızılırmak, which contains the Latin small letter dotless i ı (code point \x{0131}), is changed into the approximate name Kizilirmak after conversion to ANSI!
Anyway, we just did our best to solve the OP’s problem ;-))
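A small Python sketch can make the multi-byte point concrete. Python is only an illustration here, and cp1252 (Western European) is assumed as the "ANSI" code page, which in reality depends on the Windows locale:

```python
# Illustrative sketch: character count vs. UTF-8 byte count, and a lossy "ANSI" conversion.
# Assumption: cp1252 stands in for the system ANSI code page.

name = "Kızılırmak"

char_count = len(name)                  # counts code points (characters)
byte_count = len(name.encode("utf-8"))  # each ı (U+0131) takes 2 bytes in UTF-8

# ı has no slot in cp1252, so the conversion must substitute something.
ansi_bytes = name.encode("cp1252", errors="replace")  # unmappable chars become b"?"

print(f"U+{ord('ı'):04X}")
print(char_count, byte_count)
print(ansi_bytes)
```

Python's errors="replace" gives the question-mark outcome; the approximate Kizilirmak described above presumably comes from a best-fit mapping, which this sketch does not reproduce.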
BR
guy038
-
@Dave-Pruce said:
Is it possible to sort a file by line length??
Yes, though not natively: there is a Notepad++ plugin for it, Linesort v1.1 (only for 32-bit Notepad++):
https://webarchive.org/web/20200207125518/http://www.scout-soft.com/linesort/
-
@guy038 You essentially did “programming” with a human computer doing the evaluations and flow control. :-)
That reminds me of the stories about the first computers: “computer” was then a human job title, for those who computed but also had to do flow control! As ways were found to do parts of the job, first by mechanical and then by electronic means, the resulting machines came to be known as computers.
-
Hello, All,
Thanks to @mkupper, who recently posted a comment, I’m going, exactly three years later, to simplify the way to sort lines by length and, secondarily, by line contents too!
As in my previous post, I will use this list of rivers:
https://en.wikipedia.org/wiki/List_of_rivers_by_length
After removing some duplicates, we get an INPUT text of 238 river names:
Nile White Nile Kagera Nyabarongo Mwogo Rukarara Amazon Ucayali Tambo Ene Mantaro Yangtze Mississippi Missouri Jefferson Beaverhead Red Rock Hell Roaring Yenisei Angara Selenge Ider Yellow River Ob Irtysh Río de la Plata Paraná Congo Chambeshi Amur Argun Kherlen Lena Mekong Mackenzie Slave Peace Finlay Niger Brahmaputra Tsangpo Murray Darling Culgoa Balonne Condamine Tocantins Araguaia Volga Indus Sênggê Zangbo Shatt al-Arab Euphrates Murat Madeira Mamoré Caine Rocha Purús Yukon São Francisco Syr Darya Naryn Salween Saint Lawrence Niagara Detroit Saint Clair Saint Marys Saint Louis North Nizhnyaya Tunguska Danube Breg Zambezi Vilyuy Ganges Hooghly Padma Amu Darya Panj Japurá Nelson Saskatchewan Paraguay Kolyma Pilcomayo Biya Katun Ishim Juruá Ural Arkansas Colorado Olenyok Dnieper Aldan Ubangi Uele Negro Columbia Zhujiang Red Ayeyarwady Kasai Ohio Allegheny Orinoco Tarim Xingu Orange Salado Vitim Tigris Songhua Tapajós Don Podkamennaya Tunguska Pechora Kama Limpopo Chulym Guaporé Indigirka Snake Senegal Uruguay Blue Nile Churchill Khatanga Okavango Volta Beni Platte Tobol Alazeya Jubba Shebelle Içá Magdalena Han Kura Oka Guaviare Pecos Murrumbidgee Godavari Río Grande Belaya Cooper Barcoo Marañón Dniester Benue Ili Warburton Georgina Sutlej Yamuna Vyatka Fraser Brazos Liao Lachlan Yalong Iguaçu Olyokma Northern Dvina Sukhona Krishna Iriri Narmada Lomami Ottawa Lerma Grande de Santiago Elbe Vltava Zeya Juruena Rhine Athabasca Canadian North Saskatchewan Vistula Bug Vaal Shire Ogooué Nen Kızılırmak Markha Green Milk Chindwin Sankuru Wu James Kapuas Desna Helmand Madre de Dios Tietê Vychegda Sepik Cimarron Anadyr Paraíba do Sul Jialing Liard Cumberland White Huallaga Kwango Draa Gambia Tyung Chenab Yellowstone Ghaghara Huai Aras Chu Seversky Donets Bermejo Fly Kuskokwim Tennessee Oder Warta Aruwimi Daugava Gila Loire Essequibo Khoper Tagus Flinders
-
At the end of the first line, we add space characters up to column 100.
-
Then, with a zero-length column selection at column 100 spanning all lines, we insert an exclamation mark (!) at the end of every line of the list:
=> We get this temporary text (only the first and last lines are listed):
Nile ! White Nile ! Kagera ! Nyabarongo ! Mwogo ! Rukarara ! Amazon ! Ucayali ! Tambo ! Ene ! Mantaro ! Yangtze ! Mississippi ! Missouri ! ...... ! ...... ! ...... ! ...... ! Seversky Donets ! Bermejo ! Fly ! Kuskokwim ! Tennessee ! Oder ! Warta ! Aruwimi ! Daugava ! Gila ! Loire ! Essequibo ! Khoper ! Tagus ! Flinders !
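For readers who prefer to see these two padding steps as code, here is a minimal Python sketch. The helper name pad_and_mark is hypothetical, and it assumes 1-based columns as in the Notepad++ status bar, so the "!" becomes the 100th character of each line:

```python
# Sketch of the padding steps: pad every line with spaces, then put "!" in a fixed column.
# Assumption: columns are 1-based, so "!" at column 100 means 99 characters precede it.

def pad_and_mark(lines, column=100):
    """Return each line padded with spaces so that '!' sits at the given column."""
    # ljust never truncates: a line longer than column - 1 would push the '!' further right.
    return [line.ljust(column - 1) + "!" for line in lines]

marked = pad_and_mark(["Nile", "White Nile", "Kagera"])
for row in marked:
    print(repr(row))
```

Column 100 leaves plenty of room here, since the longest name, Podkamennaya Tunguska, is only 21 characters long.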
-
Now, we perform this regex S/R:
-
SEARCH
^([\w -]+?)(\x20+)(?=!)
-
REPLACE
\2\1
-
=> Again, we get this temporary text (only the first and last lines are listed):
Nile! White Nile! Kagera! Nyabarongo! Mwogo! Rukarara! Amazon! Ucayali! Tambo! Ene! Mantaro! Yangtze! Mississippi! Missouri! ......! ......! ......! ......! Seversky Donets! Bermejo! Fly! Kuskokwim! Tennessee! Oder! Warta! Aruwimi! Daugava! Gila! Loire! Essequibo! Khoper! Tagus! Flinders!
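The regex step can be replayed with Python's re module. This is a stand-in for Notepad++'s own regex engine; for this pattern, where \w must also match accented letters such as í or ı, the two should behave alike, but that is an assumption rather than something verified here:

```python
import re

# Same SEARCH pattern as above:
#   group 1 = the name (letters, spaces, hyphens), matched lazily
#   group 2 = the run of spaces that directly precedes the "!"
SEARCH = re.compile(r"^([\w -]+?)(\x20+)(?=!)")

def move_spaces_to_front(line):
    """Apply the REPLACE step: the padding moves in front of the name, right-aligning it."""
    return SEARCH.sub(r"\2\1", line)

print(repr(move_spaces_to_front("Shatt al-Arab     !")))
```

The lazy +? matters: with a greedy +, group 1 would also swallow the trailing spaces except the last one, and only a single space would be moved.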
-
Then, we run the Edit > Line Operations > Sort Lines Lexicographically Ascending option.
==> Here is our sorted text (only the first and last lines are listed):
Ob! Wu! Bug! Chu! Don! Ene! Fly! Han! Ili! Içá! Nen! Oka! Red! Amur! Aras! ......! ......! ......! ......! Saskatchewan! Yellow River! Madre de Dios! Shatt al-Arab! São Francisco! Sênggê Zangbo! Northern Dvina! Paraíba do Sul! Saint Lawrence! Río de la Plata! Seversky Donets! Grande de Santiago! Nizhnyaya Tunguska! North Saskatchewan! Podkamennaya Tunguska!
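Why does a plain lexicographic sort give a length order? Once every name is right-aligned in the same field width, shorter names carry more leading spaces, and a space (U+0020) sorts before any letter; names of equal length have identical padding, so they tie-break on their text. A quick Python check, assuming Python's code-point ordering stands in for Sort Lines Lexicographically Ascending:

```python
# Right-align each name in a common field, as the regex step does, then sort as plain strings.
FIELD = 20  # illustrative width; it only needs to exceed the longest name

names = ["Amazon", "Ob", "Nile", "Içá", "Wu", "Ili", "Danube", "Amur"]
aligned = [name.rjust(FIELD) + "!" for name in names]
ordered = sorted(aligned)

for line in ordered:
    print(repr(line))
```

Accented letters have higher code points than plain ASCII letters, which is consistent with Içá landing after Ili in the sorted list above.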
-
Finally, let’s run this last regex S/R:
-
SEARCH
^\x20+|!$
-
REPLACE
Leave EMPTY
-
=> What remains is our expected OUTPUT text, sorted by line length:
Ob Wu Bug Chu Don Ene Fly Han Ili Içá Nen Oka Red Amur Aras Beni Biya Breg Draa Elbe Gila Huai Ider Kama Kura Lena Liao Milk Nile Oder Ohio Panj Uele Ural Vaal Zeya Aldan Argun Benue Caine Congo Desna Green Indus Iriri Ishim James Jubba Juruá Kasai Katun Lerma Liard Loire Murat Mwogo Naryn Negro Niger North Padma Peace Pecos Purús Rhine Rocha Sepik Shire Slave Snake Tagus Tambo Tarim Tietê Tobol Tyung Vitim Volga Volta Warta White Xingu Yukon Amazon Anadyr Angara Barcoo Belaya Brazos Chenab Chulym Cooper Culgoa Danube Finlay Fraser Gambia Ganges Iguaçu Irtysh Japurá Kagera Kapuas Khoper Kolyma Kwango Lomami Mamoré Markha Mekong Murray Nelson Ogooué Orange Ottawa Paraná Platte Salado Sutlej Tigris Ubangi Vilyuy Vltava Vyatka Yalong Yamuna Alazeya Aruwimi Balonne Bermejo Darling Daugava Detroit Dnieper Guaporé Helmand Hooghly Jialing Juruena Kherlen Krishna Lachlan Limpopo Madeira Mantaro Marañón Narmada Niagara Olenyok Olyokma Orinoco Pechora Salween Sankuru Selenge Senegal Songhua Sukhona Tapajós Tsangpo Ucayali Uruguay Vistula Yangtze Yenisei Zambezi Araguaia Arkansas Canadian Chindwin Cimarron Colorado Columbia Dniester Flinders Georgina Ghaghara Godavari Guaviare Huallaga Khatanga Missouri Okavango Paraguay Red Rock Rukarara Shebelle Vychegda Zhujiang Allegheny Amu Darya Athabasca Blue Nile Chambeshi Churchill Condamine Essequibo Euphrates Indigirka Jefferson Kuskokwim Mackenzie Magdalena Pilcomayo Syr Darya Tennessee Tocantins Warburton Ayeyarwady Beaverhead Cumberland Kızılırmak Nyabarongo Río Grande White Nile Brahmaputra Mississippi Saint Clair Saint Louis Saint Marys Yellowstone Hell Roaring Murrumbidgee Saskatchewan Yellow River Madre de Dios Shatt al-Arab São Francisco Sênggê Zangbo Northern Dvina Paraíba do Sul Saint Lawrence Río de la Plata Seversky Donets Grande de Santiago Nizhnyaya Tunguska North Saskatchewan Podkamennaya Tunguska
That’s all ! Neat, isn’t it ?
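The last S/R simply removes the padding and the markers. Put together, the recipe amounts to sorting on a two-level key, length first and then content, as long as no name starts with a character that sorts before a space. The Python sketch below replays the recipe (both function names are hypothetical) and compares it with a direct key-based sort; note that Python's len counts characters (code points), so Kızılırmak counts as 10:

```python
import re

# Final S/R: strip leading spaces or a trailing "!" on every line.
CLEANUP = re.compile(r"^\x20+|!$", flags=re.MULTILINE)

def sort_via_padding(names, column=100):
    """Replay the recipe: right-align before a '!' at the given column, sort, clean up."""
    aligned = [name.rjust(column - 1) + "!" for name in names]  # state after the first regex S/R
    text = "\n".join(sorted(aligned))                           # Sort Lines Lexicographically Ascending
    return CLEANUP.sub("", text).split("\n")                    # SEARCH ^\x20+|!$, REPLACE empty

def sort_by_length_then_text(names):
    """The same order expressed directly: length first, then the text itself."""
    return sorted(names, key=lambda name: (len(name), name))

sample = ["Nile", "Río de la Plata", "Ob", "Kızılırmak", "Wu", "Amur", "Danube"]
print(sort_via_padding(sample))
print(sort_by_length_then_text(sample))
```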
Best Regards,
guy038
-
-
I received a feature request related to this post. It doesn’t quite feel like a good fit for Columns++ to me, but I think your MultiReplace plugin can assist in making this possible in a reasonable number of steps.
I believe MultiReplace can be set up to find
^.*$
and replace with
set(string.len(MATCH).." "..MATCH)
Then Edit | Line Operations | Sort Lines As Integers Ascending will sort the lines by length, and finally replacing
^\d+\x20
with nothing would remove the lengths.
-
With JsonTools v6.0 or higher, open the tree view for the document, go to REGEX mode, and enter the query
@ = s_join(`\r\n`, sort_by(s_split(@, `\r\n`), s_len(@)))
Hopefully the syntax is reasonably easy to understand: split the file on \r\n, sort the list of lines by string length, then set the document’s text (@) to the result of joining the list back together with \r\n.
This converts
abcdefg
ab
abcdefgh
a
abcdefghi
abcde
abcd
abc
into
a
ab
abc
abcd
abcde
abcdefg
abcdefgh
abcdefghi
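The same split / sort-by-length / join idea in Python, as a rough equivalent of the query rather than JsonTools' actual implementation. It assumes Windows \r\n line endings; Python's sort is stable, so lines of equal length keep their original order (whether sort_by in JsonTools is stable is not verified here):

```python
def sort_document_lines(text, eol="\r\n"):
    """Split the text on eol, sort the lines by length, and join them back together."""
    return eol.join(sorted(text.split(eol), key=len))

before = "\r\n".join(["abcdefg", "ab", "abcdefgh", "a", "abcdefghi", "abcde", "abcd", "abc"])
after = sort_document_lines(before)
print(after.split("\r\n"))
```

One thing to watch in this sketch: if the text ends with a line ending, split produces an empty last element, which then sorts to the top.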
-
In case anyone comes across this topic looking for a way to sort lines by length, Columns++ release 1.0.1 can do this.
Select Sort… from the Columns++ menu and let it enclose the entire document in a rectangular selection (or make your own selection first).
Use Whole lines, Ascending or Descending as desired, and Width. You can then sort on Entire column, unless you wish to use one of the other options.
The sort is based on the on-screen width of text in the current font. Columns++ is meant to deal with data in columns using tabs, including elastic tabstops and proportionally-spaced fonts; I found that using the width, rather than a count of characters, was the most consistent way to deal with all the variations in a way that makes intuitive sense for users. For files using monospaced fonts and no tabs, the results are the same as counting characters.
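To illustrate why on-screen width and character count can disagree once tabs are involved, here is a small Python sketch. It is not how Columns++ measures width (that is based on the rendered font); it simply expands tabs with str.expandtabs, assuming a monospaced font and a tab size of 4:

```python
TAB_SIZE = 4  # assumed tab size

def monospace_width(line, tab_size=TAB_SIZE):
    """Approximate on-screen width in a monospaced font: tabs expand to the next tab stop."""
    return len(line.expandtabs(tab_size))

lines = ["a\tb", "abcd"]
by_count = sorted(lines, key=len)              # "a\tb" has 3 characters
by_width = sorted(lines, key=monospace_width)  # but spans 5 columns
print(by_count)
print(by_width)
```

Without tabs, expandtabs leaves a line unchanged, so both keys agree, in line with the statement above.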
-
@Mark-Olson, your proposed solution seems very easy, but can you please elaborate on:
1- how to open the file in tree view
2- how to go to REGEX mode to enter the query
Many thanks
-
@Mahmoud-Madkour
To open a tree view for a file in REGEX mode, just use the Regex search to JSON command from the JsonTools plugin menu.
Once the tree view is open, you can paste the query into the text box at the top right corner of the tree view, and click the Submit query button next to the text box.