    File sorting

    • guy038G
      guy038
      last edited by guy038

      @dave-pruce, @alan-kilborn,

      Yes, Alan, your new attempt is the solution when working with UTF-8 encoded files, which may contain multi-byte encoded characters!

      As for me, I was thinking about the opposite solution: converting the UTF-8 files to ANSI. However, with this solution, some characters may turn into question marks or be replaced by an approximate character, because they do not belong to the corresponding ANSI table of 256 characters!

      For instance, in my previous list of rivers, the Turkish Kızılırmak river, which contains the Latin small letter dotless i, ı (code point \x{0131}), is changed into the approximate name Kizilirmak after conversion to ANSI!
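The loss described above can be reproduced outside Notepad++. Here is a minimal Python sketch, assuming that "ANSI" means the Windows-1252 code page (an assumption: the actual ANSI code page depends on the system locale). Python's codec substitutes a question mark and never picks an approximate letter, so it shows only the first of the two outcomes mentioned.

```python
name = "Kızılırmak"  # contains U+0131 (LATIN SMALL LETTER DOTLESS I) three times

# In UTF-8 each dotless i takes two bytes, so the byte count exceeds the character count.
utf8_bytes = name.encode("utf-8")

# Windows-1252 (assumed ANSI code page) has no slot for U+0131;
# errors="replace" substitutes "?" for every character that cannot be encoded.
ansi_bytes = name.encode("cp1252", errors="replace")
ansi_text = ansi_bytes.decode("cp1252")

print(len(name), len(utf8_bytes))  # 10 13
print(ansi_text)                   # K?z?l?rmak
```

The difference between 10 characters and 13 bytes is also why any length-based sort has to be clear about whether it counts characters or bytes.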

      Anyway, we just did our best to solve the OP’s problem ;-))

      BR

      guy038

      • F
        freezer2022 @Dave Pruce
        last edited by freezer2022

        @Dave-Pruce said:

        Is it possible to sort a file by line length??

        Yes. Not natively, but there is a Notepad++ plugin for it, Linesort v1.1 (only for 32-bit Notepad++):

        https://webarchive.org/web/20200207125518/http://www.scout-soft.com/linesort/
        


          • mkupperM
            mkupper @guy038
            last edited by

            @guy038 You essentially did “programming” with a human computer doing the evaluations and flow control. :-)

            That reminds me of the stories about the first computers: "computer" was originally a human job title, for people who did the computing and also had to handle the flow control themselves. As ways were found to do parts of the job, first by mechanical and then by electronic means, the resulting machines came to be known as computers.

            • guy038G
              guy038
              last edited by guy038

              Hello, All,

              Thanks to @mkupper, who recently posted a comment, and exactly three years later, I am going to simplify the way to sort lines by length and, secondly, by line contents too!

              Like in my previous post, I will use this list of rivers, below :

              https://en.wikipedia.org/wiki/List_of_rivers_by_length

              After removing some duplicates, we get an INPUT text of 238 river names:

              Nile
              White Nile
              Kagera
              Nyabarongo
              Mwogo
              Rukarara
              Amazon
              Ucayali
              Tambo
              Ene
              Mantaro
              Yangtze
              Mississippi
              Missouri
              Jefferson
              Beaverhead
              Red Rock
              Hell Roaring
              Yenisei
              Angara
              Selenge
              Ider
              Yellow River
              Ob
              Irtysh
              Río de la Plata
              Paraná
              Congo
              Chambeshi
              Amur
              Argun
              Kherlen
              Lena
              Mekong
              Mackenzie
              Slave
              Peace
              Finlay
              Niger
              Brahmaputra
              Tsangpo
              Murray
              Darling
              Culgoa
              Balonne
              Condamine
              Tocantins
              Araguaia
              Volga
              Indus
              Sênggê Zangbo
              Shatt al-Arab
              Euphrates
              Murat
              Madeira
              Mamoré
              Caine
              Rocha
              Purús
              Yukon
              São Francisco
              Syr Darya
              Naryn
              Salween
              Saint Lawrence
              Niagara
              Detroit
              Saint Clair
              Saint Marys
              Saint Louis
              North
              Nizhnyaya Tunguska
              Danube
              Breg
              Zambezi
              Vilyuy
              Ganges
              Hooghly
              Padma
              Amu Darya
              Panj
              Japurá
              Nelson
              Saskatchewan
              Paraguay
              Kolyma
              Pilcomayo
              Biya
              Katun
              Ishim
              Juruá
              Ural
              Arkansas
              Colorado
              Olenyok
              Dnieper
              Aldan
              Ubangi
              Uele
              Negro
              Columbia
              Zhujiang
              Red
              Ayeyarwady
              Kasai
              Ohio
              Allegheny
              Orinoco
              Tarim
              Xingu
              Orange
              Salado
              Vitim
              Tigris
              Songhua
              Tapajós
              Don
              Podkamennaya Tunguska
              Pechora
              Kama
              Limpopo
              Chulym
              Guaporé
              Indigirka
              Snake
              Senegal
              Uruguay
              Blue Nile
              Churchill
              Khatanga
              Okavango
              Volta
              Beni
              Platte
              Tobol
              Alazeya
              Jubba
              Shebelle
              Içá
              Magdalena
              Han
              Kura
              Oka
              Guaviare
              Pecos
              Murrumbidgee
              Godavari
              Río Grande
              Belaya
              Cooper
              Barcoo
              Marañón
              Dniester
              Benue
              Ili
              Warburton
              Georgina
              Sutlej
              Yamuna
              Vyatka
              Fraser
              Brazos
              Liao
              Lachlan
              Yalong
              Iguaçu
              Olyokma
              Northern Dvina
              Sukhona
              Krishna
              Iriri
              Narmada
              Lomami
              Ottawa
              Lerma
              Grande de Santiago
              Elbe
              Vltava
              Zeya
              Juruena
              Rhine
              Athabasca
              Canadian
              North Saskatchewan
              Vistula
              Bug
              Vaal
              Shire
              Ogooué
              Nen
              Kızılırmak
              Markha
              Green
              Milk
              Chindwin
              Sankuru
              Wu
              James
              Kapuas
              Desna
              Helmand
              Madre de Dios
              Tietê
              Vychegda
              Sepik
              Cimarron
              Anadyr
              Paraíba do Sul
              Jialing
              Liard
              Cumberland
              White
              Huallaga
              Kwango
              Draa
              Gambia
              Tyung
              Chenab
              Yellowstone
              Ghaghara
              Huai
              Aras
              Chu
              Seversky Donets
              Bermejo
              Fly
              Kuskokwim
              Tennessee
              Oder
              Warta
              Aruwimi
              Daugava
              Gila
              Loire
              Essequibo
              Khoper
              Tagus
              Flinders
              
              • At the end of the first line, we add space characters up to column 100

              • Then, with a zero-width column (rectangular) selection at column 100, we insert an exclamation mark ( ! ) at the end of every line of the list :

              => We get this temporary text ( I just listed the first lines and the last lines ) :

              Nile                                                                                               !
              White Nile                                                                                         !
              Kagera                                                                                             !
              Nyabarongo                                                                                         !
              Mwogo                                                                                              !
              Rukarara                                                                                           !
              Amazon                                                                                             !
              Ucayali                                                                                            !
              Tambo                                                                                              !
              Ene                                                                                                !
              Mantaro                                                                                            !
              Yangtze                                                                                            !
              Mississippi                                                                                        !
              Missouri                                                                                           !
              ......                                                                                             !
              ......                                                                                             !
              ......                                                                                             !
              ......                                                                                             !
              Seversky Donets                                                                                    !
              Bermejo                                                                                            !
              Fly                                                                                                !
              Kuskokwim                                                                                          !
              Tennessee                                                                                          !
              Oder                                                                                               !
              Warta                                                                                              !
              Aruwimi                                                                                            !
              Daugava                                                                                            !
              Gila                                                                                               !
              Loire                                                                                              !
              Essequibo                                                                                          !
              Khoper                                                                                             !
              Tagus                                                                                              !
              Flinders                                                                                           !
              
              
              • Now, we perform this regex S/R :

                • SEARCH ^([\w -]+?)(\x20+)(?=!)

                • REPLACE \2\1

              => Again, we get this temporary text ( I just listed the first lines and the last lines ) :

                                                                                                             Nile!
                                                                                                       White Nile!
                                                                                                           Kagera!
                                                                                                       Nyabarongo!
                                                                                                            Mwogo!
                                                                                                         Rukarara!
                                                                                                           Amazon!
                                                                                                          Ucayali!
                                                                                                            Tambo!
                                                                                                              Ene!
                                                                                                          Mantaro!
                                                                                                          Yangtze!
                                                                                                      Mississippi!
                                                                                                         Missouri!
                                                                                                           ......!
                                                                                                           ......!
                                                                                                           ......!
                                                                                                           ......!
                                                                                                  Seversky Donets!
                                                                                                          Bermejo!
                                                                                                              Fly!
                                                                                                        Kuskokwim!
                                                                                                        Tennessee!
                                                                                                             Oder!
                                                                                                            Warta!
                                                                                                          Aruwimi!
                                                                                                          Daugava!
                                                                                                             Gila!
                                                                                                            Loire!
                                                                                                        Essequibo!
                                                                                                           Khoper!
                                                                                                            Tagus!
                                                                                                         Flinders!
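To see what the swap regex does to each line, here is a small Python sketch. It is an illustration only: Notepad++ uses the Boost regex engine, not Python's `re`, but this particular pattern and its `\2\1` replacement behave the same way in both as far as I can tell. The helper name `right_align` and the width of 20 are made up for the example.

```python
import re

# Group 1: the name (letters, digits, spaces, hyphens), taken lazily.
# Group 2: the run of padding spaces that ends right before the "!".
SWAP = re.compile(r"^([\w -]+?)(\x20+)(?=!)", re.MULTILINE)

def right_align(text: str) -> str:
    """Move the padding in front of the name, which right-aligns every line."""
    return SWAP.sub(r"\2\1", text)

names = ["Nile", "White Nile", "Shatt al-Arab", "Paraná"]
padded = "\n".join(name.ljust(20) + "!" for name in names)

for line in right_align(padded).splitlines():
    print(line)
```

The lazy `+?` matters for names that contain spaces: for "White Nile" the engine first tries to stop after "White", fails the `(?=!)` look-ahead, and keeps extending group 1 until the only spaces left are the padding.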
              
              • Then, we run the Edit > Line Operations > Sort Lines Lexicographically Ascending option

              ==> Here is our sorted text ( I just listed the first lines and the last lines ) :

                                                                                                               Ob!
                                                                                                               Wu!
                                                                                                              Bug!
                                                                                                              Chu!
                                                                                                              Don!
                                                                                                              Ene!
                                                                                                              Fly!
                                                                                                              Han!
                                                                                                              Ili!
                                                                                                              Içá!
                                                                                                              Nen!
                                                                                                              Oka!
                                                                                                              Red!
                                                                                                             Amur!
                                                                                                             Aras!
                                                                                                           ......!
                                                                                                           ......!
                                                                                                           ......!
                                                                                                           ......!
                                                                                                     Saskatchewan!
                                                                                                     Yellow River!
                                                                                                    Madre de Dios!
                                                                                                    Shatt al-Arab!
                                                                                                    São Francisco!
                                                                                                    Sênggê Zangbo!
                                                                                                   Northern Dvina!
                                                                                                   Paraíba do Sul!
                                                                                                   Saint Lawrence!
                                                                                                  Río de la Plata!
                                                                                                  Seversky Donets!
                                                                                               Grande de Santiago!
                                                                                               Nizhnyaya Tunguska!
                                                                                               North Saskatchewan!
                                                                                            Podkamennaya Tunguska!
              
              • Finally, let’s run this last regex S/R

                • SEARCH ^\x20+|!$

                • REPLACE Leave EMPTY

              => What remains is our expected OUTPUT text, sorted by line length and then by line content :

              Ob
              Wu
              Bug
              Chu
              Don
              Ene
              Fly
              Han
              Ili
              Içá
              Nen
              Oka
              Red
              Amur
              Aras
              Beni
              Biya
              Breg
              Draa
              Elbe
              Gila
              Huai
              Ider
              Kama
              Kura
              Lena
              Liao
              Milk
              Nile
              Oder
              Ohio
              Panj
              Uele
              Ural
              Vaal
              Zeya
              Aldan
              Argun
              Benue
              Caine
              Congo
              Desna
              Green
              Indus
              Iriri
              Ishim
              James
              Jubba
              Juruá
              Kasai
              Katun
              Lerma
              Liard
              Loire
              Murat
              Mwogo
              Naryn
              Negro
              Niger
              North
              Padma
              Peace
              Pecos
              Purús
              Rhine
              Rocha
              Sepik
              Shire
              Slave
              Snake
              Tagus
              Tambo
              Tarim
              Tietê
              Tobol
              Tyung
              Vitim
              Volga
              Volta
              Warta
              White
              Xingu
              Yukon
              Amazon
              Anadyr
              Angara
              Barcoo
              Belaya
              Brazos
              Chenab
              Chulym
              Cooper
              Culgoa
              Danube
              Finlay
              Fraser
              Gambia
              Ganges
              Iguaçu
              Irtysh
              Japurá
              Kagera
              Kapuas
              Khoper
              Kolyma
              Kwango
              Lomami
              Mamoré
              Markha
              Mekong
              Murray
              Nelson
              Ogooué
              Orange
              Ottawa
              Paraná
              Platte
              Salado
              Sutlej
              Tigris
              Ubangi
              Vilyuy
              Vltava
              Vyatka
              Yalong
              Yamuna
              Alazeya
              Aruwimi
              Balonne
              Bermejo
              Darling
              Daugava
              Detroit
              Dnieper
              Guaporé
              Helmand
              Hooghly
              Jialing
              Juruena
              Kherlen
              Krishna
              Lachlan
              Limpopo
              Madeira
              Mantaro
              Marañón
              Narmada
              Niagara
              Olenyok
              Olyokma
              Orinoco
              Pechora
              Salween
              Sankuru
              Selenge
              Senegal
              Songhua
              Sukhona
              Tapajós
              Tsangpo
              Ucayali
              Uruguay
              Vistula
              Yangtze
              Yenisei
              Zambezi
              Araguaia
              Arkansas
              Canadian
              Chindwin
              Cimarron
              Colorado
              Columbia
              Dniester
              Flinders
              Georgina
              Ghaghara
              Godavari
              Guaviare
              Huallaga
              Khatanga
              Missouri
              Okavango
              Paraguay
              Red Rock
              Rukarara
              Shebelle
              Vychegda
              Zhujiang
              Allegheny
              Amu Darya
              Athabasca
              Blue Nile
              Chambeshi
              Churchill
              Condamine
              Essequibo
              Euphrates
              Indigirka
              Jefferson
              Kuskokwim
              Mackenzie
              Magdalena
              Pilcomayo
              Syr Darya
              Tennessee
              Tocantins
              Warburton
              Ayeyarwady
              Beaverhead
              Cumberland
              Kızılırmak
              Nyabarongo
              Río Grande
              White Nile
              Brahmaputra
              Mississippi
              Saint Clair
              Saint Louis
              Saint Marys
              Yellowstone
              Hell Roaring
              Murrumbidgee
              Saskatchewan
              Yellow River
              Madre de Dios
              Shatt al-Arab
              São Francisco
              Sênggê Zangbo
              Northern Dvina
              Paraíba do Sul
              Saint Lawrence
              Río de la Plata
              Seversky Donets
              Grande de Santiago
              Nizhnyaya Tunguska
              North Saskatchewan
              Podkamennaya Tunguska
              

              That’s all ! Neat, isn’t it ?
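For readers who want to check the logic of the whole procedure, it can be emulated in a few lines of Python. This is a sketch under assumptions, not a description of Notepad++ internals: it assumes every line is shorter than the chosen width, that lines contain only the characters allowed by the search regex, and that the lexicographic sort compares code points (which is consistent with the order shown above, for instance Ili before Içá). The function name is invented for the example.

```python
import re

def sort_by_length_then_content(lines, width=100):
    # Steps 1 and 2: pad every line with spaces up to `width`, then append "!".
    padded = [line.ljust(width) + "!" for line in lines]
    # Step 3: move the padding in front of the text (same regex as in the post).
    swapped = [re.sub(r"^([\w -]+?)(\x20+)(?=!)", r"\2\1", p) for p in padded]
    # Step 4: plain lexicographic sort; more leading spaces means a shorter line.
    swapped.sort()
    # Step 5: strip the leading spaces and the trailing "!".
    return [re.sub(r"^\x20+|!$", "", s) for s in swapped]

rivers = ["Nile", "White Nile", "Ob", "Içá", "Ili", "Wu", "Amur"]
result = sort_by_length_then_content(rivers)
print(result)

# The trick is equivalent to sorting on the pair (length, content).
print(result == sorted(rivers, key=lambda s: (len(s), s)))
```

Because all right-aligned lines have the same total length, the space character (which sorts before any letter) decides the order first, and the text itself only breaks ties between lines of equal length.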

              Best Regards,

              guy038

              • CoisesC
                Coises
                last edited by Coises

                @Thomas-Knoefel

                I received a feature request related to this post. It doesn’t quite feel like a good fit for Columns++ to me, but I think your MultiReplace plugin can assist in making this possible in a reasonable number of steps.

                I believe multi-replace can be set up to find ^.*$ and replace with set(string.len(MATCH).." "..MATCH).

                Then Edit | Line operations | Sort Lines As Integers Ascending will sort the lines in order by length, and then ^\d+\x20 replaced with nothing would remove the lengths.
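The three steps above (prefix each line with its length, sort on that number, remove the prefix) can be sketched in Python as follows. This only mirrors the idea; it does not call MultiReplace or Notepad++, and the function name is hypothetical. One caveat worth checking in the real setup: Lua's `string.len` counts bytes, so depending on how the plugin passes MATCH, lines with non-ASCII characters might get a number larger than their character count, whereas this sketch counts characters.

```python
import re

def sort_by_length_prefix(lines):
    # Step 1: turn each line into "<length> <line>".
    tagged = [f"{len(line)} {line}" for line in lines]
    # Step 2: sort on the leading integer, as an integer sort of the lines would.
    tagged.sort(key=lambda t: int(t.split(" ", 1)[0]))
    # Step 3: remove the "<length> " prefix again.
    return [re.sub(r"^\d+\x20", "", t) for t in tagged]

print(sort_by_length_prefix(["abc", "a", "abcd", "ab"]))  # ['a', 'ab', 'abc', 'abcd']
```

Python's sort is stable, so lines of equal length keep their original relative order here; whether the editor's integer sort does the same is not something this sketch can confirm.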

                • Mark OlsonM
                  Mark Olson
                  last edited by Mark Olson

                  JsonTools v6.0 or higher, open treeview for document, go to REGEX mode, enter query @ = s_join(`\r\n`, sort_by(s_split(@, `\r\n`), s_len(@)))
                  Hopefully the syntax is reasonably easy to understand: split the file by \r\n, sort the list of lines by string length, then set the document’s text (@) to the result of string-joining the list back together with \r\n.

                  This converts

                  abcdefg
                  ab
                  abcdefgh
                  a
                  abcdefghi
                  abcde
                  abcd
                  abc
                  

                  into

                  a
                  ab
                  abc
                  abcd
                  abcde
                  abcdefg
                  abcdefgh
                  abcdefghi
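For comparison, the same split / sort-by-length / join logic looks like this in Python. It is a rough analogue of the query's structure, not JsonTools code, and the function name is made up for the example.

```python
def sort_lines_by_length(text: str, eol: str = "\r\n") -> str:
    # Split on the line ending, sort the pieces by length, and join them back.
    return eol.join(sorted(text.split(eol), key=len))

before = "\r\n".join(
    ["abcdefg", "ab", "abcdefgh", "a", "abcdefghi", "abcde", "abcd", "abc"]
)
print(sort_lines_by_length(before).split("\r\n"))
```

Note that if the text ends with a line ending, the split produces a final empty string, which has length 0 and therefore moves to the top; the same consideration probably applies to any split-based approach.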
                  
                  • CoisesC
                    Coises
                    last edited by

                    In case anyone comes across this topic looking for a way to sort lines by length, Columns++ release 1.0.1 can do this.

                    Select Sort… from the Columns++ menu and let it enclose the entire document in a rectangular selection (or make your own selection first).

                    Use Whole lines, Ascending or Descending as desired, and Width. You can then sort on Entire column, unless you wish to use one of the other options.

                    The sort is based on the on-screen width of text in the current font. Columns++ is meant to deal with data in columns using tabs, including elastic tabstops and proportionally-spaced fonts; I found that using the width, rather than a count of characters, was the most consistent way to deal with all the variations in a way that makes intuitive sense for users. For files using monospaced fonts and no tabs, the results are the same as counting characters.
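To illustrate why width and character count can disagree once tabs are involved, here is a rough Python sketch. It is only a monospaced approximation: Columns++ measures the rendered width in the current font, which this code does not attempt, and the tab size of 4 and the helper name `display_width` are assumptions made for the example.

```python
def display_width(line: str, tab_size: int = 4) -> int:
    # Approximate on-screen width in a monospaced font: expand tabs to tab stops.
    return len(line.expandtabs(tab_size))

lines = ["a\tb", "abcd", "abcdefg"]

by_count = sorted(lines, key=len)            # "a\tb" counts as 3 characters
by_width = sorted(lines, key=display_width)  # "a\tb" is 5 columns wide

print(by_count)  # ['a\tb', 'abcd', 'abcdefg']
print(by_width)  # ['abcd', 'a\tb', 'abcdefg']
```

Without tabs, `display_width` and `len` return the same value in this model, which matches the remark that the results are the same for monospaced fonts and no tabs.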

                    • Mahmoud MadkourM
                      Mahmoud Madkour @Mark Olson
                      last edited by

                      @Mark-Olson, your proposed solution seems very easy, but can you please elaborate a bit more:
                      1- how to open the file in tree view
                      2- how to go to REGEX mode to enter the query

                      many thanks

                      • Mark OlsonM
                        Mark Olson @Mahmoud Madkour
                        last edited by

                        @Mahmoud-Madkour
                        To open a tree view for a file in REGEX mode, just use the Regex search to JSON command from the JsonTools plugin menu.
                        Once the tree view is open, you can paste the query into the text box at the top right corner of the tree view, and click the Submit query button next to the text box.
