    File sorting

    • Dave Pruce
      Dave Pruce last edited by

      I’m new here so please be gentle!!!
      Is it possible to sort a file by line length??
      That’s all!
      Thanks
      Dave

      • Alan Kilborn
        Alan Kilborn @Dave Pruce last edited by

        @Dave-Pruce

        As gently as possible…quiet now…here it comes: no (sorry), not with Notepad++ itself.

        A lot of other ways, though: think “programming”.

        • guy038
          guy038 last edited by guy038

          Hello, @dave-pruce, @alan-kilborn and All,

          @dave-pruce :

          Still, as gently as possible, I can whisper to you : There is a possible work-around, which only uses native N++ features ;-))

          As it’s about 1:40 a.m. in France right now, I hope to be able to post my solution tomorrow. So please be patient for a while !

          Best Regards,

          guy038

          • Alan Kilborn
            Alan Kilborn @guy038 last edited by

            @guy038

            I’ll have to see the number of steps involved to see if it invalidates my original “no”. :)

            • Alan Kilborn
              Alan Kilborn last edited by

              While we’re waiting for @guy038: although it’s not native to Notepad++, a Pythonscript one-liner can do the job:

              editor.setText(['\r\n','\r','\n'][editor.getEOLMode()].join(sorted(editor.getText().splitlines(),key=len)))
              

              And since it is a one-liner, one doesn’t even have to create a file for it. Just open a Pythonscript Console window (Plugins > Pythonscript > Show Console) and then find the little box that has >>> to its left in the console window and paste the above there. Press Enter to execute it for the active Notepad++ file.

              Sadly, the bigger hurdle would be getting Pythonscript installed. :-(

              • guy038
                guy038 last edited by guy038

                Hi, @dave-pruce, @alan-kilborn and All,

                The work-around comes from a simple idea. Imagine these 5 lines, below !

                wxyz
                defghijklm
                no
                abcd
                pqrstuv
                

                To begin with, right justify these 5 lines. So you get :

                      wxyz
                defghijklm
                        no
                      abcd
                   pqrstuv
                

                Now, run a simple ascending alphabetic sort :

                        no
                      abcd
                      wxyz
                   pqrstuv
                defghijklm
                

                Nice ! We have, automatically, all the lines sorted by line length.

                To end, you just have to get rid of the leading spaces, giving the expected text :

                no
                abcd
                wxyz
                pqrstuv
                defghijklm
                

                In addition, notice that lines of the same length are also sorted alphabetically ;-))
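The right-justify idea can also be sketched in a few lines of plain Python 3, run outside Notepad++. This is only an illustration: the function name `sort_by_length_via_padding` is hypothetical. It assumes the lines do not start with spaces and contain no characters that sort before a space (tabs, for instance).

```python
def sort_by_length_via_padding(lines):
    """Sort lines by length using the right-justify trick."""
    width = max((len(line) for line in lines), default=0)
    # Right-justify: pad every line on the left with spaces up to a common width.
    padded = [line.rjust(width) for line in lines]
    # A plain ascending sort now puts shorter lines first, because a space
    # sorts before any letter; ties are broken alphabetically.
    padded.sort()
    # Remove the padding again (assumes no line started with a space).
    return [line.lstrip(" ") for line in padded]


sample = ["wxyz", "defghijklm", "no", "abcd", "pqrstuv"]
result = sort_by_length_via_padding(sample)
print(result)  # ['no', 'abcd', 'wxyz', 'pqrstuv', 'defghijklm']
```

The ordering comes entirely from the padding, which is why the same trick works with an ordinary lexicographic sort in the editor.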


                OK ! Let’s use a real list. From the link below :

                https://en.wikipedia.org/wiki/List_of_rivers_by_length

                After some reformatting, I got the following list of 243 English river names, which I pasted into a new N++ tab :

                Nile
                White Nile
                Kagera
                Nyabarongo
                Mwogo
                Rukarara
                Amazon
                Ucayali
                Tambo
                Ene
                Mantaro
                Yangtze
                Mississippi
                Missouri
                Jefferson
                Beaverhead
                Red Rock
                Hell Roaring
                Yenisei
                Angara
                Selenge
                Ider
                Yellow River
                Ob
                Irtysh
                Río de la Plata
                Paraná
                Congo
                Chambeshi
                Amur
                Argun
                Kherlen
                Lena
                Mekong
                Mackenzie
                Slave
                Peace
                Finlay
                Niger
                Brahmaputra
                Tsangpo
                Murray
                Darling
                Culgoa
                Balonne
                Condamine
                Tocantins
                Araguaia
                Volga
                Indus
                Sênggê Zangbo
                Shatt al-Arab
                Euphrates
                Murat
                Madeira
                Mamoré
                Caine
                Rocha
                Purús
                Yukon
                São Francisco
                Syr Darya
                Naryn
                Salween
                Saint Lawrence
                Niagara
                Detroit
                Saint Clair
                Saint Marys
                Saint Louis
                North
                Nizhnyaya Tunguska
                Danube
                Breg
                Zambezi
                Vilyuy
                Araguaia
                Ganges
                Hooghly
                Padma
                Amu Darya
                Panj
                Japurá
                Nelson
                Saskatchewan
                Paraguay
                Kolyma
                Pilcomayo
                Biya
                Katun
                Ishim
                Juruá
                Ural
                Arkansas
                Colorado
                Olenyok
                Dnieper
                Aldan
                Ubangi
                Uele
                Negro
                Columbia
                Zhujiang
                Red
                Ayeyarwady
                Kasai
                Ohio
                Allegheny
                Orinoco
                Tarim
                Xingu
                Orange
                Salado
                Vitim
                Tigris
                Songhua
                Tapajós
                Don
                Podkamennaya Tunguska
                Pechora
                Kama
                Limpopo
                Chulym
                Guaporé
                Indigirka
                Snake
                Senegal
                Uruguay
                Blue Nile
                Churchill
                Khatanga
                Okavango
                Volta
                Beni
                Platte
                Tobol
                Alazeya
                Jubba
                Shebelle
                Içá
                Magdalena
                Han
                Kura
                Oka
                Murray
                Guaviare
                Pecos
                Murrumbidgee
                Yenisei
                Godavari
                Colorado
                Río Grande
                Belaya
                Cooper
                Barcoo
                Marañón
                Dniester
                Benue
                Ili
                Warburton
                Georgina
                Sutlej
                Yamuna
                Vyatka
                Fraser
                Brazos
                Liao
                Lachlan
                Yalong
                Iguaçu
                Olyokma
                Northern Dvina
                Sukhona
                Krishna
                Iriri
                Narmada
                Lomami
                Ottawa
                Lerma
                Grande de Santiago
                Elbe
                Vltava
                Zeya
                Juruena
                Rhine
                Athabasca
                Canadian
                North Saskatchewan
                Vistula
                Bug
                Vaal
                Shire
                Ogooué
                Nen
                Kızılırmak
                Markha
                Green
                Milk
                Chindwin
                Sankuru
                Wu
                Red
                James
                Kapuas
                Desna
                Helmand
                Madre de Dios
                Tietê
                Vychegda
                Sepik
                Cimarron
                Anadyr
                Paraíba do Sul
                Jialing
                Liard
                Cumberland
                White
                Huallaga
                Kwango
                Draa
                Gambia
                Tyung
                Chenab
                Yellowstone
                Ghaghara
                Huai
                Aras
                Chu
                Seversky Donets
                Bermejo
                Fly
                Kuskokwim
                Tennessee
                Oder
                Warta
                Aruwimi
                Daugava
                Gila
                Loire
                Essequibo
                Khoper
                Tagus
                Flinders
                

                Ironically, we’re going to sort them by the length of their names, not by the length of the rivers ;-))


                First, we’ll roughly estimate the maximum length of the listed names, with the generic regex (?-s)^.{N,}

                • Open the Replace window ( Ctrl + H )

                • Select the Regular expression search mode

                  • (?-s)^.{30,} and a click on the Count button => 0 matches

                  • (?-s)^.{25,} and a click on the Count button => 0 matches

                  • (?-s)^.{20,} and a click on the Count button => 1 match

                => The longest name has at least 20 but fewer than 25 characters. So, we’ll rely on the upper bound 25 in the subsequent regexes.
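For comparison, the same estimate can be made in plain Python 3, as a rough sketch. The helper name `count_lines_at_least` and the short `names` sample are made up for the illustration, and Unix line endings are assumed. Python's `.` already stops at a newline, so `(?-s)` is not needed.

```python
import re


def count_lines_at_least(text, n):
    """Count the lines that contain at least n characters."""
    # Equivalent in spirit to counting matches of (?-s)^.{n,} in Notepad++.
    pattern = re.compile(r"^.{%d,}" % n, re.MULTILINE)
    return len(pattern.findall(text))


# A small excerpt of the river list, for illustration only.
names = "Nile\nWhite Nile\nPodkamennaya Tunguska\nOb"

for n in (30, 25, 20):
    print(n, count_lines_at_least(names, n))

# In a script, the exact maximum is of course available directly.
longest = max(len(line) for line in names.splitlines())
print(longest)  # 21
```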


                For all the subsequent regex S/R :

                • Tick the Wrap around option

                • Click on the Replace All button, exclusively, to process each S/R

                We’ll begin by adding 25 space characters at the end of each line of the list :

                SEARCH (?-s)^.+

                REPLACE $0 followed by 25 space characters ( type them right after $0 in the Replace field )

                Note : If another list needs more space characters at the end of its lines, just re-run this S/R to get 50, 75, 100 spaces, and so on !


                Then, use the following regex S/R to delete everything located after column 25 :

                SEARCH (?-s)^.{25}\K.+

                REPLACE Leave EMPTY


                Now, we’re going to right justify all these names, with the regex S/R :

                SEARCH (?-s)^(.+?)(\x20{2,})$

                REPLACE \2\1

                You should get the following text ( I only show the beginning and end of the list, to limit the length of my post ! ) :

                                     Nile
                               White Nile
                                   Kagera
                               Nyabarongo
                                    Mwogo
                                 Rukarara
                                   Amazon
                                  Ucayali
                                    Tambo
                                      Ene
                .........................
                .........................
                .........................
                                     Oder
                                    Warta
                                  Aruwimi
                                  Daugava
                                     Gila
                                    Loire
                                Essequibo
                                   Khoper
                                    Tagus
                                 Flinders
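The three S/R steps (pad, truncate, right-justify) can be imitated with Python 3's `re` module, as a hedged sketch rather than an exact port. Python's `re` has no `\K`, so a capture group stands in for it in step 2. The function name `right_justify` and the constant `WIDTH` are hypothetical, and the text is assumed to use `\n` line endings.

```python
import re

WIDTH = 25  # upper bound on the line length, as estimated in the post


def right_justify(text, width=WIDTH):
    """Imitate the three regex S/R steps on a multi-line string."""
    # Step 1: append `width` spaces to every non-empty line.
    text = re.sub(r"(?m)^.+", lambda m: m.group(0) + " " * width, text)
    # Step 2: keep only the first `width` characters of each line.
    # A capture group replaces \K, which Python's `re` does not support.
    text = re.sub(r"(?m)^(.{%d}).+" % width, r"\1", text)
    # Step 3: move the trailing run of spaces in front of the name.
    text = re.sub(r"(?m)^(.+?)( {2,})$", r"\2\1", text)
    return text


justified = right_justify("Nile\nWhite Nile\nOb")
print(justified)
```

As in the original regexes, step 3 needs at least two trailing spaces, so a line of 24 or 25 characters would be left unjustified.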
                

                Now, we perform the usual alphabetic sort ( Edit > Line Operations > Sort Lines Lexicographically Ascending ) and we get :

                                       Ob
                                       Wu
                                      Bug
                                      Chu
                                      Don
                                      Ene
                                      Fly
                                      Han
                                      Ili
                                      Içá
                                      Nen
                                      Oka
                                      Red
                                      Red
                                     Amur
                                     Aras
                                     Beni
                .........................
                .........................
                .........................
                             Hell Roaring
                             Murrumbidgee
                             Saskatchewan
                             Yellow River
                            Madre de Dios
                            Shatt al-Arab
                            São Francisco
                            Sênggê Zangbo
                           Northern Dvina
                           Paraíba do Sul
                           Saint Lawrence
                          Río de la Plata
                          Seversky Donets
                       Grande de Santiago
                       Nizhnyaya Tunguska
                       North Saskatchewan
                    Podkamennaya Tunguska
                

                To end, we get rid of all the leading spaces, with :

                SEARCH ^\x20+

                REPLACE Leave EMPTY

                and we get our expected list :

                Ob
                Wu
                Bug
                Chu
                Don
                Ene
                Fly
                Han
                Ili
                Içá
                Nen
                Oka
                Red
                Red
                Amur
                Aras
                Beni
                Biya
                Breg
                Draa
                ..............
                ..............
                ..............
                Saint Louis
                Saint Marys
                Yellowstone
                Hell Roaring
                Murrumbidgee
                Saskatchewan
                Yellow River
                Madre de Dios
                Shatt al-Arab
                São Francisco
                Sênggê Zangbo
                Northern Dvina
                Paraíba do Sul
                Saint Lawrence
                Río de la Plata
                Seversky Donets
                Grande de Santiago
                Nizhnyaya Tunguska
                North Saskatchewan
                Podkamennaya Tunguska
                

                Note that this kind of text manipulation could certainly be programmed in a more elegant way, with a Python or Lua script ;-)) Unfortunately, my skills in that area are quite poor :-((

                However, I’m sure that some gurus, such as @alan-kilborn, @ekopalypse, @peterjones or dail, will probably be able to give you a script solution that, of course, will require you to install the Python or Lua interpreter !

                Hey, guys, it’s not a competition, OK !

                Best Regards,

                guy038

                • Alan Kilborn
                  Alan Kilborn @guy038 last edited by Alan Kilborn

                  @guy038 said:

                  Hey, guys, it’s not a competition, OK !

                  Haha. No, definitely not. A support forum is about giving posters options for solving problems where there is not a very clear answer. It seems we’ve done that so far in this thread! :)

                  BTW, that was what I anticipated: A lot of manual steps. :)

                  • guy038
                    guy038 last edited by

                    Hi, @dave-pruce, @alan-kilborn and All,

                    My previous list of rivers contained 5 duplicate names :

                    Red, Murray, Yenisei, Araguaia and Colorado

                    But this is not important, regarding our problem, anyway !

                    As you can see, @dave-pruce, the Python solution, from Alan, is neater ! Isn’t it ?


                    Now, Alan, I’ve just tested your one-line script and, to my mind, there are two problems :

                    • Inside a section of river names of the same length, the names are not sorted alphabetically !

                    • Secondly, some names containing accented characters, such as the Içá river, are located outside their section, as shown below :

                    Snake
                    Volta
                    Tobol
                    Jubba
                    Içá
                    Pecos
                    Benue
                    Iriri
                    Lerma
                    

                    Cheers,

                    guy038

                    • Alan Kilborn
                      Alan Kilborn @guy038 last edited by

                      @guy038 said:

                      names are not sorted alphabetically

                      This is outside the scope of the originally stated problem! :)

                      containing accentuated characters…are located outside their section

                      The Python len function is apparently simple-minded in this case (using a simple byte count for the length of these strings containing multibyte characters).
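The difference between a byte count and a character count is easy to see in plain Python 3, where strings are Unicode by default. This is only an illustration of the effect described above; the short `names` list is made up from names in the thread.

```python
name = "Içá"
char_count = len(name)                  # characters
byte_count = len(name.encode("utf-8"))  # bytes once encoded as UTF-8
print(char_count, byte_count)  # 3 5

names = ["Snake", "Içá", "Ob", "Nile"]
# Sorting by byte length puts "Içá" among the 5-letter names...
by_bytes = sorted(names, key=lambda s: len(s.encode("utf-8")))
# ...while sorting by character length puts it with the 3-letter names.
by_chars = sorted(names, key=len)
print(by_bytes)  # ['Ob', 'Nile', 'Snake', 'Içá']
print(by_chars)  # ['Ob', 'Içá', 'Nile', 'Snake']
```

Both ç and á take two bytes in UTF-8, which is why the three-letter name counts as five.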

                      • Alan Kilborn
                        Alan Kilborn @Alan Kilborn last edited by

                        @Alan-Kilborn said:

                        The Python len function is apparently simple-minded in this case

                        Perhaps this new one-liner is better, for the case where the OP has Unicode data:

                        editor.setText(['\r\n','\r','\n'][editor.getEOLMode()].join(sorted(editor.getText().splitlines(),key=lambda x:len(unicode(x,'utf-8')))))
                        

                        Of course, still big assumption that the OP is using (or is willing to use) Pythonscript! ;)
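If alphabetical order within each length is also wanted (the first point raised earlier in the thread), a tuple sort key is one way to get it. The sketch below is plain Python 3, not PythonScript, and the function name `sort_by_length_then_alpha` is hypothetical. Ties are broken by code point order, which is an assumption and not a locale-aware collation.

```python
def sort_by_length_then_alpha(lines):
    """Sort by character count first, then by code point order."""
    return sorted(lines, key=lambda s: (len(s), s))


names = ["Volta", "Snake", "Içá", "Tobol", "Ob", "Benue"]
ordered = sort_by_length_then_alpha(names)
print(ordered)  # ['Ob', 'Içá', 'Benue', 'Snake', 'Tobol', 'Volta']
```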

                        • guy038
                          guy038 last edited by guy038

                          @dave-pruce, @alan-kilborn,

                          Yes, Alan, your new attempt is the solution when working with UTF-8 encoded files, which may contain multi-byte encoded characters !

                          As for me, I was thinking about the opposite solution : converting UTF-8 files to ANSI. However, with this solution, some characters may turn into question marks or be replaced with an approximate character, because they do not belong to the corresponding ANSI table of 256 characters !

                          For instance, in my previous list of rivers, the name of the Turkish Kızılırmak river, which contains the Latin small letter dotless i, ı ( code point \x{0131} ), is changed into the approximate name Kizilirmak after conversion to ANSI !
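Whether a name survives such a conversion can be checked in advance with a small Python 3 sketch. It assumes the ANSI code page is Windows-1252 (`cp1252`), which depends on the system, and the helper name `chars_lost_in_codepage` is hypothetical. Python does not do the "best fit" substitution mentioned above (ı to i), so the lossy conversion shown here produces question marks instead.

```python
def chars_lost_in_codepage(text, codepage="cp1252"):
    """Return the characters of text that the code page cannot represent."""
    lost = []
    for ch in text:
        try:
            ch.encode(codepage)
        except UnicodeEncodeError:
            lost.append(ch)
    return lost


lost_turkish = chars_lost_in_codepage("Kızılırmak")
lost_ica = chars_lost_in_codepage("Içá")
print(lost_turkish)  # ['ı', 'ı', 'ı']
print(lost_ica)      # []

# Lossy round trip: unsupported characters become question marks.
approx = "Kızılırmak".encode("cp1252", errors="replace").decode("cp1252")
print(approx)  # K?z?l?rmak
```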

                          Anyway, we just did our best to solve the OP’s problem ;-))

                          BR

                          guy038
