    Column-aligning jagged data

    • Alan Kilborn

      What’s the best/quickest/any way using Notepad++ to turn this text:

      trade                               Ground
      list                                Cry
      free                       print
      Told                                Supply
      square              stood
      metal                 do
      held                    shine
      large                              boy
      map                 table
      book                                car
      process               also
      thank                        young
      held                             if
      ship                       atom
      Have                         game
      thousand                          strong
      case              most
      head                      Tube
      those                          wait
      sudden            triangle
      while                                feed
      human                            order
      paint                   sight
      mouth                            rope
      Hair                     suffix
      want                        this
      hot                           salt
      call                            house
      similar                  experiment
      count                      rub
      quite            won't
      opposite                      no
      note              low
      process                       term
      to                              Fine
      Solution                       Season
      band                         block
      among                            direct
      who               These
      between                  sugar
      ice                              leg
      took                                symbol
      between                 Leg
      Design                Share
      quotient               segment
      

      Into this text, aside from hand-editing every line?:

      trade               Ground
      list                Cry
      free                print
      Told                Supply
      square              stood
      metal               do
      held                shine
      large               boy
      map                 table
      book                car
      process             also
      thank               young
      held                if
      ship                atom
      Have                game
      thousand            strong
      case                most
      head                Tube
      those               wait
      sudden              triangle
      while               feed
      human               order
      paint               sight
      mouth               rope
      Hair                suffix
      want                this
      hot                 salt
      call                house
      similar             experiment
      count               rub
      quite               won't
      opposite            no
      note                low
      process             term
      to                  Fine
      Solution            Season
      band                block
      among               direct
      who                 These
      between             sugar
      ice                 leg
      took                symbol
      between             Leg
      Design              Share
      quotient            segment
      

      Note that I don’t care about the number of spaces between the 2 columns, just that the second column is aligned vertically. I can always quickly get a column-cursor between the 2 columns and adjust the amount of whitespace.

      • Alan Kilborn

        Note, doing an involved regex replacement (with potentially lots of precalculation) isn’t what I’m after here, sorry @guy038

        Tab characters aren’t in play either.

        • PeterJones

          Personally, this looks like a question begging for a scripting environment solution, not a pure-NPP solution.

          I’m sure @Claudia-Frank could whip up the PythonScript solution in no time.

          Since I’m not a python guru, I’d choose my go-to language of Perl, and send it through a one-liner:

          perl -lpi.bak -e "$_ = sprintf qq(%-256s %s), split ' ', $_, 2" $(FULL_CURRENT_PATH)
          

          (It will run in NppExec or even through Run > Run)

          That one-liner assumes the first word is never more than 256 characters; you can use any width you want by replacing %-256s. (Nifty aside: if you don’t know the maximum length of the first-word strings, but don’t care about characters beyond N [say, for example, 40 characters], you could use %-40.40s, which pads to 40 characters when the first word is short and truncates to 40 when it’s longer.)
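For readers without Perl, here is a rough Python 3 counterpart of the pad-or-truncate idea. It is only a sketch: the function name `align_line` and its default width are made up for illustration, and it runs as plain Python rather than inside Notepad++.

```python
def align_line(line, width=40):
    """Pad the first word to `width` columns, truncating it if it is longer."""
    parts = line.split(None, 1)      # first word, then the rest of the line
    if len(parts) < 2:
        return line                  # blank or single-word line: leave as-is
    first, rest = parts
    # '<' left-aligns; the number before '.' is the minimum width (padding),
    # the number after '.' is the maximum width (truncation).
    return '{:<{w}.{w}} {}'.format(first, rest, w=width)
```

With `width=5`, `align_line('thousand      strong', 5)` keeps only `thous` from the first word, which is the trade-off the aside describes.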

          That word-length assumption possibly violates your rejection of “precalculation”, but I am not a miracle worker, sorry; that’s why I picked 256 – I doubted there would be a “word” more than that, except in binary or genetic data. I could have made a two-pass script, but depending on file size, that might involve a lot of memory. I could have opened/parsed the file twice to avoid keeping it in memory and to auto-pre-compute the maximum width of the first word. But since it’s unlikely a random NPP user has perl installed, I won’t bother with a more-complicated perl solution, unless you ask for it.
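The two-pass variant mentioned above could look roughly like this in Python 3. This is a minimal sketch that keeps the whole text in memory; `align_two_columns` and the `gap` parameter are hypothetical names, not part of any post in this thread.

```python
def align_two_columns(text, gap=2):
    """Align the second column just past the longest first word."""
    lines = text.splitlines()
    rows = [line.split(None, 1) for line in lines]
    # Pass 1: find the longest first word among lines that have two columns.
    width = max((len(r[0]) for r in rows if len(r) == 2), default=0)
    # Pass 2: pad each first word to that width plus the requested gap.
    out = []
    for line, r in zip(lines, rows):
        if len(r) == 2:
            out.append(r[0].ljust(width + gap) + r[1])
        else:
            out.append(line)         # blank or single-word line: unchanged
    return '\n'.join(out)
```

No width has to be guessed here, at the cost of reading the data twice.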

          (Golf challenge: I’d like to see if Claudia, or other python guru, could make the Python Script or python.exe one-liner shorter…)

          • Claudia Frank @PeterJones

            @PeterJones

            I think I can’t make it shorter ;-)

            editor.setText('\n'.join(['{:<20} {}'.format(*x.split()) for x in editor.getText().splitlines()]))
            

            where 20 is assumed to be the length of the longest string in the first column.

            Cheers
            Claudia

            • guy038

              Hi, @alan-kilborn, @claudia-frank, @peterjones, and All,

              Nevertheless, it’s quite simple, indeed !! I propose to you 3 different regex S/R :

              SEARCH ^.{12}\K + , with a space before the plus sign

              REPLACE EMPTY

              or

              SEARCH (?<=^.{12}) + , with a space before the plus sign

              REPLACE EMPTY

              or

              SEARCH ^(.{12}) + , with a space before the plus sign

              REPLACE \1

              Notes :

              • For the first two S/R, you must use the Replace All button only ( The step-by-step replacement does NOT work, due to the \K syntax or the look-behind )

              • The last S/R also works with the Replace button !

              • Note that these regexes require the blank characters to be, exclusively, space characters !
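To try the same replacement outside Notepad++, here is a small Python 3 sketch. Python's built-in `re` module has no `\K`, so it uses the third, capture-group form; the function name `squeeze_to_column` is invented for this example.

```python
import re

def squeeze_to_column(text, col=12):
    """Delete any spaces found after the first `col` characters of a line."""
    # ^(.{col})  keeps the first `col` characters of the line in group 1
    # ' +'       matches the surplus spaces that follow, which are dropped
    pattern = re.compile(r'^(.{%d}) +' % col, re.MULTILINE)
    return pattern.sub(r'\1', text)
```

Lines whose second column already starts before position col+1 do not match and stay as they are, so `col` has to be at least the length of the longest first word plus one.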


              Now, Alan, let’s try something more tricky : I simply copy all your list again, on the right, using the column mode !

              ----|----1----|----2----|----3----|----4----|----5----|----6----|----7----|----8----|----9----|----A----|----B
              
              trade                               Ground           trade                               Ground
              list                                Cry              list                                Cry
              free                       print                     free                       print
              Told                                Supply           Told                                Supply
              square              stood                            square              stood
              metal                 do                             metal                 do
              held                    shine                        held                    shine
              large                              boy               large                              boy
              map                 table                            map                 table
              book                                car              book                                car
              process               also                           process               also
              thank                        young                   thank                        young
              held                             if                  held                             if
              ship                       atom                      ship                       atom
              Have                         game                    Have                         game
              thousand                          strong             thousand                          strong
              case              most                               case              most
              head                      Tube                       head                      Tube
              those                          wait                  those                          wait
              sudden            triangle                           sudden            triangle
              while                                feed            while                                feed
              human                            order               human                            order
              paint                   sight                        paint                   sight
              mouth                            rope                mouth                            rope
              Hair                     suffix                      Hair                     suffix
              want                        this                     want                        this
              hot                           salt                   hot                           salt
              call                            house                call                            house
              similar                  experiment                  similar                  experiment
              count                      rub                       count                      rub
              quite            won't                               quite            won't
              opposite                      no                     opposite                      no
              note              low                                note              low
              process                       term                   process                       term
              to                              Fine                 to                              Fine
              Solution                       Season                Solution                       Season
              band                         block                   band                         block
              among                            direct              among                            direct
              who               These                              who               These
              between                  sugar                       between                  sugar
              ice                              leg                 ice                              leg
              took                                symbol           took                                symbol
              between                 Leg                          between                 Leg
              Design                Share                          Design                Share
              quotient               segment                       quotient               segment
              

              Then :

              • Place your cursor just, under the ruler and before the first item trade

              • Open the Replace dialog

              • Leave the Replace with: zone EMPTY

              • Type, in the Find what: zone, the regex (?-s)^.{12}\K + , with a space before the plus sign

              • Click on the Replace All button

              => The second column is aligned :-)) Of course, the third and fourth ones are not aligned

              • Now, change the number 12 to the number 27, in the Find what: zone

              • Click, again, on the Replace All button

              => The third column is now aligned :-))

              • Now, change the number 27 to 43, in the Find what: zone

              • Click, a last time, on the Replace All button

              => All the columns are well aligned…, as below. Et voilà ! Note that the columns begin at positions 12+1, 27+1 and 43+1

              ----|----1----|----2----|----3----|----4----|----5----|----6----|----7----|----8----|----9----|----A----|----B
              
              trade       Ground         trade           Ground
              list        Cry            list            Cry
              free        print          free            print
              Told        Supply         Told            Supply
              square      stood          square          stood
              metal       do             metal           do
              held        shine          held            shine
              large       boy            large           boy
              map         table          map             table
              book        car            book            car
              process     also           process         also
              thank       young          thank           young
              held        if             held            if
              ship        atom           ship            atom
              Have        game           Have            game
              thousand    strong         thousand        strong
              case        most           case            most
              head        Tube           head            Tube
              those       wait           those           wait
              sudden      triangle       sudden          triangle
              while       feed           while           feed
              human       order          human           order
              paint       sight          paint           sight
              mouth       rope           mouth           rope
              Hair        suffix         Hair            suffix
              want        this           want            this
              hot         salt           hot             salt
              call        house          call            house
              similar     experiment     similar         experiment
              count       rub            count           rub
              quite       won't          quite           won't
              opposite    no             opposite        no
              note        low            note            low
              process     term           process         term
              to          Fine           to              Fine
              Solution    Season         Solution        Season
              band        block          band            block
              among       direct         among           direct
              who         These          who             These
              between     sugar          between         sugar
              ice         leg            ice             leg
              took        symbol         took            symbol
              between     Leg            between         Leg
              Design      Share          Design          Share
              quotient    segment        quotient        segment
              

              Of course, I just evaluated, roughly, at each step, where the next column should begin, according to the longest string of the previous column. I don’t know, Alan, if you consider this way as a lot of pre-calculation steps !!
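The three successive Replace All passes can be mimicked in Python 3 as a loop over the column stops. This is only a sketch: `align_at_stops` is a hypothetical helper, and the stops 12, 27 and 43 are the values picked by eye in the walkthrough above.

```python
import re

def align_at_stops(text, stops=(12, 27, 43)):
    """Run one squeeze pass per column stop, left to right."""
    for stop in stops:
        # Keep the first `stop` characters, drop the spaces right after them.
        text = re.sub(r'^(.{%d}) +' % stop, r'\1', text, flags=re.MULTILINE)
    return text
```

The stops must be given in ascending order, because each pass shortens the line and so shifts everything to its right.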

              Cheers,

              guy038

              • Scott Sumner

                I will play reverse-golf and make @Claudia-Frank 's version longer but IMO better…and still one line:

                editor.setText(['\r\n', '\r', '\n'][notepad.getFormatType()].join([('{:<' + str(editor.getColumn(editor.getCurrentPos())-1) + '} {}').format(*x.split()) for x in editor.getText().splitlines()]))
                

                Two changes:

                • do correct line-endings, not Linux–sorry Claudia!–line-endings
                • start the aligned data in the column the caret is in when the script is run (be sure to leave the caret in a column greater than the longest entry in the leftmost data “column”!)
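Both changes can be tried outside Notepad++ with a plain Python 3 sketch like the one below. The names `detect_eol` and `align_at_column` are made up for illustration; `column` is a zero-based offset that stands in for the caret column, and the line ending is guessed from the text instead of being read from the editor.

```python
def detect_eol(text):
    """Guess the line ending used by the text (assumes one consistent style)."""
    if '\r\n' in text:
        return '\r\n'
    if '\r' in text:
        return '\r'
    return '\n'

def align_at_column(text, column):
    """Start the second column at zero-based offset `column` on every line."""
    eol = detect_eol(text)
    out = []
    for line in text.splitlines():
        parts = line.split(None, 1)
        if len(parts) == 2:
            first, rest = parts
            if len(first) >= column:
                raise ValueError('column %d is too small for %r' % (column, first))
            line = first.ljust(column) + rest
        out.append(line)
    result = eol.join(out)
    if text.endswith(('\r', '\n')):
        result += eol                # keep the final line ending, if any
    return result
```

Raising an error when the column is too small makes the caveat in the second bullet explicit instead of silently running the columns together.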
                • Alan Kilborn

                  I deserve what I get because I didn’t quite ask in the right way. I was sort of looking for the solution to the general case. But in presenting example text I got specific answers to solve that specific thing (2 columns, whole file). Don’t get me wrong, the answers I got were awesome!–thanks to responders! Good ideas, all!

                  Of the answers I think Scott’s (put caret in column…and then run script) starts getting at the interactivity I was hoping for. Another clarifying situation might be what if I want this to only affect certain lines, or only after a certain column point on specific lines…

                  So I guess the main answer is that something like this is best served by scripting, although in the end I did like Guy’s regexes (although I did try to head off his enthusiasm for them with my earlier post).

                  • guy038

                    Hi, @alan-kilborn,

                    Another clarifying situation might be what if I want this to only affect certain lines, or only after a certain column point on specific lines…

                    • Concerning the possibility to change text, after a specific column point c, simply use the regex ^.{c+ε}\K\x20+

                    • Concerning reducing text changed to a specific block of lines, do a normal selection of your range of lines, first. So, when opening the Replace dialog, the In selection option is automatically ticked, and the Replace All operation is performed on the selection, only :-))

                    Cheers,

                    guy038

                    • Jim Dailey

                      @PeterJones

                      The old man wasn’t invited to the tournament. Nevertheless, he ambled over to the tee box and took a swing with an ancient wooden driver that has been meticulously maintained for more than 40 years:

                      gawk "{printf \"%-256s%s\n\",$1,$2}" $(FULL_CURRENT_PATH)
                      

                      :-)

                      • dail

                        Somewhat tangential, but a possible solution is the Elastic Tabstops plugin. It would only require a single tab between columns but has the disadvantage of only working within Notepad++ itself.

                        • cipher-1024

                          Neither was the simpleton invited to the tournament but he stumbled up to the tee and out from his bag fell a TextFX plugin and hideous python script that would make a crow blush:

                          # coding: iso-8859-1
                          selected = editor.getSelText()
                          selStart = editor.getSelectionStart()
                          #replace any existing commas with a weird char
                          selected = selected.replace(",", chr(174))
                          #replace the double spaces
                          while "  " in selected:
                              selected = selected.replace("  ", " ")
                          #replace the spaces with commas since our 'line up' function uses commas
                          selected = selected.replace(" ", ",")
                          selEnd = len(selected)
                          editor.replaceSel(selected)
                          #re-select the selection
                          editor.setSelectionStart(selStart)
                          editor.setSelectionEnd(selStart + selEnd)
                          notepad.runMenuCommand("TextFX Edit", "Line up multiple lines by (,)")
                          notepad.runMenuCommand("TextFX Edit", "E:Line up multiple lines by (,)")
                          selected = editor.getSelText()
                          #take out the lineup commas
                          selected = selected.replace(",", " ")
                          #put back any original commas
                          selected = selected.replace(chr(174), ",")
                          editor.replaceSel(selected)
                          

                          This works for any number of columns, and only on lines in the current selection. It makes the columns as narrow as possible. I’m not really sure how you would line up things after a certain column point though.
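The same behaviour (any number of columns, each as narrow as possible) can be approximated without TextFX in plain Python 3. This is a sketch under the assumption that fields never contain spaces; `align_columns` and `gap` are hypothetical names.

```python
def align_columns(text, gap=1):
    """Align whitespace-separated fields, each column as narrow as possible."""
    rows = [line.split() for line in text.splitlines()]
    widths = {}
    for row in rows:
        for i, cell in enumerate(row):
            widths[i] = max(widths.get(i, 0), len(cell))
    sep = ' ' * gap
    out = []
    for row in rows:
        padded = [cell.ljust(widths[i]) for i, cell in enumerate(row)]
        out.append(sep.join(padded).rstrip())   # no trailing padding
    return '\n'.join(out)
```

To restrict it to a selection, pass only the selected text; lines with fewer fields simply end early.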

                          • guy038

                            Hello, @cipher-1024, and All,

                            I’m thinking about another solution, which still uses the TextFX plugin but avoids this [ hideous :-D ] Python Script !

                            • First, use the following regex S/R :

                            SEARCH \x20+

                            REPLACE \x60

                            Note : I, specially, chose the Unicode Grave Accent character ( U+0060 ) , as a dummy character, because it is, both, rarely used in programming languages, ( AFAIK ! ) and part of all character encodings, as belonging to the international ASCII encoding ( from Unicode U+0000 to U+007F )

                            • Copy a single ` ( Grave Accent ) in the clipboard, hitting the Ctrl + C shortcut ( IMPORTANT )

                            • Now, do a normal selection of the text, which is to be aligned

                            • Click on the menu choice TextFX > TextFX Edit > Line up multiple lines by (Clipboard Character)

                            • Finally, use the regex, below, to delete the dummy Grave Accent character ` and add some space characters between columns, with a possible delimiter character !

                            SEARCH \x60

                            REPLACE \x20\x20\x20

                            OR, for instance :

                            SEARCH \x60

                            REPLACE \x20\x20|\x20\x20
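The three steps (mark with a dummy character, line up on it, replace it) can be sketched in plain Python 3 as follows. The middle step is only a stand-in for the TextFX command, assumed here to pad each field so that the dummy characters line up; `align_with_dummy` and `separator` are names invented for this example.

```python
import re

DUMMY = '`'   # Grave Accent, as in the steps above

def align_with_dummy(text, separator='   '):
    """Mark column gaps with DUMMY, line the marks up, then swap them out."""
    if DUMMY in text:
        raise ValueError('text already contains the dummy character')
    # Step 1: every run of spaces becomes a single dummy character.
    marked = re.sub(r'\x20+', DUMMY, text)
    rows = [line.split(DUMMY) for line in marked.splitlines()]
    if not rows:
        return text
    ncols = max(len(row) for row in rows)
    widths = [max(len(row[i]) for row in rows if i < len(row))
              for i in range(ncols)]
    # Step 2 (stand-in for TextFX): pad each field except the last one of a
    # row, so that the dummy characters end up in the same columns.
    lined_up = [DUMMY.join(cell.ljust(widths[i]) if i < len(row) - 1 else cell
                           for i, cell in enumerate(row))
                for row in rows]
    # Step 3: replace the dummy character with the final separator.
    return '\n'.join(line.replace(DUMMY, separator) for line in lined_up)
```

Passing `separator='  |  '` gives the delimited variant from the last S/R.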

                            Cheers,

                            guy038

                            • PeterJones

                              Other than “rarely used in programming languages,” I like that answer.

                              Perl uses a pair of Grave Accents (aka “backticks”) as an often-used alternate for the qx// quote-like syntax for running a shell command and placing the command’s output in a string.

                              SQL uses backticks for denoting identifiers, such as field names.

                              Markdown uses it for embedding inline fixed width text, like:

                              embedding `inline` fixed width text
                              

                              But if you know your text has no backticks, then it’s a great choice.

                              If your data might have backticks, I would use U+001C (\x1c), the Field Separator FS character, which is a control code found in ASCII. (I won’t make the claim that it’s “rarely used” in text files or programming language source code… but I’ve never seen it intentionally used in such. :-) )

                              I think this style of solution meets the original requirements of not requiring complicated S/R regex or precomputing, which is nice.

                              • guy038

                                Hi, @PeterJones and All,

                                So, I sincerely apologize ! My programming skills are weaker than those of most N++ users :-D.

                                BTW, Peter, just have a look at the link, below :

                                https://en.wikipedia.org/wiki/C0_and_C1_control_codes

                                It seems that the C0 Control character ( \x1C ) is actually the File Separator control character ! Anyway, your idea, about using a Control character, is great ! And, if we follow the description notes, it would be logical to prefer the US ( Unit Separator ) Control character \x1F :-D

                                Cheers,

                                guy038

                                • Alan Kilborn

                                  IMO the ultimate solution to the question I originally posed is found HERE.

                                  • artie-finkelstein

                                    I didn’t check the date of the original posting but did immediately say "that’s a job for BetterMultiSelection." It was very satisfying to be able to figure out how to solve that problem. (It took me a few attempts, but that’s why Ctrl-Z exists.)

                                    Thank you to @Alan-Kilborn for a wonderful lesson. It really helped drive home @astrosofista’s examples.

                                    • Alan Kilborn @artie-finkelstein

                                      @artie-finkelstein

                                      …It really helped drive home @astrosofista’s examples

                                      This was not my intention; perhaps you misunderstood.
                                      I was linking directly to a posting, not the larger thread, for the awesome solution to the problem posed here in this thread.
                                      The linked posting discusses using Ctrl+Delete, not any plugin.

                                      Plugins (including Better Multiselection) are great, but even better is when something available natively is the solution to something. And the Ctrl+Delete technique is available natively.
