Community
    • Login

    Solutions for manipulating columns of numbers, in tables, with regexes

    Scheduled Pinned Locked Moved General Discussion
    2 Posts 2 Posters 3.6k Views
    Loading More Posts
    • Oldest to Newest
    • Newest to Oldest
    • Most Votes
    Reply
    • Reply as topic
    Log in to reply
    This topic has been deleted. Only users with topic management privileges can see it.
    • guy038G
      guy038
      last edited by guy038

      Hi All,

      Here are below, some methods, relative at delimited range of numbers, placed in columns, which, I hope, could be useful to someone !

      So, let’s suppose that you have the simple delimited table, below :

      xxxxxxxxxxx    43        zzzzzzzz
      xxxxxxxxxxxxx  186519    zzzz
      xxxxxxxxxx     124071915 z
      xxxxxxxxxxxxxx 18711     zzz
      xxxxxxxxxx     7         zzzzzzz
      xxxxxxxxx      2025712   zz
      xxxxxxxxxxxxx  10399543  zzz
      xxxxxxxxxx     957       zzzzzzz
      xxxxxxxxxxxxx  1030117   zzzz
      xxxxxxxxxx     69        zzzzzzz
      xxxxxxxxxxx    10297881  zzzzzz
      xxxxxxxxxx     4320      zzzzz
      xxxxxxxxxx     0         zzzzzz
      xxxxxxxxxxxx   95082     zzz
      xxxxxxxxx      344613527 zz
      xxxxxxxxxxxxxx 885       zzzz
      xxxxxxxxx      1583      zzzzzz
      xxxxxxxx       347788    zzzzzzzzzzzz
      

      Now, you would like, of course, to align the different numbers to the right. Quite easy with the S/R, below, with regular expressions. So :

      • Firstly, find out the number of leading characters, before numbers. In our example, this number is 15

      • Use the following S/R, with the Regular expression search mode checked

        • SEARCH : (?-s)^.{15}\K(\d+)( +)(?= )

        • REPLACE : \2\1

      • Click on the Replace All button. Et voilà !

      => You should get the changed text, below :

      xxxxxxxxxxx           43 zzzzzzzz
      xxxxxxxxxxxxx     186519 zzzz
      xxxxxxxxxx     124071915 z
      xxxxxxxxxxxxxx     18711 zzz
      xxxxxxxxxx             7 zzzzzzz
      xxxxxxxxx        2025712 zz
      xxxxxxxxxxxxx   10399543 zzz
      xxxxxxxxxx           957 zzzzzzz
      xxxxxxxxxxxxx    1030117 zzzz
      xxxxxxxxxx            69 zzzzzzz
      xxxxxxxxxxx     10297881 zzzzzz
      xxxxxxxxxx          4320 zzzzz
      xxxxxxxxxx             0 zzzzzz
      xxxxxxxxxxxx       95082 zzz
      xxxxxxxxx      344613527 zz
      xxxxxxxxxxxxxx       885 zzzz
      xxxxxxxxx           1583 zzzzzz
      xxxxxxxx          347788 zzzzzzzzzzzz
      

      IMPORTANT : This regex works fine, whatever the maximum number of digits is ( 9 digits, in our case )

      NOTES :

      • The first part (?-s) ensures that the regex engine will consider any dot symbol as a standard character, only

      • Then the second part ^.{15}\K select the first 15 characters of each line and reset the regex engine position just after them, due to the \K form

      • The third part (\d+) stores the non null range of digits in group 1

      • Finally, the part ( +)(?= ) stores, in group 2, any non null range of space characters, which must be followed by a space, due to the look-ahead structure (?= )

      • In replacement, the two groups are, just, swapped !


      Now, let’s suppose that the third column is quite far, on the right, like below :

      xxxxxxxxxxx    43                                zzzzzzzz
      xxxxxxxxxxxxx  186519                            zzzz
      xxxxxxxxxx     124071915                         z
      xxxxxxxxxxxxxx 18711                             zzz
      xxxxxxxxxx     7                                 zzzzzzz
      xxxxxxxxx      2025712                           zz
      xxxxxxxxxxxxx  10399543                          zzz
      xxxxxxxxxx     957                               zzzzzzz
      xxxxxxxxxxxxx  1030117                           zzzz
      xxxxxxxxxx     69                                zzzzzzz
      xxxxxxxxxxx    10297881                          zzzzzz
      xxxxxxxxxx     4320                              zzzzz
      xxxxxxxxxx     0                                 zzzzzz
      xxxxxxxxxxxx   95082                             zzz
      xxxxxxxxx      344613527                         zz
      xxxxxxxxxxxxxx 885                               zzzz
      xxxxxxxxx      1583                              zzzzzz
      xxxxxxxx       347788                            zzzzzzzzzzzz
      

      Let M, the minimum number of characters between the digits and this third column ( M = 25, in our example )

      So, in addition to align the numbers to the right, you can adjust the preferred location of that column of digits, with the regex below, where 1<= N <= M-1

      SEARCH ^.{15}\K(\d+)( +)(?=( ){N}) ( Of course change the N variable by an integer ! )

      REPLACE \2\1

      For instance, if you use the search regex ^.{15}\K(\d+)( +)(?=( ){24}), the second column is the most closed to the first column, as below :

      xxxxxxxxxxx            43                        zzzzzzzz
      xxxxxxxxxxxxx      186519                        zzzz
      xxxxxxxxxx      124071915                        z
      xxxxxxxxxxxxxx      18711                        zzz
      xxxxxxxxxx              7                        zzzzzzz
      xxxxxxxxx         2025712                        zz
      xxxxxxxxxxxxx    10399543                        zzz
      xxxxxxxxxx            957                        zzzzzzz
      xxxxxxxxxxxxx     1030117                        zzzz
      xxxxxxxxxx             69                        zzzzzzz
      xxxxxxxxxxx      10297881                        zzzzzz
      xxxxxxxxxx           4320                        zzzzz
      xxxxxxxxxx              0                        zzzzzz
      xxxxxxxxxxxx        95082                        zzz
      xxxxxxxxx       344613527                        zz
      xxxxxxxxxxxxxx        885                        zzzz
      xxxxxxxxx            1583                        zzzzzz
      xxxxxxxx           347788                        zzzzzzzzzzzz
      

      And, if you use the search regex ^.{15}\K(\d+)( +)(?=( ){1}), the second column is the most closed to the third column, as below :

      xxxxxxxxxxx                                   43 zzzzzzzz
      xxxxxxxxxxxxx                             186519 zzzz
      xxxxxxxxxx                             124071915 z
      xxxxxxxxxxxxxx                             18711 zzz
      xxxxxxxxxx                                     7 zzzzzzz
      xxxxxxxxx                                2025712 zz
      xxxxxxxxxxxxx                           10399543 zzz
      xxxxxxxxxx                                   957 zzzzzzz
      xxxxxxxxxxxxx                            1030117 zzzz
      xxxxxxxxxx                                    69 zzzzzzz
      xxxxxxxxxxx                             10297881 zzzzzz
      xxxxxxxxxx                                  4320 zzzzz
      xxxxxxxxxx                                     0 zzzzzz
      xxxxxxxxxxxx                               95082 zzz
      xxxxxxxxx                              344613527 zz
      xxxxxxxxxxxxxx                               885 zzzz
      xxxxxxxxx                                   1583 zzzzzz
      xxxxxxxx                                  347788 zzzzzzzzzzzz
      

      Now, an other general problem may happen : How to separate these numbers, with a comma, for instance, each thousand, as in the number 12,345,678

      So, let’s take again our first result of the alignment, above :

      xxxxxxxxxxx           43 zzzzzzzz
      xxxxxxxxxxxxx     186519 zzzz
      xxxxxxxxxx     124071915 z
      xxxxxxxxxxxxxx     18711 zzz
      xxxxxxxxxx             7 zzzzzzz
      xxxxxxxxx        2025712 zz
      xxxxxxxxxxxxx   10399543 zzz
      xxxxxxxxxx           957 zzzzzzz
      xxxxxxxxxxxxx    1030117 zzzz
      xxxxxxxxxx            69 zzzzzzz
      xxxxxxxxxxx     10297881 zzzzzz
      xxxxxxxxxx          4320 zzzzz
      xxxxxxxxxx             0 zzzzzz
      xxxxxxxxxxxx       95082 zzz
      xxxxxxxxx      344613527 zz
      xxxxxxxxxxxxxx       885 zzzz
      xxxxxxxxx           1583 zzzzzz
      xxxxxxxx          347788 zzzzzzzzzzzz
      

      From this example, the comma or a space must be inserted :

      • Between the 21th and the 22th character of this table

      • Between the 18th and the 19th character of this table

      To do so, perform, successively, the two S/R, below :

      SEARCH (?-s)^.{20}\K((\d)(\d)|(.)(.))

      REPLACE (?2\2,\3:\4 \5)

      Then :

      SEARCH (?-s)^.{17}\K((\d)(\d)|(.)(.))

      REPLACE (?2\2,\3:\4 \5)

      At the end, you should get the formatted text, below :

      xxxxxxxxxxx             43 zzzzzzzz
      xxxxxxxxxxxxx      186,519 zzzz
      xxxxxxxxxx     124,071,915 z
      xxxxxxxxxxxxxx      18,711 zzz
      xxxxxxxxxx               7 zzzzzzz
      xxxxxxxxx        2,025,712 zz
      xxxxxxxxxxxxx   10,399,543 zzz
      xxxxxxxxxx             957 zzzzzzz
      xxxxxxxxxxxxx    1,030,117 zzzz
      xxxxxxxxxx              69 zzzzzzz
      xxxxxxxxxxx     10,297,881 zzzzzz
      xxxxxxxxxx           4,320 zzzzz
      xxxxxxxxxx               0 zzzzzz
      xxxxxxxxxxxx        95,082 zzz
      xxxxxxxxx      344,613,527 zz
      xxxxxxxxxxxxxx         885 zzzz
      xxxxxxxxx            1,583 zzzzzz
      xxxxxxxx           347,788 zzzzzzzzzzzz
      

      NOTES :

      • The first part (?-s) ensures that the regex engine will consider any dot symbol as a standard character, only

      • Then, thanks to the part ^.{20}\K, we get rid of the first 20 characters of each line

      • The final part (?:(\d)(\d)|(.)(.)) tries to match two digits ( groups 1 and 2 ) OR two standard characters ( groups 3 and 4 )

      • If group 1 exists ( the digit ) we rewrite, in replacement, the two digits ( groups 1 and 2 ), separated by a comma sign

      • If group 1 is absent, we rewrite, in replacement, the two standard characters ( groups 3 and 4 ), separated by a space character


      REMARK :

      An other form, of the replacement regex, could be \1\3(?1,: )\2\4 :

      • This syntax is also correct, as the groups 1 and 3, as well as the groups 2 and 4, CANNOT exist at the same time. When one group exists the other does not Then, it’s just be replaced by a zero-length string

      • As above, due to the part (?1,: ), the separating character is a comma sign if group 1 exists ( the digit ) and a space character, on the contrary

      • Of course, you may change the comma sign, by any other symbol !

      Best Regards,

      guy038

      1 Reply Last reply Reply Quote 0
      • Vasile CarausV
        Vasile Caraus
        last edited by

        briliant !

        1 Reply Last reply Reply Quote 0
        • First post
          Last post
        The Community of users of the Notepad++ text editor.
        Powered by NodeBB | Contributors