Solutions for manipulating columns of numbers, in tables, with regexes



  • Hi All,

    Here are below, some methods, relative at delimited range of numbers, placed in columns, which, I hope, could be useful to someone !

    So, let’s suppose that you have the simple delimited table, below :

    xxxxxxxxxxx    43        zzzzzzzz
    xxxxxxxxxxxxx  186519    zzzz
    xxxxxxxxxx     124071915 z
    xxxxxxxxxxxxxx 18711     zzz
    xxxxxxxxxx     7         zzzzzzz
    xxxxxxxxx      2025712   zz
    xxxxxxxxxxxxx  10399543  zzz
    xxxxxxxxxx     957       zzzzzzz
    xxxxxxxxxxxxx  1030117   zzzz
    xxxxxxxxxx     69        zzzzzzz
    xxxxxxxxxxx    10297881  zzzzzz
    xxxxxxxxxx     4320      zzzzz
    xxxxxxxxxx     0         zzzzzz
    xxxxxxxxxxxx   95082     zzz
    xxxxxxxxx      344613527 zz
    xxxxxxxxxxxxxx 885       zzzz
    xxxxxxxxx      1583      zzzzzz
    xxxxxxxx       347788    zzzzzzzzzzzz
    

    Now, you would like, of course, to align the different numbers to the right. Quite easy with the S/R, below, with regular expressions. So :

    • Firstly, find out the number of leading characters, before numbers. In our example, this number is 15

    • Use the following S/R, with the Regular expression search mode checked

      • SEARCH : (?-s)^.{15}\K(\d+)( +)(?= )

      • REPLACE : \2\1

    • Click on the Replace All button. Et voilà !

    => You should get the changed text, below :

    xxxxxxxxxxx           43 zzzzzzzz
    xxxxxxxxxxxxx     186519 zzzz
    xxxxxxxxxx     124071915 z
    xxxxxxxxxxxxxx     18711 zzz
    xxxxxxxxxx             7 zzzzzzz
    xxxxxxxxx        2025712 zz
    xxxxxxxxxxxxx   10399543 zzz
    xxxxxxxxxx           957 zzzzzzz
    xxxxxxxxxxxxx    1030117 zzzz
    xxxxxxxxxx            69 zzzzzzz
    xxxxxxxxxxx     10297881 zzzzzz
    xxxxxxxxxx          4320 zzzzz
    xxxxxxxxxx             0 zzzzzz
    xxxxxxxxxxxx       95082 zzz
    xxxxxxxxx      344613527 zz
    xxxxxxxxxxxxxx       885 zzzz
    xxxxxxxxx           1583 zzzzzz
    xxxxxxxx          347788 zzzzzzzzzzzz
    

    IMPORTANT : This regex works fine, whatever the maximum number of digits is ( 9 digits, in our case )

    NOTES :

    • The first part (?-s) ensures that the regex engine will consider any dot symbol as a standard character, only

    • Then the second part ^.{15}\K select the first 15 characters of each line and reset the regex engine position just after them, due to the \K form

    • The third part (\d+) stores the non null range of digits in group 1

    • Finally, the part ( +)(?= ) stores, in group 2, any non null range of space characters, which must be followed by a space, due to the look-ahead structure (?= )

    • In replacement, the two groups are, just, swapped !


    Now, let’s suppose that the third column is quite far, on the right, like below :

    xxxxxxxxxxx    43                                zzzzzzzz
    xxxxxxxxxxxxx  186519                            zzzz
    xxxxxxxxxx     124071915                         z
    xxxxxxxxxxxxxx 18711                             zzz
    xxxxxxxxxx     7                                 zzzzzzz
    xxxxxxxxx      2025712                           zz
    xxxxxxxxxxxxx  10399543                          zzz
    xxxxxxxxxx     957                               zzzzzzz
    xxxxxxxxxxxxx  1030117                           zzzz
    xxxxxxxxxx     69                                zzzzzzz
    xxxxxxxxxxx    10297881                          zzzzzz
    xxxxxxxxxx     4320                              zzzzz
    xxxxxxxxxx     0                                 zzzzzz
    xxxxxxxxxxxx   95082                             zzz
    xxxxxxxxx      344613527                         zz
    xxxxxxxxxxxxxx 885                               zzzz
    xxxxxxxxx      1583                              zzzzzz
    xxxxxxxx       347788                            zzzzzzzzzzzz
    

    Let M, the minimum number of characters between the digits and this third column ( M = 25, in our example )

    So, in addition to align the numbers to the right, you can adjust the preferred location of that column of digits, with the regex below, where 1<= N <= M-1

    SEARCH ^.{15}\K(\d+)( +)(?=( ){N}) ( Of course change the N variable by an integer ! )

    REPLACE \2\1

    • For instance, if you use the search regex ^.{15}\K(\d+)( +)(?=( ){24}), the second column is the most closed to the first column, as below :

      xxxxxxxxxxx 43 zzzzzzzz
      xxxxxxxxxxxxx 186519 zzzz
      xxxxxxxxxx 124071915 z
      xxxxxxxxxxxxxx 18711 zzz
      xxxxxxxxxx 7 zzzzzzz
      xxxxxxxxx 2025712 zz
      xxxxxxxxxxxxx 10399543 zzz
      xxxxxxxxxx 957 zzzzzzz
      xxxxxxxxxxxxx 1030117 zzzz
      xxxxxxxxxx 69 zzzzzzz
      xxxxxxxxxxx 10297881 zzzzzz
      xxxxxxxxxx 4320 zzzzz
      xxxxxxxxxx 0 zzzzzz
      xxxxxxxxxxxx 95082 zzz
      xxxxxxxxx 344613527 zz
      xxxxxxxxxxxxxx 885 zzzz
      xxxxxxxxx 1583 zzzzzz
      xxxxxxxx 347788 zzzzzzzzzzzz

    • And, if you use the search regex ^.{15}\K(\d+)( +)(?=( ){1}), the second column is the most closed to the third column, as below :

      xxxxxxxxxxx 43 zzzzzzzz
      xxxxxxxxxxxxx 186519 zzzz
      xxxxxxxxxx 124071915 z
      xxxxxxxxxxxxxx 18711 zzz
      xxxxxxxxxx 7 zzzzzzz
      xxxxxxxxx 2025712 zz
      xxxxxxxxxxxxx 10399543 zzz
      xxxxxxxxxx 957 zzzzzzz
      xxxxxxxxxxxxx 1030117 zzzz
      xxxxxxxxxx 69 zzzzzzz
      xxxxxxxxxxx 10297881 zzzzzz
      xxxxxxxxxx 4320 zzzzz
      xxxxxxxxxx 0 zzzzzz
      xxxxxxxxxxxx 95082 zzz
      xxxxxxxxx 344613527 zz
      xxxxxxxxxxxxxx 885 zzzz
      xxxxxxxxx 1583 zzzzzz
      xxxxxxxx 347788 zzzzzzzzzzzz


    Now, an other general problem may happen : How to separate these numbers, with a comma, for instance, each thousand, as in the number 12,345,678

    So, let’s take again our first result of the alignment, above :

    xxxxxxxxxxx           43 zzzzzzzz
    xxxxxxxxxxxxx     186519 zzzz
    xxxxxxxxxx     124071915 z
    xxxxxxxxxxxxxx     18711 zzz
    xxxxxxxxxx             7 zzzzzzz
    xxxxxxxxx        2025712 zz
    xxxxxxxxxxxxx   10399543 zzz
    xxxxxxxxxx           957 zzzzzzz
    xxxxxxxxxxxxx    1030117 zzzz
    xxxxxxxxxx            69 zzzzzzz
    xxxxxxxxxxx     10297881 zzzzzz
    xxxxxxxxxx          4320 zzzzz
    xxxxxxxxxx             0 zzzzzz
    xxxxxxxxxxxx       95082 zzz
    xxxxxxxxx      344613527 zz
    xxxxxxxxxxxxxx       885 zzzz
    xxxxxxxxx           1583 zzzzzz
    xxxxxxxx          347788 zzzzzzzzzzzz
    

    From this example, the comma or a space must be inserted :

    • Between the 21th and the 22th character of this table

    • Between the 18th and the 19th character of this table

    To do so, perform, successively, the two S/R, below :

    SEARCH (?-s)^.{20}\K((\d)(\d)|(.)(.))

    REPLACE (?2\2,\3:\4 \5)

    Then :

    SEARCH (?-s)^.{17}\K((\d)(\d)|(.)(.))

    REPLACE (?2\2,\3:\4 \5)

    At the end, you should get the formatted text, below :

    xxxxxxxxxxx             43 zzzzzzzz
    xxxxxxxxxxxxx      186,519 zzzz
    xxxxxxxxxx     124,071,915 z
    xxxxxxxxxxxxxx      18,711 zzz
    xxxxxxxxxx               7 zzzzzzz
    xxxxxxxxx        2,025,712 zz
    xxxxxxxxxxxxx   10,399,543 zzz
    xxxxxxxxxx             957 zzzzzzz
    xxxxxxxxxxxxx    1,030,117 zzzz
    xxxxxxxxxx              69 zzzzzzz
    xxxxxxxxxxx     10,297,881 zzzzzz
    xxxxxxxxxx           4,320 zzzzz
    xxxxxxxxxx               0 zzzzzz
    xxxxxxxxxxxx        95,082 zzz
    xxxxxxxxx      344,613,527 zz
    xxxxxxxxxxxxxx         885 zzzz
    xxxxxxxxx            1,583 zzzzzz
    xxxxxxxx           347,788 zzzzzzzzzzzz
    

    NOTES :

    • The first part (?-s) ensures that the regex engine will consider any dot symbol as a standard character, only

    • Then, thanks to the part ^.{20}\K, we get rid of the first 20 characters of each line

    • The final part ((\d)(\d)|(.)(.)) tries to match two digits ( groups 2 and 3 ) OR two standard characters ( groups 4 and 5 )

    • If group 2 exists ( the digit ) we rewrite, in replacement, the two digits ( groups 2 and 3 ), separated by a comma sign

    • If group 2 is absent, we rewrite, in replacement, the two standard characters ( groups 4 and 5 ), separated buy a space character


    REMARK :

    An other form, of the replacement regex, could be \2\4(?2,: )\3\5 :

    • This syntax is correct, as the groups 2 and 4, as well as the groups 3 and 5, CANNOT exist at the same time. When one group exists the other doesn’t Then, it’s just be replaced by a zero-length string

    • As above, due to the part (?2,: ), the separating character is a comma sign if group 2 exists and a space character, on the contrary

    • Of course, you may change the comma sign, by any other symbol !

    Best Regards,

    guy038



  • briliant !


Log in to reply