Solutions for manipulating columns of numbers, in tables, with regexes
-
Hi All,
Here are below, some methods, relative at delimited range of numbers, placed in columns, which, I hope, could be useful to someone !
So, let’s suppose that you have the simple delimited table, below :
xxxxxxxxxxx 43 zzzzzzzz xxxxxxxxxxxxx 186519 zzzz xxxxxxxxxx 124071915 z xxxxxxxxxxxxxx 18711 zzz xxxxxxxxxx 7 zzzzzzz xxxxxxxxx 2025712 zz xxxxxxxxxxxxx 10399543 zzz xxxxxxxxxx 957 zzzzzzz xxxxxxxxxxxxx 1030117 zzzz xxxxxxxxxx 69 zzzzzzz xxxxxxxxxxx 10297881 zzzzzz xxxxxxxxxx 4320 zzzzz xxxxxxxxxx 0 zzzzzz xxxxxxxxxxxx 95082 zzz xxxxxxxxx 344613527 zz xxxxxxxxxxxxxx 885 zzzz xxxxxxxxx 1583 zzzzzz xxxxxxxx 347788 zzzzzzzzzzzz
Now, you would like, of course, to align the different numbers to the right. Quite easy with the S/R, below, with regular expressions. So :
-
Firstly, find out the number of leading characters, before numbers. In our example, this number is 15
-
Use the following S/R, with the Regular expression search mode checked
-
SEARCH :
(?-s)^.{15}\K(\d+)( +)(?= )
-
REPLACE :
\2\1
-
-
Click on the Replace All button. Et voilà !
=> You should get the changed text, below :
xxxxxxxxxxx 43 zzzzzzzz xxxxxxxxxxxxx 186519 zzzz xxxxxxxxxx 124071915 z xxxxxxxxxxxxxx 18711 zzz xxxxxxxxxx 7 zzzzzzz xxxxxxxxx 2025712 zz xxxxxxxxxxxxx 10399543 zzz xxxxxxxxxx 957 zzzzzzz xxxxxxxxxxxxx 1030117 zzzz xxxxxxxxxx 69 zzzzzzz xxxxxxxxxxx 10297881 zzzzzz xxxxxxxxxx 4320 zzzzz xxxxxxxxxx 0 zzzzzz xxxxxxxxxxxx 95082 zzz xxxxxxxxx 344613527 zz xxxxxxxxxxxxxx 885 zzzz xxxxxxxxx 1583 zzzzzz xxxxxxxx 347788 zzzzzzzzzzzz
IMPORTANT : This regex works fine, whatever the maximum number of digits is ( 9 digits, in our case )
NOTES :
-
The first part
(?-s)
ensures that the regex engine will consider any dot symbol as a standard character, only -
Then the second part
^.{15}\K
select the first 15 characters of each line and reset the regex engine position just after them, due to the\K
form -
The third part
(\d+)
stores the non null range of digits in group 1 -
Finally, the part
( +)(?= )
stores, in group 2, any non null range of space characters, which must be followed by a space, due to the look-ahead structure(?= )
-
In replacement, the two groups are, just, swapped !
Now, let’s suppose that the third column is quite far, on the right, like below :
xxxxxxxxxxx 43 zzzzzzzz xxxxxxxxxxxxx 186519 zzzz xxxxxxxxxx 124071915 z xxxxxxxxxxxxxx 18711 zzz xxxxxxxxxx 7 zzzzzzz xxxxxxxxx 2025712 zz xxxxxxxxxxxxx 10399543 zzz xxxxxxxxxx 957 zzzzzzz xxxxxxxxxxxxx 1030117 zzzz xxxxxxxxxx 69 zzzzzzz xxxxxxxxxxx 10297881 zzzzzz xxxxxxxxxx 4320 zzzzz xxxxxxxxxx 0 zzzzzz xxxxxxxxxxxx 95082 zzz xxxxxxxxx 344613527 zz xxxxxxxxxxxxxx 885 zzzz xxxxxxxxx 1583 zzzzzz xxxxxxxx 347788 zzzzzzzzzzzz
Let M, the minimum number of characters between the digits and this third column ( M = 25, in our example )
So, in addition to align the numbers to the right, you can adjust the preferred location of that column of digits, with the regex below, where
1<= N <= M-1
SEARCH
^.{15}\K(\d+)( +)(?=( ){N})
( Of course change the N variable by an integer ! )REPLACE
\2\1
For instance, if you use the search regex
^.{15}\K(\d+)( +)(?=( ){24})
, the second column is the most closed to the first column, as below :xxxxxxxxxxx 43 zzzzzzzz xxxxxxxxxxxxx 186519 zzzz xxxxxxxxxx 124071915 z xxxxxxxxxxxxxx 18711 zzz xxxxxxxxxx 7 zzzzzzz xxxxxxxxx 2025712 zz xxxxxxxxxxxxx 10399543 zzz xxxxxxxxxx 957 zzzzzzz xxxxxxxxxxxxx 1030117 zzzz xxxxxxxxxx 69 zzzzzzz xxxxxxxxxxx 10297881 zzzzzz xxxxxxxxxx 4320 zzzzz xxxxxxxxxx 0 zzzzzz xxxxxxxxxxxx 95082 zzz xxxxxxxxx 344613527 zz xxxxxxxxxxxxxx 885 zzzz xxxxxxxxx 1583 zzzzzz xxxxxxxx 347788 zzzzzzzzzzzz
And, if you use the search regex
^.{15}\K(\d+)( +)(?=( ){1})
, the second column is the most closed to the third column, as below :xxxxxxxxxxx 43 zzzzzzzz xxxxxxxxxxxxx 186519 zzzz xxxxxxxxxx 124071915 z xxxxxxxxxxxxxx 18711 zzz xxxxxxxxxx 7 zzzzzzz xxxxxxxxx 2025712 zz xxxxxxxxxxxxx 10399543 zzz xxxxxxxxxx 957 zzzzzzz xxxxxxxxxxxxx 1030117 zzzz xxxxxxxxxx 69 zzzzzzz xxxxxxxxxxx 10297881 zzzzzz xxxxxxxxxx 4320 zzzzz xxxxxxxxxx 0 zzzzzz xxxxxxxxxxxx 95082 zzz xxxxxxxxx 344613527 zz xxxxxxxxxxxxxx 885 zzzz xxxxxxxxx 1583 zzzzzz xxxxxxxx 347788 zzzzzzzzzzzz
Now, an other general problem may happen : How to separate these numbers, with a comma, for instance, each thousand, as in the number 12,345,678
So, let’s take again our first result of the alignment, above :
xxxxxxxxxxx 43 zzzzzzzz xxxxxxxxxxxxx 186519 zzzz xxxxxxxxxx 124071915 z xxxxxxxxxxxxxx 18711 zzz xxxxxxxxxx 7 zzzzzzz xxxxxxxxx 2025712 zz xxxxxxxxxxxxx 10399543 zzz xxxxxxxxxx 957 zzzzzzz xxxxxxxxxxxxx 1030117 zzzz xxxxxxxxxx 69 zzzzzzz xxxxxxxxxxx 10297881 zzzzzz xxxxxxxxxx 4320 zzzzz xxxxxxxxxx 0 zzzzzz xxxxxxxxxxxx 95082 zzz xxxxxxxxx 344613527 zz xxxxxxxxxxxxxx 885 zzzz xxxxxxxxx 1583 zzzzzz xxxxxxxx 347788 zzzzzzzzzzzz
From this example, the comma or a space must be inserted :
-
Between the 21th and the 22th character of this table
-
Between the 18th and the 19th character of this table
To do so, perform, successively, the two S/R, below :
SEARCH
(?-s)^.{20}\K((\d)(\d)|(.)(.))
REPLACE
(?2\2,\3:\4 \5)
Then :
SEARCH
(?-s)^.{17}\K((\d)(\d)|(.)(.))
REPLACE
(?2\2,\3:\4 \5)
At the end, you should get the formatted text, below :
xxxxxxxxxxx 43 zzzzzzzz xxxxxxxxxxxxx 186,519 zzzz xxxxxxxxxx 124,071,915 z xxxxxxxxxxxxxx 18,711 zzz xxxxxxxxxx 7 zzzzzzz xxxxxxxxx 2,025,712 zz xxxxxxxxxxxxx 10,399,543 zzz xxxxxxxxxx 957 zzzzzzz xxxxxxxxxxxxx 1,030,117 zzzz xxxxxxxxxx 69 zzzzzzz xxxxxxxxxxx 10,297,881 zzzzzz xxxxxxxxxx 4,320 zzzzz xxxxxxxxxx 0 zzzzzz xxxxxxxxxxxx 95,082 zzz xxxxxxxxx 344,613,527 zz xxxxxxxxxxxxxx 885 zzzz xxxxxxxxx 1,583 zzzzzz xxxxxxxx 347,788 zzzzzzzzzzzz
NOTES :
-
The first part
(?-s)
ensures that the regex engine will consider any dot symbol as a standard character, only -
Then, thanks to the part
^.{20}\K
, we get rid of the first 20 characters of each line -
The final part
(?:(\d)(\d)|(.)(.))
tries to match two digits ( groups 1 and 2 ) OR two standard characters ( groups 3 and 4 ) -
If group 1 exists ( the digit ) we rewrite, in replacement, the two digits ( groups 1 and 2 ), separated by a comma sign
-
If group 1 is absent, we rewrite, in replacement, the two standard characters ( groups 3 and 4 ), separated by a space character
REMARK :
An other form, of the replacement regex, could be
\1\3(?1,: )\2\4
:-
This syntax is also correct, as the groups 1 and 3, as well as the groups 2 and 4, CANNOT exist at the same time. When one group exists the other does not Then, it’s just be replaced by a zero-length string
-
As above, due to the part
(?1,: )
, the separating character is a comma sign if group 1 exists ( the digit ) and a space character, on the contrary -
Of course, you may change the comma sign, by any other symbol !
Best Regards,
guy038
-
-
briliant !