Organizing data
-
Hello, I have data that is in long rows, but I need to make different columns while keeping the rows in the certain order.
For example, I have this:
ospht421 4.52E-02 -1.20E-02 2.98E-03 3.19E-02 -1.93E-02 -6.87E-02 4.16E-02 1.71E-02 -2087 2.95E-02And I need it to look like this:
ospht421
4.52E-02 -1.20E-02 2.98E-03
3.19E-02 -1.93E-02 -6.87E-02
4.16E-02 1.71E-02 -2082.95E-02I have a lot of data and it takes forever to do it by hand. Any advice would be appreciated!
Thanks,
Kim -
Hello Kimberly Plomp,
Once more, this kind of modification of text can be easily achieved with regular expressions !
At first sight, your text seems to contain a list of numbers, in the classical E scientific notation ! Am I right ?
I supposed so. Therefore, I, first, tried to build a regex, in order to match such numbers. Taking in account, all the different syntaxes for these numbers, with an example below :
+9.9999998E+03 +1.23e-07 -5.4625e+04 -6.258E-00 5.789E+02 1.008E-06 +9.E+03 +1.E-07 -5.e+04 -6.E-00 5.E+02 1.e-06 +9E+03 +1e-07 -5E+04 -6e-00 5E+02 1E-06
I got this regex :
[+-]?\d+(\.(\d+)?)?[Ee][+-]\d\d
-
The part
[+-]?
matches an optional sign ( remain that the regex quantifier?
means 0 or 1 time ) -
The part
\d+
matches the mantissa part, before the dot ( The regex quantifier+
means 1 or more time(s) ) -
The part
(\.(\d+)?)?
represents the optional fractional part of a number, itself composed of a dot\.
( which must be escaped as it’s a meta character in regexes ) and an optional digits(\d+)?
, after the dot -
The part
[Ee]
matches the E notation letter, whatever the case -
The part
[+-]
matches the mandatory sign -
The part
\d\d
represents the two digits of the exponent
Right. Now, from your text, below :
ospht421 4.52E-02 -1.20E-02 2.98E-03 3.19E-02 -1.93E-02 -6.87E-02 4.16E-02 1.71E-02 -2082.95E-02
We notice that any E notation number is preceded by a space. So, as you expect to split on several lines, every 3 numbers, we can build the final search regex
( [+-]?\d+(\.(\d+)?)?[Ee][+-]\d\d){3}
, which will matches 3 numbers-
Enclosed in round brackets, it begins with a space and ends with the
{3}
quantifier -
The replacement part
\r\n$0
will replace this range of numbers by an Windows End of Line characters\r\n
, followed by the entire regex matched$0
-
Don’t forget, in the Replace dialog, to set the Regular expression search mode !
So , from the test example, below
ospht421 4.52E-02 -1.20E-02 2.98E-03 3.19E-02 -1.93E-02 -6.87E-02 4.16E-02 1.71E-02 -2082.95E-02 ospht000 +9.9999998E+03 +1.23e-07 -5.4625e+04 -6.258E-00 5.789E+02 1.008E-06 +9.E+03 +1.E-07 -5.e+04 ospht999 -6.E-00 5.E+02 1.e-06 +9E+03 +1e-07 -5E+04 -6e-00 5E+02 1E-06
After clicking on the Replace All button, we get the new text :
ospht421 4.52E-02 -1.20E-02 2.98E-03 3.19E-02 -1.93E-02 -6.87E-02 4.16E-02 1.71E-02 -2082.95E-02 ospht000 +9.9999998E+03 +1.23e-07 -5.4625e+04 -6.258E-00 5.789E+02 1.008E-06 +9.E+03 +1.E-07 -5.e+04 ospht999 -6.E-00 5.E+02 1.e-06 +9E+03 +1e-07 -5E+04 -6e-00 5E+02 1E-06
Possibly, to get rid of the leading space(s), just use the simple search regex
^ +
and leave the replacement zone emptyEt voilà !
Best Regards,
guy038
-
-
Thanks, guy038!
I have never used Notepad++ before, so I am not familiar with how to use it. I have tried your suggestion in the Replace box but it say there are 0 replacements.
Find what: ([±]?\d+(.(\d+)?)?[Ee][±]\d\d){3}
Replace with: \r\n$0
And the Regular Expression box is ticked.I am sure it is just a silly mistake on my part, but I can’t seem to get it working.
Thanks!
Kim -
Hi Kim,
From your reply, it happens that one SPACE should be present, in the search regex, between the opening round bracket and the block
[+-]
!Well, I just realized that your numbers could be separated by more than one space characters ! Moreover, your file may, also, contain some tabulation characters. In that case, prefer the search regex below ( The replacement regex is unchanged ):
(\h+[+-]?\d+(\.(\d+)?)?[Ee][+-]\d\d){3}
Notes :
-
The
\h
regex syntax matches one horizontal blank character, that is to say :-
The normal space character (
\x20
) -
The tabulation character (
\x09
) -
The No-Break space ( abbreviated NBSP ) (
\xA0
)
-
-
As usual, the
+
quantifier means one or more times the previous character
So given the test example below, where I put, between numbers, several spaces, in the first line, several tabulations in the second line and a mix of these two, in the third line :
ospht421 4.52E-02 -1.20E-02 2.98E-03 3.19E-02 -1.93E-02 -6.87E-02 4.16E-02 1.71E-02 -2082.95E-02 ospht000 +9.9999998E+03 +1.23e-07 -5.4625e+04 -6.258E-00 5.789E+02 1.008E-06 +9.E+03 +1.E-07 -5.e+04 ospht999 -6.E-00 5.E+02 1.e-06 +9E+03 +1e-07 -5E+04 -6e-00 5E+02 1E-06
After clicking on the Replace All button, we get the following text :
ospht421 4.52E-02 -1.20E-02 2.98E-03 3.19E-02 -1.93E-02 -6.87E-02 4.16E-02 1.71E-02 -2082.95E-02 ospht000 +9.9999998E+03 +1.23e-07 -5.4625e+04 -6.258E-00 5.789E+02 1.008E-06 +9.E+03 +1.E-07 -5.e+04 ospht999 -6.E-00 5.E+02 1.e-06 +9E+03 +1e-07 -5E+04 -6e-00 5E+02 1E-06
Now, we have to get rid of all the possible blanks characters, at the beginning of the lines and just keep one space character between the different numbers. The following S/R will achieve it, nicely :
Find what :
^\h+|(\h+)
Replace with :
(?1 )
, with a space after the digit 1Notes :
-
Due to the alternative regex symbol
|
, this search regex looks :-
for any non null range of blank characters, that begins the line ( the
^
symbol represents the zero length assertion beginning of line ) -
for any subsequent range of these blank characters, between the numbers
-
-
The second part
(\h+)
, ( the subsequent blanks ), is enclosed in round brackets, and represents the group 1 -
In replacement, we use the conditional form
(?n....)
:-
which replace the searched regex with the string located after digit
n
, till the ending round bracket, if the groupn
exists -
which does NO replacement if the group
n
does NOT exist
-
So :
-
When the first part, of the search regex, matches possible blank characters, at beginning of a line, group 1, does not exist. Therefore, there’s NO replacement
-
When the second part, of the search regex, matches blank characters, before the numbers, group 1 exists. Then they are replaced with the single space, located between the digit 1 and the ending round bracket, of the replacement regex
This time, after performing this second S/R, we obtain the final text below :
ospht421 4.52E-02 -1.20E-02 2.98E-03 3.19E-02 -1.93E-02 -6.87E-02 4.16E-02 1.71E-02 -2082.95E-02 ospht000 +9.9999998E+03 +1.23e-07 -5.4625e+04 -6.258E-00 5.789E+02 1.008E-06 +9.E+03 +1.E-07 -5.e+04 ospht999 -6.E-00 5.E+02 1.e-06 +9E+03 +1e-07 -5E+04 -6e-00 5E+02 1E-06
Cheers,
guy038
-