Organizing data
-
Hello, I have data that is in long rows, but I need to make different columns while keeping the rows in the certain order.
For example, I have this:
ospht421 4.52E-02 -1.20E-02 2.98E-03 3.19E-02 -1.93E-02 -6.87E-02 4.16E-02 1.71E-02 -2087 2.95E-02And I need it to look like this:
ospht421
4.52E-02 -1.20E-02 2.98E-03
3.19E-02 -1.93E-02 -6.87E-02
4.16E-02 1.71E-02 -2082.95E-02I have a lot of data and it takes forever to do it by hand. Any advice would be appreciated!
Thanks,
Kim -
Hello Kimberly Plomp,
Once more, this kind of modification of text can be easily achieved with regular expressions !
At first sight, your text seems to contain a list of numbers, in the classical E scientific notation ! Am I right ?
I supposed so. Therefore, I, first, tried to build a regex, in order to match such numbers. Taking in account, all the different syntaxes for these numbers, with an example below :
+9.9999998E+03 +1.23e-07 -5.4625e+04 -6.258E-00 5.789E+02 1.008E-06 +9.E+03 +1.E-07 -5.e+04 -6.E-00 5.E+02 1.e-06 +9E+03 +1e-07 -5E+04 -6e-00 5E+02 1E-06I got this regex :
[+-]?\d+(\.(\d+)?)?[Ee][+-]\d\d-
The part
[+-]?matches an optional sign ( remain that the regex quantifier?means 0 or 1 time ) -
The part
\d+matches the mantissa part, before the dot ( The regex quantifier+means 1 or more time(s) ) -
The part
(\.(\d+)?)?represents the optional fractional part of a number, itself composed of a dot\.( which must be escaped as it’s a meta character in regexes ) and an optional digits(\d+)?, after the dot -
The part
[Ee]matches the E notation letter, whatever the case -
The part
[+-]matches the mandatory sign -
The part
\d\drepresents the two digits of the exponent
Right. Now, from your text, below :
ospht421 4.52E-02 -1.20E-02 2.98E-03 3.19E-02 -1.93E-02 -6.87E-02 4.16E-02 1.71E-02 -2082.95E-02We notice that any E notation number is preceded by a space. So, as you expect to split on several lines, every 3 numbers, we can build the final search regex
( [+-]?\d+(\.(\d+)?)?[Ee][+-]\d\d){3}, which will matches 3 numbers-
Enclosed in round brackets, it begins with a space and ends with the
{3}quantifier -
The replacement part
\r\n$0will replace this range of numbers by an Windows End of Line characters\r\n, followed by the entire regex matched$0 -
Don’t forget, in the Replace dialog, to set the Regular expression search mode !
So , from the test example, below
ospht421 4.52E-02 -1.20E-02 2.98E-03 3.19E-02 -1.93E-02 -6.87E-02 4.16E-02 1.71E-02 -2082.95E-02 ospht000 +9.9999998E+03 +1.23e-07 -5.4625e+04 -6.258E-00 5.789E+02 1.008E-06 +9.E+03 +1.E-07 -5.e+04 ospht999 -6.E-00 5.E+02 1.e-06 +9E+03 +1e-07 -5E+04 -6e-00 5E+02 1E-06After clicking on the Replace All button, we get the new text :
ospht421 4.52E-02 -1.20E-02 2.98E-03 3.19E-02 -1.93E-02 -6.87E-02 4.16E-02 1.71E-02 -2082.95E-02 ospht000 +9.9999998E+03 +1.23e-07 -5.4625e+04 -6.258E-00 5.789E+02 1.008E-06 +9.E+03 +1.E-07 -5.e+04 ospht999 -6.E-00 5.E+02 1.e-06 +9E+03 +1e-07 -5E+04 -6e-00 5E+02 1E-06Possibly, to get rid of the leading space(s), just use the simple search regex
^ +and leave the replacement zone emptyEt voilà !
Best Regards,
guy038
-
-
Thanks, guy038!
I have never used Notepad++ before, so I am not familiar with how to use it. I have tried your suggestion in the Replace box but it say there are 0 replacements.
Find what: ([±]?\d+(.(\d+)?)?[Ee][±]\d\d){3}
Replace with: \r\n$0
And the Regular Expression box is ticked.I am sure it is just a silly mistake on my part, but I can’t seem to get it working.
Thanks!
Kim -
Hi Kim,
From your reply, it happens that one SPACE should be present, in the search regex, between the opening round bracket and the block
[+-]!Well, I just realized that your numbers could be separated by more than one space characters ! Moreover, your file may, also, contain some tabulation characters. In that case, prefer the search regex below ( The replacement regex is unchanged ):
(\h+[+-]?\d+(\.(\d+)?)?[Ee][+-]\d\d){3}Notes :
-
The
\hregex syntax matches one horizontal blank character, that is to say :-
The normal space character (
\x20) -
The tabulation character (
\x09) -
The No-Break space ( abbreviated NBSP ) (
\xA0)
-
-
As usual, the
+quantifier means one or more times the previous character
So given the test example below, where I put, between numbers, several spaces, in the first line, several tabulations in the second line and a mix of these two, in the third line :
ospht421 4.52E-02 -1.20E-02 2.98E-03 3.19E-02 -1.93E-02 -6.87E-02 4.16E-02 1.71E-02 -2082.95E-02 ospht000 +9.9999998E+03 +1.23e-07 -5.4625e+04 -6.258E-00 5.789E+02 1.008E-06 +9.E+03 +1.E-07 -5.e+04 ospht999 -6.E-00 5.E+02 1.e-06 +9E+03 +1e-07 -5E+04 -6e-00 5E+02 1E-06After clicking on the Replace All button, we get the following text :
ospht421 4.52E-02 -1.20E-02 2.98E-03 3.19E-02 -1.93E-02 -6.87E-02 4.16E-02 1.71E-02 -2082.95E-02 ospht000 +9.9999998E+03 +1.23e-07 -5.4625e+04 -6.258E-00 5.789E+02 1.008E-06 +9.E+03 +1.E-07 -5.e+04 ospht999 -6.E-00 5.E+02 1.e-06 +9E+03 +1e-07 -5E+04 -6e-00 5E+02 1E-06Now, we have to get rid of all the possible blanks characters, at the beginning of the lines and just keep one space character between the different numbers. The following S/R will achieve it, nicely :
Find what :
^\h+|(\h+)Replace with :
(?1 ), with a space after the digit 1Notes :
-
Due to the alternative regex symbol
|, this search regex looks :-
for any non null range of blank characters, that begins the line ( the
^symbol represents the zero length assertion beginning of line ) -
for any subsequent range of these blank characters, between the numbers
-
-
The second part
(\h+), ( the subsequent blanks ), is enclosed in round brackets, and represents the group 1 -
In replacement, we use the conditional form
(?n....):-
which replace the searched regex with the string located after digit
n, till the ending round bracket, if the groupnexists -
which does NO replacement if the group
ndoes NOT exist
-
So :
-
When the first part, of the search regex, matches possible blank characters, at beginning of a line, group 1, does not exist. Therefore, there’s NO replacement
-
When the second part, of the search regex, matches blank characters, before the numbers, group 1 exists. Then they are replaced with the single space, located between the digit 1 and the ending round bracket, of the replacement regex
This time, after performing this second S/R, we obtain the final text below :
ospht421 4.52E-02 -1.20E-02 2.98E-03 3.19E-02 -1.93E-02 -6.87E-02 4.16E-02 1.71E-02 -2082.95E-02 ospht000 +9.9999998E+03 +1.23e-07 -5.4625e+04 -6.258E-00 5.789E+02 1.008E-06 +9.E+03 +1.E-07 -5.e+04 ospht999 -6.E-00 5.E+02 1.e-06 +9E+03 +1e-07 -5E+04 -6e-00 5E+02 1E-06Cheers,
guy038
-
Hello! It looks like you're interested in this conversation, but you don't have an account yet.
Getting fed up of having to scroll through the same posts each visit? When you register for an account, you'll always come back to exactly where you were before, and choose to be notified of new replies (either via email, or push notification). You'll also be able to save bookmarks and upvote posts to show your appreciation to other community members.
With your input, this post could be even better 💗
Register Login