Organizing data

Kimberly Plomp

Hello, I have data that is in long rows, but I need to make different columns while keeping the rows in the certain order.

For example, I have this:
ospht421 4.52E-02 -1.20E-02 2.98E-03 3.19E-02 -1.93E-02 -6.87E-02 4.16E-02 1.71E-02 -2087 2.95E-02

And I need it to look like this:
ospht421
4.52E-02 -1.20E-02 2.98E-03
3.19E-02 -1.93E-02 -6.87E-02
4.16E-02 1.71E-02 -2082.95E-02

I have a lot of data and it takes forever to do it by hand. Any advice would be appreciated!

Thanks,
Kim

guy038

Hello Kimberly Plomp,

Once more, this kind of modification of text can be easily achieved with regular expressions !

At first sight, your text seems to contain a list of numbers, in the classical E scientific notation ! Am I right ?

I supposed so. Therefore, I, first, tried to build a regex, in order to match such numbers. Taking in account, all the different syntaxes for these numbers, with an example below :

+9.9999998E+03
+1.23e-07
-5.4625e+04
-6.258E-00
5.789E+02
1.008E-06

+9.E+03
+1.E-07
-5.e+04
-6.E-00
5.E+02
1.e-06

+9E+03
+1e-07
-5E+04
-6e-00
5E+02
1E-06

I got this regex : [+-]?\d+(\.(\d+)?)?[Ee][+-]\d\d

The part [+-]? matches an optional sign ( remain that the regex quantifier ? means 0 or 1 time )
The part \d+ matches the mantissa part, before the dot ( The regex quantifier + means 1 or more time(s) )
The part (\.(\d+)?)? represents the optional fractional part of a number, itself composed of a dot \. ( which must be escaped as it’s a meta character in regexes ) and an optional digits (\d+)?, after the dot
The part [Ee] matches the E notation letter, whatever the case
The part [+-] matches the mandatory sign
The part \d\d represents the two digits of the exponent

Right. Now, from your text, below :

ospht421 4.52E-02 -1.20E-02 2.98E-03 3.19E-02 -1.93E-02 -6.87E-02 4.16E-02 1.71E-02 -2082.95E-02

We notice that any E notation number is preceded by a space. So, as you expect to split on several lines, every 3 numbers, we can build the final search regex ( [+-]?\d+(\.(\d+)?)?[Ee][+-]\d\d){3}, which will matches 3 numbers

Enclosed in round brackets, it begins with a space and ends with the {3} quantifier
The replacement part \r\n$0 will replace this range of numbers by an Windows End of Line characters \r\n, followed by the entire regex matched $0
Don’t forget, in the Replace dialog, to set the Regular expression search mode !

So , from the test example, below

ospht421 4.52E-02 -1.20E-02 2.98E-03 3.19E-02 -1.93E-02 -6.87E-02 4.16E-02 1.71E-02 -2082.95E-02

ospht000 +9.9999998E+03 +1.23e-07 -5.4625e+04 -6.258E-00 5.789E+02 1.008E-06 +9.E+03 +1.E-07 -5.e+04

ospht999 -6.E-00 5.E+02 1.e-06 +9E+03 +1e-07 -5E+04 -6e-00 5E+02 1E-06

After clicking on the Replace All button, we get the new text :

ospht421
 4.52E-02 -1.20E-02 2.98E-03
 3.19E-02 -1.93E-02 -6.87E-02
 4.16E-02 1.71E-02 -2082.95E-02

ospht000
 +9.9999998E+03 +1.23e-07 -5.4625e+04
 -6.258E-00 5.789E+02 1.008E-06
 +9.E+03 +1.E-07 -5.e+04

ospht999
 -6.E-00 5.E+02 1.e-06
 +9E+03 +1e-07 -5E+04
 -6e-00 5E+02 1E-06

Possibly, to get rid of the leading space(s), just use the simple search regex ^ + and leave the replacement zone empty

Et voilà !

Best Regards,

guy038

Kimberly Plomp

Thanks, guy038!

I have never used Notepad++ before, so I am not familiar with how to use it. I have tried your suggestion in the Replace box but it say there are 0 replacements.

Find what: ([±]?\d+(.(\d+)?)?[Ee][±]\d\d){3}
Replace with: \r\n$0
And the Regular Expression box is ticked.

I am sure it is just a silly mistake on my part, but I can’t seem to get it working.

Thanks!
Kim

guy038

Hi Kim,

From your reply, it happens that one SPACE should be present, in the search regex, between the opening round bracket and the block [+-] !

Well, I just realized that your numbers could be separated by more than one space characters ! Moreover, your file may, also, contain some tabulation characters. In that case, prefer the search regex below ( The replacement regex is unchanged ):

(\h+[+-]?\d+(\.(\d+)?)?[Ee][+-]\d\d){3}

Notes :

The \h regex syntax matches one horizontal blank character, that is to say :
- The normal space character ( \x20 )
- The tabulation character ( \x09 )
- The No-Break space ( abbreviated NBSP ) ( \xA0 )
As usual, the + quantifier means one or more times the previous character

So given the test example below, where I put, between numbers, several spaces, in the first line, several tabulations in the second line and a mix of these two, in the third line :

    ospht421 4.52E-02     -1.20E-02 2.98E-03          3.19E-02             -1.93E-02 -6.87E-02 4.16E-02 1.71E-02 -2082.95E-02

ospht000	+9.9999998E+03	+1.23e-07	-5.4625e+04	-6.258E-00	5.789E+02	1.008E-06	+9.E+03	+1.E-07	-5.e+04

        ospht999         		-6.E-00 5.E+02			        1.e-06 +9E+03 +1e-07 -5E+04    		    	-6e-00 5E+02 1E-06

After clicking on the Replace All button, we get the following text :

        ospht421
 4.52E-02     -1.20E-02 2.98E-03
          3.19E-02             -1.93E-02 -6.87E-02
 4.16E-02 1.71E-02 -2082.95E-02
    
    ospht000
	+9.9999998E+03	+1.23e-07	-5.4625e+04
	-6.258E-00	5.789E+02	1.008E-06
	+9.E+03	+1.E-07	-5.e+04
    
            ospht999
         		-6.E-00 5.E+02			        1.e-06
 +9E+03 +1e-07 -5E+04
    		    	-6e-00 5E+02 1E-06

Now, we have to get rid of all the possible blanks characters, at the beginning of the lines and just keep one space character between the different numbers. The following S/R will achieve it, nicely :

Find what : ^\h+|(\h+)

Replace with : (?1 ) , with a space after the digit 1

Notes :

Due to the alternative regex symbol |, this search regex looks :
- for any non null range of blank characters, that begins the line ( the ^ symbol represents the zero length assertion beginning of line )
- for any subsequent range of these blank characters, between the numbers
The second part (\h+), ( the subsequent blanks ), is enclosed in round brackets, and represents the group 1
In replacement, we use the conditional form (?n....) :
- which replace the searched regex with the string located after digit n , till the ending round bracket, if the group n exists
- which does NO replacement if the group n does NOT exist

So :

When the first part, of the search regex, matches possible blank characters, at beginning of a line, group 1, does not exist. Therefore, there’s NO replacement
When the second part, of the search regex, matches blank characters, before the numbers, group 1 exists. Then they are replaced with the single space, located between the digit 1 and the ending round bracket, of the replacement regex

This time, after performing this second S/R, we obtain the final text below :

ospht421
4.52E-02 -1.20E-02 2.98E-03
3.19E-02 -1.93E-02 -6.87E-02
4.16E-02 1.71E-02 -2082.95E-02

ospht000
+9.9999998E+03 +1.23e-07 -5.4625e+04
-6.258E-00 5.789E+02 1.008E-06
+9.E+03 +1.E-07 -5.e+04

ospht999
-6.E-00 5.E+02 1.e-06
+9E+03 +1e-07 -5E+04
-6e-00 5E+02 1E-06

Cheers,

guy038