• Login
Community
  • Login

Organizing data

Scheduled Pinned Locked Moved Help wanted · · · – – – · · ·
4 Posts 2 Posters 3.6k Views
Loading More Posts
  • Oldest to Newest
  • Newest to Oldest
  • Most Votes
Reply
  • Reply as topic
Log in to reply
This topic has been deleted. Only users with topic management privileges can see it.
  • K
    Kimberly Plomp
    last edited by Kimberly Plomp May 16, 2016, 5:22 AM May 16, 2016, 5:21 AM

    Hello, I have data that is in long rows, but I need to make different columns while keeping the rows in the certain order.

    For example, I have this:
    ospht421 4.52E-02 -1.20E-02 2.98E-03 3.19E-02 -1.93E-02 -6.87E-02 4.16E-02 1.71E-02 -2087 2.95E-02

    And I need it to look like this:
    ospht421
    4.52E-02 -1.20E-02 2.98E-03
    3.19E-02 -1.93E-02 -6.87E-02
    4.16E-02 1.71E-02 -2082.95E-02

    I have a lot of data and it takes forever to do it by hand. Any advice would be appreciated!

    Thanks,
    Kim

    1 Reply Last reply Reply Quote 0
    • G
      guy038
      last edited by guy038 May 16, 2016, 11:55 AM May 16, 2016, 11:52 AM

      Hello Kimberly Plomp,

      Once more, this kind of modification of text can be easily achieved with regular expressions !

      At first sight, your text seems to contain a list of numbers, in the classical E scientific notation ! Am I right ?

      I supposed so. Therefore, I, first, tried to build a regex, in order to match such numbers. Taking in account, all the different syntaxes for these numbers, with an example below :

      +9.9999998E+03
      +1.23e-07
      -5.4625e+04
      -6.258E-00
      5.789E+02
      1.008E-06
      
      +9.E+03
      +1.E-07
      -5.e+04
      -6.E-00
      5.E+02
      1.e-06
      
      +9E+03
      +1e-07
      -5E+04
      -6e-00
      5E+02
      1E-06
      

      I got this regex : [+-]?\d+(\.(\d+)?)?[Ee][+-]\d\d

      • The part [+-]? matches an optional sign ( remain that the regex quantifier ? means 0 or 1 time )

      • The part \d+ matches the mantissa part, before the dot ( The regex quantifier + means 1 or more time(s) )

      • The part (\.(\d+)?)? represents the optional fractional part of a number, itself composed of a dot \. ( which must be escaped as it’s a meta character in regexes ) and an optional digits (\d+)?, after the dot

      • The part [Ee] matches the E notation letter, whatever the case

      • The part [+-] matches the mandatory sign

      • The part \d\d represents the two digits of the exponent

      Right. Now, from your text, below :

      ospht421 4.52E-02 -1.20E-02 2.98E-03 3.19E-02 -1.93E-02 -6.87E-02 4.16E-02 1.71E-02 -2082.95E-02
      

      We notice that any E notation number is preceded by a space. So, as you expect to split on several lines, every 3 numbers, we can build the final search regex ( [+-]?\d+(\.(\d+)?)?[Ee][+-]\d\d){3}, which will matches 3 numbers

      • Enclosed in round brackets, it begins with a space and ends with the {3} quantifier

      • The replacement part \r\n$0 will replace this range of numbers by an Windows End of Line characters \r\n, followed by the entire regex matched $0

      • Don’t forget, in the Replace dialog, to set the Regular expression search mode !

      So , from the test example, below

      ospht421 4.52E-02 -1.20E-02 2.98E-03 3.19E-02 -1.93E-02 -6.87E-02 4.16E-02 1.71E-02 -2082.95E-02
      
      ospht000 +9.9999998E+03 +1.23e-07 -5.4625e+04 -6.258E-00 5.789E+02 1.008E-06 +9.E+03 +1.E-07 -5.e+04
      
      ospht999 -6.E-00 5.E+02 1.e-06 +9E+03 +1e-07 -5E+04 -6e-00 5E+02 1E-06
      

      After clicking on the Replace All button, we get the new text :

      ospht421
       4.52E-02 -1.20E-02 2.98E-03
       3.19E-02 -1.93E-02 -6.87E-02
       4.16E-02 1.71E-02 -2082.95E-02
      
      ospht000
       +9.9999998E+03 +1.23e-07 -5.4625e+04
       -6.258E-00 5.789E+02 1.008E-06
       +9.E+03 +1.E-07 -5.e+04
      
      ospht999
       -6.E-00 5.E+02 1.e-06
       +9E+03 +1e-07 -5E+04
       -6e-00 5E+02 1E-06
      

      Possibly, to get rid of the leading space(s), just use the simple search regex ^ + and leave the replacement zone empty

      Et voilà !

      Best Regards,

      guy038

      1 Reply Last reply Reply Quote 1
      • K
        Kimberly Plomp
        last edited by May 17, 2016, 4:50 PM

        Thanks, guy038!

        I have never used Notepad++ before, so I am not familiar with how to use it. I have tried your suggestion in the Replace box but it say there are 0 replacements.

        Find what: ([±]?\d+(.(\d+)?)?[Ee][±]\d\d){3}
        Replace with: \r\n$0
        And the Regular Expression box is ticked.

        I am sure it is just a silly mistake on my part, but I can’t seem to get it working.

        Thanks!
        Kim

        1 Reply Last reply Reply Quote 0
        • G
          guy038
          last edited by guy038 May 17, 2016, 7:58 PM May 17, 2016, 7:48 PM

          Hi Kim,

          From your reply, it happens that one SPACE should be present, in the search regex, between the opening round bracket and the block [+-] !

          Well, I just realized that your numbers could be separated by more than one space characters ! Moreover, your file may, also, contain some tabulation characters. In that case, prefer the search regex below ( The replacement regex is unchanged ):

          (\h+[+-]?\d+(\.(\d+)?)?[Ee][+-]\d\d){3}

          Notes :

          • The \h regex syntax matches one horizontal blank character, that is to say :

            • The normal space character ( \x20 )

            • The tabulation character ( \x09 )

            • The No-Break space ( abbreviated NBSP ) ( \xA0 )

          • As usual, the + quantifier means one or more times the previous character


          So given the test example below, where I put, between numbers, several spaces, in the first line, several tabulations in the second line and a mix of these two, in the third line :

              ospht421 4.52E-02     -1.20E-02 2.98E-03          3.19E-02             -1.93E-02 -6.87E-02 4.16E-02 1.71E-02 -2082.95E-02
          
          ospht000	+9.9999998E+03	+1.23e-07	-5.4625e+04	-6.258E-00	5.789E+02	1.008E-06	+9.E+03	+1.E-07	-5.e+04
          
                  ospht999         		-6.E-00 5.E+02			        1.e-06 +9E+03 +1e-07 -5E+04    		    	-6e-00 5E+02 1E-06
          

          After clicking on the Replace All button, we get the following text :

                  ospht421
           4.52E-02     -1.20E-02 2.98E-03
                    3.19E-02             -1.93E-02 -6.87E-02
           4.16E-02 1.71E-02 -2082.95E-02
              
              ospht000
          	+9.9999998E+03	+1.23e-07	-5.4625e+04
          	-6.258E-00	5.789E+02	1.008E-06
          	+9.E+03	+1.E-07	-5.e+04
              
                      ospht999
                   		-6.E-00 5.E+02			        1.e-06
           +9E+03 +1e-07 -5E+04
              		    	-6e-00 5E+02 1E-06
          

          Now, we have to get rid of all the possible blanks characters, at the beginning of the lines and just keep one space character between the different numbers. The following S/R will achieve it, nicely :

          Find what : ^\h+|(\h+)

          Replace with : (?1 ) , with a space after the digit 1

          Notes :

          • Due to the alternative regex symbol |, this search regex looks :

            • for any non null range of blank characters, that begins the line ( the ^ symbol represents the zero length assertion beginning of line )

            • for any subsequent range of these blank characters, between the numbers

          • The second part (\h+), ( the subsequent blanks ), is enclosed in round brackets, and represents the group 1

          • In replacement, we use the conditional form (?n....) :

            • which replace the searched regex with the string located after digit n , till the ending round bracket, if the group n exists

            • which does NO replacement if the group n does NOT exist

          So :

          • When the first part, of the search regex, matches possible blank characters, at beginning of a line, group 1, does not exist. Therefore, there’s NO replacement

          • When the second part, of the search regex, matches blank characters, before the numbers, group 1 exists. Then they are replaced with the single space, located between the digit 1 and the ending round bracket, of the replacement regex

          This time, after performing this second S/R, we obtain the final text below :

          ospht421
          4.52E-02 -1.20E-02 2.98E-03
          3.19E-02 -1.93E-02 -6.87E-02
          4.16E-02 1.71E-02 -2082.95E-02
          
          ospht000
          +9.9999998E+03 +1.23e-07 -5.4625e+04
          -6.258E-00 5.789E+02 1.008E-06
          +9.E+03 +1.E-07 -5.e+04
          
          ospht999
          -6.E-00 5.E+02 1.e-06
          +9E+03 +1e-07 -5E+04
          -6e-00 5E+02 1E-06
          

          Cheers,

          guy038

          1 Reply Last reply Reply Quote 1
          1 out of 4
          • First post
            1/4
            Last post
          The Community of users of the Notepad++ text editor.
          Powered by NodeBB | Contributors