Community
    • Login

    Organizing data

    Scheduled Pinned Locked Moved Help wanted · · · – – – · · ·
    4 Posts 2 Posters 3.6k Views
    Loading More Posts
    • Oldest to Newest
    • Newest to Oldest
    • Most Votes
    Reply
    • Reply as topic
    Log in to reply
    This topic has been deleted. Only users with topic management privileges can see it.
    • Kimberly PlompK
      Kimberly Plomp
      last edited by Kimberly Plomp

      Hello, I have data that is in long rows, but I need to make different columns while keeping the rows in the certain order.

      For example, I have this:
      ospht421 4.52E-02 -1.20E-02 2.98E-03 3.19E-02 -1.93E-02 -6.87E-02 4.16E-02 1.71E-02 -2087 2.95E-02

      And I need it to look like this:
      ospht421
      4.52E-02 -1.20E-02 2.98E-03
      3.19E-02 -1.93E-02 -6.87E-02
      4.16E-02 1.71E-02 -2082.95E-02

      I have a lot of data and it takes forever to do it by hand. Any advice would be appreciated!

      Thanks,
      Kim

      1 Reply Last reply Reply Quote 0
      • guy038G
        guy038
        last edited by guy038

        Hello Kimberly Plomp,

        Once more, this kind of modification of text can be easily achieved with regular expressions !

        At first sight, your text seems to contain a list of numbers, in the classical E scientific notation ! Am I right ?

        I supposed so. Therefore, I, first, tried to build a regex, in order to match such numbers. Taking in account, all the different syntaxes for these numbers, with an example below :

        +9.9999998E+03
        +1.23e-07
        -5.4625e+04
        -6.258E-00
        5.789E+02
        1.008E-06
        
        +9.E+03
        +1.E-07
        -5.e+04
        -6.E-00
        5.E+02
        1.e-06
        
        +9E+03
        +1e-07
        -5E+04
        -6e-00
        5E+02
        1E-06
        

        I got this regex : [+-]?\d+(\.(\d+)?)?[Ee][+-]\d\d

        • The part [+-]? matches an optional sign ( remain that the regex quantifier ? means 0 or 1 time )

        • The part \d+ matches the mantissa part, before the dot ( The regex quantifier + means 1 or more time(s) )

        • The part (\.(\d+)?)? represents the optional fractional part of a number, itself composed of a dot \. ( which must be escaped as it’s a meta character in regexes ) and an optional digits (\d+)?, after the dot

        • The part [Ee] matches the E notation letter, whatever the case

        • The part [+-] matches the mandatory sign

        • The part \d\d represents the two digits of the exponent

        Right. Now, from your text, below :

        ospht421 4.52E-02 -1.20E-02 2.98E-03 3.19E-02 -1.93E-02 -6.87E-02 4.16E-02 1.71E-02 -2082.95E-02
        

        We notice that any E notation number is preceded by a space. So, as you expect to split on several lines, every 3 numbers, we can build the final search regex ( [+-]?\d+(\.(\d+)?)?[Ee][+-]\d\d){3}, which will matches 3 numbers

        • Enclosed in round brackets, it begins with a space and ends with the {3} quantifier

        • The replacement part \r\n$0 will replace this range of numbers by an Windows End of Line characters \r\n, followed by the entire regex matched $0

        • Don’t forget, in the Replace dialog, to set the Regular expression search mode !

        So , from the test example, below

        ospht421 4.52E-02 -1.20E-02 2.98E-03 3.19E-02 -1.93E-02 -6.87E-02 4.16E-02 1.71E-02 -2082.95E-02
        
        ospht000 +9.9999998E+03 +1.23e-07 -5.4625e+04 -6.258E-00 5.789E+02 1.008E-06 +9.E+03 +1.E-07 -5.e+04
        
        ospht999 -6.E-00 5.E+02 1.e-06 +9E+03 +1e-07 -5E+04 -6e-00 5E+02 1E-06
        

        After clicking on the Replace All button, we get the new text :

        ospht421
         4.52E-02 -1.20E-02 2.98E-03
         3.19E-02 -1.93E-02 -6.87E-02
         4.16E-02 1.71E-02 -2082.95E-02
        
        ospht000
         +9.9999998E+03 +1.23e-07 -5.4625e+04
         -6.258E-00 5.789E+02 1.008E-06
         +9.E+03 +1.E-07 -5.e+04
        
        ospht999
         -6.E-00 5.E+02 1.e-06
         +9E+03 +1e-07 -5E+04
         -6e-00 5E+02 1E-06
        

        Possibly, to get rid of the leading space(s), just use the simple search regex ^ + and leave the replacement zone empty

        Et voilà !

        Best Regards,

        guy038

        1 Reply Last reply Reply Quote 1
        • Kimberly PlompK
          Kimberly Plomp
          last edited by

          Thanks, guy038!

          I have never used Notepad++ before, so I am not familiar with how to use it. I have tried your suggestion in the Replace box but it say there are 0 replacements.

          Find what: ([±]?\d+(.(\d+)?)?[Ee][±]\d\d){3}
          Replace with: \r\n$0
          And the Regular Expression box is ticked.

          I am sure it is just a silly mistake on my part, but I can’t seem to get it working.

          Thanks!
          Kim

          1 Reply Last reply Reply Quote 0
          • guy038G
            guy038
            last edited by guy038

            Hi Kim,

            From your reply, it happens that one SPACE should be present, in the search regex, between the opening round bracket and the block [+-] !

            Well, I just realized that your numbers could be separated by more than one space characters ! Moreover, your file may, also, contain some tabulation characters. In that case, prefer the search regex below ( The replacement regex is unchanged ):

            (\h+[+-]?\d+(\.(\d+)?)?[Ee][+-]\d\d){3}

            Notes :

            • The \h regex syntax matches one horizontal blank character, that is to say :

              • The normal space character ( \x20 )

              • The tabulation character ( \x09 )

              • The No-Break space ( abbreviated NBSP ) ( \xA0 )

            • As usual, the + quantifier means one or more times the previous character


            So given the test example below, where I put, between numbers, several spaces, in the first line, several tabulations in the second line and a mix of these two, in the third line :

                ospht421 4.52E-02     -1.20E-02 2.98E-03          3.19E-02             -1.93E-02 -6.87E-02 4.16E-02 1.71E-02 -2082.95E-02
            
            ospht000	+9.9999998E+03	+1.23e-07	-5.4625e+04	-6.258E-00	5.789E+02	1.008E-06	+9.E+03	+1.E-07	-5.e+04
            
                    ospht999         		-6.E-00 5.E+02			        1.e-06 +9E+03 +1e-07 -5E+04    		    	-6e-00 5E+02 1E-06
            

            After clicking on the Replace All button, we get the following text :

                    ospht421
             4.52E-02     -1.20E-02 2.98E-03
                      3.19E-02             -1.93E-02 -6.87E-02
             4.16E-02 1.71E-02 -2082.95E-02
                
                ospht000
            	+9.9999998E+03	+1.23e-07	-5.4625e+04
            	-6.258E-00	5.789E+02	1.008E-06
            	+9.E+03	+1.E-07	-5.e+04
                
                        ospht999
                     		-6.E-00 5.E+02			        1.e-06
             +9E+03 +1e-07 -5E+04
                		    	-6e-00 5E+02 1E-06
            

            Now, we have to get rid of all the possible blanks characters, at the beginning of the lines and just keep one space character between the different numbers. The following S/R will achieve it, nicely :

            Find what : ^\h+|(\h+)

            Replace with : (?1 ) , with a space after the digit 1

            Notes :

            • Due to the alternative regex symbol |, this search regex looks :

              • for any non null range of blank characters, that begins the line ( the ^ symbol represents the zero length assertion beginning of line )

              • for any subsequent range of these blank characters, between the numbers

            • The second part (\h+), ( the subsequent blanks ), is enclosed in round brackets, and represents the group 1

            • In replacement, we use the conditional form (?n....) :

              • which replace the searched regex with the string located after digit n , till the ending round bracket, if the group n exists

              • which does NO replacement if the group n does NOT exist

            So :

            • When the first part, of the search regex, matches possible blank characters, at beginning of a line, group 1, does not exist. Therefore, there’s NO replacement

            • When the second part, of the search regex, matches blank characters, before the numbers, group 1 exists. Then they are replaced with the single space, located between the digit 1 and the ending round bracket, of the replacement regex

            This time, after performing this second S/R, we obtain the final text below :

            ospht421
            4.52E-02 -1.20E-02 2.98E-03
            3.19E-02 -1.93E-02 -6.87E-02
            4.16E-02 1.71E-02 -2082.95E-02
            
            ospht000
            +9.9999998E+03 +1.23e-07 -5.4625e+04
            -6.258E-00 5.789E+02 1.008E-06
            +9.E+03 +1.E-07 -5.e+04
            
            ospht999
            -6.E-00 5.E+02 1.e-06
            +9E+03 +1e-07 -5E+04
            -6e-00 5E+02 1E-06
            

            Cheers,

            guy038

            1 Reply Last reply Reply Quote 1
            • First post
              Last post
            The Community of users of the Notepad++ text editor.
            Powered by NodeBB | Contributors