Community
    • Login

    Need to extract csv from text file

    Scheduled Pinned Locked Moved Help wanted · · · – – – · · ·
    8 Posts 2 Posters 7.0k Views
    Loading More Posts
    • Oldest to Newest
    • Newest to Oldest
    • Most Votes
    Reply
    • Reply as topic
    Log in to reply
    This topic has been deleted. Only users with topic management privileges can see it.
    • grubbyjeansG
      grubbyjeans
      last edited by

      I need to extract comma delimited data from text files that were created by exporting from Thunderbird. There is no info in the msg. body other than that which I need. The problem is that ALL email header info is included in the exported files.

      Can this be done in Notepad++
      Thanks

      Claudia FrankC 1 Reply Last reply Reply Quote 0
      • Claudia FrankC
        Claudia Frank @grubbyjeans
        last edited by

        Hello @grubbyjeans,

        I think so. If you have a csv export I assume that it has a fixed length
        meaning the number of commas per line is always the same.
        Then you could use find/replace dialog (regular expressions) to get the
        data you want.

        Cheers
        Claudia

        1 Reply Last reply Reply Quote 0
        • grubbyjeansG
          grubbyjeans
          last edited by

          Thanks Claudia
          The emails come singularly each time a registrant completes an online form. The body of the message is formatted as below:
          field1,data1
          field2,data2
          etc.

          I have the ability to select multiple msgs and export them as a csv file. The problem is that all the header info is also exported for each message. That places 6 lines of text before the data. The only way I’ve found to collect just the data is to highlight and delete the header information for each message, leaving the sets of data stacked on one another in columns as above.

          I was just hoping for a method of parsing for field names and extracting the data to spreadsheet form.

          Any suggestions appreciated
          Ray

          Claudia FrankC 1 Reply Last reply Reply Quote 0
          • Claudia FrankC
            Claudia Frank @grubbyjeans
            last edited by

            Hello Ray,

            if the field do have unique characteristics we can search for it.
            But I’m still unsure what you exactly want to achieve?
            Could you provide a real (sensitive data replaced of course) example
            of what you have and how it should look at the end?

            Cheers
            Claudia

            grubbyjeansG 1 Reply Last reply Reply Quote 0
            • grubbyjeansG
              grubbyjeans @Claudia Frank
              last edited by

              @Claudia-Frank said:

              Hello Ray,

              if the field do have unique characteristics we can search for it.
              But I’m still unsure what you exactly want to achieve?

              **I need to remove the header lines from the messages, leaving the csv data for manipulation.
              **

              Could you provide a real (sensitive data replaced of course) example
              of what you have and how it should look at the end?

              **I’ve created a macro that will convert the columnar data to comma delimited form which can be imported to spreadsheet.
              I have screen captures of each.
              First, a Thunderbird message exported to ‘Spreadsheet.csv’

              “TEST Acoustic Music Retreat 2017 Registration”,“Website builder@sitebuilderservice.com”,“grubbyjeans@yahoo.com”,3/24/2016 16:26, ,"Subject:
              TEST Acoustic Music Retreat 2017 Registration
              From:
              ““Website”” builder@sitebuilderservice.com
              Date:
              3/24/2016 4:26 PM
              To:
              grubbyjeans@yahoo.com

              firstname,Ray
              lastname,New
              street1,1234 my street
              street2,1234
              city,mytown
              state,mystate
              zipcode,77777
              emailaddress,grubbyjeans@yahoo.com
              homephone,123456789
              workphone,1
              instrument1,Guitar
              skill_level1,Intermediate
              instrument2,Mandolin
              skill_level2,Beginner
              instrument3,Mountain Dulcimer
              skill_level3,Intermediate
              "
              Second:
              Header lines removed manually and conversion macro run, converting to single line csv (no word wrap)
              Ray,New,1234 my street,1234,mytown,mystate,77777,grubbyjeans@yahoo.com,123456789,1,Guitar,Intermediate,Mandolin,Beginner,Mountain Dulcimer,Intermediate

              **

              Cheers
              Claudia

              1 Reply Last reply Reply Quote 0
              • Claudia FrankC
                Claudia Frank
                last edited by

                Hello Ray,

                I guess by copying the example you messed up the structure, because in earlier post
                you said 6 lines (header) must be deleted. So I assumed the following data is the one
                which need to be modified (Note the double quote " on the end)

                “TEST Acoustic Music Retreat 2017 Registration”,"Website builder@sitebuilderservice.com“,”grubbyjeans@yahoo.com",3/24/2016 16:26, ,"Subject:
                TEST Acoustic Music Retreat 2017 Registration
                From: ““Website”” builder@sitebuilderservice.com
                Date: 3/24/2016 4:26 PM
                To: grubbyjeans@yahoo.com
                
                firstname,Ray
                lastname,New
                street1,1234 my street
                street2,1234
                city,mytown
                state,mystate
                zipcode,77777
                emailaddress,grubbyjeans@yahoo.com
                homephone,123456789
                workphone,1
                instrument1,Guitar
                skill_level1,Intermediate
                instrument2,Mandolin
                skill_level2,Beginner
                instrument3,Mountain Dulcimer
                skill_level3,Intermediate
                "
                

                One macro recording with 2 regex could result in

                Ray,New,1234 my street,1234,mytown,mystate,77777,grubbyjeans@yahoo.com,123456789,1,Guitar,Intermediate,Mandolin,Beginner,Mountain Dulcimer,Intermediate,
                

                This is your goal, isn’t it.

                If so,
                record macro and
                press CTRL+HOME (to get cursor located at first position)
                press CTRL+H
                click regular expression in lower pane
                put into find what:

                ((.*?\R){6})((.*,.*\R)*)"
                

                put into replace with:

                \3
                

                press Replace all

                -> now we should only have left

                firstname,Ray
                lastname,New
                street1,1234 my street
                street2,1234
                city,mytown
                state,mystate
                zipcode,77777
                emailaddress,grubbyjeans@yahoo.com
                homephone,123456789
                workphone,1
                instrument1,Guitar
                skill_level1,Intermediate
                instrument2,Mandolin
                skill_level2,Beginner
                instrument3,Mountain Dulcimer
                skill_level3,Intermediate
                

                change find what with:

                (.*,(.*)\R)
                

                and

                replace with:

                \2,
                

                press replace all -> you should see the expected result
                press close and stop recording.

                Is this what you expected?

                Cheers
                Claudia

                grubbyjeansG 1 Reply Last reply Reply Quote 0
                • grubbyjeansG
                  grubbyjeans @Claudia Frank
                  last edited by

                  @Claudia-Frank
                  Brilliant!! It works perfectly. I converted over 300 records and imported to spreadsheet in seconds.

                  I don’t know where the ‘6 lines’ came from. I changed {6} to {9} and that worked.

                  Now I just have to analyze your coding for understanding. I’m a total noobe to Notepad++.

                  What is the process for posting examples, as you did in your response?

                  Thanks so much.
                  Ray

                  1 Reply Last reply Reply Quote 0
                  • Claudia FrankC
                    Claudia Frank
                    last edited by

                    Hi Ray,

                    when posting formatted code/text you need to indent by at least 4 spaces, then it will keep its layout.
                    The regular expression is explained, for searching here and for replacing here

                    What it does is the following: (high level - detailed infos on the mentioned website)
                    ( ) = this is a caturing group
                    \R = newline chars
                    {6} = should be repeated exactly 6 times
                    (.*?\R){6}) = basically means match 6 lines
                    (.*,.*\R) = match lines which have or not chars followed by a comma and again have chars or not followed by newline
                    (now I’m thinking .+,.+\R would be better because this means lines having atleast one char followed by comma and again at least one char followed by newline)
                    (.*,.*\R)* = the added * means it could be any number of lines or no line at all
                    " at the end basically means, last line needs to be the double quote

                    In replace, as we only use what was matched in \3, anything else get deleted.

                    As said, the links provide better and detailed explanation.

                    Cheers
                    Claudia

                    1 Reply Last reply Reply Quote 0
                    • First post
                      Last post
                    The Community of users of the Notepad++ text editor.
                    Powered by NodeBB | Contributors