<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[Need to extract csv from text file]]></title><description><![CDATA[<p dir="auto">I need to extract comma delimited data from text files that were created by exporting from Thunderbird. There is no info in the msg. body other than that which I need. The problem is that ALL email header info is included in the exported files.</p>
<p dir="auto">Can this be done in Notepad++?<br />
Thanks</p>
]]></description><link>https://community.notepad-plus-plus.org/topic/11503/need-to-extract-csv-from-text-file</link><generator>RSS for Node</generator><lastBuildDate>Tue, 21 Apr 2026 13:27:00 GMT</lastBuildDate><atom:link href="https://community.notepad-plus-plus.org/topic/11503.rss" rel="self" type="application/rss+xml"/><pubDate>Wed, 23 Mar 2016 19:29:28 GMT</pubDate><ttl>60</ttl><item><title><![CDATA[Reply to Need to extract csv from text file on Fri, 25 Mar 2016 22:58:37 GMT]]></title><description><![CDATA[<p dir="auto">Hi Ray,</p>
<p dir="auto">when posting formatted code or text, indent each line by at least 4 spaces so that it keeps its layout.<br />
The regular expression syntax is documented <a href="http://www.boost.org/doc/libs/1_48_0/libs/regex/doc/html/boost_regex/syntax/perl_syntax.html" rel="nofollow ugc">here</a> for searching and <a href="http://www.boost.org/doc/libs/1_48_0/libs/regex/doc/html/boost_regex/format/boost_format_syntax.html" rel="nofollow ugc">here</a> for replacing.</p>
<p dir="auto">At a high level, it does the following (the linked pages have the details):<br />
( ) = a capturing group<br />
\R = any newline sequence<br />
{6} = repeat the preceding item exactly 6 times<br />
((.*?\R){6}) = match 6 whole lines (the header)<br />
(.*,.*\R) = match a line that contains a comma, with zero or more characters on either side, followed by a newline<br />
(thinking about it now, .+,.+\R would be better, because it requires at least one character before the comma and at least one after it)<br />
(.*,.*\R)* = the added * means any number of such lines, including none<br />
" at the end means the match must end at a double quote, which here sits alone on the last line</p>
<p dir="auto">In the replacement we use only what group \3 matched, so everything else in the match gets deleted.</p>
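<p dir="auto">For readers who want to try the same pattern outside Notepad++, here is a minimal Python sketch of the first replacement. It is an illustration only, not part of the macro: Python's re module does not support \R, so NL is an assumed stand-in for it, and build_pattern, strip_header and the sample text are hypothetical names and data made up for this example.</p>

```python
import re

# Python's re has no \R; this alternation is an assumed stand-in for it.
NL = r"(?:\r\n|\n|\r)"


def build_pattern(header_lines=6):
    # Same shape as ((.*?\R){6})((.*,.*\R)*)" with the line count as a parameter.
    # NL is non-capturing, so group 3 is still the block of "field,value" lines.
    return re.compile(
        "((.*?" + NL + "){" + str(header_lines) + "})((.*,.*" + NL + ')*)"'
    )


def strip_header(text, header_lines=6):
    # Replace each whole match with group 3 only, like \3 in Notepad++.
    return build_pattern(header_lines).sub(r"\3", text)


# Made-up record: 5 header lines, 1 blank line, 2 data lines, closing quote.
sample = "h1\nh2\nh3\nh4\nh5\n\nfirstname,Ray\nlastname,New\n\""
print(strip_header(sample))
```

<p dir="auto">If the header spans a different number of lines, pass that count instead, for example strip_header(text, header_lines=9).</p>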
<p dir="auto">As said, the links provide better and detailed explanation.</p>
<p dir="auto">Cheers<br />
Claudia</p>
]]></description><link>https://community.notepad-plus-plus.org/post/14923</link><guid isPermaLink="true">https://community.notepad-plus-plus.org/post/14923</guid><dc:creator><![CDATA[Claudia Frank]]></dc:creator><pubDate>Fri, 25 Mar 2016 22:58:37 GMT</pubDate></item><item><title><![CDATA[Reply to Need to extract csv from text file on Fri, 25 Mar 2016 17:58:38 GMT]]></title><description><![CDATA[<p dir="auto"><a class="plugin-mentions-user plugin-mentions-a" href="https://community.notepad-plus-plus.org/uid/3662">@Claudia-Frank</a><br />
Brilliant!! It works perfectly. I converted over 300 records and imported to spreadsheet in seconds.</p>
<p dir="auto">I don’t know where the ‘6 lines’ came from. I changed {6} to {9} and that worked.</p>
<p dir="auto">Now I just have to study your regex to understand it. I’m a total noob to Notepad++.</p>
<p dir="auto">What is the process for posting examples, as you did in your response?</p>
<p dir="auto">Thanks so much.<br />
Ray</p>
]]></description><link>https://community.notepad-plus-plus.org/post/14918</link><guid isPermaLink="true">https://community.notepad-plus-plus.org/post/14918</guid><dc:creator><![CDATA[grubbyjeans]]></dc:creator><pubDate>Fri, 25 Mar 2016 17:58:38 GMT</pubDate></item><item><title><![CDATA[Reply to Need to extract csv from text file on Thu, 24 Mar 2016 23:06:24 GMT]]></title><description><![CDATA[<p dir="auto">Hello Ray,</p>
<p dir="auto">I guess the structure got messed up when you copied the example, because in an earlier post<br />
you said that 6 header lines must be deleted. So I assumed the following is the data<br />
that needs to be modified (note the double quote " at the end):</p>
<pre><code>"TEST Acoustic Music Retreat 2017 Registration","Website builder@sitebuilderservice.com","grubbyjeans@yahoo.com",3/24/2016 16:26, ,"Subject:
TEST Acoustic Music Retreat 2017 Registration
From: ""Website"" builder@sitebuilderservice.com
Date: 3/24/2016 4:26 PM
To: grubbyjeans@yahoo.com

firstname,Ray
lastname,New
street1,1234 my street
street2,1234
city,mytown
state,mystate
zipcode,77777
emailaddress,grubbyjeans@yahoo.com
homephone,123456789
workphone,1
instrument1,Guitar
skill_level1,Intermediate
instrument2,Mandolin
skill_level2,Beginner
instrument3,Mountain Dulcimer
skill_level3,Intermediate
"
</code></pre>
<p dir="auto">A single recorded macro with two regex replacements could produce:</p>
<pre><code>Ray,New,1234 my street,1234,mytown,mystate,77777,grubbyjeans@yahoo.com,123456789,1,Guitar,Intermediate,Mandolin,Beginner,Mountain Dulcimer,Intermediate,
</code></pre>
<p dir="auto">This is your goal, isn’t it?</p>
<p dir="auto">If so,<br />
start recording a macro, then<br />
press CTRL+HOME (to move the cursor to the start of the file)<br />
press CTRL+H<br />
select Regular expression as the search mode in the lower pane<br />
put into Find what:</p>
<pre><code>((.*?\R){6})((.*,.*\R)*)"
</code></pre>
<p dir="auto">put into replace with:</p>
<pre><code>\3
</code></pre>
<p dir="auto">press Replace all</p>
<p dir="auto">-&gt; now only the following should be left:</p>
<pre><code>firstname,Ray
lastname,New
street1,1234 my street
street2,1234
city,mytown
state,mystate
zipcode,77777
emailaddress,grubbyjeans@yahoo.com
homephone,123456789
workphone,1
instrument1,Guitar
skill_level1,Intermediate
instrument2,Mandolin
skill_level2,Beginner
instrument3,Mountain Dulcimer
skill_level3,Intermediate
</code></pre>
<p dir="auto">change Find what to:</p>
<pre><code>(.*,(.*)\R)
</code></pre>
<p dir="auto">and</p>
<p dir="auto">replace with:</p>
<pre><code>\2,
</code></pre>
<p dir="auto">press replace all -&gt; you should see the expected result<br />
press close and stop recording.</p>
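<p dir="auto">The second replacement can be sketched in Python in the same spirit. This is a hedged illustration under the assumption that the header has already been removed; NL again stands in for \R, which Python's re module lacks, and flatten_pairs is a hypothetical helper name.</p>

```python
import re

# Assumed stand-in for \R, which Python's re module does not provide.
NL = r"(?:\r\n|\n|\r)"


def flatten_pairs(block):
    # Mirrors find "(.*,(.*)\R)" / replace "\2,":
    # keep the text after the last comma on each line and append a comma.
    return re.sub("(.*,(.*)" + NL + ")", r"\2,", block)


block = "firstname,Ray\nlastname,New\nstreet1,1234 my street\n"
print(flatten_pairs(block))
```

<p dir="auto">Because .* is greedy, group 2 holds only what follows the last comma on a line, so a value that itself contains a comma would lose its first part. The trailing comma in the result matches the expected output shown earlier.</p>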
<p dir="auto">Is this what you expected?</p>
<p dir="auto">Cheers<br />
Claudia</p>
]]></description><link>https://community.notepad-plus-plus.org/post/14897</link><guid isPermaLink="true">https://community.notepad-plus-plus.org/post/14897</guid><dc:creator><![CDATA[Claudia Frank]]></dc:creator><pubDate>Thu, 24 Mar 2016 23:06:24 GMT</pubDate></item><item><title><![CDATA[Reply to Need to extract csv from text file on Thu, 24 Mar 2016 21:46:20 GMT]]></title><description><![CDATA[<p dir="auto"><a class="plugin-mentions-user plugin-mentions-a" href="https://community.notepad-plus-plus.org/uid/3662">@Claudia-Frank</a> said:</p>
<blockquote>
<p dir="auto">Hello Ray,</p>
<p dir="auto">if the fields have unique characteristics, we can search for them.<br />
But I’m still unsure what exactly you want to achieve.</p>
</blockquote>
<p dir="auto">I need to remove the header lines from the messages, leaving the csv data for manipulation.</p>
<blockquote>
<p dir="auto">Could you provide a real example (with sensitive data replaced, of course)<br />
of what you have and how it should look at the end?</p>
</blockquote>
<p dir="auto">I’ve created a macro that converts the columnar data to comma-delimited form, which can be imported into a spreadsheet.<br />
I have screen captures of each.<br />
First, a Thunderbird message exported to ‘Spreadsheet.csv’</p>
<p dir="auto">“TEST Acoustic Music Retreat 2017 Registration”,“Website <a href="mailto:builder@sitebuilderservice.com" rel="nofollow ugc">builder@sitebuilderservice.com</a>”,“<a href="mailto:grubbyjeans@yahoo.com" rel="nofollow ugc">grubbyjeans@yahoo.com</a>”,3/24/2016 16:26, ,"Subject:<br />
TEST Acoustic Music Retreat 2017 Registration<br />
From:<br />
““Website”” <a href="mailto:builder@sitebuilderservice.com" rel="nofollow ugc">builder@sitebuilderservice.com</a><br />
Date:<br />
3/24/2016 4:26 PM<br />
To:<br />
<a href="mailto:grubbyjeans@yahoo.com" rel="nofollow ugc">grubbyjeans@yahoo.com</a></p>
<p dir="auto">firstname,Ray<br />
lastname,New<br />
street1,1234 my street<br />
street2,1234<br />
city,mytown<br />
state,mystate<br />
zipcode,77777<br />
emailaddress,grubbyjeans@yahoo.com<br />
homephone,123456789<br />
workphone,1<br />
instrument1,Guitar<br />
skill_level1,Intermediate<br />
instrument2,Mandolin<br />
skill_level2,Beginner<br />
instrument3,Mountain Dulcimer<br />
skill_level3,Intermediate<br />
"<br />
Second:<br />
Header lines removed manually and conversion macro run, converting to single line csv (no word wrap)<br />
Ray,New,1234 my street,1234,mytown,mystate,77777,grubbyjeans@yahoo.com,123456789,1,Guitar,Intermediate,Mandolin,Beginner,Mountain Dulcimer,Intermediate</p>
<blockquote>
<p dir="auto">Cheers<br />
Claudia</p>
</blockquote>
]]></description><link>https://community.notepad-plus-plus.org/post/14893</link><guid isPermaLink="true">https://community.notepad-plus-plus.org/post/14893</guid><dc:creator><![CDATA[grubbyjeans]]></dc:creator><pubDate>Thu, 24 Mar 2016 21:46:20 GMT</pubDate></item><item><title><![CDATA[Reply to Need to extract csv from text file on Thu, 24 Mar 2016 17:05:26 GMT]]></title><description><![CDATA[<p dir="auto">Hello Ray,</p>
<p dir="auto">if the fields have unique characteristics, we can search for them.<br />
But I’m still unsure what exactly you want to achieve.<br />
Could you provide a real example (with sensitive data replaced, of course)<br />
of what you have and how it should look at the end?</p>
<p dir="auto">Cheers<br />
Claudia</p>
]]></description><link>https://community.notepad-plus-plus.org/post/14885</link><guid isPermaLink="true">https://community.notepad-plus-plus.org/post/14885</guid><dc:creator><![CDATA[Claudia Frank]]></dc:creator><pubDate>Thu, 24 Mar 2016 17:05:26 GMT</pubDate></item><item><title><![CDATA[Reply to Need to extract csv from text file on Thu, 24 Mar 2016 02:12:07 GMT]]></title><description><![CDATA[<p dir="auto">Thanks Claudia<br />
The emails arrive one at a time, each time a registrant completes an online form. The body of each message is formatted as below:<br />
field1,data1<br />
field2,data2<br />
etc.</p>
<p dir="auto">I have the ability to select multiple msgs and export them as a csv file. The problem is that all the header info is also exported for each message. That places 6 lines of text before the data. The only way I’ve found to collect just the data is to highlight and delete the header information for each message, leaving the sets of data stacked on one another in columns as above.</p>
<p dir="auto">I was just hoping for a method of parsing for field names and extracting the data to spreadsheet form.</p>
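<p dir="auto">As an illustration of parsing by field name, here is a minimal Python sketch that turns blocks of field,value lines into spreadsheet rows with one header row. It assumes the email header lines have already been removed and that every message lists the same fields in the same order; pairs_to_row and to_csv are hypothetical names made up for this example.</p>

```python
import csv
import io


def pairs_to_row(block):
    # Split each "field,value" line at the first comma only.
    fields, values = [], []
    for line in block.splitlines():
        if "," not in line:
            continue  # skip blank lines and anything without a comma
        name, value = line.split(",", 1)
        fields.append(name)
        values.append(value)
    return fields, values


def to_csv(blocks):
    # Write the field names once, then one row of values per message block.
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    header_written = False
    for block in blocks:
        fields, values = pairs_to_row(block)
        if not header_written:
            writer.writerow(fields)
            header_written = True
        writer.writerow(values)
    return out.getvalue()


print(to_csv(["firstname,Ray\nlastname,New\n"]))
```

<p dir="auto">Splitting at the first comma keeps values that contain commas intact, and the csv module quotes such values when writing the row.</p>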
<p dir="auto">Any suggestions appreciated<br />
Ray</p>
]]></description><link>https://community.notepad-plus-plus.org/post/14868</link><guid isPermaLink="true">https://community.notepad-plus-plus.org/post/14868</guid><dc:creator><![CDATA[grubbyjeans]]></dc:creator><pubDate>Thu, 24 Mar 2016 02:12:07 GMT</pubDate></item><item><title><![CDATA[Reply to Need to extract csv from text file on Wed, 23 Mar 2016 20:05:57 GMT]]></title><description><![CDATA[<p dir="auto">Hello <a class="plugin-mentions-user plugin-mentions-a" href="https://community.notepad-plus-plus.org/uid/4698">@grubbyjeans</a>,</p>
<p dir="auto">I think so. If you have a csv export I assume that it has a fixed length<br />
meaning the number of commas per line is always the same.<br />
Then you could use the find/replace dialog (regular expressions) to get the<br />
data you want.</p>
<p dir="auto">Cheers<br />
Claudia</p>
]]></description><link>https://community.notepad-plus-plus.org/post/14862</link><guid isPermaLink="true">https://community.notepad-plus-plus.org/post/14862</guid><dc:creator><![CDATA[Claudia Frank]]></dc:creator><pubDate>Wed, 23 Mar 2016 20:05:57 GMT</pubDate></item></channel></rss>