Convert multiline data to CSV - Regex not working
-
Hi,
I thought that this would be straight forward, but Notepad++ seems to not like something that I am doing, so I am looking for suggestions.
Assume I have the following simple file:value one value two value three value four
I want to convert this into a single line of csv, e.g.:
"value one","value two","value three","value four"
I’ve set my replace dialog as:
Find what: ^(.+)\R
Replace with: “\1”,
Wrap around is ON, and Search Mode is RegexI do a “Replace All” and I get:
"value one",value two "value three",value four
I’m assuming that the Regex is evaluating the first line, running the replace (which now places line 2 as an append to line 1), but now considers that line 1 is “done”, and moves onto line 2 (which is the old line 3 before the replace joined lines 1 and 2).
I can get around this in more steps if I use the regex to add the quotes and complete each line with a comma, and then run a new find and replace using “extended” mode and just replace all “\r\n” with nothing, but it feels like I should be able to do this all in one step with regex?
-
You are on the right track. I think the idea you haven’t yet grasped is that regex will do whatever it is asked to do as a single
stepiteration. So just build that singlestepiteration with more subexpressions (your initial Find What can be regarded as a subexpression in the new regex). Then for each set of brackets you would use a new group number, so \1, \2 and \3 if doing 4 lines.Terry
PS, try Find What:
^(.+)\R(.+)\R(.+)\R(.+)\R?
Replace With:"\1","\2","\3","\4"\r\n
Sorry, I adjusted my solution a couple of times as didn’t fully read your regex replacement.
You may still need to adjust based on your real data, but this would be a good starting point.
PPS I changed the word step for iteration, I couldn’t think of the more correct word at the time and step certainly isn’t a good simile.
-
@Terry-R When you say “as a single step” you mean per input line?
My input file has an unknown number of line (strictly speaking “variable” number of lines, but either way, without counting them and then adjusting the regex per file I wouldn’t know).I can see that a pattern like this would work for 4 lines:
Find:
(.+)\R(.+)\R(.+)\R(.+)\R
Replace:
“\1”,“\2”,“\3”,“\4”,but that seems excessively repetitive, and could be massive PITA if I had 100+ lines (in this case doing my workaround of regex first, then normal find&replace would be much simpler)…
I wondered if something like…
Find:
{(.+)\R}*
…might work (allow the single matching pattern to repeat multiple times), but then I have no idea how I would dereference that in the “Replace with” section!As above, I have 2-step workaround, but it is frustrating me that I can’t seem to get a regex to do a one step conversion from multiline data to CSV.
-
@Schmill said in Convert multiline data to CSV - Regex not working:
allow the single matching pattern to repeat multiple times
That’s why it looks repetitive as you can’t reference the 4 groups separately using your proposal.
My “single
stepiteration” doesn’t refer to lines processed, rather the process the regular expression takes to process the regex provided. It works through the expression and when at the end it has completed a “singlestepiteration”. If there is still data in the file being processed and you have asked it to process more line (if available) it will repeat that “singlestepiteration”.Terry
PS I subsequently changed the word step for iteration as being more correct
-
@Schmill said in Convert multiline data to CSV - Regex not working:
I’m assuming that the Regex is evaluating the first line, running the replace (which now places line 2 as an append to line 1), but now considers that line 1 is “done”, and moves onto line 2 (which is the old line 3 before the replace joined lines 1 and 2).
It’s not that (the new) line 1 is done; it’s that the beginning of the old line 2 is no longer the beginning of line, so the
^
doesn’t match.Try:
Find what :
(\R)?(.+)
Replace with :(?1,)"$2"
Search Mode: Regular expression
. matches newline : NOT CHECKEDthen put the caret at the beginning of the file and/or be sure Wrap around is checked, and click Replace All.
-
@Schmill said in Convert multiline data to CSV - Regex not working:
could be massive PITA if I had 100+ lines
Well, if a CSV file contained 100 fields in each line (so your file starts with 1 field on each line) another method would be required. In that case I’d likely tag the 100th (200th, 300th etc) newline with 2 $ characters (must use something that doesn’t exist in the data). Then use something like
^(.+)\R(?!$$)
. So by tagging the 100th newline the regex will look ahead at each field replacement (much as you originally had) but if a particular newline was followed by the 2 $ then it would perform a slightly different replacement to keep the newline (and remove the $$ at the same time).Terry
-
The new version 6 of JsonTools can do this!
I’d start with opening the treeview in REGEX mode on your document, then running the query
@ = s_slice(stringify(s_split(@, `\r\n?|\n`)), 1:-1)
.This will convert
value "with internal quotes" value with tab inside value three value four
to the JSON array (sans leading and closing squarebraces)
"value \"with internal quotes\"","value\twith tab inside","value three","value four"
As you can see, JsonTools automatically handles corner cases like literal quote characters and tabs.
Another neat thing about JsonTools (actually one of the features that I use the most): you can use the plugin command
Select every valid JSON in selection
followed byDump JSON string(s) as raw text
to “unescape” a bunch of JSON strings. Doing this on the output of the previous step would regenerate the original document. -
Hello, @schmill, @terry-R,@coises,@mark-olson and All,
I tried to generalize the problem, in order to get a complete
CSV
table. Note that the future rows of thisCSV
table must be separated by, at least, one blank line !So, starting with the INPUT text below, pasted in a new tab :
value 1 Row 1 value 2 Row 1 value 3 Row 1 value 4 Row 1 value 5 Row 1 value 1 Row 2 value 2 Row 2 value 3 Row 2 value 4 Row 2 value 5 Row 2 value 1 Row 3 value 2 Row 3 value 3 Row 3 value 4 Row 3 value 5 Row 3
After using the following regex S/R :
SEARCH
(?-s)(.+)|(?<=.)(\R)(?=.)
REPLACE
?1"\1"?2,
You should get this temporary text :
"value 1 Row 1","value 2 Row 1","value 3 Row 1","value 4 Row 1","value 5 Row 1" "value 1 Row 2","value 2 Row 2","value 3 Row 2","value 4 Row 2","value 5 Row 2" "value 1 Row 3","value 2 Row 3","value 3 Row 3","value 4 Row 3","value 5 Row 3"
And, after this second and trivial regex S/R, below :
SEARCH
^\R+
REPLACE
Leave EMPTY
You’re left with your expected
CSV
table :"value 1 Row 1","value 2 Row 1","value 3 Row 1","value 4 Row 1","value 5 Row 1" "value 1 Row 2","value 2 Row 2","value 3 Row 2","value 4 Row 2","value 5 Row 2" "value 1 Row 3","value 2 Row 3","value 3 Row 3","value 4 Row 3","value 5 Row 3"
Best Regards,
guy038
P.S. :
As the two alternatives of the first search regex are mutually exclusive, we do NOT even need to surround the conditional replacements with parentheses !!
-
Thank you to everyone for their input, lots of things to try and I’ll certainly be taking a look at the Json plugin as well.
Coises comment here in particular kicked me into the right way of thinking…
@Coises said in Convert multiline data to CSV - Regex not working:
@Schmill said in Convert multiline data to CSV - Regex not working:
I’m assuming that the Regex is evaluating the first line, running the replace (which now places line 2 as an append to line 1), but now considers that line 1 is “done”, and moves onto line 2 (which is the old line 3 before the replace joined lines 1 and 2).
It’s not that (the new) line 1 is done; it’s that the beginning of the old line 2 is no longer the beginning of line, so the
^
doesn’t match.Changing my “Find What” to be:
(.+)\R
and my replace to be:
“\1”,Means that everything is converted as expected - I had overcooked my regex!
Many thanks again all!