Community
    • Login

    Convert multiline data to CSV - Regex not working

    Scheduled Pinned Locked Moved Help wanted · · · – – – · · ·
    regexmultilinecsv
    9 Posts 5 Posters 984 Views
    Loading More Posts
    • Oldest to Newest
    • Newest to Oldest
    • Most Votes
    Reply
    • Reply as topic
    Log in to reply
    This topic has been deleted. Only users with topic management privileges can see it.
    • S
      Schmill
      last edited by Schmill

      Hi,

      I thought that this would be straight forward, but Notepad++ seems to not like something that I am doing, so I am looking for suggestions.
      Assume I have the following simple file:

      value one
      value two
      value three
      value four
      

      I want to convert this into a single line of csv, e.g.:

      "value one","value two","value three","value four"
      

      I’ve set my replace dialog as:
      Find what: ^(.+)\R
      Replace with: “\1”,
      Wrap around is ON, and Search Mode is Regex

      I do a “Replace All” and I get:

      "value one",value two
      "value three",value four
      

      I’m assuming that the Regex is evaluating the first line, running the replace (which now places line 2 as an append to line 1), but now considers that line 1 is “done”, and moves onto line 2 (which is the old line 3 before the replace joined lines 1 and 2).

      I can get around this in more steps if I use the regex to add the quotes and complete each line with a comma, and then run a new find and replace using “extended” mode and just replace all “\r\n” with nothing, but it feels like I should be able to do this all in one step with regex?

      Terry RT CoisesC 2 Replies Last reply Reply Quote 1
      • Terry RT
        Terry R @Schmill
        last edited by Terry R

        @Schmill

        You are on the right track. I think the idea you haven’t yet grasped is that regex will do whatever it is asked to do as a single step iteration. So just build that single step iteration with more subexpressions (your initial Find What can be regarded as a subexpression in the new regex). Then for each set of brackets you would use a new group number, so \1, \2 and \3 if doing 4 lines.

        Terry

        PS, try Find What:^(.+)\R(.+)\R(.+)\R(.+)\R?
        Replace With:"\1","\2","\3","\4"\r\n

        Sorry, I adjusted my solution a couple of times as didn’t fully read your regex replacement.

        You may still need to adjust based on your real data, but this would be a good starting point.

        PPS I changed the word step for iteration, I couldn’t think of the more correct word at the time and step certainly isn’t a good simile.

        S 1 Reply Last reply Reply Quote 2
        • S
          Schmill @Terry R
          last edited by Schmill

          @Terry-R When you say “as a single step” you mean per input line?
          My input file has an unknown number of line (strictly speaking “variable” number of lines, but either way, without counting them and then adjusting the regex per file I wouldn’t know).

          I can see that a pattern like this would work for 4 lines:
          Find:
          (.+)\R(.+)\R(.+)\R(.+)\R
          Replace:
          “\1”,“\2”,“\3”,“\4”,

          but that seems excessively repetitive, and could be massive PITA if I had 100+ lines (in this case doing my workaround of regex first, then normal find&replace would be much simpler)…
          I wondered if something like…
          Find:
          {(.+)\R}*
          …might work (allow the single matching pattern to repeat multiple times), but then I have no idea how I would dereference that in the “Replace with” section!

          As above, I have 2-step workaround, but it is frustrating me that I can’t seem to get a regex to do a one step conversion from multiline data to CSV.

          Terry RT 2 Replies Last reply Reply Quote 1
          • Terry RT
            Terry R @Schmill
            last edited by Terry R

            @Schmill said in Convert multiline data to CSV - Regex not working:

            allow the single matching pattern to repeat multiple times

            That’s why it looks repetitive as you can’t reference the 4 groups separately using your proposal.

            My “single step iteration” doesn’t refer to lines processed, rather the process the regular expression takes to process the regex provided. It works through the expression and when at the end it has completed a “single step iteration”. If there is still data in the file being processed and you have asked it to process more line (if available) it will repeat that “single step iteration”.

            Terry

            PS I subsequently changed the word step for iteration as being more correct

            1 Reply Last reply Reply Quote 2
            • CoisesC
              Coises @Schmill
              last edited by

              @Schmill said in Convert multiline data to CSV - Regex not working:

              I’m assuming that the Regex is evaluating the first line, running the replace (which now places line 2 as an append to line 1), but now considers that line 1 is “done”, and moves onto line 2 (which is the old line 3 before the replace joined lines 1 and 2).

              It’s not that (the new) line 1 is done; it’s that the beginning of the old line 2 is no longer the beginning of line, so the ^ doesn’t match.

              Try:

              Find what : (\R)?(.+)
              Replace with : (?1,)"$2"
              Search Mode: Regular expression
              . matches newline : NOT CHECKED

              then put the caret at the beginning of the file and/or be sure Wrap around is checked, and click Replace All.

              S 1 Reply Last reply Reply Quote 2
              • Terry RT
                Terry R @Schmill
                last edited by

                @Schmill said in Convert multiline data to CSV - Regex not working:

                could be massive PITA if I had 100+ lines

                Well, if a CSV file contained 100 fields in each line (so your file starts with 1 field on each line) another method would be required. In that case I’d likely tag the 100th (200th, 300th etc) newline with 2 $ characters (must use something that doesn’t exist in the data). Then use something like ^(.+)\R(?!$$). So by tagging the 100th newline the regex will look ahead at each field replacement (much as you originally had) but if a particular newline was followed by the 2 $ then it would perform a slightly different replacement to keep the newline (and remove the $$ at the same time).

                Terry

                1 Reply Last reply Reply Quote 2
                • Mark OlsonM
                  Mark Olson
                  last edited by Mark Olson

                  The new version 6 of JsonTools can do this!

                  I’d start with opening the treeview in REGEX mode on your document, then running the query @ = s_slice(stringify(s_split(@, `\r\n?|\n`)), 1:-1).

                  This will convert

                  value "with internal quotes"
                  value	with tab inside
                  value three
                  value four
                  

                  to the JSON array (sans leading and closing squarebraces)

                  "value \"with internal quotes\"","value\twith tab inside","value three","value four"
                  

                  As you can see, JsonTools automatically handles corner cases like literal quote characters and tabs.

                  Another neat thing about JsonTools (actually one of the features that I use the most): you can use the plugin command Select every valid JSON in selection followed by Dump JSON string(s) as raw text to “unescape” a bunch of JSON strings. Doing this on the output of the previous step would regenerate the original document.

                  1 Reply Last reply Reply Quote 1
                  • guy038G
                    guy038
                    last edited by guy038

                    Hello, @schmill, @terry-R,@coises,@mark-olson and All,

                    I tried to generalize the problem, in order to get a complete CSV table. Note that the future rows of this CSV table must be separated by, at least, one blank line !

                    So, starting with the INPUT text below, pasted in a new tab :

                    value 1 Row 1
                    value 2 Row 1
                    value 3 Row 1
                    value 4 Row 1
                    value 5 Row 1
                    
                    value 1 Row 2
                    value 2 Row 2
                    value 3 Row 2
                    value 4 Row 2
                    value 5 Row 2
                    
                    
                    
                    
                    value 1 Row 3
                    value 2 Row 3
                    value 3 Row 3
                    value 4 Row 3
                    value 5 Row 3
                    

                    After using the following regex S/R :

                    SEARCH (?-s)(.+)|(?<=.)(\R)(?=.)

                    REPLACE ?1"\1"?2,

                    You should get this temporary text :

                    "value 1 Row 1","value 2 Row 1","value 3 Row 1","value 4 Row 1","value 5 Row 1"
                    
                    "value 1 Row 2","value 2 Row 2","value 3 Row 2","value 4 Row 2","value 5 Row 2"
                    
                    
                    
                    
                    "value 1 Row 3","value 2 Row 3","value 3 Row 3","value 4 Row 3","value 5 Row 3"
                    

                    And, after this second and trivial regex S/R, below :

                    SEARCH ^\R+

                    REPLACE Leave EMPTY

                    You’re left with your expected CSV table :

                    "value 1 Row 1","value 2 Row 1","value 3 Row 1","value 4 Row 1","value 5 Row 1"
                    "value 1 Row 2","value 2 Row 2","value 3 Row 2","value 4 Row 2","value 5 Row 2"
                    "value 1 Row 3","value 2 Row 3","value 3 Row 3","value 4 Row 3","value 5 Row 3"
                    

                    Best Regards,

                    guy038

                    P.S. :

                    As the two alternatives of the first search regex are mutually exclusive, we do NOT even need to surround the conditional replacements with parentheses !!

                    1 Reply Last reply Reply Quote 1
                    • S
                      Schmill @Coises
                      last edited by

                      Thank you to everyone for their input, lots of things to try and I’ll certainly be taking a look at the Json plugin as well.

                      Coises comment here in particular kicked me into the right way of thinking…

                      @Coises said in Convert multiline data to CSV - Regex not working:

                      @Schmill said in Convert multiline data to CSV - Regex not working:

                      I’m assuming that the Regex is evaluating the first line, running the replace (which now places line 2 as an append to line 1), but now considers that line 1 is “done”, and moves onto line 2 (which is the old line 3 before the replace joined lines 1 and 2).

                      It’s not that (the new) line 1 is done; it’s that the beginning of the old line 2 is no longer the beginning of line, so the ^ doesn’t match.

                      Changing my “Find What” to be:
                      (.+)\R
                      and my replace to be:
                      “\1”,

                      Means that everything is converted as expected - I had overcooked my regex!

                      Many thanks again all!

                      1 Reply Last reply Reply Quote 3
                      • First post
                        Last post
                      The Community of users of the Notepad++ text editor.
                      Powered by NodeBB | Contributors