Community
    • Login

    How to replace CR/LF within fields in a CSV file

    Scheduled Pinned Locked Moved Help wanted · · · – – – · · ·
    11 Posts 5 Posters 9.4k Views
    Loading More Posts
    • Oldest to Newest
    • Newest to Oldest
    • Most Votes
    Reply
    • Reply as topic
    Log in to reply
    This topic has been deleted. Only users with topic management privileges can see it.
    • PeterJonesP
      PeterJones @Mark Yorkovich
      last edited by

      This post is deleted!
      1 Reply Last reply Reply Quote 0
      • PeterJonesP
        PeterJones @Mark Yorkovich
        last edited by PeterJones

        @Mark-Yorkovich,

        This is a classic use-case for Generic Regex: Replacing in a Specific Zone of Text, which can be found in FAQ Desk: Generic Regular Expression formulas in our FAQ section

        In your case, you want to search between comma-quote and quote (I used comma-quote as the start tag because that makes the regex less confusing rather than trying some other way to disambiguate a start-quote from end-quote), so the “parameters” of the formula are:

        • FR = \r\n
          • looking for CRLF
        • RR = ,
          • replace with comma
        • BSR = ,"
          • but only in regions that start with comma-quote
        • ESR = "
          • … and end with the next quote mark

        Put it all together in that formula to get

        • FIND WHAT = (?-si:,"|(?!\A)\G)(?s-i:(?!").)*?\K(?-si:\r\n)
        • REPLACE WITH = ,
        • REPLACE ALL (as explained in that sub-FAQ, single REPLACE won’t work because of the \K)

        ----

        Useful References

        • Please Read Before Posting
        • Template for Search/Replace Questions
        • Formatting Forum Posts
        • Notepad++ Online User Manual: Searching/Regex
        • FAQ: Where to find other regular expressions (regex) documentation

        –
        edits: added descriptions of each piece, and the references list

        Mark YorkovichM Alan KilbornA 2 Replies Last reply Reply Quote 3
        • Mark YorkovichM
          Mark Yorkovich @PeterJones
          last edited by

          @PeterJones said in How to replace CR/LF within fields in a CSV file:

          (?-si:,“|(?!\A)\G)(?s-i:(?!”).)*?\K(?-si:\r\n)

          Works like a charm, Peter.

          Thanks so much for your help!

          1 Reply Last reply Reply Quote 1
          • Alan KilbornA
            Alan Kilborn @PeterJones
            last edited by Alan Kilborn

            @PeterJones said in How to replace CR/LF within fields in a CSV file:

            I used comma-quote as the start tag…
            BSR = ,"

            What struck me about this was, what if the next person coming along with a similar problem has the issue in his CSV’s column “1” string data?

            In that case the solution would have to be revised – because there would be no , starting off each line; the first string field, being the first, would simply start with ".

            So let’s revise it right now; change the BSR to:

            BSR = (?:(?:^|,)")

            Then, some data that looks like this will also process:

            "1
            one","Smith","John","Loves to skateboard","12/3/98"
            "2
            two","Johnson","Mike","Eats worms","2/14/75"
            "3
            three
            little
            pigs","Howard","Sally","Walks along the beach
            Likes bike rides
            Loves to cook","9/7/67"
            "4
            four","Doe","Jane","Restores classic cars","7/4/74"
            

            into:

            "1,one","Smith","John","Loves to skateboard","12/3/98"
            "2,two","Johnson","Mike","Eats worms","2/14/75"
            "3,three,little,pigs","Howard","Sally","Walks along the beach,Likes bike rides,Loves to cook","9/7/67"
            "4,four","Doe","Jane","Restores classic cars","7/4/74"
            
            Lycan ThropeL 1 Reply Last reply Reply Quote 2
            • Lycan ThropeL
              Lycan Thrope @Alan Kilborn
              last edited by

              @Alan-Kilborn ,
              Actually, it could be worse than that, since CSV doesn’t require the use of quotes, except around strings, if the first or any field is a numeric with no quotes, it needs to test for a numeric prior to a ,.
              I wrote a parsing program in dBASE 1-2 years ago, that would take a file that was over a million records that were CSV, and wouldn’t you know, the numeric fields weren’t quoted, so every field had to be tested first, to see if it was numeric, and if not, process it accordingly.

              Just an FYI.

              PeterJonesP 1 Reply Last reply Reply Quote 0
              • PeterJonesP
                PeterJones @Lycan Thrope
                last edited by PeterJones

                @Lycan-Thrope said in How to replace CR/LF within fields in a CSV file:

                Actually, it could be worse than that, since CSV doesn’t require the use of quotes, except around strings

                Any field that has an embedded newline in it is by definition a string, and requires quotes (because of the newline). Hence, for purposes of changing an embedded newline to something else, what he has shown is sufficient.

                And, despite what you said, strings are allowed to be non-quoted, as long as the don’t contain the separator character or newline sequences

                Lycan ThropeL 1 Reply Last reply Reply Quote 3
                • Lycan ThropeL
                  Lycan Thrope @PeterJones
                  last edited by Lycan Thrope

                  @PeterJones ,
                  True, but I noticed he was quoting numerics, and figured that might change the search aspect. Ignore if I misread that regex application. I’m still not comfortable playing with the \A and \G, so wasn’t reading what yours was doing, but I saw his using quoted numerics so maybe I misread his regex for it. Still not sure what I see, but trying. :( It was, however an example where I used a program instead of a regex to parse out the lines. ::shrug:: Of course, I was pulling out the fields and assigning them to another tables structure, so that may be the difference.
                  Feel free to disregard my idiocy. :)

                  Edit: Just went over my notes for that program, and it was a 1.7 million record file, and I was trying to convert it to a CSV format dBASE could use for importing because we needed to count characters to find the largest size of each field, so we could create the structure, and of course, fields were missing and the like. I hadn’t even started messing with regex back then, it’s 3 years now, so it wasn’t a tool in my box, back then. It might have made the job easier, in hindsight. :)

                  Alan KilbornA 1 Reply Last reply Reply Quote 0
                  • Alan KilbornA
                    Alan Kilborn @Lycan Thrope
                    last edited by

                    @Lycan-Thrope said in How to replace CR/LF within fields in a CSV file:

                    he was quoting numerics

                    My intent was not to quote numeric data. I was “making up” data based on the OP’s original data, in order to show a string as the first CSV “column”, so that the overall idea I was presenting abot the regex used would work. Sorry if there was some confusion about it. This might have been better sample data:

                    "one
                    ONE","Smith","John","Loves to skateboard","12/3/98"
                    "two
                    TWO","Johnson","Mike","Eats worms","2/14/75"
                    

                    and so on.

                    Lycan ThropeL 1 Reply Last reply Reply Quote 1
                    • Lycan ThropeL
                      Lycan Thrope @Alan Kilborn
                      last edited by

                      @Alan-Kilborn ,
                      Talk about an epiphany moment. Most applications were choking on that big .csv file, which is why I was trying to write that program to fix the file without having to open and display it so we could convert the problem aspects and store it in a database.

                      I just did it with the current NPP, and it took a few seconds, but it opened it up completely, all 1.6m records in NPP. I could have used this capability 3 years ago. Maybe I could have had NPP find the field sizes so I could have known then how big to make the fields to hold the data and then import it. :)

                      1 Reply Last reply Reply Quote 0
                      • guy038G
                        guy038
                        last edited by guy038

                        Hello, @mark-yorkovich, @peterjones, @alan-kilborn, @lycan-thrope and All,

                        Of course, the generic regex satisfies the @mark-yorkovich’s goal ! But if we try to think more deeply about the problem, the solution is almost obvious !

                        Mark said :

                        I have a CSV file with comma-separated fields, and double quotes for each field, except the first field which is an ID field.

                        Thus, as a valid field is always surrounded by double-quotes, this means that the line-breaks which need to be replaced with a literal , must be :

                        • Not immediately preceded with a double-quote character

                        AND

                        • Not immediately followed with a double-quote character

                        Hence, the following regex S/R :

                        SEARCH (?<!")\r\n(?!")

                        REPLACE ,

                        • Untick all box options of the Replace dialog

                        • Select the Regex expression search mode

                        • Move to very beginning of your CSV file

                        • Click once on the Replace All button ( or several times on the Replace button )

                        Best Regards,

                        guy038

                        P.S. :

                        • As explained in a recent post, do not use the simplified search syntax (?<!")\R(?!") !

                        • Of course, this method does not work if fields, near the line break, may not be followed / preceded with a " character !

                        1 Reply Last reply Reply Quote 3
                        • First post
                          Last post
                        The Community of users of the Notepad++ text editor.
                        Powered by NodeBB | Contributors