Community
    • Login

    I have a large comma delimited database that I want to process into a csv file for excel

    Scheduled Pinned Locked Moved Help wanted · · · – – – · · ·
    7 Posts 4 Posters 1.9k Views
    Loading More Posts
    • Oldest to Newest
    • Newest to Oldest
    • Most Votes
    Reply
    • Reply as topic
    Log in to reply
    This topic has been deleted. Only users with topic management privileges can see it.
    • Craig BiddleC
      Craig Biddle
      last edited by

      How can I insert an end of line character after every n data fields? Right now I have ]'[ as a record delineator, but that doesn’t work in notepad++.

      PeterJonesP 1 Reply Last reply Reply Quote 0
      • PeterJonesP
        PeterJones @Craig Biddle
        last edited by PeterJones

        @Craig-Biddle

        I am not sure what you meant by ]’[ when you were talking about comma-separated. I’ll just ignore that, and interpret your question as “how can I split a comma-separated file so that it replaces every Nth comma with a CRLF newline sequence”

        If you have

        one,two,three,four,five,six,seven,eight,nine,ten,eleven
        

        FIND = ^(([^,\r\n]*,){4}[^,\r\n]*),
        REPLACE = $1\r\n
        SEARCH MODE = regular expression

        (change the 4 in the FIND expression to whatever n-1 value you need)

        will turn it into

        one,two,three,four,five
        six,seven,eight,nine,ten
        eleven
        

        … which I believe is what you want. If it’s not, you’ll have to be more specific, and follow the advice in italics below.

        (This assumes your comma-delimted database doesn’t have confusing things like commas embedded in the data inside quotes, or newlines inside a field, either of which are allowed in CSV files. If you’ve got that, let us know, and someone better than I might be able to improve my simplistic solution.)

        ----

        Do you want regex search/replace help? Then please be patient and polite, show some effort, and be willing to learn; answer questions and requests for clarification that are made of you. All example text should be marked as literal text using the </> toolbar button or manual Markdown syntax. To make regex in red (and so they keep their special characters like *), use backticks, like `^.*?blah.*?\z`. Screenshots can be pasted from the clipboard to your post using Ctrl+V to show graphical items, but any text should be included as literal text in your post so we can easily copy/paste your data. Show the data you have and the text you want to get from that data; include examples of things that should match and be transformed, and things that don’t match and should be left alone; show edge cases and make sure you examples are as varied as your real data. Show the regex you already tried, and why you thought it should work; tell us what’s wrong with what you do get. Read the official NPP Searching / Regex docs and the forum’s Regular Expression FAQ. If you follow these guidelines, you’re much more likely to get helpful replies that solve your problem in the shortest number of tries.

        see also: FAQ: Formatting Forum Posts

        1 Reply Last reply Reply Quote 3
        • Craig BiddleC
          Craig Biddle
          last edited by

          Sorry, the comma delimited file uses the character string ],[ as an end of line delimiter. I was advised to replace the ],[ with \n, but that doesn’t seem to work. From my limited knowledge of programming, I though that would work and am at a loss as to what to do next.

          I think I understand what you are telling me, but I want to learn more about what I am doing here. Is there a listing somewhere of the * commands where I can learn what the various characters in place of the * do? And again, from my limited knowledge of programming, I think * is used for comments.

          I think the issue is that I need to have a comma before the \n from your parenthetical note. Right now the \n is just appended to the last character of the data field. I tried embedding an asc(10) text string, but I think I am just showing my age in thinking that would have worked. :)

          Terry RT PeterJonesP 2 Replies Last reply Reply Quote 0
          • Terry RT
            Terry R
            last edited by Terry R

            @Craig-Biddle said in I have a large comma delimited database that I want to process into a csv file for excel:

            Sorry, the comma delimited file uses the character string ],[ as an end of line delimiter.

            A very unusual delimiter. And the reason you are having a problem is the ] and [ are what are called meta-characters. They have an additional meaning in regular expressions. However all is not lost, by using the regex delimiter \ you can imply the actual character meaning.

            Try \\]’\\[ and see if you can select and then replace with \r\n. Note that although the ] is also a meta-character it’s not required to have a \ in front of it in this regex, however it likely would in different circumstances. I have included it for conformity.

            Terry

            1 Reply Last reply Reply Quote 1
            • Terry RT
              Terry R @Craig Biddle
              last edited by Terry R

              @Craig-Biddle said in I have a large comma delimited database that I want to process into a csv file for excel:

              Sorry, the comma delimited file uses the character string ],[ as an end of line delimiter.

              I noticed after posting that the delimiter appears to have changed. Previously it was an apostrophe, now it’s a comma. You will need to adjust my solution based on what you really have, but the ] and [ need to be escaped by the \ in front of each character.

              Terry

              1 Reply Last reply Reply Quote 1
              • PeterJonesP
                PeterJones @Craig Biddle
                last edited by PeterJones

                @Craig-Biddle said in I have a large comma delimited database that I want to process into a csv file for excel:

                Sorry, the comma delimited file uses the character string ],[ as an end of line delimiter. I was advised to replace the ],[ with \n,

                As @Terry-R said, you will have to escape it, with \\],\\[. But because you have a separate delimiter for EOL than for the normal just-comma field delimiter, you don’t need an expression as complicated as the one I originally provided. Just using FIND = \\],\\[ , REPLACE = \r\n, SEARCH MODE = regular expression should be suffient.

                I think I understand what you are telling me, but I want to learn more about what I am doing here. Is there a listing somewhere of the * commands where I can learn what the various characters in place of the * do?

                Yes. It’s in the NPP Searching / Regex docs that I linked you to.

                And again, from my limited knowledge of programming, I think * is used for comments.

                Not in regular expression; in the Notepad++ flavor of regular expressions, * means “one zero⁰ or more of the preceding token”. (And I think you’re confusing it with C-style comments, where /* starts a comment and */ ends a comment.)

                I think the issue is that I need to have a comma before the \n from your parenthetical note. Right now the \n is just appended to the last character of the data field.

                In your first post, you said you wanted to replace the Nth comma with a newline. Now you’re saying you want to leave a comma there and then add a newline. Which is it?

                This is why the advice, which I already posted, said you should always give before and after data, so that we can understand what you want.

                Right now, I believe you have data which looks like:

                one,two,three,four,five],[six,seven,eight,nine,ten],[eleven
                

                and I believe you want it in one of the forms below

                one,two,three,four,five
                six,seven,eight,nine,ten
                eleven
                

                or

                one,two,three,four,five,
                six,seven,eight,nine,ten,
                eleven
                

                or

                one,two,three,four,five],[
                six,seven,eight,nine,ten],[
                eleven
                

                … but your posts have not made it easy to figure out which.

                The regex I included in this post will give you the first of the three output options. If you want something different, you will have to be more clear, and actually follow the advice that you’ve been given in how to effectively ask a search-and-replace question…

                ----
                ⁰: edit: fixed typo

                Alan KilbornA 1 Reply Last reply Reply Quote 2
                • Alan KilbornA
                  Alan Kilborn @PeterJones
                  last edited by

                  I think * is used for comments

                  This construct is used for comments within a regex:

                  (?# … )

                  So, for example:

                  (?# I am a comment )

                  1 Reply Last reply Reply Quote 2
                  • First post
                    Last post
                  The Community of users of the Notepad++ text editor.
                  Powered by NodeBB | Contributors