I have a large comma delimited database that I want to process into a csv file for excel



  • How can I insert an end of line character after every n data fields? Right now I have ]’[ as a record delineator, but that doesn’t work in notepad++.



  • @Craig-Biddle

    I am not sure what you meant by ]’[ when you were talking about comma-separated. I’ll just ignore that, and interpret your question as “how can I split a comma-separated file so that it replaces every Nth comma with a CRLF newline sequence”

    If you have

    one,two,three,four,five,six,seven,eight,nine,ten,eleven
    

    FIND = ^(([^,\r\n]*,){4}[^,\r\n]*),
    REPLACE = $1\r\n
    SEARCH MODE = regular expression

    (change the 4 in the FIND expression to whatever n-1 value you need)

    will turn it into

    one,two,three,four,five
    six,seven,eight,nine,ten
    eleven
    

    … which I believe is what you want. If it’s not, you’ll have to be more specific, and follow the advice in italics below.

    (This assumes your comma-delimted database doesn’t have confusing things like commas embedded in the data inside quotes, or newlines inside a field, either of which are allowed in CSV files. If you’ve got that, let us know, and someone better than I might be able to improve my simplistic solution.)

    ----

    Do you want regex search/replace help? Then please be patient and polite, show some effort, and be willing to learn; answer questions and requests for clarification that are made of you. All example text should be marked as literal text using the </> toolbar button or manual Markdown syntax. To make regex in red (and so they keep their special characters like *), use backticks, like `^.*?blah.*?\z`. Screenshots can be pasted from the clipboard to your post using Ctrl+V to show graphical items, but any text should be included as literal text in your post so we can easily copy/paste your data. Show the data you have and the text you want to get from that data; include examples of things that should match and be transformed, and things that don’t match and should be left alone; show edge cases and make sure you examples are as varied as your real data. Show the regex you already tried, and why you thought it should work; tell us what’s wrong with what you do get. Read the official NPP Searching / Regex docs and the forum’s Regular Expression FAQ. If you follow these guidelines, you’re much more likely to get helpful replies that solve your problem in the shortest number of tries.

    see also: FAQ: Formatting Forum Posts



  • Sorry, the comma delimited file uses the character string ],[ as an end of line delimiter. I was advised to replace the ],[ with \n, but that doesn’t seem to work. From my limited knowledge of programming, I though that would work and am at a loss as to what to do next.

    I think I understand what you are telling me, but I want to learn more about what I am doing here. Is there a listing somewhere of the * commands where I can learn what the various characters in place of the * do? And again, from my limited knowledge of programming, I think * is used for comments.

    I think the issue is that I need to have a comma before the \n from your parenthetical note. Right now the \n is just appended to the last character of the data field. I tried embedding an asc(10) text string, but I think I am just showing my age in thinking that would have worked. :)



  • @Craig-Biddle said in I have a large comma delimited database that I want to process into a csv file for excel:

    Sorry, the comma delimited file uses the character string ],[ as an end of line delimiter.

    A very unusual delimiter. And the reason you are having a problem is the ] and [ are what are called meta-characters. They have an additional meaning in regular expressions. However all is not lost, by using the regex delimiter \ you can imply the actual character meaning.

    Try \]’\[ and see if you can select and then replace with \r\n. Note that although the ] is also a meta-character it’s not required to have a \ in front of it in this regex, however it likely would in different circumstances. I have included it for conformity.

    Terry



  • @Craig-Biddle said in I have a large comma delimited database that I want to process into a csv file for excel:

    Sorry, the comma delimited file uses the character string ],[ as an end of line delimiter.

    I noticed after posting that the delimiter appears to have changed. Previously it was an apostrophe, now it’s a comma. You will need to adjust my solution based on what you really have, but the ] and [ need to be escaped by the \ in front of each character.

    Terry



  • @Craig-Biddle said in I have a large comma delimited database that I want to process into a csv file for excel:

    Sorry, the comma delimited file uses the character string ],[ as an end of line delimiter. I was advised to replace the ],[ with \n,

    As @Terry-R said, you will have to escape it, with \],\[. But because you have a separate delimiter for EOL than for the normal just-comma field delimiter, you don’t need an expression as complicated as the one I originally provided. Just using FIND = \],\[ , REPLACE = \r\n, SEARCH MODE = regular expression should be suffient.

    I think I understand what you are telling me, but I want to learn more about what I am doing here. Is there a listing somewhere of the * commands where I can learn what the various characters in place of the * do?

    Yes. It’s in the NPP Searching / Regex docs that I linked you to.

    And again, from my limited knowledge of programming, I think * is used for comments.

    Not in regular expression; in the Notepad++ flavor of regular expressions, * means “one zero⁰ or more of the preceding token”. (And I think you’re confusing it with C-style comments, where /* starts a comment and */ ends a comment.)

    I think the issue is that I need to have a comma before the \n from your parenthetical note. Right now the \n is just appended to the last character of the data field.

    In your first post, you said you wanted to replace the Nth comma with a newline. Now you’re saying you want to leave a comma there and then add a newline. Which is it?

    This is why the advice, which I already posted, said you should always give before and after data, so that we can understand what you want.

    Right now, I believe you have data which looks like:

    one,two,three,four,five],[six,seven,eight,nine,ten],[eleven
    

    and I believe you want it in one of the forms below

    one,two,three,four,five
    six,seven,eight,nine,ten
    eleven
    

    or

    one,two,three,four,five,
    six,seven,eight,nine,ten,
    eleven
    

    or

    one,two,three,four,five],[
    six,seven,eight,nine,ten],[
    eleven
    

    … but your posts have not made it easy to figure out which.

    The regex I included in this post will give you the first of the three output options. If you want something different, you will have to be more clear, and actually follow the advice that you’ve been given in how to effectively ask a search-and-replace question…

    ----
    ⁰: edit: fixed typo



  • I think * is used for comments

    This construct is used for comments within a regex:

    (?#)

    So, for example:

    (?# I am a comment )


Log in to reply