Search for a wrong comma in csv file
- 
Hello together,
i have a lot of very large csv files, which i will import to an Historian system.
The seperator of the csv files is a comma. here an Example:
LS4_Produktion.job_id_current.pv,0,2021/06/07,12:00:00.000,0,555555555,192
LS4_Produktion.job_qty_current.pv,0,2021/06/07,12:00:00.000,0,3000,192
The bold text are the historized values.In some the historized value is text, which contains a comma:
LS4_Produktion.product_group_current.pv,0,2021/06/07,12:00:00.000,0,1582 COBAN ,2IN BRAUN (EU),192In this case the import will interupted and i must search the text with the wrong comma.
It is not possible to open the csv files in excel, because they are to large (> 1.000.000 lines)
Is it possible to search for lines with more than 6 comma?
I can’t use wildcard in the search like * in notepad ++Has anyone a solution for my question?
Regards
Torsten - 
using regular expression and something like
(.+,){7}
might do the job.
This matches lines likea,b,c,d,e,f,g,hbut NOT lines like
a,b,c,d,e,f,,h a,,c,d,e,f,g,h - 
@ekopalypse said in Search for a wrong comma in csv file:
(.+,){7}
Hi,
i tested it, and it works fine.
Thanks for your support.Regards
Torsten - 
Come to think of it, this regex
(.*?,){7}should also work and has the advantage that it also fits lines likea,,c,d,e,f,g,h. - 
It’s probably overkill, especially since the problem is already solved, but the way I would solve this one is to use the formula provided HERE and, adapting it for the problem at hand I would search for:
Find:
(?i-s)^(?:((?:(?!(,)).)*?)(?2)){7,}(?1)$
Search mode: Regular expression - 
Hello, @torsten-froese, @aekopalypse, @alan-kiborn and All,
Let’s consider a regular
CSVfile, will all records containing the same number ofcommascharsFor instance, the @torsten-froese’s example contains exactly
6commas, so7fieldsRegarding this example, I suppose that it would be better to :
- Firstly, Mark lines matching the regex :
 
MARK
(?-s)^(?:((?:(?!(,)).)*?)(?2)){6}(?1)$Because all these lines, with the template, below, are correct :
Field_1 w/o comma,Field_2 w/o comma,Field_3 w/o comma,Field_4 w/o comma,Field_5 w/o comma,Field_6 w/o comma,field_7 w/o comma- Secondly, perform the operation 
Search > Bookmark > Inverse Bookmarks 
So, we could easily find out any line which do no have the same template, as above !
Indeed, your solution, Alan, works only for more than
6comma chars, found in text, but it misses text containing less than6commas !You need to perform an additional search :
MARK
(?-s)^(?:((?:(?!(,)).)*?)(?2)){1,5}(?1)$In that case, all wrong cases are detected !
Test it against this text, below, where :
- The first line contains 
6commas, (OK) - The second line contains 
4commas (KO) - The third line contains 
6commas, (OK) - The fourth line contains 
8commas (KO) 
LS4_Produktion.job_id_current.pv,0,2021/06/07,12:00:00.000,0,555555555,192 LS4_Produktion.job_id_current.pv,2021/06/07,12:00:00.000,555555555,192 LS4_Produktion.job_qty_current.pv,0,2021/06/07,12:00:00.000,0,3000,192 LS4_Produktion.product_group_current.pv,0,2021/06/07,12:00:00.000,0,1582 COBAN ,2IN BRA,UN (EU),192
So, we could generalize and say :
Let’s suppose a regular
CSVfile, containingnone-char separators (,;:or else ) andn+1fields- 
To search all the lines which do not contain exactly
nseparators : - 
Open the Mark dialog (
Ctrl + M)- 
MARK
(?-s)^(?:((?:(?!(SEP)).)*?)(?2)){n}(?1)$ - 
Untick all options
 - 
Tick the
Bookmark line,Purge fore each searchandWrap aroundoptions - 
Select the
Regular expressionsearch mode - 
Click on the
Mark Allbutton - 
Close the Mark dialog (
Esc) 
 - 
 - 
Use the
Search > Bookmark > Inverse Bookmarksoption - 
Use the
F2and/orMaj + F2shortcuts to navigate between all the remaining lines - 
Replace these extra separators with an other char and/or add the missing separators if any, in order to get exactly
nseparators for all the records 
Best Regards,
guy038
 - 
@guy038 said in Search for a wrong comma in csv file:
your solution, Alan, works only for more than 6 comma chars, found in text, but it misses text containing less than 6 commas !
Yes (but actually it is your – generic – solution I used!)
I just try to give them what they ask for:
Is it possible to search for lines with more than 6 comma?
And, truly, I think the OP is looking for the case where some program output csv-like text without regard for embedded commas in the data, so it is very likely that the less-than-6-commas case never occurs. (Certainly good to check for it, though).