Search for a wrong comma in csv file
- 
 Hello together, 
 i have a lot of very large csv files, which i will import to an Historian system.
 The seperator of the csv files is a comma. here an Example:
 LS4_Produktion.job_id_current.pv,0,2021/06/07,12:00:00.000,0,555555555,192
 LS4_Produktion.job_qty_current.pv,0,2021/06/07,12:00:00.000,0,3000,192
 The bold text are the historized values.In some the historized value is text, which contains a comma: 
 LS4_Produktion.product_group_current.pv,0,2021/06/07,12:00:00.000,0,1582 COBAN ,2IN BRAUN (EU),192In this case the import will interupted and i must search the text with the wrong comma. It is not possible to open the csv files in excel, because they are to large (> 1.000.000 lines) Is it possible to search for lines with more than 6 comma? 
 I can’t use wildcard in the search like * in notepad ++Has anyone a solution for my question? Regards 
 Torsten
- 
 using regular expression and something like (.+,){7}
 might do the job.
 This matches lines likea,b,c,d,e,f,g,hbut NOT lines like a,b,c,d,e,f,,h a,,c,d,e,f,g,h
- 
 @ekopalypse said in Search for a wrong comma in csv file: (.+,){7} Hi, 
 i tested it, and it works fine.
 Thanks for your support.Regards 
 Torsten
- 
 Come to think of it, this regex (.*?,){7}should also work and has the advantage that it also fits lines likea,,c,d,e,f,g,h.
- 
 It’s probably overkill, especially since the problem is already solved, but the way I would solve this one is to use the formula provided HERE and, adapting it for the problem at hand I would search for: Find: (?i-s)^(?:((?:(?!(,)).)*?)(?2)){7,}(?1)$
 Search mode: Regular expression
- 
 Hello, @torsten-froese, @aekopalypse, @alan-kiborn and All, Let’s consider a regular CSVfile, will all records containing the same number ofcommascharsFor instance, the @torsten-froese’s example contains exactly 6commas, so7fieldsRegarding this example, I suppose that it would be better to : - Firstly, Mark lines matching the regex :
 MARK (?-s)^(?:((?:(?!(,)).)*?)(?2)){6}(?1)$Because all these lines, with the template, below, are correct : Field_1 w/o comma,Field_2 w/o comma,Field_3 w/o comma,Field_4 w/o comma,Field_5 w/o comma,Field_6 w/o comma,field_7 w/o comma- Secondly, perform the operation Search > Bookmark > Inverse Bookmarks
 So, we could easily find out any line which do no have the same template, as above ! 
 Indeed, your solution, Alan, works only for more than 6comma chars, found in text, but it misses text containing less than6commas !You need to perform an additional search : MARK (?-s)^(?:((?:(?!(,)).)*?)(?2)){1,5}(?1)$In that case, all wrong cases are detected ! Test it against this text, below, where : - The first line contains 6commas, (OK)
- The second line contains 4commas (KO)
- The third line contains 6commas, (OK)
- The fourth line contains 8commas (KO)
 LS4_Produktion.job_id_current.pv,0,2021/06/07,12:00:00.000,0,555555555,192 LS4_Produktion.job_id_current.pv,2021/06/07,12:00:00.000,555555555,192 LS4_Produktion.job_qty_current.pv,0,2021/06/07,12:00:00.000,0,3000,192 LS4_Produktion.product_group_current.pv,0,2021/06/07,12:00:00.000,0,1582 COBAN ,2IN BRA,UN (EU),192
 So, we could generalize and say : Let’s suppose a regular CSVfile, containingnone-char separators (,;:or else ) andn+1fields- 
To search all the lines which do not contain exactly nseparators :
- 
Open the Mark dialog ( Ctrl + M)- 
MARK (?-s)^(?:((?:(?!(SEP)).)*?)(?2)){n}(?1)$
- 
Untick all options 
- 
Tick the Bookmark line,Purge fore each searchandWrap aroundoptions
- 
Select the Regular expressionsearch mode
- 
Click on the Mark Allbutton
- 
Close the Mark dialog ( Esc)
 
- 
- 
Use the Search > Bookmark > Inverse Bookmarksoption
- 
Use the F2and/orMaj + F2shortcuts to navigate between all the remaining lines
- 
Replace these extra separators with an other char and/or add the missing separators if any, in order to get exactly nseparators for all the records
 Best Regards, guy038 
- 
 @guy038 said in Search for a wrong comma in csv file: your solution, Alan, works only for more than 6 comma chars, found in text, but it misses text containing less than 6 commas ! Yes (but actually it is your – generic – solution I used!) I just try to give them what they ask for: Is it possible to search for lines with more than 6 comma? And, truly, I think the OP is looking for the case where some program output csv-like text without regard for embedded commas in the data, so it is very likely that the less-than-6-commas case never occurs. (Certainly good to check for it, though). 


