Community
    • Login

    Search for a wrong comma in csv file

    Scheduled Pinned Locked Moved Help wanted · · · – – – · · ·
    7 Posts 4 Posters 1.3k Views
    Loading More Posts
    • Oldest to Newest
    • Newest to Oldest
    • Most Votes
    Reply
    • Reply as topic
    Log in to reply
    This topic has been deleted. Only users with topic management privileges can see it.
    • Torsten FroeseT
      Torsten Froese
      last edited by

      Hello together,
      i have a lot of very large csv files, which i will import to an Historian system.
      The seperator of the csv files is a comma. here an Example:
      LS4_Produktion.job_id_current.pv,0,2021/06/07,12:00:00.000,0,555555555,192
      LS4_Produktion.job_qty_current.pv,0,2021/06/07,12:00:00.000,0,3000,192
      The bold text are the historized values.

      In some the historized value is text, which contains a comma:
      LS4_Produktion.product_group_current.pv,0,2021/06/07,12:00:00.000,0,1582 COBAN ,2IN BRAUN (EU),192

      In this case the import will interupted and i must search the text with the wrong comma.

      It is not possible to open the csv files in excel, because they are to large (> 1.000.000 lines)

      Is it possible to search for lines with more than 6 comma?
      I can’t use wildcard in the search like * in notepad ++

      Has anyone a solution for my question?

      Regards
      Torsten

      EkopalypseE Alan KilbornA 3 Replies Last reply Reply Quote 0
      • EkopalypseE
        Ekopalypse @Torsten Froese
        last edited by Ekopalypse

        @torsten-froese

        using regular expression and something like (.+,){7}
        might do the job.
        This matches lines like

        a,b,c,d,e,f,g,h
        

        but NOT lines like

        a,b,c,d,e,f,,h
        a,,c,d,e,f,g,h
        
        Torsten FroeseT 1 Reply Last reply Reply Quote 3
        • Torsten FroeseT
          Torsten Froese @Ekopalypse
          last edited by

          @ekopalypse said in Search for a wrong comma in csv file:

          (.+,){7}

          Hi,
          i tested it, and it works fine.
          Thanks for your support.

          Regards
          Torsten

          1 Reply Last reply Reply Quote 1
          • EkopalypseE
            Ekopalypse @Torsten Froese
            last edited by Ekopalypse

            @torsten-froese

            Come to think of it, this regex (.*?,){7} should also work and has the advantage that it also fits lines like a,,c,d,e,f,g,h.

            1 Reply Last reply Reply Quote 3
            • Alan KilbornA
              Alan Kilborn
              last edited by

              It’s probably overkill, especially since the problem is already solved, but the way I would solve this one is to use the formula provided HERE and, adapting it for the problem at hand I would search for:

              Find: (?i-s)^(?:((?:(?!(,)).)*?)(?2)){7,}(?1)$
              Search mode: Regular expression

              1 Reply Last reply Reply Quote 3
              • guy038G
                guy038
                last edited by guy038

                Hello, @torsten-froese, @aekopalypse, @alan-kiborn and All,

                Let’s consider a regular CSV file, will all records containing the same number of commas chars

                For instance, the @torsten-froese’s example contains exactly 6 commas, so 7 fields

                Regarding this example, I suppose that it would be better to :

                • Firstly, Mark lines matching the regex :

                MARK (?-s)^(?:((?:(?!(,)).)*?)(?2)){6}(?1)$

                Because all these lines, with the template, below, are correct :

                Field_1 w/o comma,Field_2 w/o comma,Field_3 w/o comma,Field_4 w/o comma,Field_5 w/o comma,Field_6 w/o comma,field_7 w/o comma

                • Secondly, perform the operation Search > Bookmark > Inverse Bookmarks

                So, we could easily find out any line which do no have the same template, as above !


                Indeed, your solution, Alan, works only for more than 6 comma chars, found in text, but it misses text containing less than 6 commas !

                You need to perform an additional search :

                MARK (?-s)^(?:((?:(?!(,)).)*?)(?2)){1,5}(?1)$

                In that case, all wrong cases are detected !

                Test it against this text, below, where :

                • The first line contains 6 commas, (OK)
                • The second line contains 4 commas (KO)
                • The third line contains 6 commas, (OK)
                • The fourth line contains 8 commas (KO)
                LS4_Produktion.job_id_current.pv,0,2021/06/07,12:00:00.000,0,555555555,192
                LS4_Produktion.job_id_current.pv,2021/06/07,12:00:00.000,555555555,192
                LS4_Produktion.job_qty_current.pv,0,2021/06/07,12:00:00.000,0,3000,192
                LS4_Produktion.product_group_current.pv,0,2021/06/07,12:00:00.000,0,1582 COBAN ,2IN BRA,UN (EU),192
                

                So, we could generalize and say :

                Let’s suppose a regular CSV file, containing n one-char separators ( , ; : or else ) and n+1 fields

                • To search all the lines which do not contain exactly n separators :

                • Open the Mark dialog ( Ctrl + M )

                  • MARK (?-s)^(?:((?:(?!(SEP)).)*?)(?2)){n}(?1)$

                  • Untick all options

                  • Tick the Bookmark line, Purge fore each search and Wrap around options

                  • Select the Regular expression search mode

                  • Click on the Mark All button

                  • Close the Mark dialog ( Esc )

                • Use the Search > Bookmark > Inverse Bookmarks option

                • Use the F2 and/or Maj + F2 shortcuts to navigate between all the remaining lines

                • Replace these extra separators with an other char and/or add the missing separators if any, in order to get exactly n separators for all the records

                Best Regards,

                guy038

                1 Reply Last reply Reply Quote 3
                • Alan KilbornA
                  Alan Kilborn @Torsten Froese
                  last edited by

                  @guy038 said in Search for a wrong comma in csv file:

                  your solution, Alan, works only for more than 6 comma chars, found in text, but it misses text containing less than 6 commas !

                  Yes (but actually it is your – generic – solution I used!)

                  I just try to give them what they ask for:

                  Is it possible to search for lines with more than 6 comma?

                  And, truly, I think the OP is looking for the case where some program output csv-like text without regard for embedded commas in the data, so it is very likely that the less-than-6-commas case never occurs. (Certainly good to check for it, though).

                  1 Reply Last reply Reply Quote 2
                  • First post
                    Last post
                  The Community of users of the Notepad++ text editor.
                  Powered by NodeBB | Contributors