Community
    • Login

    Search for a wrong comma in csv file

    Scheduled Pinned Locked Moved Help wanted · · · – – – · · ·
    7 Posts 4 Posters 1.9k Views 1 Watching
    Loading More Posts
    • Oldest to Newest
    • Newest to Oldest
    • Most Votes
    Reply
    • Reply as topic
    Log in to reply
    This topic has been deleted. Only users with topic management privileges can see it.
    • Torsten FroeseT Offline
      Torsten Froese
      last edited by

      Hello together,
      i have a lot of very large csv files, which i will import to an Historian system.
      The seperator of the csv files is a comma. here an Example:
      LS4_Produktion.job_id_current.pv,0,2021/06/07,12:00:00.000,0,555555555,192
      LS4_Produktion.job_qty_current.pv,0,2021/06/07,12:00:00.000,0,3000,192
      The bold text are the historized values.

      In some the historized value is text, which contains a comma:
      LS4_Produktion.product_group_current.pv,0,2021/06/07,12:00:00.000,0,1582 COBAN ,2IN BRAUN (EU),192

      In this case the import will interupted and i must search the text with the wrong comma.

      It is not possible to open the csv files in excel, because they are to large (> 1.000.000 lines)

      Is it possible to search for lines with more than 6 comma?
      I can’t use wildcard in the search like * in notepad ++

      Has anyone a solution for my question?

      Regards
      Torsten

      EkopalypseE Alan KilbornA 3 Replies Last reply Reply Quote 0
      • EkopalypseE Offline
        Ekopalypse @Torsten Froese
        last edited by Ekopalypse

        @torsten-froese

        using regular expression and something like (.+,){7}
        might do the job.
        This matches lines like

        a,b,c,d,e,f,g,h
        

        but NOT lines like

        a,b,c,d,e,f,,h
        a,,c,d,e,f,g,h
        
        Torsten FroeseT 1 Reply Last reply Reply Quote 3
        • Torsten FroeseT Offline
          Torsten Froese @Ekopalypse
          last edited by

          @ekopalypse said in Search for a wrong comma in csv file:

          (.+,){7}

          Hi,
          i tested it, and it works fine.
          Thanks for your support.

          Regards
          Torsten

          1 Reply Last reply Reply Quote 1
          • EkopalypseE Offline
            Ekopalypse @Torsten Froese
            last edited by Ekopalypse

            @torsten-froese

            Come to think of it, this regex (.*?,){7} should also work and has the advantage that it also fits lines like a,,c,d,e,f,g,h.

            1 Reply Last reply Reply Quote 3
            • Alan KilbornA Offline
              Alan Kilborn
              last edited by

              It’s probably overkill, especially since the problem is already solved, but the way I would solve this one is to use the formula provided HERE and, adapting it for the problem at hand I would search for:

              Find: (?i-s)^(?:((?:(?!(,)).)*?)(?2)){7,}(?1)$
              Search mode: Regular expression

              1 Reply Last reply Reply Quote 3
              • guy038G Offline
                guy038
                last edited by guy038

                Hello, @torsten-froese, @aekopalypse, @alan-kiborn and All,

                Let’s consider a regular CSV file, will all records containing the same number of commas chars

                For instance, the @torsten-froese’s example contains exactly 6 commas, so 7 fields

                Regarding this example, I suppose that it would be better to :

                • Firstly, Mark lines matching the regex :

                MARK (?-s)^(?:((?:(?!(,)).)*?)(?2)){6}(?1)$

                Because all these lines, with the template, below, are correct :

                Field_1 w/o comma,Field_2 w/o comma,Field_3 w/o comma,Field_4 w/o comma,Field_5 w/o comma,Field_6 w/o comma,field_7 w/o comma

                • Secondly, perform the operation Search > Bookmark > Inverse Bookmarks

                So, we could easily find out any line which do no have the same template, as above !


                Indeed, your solution, Alan, works only for more than 6 comma chars, found in text, but it misses text containing less than 6 commas !

                You need to perform an additional search :

                MARK (?-s)^(?:((?:(?!(,)).)*?)(?2)){1,5}(?1)$

                In that case, all wrong cases are detected !

                Test it against this text, below, where :

                • The first line contains 6 commas, (OK)
                • The second line contains 4 commas (KO)
                • The third line contains 6 commas, (OK)
                • The fourth line contains 8 commas (KO)
                LS4_Produktion.job_id_current.pv,0,2021/06/07,12:00:00.000,0,555555555,192
                LS4_Produktion.job_id_current.pv,2021/06/07,12:00:00.000,555555555,192
                LS4_Produktion.job_qty_current.pv,0,2021/06/07,12:00:00.000,0,3000,192
                LS4_Produktion.product_group_current.pv,0,2021/06/07,12:00:00.000,0,1582 COBAN ,2IN BRA,UN (EU),192
                

                So, we could generalize and say :

                Let’s suppose a regular CSV file, containing n one-char separators ( , ; : or else ) and n+1 fields

                • To search all the lines which do not contain exactly n separators :

                • Open the Mark dialog ( Ctrl + M )

                  • MARK (?-s)^(?:((?:(?!(SEP)).)*?)(?2)){n}(?1)$

                  • Untick all options

                  • Tick the Bookmark line, Purge fore each search and Wrap around options

                  • Select the Regular expression search mode

                  • Click on the Mark All button

                  • Close the Mark dialog ( Esc )

                • Use the Search > Bookmark > Inverse Bookmarks option

                • Use the F2 and/or Maj + F2 shortcuts to navigate between all the remaining lines

                • Replace these extra separators with an other char and/or add the missing separators if any, in order to get exactly n separators for all the records

                Best Regards,

                guy038

                1 Reply Last reply Reply Quote 3
                • Alan KilbornA Offline
                  Alan Kilborn @Torsten Froese
                  last edited by

                  @guy038 said in Search for a wrong comma in csv file:

                  your solution, Alan, works only for more than 6 comma chars, found in text, but it misses text containing less than 6 commas !

                  Yes (but actually it is your – generic – solution I used!)

                  I just try to give them what they ask for:

                  Is it possible to search for lines with more than 6 comma?

                  And, truly, I think the OP is looking for the case where some program output csv-like text without regard for embedded commas in the data, so it is very likely that the less-than-6-commas case never occurs. (Certainly good to check for it, though).

                  1 Reply Last reply Reply Quote 2

                  Hello! It looks like you're interested in this conversation, but you don't have an account yet.

                  Getting fed up of having to scroll through the same posts each visit? When you register for an account, you'll always come back to exactly where you were before, and choose to be notified of new replies (either via email, or push notification). You'll also be able to save bookmarks and upvote posts to show your appreciation to other community members.

                  With your input, this post could be even better 💗

                  Register Login
                  • First post
                    Last post
                  The Community of users of the Notepad++ text editor.
                  Powered by NodeBB | Contributors