Community
    • Login

    regex to conditionally delete unnecessary lines in .csv files?

    Scheduled Pinned Locked Moved Help wanted · · · – – – · · ·
    5 Posts 3 Posters 2.2k Views
    Loading More Posts
    • Oldest to Newest
    • Newest to Oldest
    • Most Votes
    Reply
    • Reply as topic
    Log in to reply
    This topic has been deleted. Only users with topic management privileges can see it.
    • Brandon TibbittsB
      Brandon Tibbitts
      last edited by

      Can I use regex expression and replace function to delete lines that have 0 in the fourth data field of a .csv file?

      Before
      20180829014727,40.393018,-117.974365,14.54421,163.1
      20180829014740,40.590137,-117.952637,12.82776,162.3
      20180829014754,40.187256,-117.505029,13.20208,162.5
      20180829014832,40.372559,-117.538574,41.76501,162.5
      20180829014836,40.584375,-117.960938,0,162.8
      20180829014840,40.152100,-117.909180,0,162.7
      20180829014845,40.346338,-117.683301,0,162.7

      After
      20180829014727,40.393018,-117.974365,14.54421,163.1
      20180829014740,40.590137,-117.952637,12.82776,162.3
      20180829014754,40.187256,-117.505029,13.20208,162.5
      20180829014832,40.372559,-117.538574,41.76501,162.5

      1 Reply Last reply Reply Quote 0
      • Terry RT
        Terry R
        last edited by

        @Brandon-Tibbitts
        Certainly and quite simply. The following regex will filter out those lines.
        Find What:^([^,]+,){3}0.+\R*
        Replace With:empty field <—nothing in this field.

        Have search mode as “regular expression” and wraparound ticked.
        As a brief explanation:
        [^,] means any character that is NOT a comma
        ([^,]+,) means any number of non-commas followed by a comma.
        {3} means three sets of the expression preceding (inside brackets).
        0.+\R* means a zero followed by as many characters as possible followed by the carriage return line if it exists (this caters for the last line).

        As the replace with field is empty is just removes that line. When complete check the very last line as it will likely be blank, so you may need to remove that.

        Terry

        1 Reply Last reply Reply Quote 0
        • Terry RT
          Terry R
          last edited by

          @Brandon-Tibbitts
          I just realised that although my regex works well on your example there is an instance where it won’t. If the 4th field contains something like 0.76 then my regex will still pick it. The correct regex is:
          ^([^,]+,){3}0,.+\R*
          Note the addition of a comma after the zero.

          Sorry about that.

          Terry

          1 Reply Last reply Reply Quote 3
          • Brandon TibbittsB
            Brandon Tibbitts
            last edited by

            Thank you Terry,

            That helps me out a ton.

            Is there also a way to do arithmetic on that same field(after the zero values/lines removed), so for example if I wanted to add to or multiply the fourth field value in every line across a file

            I suspect the first part of the regex would be the same ^([^,]+,){3}0,. but rather with a filter to recognize any value in that field instead of the 0, then a replace function similar but to replace it with a modified value

            Or should I post a new topic to cover that one separately?

            Brandon

            Scott SumnerS 1 Reply Last reply Reply Quote 0
            • Scott SumnerS
              Scott Sumner @Brandon Tibbitts
              last edited by

              @Brandon-Tibbitts

              You can’t do arithmetic in this (regex) manner. You’re going to need a programming language for that, or perhaps, since you appear to have .csv files, pulling the data into Excel to do the math would work. Good luck.

              1 Reply Last reply Reply Quote 3
              • First post
                Last post
              The Community of users of the Notepad++ text editor.
              Powered by NodeBB | Contributors