    possible to delete almost duplicate lines?

      tenchyUK

      I use the line operations to delete duplicate lines in a comma-delimited text file, but I am left with a lot of these almost-duplicate lines, where I want to keep only the longest line.
      Is there an easy way to do this?
      The shorter lines have a double comma at the end, in case that is not immediately visible. The longer ones (usually) have 2 characters between those commas.
      Example (I just want to keep the 2nd line):
      G7ODA,IO93WS,
      G7ODA,IO93WS,PE,

        PeterJones @tenchyUK

        @tenchyUK,

        Does order of the lines matter in the final results?
        Can there ever be 3 or more lines that you want to compress into one (i.e., could there ever be three or more of the G7ODA lines, or will it always be a single short line and a single long line)?

        Assuming order doesn’t matter, and assuming there is never more than a pair of almost-duplicate lines:

        P01AZ,IO55WS,XY,
        P01AZ,IO55WS,,
        G7ODA,IO93WS,
        G7ODA,IO93WS,PE,
        
        1. Edit > Line Operations > Sort Lines Lexicographically Ascending
        2. Search > Replace
          FIND WHAT = ^(.*?,.*?,),*\R\1
          REPLACE WITH = $1
          SEARCH MODE = regular expression
          REPLACE ALL

        End Result:

        G7ODA,IO93WS,PE,
        P01AZ,IO55WS,XY,
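
For anyone who would rather script this outside the editor, here is a minimal Python sketch of the same two steps (sort, then collapse each short/long pair). It is only an illustration, not something Notepad++ runs: `collapse_pairs` is a made-up name, `\R` is swapped for `\n` because Python's `re` module has no `\R`, and the plain code-point sort is assumed to be close enough to the editor's lexicographic sort for data like this.

```python
import re

def collapse_pairs(text):
    """Sort the lines, then merge each short/long pair into the long line.

    Hypothetical helper mirroring the two editor steps above; it assumes
    there is never more than one short and one long line per pair.
    """
    # Step 1: sort lines in plain code-point order, so the short line
    # ends up directly above its longer twin.
    joined = "\n".join(sorted(text.splitlines()))
    # Step 2: same idea as the FIND WHAT expression. MULTILINE lets ^
    # match at the start of every line; \n stands in for \R.
    pattern = re.compile(r"^(.*?,.*?,),*\n\1", re.MULTILINE)
    # Replacing with group 1 drops the short line and keeps the rest
    # of the long line that follows the shared prefix.
    return pattern.sub(r"\1", joined)

sample = "P01AZ,IO55WS,XY,\nP01AZ,IO55WS,,\nG7ODA,IO93WS,\nG7ODA,IO93WS,PE,"
print(collapse_pairs(sample))
```

With the four sample lines above, this should print the same two lines as the End Result.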
        

        If one or both of my assumptions are wrong, provide enough example data to counter my assumptions (use the </> button on the toolbar and put the text between the ``` lines it creates), showing both the original data, and how you want it to look at the end…

        (It’s possible to restore the order, by adding/removing numbers in extra steps… but that gets complicated, and I didn’t want to overwhelm you if the final order of data doesn’t matter. Similarly, the FIND WHAT regex can be made more complex to handle removing one-or-more short lines, but if your data is as simple as my example, then this should be sufficient.)
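
If the original order does matter, or there can be more than two near-duplicates, a small script may be simpler than extra regex steps. The following is a rough Python sketch under my own assumptions: the first two comma-separated fields identify a record, and "longest" means the most characters. `keep_longest` is a hypothetical name, not part of any tool.

```python
def keep_longest(lines):
    """Keep only the longest line for each (field 1, field 2) pair.

    Assumption: the first two comma-separated fields identify a record.
    Records stay in the order in which they first appear.
    """
    best = {}  # key -> longest line seen so far; dicts keep insertion order
    for line in lines:
        key = tuple(line.split(",")[:2])
        # Replace the stored line only when a strictly longer one turns up,
        # so the first of several equally long lines wins.
        if key not in best or len(line) > len(best[key]):
            best[key] = line
    return list(best.values())

data = [
    "P01AZ,IO55WS,XY,",
    "P01AZ,IO55WS,,",
    "G7ODA,IO93WS,",
    "G7ODA,IO93WS,PE,",
]
print("\n".join(keep_longest(data)))
```

Because the dictionary remembers when each key was first seen, no second sort is needed to restore the order.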

          tenchyUK @PeterJones

          @PeterJones

          Hi Peter,
          No, there are only ever the 2 forms of each line. I usually apply a lexicographic sort and then remove duplicate lines.
          So I would end up with:

          G7ODA,IO93WS,
          G7ODA,IO93WS,PE,
          P01AZ,IO55WS,
          P01AZ,IO55WS,XY,

          I can sort again afterwards, as that takes a split second.

          Thanks for the suggestion, I shall try that.

          The Community of users of the Notepad++ text editor.