    Compare two files and Remove Duplicates from One

    • Husnain Raza
      Husnain Raza last edited by

      I have files containing millions of rows of email data. I want to compare file No. 1 against files No. 2 and No. 3, and remove from file No. 1 every entry that is also present in file No. 2 or No. 3.

      • PeterJones
        PeterJones @Husnain Raza last edited by

        @Husnain-Raza ,

        That sounds like a programming task to me.

        Anytime your problem statement is “look in file x to decide how to edit file y”, you have gone beyond the native skillset of text editors.

        If it were a small number of lines, the experts here would use a trick: copy the data from the other files into file 1, then use some super-fancy regex to remove from file 1 the lines that also appear in the copied data. But since you’ve already said “millions of rows”, that might give you memory problems, depending on exactly how the regex capture groups are defined for that replacement. It wouldn’t surprise me if one of those experts jumped in and provided that solution, or linked you to a previous implementation in this forum, but my guess is that it would struggle with your “millions of rows”.

        That said, there is a Notepad++ plugin, PythonScript (and the similar LuaScript plugin, the jN Notepad++ plugin, and my external Perl module), which allows you to automate things inside Notepad++ using Python (or Lua, JavaScript, or Perl). But really, once you invoke one of those for “millions of lines”, the Notepad++-specific nature of those plugins gets in your way and slows you down. It is easier (and faster) to run the Python/Lua/JavaScript/Perl natively (i.e., at the command line) and use the programming language’s file IO functions, rather than using the language to drive Notepad++ to load and edit the file and write it back out.

        This is not a programming forum, and definitely not a free code-writing service, so it is beyond the scope of this Notepad++ forum for us to write a script to do that.
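
        Purely to illustrate what “running the language natively with its file IO functions” can look like, here is a rough Python sketch, not a finished tool. It assumes one email per line, that the entries from files 2 and 3 fit in memory as a set, and that matching should ignore case and surrounding whitespace. The function names `normalize` and `remove_known` are made up for this example, and it needs Python 3.9 or later.

```python
from pathlib import Path


def normalize(line: str) -> str:
    """Comparison key. ASSUMPTION: ignore case and surrounding whitespace."""
    return line.strip().lower()


def remove_known(target: Path, references: list[Path], output: Path) -> int:
    """Copy `target` to `output`, skipping lines found in any reference file.

    Returns the number of lines that were skipped. The target file is
    streamed line by line; only the reference entries are held in memory.
    """
    # Collect every entry from the reference files (files 2 and 3).
    seen: set[str] = set()
    for ref in references:
        with ref.open(encoding="utf-8", errors="replace") as fh:
            seen.update(normalize(line) for line in fh)

    dropped = 0
    with target.open(encoding="utf-8", errors="replace") as src, \
            output.open("w", encoding="utf-8") as dst:
        for line in src:
            if normalize(line) in seen:
                dropped += 1  # present in a reference file, so skip it
            else:
                dst.write(line)  # keep the original line unchanged
    return dropped
```

        With hypothetical file names, a call might look like `remove_known(Path("file1.txt"), [Path("file2.txt"), Path("file3.txt")], Path("file1_cleaned.txt"))`. The result goes to a new file rather than overwriting file 1, so the original stays available for checking. Note that if a reference file contains blank lines, blank lines in file 1 are dropped as well.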
