Compare two files and Remove Duplicates from One



  • I have millions of rows of email data. I want to compare file No 1 with files No 2 and No 3, and remove from file No 1 any entries that are also present in file No 2 or No 3.



  • @Husnain-Raza ,

    That sounds like a programming task to me.

    Anytime your problem statement is “look in file x to decide how to edit file y”, you have gone beyond the native skillset of text editors.

    If it were a small number of lines, the experts here would use a trick of copying some of the data from one file to another, and then use some super-fancy regex to remove the duplicates from one file based on the data from the other. But since you’ve invoked “millions of rows” already, that might give you memory problems, depending on exactly how the regex capture groups are defined while doing that replacement. It wouldn’t surprise me if one of those experts jumped in and provided that solution, or linked you to a previous implementation in this forum. But my guess is that it might have problems with your “millions of rows”.

    That said, there is a Notepad++ plugin, PythonScript (and the similar LuaScript Plugin or jN Notepad++ Plugin, and my external Perl module), which allows you to automate things inside Notepad++ using Python (or Lua or JavaScript or Perl). But really, at the point that you invoke one of those for “millions of lines”, the Notepad++-specific nature of those plugins actually gets in your way and slows you down. It is easier (and faster) to do those edits by running Python/Lua/JavaScript/Perl natively (i.e., at the command line) and using the programming language’s file I/O functions, rather than using the language to drive Notepad++ to load and edit the file and write it back out.

    This is not a programming forum, and definitely not a free code-writing service, so it is beyond the scope of this Notepad++ forum for us to write a script to do that.
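
    For anyone who wants a starting point for the command-line approach described above, here is a minimal Python sketch. It is not a tested solution for your data. It assumes one email per line, UTF-8 text, that matching should ignore case and surrounding whitespace, and that the reference files fit in memory as a set. The function name and its parameters are made up for illustration.

```python
def normalize(line):
    """Assumed matching rule: ignore case and surrounding whitespace."""
    return line.strip().lower()


def remove_known_lines(target, references, output, encoding="utf-8"):
    """Copy `target` to `output`, skipping lines found in any reference file.

    Returns the number of lines written to `output`.
    """
    # Load every line of the reference files into a set for fast lookups.
    seen = set()
    for ref in references:
        with open(ref, encoding=encoding) as fh:
            seen.update(normalize(line) for line in fh)
    # Do not treat blank lines as duplicates of each other.
    seen.discard("")

    # Stream the target file line by line, so it is never held in memory whole.
    kept = 0
    with open(target, encoding=encoding) as src, \
         open(output, "w", encoding=encoding) as dst:
        for line in src:
            if normalize(line) not in seen:
                dst.write(line)
                kept += 1
    return kept
```

    Calling it as remove_known_lines("file1.txt", ["file2.txt", "file3.txt"], "file1_cleaned.txt") would write a cleaned copy of file 1 and leave the original untouched; those file names are placeholders. Only the reference files are loaded into the set, so memory use depends on the size of files 2 and 3, not file 1.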

