Is there a way to search for duplicate records in Notepad++?



  • Hi All,

    I am currently using Notepad++ to review user files before uploading them into our provisioning system, and I was curious whether there is a way to search for duplicates (emails, UIDs, etc.) within Notepad++, or whether I have to save the file and review it in Excel.

    All assistance is greatly appreciated.

    Best,
    Adam



  • Hi Adam,

    I don’t think there is a way to do this as a search, unless you count Smart Highlighting, CTRL+F3, or just a plain search.

    But there is a way to remove duplicates from a simple list without Excel (I use it a lot). If you have a simple list of values:

    value1
    value2
    value2
    value3
    value2
    

    you can easily get a list of unique values:

    value1
    value2
    value3
    

    like this:

    1. You need the TextFX Characters plugin.
    2. Back up the file you are editing!!!
    3. Enable these options in Menu -> TextFX -> TextFX Tools:
      ✓ +Sort ascending
      ✓ +Sort outputs only UNIQUE (at column) lines
    4. Select the text.
    5. Use one of these actions in Menu -> TextFX -> TextFX Tools:
      a) Sort lines case sensitive (at column)
      b) Sort lines case insensitive (at column)
    6. Remember to DISABLE the option +Sort outputs only UNIQUE (at column) lines afterwards, so you won’t lose data the next time you just want to sort!
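
    To check the result outside Notepad++, the effect of the two options above (sort ascending, output only unique lines) can be approximated in Python. This is a sketch, not TextFX’s actual code; in particular, which spelling survives in case-insensitive mode is an assumption here (the first one in the original order).

```python
def sort_unique(lines: list[str], case_sensitive: bool = True) -> list[str]:
    """Sort lines ascending and drop repeats, roughly like TextFX's
    "+Sort outputs only UNIQUE" option (tie-breaking is an assumption)."""
    key = (lambda s: s) if case_sensitive else str.casefold
    result: list[str] = []
    last_key = None
    # sorted() is stable, so equal keys keep their original relative order
    for line in sorted(lines, key=key):
        k = key(line)
        if k != last_key:  # keep only the first line of each run of equal keys
            result.append(line)
            last_key = k
    return result


values = ["value1", "value2", "value2", "value3", "value2"]
print(sort_unique(values))  # ['value1', 'value2', 'value3']
```

    Note that plain case-sensitive sorting puts all uppercase ASCII letters before lowercase ones, so "B" sorts before "a".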

    Still, this won’t work for complex multi-column data, where only Excel’s filters or Remove Duplicates on specific columns will help.
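
    For the multi-column case in the original question (duplicate emails or UIDs inside a user file), a short script can stand in for Excel’s filters. This sketch assumes a comma-separated file with a header row; the column names (`uid`, `email`) and the sample rows are hypothetical.

```python
import csv
import io
from collections import defaultdict


def duplicates_in_column(csv_text: str, column: str, ignore_case: bool = True) -> dict[str, list[int]]:
    """Map each value that appears more than once in `column` to the
    data-row numbers (1 = first row after the header) where it occurs."""
    rows_by_value: dict[str, list[int]] = defaultdict(list)
    reader = csv.DictReader(io.StringIO(csv_text))
    for row_no, row in enumerate(reader, start=1):
        value = row[column].strip()
        if ignore_case:
            value = value.casefold()  # treat Bob@Example.com and bob@example.com as equal
        rows_by_value[value].append(row_no)
    return {v: rows for v, rows in rows_by_value.items() if len(rows) > 1}


# Hypothetical sample user file
sample = """uid,name,email
u001,Ann,ann@example.com
u002,Bob,bob@example.com
u003,Robert,Bob@Example.com
"""
print(duplicates_in_column(sample, "email"))  # {'bob@example.com': [2, 3]}
```

    For a real file you would pass the file’s contents instead of `sample`, e.g. `open(path, newline="", encoding="utf-8").read()` (the encoding is an assumption about your data).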


    Best regards,
    Tomas



    Thanks, that really helped… :-)



  • [Adding my own answer, since this thread gets so many views and was the top result on Google]
    There is no need to use a plugin.

    You can easily find duplicate lines with the following regex:
    ^([^\r\n]+)$(?=.*?^\1$)

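    This finds every occurrence of a duplicated line except the last one, so you can also use Search and Replace (with an empty replacement) to delete them. The line endings are not part of the match, so replacing this way leaves empty lines behind.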

    You can see it in action here: https://regex101.com/r/5GPJfz/1

    Just make sure you activate the option “. finds \r and \n” in the search dialog.
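
    To see what this pattern does without Notepad++, it can be run through Python’s `re` module with equivalent flags: `re.MULTILINE` so `^` and `$` match at line boundaries, and `re.DOTALL` standing in for the “. finds \r and \n” option. This is a sketch assuming LF line endings: Python’s multiline `$` only matches before `\n`, so CRLF text would need its `\r` characters removed first.

```python
import re

# Matthias's pattern, run with Python's re engine rather than Notepad++'s.
# Assumption: LF line endings (with CRLF, "$" would sit after the "\r" and fail).
DUP_PATTERN = re.compile(r"^([^\r\n]+)$(?=.*?^\1$)", re.MULTILINE | re.DOTALL)


def find_earlier_duplicates(text: str) -> list[tuple[int, str]]:
    """Return (line_number, text) for every duplicated line except its last occurrence."""
    results = []
    for match in DUP_PATTERN.finditer(text):
        line_no = text.count("\n", 0, match.start()) + 1  # 1-based line number
        results.append((line_no, match.group(1)))
    return results


sample = "value1\nvalue2\nvalue2\nvalue3\nvalue2\n"
print(find_earlier_duplicates(sample))  # [(2, 'value2'), (3, 'value2')]
```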



  • @Matthias-Heim

    Personally, I prefer this one for the same job:

    ^((?-s).+?)\R(?=(?s).*?^\1(?:\R|\z))

    It has (at least) two advantages:

    • You don’t have to care about the state of the “. matches newline” checkbox

    • The last line of the file doesn’t need a line ending to be considered in the duplicate decision (its text alone decides that). Whether such a line is truly a duplicate is debatable, but I think it is
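
    Here is a sketch of this pattern translated into Python’s `re` syntax, mainly to show that Search and Replace with an empty replacement removes whole lines, line endings included. The translation choices are assumptions, since Python’s engine differs from Notepad++’s: Python has no `\R` (replaced by `\n`, assuming LF endings), spells end-of-string as `\Z`, and needs the scoped `(?s:...)` form instead of toggling `(?s)` mid-pattern.

```python
import re

# Alan's pattern in Python's re syntax (translation assumptions):
#   (?-s) ... (?s)  -> "." is non-dotall by default; (?s:...) enables dotall in the lookahead only
#   \R              -> \n   (LF line endings assumed)
#   \z              -> \Z   (Python's \Z means "end of string only")
ALAN = re.compile(r"^(.+?)\n(?=(?s:.*?)^\1(?:\n|\Z))", re.MULTILINE)


def drop_earlier_duplicates(text: str) -> str:
    """Delete every duplicated line, line ending included, except its last occurrence."""
    return ALAN.sub("", text)


# The final "value2" has no line ending but still counts as a later duplicate.
sample = "value1\nvalue2\nvalue2\nvalue3\nvalue2"
print(drop_earlier_duplicates(sample))
```

    With the sample above, the result is "value1\nvalue3\nvalue2": both earlier "value2" lines are gone, and no empty lines remain.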



  • Hello, @matthias-heim, @alan-kilborn and All,

    Alan, I don’t think the lazy quantifier at the beginning of the regex is necessary since, obviously, the EOL characters must be matched anyway!

    Hence, this syntax:

    (?-s)^(.+)\R(?=(?s).*?^\1(?:\R|\z))
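
    The claim about the lazy quantifier can be checked empirically: because `.` cannot cross a line break here and a line ending must follow the group, both `(.+)` and `(.+?)` can only ever capture the whole line. This sketch runs both forms through Python’s `re` on a few sample texts, under these translation assumptions: LF endings, `\R` written as `\n`, `\z` written as `\Z`, and `(?s)` scoped to the lookahead with `(?s:...)`.

```python
import re

LAZY = re.compile(r"^(.+?)\n(?=(?s:.*?)^\1(?:\n|\Z))", re.MULTILINE)   # Alan's form
GREEDY = re.compile(r"^(.+)\n(?=(?s:.*?)^\1(?:\n|\Z))", re.MULTILINE)  # guy038's form

samples = [
    "value1\nvalue2\nvalue2\nvalue3\nvalue2",
    "a\nab\na\nabc\nab\n",  # prefixes of other lines must not count as duplicates
    "x\ny\nz\n",
]

for text in samples:
    lazy_hits = [(m.start(), m.group()) for m in LAZY.finditer(text)]
    greedy_hits = [(m.start(), m.group()) for m in GREEDY.finditer(text)]
    assert lazy_hits == greedy_hits, (text, lazy_hits, greedy_hits)

print("lazy and greedy forms matched identically on all samples")
```

    A handful of samples is not a proof, but the reasoning above (the group is always followed by a line break it cannot consume) is what makes the two forms equivalent.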


    However, @matthias-heim, be aware that when there are many lines between the line being scanned and its nearest duplicate, the regex may completely fail to detect the correct matches :-((
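
    When the regex gives up on distant duplicates, one option outside Notepad++ is a plain dictionary pass, which doesn’t care how far apart the duplicates are. This is a sketch in plain Python, not a Notepad++ feature. It mirrors the regexes above by keeping the last occurrence of each line and, like `.+`, leaving empty lines alone.

```python
def keep_last_occurrences(text: str) -> str:
    """Remove every repeated non-empty line except its last occurrence,
    keeping the original order. One pass records positions and one pass
    filters, so the distance between duplicates does not matter."""
    lines = text.splitlines(keepends=True)

    def content(line: str) -> str:
        # Strip whatever line ending splitlines() kept ("\n", "\r\n", ...).
        return line.splitlines()[0]

    last_index: dict[str, int] = {}
    for i, line in enumerate(lines):
        last_index[content(line)] = i  # later occurrences overwrite earlier ones

    kept = [
        line
        for i, line in enumerate(lines)
        # empty lines are always kept; otherwise keep only the last occurrence
        if content(line) == "" or last_index[content(line)] == i
    ]
    return "".join(kept)


print(keep_last_occurrences("value1\nvalue2\nvalue2\nvalue3\nvalue2"))
```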

    Best Regards

    guy038



  • @guy038 said in Is there a way to search for duplicate records in Notepad++?:

    (?-s)^(.+)\R(?=(?s).*?^\1(?:\R|\z))

    Can you please tell me how to mark both lines (original + duplicate)?
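
    The thread doesn’t settle a Notepad++-only answer to this question. To illustrate what “mark both lines” means, this sketch lists every line number of each duplicated line (the original plus all copies). It is plain Python run outside Notepad++, a possible way to post-process the file rather than a Notepad++ feature. The resulting numbers could then be visited with Go To (Ctrl+G).

```python
from collections import defaultdict


def all_duplicate_line_numbers(text: str) -> dict[str, list[int]]:
    """Map each non-empty line that occurs more than once to all of its
    1-based line numbers, so the original and every copy are reported."""
    positions: dict[str, list[int]] = defaultdict(list)
    for line_no, line in enumerate(text.splitlines(), start=1):
        if line:  # skip empty lines, as the (.+) patterns do
            positions[line].append(line_no)
    return {line: nums for line, nums in positions.items() if len(nums) > 1}


sample = "value1\nvalue2\nvalue2\nvalue3\nvalue2\n"
print(all_duplicate_line_numbers(sample))  # {'value2': [2, 3, 5]}
```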



  • Hello, @mohammed-asif and All,

    Before answering your question in practice, could you give us some details about your data:

    • Why do you want to mark all the duplicate lines? Do you intend to delete them all, copy them for some other process, or something else?

    • Roughly how many lines are to be processed, and what is the average line length?

    • Roughly how many lines, at most, lie between two duplicate lines?

    Maybe you could add a short example of your text?


    I’ve already found a solution, but it depends mainly on how the data is organized and on what processing is needed after bookmarking!

    See you later,

    Best Regards,

    guy038

