
Is there a way to search for duplicate records in Notepad++?

9 Posts 7 Posters 194.4k Views
  • A
    Adam Buxbaum
    last edited by Sep 30, 2015, 12:36 PM

    Hi All,

    I currently and using Notepad++ to review user files before uploading into our provisioning system and I was curious if there was a way to search for duplicates (emails, UID’s, etc…) within Notepad++ or do I have to save the file and review it in excel to do this?

    All assistance is greatly appreciated.

    Best,
    Adam

    • T
      tomas-chrastina
      last edited by tomas-chrastina Oct 5, 2015, 10:07 AM Oct 5, 2015, 10:04 AM

      Hi Adam,

      I don’t think there is a dedicated way to search for duplicates, unless you count Smart Highlighting, CTRL+F3, or a plain search.

      But there is a way to remove duplicates from a simple list without Excel (I use it a lot). So if you have a simple list of values:

      value1
      value2
      value2
      value3
      value2
      

      you can simply get a list of unique values:

      value1
      value2
      value3
      

      like this:

      1. You need the TextFX Characters plugin.
      2. Back up the file you are editing!
      3. Enable these options under Menu -> TextFX -> TextFX Tools:
        ✓ +Sort ascending
        ✓ +Sort outputs only UNIQUE (at column) lines
      4. Select the text.
      5. Run one of these actions under Menu -> TextFX -> TextFX Tools:
        a) Sort lines case sensitive (at column)
        b) Sort lines case insensitive (at column)
      6. Remember to DISABLE the option +Sort outputs only UNIQUE (at column) lines afterwards, so you won’t lose data when you only want to sort later!
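
For readers who prefer a script, the effect of the steps above (sort ascending, keep only unique lines) can be approximated in Python. This is a minimal sketch, not a description of how TextFX works internally: the function name `sort_unique_lines` is hypothetical, and the choice to keep the first spelling seen in case-insensitive mode is an assumption.

```python
def sort_unique_lines(text: str, case_sensitive: bool = True) -> str:
    """Return the distinct lines of `text`, sorted ascending."""
    lines = text.splitlines()
    if case_sensitive:
        unique = sorted(set(lines))
    else:
        # Assumption: keep the first spelling seen for each case-folded key.
        first_seen = {}
        for line in lines:
            first_seen.setdefault(line.casefold(), line)
        unique = [first_seen[key] for key in sorted(first_seen)]
    return "\n".join(unique)


sample = "value1\nvalue2\nvalue2\nvalue3\nvalue2"
print(sort_unique_lines(sample))
```

As with the plugin, the original order of the lines is lost, so this only suits lists where order does not matter.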

      Still, the TextFX approach won’t work for complex multi-column data, where you need something like Excel’s filters or its Remove Duplicates feature applied to specific columns.
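
For the multi-column case, a short script is one alternative to Excel. The sketch below assumes the user file is comma-separated with a header row; the column names `uid` and `email` and the function name `duplicate_rows_by_column` are hypothetical, chosen only to mirror the kind of data mentioned in the question.

```python
import csv
import io
from collections import defaultdict


def duplicate_rows_by_column(csv_text: str, column: str) -> dict:
    """Map each repeated value in `column` to the file line numbers where it occurs."""
    reader = csv.DictReader(io.StringIO(csv_text))
    seen = defaultdict(list)
    for row in reader:
        # reader.line_num is the physical line just read (the header is line 1).
        seen[row[column]].append(reader.line_num)
    return {value: nums for value, nums in seen.items() if len(nums) > 1}


# Hypothetical sample data: two users share the same email address.
sample = (
    "uid,email\n"
    "1,a@example.com\n"
    "2,b@example.com\n"
    "3,a@example.com\n"
)
print(duplicate_rows_by_column(sample, "email"))
```

The reported line numbers can then be visited in Notepad++ with Ctrl+G.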


      Best regards,
      Tomas

      • R
        rajeshp2408
        last edited by Jun 22, 2018, 2:46 AM

        Thanks, this really helped… :-)

        • M
          Matthias Heim
          last edited by Jun 20, 2020, 8:38 AM

          [Adding my own answer, since this thread gets so many views and was the top result on Google]
          There is no need to use a plugin.

          You can easily find duplicate lines with the following regex:
          ^([^\r\n]+)$(?=.*?^\1$)

          This will find all occurrences of each duplicate line except the last, so you can also use search and replace to delete them.

          You can see it in action here: https://regex101.com/r/5GPJfz/1

          Just make sure that you activate the option “. finds \r and \n” (labeled “. matches newline” in the English interface) in the search dialog.
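
The behaviour of this regex can also be checked outside Notepad++. The following is a small Python sketch, assuming the text uses Unix line endings (`\n`), because Python's `$` does not match before `\r`. The names `DUPLICATE_LINE` and `find_earlier_duplicates` are hypothetical.

```python
import re

# The regex from the post. MULTILINE makes ^ and $ work per line; DOTALL plays
# the role of the search-dialog option that lets "." match line breaks.
DUPLICATE_LINE = re.compile(r"^([^\r\n]+)$(?=.*?^\1$)", re.MULTILINE | re.DOTALL)


def find_earlier_duplicates(text: str) -> list:
    """Return (line_number, line) for every duplicate line except its last occurrence."""
    return [
        (text.count("\n", 0, m.start()) + 1, m.group(1))
        for m in DUPLICATE_LINE.finditer(text)
    ]


sample = "value1\nvalue2\nvalue2\nvalue3\nvalue2\n"
print(find_earlier_duplicates(sample))
```

On the sample list, lines 2 and 3 are reported and line 5, the last `value2`, is not, which matches the "all occurrences except the last" description.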

          • A
            Alan Kilborn @Matthias Heim
            last edited by Jun 24, 2020, 7:13 PM

            @Matthias-Heim

            For me, I like this one to do the same thing:

            ^((?-s).+?)\R(?=(?s).*?^\1(?:\R|\z))

            It has (at least) two advantages:

            • You don’t have to care about the state of the . matches newline box

            • The last line of the file doesn’t have to have a line-ending on it to be considered in the duplicate decision (the text itself decides that) – whether it is truly a duplicate then is up for debate, but I think it is
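
To see what deleting with this pattern does, here is a hedged Python adaptation. It is not the same regex: this sketch substitutes `\n` for `\R` and `\Z` for `\z`, and uses the scoped flag `(?s:...)` in place of the inline `(?s)` switch, so it assumes `\n` line endings. The function name `drop_earlier_duplicates` is hypothetical.

```python
import re

# Adapted pattern: match a whole line plus its line break, but only if the same
# line appears again later, either followed by a line break or at the very end.
EARLIER_DUPLICATE = re.compile(r"^(.+?)\n(?=(?s:.*?)^\1(?:\n|\Z))", re.MULTILINE)


def drop_earlier_duplicates(text: str) -> str:
    """Delete every duplicate line except its last occurrence."""
    return EARLIER_DUPLICATE.sub("", text)


# No line ending on the last line, as in the second advantage listed above.
sample = "value1\nvalue2\nvalue2\nvalue3\nvalue2"
print(drop_earlier_duplicates(sample))
```

Because the match includes the line break, replacing with nothing removes the whole line instead of leaving an empty one behind.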

            • G
              guy038
              last edited by Jun 25, 2020, 9:32 PM

              Hello, @matthias-heim, @alan-kilborn and All,

              Alan, I don’t think the lazy quantifier at the beginning of the regex is necessary, since the EOL characters must be matched anyway!

              Hence the syntax:

              (?-s)^(.+)\R(?=(?s).*?^\1(?:\R|\z))


              However, @matthias-heim, be aware that if there is a large number of lines between the line currently being scanned and its nearest duplicate, the regex may completely fail to detect the correct matches :-((
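
When the distance between duplicates is a concern, an approach that does not rely on a regex lookahead is to remember the lines already seen. The sketch below is a swapped-in technique, not something proposed in the thread, and it differs from the regexes above in that it keeps the first occurrence of each line instead of the last. The name `drop_later_duplicates` is hypothetical.

```python
def drop_later_duplicates(lines):
    """Yield each line the first time it appears, skipping later repeats."""
    seen = set()
    for line in lines:
        if line not in seen:
            seen.add(line)
            yield line


sample = ["value1", "value2", "value2", "value3", "value2"]
print(list(drop_later_duplicates(sample)))
```

Each line is checked against a set, so the number of lines between a line and its duplicate does not affect the result.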

              Best Regards

              guy038

              • M
                Mohammed Asif @guy038
                last edited by Aug 8, 2020, 8:39 AM

                @guy038 said in Is there a way to search for duplicate records in Notepad++?:

                (?-s)^(.+)\R(?=(?s).*?^\1(?:\R|\z))

                Can you please tell me how to mark both lines (the original and the duplicate)?

                • G
                  guy038
                  last edited by Aug 9, 2020, 8:12 AM

                  Hello, @mohammed-asif and All,

                  Before giving a practical answer to your question, could you tell us a bit about your data:

                  • Why do you want to mark all the duplicate lines? Do you intend to delete them all, copy them for some other process, or something else?

                  • Roughly how many lines are to be processed, and what is their average length?

                  • Roughly how many lines, at most, lie between two duplicate lines?

                  Maybe you could add a short example of your text?


                  I’ve already found a solution, but it mainly depends on how the data is organized and on what kind of processing is needed after bookmarking!

                  See you later,

                  Best Regards,

                  guy038

                  • Y Yaron referenced this topic on Nov 17, 2023, 2:17 PM
                  • A
                    Alan Kilborn @guy038
                    last edited by Nov 17, 2023, 2:28 PM

                    @guy038 said in Is there a way to search for duplicate records in Notepad++?:

                    Why do you want to mark all the duplicate lines ?

                    A practical reason (not involving copy/cut/delete) for this might be so that each duplicate line can be visited and manually edited to be made unique in some way.
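
For that visit-and-edit use case, one option outside Notepad++ is to list the line numbers of every occurrence of each duplicated line, original included. This minimal Python sketch does not bookmark anything in the editor; it only reports positions, and the name `duplicate_line_numbers` is hypothetical.

```python
from collections import defaultdict


def duplicate_line_numbers(text: str) -> dict:
    """Map each line that occurs more than once to all of its 1-based line numbers."""
    positions = defaultdict(list)
    for number, line in enumerate(text.splitlines(), start=1):
        positions[line].append(number)
    return {line: nums for line, nums in positions.items() if len(nums) > 1}


sample = "value1\nvalue2\nvalue2\nvalue3\nvalue2"
print(duplicate_line_numbers(sample))
```

Each reported line number can then be visited in the editor and the line edited by hand to make it unique.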
