    Is there a way to search for duplicate records in Notepad++?

    • Adam Buxbaum

      Hi All,

      I am currently using Notepad++ to review user files before uploading them into our provisioning system, and I was curious whether there is a way to search for duplicates (emails, UIDs, etc.) within Notepad++, or do I have to save the file and review it in Excel to do this?

      All assistance is greatly appreciated.

      Best,
      Adam

      • tomas-chrastina

        Hi Adam,

        I don’t think there is a way to search for duplicates as such, apart from using Smart Highlighting, Ctrl+F3 (Select and Find Next), or a plain search.

        But there is a way to remove duplicates from a simple list without Excel (I use it a lot). If you have a simple list of values:

        value1
        value2
        value2
        value3
        value2
        

        you can get a list of unique values:

        value1
        value2
        value3
        

        like this:

        1. You need the TextFX Characters plugin.
        2. Back up the file you are editing!!!
        3. Enable these options under Menu -> TextFX -> TextFX Tools:
          ✓ +Sort ascending
          ✓ +Sort outputs only UNIQUE (at column) lines
        4. Select text
        5. Use one of the actions: Menu -> TextFX -> TextFX Tools:
          a) Sort lines case sensitive (at column)
          b) Sort lines case insensitive (at column)
        6. Remember to DISABLE the option +Sort outputs only UNIQUE (at column) lines afterwards, so you won’t lose data the next time you just want to sort!
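For anyone who would rather do this outside Notepad++, here is a rough Python equivalent of the sort-and-unique steps above. It is a sketch under assumptions: it ignores the "at column" part of the TextFX commands, and in the case-insensitive mode it keeps the first spelling it sees, which is my choice and not necessarily what TextFX does. `sort_unique` is a hypothetical helper name.

```python
def sort_unique(lines: list[str], case_sensitive: bool = True) -> list[str]:
    """Sort lines ascending and keep one copy of each line.

    Mimics TextFX "Sort ascending" + "Sort outputs only UNIQUE lines";
    the exact TextFX behaviour is assumed, not verified.
    """
    if case_sensitive:
        # A set drops exact repeats; sorted() orders the survivors.
        return sorted(set(lines))
    # Case-insensitive: lines that differ only in case count as duplicates.
    # Assumption: the first spelling encountered is the one that is kept.
    first_spelling: dict[str, str] = {}
    for line in lines:
        first_spelling.setdefault(line.casefold(), line)
    return [first_spelling[key] for key in sorted(first_spelling)]


print(sort_unique(["value1", "value2", "value2", "value3", "value2"]))
```

With the list from the example above, this prints `['value1', 'value2', 'value3']`.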

        Still, this won’t work for complex multi-column data, where only Excel’s filters or Remove Duplicates on specific columns will help.
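For the multi-column case, a small script can play the role of Excel’s Remove Duplicates on a single column. This sketch assumes the data is CSV with a header row; the column name (`email`), the case-insensitive comparison and the keep-the-first-row rule are all assumptions for illustration, and `dedupe_by_column` is a hypothetical name.

```python
import csv
import io


def dedupe_by_column(csv_text: str, column: str) -> list[dict[str, str]]:
    """Keep only the first row for each distinct value in `column`."""
    seen: set[str] = set()
    unique_rows: list[dict[str, str]] = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Assumption: values are compared trimmed and case-insensitively,
        # which suits e-mail addresses; drop .lower() for exact matching.
        key = row[column].strip().lower()
        if key not in seen:
            seen.add(key)
            unique_rows.append(row)
    return unique_rows


data = "uid,email\n1,a@example.com\n2,b@example.com\n3,A@example.com\n"
for row in dedupe_by_column(data, "email"):
    print(row["uid"], row["email"])
```

Here the third row is dropped because `A@example.com` matches `a@example.com` once case is ignored.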


        Best regards,
        Tomas

        • rajeshp2408

          Thanks, this really helped… :-)

          • Matthias Heim

            [Adding my own answer, since this answer gets so many views and was the top result on google]
            There is no need to use a plugin.

            You can easily find duplicate lines with the following regex:
            ^([^\r\n]+)$(?=.*?^\1$)

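            This will find all occurrences of duplicate lines except the last one, so you can also use Replace (with an empty replacement) to delete them. Note that the match does not include the line ending, so each deleted line is left behind as a blank line.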

            You can see it in action here: https://regex101.com/r/5GPJfz/1

            Just make sure that you enable the “. matches newline” option in the Find dialog.
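To try the pattern outside Notepad++, here is a sketch that uses Python’s `re` module as a stand-in for Notepad++’s Boost engine (assumed to behave the same for this pattern). `re.MULTILINE` makes `^`/`$` match at line boundaries, as Notepad++ does by default, and `re.DOTALL` plays the role of the dot-matches-newline option. Python’s `$` only recognizes `\n`, so the sketch converts `\r\n` and `\r` endings first; that conversion and the helper name `find_duplicate_lines` are my assumptions, not part of the original answer.

```python
import re

# Matthias Heim's pattern, unchanged; DOTALL lets ".*?" in the lookahead
# cross line endings to reach a later copy of the same line.
DUPLICATE_LINE = re.compile(r"^([^\r\n]+)$(?=.*?^\1$)", re.MULTILINE | re.DOTALL)


def find_duplicate_lines(text: str) -> list[str]:
    """Return every duplicated line except its last occurrence."""
    # Normalize Windows/old-Mac line endings so "$" works in Python.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    return [m.group(1) for m in DUPLICATE_LINE.finditer(text)]


sample = "value1\nvalue2\nvalue2\nvalue3\nvalue2\n"
print(find_duplicate_lines(sample))
```

For this sample it prints `['value2', 'value2']`: the first two `value2` lines match, and the last one does not because nothing identical follows it.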

              • Alan Kilborn @Matthias Heim

              @Matthias-Heim

              I prefer this one, which does the same thing:

              ^((?-s).+?)\R(?=(?s).*?^\1(?:\R|\z))

              It has (at least) two advantages:

              • You don’t have to care about the state of the “. matches newline” checkbox

              • The last line of the file doesn’t have to have a line-ending on it to be considered in the duplicate decision (the text itself decides that) – whether it is truly a duplicate then is up for debate, but I think it is
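Python’s `re` has no `\R`, and recent Python versions only accept a global flag such as `(?s)` at the very start of a pattern (a scoped group like `(?s:...)` is allowed anywhere), so a direct translation needs a few substitutions: `(?:\r\n|\r|\n)` for `\R`, `\Z` for `\z`, and `[^\r\n]` for the single-line dot (Python’s plain `.` still matches `\r`). The sketch below is my translation, not Alan’s exact regex, and it assumes `\n` or `\r\n` line endings. Replacing the matches with nothing removes each duplicate line together with its line ending, keeping the last occurrence.

```python
import re

# Translation of ^((?-s).+?)\R(?=(?s).*?^\1(?:\R|\z)) into Python syntax.
ALAN_PATTERN = re.compile(
    r"^([^\r\n]+?)(?:\r\n|\r|\n)"            # a whole line plus its line ending
    r"(?=(?s:.*?)^\1(?:\r\n|\r|\n|\Z))",     # the same line appears again later
    re.MULTILINE,
)


def remove_duplicate_lines_keep_last(text: str) -> str:
    """Delete every duplicate line except its last occurrence."""
    return ALAN_PATTERN.sub("", text)


print(repr(remove_duplicate_lines_keep_last("value1\nvalue2\nvalue2\nvalue3\nvalue2\n")))
print(repr(remove_duplicate_lines_keep_last("x\ny\nx")))  # last line has no line ending
```

The first call prints `'value1\nvalue3\nvalue2\n'`; the second prints `'y\nx'`, showing that a final line without a line ending still counts as a later copy.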

                • guy038

                Hello, @matthias-heim, @alan-kilborn and All,

                Alan, I don’t think the lazy quantifier at the beginning of the regex is necessary since, obviously, the EOL characters must be matched anyway!

                Hence, this syntax:

                (?-s)^(.+)\R(?=(?s).*?^\1(?:\R|\z))


                However, @matthias-heim, be aware that when there is a large number of lines between the line currently being scanned and its nearest duplicate, the regex may completely fail to find the correct matches :-((
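One way around that limitation, if leaving Notepad++ is acceptable, is to replace the regex lookahead with a hash set: each line is checked against the lines already seen, so the distance between duplicates no longer matters and the pass is roughly linear in the size of the file. This is a different technique from the regexes in this thread and, unlike them, it keeps the first occurrence rather than the last. Blank lines are left alone, as with the regexes. `remove_duplicate_lines` is a hypothetical helper name.

```python
def remove_duplicate_lines(text: str) -> str:
    """Drop every repeat of a non-blank line, keeping its first occurrence."""
    seen: set[str] = set()
    kept: list[str] = []
    # keepends=True preserves each line's original ending (\n, \r\n, ...).
    for line in text.splitlines(keepends=True):
        content = line.rstrip("\r\n")
        if content and content in seen:
            continue  # a repeat: skip it
        seen.add(content)
        kept.append(line)
    return "".join(kept)


print(repr(remove_duplicate_lines("value1\nvalue2\nvalue2\nvalue3\nvalue2\n")))
```

This prints `'value1\nvalue2\nvalue3\n'`, and it takes the same time whether the copies are adjacent or thousands of lines apart.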

                Best Regards

                guy038

                  • Mohammed Asif @guy038

                  @guy038 said in Is there a way to search for duplicate records in Notepad++?:

                  (?-s)^(.+)\R(?=(?s).*?^\1(?:\R|\z))

                  Can you please tell me how to mark both lines (the original and its duplicate)?
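As a rough alternative outside Notepad++, this sketch reports every occurrence of each duplicated line (the original and all its duplicates) by line number, so each one can be visited with Go To (Ctrl+G). It is not the regex-based bookmarking solution guy038 mentions; the helper name `duplicate_line_numbers` is hypothetical, and skipping blank lines is my assumption.

```python
from collections import defaultdict


def duplicate_line_numbers(text: str) -> dict[str, list[int]]:
    """Map each line that occurs more than once to all its 1-based line numbers."""
    positions: dict[str, list[int]] = defaultdict(list)
    for number, line in enumerate(text.splitlines(), start=1):
        if line:  # assumption: blank lines are never reported
            positions[line].append(number)
    # Keep only the lines that actually repeat.
    return {line: nums for line, nums in positions.items() if len(nums) > 1}


print(duplicate_line_numbers("value1\nvalue2\nvalue2\nvalue3\nvalue2\n"))
```

For the sample list used earlier in the thread, it prints `{'value2': [2, 3, 5]}`.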

                    • guy038

                    Hello, @mohammed-asif and All,

                    Before answering your question in practice, could you give us some details about your data:

                    • Why do you want to mark all the duplicate lines? Do you intend to delete them all, copy them for further processing, or something else?

                    • Approximately how many lines need to be processed, and what is the average line length?

                    • Roughly what is the maximum number of lines between two duplicate lines?

                    Maybe you could add a short example of your text?


                    I’ve already found a solution, but it depends mainly on how the data is organized and on what kind of processing is needed after bookmarking!

                    See you later,

                    Best Regards,

                    guy038

                      • Alan Kilborn @guy038

                      @guy038 said in Is there a way to search for duplicate records in Notepad++?:

                      Why do you want to mark all the duplicate lines ?

                      A practical reason (not involving copy/cut/delete) for this might be so that each duplicate line can be visited and manually edited to be made unique in some way.

                      The Community of users of the Notepad++ text editor.