How to remove duplicate words in a list that are not consecutive?

  • D
    Debojit Acharjee
    last edited by Feb 19, 2024, 5:28 AM

    I need to remove duplicate words from a list that has one word per line. The duplicates are not consecutive; they are scattered across various line numbers.

    I tried the Edit>Line Operations>Remove Duplicate Lines, but this feature doesn’t work when there are thousands of lines in a list.

    So I need to know: up to how many lines does this feature work?

    Is there a script that can do this for more than 30k lines?

    • T
      Terry R @Debojit Acharjee
      last edited by Terry R Feb 19, 2024, 6:02 AM Feb 19, 2024, 6:01 AM

      @Debojit-Acharjee said in How to remove duplicate words in a list that are not consecutive?:

      I tried the Edit>Line Operations>Remove Duplicate Lines,

      I think it should work, even for the number of lines you mention.

      The online manual reference is here; scroll down a bit and look for Line Operations. It requires that the line endings be uniform and match the file's end-of-line type as shown in the bottom bar. Have a read and check that your file meets those requirements.
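One way to check the "uniform line endings" requirement outside Notepad++ is to count each kind of line ending in the raw bytes of the file. The following is only a rough Python 3 sketch; the function names `count_line_endings` and `has_mixed_line_endings` are made up for illustration and are not part of Notepad++ or PythonScript.

```python
import re

def count_line_endings(data):
    """Count CRLF, lone LF, and lone CR sequences in a bytes object."""
    crlf = len(re.findall(br"\r\n", data))
    lone_lf = len(re.findall(br"(?<!\r)\n", data))   # LF not preceded by CR
    lone_cr = len(re.findall(br"\r(?!\n)", data))    # CR not followed by LF
    return {"CRLF": crlf, "LF": lone_lf, "CR": lone_cr}

def has_mixed_line_endings(data):
    """Return True if more than one kind of line ending occurs."""
    counts = count_line_endings(data)
    return sum(1 for n in counts.values() if n > 0) > 1

if __name__ == "__main__":
    # Inline sample bytes; for a real file, read it with open(path, "rb").
    sample = b"apple\r\nbanana\ncherry\r\n"
    print(count_line_endings(sample))
    print(has_mixed_line_endings(sample))
```

If the counts show more than one kind of ending, converting the file to a single end-of-line type before running "Remove Duplicate Lines" would be the first thing to try.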

      Terry

      • M
        Mark Olson
        last edited by Feb 19, 2024, 6:39 AM

        The following PythonScript script is a fine solution:

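        '''
        Remove every repeated line, keeping only its first occurrence.
        ref: https://community.notepad-plus-plus.org/topic/25492/how-to-remove-duplicate-words-in-a-list-that-are-not-consecutive
        requires PythonScript: https://github.com/bruderstein/PythonScript
        '''
        from Npp import *
        
        # Text of every line seen so far
        values = set()
        
        def callback(match):
            # Called once per matched line; the return value replaces the match
            line = match.group(0)
            if line in values:
                # Duplicate: replace the line's text with nothing (its line ending stays)
                return ''
            values.add(line)
            return line
        
        # (?-s) stops . from matching line endings, so each match is one line's text
        editor.rereplace('(?-s)^.*$', callback)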
        
        • A
          Alan Kilborn
          last edited by Feb 19, 2024, 12:37 PM

          Notes on Mark’s script:

          • when it removes a duplicate line, it leaves an empty line at the position of the removal (perhaps the OP wants this, perhaps not; OP didn’t say)

          • naming the replacement function callback is something I don’t really like, as “callback” has a bit of a different connotation in PythonScript programming (but this is MY problem, not a problem with the script)
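On the first note: the empty lines appear because the replacement removes only the text of a duplicate line, not its line ending. As a rough sketch of an alternative (plain Python, not Mark's script; the name `remove_duplicate_lines` is made up for illustration), each duplicate can be dropped together with its line ending while the first occurrences keep their original order:

```python
def remove_duplicate_lines(text):
    """Return text with repeated lines removed, keeping first occurrences in order."""
    seen = set()
    kept = []
    # splitlines(True) keeps each line's own ending attached to the line
    for line in text.splitlines(True):
        key = line.rstrip("\r\n")  # compare lines without their endings
        if key in seen:
            continue  # drop the duplicate, line ending included
        seen.add(key)
        kept.append(line)
    return "".join(kept)

if __name__ == "__main__":
    sample = "apple\nbanana\napple\ncherry\nbanana\n"
    print(remove_duplicate_lines(sample))
```

Inside PythonScript, something like `editor.setText(remove_duplicate_lines(editor.getText()))` should apply this to the current document, though that call is an assumption and untested here.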

          There are some other useful scripts for removing duplicate lines in THIS fairly old thread.

          • D
            Debojit Acharjee @Alan Kilborn
            last edited by Feb 20, 2024, 5:48 AM

            @Alan-Kilborn I can use a script, but I want to know why the “Remove Duplicate Lines” feature of “Line Operations” in Notepad++ doesn’t work when there are more than 30 thousand lines.

            Does it have anything to do with CPU register memory?

            • T
              Terry R @Debojit Acharjee
              last edited by Terry R Feb 20, 2024, 6:24 AM Feb 20, 2024, 6:22 AM

              @Debojit-Acharjee

              Did you check the online reference I linked to?

              I just created a file of more than 30K lines with one word on every line. Because I built it by duplicating approximately 1500 words, the “Remove Duplicate Lines” option was bound to remove most of the lines; it left just over 1100 words, as seen in the image below. The removal was very quick, only about 1 second (or less).

              [image: 82abee8a-5a18-4e0f-903a-1573cc859c10-image.png]

              I have shown the line endings in the file and also pointed out that the file is recognized as the same type (CR LF). That is what the online reference refers to. To show the line endings, use the View menu, then “Show Symbol”, then tick at least “Show End of Line”.

              Do that, take a screenshot of your file, and post it here. Also copy and paste the version information of your Notepad++ installation; it is under the ? menu, then “Debug Info”.

              As your installation may well be using a different language, you will need to look for the equivalents of these menu names in your language.

              Without that additional information we have no way of identifying your problem, but rest assured that the “Remove Duplicate Lines” option does work if the requirements are met.
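For anyone who wants to reproduce a test like the one described above, a file of that shape can be generated with a few lines of Python 3. This is only a sketch under assumptions: the word pool is synthetic (`word0000`, `word0001`, ...), the sizes are round numbers rather than the exact ones used in the test above, and the function names and the output path are hypothetical.

```python
import random

def make_test_lines(total_lines=30000, pool_size=1500, seed=0):
    """Return total_lines words drawn at random from a pool of pool_size words."""
    rng = random.Random(seed)  # fixed seed so the file is reproducible
    pool = ["word%04d" % i for i in range(pool_size)]
    return [rng.choice(pool) for _ in range(total_lines)]

def write_test_file(path, lines, newline="\r\n"):
    """Write one word per line, using the same line ending on every line."""
    # newline="" disables translation, so the ending is written exactly as given
    with open(path, "w", newline="") as f:
        for line in lines:
            f.write(line + newline)

if __name__ == "__main__":
    lines = make_test_lines()
    write_test_file("duplicates_test.txt", lines)  # hypothetical output path
    print(len(lines), "lines,", len(set(lines)), "unique words")
```

Because every line is written with the same CR LF ending, the resulting file should meet the uniform line ending requirement.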

              Terry

              • D
                dr ramaanand @Debojit Acharjee
                last edited by Feb 21, 2024, 2:37 AM

                @Debojit-Acharjee The simplest solution is to use the Python script @Mark-Olson gave you. If you don’t know how to install and run a script in PythonScript, please read “How to install and run a script in Python Script”.

                • A
                  Alan Kilborn @Debojit Acharjee
                  last edited by Alan Kilborn Feb 21, 2024, 1:12 PM Feb 21, 2024, 1:12 PM

                  @Debojit-Acharjee said:

                  I can use a script, but I want to know why the “Remove Duplicate Lines” feature of “Line Operations” in Notepad++ doesn’t work when there are more than 30 thousand lines. Does it have anything to do with CPU register memory?

                  The best way to explore this is to create an “issue” on the official bug reporting site (see HERE for info on that) and attach the 30K+ file where it fails.

                  The Community of users of the Notepad++ text editor.