    How to remove duplicate words in a list that are not consecutive?

    8 Posts 5 Posters 972 Views
    • Debojit Acharjee

      I need to remove duplicate words from a list that has one word per line. The duplicates are not consecutive; they are scattered across various line numbers.

      I tried Edit > Line Operations > Remove Duplicate Lines, but this feature doesn’t work when there are thousands of lines in the list.

      So I need to know: up to how many lines does this feature work?

      Is there a script that can do this for more than 30k lines?

      • Terry R @Debojit Acharjee

        @Debojit-Acharjee said in How to remove duplicate words in a list that are not consecutive?:

        I tried the Edit>Line Operations>Remove Duplicate Lines,

        I think it should work, even for the number of lines you mention.

        The online manual reference is here; scroll down a bit and look for Line Operations. There is a requirement that the line endings are uniform and match the line-ending type shown in the bottom (status) bar. Have a read and check that your file meets those requirements.
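One way to check the "uniform line endings" requirement outside Notepad++ is to count each kind of line ending in the raw bytes of the file. The following is only a minimal sketch in plain Python; the function names and the file name `words.txt` are hypothetical and not part of Notepad++ or its manual.

```python
def count_line_endings(data):
    """Count CRLF, lone LF and lone CR sequences in a bytes object."""
    crlf = data.count(b'\r\n')
    lf = data.count(b'\n') - crlf   # LF not preceded by CR
    cr = data.count(b'\r') - crlf   # CR not followed by LF
    return {'CRLF': crlf, 'LF': lf, 'CR': cr}


def has_mixed_line_endings(data):
    """Return True if more than one kind of line ending is present."""
    counts = count_line_endings(data)
    return sum(1 for n in counts.values() if n > 0) > 1


# Hypothetical usage on a file read in binary mode:
# with open('words.txt', 'rb') as f:
#     print(count_line_endings(f.read()))
```

If more than one count is non-zero, the file has mixed line endings and may not meet the requirement described in the manual.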

        Terry

        • Mark Olson

          The following PythonScript script is a fine solution:

          '''
          ref: https://community.notepad-plus-plus.org/topic/25492/how-to-remove-duplicate-words-in-a-list-that-are-not-consecutive
          requires PythonScript: https://github.com/bruderstein/PythonScript
          '''
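          from Npp import editor
          
          # Lines already seen, compared as exact text (case-sensitive)
          values = set()
          
          def callback(match):
              # Called once per regex match, i.e. once per line
              line = match.group(0)
              if line in values:
                  # Duplicate: replace the line's text with nothing.
                  # The line ending is not part of the match, so an empty line remains.
                  return ''
              values.add(line)
              return line
          
          # (?-s) keeps "." from matching line endings, so each match is one line
          editor.rereplace(r'(?-s)^.*$', callback)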
          
          • Alan Kilborn

            Notes on Mark’s script:

            • when it removes a duplicate line, it leaves an empty line at the position of the removal (perhaps the OP wants this, perhaps not; OP didn’t say)

            • naming the replacement function callback is something I don’t really like, as “callback” has a bit of a different connotation in PythonScript programming (but this is MY problem, not a problem with the script)

            There are some other useful scripts for removing duplicate lines in THIS fairly old thread.
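For readers who do not want the empty lines mentioned in the first note, here is a rough sketch of an alternative that drops each duplicate line together with its line ending and keeps the first occurrence. It is plain Python, not Mark's script; `dedupe_lines` is a hypothetical helper name. Inside PythonScript it could presumably be applied to the whole buffer with `editor.setText(dedupe_lines(editor.getText()))`, which is an assumption about how you would wire it up rather than something tested in this thread.

```python
def dedupe_lines(text):
    """Remove repeated lines, keeping the first occurrence and original order."""
    seen = set()
    kept = []
    # splitlines(True) keeps each line's own ending attached to the line
    for line in text.splitlines(True):
        key = line.rstrip('\r\n')  # compare content only, not the line ending
        if key in seen:
            continue               # duplicate: drop the line and its ending
        seen.add(key)
        kept.append(line)
    return ''.join(kept)
```

Because lines are compared without their endings, a final line that lacks a trailing line ending is still recognized as a duplicate of an earlier line.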

            • Debojit Acharjee @Alan Kilborn

              @Alan-Kilborn I can use a script, but I want to know why the “Remove Duplicate Lines” feature of “Line Operations” in Notepad++ doesn’t work when there are more than 30 thousand lines.

              Does it have anything to do with the CPU register memory?

              • Terry R @Debojit Acharjee

                @Debojit-Acharjee

                Did you check the online reference I linked to?

                I just created a file of more than 30K lines with 1 word on every line. Since it was built from approximately 1500 words which I duplicated, most lines were removed when the “Remove Duplicate Lines” option was used, leaving just over 1100 words as seen in the image below. The removal was very quick, only about 1 second (or less).

                [Image: screenshot of the file after “Remove Duplicate Lines”, with line endings shown]

                I have shown the line endings in the file and also pointed out that the file is recognized as the same type (CR LF). That is what the online reference refers to. To show the line endings, use the View menu, then “Show Symbol”, then tick at least “Show End of Line”.

                Do that and take a picture of your file, post it here. Also copy and paste the version of your Notepad++ installation. It is under the ? menu, then “Debug Info”.

                As your installation may well be using a different language, these menu names will need to be translated to your language.

                Without that additional information we have no way of identifying your problem, but rest assured that the “Remove Duplicate Lines” option does work if the requirements are met.
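To reproduce a test like the one described above, a file with many duplicated one-word lines can be generated with a few lines of Python. This is only an illustrative sketch: the function name, the made-up words (`word00000`, ...), and the default sizes are assumptions, not the file actually used in this post.

```python
import random


def make_test_text(n_lines=30000, n_unique=1500, eol='\r\n', seed=0):
    """Build text with n_lines one-word lines drawn from n_unique distinct words.

    Assumes n_lines >= n_unique. Every line ends with the same eol string,
    so the line endings are uniform.
    """
    rng = random.Random(seed)                  # fixed seed: repeatable output
    pool = ['word%05d' % i for i in range(n_unique)]
    # Each word appears at least once; the rest are random repeats
    lines = pool + [rng.choice(pool) for _ in range(n_lines - n_unique)]
    rng.shuffle(lines)                         # scatter duplicates across the file
    return ''.join(line + eol for line in lines)


# Hypothetical usage: write the text in binary mode so the CR LF endings are kept as-is
# with open('words_test.txt', 'wb') as f:
#     f.write(make_test_text().encode('ascii'))
```

After running “Remove Duplicate Lines” on such a file, the number of remaining lines should equal the number of distinct words.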

                Terry

                • dr ramaanand @Debojit Acharjee

                  @Debojit-Acharjee The simplest solution is to use the Python script @Mark-Olson gave you. If you don’t know how to install and run a script in PythonScript, please click and read “How to install and run a script in Python Script”.

                  • Alan Kilborn @Debojit Acharjee

                    @Debojit-Acharjee said:

                    I can use a script, but I want to know why the “Remove Duplicate Lines” feature of “Line Operations” in Notepad++ doesn’t work when there are more than 30 thousand lines. Does it have anything to do with the CPU register memory?

                    The best way to explore this is to create an “issue” on the official bug reporting site (see HERE for info on that) and attach the 30K+ file where it fails.
