sort file removing duplicates possible?

  • S
    Scott Sumner @Scott Sumner
    last edited by Jun 1, 2018, 1:30 PM

    @Scott-Sumner

    …the following spec…

    Hey Scott! You did forget some things! How about when removing duplicates, we need the options to:

    • keep one occurrence of a duplicated line (when sorting)
    • keep no occurrences of a duplicated line (when sorting or not sorting)
    • keep LAST occurrence of a duplicated line (when not sorting)
    • keep FIRST occurrence of a duplicated line (when not sorting)
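
The four options in this list can be sketched in plain Python. This is only an illustration of the requested behaviors, not code from any script in this thread; the function name `remove_duplicates` and its `keep` values are made up for the example.

```python
from collections import Counter


def remove_duplicates(lines, keep="first"):
    """Return a new list with duplicates handled according to `keep`.

    keep="first"  -> keep the first occurrence, original order preserved
    keep="last"   -> keep the last occurrence, original order preserved
    keep="none"   -> drop every line that occurs more than once
    keep="sorted" -> sort and keep one occurrence of each line
    """
    if keep == "sorted":
        return sorted(set(lines))
    if keep == "none":
        counts = Counter(lines)
        return [line for line in lines if counts[line] == 1]
    if keep == "first":
        seen = set()
        result = []
        for line in lines:
            if line not in seen:
                seen.add(line)
                result.append(line)
        return result
    if keep == "last":
        # Walk backwards keeping first-seen lines, then restore the order.
        reversed_result = remove_duplicates(lines[::-1], keep="first")
        return reversed_result[::-1]
    raise ValueError("unknown keep mode: %r" % (keep,))


sample = ["b", "a", "b", "c", "a"]
print(remove_duplicates(sample, "first"))   # ['b', 'a', 'c']
print(remove_duplicates(sample, "last"))    # ['b', 'c', 'a']
print(remove_duplicates(sample, "none"))    # ['c']
print(remove_duplicates(sample, "sorted"))  # ['a', 'b', 'c']
```

The "first" and "last" modes only make sense when the original order is kept, which is why the list ties them to the non-sorting case.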
    • C
      Claudia Frank @Scott Sumner
      last edited by Jun 1, 2018, 1:32 PM

      @Scott-Sumner

      Ahh, sorry, too late: the specs are already defined for version 1, so you need to open a feature request for version 2 :-D

      Cheers
      Claudia

      • C
        Claudia Frank
        last edited by Claudia Frank Jun 2, 2018, 2:50 AM Jun 2, 2018, 2:49 AM

        You can find the 1st version of the script here.

        To make it run, there are two requirements that need to be fulfilled, apart from the obvious one that you need to have the Python Script plugin installed.

        1.) Be sure you have either installed the full package or downloaded and unzipped the TclTk package into the NPP_INSTALL_DIR. Latest releases

        2.) To make the “accent insensitive” feature work, you need to install a Python library called unidecode.
        Unzip the .whl package into NPP_INSTALL_DIR\plugins\lib\

        To check both requirements, open the Python Script console and run the following commands

         import Tkinter
         import unidecode
        

        If you don’t see any errors - done.
        Usage is simple - run the script and check the different options.
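
The same check can be wrapped in a small helper that reports which modules fail to import instead of stopping at the first error. This is a minimal sketch; `missing_modules` is a hypothetical name and not part of the script or of the plugin.

```python
import importlib


def missing_modules(names):
    """Return the subset of `names` that cannot be imported."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


# "Tkinter" is the Python 2 module name used in this thread;
# under Python 3 the module is called "tkinter" instead.
print(missing_modules(["Tkinter", "unidecode"]))
```

An empty list means both requirements are met.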

        What should work is

        • sort/delete duplicates on whole text (aka nothing is selected)
        • sort/delete duplicates on vertically selected text

        not supported yet:

        • sort/delete duplicates on rectangular selection

        Cheers
        Claudia

        Btw. I spent most of the time creating this ugly window, so if someone wants to create a nicer GUI, please go for it. I’m not really good at designing UIs.

        • P
          patrickdrd
          last edited by Jun 2, 2018, 7:13 AM

          tkinter doesn’t work:

          Python 2.7.6-notepad++ r2 (default, Apr 21 2014, 19:26:54) [MSC v.1600 32 bit (Intel)]
          Initialisation took 219ms
          Ready.

          import Tkinter
          Traceback (most recent call last):
            File "<console>", line 1, in <module>
          ImportError: No module named Tkinter
          Traceback (most recent call last):
            File "D:\Utilities\PortableApps\Notepad++\plugins\PythonScript\scripts\Sorter.py", line 5, in <module>
              import Tkinter as tk
          ImportError: No module named Tkinter

          • P
            patrickdrd
            last edited by Jun 2, 2018, 8:39 AM

            I found out something else, about that easylist file,
            textfx’s case insensitive sort results in 69234 entries,
            which is the same as ue’s result!

            • G
              guy038
              last edited by guy038 Jun 2, 2018, 10:43 AM Jun 2, 2018, 10:36 AM

              Hello, @patrickdrd, and All,

              Well, I must complete my previous post !

              • Firstly, I realized that your list, below, is constantly updated ( Last modified: 02 Jun 2018 08:09 UTC )

              https://easylist.to/easylist/easylist.txt

              So, today, this list contains 69917 lines


              • Secondly, when performing the regex search/replace, we must consider both case-sensitive and case-insensitive searches. The two search regexes :

              Regex A : (?-is)(^.+\R)\1+

              Regex B : (?i-s)(^.+\R)\1+

              give, after sorting and removing duplicates with the regex, a file containing :

              A 69852 lines ( so, 65 lines deleted, in 56 matches )

              B 69817 lines ( so, 100 lines deleted, in 88 matches )
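
For readers who want to try the two regexes outside Notepad++, here is a rough Python equivalent. It is a sketch under two assumptions: Python's `re` module has no `\R`, so `\n` is used for the line break, and the text is expected to end with a line break so that the last line can take part in a match. The name `collapse_runs` is made up for the example.

```python
import re

# Python's re has no \R, so "\n" stands in for the line break here.
DUPLICATE_RUN = r"(^.+\n)\1+"


def collapse_runs(text, ignore_case=False):
    """Replace each run of identical consecutive lines with its first line."""
    flags = re.MULTILINE | (re.IGNORECASE if ignore_case else 0)
    return re.sub(DUPLICATE_RUN, r"\1", text, flags=flags)


text = "ABC\nAbC\nAbC\nAbc\nDEF\nDEf\nDEf\n"
print(collapse_runs(text).splitlines())
# ['ABC', 'AbC', 'Abc', 'DEF', 'DEf']
print(collapse_runs(text, ignore_case=True).splitlines())
# ['ABC', 'DEF']
```

As with regex B, the insensitive variant keeps whichever spelling happens to come first in each run, so the surviving line depends on the sort that was applied beforehand.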


              • Thirdly, we must also take into account the possibility that the sort itself is run in a case-sensitive or case-insensitive way !

              Natively, Notepad++ sorts text according to the Unicode value ( code-point ) of characters ( a kind of sensitive sort ! ) whereas some other text editors may offer both case options, leading to different results !

              For instance, using the RJ TextEd software, here are the differences with a simple list of three-character strings ( 1 x ‘ABC’, 2 x ‘AbC’, 3 x ‘Abc’, 1 x ‘DEF’, 3 x ‘DEf’, 2 x ‘aBC’, 3 x ‘aBc’, 3 x ‘dEF’ and 3 x ‘def’ )

                              •-----------------------•---------------------------•
                              |    with Notepad++     |      with RJ TextEd       |
              •---------------•-----------------------•---------------------------•
              |  Before Sort  |  After UNICODE Sort   |   After SENSITIVE Sort    |
              •---------------•-----------------------•---------------------------•
              |      Abc      |          ABC          |            AbC            |
              |      Abc      |          AbC          |            aBC            |
              |      DEf      |          AbC          |            aBc            |
              |      AbC      |          Abc          |            Abc            |
              |      aBc      |          Abc          |            ABC            |
              |      dEF      |          Abc          |            aBC            |
              |      aBc      |          DEF          |            Abc            |
              |      DEf      |          DEf          |            AbC            |
              |      aBC      |          DEf          |            aBc            |
              |      def      |          DEf          |            aBc            |
              |      AbC      |          aBC          |            Abc            |
              |      def      |          aBC          |            DEf            |
              |      def      |          aBc          |            dEF            |
              |      ABC      |          aBc          |            dEF            |
              |      Abc      |          aBc          |            DEf            |
              |      DEF      |          dEF          |            dEF            |
              |      aBc      |          dEF          |            DEf            |
              |      dEF      |          dEF          |            def            |
              |      dEF      |          def          |            DEF            |
              |      DEf      |          def          |            def            |
              |      aBC      |          def          |            def            |
              •---------------•-----------------------•---------------------------•
              

              So, it’s easy to understand that removing consecutive duplicates, after the sort, with the regexes above, will necessarily give totally different results, depending on the software used :-(
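
The effect of the sort order on a "remove consecutive duplicates" step can be reproduced with the 21-line sample list from the table above. This sketch uses Python's own `sorted`, not the sort of Notepad++ or RJ TextEd, and `itertools.groupby` in place of the regexes; `drop_adjacent_duplicates` is a hypothetical helper name.

```python
from itertools import groupby

# The 21-line sample list from the table above, in its "Before Sort" order.
lines = ("Abc Abc DEf AbC aBc dEF aBc DEf aBC def AbC "
         "def def ABC Abc DEF aBc dEF dEF DEf aBC").split()


def drop_adjacent_duplicates(sorted_lines):
    """Keep one line from each run of exactly equal neighbours."""
    return [line for line, _run in groupby(sorted_lines)]


by_code_point = sorted(lines)                # 'A' < 'a', like a code-point sort
by_lowercase = sorted(lines, key=str.lower)  # case ignored while sorting

print(len(drop_adjacent_duplicates(by_code_point)))  # 9
print(len(drop_adjacent_duplicates(by_lowercase)))   # 16
```

With this sample, the code-point sort groups every exact copy together, so 9 distinct lines remain. The lowercase-keyed sort only groups lines that are equal when case is ignored, so exact copies are not always neighbours and 16 lines survive the same dedupe step.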


              • Fourthly, a sort may give different results when run several times, one after another. For instance, with RJ TextEd, running the insensitive sort 6 times on the three-character list above, I was left with 3 sets of data ( 4 times identical to the sensitive sort, plus two other lists !! ). Luckily, as for Notepad++, its Unicode sort always gives identical results :-)
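
In general, an insensitive comparison treats lines such as ‘Abc’ and ‘aBc’ as equal, so their relative order is left to the sort implementation and to the order of the input. The sketch below does not claim to explain what RJ TextEd does internally; it only shows, in Python, how adding the line itself as a tie-breaker makes a case-insensitive sort independent of the input order. `stable_insensitive_sort` is a made-up name.

```python
def stable_insensitive_sort(lines):
    """Case-insensitive sort with the line itself as a tie-breaker."""
    return sorted(lines, key=lambda line: (line.lower(), line))


first_run = ["Abc", "aBc", "ABC", "def", "DEF"]
second_run = list(reversed(first_run))

# Without a tie-breaker the result depends on the input order.
print(sorted(first_run, key=str.lower) == sorted(second_run, key=str.lower))      # False
# With the tie-breaker both inputs end up in the same order.
print(stable_insensitive_sort(first_run) == stable_insensitive_sort(second_run))  # True
```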

              That’s why, @patrickdrd, it’s finally very difficult to compare results between different pieces of software, as each one has its own behavior !

              Cheers,

              guy038

              P.S. :

              Here are the results of my tests :

              1) With Notepad++ and RJ TextEd, using sensitive sort :

              •---------------------------------------------------------------------------------------------------•
              |                                      with Notepad++ Features                                      |
              •---------------•-------------------------•---------------------------•-----------------------------•
              |  Before Sort  |  After Sensitive Sort   |  After Sensitive Regex +  |  After INsensitive Regex +  |
              |               |                         |  Suppression Duplicates   |   Suppression Duplicates    |
              •---------------•-------------------------•---------------------------•-----------------------------•
              |      Abc      |           ABC           |            ABC            |             ABC             |
              |      Abc      |           AbC           |            AbC            |             DEF             |
              |      DEf      |           AbC           |            Abc            |             aBC             |
              |      AbC      |           Abc           |            DEF            |             dEF             |
              |      aBc      |           Abc           |            DEf            |                             |
              |      dEF      |           Abc           |            aBC            |                             |
              |      aBc      |           DEF           |            aBc            |                             |
              |      DEf      |           DEf           |            dEF            |                             |
              |      aBC      |           DEf           |            def            |                             |
              |      def      |           DEf           |                           |                             |
              |      AbC      |           aBC           |                           |                             |
              |      def      |           aBC           |                           |                             |
              |      def      |           aBc           |                           |                             |
              |      ABC      |           aBc           |                           |                             |
              |      Abc      |           aBc           |                           |                             |
              |      DEF      |           dEF           |                           |                             |
              |      aBc      |           dEF           |                           |                             |
              |      dEF      |           dEF           |                           |                             |
              |      dEF      |           def           |                           |                             |
              |      DEf      |           def           |                           |                             |
              |      aBC      |           def           |                           |                             |
              •---------------•-------------------------•---------------------------•-----------------------------•
              
              
              •---------------------------------------------------------------------------------------------------•
              |                                       with RJ TextEd Features                                     |
              •---------------•-------------------------•---------------------------•-----------------------------•
              |  Before Sort  |  After Sensitive Sort   |  After Sensitive Regex +  |  After INsensitive Regex +  |
              |               |                         |  Suppression Duplicates   |   Suppression Duplicates    |
              •---------------•-------------------------•---------------------------•-----------------------------•
              |      Abc      |           AbC           |            AbC            |             AbC             |
              |      Abc      |           aBC           |            aBC            |             DEf             |
              |      DEf      |           aBc           |            aBc            |                             |
              |      AbC      |           Abc           |            Abc            |                             |
              |      aBc      |           ABC           |            ABC            |                             |
              |      dEF      |           aBC           |            aBC            |                             |
              |      aBc      |           Abc           |            Abc            |                             |
              |      DEf      |           AbC           |            AbC            |                             |
              |      aBC      |           aBc           |            aBc            |                             |
              |      def      |           aBc           |            Abc            |                             |
              |      AbC      |           Abc           |            DEf            |                             |
              |      def      |           DEf           |            dEF            |                             |
              |      def      |           dEF           |            DEf            |                             |
              |      ABC      |           dEF           |            dEF            |                             |
              |      Abc      |           DEf           |            DEf            |                             |
              |      DEF      |           dEF           |            def            |                             |
              |      aBc      |           DEf           |            DEF            |                             |
              |      dEF      |           def           |            def            |                             |
              |      dEF      |           DEF           |                           |                             |
              |      DEf      |           def           |                           |                             |
              |      aBC      |           def           |                           |                             |
              •---------------•-------------------------•---------------------------•-----------------------------•
              

              2) When running, several times, an insensitive sort, with RJ TextEd, I obtained 3 different lists :

              • The first one was identical to the table just above, which uses a sensitive sort

              • The two others are listed below !

              •---------------------------------------------------------------------------------------------------•
              |                                       with RJ TextEd Features                                     |
              •---------------•-------------------------•---------------------------•-----------------------------•
              |  Before Sort  |  After INsensitive Sort |  After Sensitive Regex +  |  After INsensitive Regex +  |
              |               |                         |  Suppression Duplicates   |   Suppression Duplicates    |
              •---------------•-------------------------•---------------------------•-----------------------------•
              |      Abc      |          ABC            |            ABC            |             ABC             |
              |      Abc      |          AbC            |            AbC            |             dEF             |
              |      DEf      |          aBC            |            aBC            |                             |
              |      AbC      |          aBC            |            aBc            |                             |
              |      aBc      |          aBc            |            Abc            |                             |
              |      dEF      |          Abc            |            AbC            |                             |
              |      aBc      |          AbC            |            aBc            |                             |
              |      DEf      |          aBc            |            Abc            |                             |
              |      aBC      |          Abc            |            aBc            |                             |
              |      def      |          Abc            |            dEF            |                             |
              |      AbC      |          aBc            |            DEf            |                             |
              |      def      |          dEF            |            def            |                             |
              |      def      |          dEF            |            dEF            |                             |
              |      ABC      |          DEf            |            DEF            |                             |
              |      Abc      |          DEf            |            def            |                             |
              |      DEF      |          DEf            |                           |                             |
              |      aBc      |          def            |                           |                             |
              |      dEF      |          def            |                           |                             |
              |      dEF      |          dEF            |                           |                             |
              |      DEf      |          DEF            |                           |                             |
              |      aBC      |          def            |                           |                             |
              •---------------•-------------------------•---------------------------•-----------------------------•
              
              
              •---------------------------------------------------------------------------------------------------•
              |                                       with RJ TextEd Features                                     |
              •---------------•-------------------------•---------------------------•-----------------------------•
              |  Before Sort  |  After INsensitive Sort |  After Sensitive Regex +  |  After INsensitive Regex +  |
              |               |                         |  Suppression Duplicates   |   Suppression Duplicates    |
              •---------------•-------------------------•---------------------------•-----------------------------•
              |      Abc      |          ABC            |           ABC             |            ABC              |
              |      Abc      |          AbC            |           AbC             |            dEF              |
              |      DEf      |          aBC            |           aBC             |                             |
              |      AbC      |          aBC            |           aBc             |                             |
              |      aBc      |          aBc            |           Abc             |                             |
              |      dEF      |          Abc            |           aBc             |                             |
              |      aBc      |          aBc            |           AbC             |                             |
              |      DEf      |          AbC            |           Abc             |                             |
              |      aBC      |          Abc            |           aBc             |                             |
              |      def      |          Abc            |           dEF             |                             |
              |      AbC      |          aBc            |           DEf             |                             |
              |      def      |          dEF            |           def             |                             |
              |      def      |          dEF            |           DEF             |                             |
              |      ABC      |          dEF            |           DEf             |                             |
              |      Abc      |          DEf            |                           |                             |
              |      DEF      |          DEf            |                           |                             |
              |      aBc      |          def            |                           |                             |
              |      dEF      |          def            |                           |                             |
              |      dEF      |          def            |                           |                             |
              |      DEf      |          DEF            |                           |                             |
              |      aBC      |          DEf            |                           |                             |
              •---------------•-------------------------•---------------------------•-----------------------------•
              
              • C
                Claudia Frank
                last edited by Jun 2, 2018, 12:22 PM

                Patrick, did you download and unzip the TclTk package into the
                NPP_INSTALL_DIR ? (in your case into D:\Utilities\PortableApps\Notepad++)

                If so, can you run the following in the python script console

                import sys; print '\n'.join(sys.path) 
                

                and post the output?

                Did the unidecode library installation work?

                Cheers
                Claudia

                  • P
                    patrickdrd
                    last edited by Jun 2, 2018, 1:50 PM

                    yes, unidecode works fine, import command works

                    • C
                      Claudia Frank @patrickdrd
                      last edited by Jun 2, 2018, 1:59 PM

                      @patrickdrd

                      what does the sys.path report?

                      Cheers
                      Claudia

                      • P
                        patrickdrd
                        last edited by Jun 2, 2018, 2:43 PM

                        D:\Utilities\PortableApps\Notepad++\plugins\PythonScript\lib
                        D:\Utilities\PortableApps\Notepad++\plugins\Config\PythonScript\lib
                        D:\Utilities\PortableApps\Notepad++\plugins\PythonScript\scripts
                        D:\Utilities\PortableApps\Notepad++\plugins\Config\PythonScript\scripts
                        D:\Utilities\PortableApps\Notepad++\plugins\PythonScript\lib\lib-tk
                        D:\Utilities\PortableApps\Notepad++\python27.zip
                        D:\Utilities\PortableApps\Notepad++\DLLs
                        D:\Utilities\PortableApps\Notepad++\lib
                        D:\Utilities\PortableApps\Notepad++\lib\plat-win
                        D:\Utilities\PortableApps\Notepad++\lib\lib-tk
                        D:\Utilities\PortableApps\Notepad++

                        • C
                          Claudia Frank @patrickdrd
                          last edited by Jun 2, 2018, 3:04 PM

                          The correct one is

                          D:\Utilities\PortableApps\Notepad++\plugins\PythonScript\lib\lib-tk

                          but those

                          D:\Utilities\PortableApps\Notepad++\lib
                          D:\Utilities\PortableApps\Notepad++\lib\plat-win
                          D:\Utilities\PortableApps\Notepad++\lib\lib-tk

                          are strange, could it be that you unzipped only part of tk packages into
                          D:\Utilities\PortableApps\Notepad++\ ?

                          Can you check if you have the following files under D:\Utilities\PortableApps\Notepad++\plugins\PythonScript\lib\lib-tk

                          Canvas.py
                          Dialog.py
                          FileDialog.py
                          FixTk.py
                          ScrolledText.py
                          SimpleDialog.py
                          Tix.py
                          tkColorChooser.py
                          tkCommonDialog.py
                          Tkconstants.py
                          Tkdnd.py
                          tkFileDialog.py
                          tkFont.py
                          Tkinter.py
                          tkMessageBox.py
                          tkSimpleDialog.py
                          ttk.py
                          turtle.py

                          You might see additional files with extension pyc - that’s ok.
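
Instead of comparing the folder by eye, the check can be scripted. This is a small sketch that can be run in any Python console; `missing_files` is a hypothetical helper, and only a few names from the list above are used to keep it short.

```python
import os


def missing_files(folder, expected_names):
    """Return the expected file names that are not present in `folder`."""
    if not os.path.isdir(folder):
        return list(expected_names)
    present = set(os.listdir(folder))
    return [name for name in expected_names if name not in present]


# A few names from the list above; extend as needed.
expected = ["Tkinter.py", "Tkconstants.py", "FixTk.py", "ttk.py"]
folder = r"D:\Utilities\PortableApps\Notepad++\plugins\PythonScript\lib\lib-tk"
print(missing_files(folder, expected))
```

An empty list means all the listed files were found; if the folder itself does not exist, every name is reported as missing.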

                          If you do have the files, delete the D:\Utilities\PortableApps\Notepad++\lib directory.
                          If you don’t have the files there but do have them
                          within D:\Utilities\PortableApps\Notepad++\lib, then cut D:\Utilities\PortableApps\Notepad++\lib and paste it into
                          D:\Utilities\PortableApps\Notepad++\plugins\PythonScript\

                          Cheers
                          Claudia

                          • P
                            patrickdrd
                            last edited by Jun 2, 2018, 3:34 PM

                            I can’t find either D:\Utilities\PortableApps\Notepad++\plugins\PythonScript\lib\lib-tk or D:\Utilities\PortableApps\Notepad++\lib folder in explorer!

                            • C
                              Claudia Frank @patrickdrd
                              last edited by Jun 2, 2018, 3:58 PM

                              so how did you install Tcl/Tk libraries?

                              Cheers
                              Claudia

                              • P
                                patrickdrd
                                last edited by Jun 2, 2018, 4:01 PM

                                I extracted the zip of course, the folder you say is in:
                                d:\Utilities\PortableApps\Notepad++\plugins\PythonScript\lib\tcl\lib-tk\

                                both in zip file and my explorer!

                                • P
                                  patrickdrd
                                  last edited by Jun 2, 2018, 4:22 PM

                                  I’ve just read guy038’s post and I’m more confused :S

                                  I downloaded the file again and now it’s Last modified: 02 Jun 2018 16:00 UTC
                                  and 69930 results,
                                  sorting with insensitive (ue and textfx) yields 69284 and the output should be similar,
                                  so I should be satisfied by that consensus I guess?

                                  • C
                                    Claudia Frank @patrickdrd
                                    last edited by Jun 2, 2018, 4:34 PM

                                    @patrickdrd

                                    The easylist file is an ad blocker filter list, so it changes constantly.

                                    Regarding the Tcl/Tk installation - you should have unzipped it into the
                                    D:\Utilities\PortableApps\Notepad++\ directory.

                                    The zip contains the complete folder hierarchy - as you see on the left side (archive tree)

                                    If you did this, you normally got a message saying that the plugins folder already exists and
                                    asking whether you want to overwrite it -> you should have answered this with yes. Did you?

                                    Cheers
                                    Claudia

                                    • P
                                      patrickdrd
                                      last edited by Jun 2, 2018, 4:37 PM

                                      yep, that’s what I got https://imgur.com/a/xNQB5Gn

                                      • C
                                        Claudia Frank @patrickdrd
                                        last edited by Jun 2, 2018, 5:36 PM

                                        @patrickdrd

                                        took some time to understand the difference.
                                        You do have
                                        …\Notepad++\plugins\PythonScript\lib\tcl\lib-tk
                                        where I do have
                                        …\Notepad++\plugins\PythonScript\lib\lib-tk

                                        so the error makes sense as it can’t be found in …\lib\lib-tk

                                        You could try to add the following to your user startup.py script

                                        import sys
                                        sys.path.append(r'D:\Utilities\PortableApps\Notepad++\plugins\PythonScript\lib\tcl\lib-tk')
                                        

                                        and restart npp and do another import Tkinter test.
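
A slightly more defensive variant of the startup.py snippet only appends a folder when it really contains Tkinter.py, which avoids adding a wrong path silently. This is a sketch, not something tested inside the plugin; `add_tk_path` is a made-up helper name, and the two candidate folders are the ones mentioned in this thread.

```python
import os
import sys


def add_tk_path(candidates):
    """Append the first candidate folder that contains Tkinter.py to sys.path.

    Returns the folder that was chosen, or None if no candidate qualifies.
    """
    for folder in candidates:
        if os.path.isfile(os.path.join(folder, "Tkinter.py")):
            if folder not in sys.path:
                sys.path.append(folder)
            return folder
    return None


base = r"D:\Utilities\PortableApps\Notepad++\plugins\PythonScript\lib"
print(add_tk_path([
    os.path.join(base, "lib-tk"),
    os.path.join(base, "tcl", "lib-tk"),
]))
```

If the printed value is None, neither folder holds Tkinter.py and the import will keep failing.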

                                        Cheers
                                        Claudia

                                        • P
                                          patrickdrd
                                          last edited by Jun 2, 2018, 6:25 PM

                                          ok thanks, first thing tomorrow with the morning coffee
                                          :-D
