sort file removing duplicates possible?

  • P
    patrickdrd @Claudia Frank
    last edited by Jun 1, 2018, 1:09 PM

    @Claudia-Frank said:

    @Patrick

    sorry, I don’t know the term “accent insensitive”; what does it mean?
    For example, that è is treated the same as e?

    Can you provide example data (just need a couple of lines) to see if it is working correctly?
    The speed test I will do with the easylist text.

    Cheers
    Claudia

    yes, exactly that
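
For readers unfamiliar with the term, here is a minimal sketch of what an accent-insensitive comparison could look like in Python. It uses only the standard-library `unicodedata` module, which is an assumption made for illustration; the script discussed later in this thread relies on the third-party unidecode library instead. The helper names `fold_accents` and `same_ignoring_accents` are hypothetical.

```python
import unicodedata


def fold_accents(text):
    """Return text with combining accent marks removed (e.g. e-grave -> e)."""
    # NFKD splits an accented letter into its base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop the combining marks and keep everything else.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def same_ignoring_accents(a, b):
    """Compare two strings, treating accented letters as their base letter."""
    return fold_accents(a) == fold_accents(b)


# u"cr\u00e8me" is "creme" written with an e-grave.
print(same_ignoring_accents(u"cr\u00e8me", u"creme"))
```

This only covers letters that decompose into a base letter plus marks; characters such as ø or ß have no such decomposition and would need a transliteration table.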

    • C
      Claudia Frank
      last edited by Jun 1, 2018, 1:26 PM

      OK - let’s see what we can do.

      Cheers
      Claudia

      • S
        Scott Sumner @Scott Sumner
        last edited by Jun 1, 2018, 1:30 PM

        @Scott-Sumner

        …the following spec…

        Hey Scott! You did forget some things! How about when removing duplicates, we need the options to:

        • keep one occurrence of a duplicated line (when sorting)
        • keep no occurrences of a duplicated line (when sorting or not sorting)
        • keep LAST occurrence of a duplicated line (when not sorting)
        • keep FIRST occurrence of a duplicated line (when not sorting)
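
A minimal sketch of how these keep-options could behave in the non-sorting case, in plain Python. The function name `remove_duplicates` and its `keep` parameter are hypothetical and are not taken from the script in this thread.

```python
from collections import Counter


def remove_duplicates(lines, keep="first"):
    """Remove duplicate lines without sorting.

    keep="first": keep the first occurrence of each line
    keep="last":  keep the last occurrence of each line
    keep="none":  drop every line that occurs more than once
    """
    if keep == "none":
        counts = Counter(lines)
        return [line for line in lines if counts[line] == 1]
    if keep == "last":
        # Keep-first on the reversed list, then restore the original order.
        return remove_duplicates(lines[::-1], keep="first")[::-1]
    if keep != "first":
        raise ValueError("keep must be 'first', 'last' or 'none'")
    seen = set()
    result = []
    for line in lines:
        if line not in seen:
            seen.add(line)
            result.append(line)
    return result


sample = ["b", "a", "b", "c", "a"]
print(remove_duplicates(sample, "first"))  # ['b', 'a', 'c']
print(remove_duplicates(sample, "last"))   # ['b', 'c', 'a']
print(remove_duplicates(sample, "none"))   # ['c']
```

When sorting is wanted as well, keeping one occurrence reduces to `sorted(set(lines))`.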
        • C
          Claudia Frank @Scott Sumner
          last edited by Jun 1, 2018, 1:32 PM

          @Scott-Sumner

          ahh - sorry, too late - the specs are already defined for version 1; you need to open a feature request for version 2 :-D

          Cheers
          Claudia

          • C
            Claudia Frank
            last edited by Claudia Frank Jun 2, 2018, 2:50 AM Jun 2, 2018, 2:49 AM

            You can find the 1st version of the script here.

            To make it run, two requirements need to be fulfilled, apart from the obvious one that you need to have the Python Script plugin installed.

            1.) Be sure you have either installed the full package or downloaded and unzipped the TclTk package into the NPP_INSTALL_DIR. Latest releases

            2.) To make the “accent insensitive” feature work, you need to install a Python library called unidecode.
            Unzip the .whl package into NPP_INSTALL_DIR\plugins\lib\

            To check both requirements, open the Python Script console and run the following commands:

             import Tkinter
             import unidecode
            

            If you don’t see any errors - done.
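
The same check can be wrapped so that a missing module is reported instead of raising. This is only a sketch: the helper name `check_modules` is hypothetical, and `Tkinter` is the Python 2 module name that applies to the PythonScript plugin used in this thread.

```python
import importlib


def check_modules(names):
    """Try to import each module name; map name -> None (ok) or error text."""
    results = {}
    for name in names:
        try:
            importlib.import_module(name)
            results[name] = None
        except ImportError as exc:
            results[name] = str(exc)
    return results


# Report the two modules the script depends on.
for name, error in sorted(check_modules(["Tkinter", "unidecode"]).items()):
    print("%-10s %s" % (name, "OK" if error is None else "MISSING: " + error))
```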
            Usage is simple - run the script and check the different options.

            What should work is

            • sort/delete duplicates on whole text (aka nothing is selected)
            • sort/delete duplicates on vertically selected text

            not supported yet:

            • sort/delete duplicates on rectangular selection

            Cheers
            Claudia

            Btw. I spent most of the time creating this ugly window - so if someone wants to create a nicer GUI - please go for it. I’m not really good at designing UIs.

            • P
              patrickdrd
              last edited by Jun 2, 2018, 7:13 AM

              tkinter doesn’t work:

              Python 2.7.6-notepad++ r2 (default, Apr 21 2014, 19:26:54) [MSC v.1600 32 bit (Intel)]
              Initialisation took 219ms
              Ready.

              import Tkinter
              Traceback (most recent call last):
                File "<console>", line 1, in <module>
              ImportError: No module named Tkinter
              Traceback (most recent call last):
                File "D:\Utilities\PortableApps\Notepad++\plugins\PythonScript\scripts\Sorter.py", line 5, in <module>
                  import Tkinter as tk
              ImportError: No module named Tkinter

              • P
                patrickdrd
                last edited by Jun 2, 2018, 8:39 AM

                I found out something else, about that easylist file,
                textfx’s case insensitive sort results in 69234 entries,
                which is the same as ue’s result!

                • G
                  guy038
                  last edited by guy038 Jun 2, 2018, 10:43 AM Jun 2, 2018, 10:36 AM

                  Hello, @patrickdrd, and All,

                  Well, I must add to my previous post !

                  • Firstly, I realized that your list, below, is constantly updated ( Last modified: 02 Jun 2018 08:09 UTC )

                  https://easylist.to/easylist/easylist.txt

                  So, today, this list contains 69917 lines


                  • Secondly, when performing the regex S/R, we must consider both case-sensitive and case-insensitive searches => The two search regexes :

                  Regex A : (?-is)(^.+\R)\1+

                  Regex B : (?i-s)(^.+\R)\1+

                  give, after sorting and removing duplicates with the regex, a file containing :

                  A 69852 lines ( so, 65 lines deleted, in 56 matches )

                  B 69817 lines ( so, 100 lines deleted, in 88 matches )
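
As a rough illustration of what the two regexes do to already-sorted text, here is a sketch using Python's `re` module rather than the Notepad++ regex engine. Two substitutions are assumptions on my side: Python has no `\R`, so `\n` is used, and the case option is passed as a flag instead of the inline `(?-is)` / `(?i-s)` prefixes. The name `collapse_runs` is hypothetical.

```python
import re

# Python's re module has no \R, so "\n" stands in for the line break here.
# MULTILINE makes ^ match at every line start; DOTALL is left off, like (?-s).
SENSITIVE = re.compile(r"(^.+\n)\1+", re.MULTILINE)
INSENSITIVE = re.compile(r"(^.+\n)\1+", re.MULTILINE | re.IGNORECASE)


def collapse_runs(text, pattern):
    """Replace each run of identical consecutive lines with its first line."""
    return pattern.sub(r"\1", text)


text = "ABC\nAbC\nAbC\nAbc\nDEF\n"
print(collapse_runs(text, SENSITIVE).split())    # ['ABC', 'AbC', 'Abc', 'DEF']
print(collapse_runs(text, INSENSITIVE).split())  # ['ABC', 'DEF']
```

Because the captured group includes the line break, a final line without a trailing line break is never treated as a duplicate.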


                  • Thirdly, we must also take into account the possibility that the sort, itself, is run in a sensitive or insensitive way !

                  Natively, Notepad++ sorts text according to the Unicode value ( code-point ) of characters ( a kind of sensitive sort ! ), whereas some other text editors may offer both case options, leading to different results !

                  For instance, using the RJ TextEd software, here are the differences with a simple list of three-character strings ( 1 x ‘ABC’, 2 x ‘AbC’, 3 x ‘Abc’, 1 x ‘DEF’, 3 x ‘DEf’, 2 x ‘aBC’, 3 x ‘aBc’, 3 x ‘dEF’ and 3 x ‘def’ )

                                  •-----------------------•---------------------------•
                                  |    with Notepad++     |      with RJ TextEd       |
                  •---------------•-----------------------•---------------------------•
                  |  Before Sort  |  After UNICODE Sort   |   After SENSITIVE Sort    |
                  •---------------•-----------------------•---------------------------•
                  |      Abc      |          ABC          |            AbC            |
                  |      Abc      |          AbC          |            aBC            |
                  |      DEf      |          AbC          |            aBc            |
                  |      AbC      |          Abc          |            Abc            |
                  |      aBc      |          Abc          |            ABC            |
                  |      dEF      |          Abc          |            aBC            |
                  |      aBc      |          DEF          |            Abc            |
                  |      DEf      |          DEf          |            AbC            |
                  |      aBC      |          DEf          |            aBc            |
                  |      def      |          DEf          |            aBc            |
                  |      AbC      |          aBC          |            Abc            |
                  |      def      |          aBC          |            DEf            |
                  |      def      |          aBc          |            dEF            |
                  |      ABC      |          aBc          |            dEF            |
                  |      Abc      |          aBc          |            DEf            |
                  |      DEF      |          dEF          |            dEF            |
                  |      aBc      |          dEF          |            DEf            |
                  |      dEF      |          dEF          |            def            |
                  |      dEF      |          def          |            DEF            |
                  |      DEf      |          def          |            def            |
                  |      aBC      |          def          |            def            |
                  •---------------•-----------------------•---------------------------•
                  

                  So, it’s easy to understand that removing consecutive duplicates, after the sort, with the regexes above, will necessarily give totally different results, depending on the software used :-(
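
The effect can be reproduced in a few lines of Python, as a sketch only: `sorted()` without a key orders by code point, like the Notepad++ sort described above, while `key=str.lower` is my stand-in for a case-insensitive sort and is not claimed to match what RJ TextEd does internally. The helper name `dedupe_adjacent` is hypothetical.

```python
from itertools import groupby

# The 21 sample strings from the list above.
lines = (["ABC"] + ["AbC"] * 2 + ["Abc"] * 3 + ["DEF"] + ["DEf"] * 3
         + ["aBC"] * 2 + ["aBc"] * 3 + ["dEF"] * 3 + ["def"] * 3)


def dedupe_adjacent(sorted_lines, ignore_case):
    """Keep the first line of each run of equal consecutive lines."""
    key = (lambda s: s.lower()) if ignore_case else None
    return [next(group) for _, group in groupby(sorted_lines, key=key)]


by_code_point = sorted(lines)                  # uppercase sorts before lowercase
by_folded_case = sorted(lines, key=str.lower)  # case ignored while sorting

print(len(dedupe_adjacent(by_code_point, ignore_case=False)))  # 9
print(dedupe_adjacent(by_code_point, ignore_case=True))   # ['ABC', 'DEF', 'aBC', 'dEF']
print(dedupe_adjacent(by_folded_case, ignore_case=True))  # ['ABC', 'DEF']
```

After a code-point sort, ‘aBC’ and ‘ABC’ are separated by the ‘DEF’ group, so a consecutive-duplicates pass cannot merge them even when it ignores case.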


                  • Fourthly, a sort may give different results after being run several times in a row. For instance, with RJ TextEd, running the insensitive sort 6 times on the three-character list above, I was left with 3 sets of data ( 4 times, a list identical to the sensitive sort, and two other lists !! ). Luckily, as for Notepad++, its Unicode sort always gives identical results :-)

                  That’s why, @patrickdrd, it’s very difficult, finally, to compare results between different pieces of software, as each one has its own behavior !
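
In one's own script, run-to-run differences of this kind can be avoided by giving lines that differ only in case a fixed relative order. A minimal sketch, assuming a plain Python sort; the function name `sort_ignoring_case` is hypothetical and nothing here describes how any of the editors above sort.

```python
def sort_ignoring_case(lines):
    """Case-insensitive sort with a fixed order among case variants."""
    # Primary key: the lower-cased line. Secondary key: the line itself,
    # so 'ABC', 'Abc' and 'aBc' always come out in the same relative order.
    return sorted(lines, key=lambda s: (s.lower(), s))


first_run = sort_ignoring_case(["def", "aBc", "ABC", "Abc", "DEF", "aBC"])
second_run = sort_ignoring_case(first_run)
print(first_run)                # ['ABC', 'Abc', 'aBC', 'aBc', 'DEF', 'def']
print(first_run == second_run)  # True
```

With the two-part key the result depends only on the set of lines, not on their input order or on how often the sort is repeated.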

                  Cheers,

                  guy038

                  P.S. :

                  Here are the results of my tests :

                  1) With Notepad++ and RJ TextEd, using sensitive sort :

                  •---------------------------------------------------------------------------------------------------•
                  |                                      with Notepad++ Features                                      |
                  •---------------•-------------------------•---------------------------•-----------------------------•
                  |  Before Sort  |  After Sensitive Sort   |  After Sensitive Regex +  |  After INsensitive Regex +  |
                  |               |                         |  Suppression Duplicates   |   Suppression Duplicates    |
                  •---------------•-------------------------•---------------------------•-----------------------------•
                  |      Abc      |           ABC           |            ABC            |             ABC             |
                  |      Abc      |           AbC           |            AbC            |             DEF             |
                  |      DEf      |           AbC           |            Abc            |             aBC             |
                  |      AbC      |           Abc           |            DEF            |             dEF             |
                  |      aBc      |           Abc           |            DEf            |                             |
                  |      dEF      |           Abc           |            aBC            |                             |
                  |      aBc      |           DEF           |            aBc            |                             |
                  |      DEf      |           DEf           |            dEF            |                             |
                  |      aBC      |           DEf           |            def            |                             |
                  |      def      |           DEf           |                           |                             |
                  |      AbC      |           aBC           |                           |                             |
                  |      def      |           aBC           |                           |                             |
                  |      def      |           aBc           |                           |                             |
                  |      ABC      |           aBc           |                           |                             |
                  |      Abc      |           aBc           |                           |                             |
                  |      DEF      |           dEF           |                           |                             |
                  |      aBc      |           dEF           |                           |                             |
                  |      dEF      |           dEF           |                           |                             |
                  |      dEF      |           def           |                           |                             |
                  |      DEf      |           def           |                           |                             |
                  |      aBC      |           def           |                           |                             |
                  •---------------•-------------------------•---------------------------•-----------------------------•
                  
                  
                  •---------------------------------------------------------------------------------------------------•
                  |                                       with RJ TextEd Features                                     |
                  •---------------•-------------------------•---------------------------•-----------------------------•
                  |  Before Sort  |  After Sensitive Sort   |  After Sensitive Regex +  |  After INsensitive Regex +  |
                  |               |                         |  Suppression Duplicates   |   Suppression Duplicates    |
                  •---------------•-------------------------•---------------------------•-----------------------------•
                  |      Abc      |           AbC           |            AbC            |             AbC             |
                  |      Abc      |           aBC           |            aBC            |             DEf             |
                  |      DEf      |           aBc           |            aBc            |                             |
                  |      AbC      |           Abc           |            Abc            |                             |
                  |      aBc      |           ABC           |            ABC            |                             |
                  |      dEF      |           aBC           |            aBC            |                             |
                  |      aBc      |           Abc           |            Abc            |                             |
                  |      DEf      |           AbC           |            AbC            |                             |
                  |      aBC      |           aBc           |            aBc            |                             |
                  |      def      |           aBc           |            Abc            |                             |
                  |      AbC      |           Abc           |            DEf            |                             |
                  |      def      |           DEf           |            dEF            |                             |
                  |      def      |           dEF           |            DEf            |                             |
                  |      ABC      |           dEF           |            dEF            |                             |
                  |      Abc      |           DEf           |            DEf            |                             |
                  |      DEF      |           dEF           |            def            |                             |
                  |      aBc      |           DEf           |            DEF            |                             |
                  |      dEF      |           def           |            def            |                             |
                  |      dEF      |           DEF           |                           |                             |
                  |      DEf      |           def           |                           |                             |
                  |      aBC      |           def           |                           |                             |
                  •---------------•-------------------------•---------------------------•-----------------------------•
                  

                  2) When running, several times, an insensitive sort, with RJ TextEd, I obtained 3 different lists :

                  • The first one was identical to the table just above, which uses a sensitive sort

                  • The two others are listed below !

                  •---------------------------------------------------------------------------------------------------•
                  |                                       with RJ TextEd Features                                     |
                  •---------------•-------------------------•---------------------------•-----------------------------•
                  |  Before Sort  |  After INsensitive Sort |  After Sensitive Regex +  |  After INsensitive Regex +  |
                  |               |                         |  Suppression Duplicates   |   Suppression Duplicates    |
                  •---------------•-------------------------•---------------------------•-----------------------------•
                  |      Abc      |          ABC            |            ABC            |             ABC             |
                  |      Abc      |          AbC            |            AbC            |             dEF             |
                  |      DEf      |          aBC            |            aBC            |                             |
                  |      AbC      |          aBC            |            aBc            |                             |
                  |      aBc      |          aBc            |            Abc            |                             |
                  |      dEF      |          Abc            |            AbC            |                             |
                  |      aBc      |          AbC            |            aBc            |                             |
                  |      DEf      |          aBc            |            Abc            |                             |
                  |      aBC      |          Abc            |            aBc            |                             |
                  |      def      |          Abc            |            dEF            |                             |
                  |      AbC      |          aBc            |            DEf            |                             |
                  |      def      |          dEF            |            def            |                             |
                  |      def      |          dEF            |            dEF            |                             |
                  |      ABC      |          DEf            |            DEF            |                             |
                  |      Abc      |          DEf            |            def            |                             |
                  |      DEF      |          DEf            |                           |                             |
                  |      aBc      |          def            |                           |                             |
                  |      dEF      |          def            |                           |                             |
                  |      dEF      |          dEF            |                           |                             |
                  |      DEf      |          DEF            |                           |                             |
                  |      aBC      |          def            |                           |                             |
                  •---------------•-------------------------•---------------------------•-----------------------------•
                  
                  
                  •---------------------------------------------------------------------------------------------------•
                  |                                       with RJ TextEd Features                                     |
                  •---------------•-------------------------•---------------------------•-----------------------------•
                  |  Before Sort  |  After INsensitive Sort |  After Sensitive Regex +  |  After INsensitive Regex +  |
                  |               |                         |  Suppression Duplicates   |   Suppression Duplicates    |
                  •---------------•-------------------------•---------------------------•-----------------------------•
                  |      Abc      |          ABC            |           ABC             |            ABC              |
                  |      Abc      |          AbC            |           AbC             |            dEF              |
                  |      DEf      |          aBC            |           aBC             |                             |
                  |      AbC      |          aBC            |           aBc             |                             |
                  |      aBc      |          aBc            |           Abc             |                             |
                  |      dEF      |          Abc            |           aBc             |                             |
                  |      aBc      |          aBc            |           AbC             |                             |
                  |      DEf      |          AbC            |           Abc             |                             |
                  |      aBC      |          Abc            |           aBc             |                             |
                  |      def      |          Abc            |           dEF             |                             |
                  |      AbC      |          aBc            |           DEf             |                             |
                  |      def      |          dEF            |           def             |                             |
                  |      def      |          dEF            |           DEF             |                             |
                  |      ABC      |          dEF            |           DEf             |                             |
                  |      Abc      |          DEf            |                           |                             |
                  |      DEF      |          DEf            |                           |                             |
                  |      aBc      |          def            |                           |                             |
                  |      dEF      |          def            |                           |                             |
                  |      dEF      |          def            |                           |                             |
                  |      DEf      |          DEF            |                           |                             |
                  |      aBC      |          DEf            |                           |                             |
                  •---------------•-------------------------•---------------------------•-----------------------------•
                  
                  • C
                    Claudia Frank
                    last edited by Jun 2, 2018, 12:22 PM

                    Patrick, did you download and unzip the TclTk package into the
                    NPP_INSTALL_DIR? (in your case, D:\Utilities\PortableApps\Notepad++)

                    If so, can you run the following in the python script console

                    import sys; print '\n'.join(sys.path) 
                    

                    and post the output?
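
A slightly extended variant of that check, as a sketch: it also shows whether each search-path entry actually exists on disk, which makes a missing lib-tk folder easy to spot. The helper name `describe_search_path` is hypothetical.

```python
import os
import sys


def describe_search_path(paths):
    """Return (path, exists) pairs for every entry of a module search path."""
    return [(p, os.path.exists(p)) for p in paths]


# Mark every sys.path entry as present or missing on disk.
for path, exists in describe_search_path(sys.path):
    print("%-8s %s" % ("ok" if exists else "MISSING", path))
```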

                    Did the unidecode library installation work?

                    Cheers
                    Claudia

                      • P
                        patrickdrd
                        last edited by Jun 2, 2018, 1:50 PM

                        yes, unidecode works fine, import command works

                        • C
                          Claudia Frank @patrickdrd
                          last edited by Jun 2, 2018, 1:59 PM

                          @patrickdrd

                          what does the sys.path report?

                          Cheers
                          Claudia

                          • P
                            patrickdrd
                            last edited by Jun 2, 2018, 2:43 PM

                            D:\Utilities\PortableApps\Notepad++\plugins\PythonScript\lib
                            D:\Utilities\PortableApps\Notepad++\plugins\Config\PythonScript\lib
                            D:\Utilities\PortableApps\Notepad++\plugins\PythonScript\scripts
                            D:\Utilities\PortableApps\Notepad++\plugins\Config\PythonScript\scripts
                            D:\Utilities\PortableApps\Notepad++\plugins\PythonScript\lib\lib-tk
                            D:\Utilities\PortableApps\Notepad++\python27.zip
                            D:\Utilities\PortableApps\Notepad++\DLLs
                            D:\Utilities\PortableApps\Notepad++\lib
                            D:\Utilities\PortableApps\Notepad++\lib\plat-win
                            D:\Utilities\PortableApps\Notepad++\lib\lib-tk
                            D:\Utilities\PortableApps\Notepad++

                            • C
                              Claudia Frank @patrickdrd
                              last edited by Jun 2, 2018, 3:04 PM

                              The correct one is

                              D:\Utilities\PortableApps\Notepad++\plugins\PythonScript\lib\lib-tk

                              but those

                              D:\Utilities\PortableApps\Notepad++\lib
                              D:\Utilities\PortableApps\Notepad++\lib\plat-win
                              D:\Utilities\PortableApps\Notepad++\lib\lib-tk

                              are strange. Could it be that you unzipped only part of the Tk packages into
                              D:\Utilities\PortableApps\Notepad++\ ?

                              Can you check if you have the following files under D:\Utilities\PortableApps\Notepad++\plugins\PythonScript\lib\lib-tk

                              Canvas.py
                              Dialog.py
                              FileDialog.py
                              FixTk.py
                              ScrolledText.py
                              SimpleDialog.py
                              Tix.py
                              tkColorChooser.py
                              tkCommonDialog.py
                              Tkconstants.py
                              Tkdnd.py
                              tkFileDialog.py
                              tkFont.py
                              Tkinter.py
                              tkMessageBox.py
                              tkSimpleDialog.py
                              ttk.py
                              turtle.py

                              You might see additional files with extension pyc - that’s ok.
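
Checking the list by hand is error-prone, so here is a small sketch that reports which of the files named above are absent from a folder. The helper name `missing_files` is hypothetical; the path is the one from this thread and needs adjusting for other installations.

```python
import os

# File names copied from the list above.
EXPECTED_TK_FILES = [
    "Canvas.py", "Dialog.py", "FileDialog.py", "FixTk.py", "ScrolledText.py",
    "SimpleDialog.py", "Tix.py", "tkColorChooser.py", "tkCommonDialog.py",
    "Tkconstants.py", "Tkdnd.py", "tkFileDialog.py", "tkFont.py", "Tkinter.py",
    "tkMessageBox.py", "tkSimpleDialog.py", "ttk.py", "turtle.py",
]


def missing_files(directory, expected=EXPECTED_TK_FILES):
    """Return the expected file names that are not present in directory."""
    if not os.path.isdir(directory):
        # A folder that does not exist is missing every file.
        return list(expected)
    present = set(os.listdir(directory))
    return [name for name in expected if name not in present]


lib_tk = r"D:\Utilities\PortableApps\Notepad++\plugins\PythonScript\lib\lib-tk"
print(missing_files(lib_tk) or "all expected files found")
```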

                              If you do have the files, delete the D:\Utilities\PortableApps\Notepad++\lib directory.
                              If you don’t have the files under D:\Utilities\PortableApps\Notepad++\lib\lib-tk but
                              within D:\Utilities\PortableApps\Notepad++\lib then cut D:\Utilities\PortableApps\Notepad++\lib and paste it into
                              D:\Utilities\PortableApps\Notepad++\plugins\PythonScript\

                              Cheers
                              Claudia

                              • P
                                patrickdrd
                                last edited by Jun 2, 2018, 3:34 PM

                                I can’t find either D:\Utilities\PortableApps\Notepad++\plugins\PythonScript\lib\lib-tk or D:\Utilities\PortableApps\Notepad++\lib folder in explorer!

                                • C
                                  Claudia Frank @patrickdrd
                                  last edited by Jun 2, 2018, 3:58 PM

                                  so how did you install the Tcl/Tk libraries?

                                  Cheers
                                  Claudia

                                  • P
                                    patrickdrd
                                    last edited by Jun 2, 2018, 4:01 PM

                                    I extracted the zip of course, the folder you say is in:
                                    d:\Utilities\PortableApps\Notepad++\plugins\PythonScript\lib\tcl\lib-tk\

                                    both in zip file and my explorer!

                                    • P
                                      patrickdrd
                                      last edited by Jun 2, 2018, 4:22 PM

                                      I’ve just read guy038’s post and I’m more confused :S

                                      I downloaded the file again and now it’s Last modified: 02 Jun 2018 16:00 UTC
                                      and 69930 results,
                                      sorting with insensitive (ue and textfx) yields 69284 and the output should be similar,
                                      so I should be satisfied by that consensus I guess?

                                      • C
                                        Claudia Frank @patrickdrd
                                        last edited by Jun 2, 2018, 4:34 PM

                                        @patrickdrd

                                        the easylist file is an ad-blocker filter list, so it changes constantly.

                                        Regarding the Tcl/Tk installation - you should have unzipped it into
                                        D:\Utilities\PortableApps\Notepad++\ directory.

                                        The zip contains the complete folder hierarchy - as you see on the left side (archive tree)

                                        If you did this, you should have got a message saying that the plugins folder already exists and
                                        asking whether you want to overwrite it -> you should have answered this with yes. Did you?

                                        Cheers
                                        Claudia

                                        • P
                                          patrickdrd
                                          last edited by Jun 2, 2018, 4:37 PM

                                          yep, that’s what I got https://imgur.com/a/xNQB5Gn
