    Change menu text for "Remove Consecutive Duplicate Lines"?

    • Alan Kilborn

      I just joined the ranks of 7.6.6 users after being a long holdout from the Plugin Manager days… :)

      In this new version (or more accurately, even in a slightly earlier version), the Edit > Line Operations menu has a new entry: “Remove Consecutive Duplicate Lines”

      This may prove useful, but the menu text leaves me wondering: is this a “selected lines” thing or an “entire file” thing? Experimentation shows that it is “entire file”. But I will never remember this.

      So what I’d like to do is modify english_customizable.xml and make the menu item say “Remove Consecutive Duplicate Lines (Entire File)” or some such (as I’ve done for other menu entries).

      But I notice that english_customizable.xml doesn’t have an entry for this for me to edit. Is it still possible in some way to customize this menu text?

      • Meta Chuh (moderator) @Alan Kilborn

        welcome to the sublime++ community, @Alan-Kilborn ;-)

        english_customizable.xml is just a pretty unmaintained (forgotten) copy of english.xml.
        you can duplicate your english.xml and rename it to english_customizable.xml, or you can edit english.xml directly if you like.

        for the changes to take effect, please switch to any other language and back to english.

        many thanks and best regards.

        • andrecool-68

          Translations are based on the English.xml and Chinese.xml localization files; those two are where changes and additions appear first.

          • PeterJones

            @Meta-Chuh said:

            you can duplicate your english.xml and rename it

            or, if you don’t want to lose your customizations, do a file compare, and copy over the missing lines from english.xml to english_customizable.xml.

            • PeterJones

              If I had supersecret_petercj_language.xml and wanted to add it to the list of localization languages, how would I do that? I tried creating one and adding it to the localization\ subfolder of my portable install, but when I restarted Notepad++, my new language wasn’t in the list of localizations to choose from.

              I would say, based on the outdated NpWiki++ Localisation entry, that since you can do a command-line selection of localization using -L, the list of available languages must be embedded in the executable itself. However, based on the first sentence in that page,

              You can teach Notepad++ to speak your mother tongue: just download english.xml and translate it into your language

              it implies that you just have to add a new translation file to the localization subdirectory to add a new language. But that didn’t work for me.

              • Alan Kilborn @Meta Chuh

                @Meta-Chuh said:

                english_customizable.xml is just a pretty unmaintained (forgotten) copy of english.xml

                Ah, I didn’t think of that. I thought english_customizable.xml was maintained. Sad that it is there but out of date then. I do see the appropriate entry in english.xml.

                @PeterJones said:

                if you don’t want to lose your customizations, do a file compare, and copy over the missing lines from english.xml to english_customizable.xml.

                Yep, this very thing is about to happen. ;)

                Thanks, guys.

                • Alan Kilborn

                  I got the built-in menu item changed, no probs…then:

                  I ended up making my own version of this that works “in-selection”. Here’s how:

                  • Select some text in your active editing tab
                  • Open the Replace window by pressing ctrl+h
                  • In the Find what box, put (?-s)^(.+\R)\1+
                  • In the Replace with box, put \1
                  • Tick the Regular expression radio button
                  • Tick the In-Selection checkbox
                  • Untick all other boxes [bah, Transparency, wtf cares? ;) ]
                  • Start macro recording (macro menu or toolbar)
                  • Press the Replace All button
                  • End macro recording (macro menu or toolbar)
                  • Save macro (macro menu) with nice name (see below)

                  For a name, I chose Condense Dupe Non-Empty Adjacent Lines (IN SELECTION) to 1 Copy. It’s a bit wordy, but it more accurately describes what is going on than the built-in command it is based on. ;)
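To see what the macro's search pattern actually matches, here is a rough equivalent of it in plain Python, handy for trying it on sample text outside Notepad++. It is a sketch under a few assumptions: Python's `re` has no `\R`, so the line break is spelled `\r?\n` (old Mac-style CR-only files are not handled), and `[^\r\n]+` stands in for `.+` under `(?-s)`. The function name `condense_adjacent_duplicates` is just an illustrative choice.

```python
import re

# Rough Python equivalent of the Notepad++ pattern (?-s)^(.+\R)\1+
#   (?m)      -> ^ matches at the start of every line, as in Notepad++
#   [^\r\n]+  -> one or more non-line-break characters, like (?-s).+
#   \r?\n     -> stands in for \R (Windows or Unix line endings only)
#   \1+       -> one or more further copies of that exact line
ADJACENT_DUPES = re.compile(r"(?m)^([^\r\n]+\r?\n)\1+")


def condense_adjacent_duplicates(text):
    """Replace each run of identical, non-empty adjacent lines with one copy."""
    return ADJACENT_DUPES.sub(r"\1", text)


if __name__ == "__main__":
    sample = "abcde\nabcde\nfgh\n\n\njk\njk\njk\n"
    print(repr(condense_adjacent_duplicates(sample)))
```

Runs of empty lines are left alone because `.+` needs at least one character, which is why "Non-Empty" belongs in the macro name. Likewise, a last line with no line ending is never matched, since the pattern needs a line break after the captured line.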

                  • Alan Kilborn @Meta Chuh

                    @Meta-Chuh said:

                    for the changes to take effect, please switch to any other language and back to english.

                    I’m always nervous doing this. What if I can’t read the new language enough to know how to switch back to English? Haha, just kidding. :)

                    But I had a real reason for replying: there is no need to actually switch to a different language. Just click the dropdown box (so that it drops down) and re-select the current language choice (which should be the top selection available and should be highlighted, in blue for me). This is enough to put changed customizations into use. One step instead of two.

                    • Brian Schweitzer

                      I used this technique:

                      In the Find what box, put (?-s)^(.+\R)\1+
                      In the Replace with box, put \1

                      It works… BUT… it removes ALL records after the last duplicate that it finds in our files. I know it’s because of the size of my files, because I tested it with shorter files and had no issue. Our files are 5.3 million lines long, and we process 3-6 of these per day. We cannot split the files, as they’re used for manufacturing and we don’t want to do multi-step copy/pastes.

                      For reference, the built-in “Remove Consecutive Duplicates” command does exactly the same thing: it removes ALL the records after the last duplicate…

                      Our file generators push out a file with 5.3 million records, and usually we only have 3-5 duplicates, so when we run this command, it may run into the last duplicate on line 500,000 and then delete everything afterwards.

                      Is there a way to allow larger file sizes to process successfully? TextFX does it perfectly, and I can use that with a 32 bit Notepad++, but I’d like to keep the 64 bit if possible.

                      Thx.

                      • PeterJones

                        @Brian-Schweitzer said:

                        TextFX does it perfectly, and I can use that with a 32 bit Notepad++, but I’d like to keep the 64 bit if possible.

                        The two are not mutually exclusive. You could leave 64-bit as your installed Notepad++, but download a portable (zip-edition) 32-bit Notepad++ and unzip it into some other directory (not in the Program Files (x86) hierarchy; I take inspiration from the Linux world and put my outside-of-program-files programs in c:\usr\local\apps\____). You could then use the 64-bit one for normal, everyday usage. But when you want to remove duplicates, you can just launch your 32-bit instance instead.

                        • Alan Kilborn @Brian Schweitzer

                          @Brian-Schweitzer

                          it will remove ALL records after the last duplicate that it finds

                          Sadly, there are some limitations where the regular expression engine is concerned…but you’ve already discovered this so I’m adding nothing new…

                          the built in “Remove Consecutive Duplicates” does exactly the same thing

                          This built-in command uses a regular-expression replacement operation as well (one coded in C++ rather than supplied by the user), so the same outcome makes sense.

                          Is there a way to allow larger file sizes to process successfully?

                          If I were doing it, I’d turn to an external tool. Since no existing tool that does exactly this comes to mind, I’d likely roll my own. I’d probably try Python first, and if that wasn’t fast enough I’d turn to C. Maybe in your case, sticking with TextFX is the best option.

                          Sorry I don’t have a more optimistic response – maybe someone else?
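Along the lines of the roll-your-own Python idea, here is a minimal sketch of a stand-alone filter that streams a file one line at a time and drops any line identical to the line just before it. It does not use a regular expression at all, so it does not depend on the regex engine's limits, and memory use stays flat regardless of file size. It has not been timed on 5.3-million-line files; the function names, the hypothetical script name in the comment, and the UTF-8 default are all assumptions.

```python
import sys


def condense_consecutive_duplicates(lines):
    """Yield each line unless it is identical to the line just before it.

    Lines are compared including their line endings, so an empty line
    repeated several times is also condensed to one copy.
    """
    previous = None
    for line in lines:
        if line != previous:
            yield line
        previous = line


def condense_file(src_path, dst_path, encoding="utf-8"):
    """Stream src_path to dst_path, dropping consecutive duplicate lines."""
    # newline="" keeps the original \r\n or \n endings untouched on both sides.
    with open(src_path, "r", encoding=encoding, newline="") as src, \
            open(dst_path, "w", encoding=encoding, newline="") as dst:
        dst.writelines(condense_consecutive_duplicates(src))


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python condense_dupes.py input.txt output.txt
    condense_file(sys.argv[1], sys.argv[2])
```

Unlike the `(.+\R)` pattern, this also condenses runs of blank lines; if that matters, let blank lines through without comparing them to the previous line.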

                          • Ekopalypse @Alan Kilborn

                            @Alan-Kilborn

                            did a quick test. Creating 6_000_100 lines takes much longer than removing their duplicates.

                            def remove_duplicates():
                                unique_lines = set()
                                duplicates = []
                                for line_num, line in enumerate(editor.getCharacterPointer().splitlines()):
                                    if line not in unique_lines:
                                        unique_lines.add(line)
                                    else:
                                        duplicates.append(line_num)
                            
                                for line_num in reversed(duplicates):
                                    editor.deleteLine(line_num)
                            

                            Which took 5.8 seconds on my environment. :-)
                            Note, this script would remove ANY duplicate, not only the ones which are consecutive.
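For the consecutive-only behavior that the menu command (and Alan's macro) provide, one way to adapt the script above is to compare each line only with the one directly above it, instead of with a set of every line seen so far. This is an untested sketch: the helper names are made up for illustration, and it assumes the PythonScript `editor` methods already used in this thread (`getCharacterPointer`, `deleteLine`, `beginUndoAction`, `endUndoAction`). It is written so it also runs under the Python 2.7 used by PythonScript 1.x.

```python
def consecutive_duplicate_line_numbers(lines):
    """Return 0-based numbers of lines identical to the line directly above.

    Only adjacent repeats are reported: a line that comes back later, after
    some different line, is kept.
    """
    return [n for n in range(1, len(lines)) if lines[n] == lines[n - 1]]


def remove_consecutive_duplicates(ed):
    """Delete adjacent repeats from a PythonScript editor as one undo step."""
    lines = ed.getCharacterPointer().splitlines()
    ed.beginUndoAction()
    try:
        # Delete bottom-up so the remaining line numbers stay valid.
        for line_num in reversed(consecutive_duplicate_line_numbers(lines)):
            ed.deleteLine(line_num)
    finally:
        ed.endUndoAction()


# Inside Notepad++, PythonScript provides the `editor` global; uncomment to run:
# remove_consecutive_duplicates(editor)
```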

                            • Alan Kilborn @Ekopalypse

                              @Ekopalypse

                              Nice.

                              Which took 5.8 seconds on my environment

                              Nicer.

                              script would remove ANY duplicate, not only the ones which are consecutive.

                              Perhaps nicest.

                              :)

                              I was just generalizing in my earlier reply; I didn’t know a script was going to come out of it. :)

                              • Ekopalypse @Alan Kilborn

                                @Alan-Kilborn

                                I was just generalizing in my earlier reply; I didn’t know a script was going to come out of it. :)

                                I had it already but never tested it with really big data and this thread just gave me the trigger to do the test :-)

                                • guy038

                                  Hello, @ekopalypse,

                                  I’ve just tried out your script for removing duplicate lines with a local N++ v7.6.3 32-bit release, and nothing happened :-((

                                  My PythonScript version is 1.3.0.0, and NO error message is displayed on the console!

                                  My Python interpreter seems OK, as other scripts work just as expected!

                                  I used the simple sample text below:

                                  abcde
                                  fgh
                                  abcde
                                  jk
                                  opq
                                  abcde
                                  fgh
                                  jk
                                  fgh
                                  abcde
                                  

                                  I also tried sorting it first, and selecting a line, a block of lines, or all the text => no result :-(( I also turned off line numbering, just in case…

                                  Here is my debug info:

                                  Notepad++ v7.6.3   (32-bit)
                                  Build time : Jan 27 2019 - 17:20:30
                                  Path : D:\@@\763\notepad++.exe
                                  Admin mode : OFF
                                  Local Conf mode : ON
                                  OS : Windows XP (32-bit)
                                  Plugins : BetterMultiSelection.dll ComparePlugin.dll DSpellCheck.dll ElasticTabstops.dll mimeTools.dll NppConverter.dll NppExport.dll PythonScript.dll TabIndentSpaceAlign.dll 
                                  

                                  Note that v7.6.3 is the latest version in which I installed the PythonScript plugin, and that my Win XP laptop contains numerous portable N++ versions, with various plugins in each ;-))

                                  So, am I missing something obvious?!

                                  BR

                                  guy038

                                  • Ekopalypse @guy038

                                    @guy038

                                    sorry, yes, I only posted the function itself - it must be called of course :-)

                                    def remove_duplicates():
                                        unique_lines = set()
                                        duplicates = []
                                        for line_num, line in enumerate(editor.getCharacterPointer().splitlines()):
                                            if line not in unique_lines:
                                                unique_lines.add(line)
                                            else:
                                                duplicates.append(line_num)
                                    
                                        for line_num in reversed(duplicates):
                                            editor.deleteLine(line_num)
                                    
                                    remove_duplicates()
                                    
                                    • Alan Kilborn

                                      Somewhat equivalently, one could remove the def remove_duplicates(): line (and now also the remove_duplicates() line), and outdent the remaining lines, and it will also work fine. :)

                                      • PeterJones

                                        @Ekopalypse ,

                                        I just tried it out. With the call, it works for me on @guy038’s data.

                                        The one thing I would suggest would be to wrap it in an editor.beginUndoAction() / editor.endUndoAction() pair. If I’m doing a bulk delete, I want to be able to bulk undo, too. :-)

                                        • Ekopalypse @PeterJones

                                          @PeterJones

                                          depending on how many duplicates it found, yes, undoing them one at a time could become quite cumbersome :-)

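                                          def remove_duplicates():
                                              # Track every line seen so far; any later repeat counts as a duplicate,
                                              # even when it is not adjacent to its first occurrence.
                                              unique_lines = set()
                                              duplicates = []
                                              for line_num, line in enumerate(editor.getCharacterPointer().splitlines()):
                                                  if line not in unique_lines:
                                                      unique_lines.add(line)
                                                  else:
                                                      duplicates.append(line_num)
                                          
                                              # Delete from the bottom up so earlier line numbers stay valid.
                                              for line_num in reversed(duplicates):
                                                  editor.deleteLine(line_num)
                                          
                                          # Group all the deletions into a single undo step.
                                          editor.beginUndoAction()
                                          try:
                                              remove_duplicates()
                                          finally:
                                              editor.endUndoAction()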
                                          • guy038

                                            Hi, @Ekopalypse, @alan-kilborn, @peterjones and all,

                                            Oh… my bad! I’m feeling really silly right now :-(( So elementary!


                                            Now, as the native Remove Consecutive Duplicate Lines N++ command does not take any selection into account, @ekopalypse, would it be easy enough to consider just the current main selection? If so, it could be an interesting enhancement of this native N++ command ;-))

                                            Cheers,

                                            guy038
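As for limiting the operation to the main selection, one possible approach (a sketch, not tested inside Notepad++) is to turn the selection into a range of whole lines and run the adjacent-line comparison only over that range. It assumes the PythonScript `editor` methods `getSelectionStart`, `getSelectionEnd`, `lineFromPosition`, `positionFromLine`, and `getLine`, which mirror Scintilla messages of the same names, plus PythonScript's own `deleteLine` helper; the function names are hypothetical.

```python
def consecutive_duplicates_in_block(lines, first_line):
    """Return document line numbers of adjacent repeats inside a block.

    `lines` holds the text of consecutive document lines, the first of which
    is document line `first_line` (0-based). A line is reported only when it
    equals the line directly above it and both lie inside the block.
    """
    return [first_line + i for i in range(1, len(lines)) if lines[i] == lines[i - 1]]


def remove_consecutive_duplicates_in_selection(ed):
    """Condense adjacent repeats, but only on the lines the selection touches."""
    start_pos, end_pos = ed.getSelectionStart(), ed.getSelectionEnd()
    first = ed.lineFromPosition(start_pos)
    last = ed.lineFromPosition(end_pos)
    # A selection that ends at column 0 does not really include that line.
    if last > first and end_pos == ed.positionFromLine(last):
        last -= 1
    # getLine() returns the line ending too; strip it so the document's last
    # line (which has no ending) can still match an identical line above it.
    lines = [ed.getLine(n).rstrip("\r\n") for n in range(first, last + 1)]
    ed.beginUndoAction()
    try:
        # Delete bottom-up so the remaining line numbers stay valid.
        for line_num in reversed(consecutive_duplicates_in_block(lines, first)):
            ed.deleteLine(line_num)
    finally:
        ed.endUndoAction()


# Inside Notepad++ with PythonScript, uncomment to run on the main selection:
# remove_consecutive_duplicates_in_selection(editor)
```

Lines just outside the selection are never compared, so a selected line that happens to repeat the unselected line above it is kept.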

                                            The Community of users of the Notepad++ text editor.