Change menu text for "Remove Consecutive Duplicate Lines" ?
-
@Meta-Chuh said:
english_customizable.xml is just a pretty unmaintained (forgotten) copy of english.xml
Ah, I didn’t think of that. I thought english_customizable.xml was maintained. Sad that it is there but out of date then. I do see the appropriate entry in english.xml.
@PeterJones said
if you don’t want to lose your customizations, do a file compare, and copy over the missing lines from english.xml to english_customizable.xml.
Yep, this very thing is about to happen. ;)
Thanks, guys.
-
I got the built-in menu item changed, no probs…then:
I ended up making my own version of this that works “in-selection”. Here’s how:
- Select some text in your active editing tab
- Open the Replace window by pressing ctrl+h
- In the Find what box, put
(?-s)^(.+\R)\1+
- In the Replace with box, put
\1
- Tick the Regular expression radio button
- Tick the In-Selection checkbox
- Untick all other boxes [bah, Transparency, wtf cares? ;) ]
- Start macro recording (macro menu or toolbar)
- Press the Replace All button
- End macro recording (macro menu or toolbar)
- Save macro (macro menu) with nice name (see below)
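For anyone wanting to sanity-check the search/replace pair outside Notepad++ first, here is a rough Python `re` equivalent (an approximation on my part: Python’s engine has no `\R`, so `\n` stands in for the line ending, and `(?m)` gives the per-line `^` behaviour):

```python
import re

text = "alpha\nbeta\nbeta\nbeta\ngamma\n"
# (?m)^(.+\n)\1+ : capture one full line, then match any immediate repeats;
# replacing with \1 keeps a single copy of each run of duplicate lines.
condensed = re.sub(r"(?m)^(.+\n)\1+", r"\1", text)
print(condensed)  # alpha / beta / gamma, one per line
```

This mirrors the Boost pattern `(?-s)^(.+\R)\1+` closely enough for testing, though `\R` in Notepad++ also matches `\r\n` and `\r`.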
For a name, I chose Condense Dupe Non-Empty Adjacent Lines (IN SELECTION) to 1 Copy. It’s a bit wordy, but it more accurately describes what is going on than the built-in command it is based on. ;)
-
@Meta-Chuh said:
for the changes to take effect, please switch to any other language and back to english.
I’m always nervous doing this. What if I can’t read the new language enough to know how to switch back to English? Haha, just kidding. :)
But I had a real reason for replying: There is no need to actually switch to a different language. Just click the dropdown box (so that it drops down) and re-select the current language choice (which should be the top selection available and should be highlighted–in blue, for me). This is enough to get changed customizations put into use. One step instead of 2.
-
I used this technique:
In the Find what box, put (?-s)^(.+\R)\1+
In the Replace with box, put \1
It works… BUT… it will remove ALL records after the last duplicate that it finds in our files. I know it’s because of the size of my files, because I tested it out with shorter files and had no issue. Our files are 5.3 million lines long, and we process 3-6 of these per day. We cannot split the files, as they’re used for manufacturing and we don’t want to do multi-step copy/pastes.
For reference, the built-in “Remove Consecutive Duplicates” does exactly the same thing, removing ALL the records after the last duplicate…
Our file generators push out a file with 5.3 million records, and usually we only have 3-5 duplicates, so when we run this command, it may run into the last duplicate on line 500,000 and then delete everything afterwards.
Is there a way to allow larger file sizes to process successfully? TextFX does it perfectly, and I can use that with a 32 bit Notepad++, but I’d like to keep the 64 bit if possible.
Thx.
-
@Brian-Schweitzer said:
TextFX does it perfectly, and I can use that with a 32 bit Notepad++, but I’d like to keep the 64 bit if possible.
The two are not mutually exclusive. You could leave 64-bit as your installed Notepad++, but download a portable (zip edition) of 32-bit Notepad++, unzipped into some other directory (not in the Program Files (x86) hierarchy; I take inspiration from the Linux world, and put my outside-of-Program-Files programs in c:\usr\local\apps\____). You could then use the 64-bit for normal, everyday usage. But when you want to do the removing of duplicates, you can just launch your 32-bit instance instead.
-
it will remove ALL records after the last duplicate that it finds
Sadly, there are some limitations where the regular expression engine is concerned…but you’ve already discovered this so I’m adding nothing new…
the built in “Remove Consecutive Duplicates” does exactly the same thing
This built-in command uses a regular-expression replacement operation as well (but rather a C++ coded one, not a user-supplied one), so the same outcome makes sense.
Is there a way to allow larger file sizes to process successfully?
If I were doing it, I’d turn to an external tool. Since nothing existing that does exactly this pops to mind, I’d likely roll my own. I’d probably first try Python, but if that wasn’t fast enough I’d turn to C. Maybe in your case, sticking with TextFX is the best option.
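Along those lines, a minimal streaming sketch in Python (the function and file names are hypothetical; it drops only consecutive duplicates, like the built-in command, and never holds more than one line in memory, so a 5.3-million-line file is no obstacle):

```python
def condense_consecutive(src_path, dst_path):
    """Copy src to dst, keeping only the first line of each consecutive run."""
    with open(src_path) as fin, open(dst_path, "w") as fout:
        prev = None
        for line in fin:
            if line != prev:   # write the line only when it differs from the one above
                fout.write(line)
            prev = line
```

Called as `condense_consecutive("big_input.txt", "big_output.txt")`, it processes the file in a single pass.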
Sorry I don’t have a more optimistic response – maybe someone else?
-
did a quick test. Creating 6_000_100 lines takes much longer than removing their duplicates.
def remove_duplicates():
    unique_lines = set()
    duplicates = []
    for line_num, line in enumerate(editor.getCharacterPointer().splitlines()):
        if line not in unique_lines:
            unique_lines.add(line)
        else:
            duplicates.append(line_num)
    for line_num in reversed(duplicates):
        editor.deleteLine(line_num)
Which took 5.8 seconds on my environment. :-)
Note, this script would remove ANY duplicate, not only the ones which are consecutive.
-
Nice.
Which took 5.8 seconds on my environment
Nicer.
script would remove ANY duplicate, not only the ones which are consecutive.
Perhaps nicest.
:)
I was just generalizing in my earlier reply; I didn’t know a script was going to come out of it. :)
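If a consecutive-only variant were wanted, to match the built-in command exactly, the selection of lines to delete could be sketched as a plain function, testable outside Notepad++ (my paraphrase, not part of the script above; inside PythonScript one would feed it `editor.getCharacterPointer().splitlines()` and call `editor.deleteLine()` on the result in reverse):

```python
def consecutive_duplicate_line_numbers(lines):
    """Return indices of lines identical to the line directly above them."""
    dups = []
    prev = None
    for i, line in enumerate(lines):
        if line == prev:
            dups.append(i)
        prev = line
    return dups

print(consecutive_duplicate_line_numbers(["a", "a", "b", "a", "b", "b"]))  # [1, 5]
```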
-
I was just generalizing in my earlier reply; I didn’t know a script was going to come out of it. :)
I had it already but never tested it with really big data and this thread just gave me the trigger to do the test :-)
-
Hello, @ekopalypse,
I’ve just tried out your script, about removing duplicate lines, with a local N++ v7.6.3, 32-bit release, and nothing occurred :-((
My Python script version is 1.3.0.0 and NO error message is displayed on the console ! My Python interpreter seems OK, as other scripts just work as expected !
I used this simple sample text below :
abcde
fgh
abcde
jk
opq
abcde
fgh
jk
fgh
abcde
I also tried to sort it out first, to select a line, a block of lines or all text => No result :-(( I also suppressed the line numbering, just in case…
Here is my debug info :
Notepad++ v7.6.3 (32-bit)
Build time : Jan 27 2019 - 17:20:30
Path : D:\@@\763\notepad++.exe
Admin mode : OFF
Local Conf mode : ON
OS : Windows XP (32-bit)
Plugins : BetterMultiSelection.dll ComparePlugin.dll DSpellCheck.dll ElasticTabstops.dll mimeTools.dll NppConverter.dll NppExport.dll PythonScript.dll TabIndentSpaceAlign.dll
Note that the v7.6.3 version is my latest version, where I installed the PythonScript plugin, and that my Win XP laptop contains numerous portable N++ versions, with various plugins in each ;-))
So, am I missing something obvious ?!
BR
guy038
-
sorry, yes, I only posted the function itself - it must be called of course :-)
def remove_duplicates():
    unique_lines = set()
    duplicates = []
    for line_num, line in enumerate(editor.getCharacterPointer().splitlines()):
        if line not in unique_lines:
            unique_lines.add(line)
        else:
            duplicates.append(line_num)
    for line_num in reversed(duplicates):
        editor.deleteLine(line_num)

remove_duplicates()
-
Somewhat equivalently, one could remove the def remove_duplicates(): line (and now also the remove_duplicates() line), and outdent the remaining lines, and it will also work fine. :)
-
I just tried it out. With the call, it works for me on @guy038’s data.
The one thing I would suggest would be to wrap it in an editor.beginUndoAction() / editor.endUndoAction() pair. If I’m doing a bulk delete, I want to be able to bulk undo, too. :-)
-
depending on how many duplicates it found, yes, it could become quite cumbersome if one would try to undo it :-)

def remove_duplicates():
    unique_lines = set()
    duplicates = []
    for line_num, line in enumerate(editor.getCharacterPointer().splitlines()):
        if line not in unique_lines:
            unique_lines.add(line)
        else:
            duplicates.append(line_num)
    for line_num in reversed(duplicates):
        editor.deleteLine(line_num)

editor.beginUndoAction()
remove_duplicates()
editor.endUndoAction()
-
Hi, @Ekopalypse, @alan-kilborn, @peterjones and all,
Oh… my bad ! I’m feeling really silly, right now :-(( So elementary !
Now, as the native Remove consecutive duplicate lines N++ option does not take any selection into account, @ekopalypse, would it be easy enough to just consider the current main selection ? If so, it could be an interesting enhancement of this native N++ command ;-))
Cheers,
guy038
-
yes but what should happen with the selection afterwards?
Should it simply disappear or should it select the remaining unique lines?
-
To my mind, I don’t think that it’s necessary to keep the selection. Indeed, it would just be a means to define the part of the file to be processed, afterwards !
What’s your feeling about it ?
Cheers,
guy038
-
not sure, I guess providing a flag which can be set is good enough. In case one wants it, turn it on; if not, turn it off.
If no one else jumps in, then I will follow up tomorrow, as it is already past midnight. But I know you know this, as you are from France, as I remember.
'til tomorrow.
-
Hi @guy038
as promised, here is a version with a selection option.

def remove_duplicates():
    unselect_after_removable = False
    unique_lines = set()
    duplicates = []
    if editor.getSelectionEmpty():
        for line_num, line in enumerate(editor.getCharacterPointer().splitlines()):
            if line not in unique_lines:
                unique_lines.add(line)
            else:
                duplicates.append(line_num)
    else:
        start, end = editor.getUserLineSelection()
        for line_num in range(start, end + 1):
            line = editor.getLine(line_num)
            if line not in unique_lines:
                unique_lines.add(line)
            else:
                duplicates.append(line_num)
    for line_num in reversed(duplicates):
        editor.deleteLine(line_num)
    if unselect_after_removable:
        editor.clearSelections()
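The in-selection branch’s logic can also be expressed as a plain function for testing outside the editor (my paraphrase for illustration, not part of the script itself):

```python
def duplicate_line_numbers_in_range(lines, start, end):
    """Indices within [start, end] of lines already seen earlier in that range."""
    seen = set()
    dups = []
    for i in range(start, end + 1):
        if lines[i] in seen:
            dups.append(i)
        else:
            seen.add(lines[i])
    return dups

print(duplicate_line_numbers_in_range(["a", "b", "a", "b", "c"], 1, 3))  # [3]
```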
-
Hi, @ekopalypse and All,
This time, I was warned ;-)) So, adding the part, below, to your script allowed me to appreciate your last version :
editor.beginUndoAction()
remove_duplicates()
editor.endUndoAction()
If no main selection is present, the whole file contents are processed. Otherwise, only the selection range is concerned. Nice, indeed ;-))
I built a sample file containing roughly 497,000 lines, all different, and I added a block of 15 lines, 128 times, each block being separated from the next one by between 800 and 7,500 lines, which, finally, gave me a file of almost 500,000 lines. On my out-dated laptop ( Win XP, 1 GB of RAM ! ), no problem. It took about 31s to be processed !
BR
guy038
P.S. :
Yes, I know ! Why can’t he buy a recent laptop, with a 250 GB SSD for Windows 10, 8 GB of SDRAM, a 2 TB SATA HD and a 2 GB NVIDIA GeForce, as everybody ? Well, I think I’m about to reach the tipping point ;-))
Note that I did not emphasize these laptop’s characteristics as I’m not quite certain they are all accurate !!
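As a footnote to the timing test above: for anyone wanting to reproduce it, a hypothetical generator for a similar file (the counts mirror guy038’s description; the line contents and the random block positions are my assumptions):

```python
import random

def build_test_lines(unique_count=497000, block_copies=128, block_size=15):
    """Mostly-unique lines with a repeated block spliced in at random spots."""
    lines = ["unique-%d" % i for i in range(unique_count)]
    block = ["dup-%d" % i for i in range(block_size)]
    for _ in range(block_copies):
        pos = random.randrange(len(lines) + 1)
        lines[pos:pos] = block  # insert one copy of the 15-line block
    return lines

lines = build_test_lines()
print(len(lines))  # 498920 lines, close to guy038's ~500,000
```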