Deleting numbers from LIST 1, that also appear in LIST 2
-
I scripted a solution to a similar problem HERE. It could probably be adapted easily to bookmark all lines with hits, providing a mechanism by which to delete said lines.
If you’re willing to use the Pythonscript plugin, it is a possibility.
@PeterJones Yea, regex solutions such as those of @guy038 are likely to run into the “regex engine” overflow issue when run on such large data.
-
@M-P said in Deleting numbers from LIST 1, that also appear in LIST 2:
It’s easier for me to write the program itself that will do this than to explain how to write it. even vbs will do.
-
@TroshinDV said in Deleting numbers from LIST 1, that also appear in LIST 2:
It’s easier for me to write the program itself that will do this than to explain how to write it. even vbs will do.
Please DON’T do that.
Let’s stick to “inside Notepad++” solutions here (which includes plugins).
-
@Alan-Kilborn Ок.
Notepad++ is still a text editor, but it does have the ability to process files. Two options: Python with the help of the PythonScript plugin, and JavaScript from WSH using the JH plugin.
Here you just need to throw in a script that will process the text file in a certain way.
-
thank you - i have downloaded the script and followed the instructions that were mentioned in the thread, but how can I put it to use now? or like michael scott would say; why don’t you explain it to me like im an 8 year old? ;) - sorry im new to this.
-
No worries, if you’re an intelligent 8 year old, then, well, we’ve got something to work with. We’ll guide you through.
But, what stage did you get to, can you run the script?
Do you see where to put your data from your lists before running the script?
-
well, yes, im at the stage saying: Enter PREFIX-word-SUFFIX - should the LIST 2 be entered there?
-
Hmm, you’ve lost me as to where you are in the process as I can’t find “prefix” in anything I’ve pointed to…
-
List 2 should be entered in the “secondary view” of Notepad++, example:
-
Alright i have done the secondary view and downloaded the script that you published 19 hours ago. i tried it with only a couple thousand numbers and it took the program quite a while, but the program also marked all 9s that are in list 1, together with the numbers that are in list 2.
here i’ve done a video about that:
https://streamable.com/l81voi
-
It wasn’t considered that numbers in the list could be part of larger numbers. Meaning that you probably have a line with only 9 on it in your list in the secondary view?
A similar thing would happen if you had a line of 123 and in the main view you had 1234, 61235, 7891230, etc.
We can work around that, but some other further experimentation has shown that this might not be a viable way to do what you need. I don’t think the script logic is flawed, but I don’t have time immediately to dig in deeper to see what is truly going on.
In light of that I suppose I’d advise you to seek an alternate solution if your need is short-term. I’d really not like to see a non-Notepad++ solution worked out in this forum, because it is off-topic, but perhaps this one time is OK (because I feel bad taking you down the N++ road only to abort) if someone has time and wants to present one of those.
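For readers who want to see the partial-match problem in isolation, here is a minimal sketch in plain Python, run outside Notepad++. It is not the script referred to in this thread; the function names `remove_by_substring` and `remove_listed` are hypothetical and chosen only for illustration. It contrasts a substring test, which reproduces the 9 and 123 behaviour described above, with a whole-line test.

```python
def remove_by_substring(list1_lines, list2_lines):
    """Naive variant: drop a line if any list-2 entry occurs anywhere inside it."""
    needles = [n.strip() for n in list2_lines if n.strip()]
    return [line for line in list1_lines if not any(n in line for n in needles)]


def remove_listed(list1_lines, list2_lines):
    """Whole-line variant: drop a line only if it equals a list-2 entry exactly."""
    unwanted = {n.strip() for n in list2_lines if n.strip()}
    return [line for line in list1_lines if line.strip() not in unwanted]


list1 = ["1234", "9", "61235", "7891230", "555"]
list2 = ["9", "123"]

print(remove_by_substring(list1, list2))  # → ['555']
print(remove_listed(list1, list2))        # → ['1234', '61235', '7891230', '555']
```

Because the whole-line variant looks entries up in a set, it should also scale to lists with millions of lines, assuming both lists fit in memory.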
-
solved a problem?
-
@Alan-Kilborn i have found the 9, thank you. And i believe that everything worked now. How can i delete the numbers from FILE 1 in the final stage?
-
I’m amazed you didn’t encounter any of the “weirdness” I did.
But I suppose that is a good thing.
I have a “highly scripted” setup, so perhaps that is what is causing the weirdness I saw when I was experimenting?
But, your next step should be to right-click where I have the red dot in the following (doesn’t have to be on line 6!, just in that same vertical “bookmark margin” area), and choose the indicated command:
-
thank you! hopefully there won’t be any problems when working with the millions of numbers but i would come back in that case. for now everything went super smoothly.
-
BTW, I would be very cautious before finishing up your task, as more things similar to the 9 problem discussed earlier might have occurred. You should check on that; let me know, there’s an easy way to avoid that.
-
I got thinking more about how this problem would be better solved.
If we allow ourselves to dream about features that Notepad++ maybe itself should have, here’s how I think I’d solve it:
- Add a delimiter line at the bottom of the file from which lines are to be removed (delimiter line contains data that doesn’t otherwise occur in either file)
- Paste all lines from the second file (containing the list of things to be removed from the first file), below the delimiter line in the first file
- Choose Delete all non-unique lines from the Edit menu’s Line Operations submenu <— special note: fantasy Notepad++ feature that does not currently exist!!
- Remove the delimiter line added earlier and any lines that remain after it
After that the first file would contain the desired data.
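The delimiter idea above can be sketched in plain Python. This is only an illustration of the proposed, not-yet-existing command: `DELIMITER`, `delete_non_unique_lines` and `remove_via_delimiter` are hypothetical names, and the sketch assumes the first file contains no repeated lines of its own (any such lines would be removed as well).

```python
from collections import Counter

# Hypothetical marker line; it must not occur in either file.
DELIMITER = "=====DELIMITER====="


def delete_non_unique_lines(lines):
    """Drop every line that occurs more than once (all copies are removed)."""
    counts = Counter(lines)
    return [line for line in lines if counts[line] == 1]


def remove_via_delimiter(file1_lines, file2_lines):
    """Append delimiter and file 2 to file 1, keep unique lines, cut at delimiter."""
    combined = file1_lines + [DELIMITER] + file2_lines
    survivors = delete_non_unique_lines(combined)
    cut = survivors.index(DELIMITER)  # the delimiter is unique, so it survives
    return survivors[:cut]


print(remove_via_delimiter(["10", "20", "30", "40"], ["20", "40", "99"]))
# → ['10', '30']
```

The entry 99, which appears only in the second file, survives the unique-lines pass but sits below the delimiter, which is why the final cut is needed.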
I’m fairly certain I’ve seen a “Delete all non-unique lines” (or maybe a “Keep only unique lines”) in a different editor, but I can’t for sure remember which one. Ultraedit? Hmm.
Anyway, we recently have had Delete Duplicate Lines functionality added, how about the addition of another new command?
@PeterJones Yep, don’t say it…I will…FEATURE REQUEST
-
Hello, @m-p, @peterjones, @alan-kilborn, @troshindv and All,
As @peterjones said, a simple regex S/R could work for files of moderate size. But with files of about 2,000,000 lines, this S/R would probably go totally wrong because of the regex engine’s overflow issue :-((
But all is not lost! The problem is that, in huge files, there may be a very large gap between a line and its first duplicate. Luckily, this problem can be eliminated by using the following steps:
- First, number all the lines
- Then, sort the lines in ascending order
- Delete all lines which exist in more than 1 copy, which should be easy as these lines are now consecutive
- Re-sort all the remaining unique lines to restore their initial list order
Below, I’ll try to explain these steps with a short text. However, I’m quite confident that this method should work with huge lists, too, apart from the time needed, of course, to perform the sorts and the regex search/replacements!
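As a rough illustration of the four steps (number, sort, delete duplicated groups, re-sort), here is a minimal sketch in plain Python rather than in Notepad++ itself. The function name `keep_unique_preserving_order` is hypothetical. Like the method described, it removes every copy of any line that occurs more than once, so an entry that appears only in the appended list would survive.

```python
from itertools import groupby


def keep_unique_preserving_order(lines):
    # Step 1: number all the lines (remember each line's original position).
    numbered = [(text, index) for index, text in enumerate(lines)]
    # Step 2: sort, so that identical lines become consecutive.
    numbered.sort()
    # Step 3: keep only the groups that contain exactly one line.
    survivors = []
    for _, group in groupby(numbered, key=lambda pair: pair[0]):
        group = list(group)
        if len(group) == 1:
            survivors.append(group[0])
    # Step 4: re-sort on the stored number to restore the initial order.
    survivors.sort(key=lambda pair: pair[1])
    return [text for text, _ in survivors]


print(keep_unique_preserving_order(["b", "a", "c", "a", "d", "c"]))
# → ['b', 'd']
```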
Let’s go :
- From the license.txt file, I extracted only the non-blank lines and shortened the others to, roughly, their first 32 characters, ending with this 42-line text:
Preamble
The licenses for most software
When we speak of free software,
To protect your rights, we need
For example, if you distribute
We protect your rights with two
Also, for each author's protect
Finally, any free program is
The precise terms and condition
TERMS AND CONDITIONS FOR COPYING
0. This License applies to any
Activities other copying,
1. You may copy and distribute
You may charge a fee for the
2. You may modify your copy or
a) You must cause the modified
b) You must cause any work that
c) If the modified program
These requirements apply to the
Thus, it is not the intent of
In addition, mere aggregation
3. You may copy and distribute
a) Accompany it with the
b) Accompany it with a written
c) Accompany it with the
The source code for a work mean
If distribution of executable
4. You may not copy, modify,
5. You are not required to
6. Each time you redistribute
7. If, as a consequence of a
If any portion of this section
It is not the purpose of this
This section is intended to make
8. If the distribution and/or
9. The Free Software Foundation
Each version is given a
10. If you wish to incorporate
NO WARRANTY
11. BECAUSE THE PROGRAM IS
12. IN NO EVENT UNLESS REQUIRED
END OF TERMS AND CONDITIONS
- Then I appended, to this list, 33 lines out of these 42 lines (so about 80 % of the total, i.e. the same proportion as in your lists, 1,600,000 / 2,000,000)
No separation line is needed. Thus, we now start with this text, where the added lines begin at line 43:
Preamble
The licenses for most software
When we speak of free software,
To protect your rights, we need
For example, if you distribute
We protect your rights with two
Also, for each author's protect
Finally, any free program is
The precise terms and condition
TERMS AND CONDITIONS FOR COPYING
0. This License applies to any
Activities other copying,
1. You may copy and distribute
You may charge a fee for the
2. You may modify your copy or
a) You must cause the modified
b) You must cause any work that
c) If the modified program
These requirements apply to the
Thus, it is not the intent of
In addition, mere aggregation
3. You may copy and distribute
a) Accompany it with the
b) Accompany it with a written
c) Accompany it with the
The source code for a work mean
If distribution of executable
4. You may not copy, modify,
5. You are not required to
6. Each time you redistribute
7. If, as a consequence of a
If any portion of this section
It is not the purpose of this
This section is intended to make
8. If the distribution and/or
9. The Free Software Foundation
Each version is given a
10. If you wish to incorporate
NO WARRANTY
11. BECAUSE THE PROGRAM IS
12. IN NO EVENT UNLESS REQUIRED
END OF TERMS AND CONDITIONS
The licenses for most software
When we speak of free software,
To protect your rights, we need
We protect your rights with two
Also, for each author's protect
Finally, any free program is
The precise terms and condition
TERMS AND CONDITIONS FOR COPYING
Activities other copying,
1. You may copy and distribute
You may charge a fee for the
2. You may modify your copy or
b) You must cause any work that
c) If the modified program
These requirements apply to the
Thus, it is not the intent of
In addition, mere aggregation
b) Accompany it with a written
c) Accompany it with the
The source code for a work mean
If distribution of executable
4. You may not copy, modify,
5. You are not required to
6. Each time you redistribute
If any portion of this section
It is not the purpose of this
This section is intended to make
8. If the distribution and/or
9. The Free Software Foundation
10. If you wish to incorporate
NO WARRANTY
12. IN NO EVENT UNLESS REQUIRED
END OF TERMS AND CONDITIONS
Note: So, you agree that, after all this stuff is done, we should be left with a text of 9 unique lines! (42 - 33)
- From the end of the first line, we add some space characters till, let’s say, column 110
- We open the column editor (Alt + C)
- We select the Number to Insert option
- Type in the value 1 in each zone
- Tick the Leading zeros box
- Verify that the Dec format is ticked
- Click on the OK button
- Delete the last virtual line 76
=> We get this text:
Preamble                            01
The licenses for most software      02
When we speak of free software,     03
To protect your rights, we need     04
For example, if you distribute      05
We protect your rights with two     06
Also, for each author's protect     07
Finally, any free program is        08
The precise terms and condition     09
TERMS AND CONDITIONS FOR COPYING    10
0. This License applies to any      11
Activities other copying,           12
1. You may copy and distribute      13
You may charge a fee for the        14
2. You may modify your copy or      15
a) You must cause the modified      16
b) You must cause any work that     17
c) If the modified program          18
These requirements apply to the     19
Thus, it is not the intent of       20
In addition, mere aggregation       21
3. You may copy and distribute      22
a) Accompany it with the            23
b) Accompany it with a written      24
c) Accompany it with the            25
The source code for a work mean     26
If distribution of executable       27
4. You may not copy, modify,        28
5. You are not required to          29
6. Each time you redistribute       30
7. If, as a consequence of a        31
If any portion of this section      32
It is not the purpose of this       33
This section is intended to make    34
8. If the distribution and/or       35
9. The Free Software Foundation     36
Each version is given a             37
10. If you wish to incorporate      38
NO WARRANTY                         39
11. BECAUSE THE PROGRAM IS          40
12. IN NO EVENT UNLESS REQUIRED     41
END OF TERMS AND CONDITIONS         42
The licenses for most software      43
When we speak of free software,     44
To protect your rights, we need     45
We protect your rights with two     46
Also, for each author's protect     47
Finally, any free program is        48
The precise terms and condition     49
TERMS AND CONDITIONS FOR COPYING    50
Activities other copying,           51
1. You may copy and distribute      52
You may charge a fee for the        53
2. You may modify your copy or      54
b) You must cause any work that     55
c) If the modified program          56
These requirements apply to the     57
Thus, it is not the intent of       58
In addition, mere aggregation       59
b) Accompany it with a written      60
c) Accompany it with the            61
The source code for a work mean     62
If distribution of executable       63
4. You may not copy, modify,        64
5. You are not required to          65
6. Each time you redistribute       66
If any portion of this section      67
It is not the purpose of this       68
This section is intended to make    69
8. If the distribution and/or       70
9. The Free Software Foundation     71
10. If you wish to incorporate      72
NO WARRANTY                         73
12. IN NO EVENT UNLESS REQUIRED     74
END OF TERMS AND CONDITIONS         75
More in the next post !
guy038
-
-
Hello, @m-p, @peterjones, @alan-kilborn, @troshindv and All,
Continuation of my previous post !
- Then, we use the Edit > Line Operations > Sort Lines Lexicographically Ascending menu option, without any selection
=> The example text becomes:
0. This License applies to any      11
1. You may copy and distribute      13
1. You may copy and distribute      52
10. If you wish to incorporate      38
10. If you wish to incorporate      72
11. BECAUSE THE PROGRAM IS          40
12. IN NO EVENT UNLESS REQUIRED     41
12. IN NO EVENT UNLESS REQUIRED     74
2. You may modify your copy or      15
2. You may modify your copy or      54
3. You may copy and distribute      22
4. You may not copy, modify,        28
4. You may not copy, modify,        64
5. You are not required to          29
5. You are not required to          65
6. Each time you redistribute       30
6. Each time you redistribute       66
7. If, as a consequence of a        31
8. If the distribution and/or       35
8. If the distribution and/or       70
9. The Free Software Foundation     36
9. The Free Software Foundation     71
Activities other copying,           12
Activities other copying,           51
Also, for each author's protect     07
Also, for each author's protect     47
END OF TERMS AND CONDITIONS         42
END OF TERMS AND CONDITIONS         75
Each version is given a             37
Finally, any free program is        08
Finally, any free program is        48
For example, if you distribute      05
If any portion of this section      32
If any portion of this section      67
If distribution of executable       27
If distribution of executable       63
In addition, mere aggregation       21
In addition, mere aggregation       59
It is not the purpose of this       33
It is not the purpose of this       68
NO WARRANTY                         39
NO WARRANTY                         73
Preamble                            01
TERMS AND CONDITIONS FOR COPYING    10
TERMS AND CONDITIONS FOR COPYING    50
The licenses for most software      02
The licenses for most software      43
The precise terms and condition     09
The precise terms and condition     49
The source code for a work mean     26
The source code for a work mean     62
These requirements apply to the     19
These requirements apply to the     57
This section is intended to make    34
This section is intended to make    69
Thus, it is not the intent of       20
Thus, it is not the intent of       58
To protect your rights, we need     04
To protect your rights, we need     45
We protect your rights with two     06
We protect your rights with two     46
When we speak of free software,     03
When we speak of free software,     44
You may charge a fee for the        14
You may charge a fee for the        53
a) Accompany it with the            23
a) You must cause the modified      16
b) Accompany it with a written      24
b) Accompany it with a written      60
b) You must cause any work that     17
b) You must cause any work that     55
c) Accompany it with the            25
c) Accompany it with the            61
c) If the modified program          18
c) If the modified program          56
- Now, we open the Replace dialog (Ctrl + H)
- SEARCH ^(.+)(\x20+\d+\R)(\1(?2))+
- REPLACE Leave EMPTY
- Tick the Wrap around option
- Select the Regular expression search mode
- Click on the Replace All button
=> You should get the status message 33 occurrences were replaced, leading to this text:
0. This License applies to any      11
11. BECAUSE THE PROGRAM IS          40
3. You may copy and distribute      22
7. If, as a consequence of a        31
Each version is given a             37
For example, if you distribute      05
Preamble                            01
a) Accompany it with the            23
a) You must cause the modified      16
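To experiment with this search outside Notepad++, the pattern can be approximated with Python’s `re` module. Note that this is an adaptation and not the exact expression above: Python’s `re` supports neither `\R` nor the `(?2)` subroutine call, so the second group is written out by hand and a plain `\n` is assumed as the line ending. The names `DUP_GROUP` and `drop_duplicated_groups` are hypothetical.

```python
import re

# Hand-expanded equivalent of ^(.+)(\x20+\d+\R)(\1(?2))+ , assuming "\n" line
# endings: a line, followed by one or more lines with the same text and any number.
DUP_GROUP = re.compile(r"^(.+)(\x20+\d+\n)(?:\1\x20+\d+\n)+", re.MULTILINE)


def drop_duplicated_groups(sorted_numbered_text):
    """Delete each run of consecutive lines that differ only in their number.

    Returns (new_text, number_of_runs_deleted). Every line, including the
    last one, must end with a newline for the pattern to see it.
    """
    return DUP_GROUP.subn("", sorted_numbered_text)


sample = (
    "Activities other copying,   12\n"
    "Activities other copying,   51\n"
    "Each version is given a     37\n"
    "NO WARRANTY                 39\n"
    "NO WARRANTY                 73\n"
    "Preamble                    01\n"
)
result, n = drop_duplicated_groups(sample)
print(n)       # → 2
print(result)  # only the lines numbered 37 and 01 remain
```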
- Although it would be possible to use a column mode selection to sort the lines by the number at the end of the lines, I’m not sure it would work properly with a huge list. So, I prefer to take a safer method and perform another regex S/R:
- SEARCH ^(.+?)\x20{2,}(\d+)
- REPLACE \2\t\t\1
And we end with:
11		0. This License applies to any
40		11. BECAUSE THE PROGRAM IS
22		3. You may copy and distribute
31		7. If, as a consequence of a
37		Each version is given a
05		For example, if you distribute
01		Preamble
23		a) Accompany it with the
16		a) You must cause the modified
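The same search and replacement can be tried in Python’s `re` module, where this particular expression needs no adaptation beyond the multi-line flag. A minimal sketch, with the hypothetical names `MOVE_NUMBER` and `number_first`:

```python
import re

# Lazy text capture, then a run of at least two spaces, then the trailing number.
MOVE_NUMBER = re.compile(r"^(.+?)\x20{2,}(\d+)", re.MULTILINE)


def number_first(text):
    """Move the number at the end of each line to the front, followed by two tabs."""
    return MOVE_NUMBER.sub(r"\2\t\t\1", text)


sample = "Preamble                    01\nEach version is given a     37\n"
print(repr(number_first(sample)))
# → '01\t\tPreamble\n37\t\tEach version is given a\n'
```

The lazy `.+?` stops at the first run of two or more spaces, so this relies on the line text itself never containing two consecutive spaces followed by digits.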
- Again, we use the Edit > Line Operations > Sort Lines Lexicographically Ascending menu option, to restore the initial file order, giving:
01		Preamble
05		For example, if you distribute
11		0. This License applies to any
16		a) You must cause the modified
22		3. You may copy and distribute
23		a) Accompany it with the
31		7. If, as a consequence of a
37		Each version is given a
40		11. BECAUSE THE PROGRAM IS
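A side note on why the Leading zeros box matters for this re-sort: a lexicographic sort compares character by character, so it only agrees with numeric order when all numbers have the same width. A small Python sketch of that point, using a few of the line numbers from the example (the variable names are only illustrative):

```python
line_numbers = [22, 5, 16, 1, 11]

# Without leading zeros, text sorting puts "5" after "22".
unpadded = sorted(str(n) for n in line_numbers)
print(unpadded)  # → ['1', '11', '16', '22', '5']

# With leading zeros (fixed width), text order and numeric order coincide.
padded = sorted(str(n).zfill(2) for n in line_numbers)
print(padded)    # → ['01', '05', '11', '16', '22']
```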
And, finally, we perform a last regex S/R, below, to get rid of the temporary numbering!
- SEARCH ^\d+\t+
- REPLACE Leave EMPTY
=> Our expected text, with the 9 unique lines:
Preamble
For example, if you distribute
0. This License applies to any
a) You must cause the modified
3. You may copy and distribute
a) Accompany it with the
7. If, as a consequence of a
Each version is given a
11. BECAUSE THE PROGRAM IS
Best Regards,
guy038
-
Perhaps that regex-intensive solution becomes the de facto standard way of solving this problem.
But, it might be nice to see in Notepad++ itself, a command to “Remove lines from primary view tab that occur in secondary view tab”, or some such less-wordy verbiage.