Deleting numbers from LIST 1, that also appear in LIST 2
-
Hello, @m-p, @peterjones, @alan-kilborn, @troshindv and All,
Continuation of my previous post !
- Then, we use the Edit > Line Operations > Sort Lines Lexicographically Ascending menu option, without any selection
=> The example text becomes :
0. This License applies to any  11
1. You may copy and distribute  13
1. You may copy and distribute  52
10. If you wish to incorporate  38
10. If you wish to incorporate  72
11. BECAUSE THE PROGRAM IS  40
12. IN NO EVENT UNLESS REQUIRED  41
12. IN NO EVENT UNLESS REQUIRED  74
2. You may modify your copy or  15
2. You may modify your copy or  54
3. You may copy and distribute  22
4. You may not copy, modify,  28
4. You may not copy, modify,  64
5. You are not required to  29
5. You are not required to  65
6. Each time you redistribute  30
6. Each time you redistribute  66
7. If, as a consequence of a  31
8. If the distribution and/or  35
8. If the distribution and/or  70
9. The Free Software Foundation  36
9. The Free Software Foundation  71
Activities other copying,  12
Activities other copying,  51
Also, for each author's protect  07
Also, for each author's protect  47
END OF TERMS AND CONDITIONS  42
END OF TERMS AND CONDITIONS  75
Each version is given a  37
Finally, any free program is  08
Finally, any free program is  48
For example, if you distribute  05
If any portion of this section  32
If any portion of this section  67
If distribution of executable  27
If distribution of executable  63
In addition, mere aggregation  21
In addition, mere aggregation  59
It is not the purpose of this  33
It is not the purpose of this  68
NO WARRANTY  39
NO WARRANTY  73
Preamble  01
TERMS AND CONDITIONS FOR COPYING  10
TERMS AND CONDITIONS FOR COPYING  50
The licenses for most software  02
The licenses for most software  43
The precise terms and condition  09
The precise terms and condition  49
The source code for a work mean  26
The source code for a work mean  62
These requirements apply to the  19
These requirements apply to the  57
This section is intended to make  34
This section is intended to make  69
Thus, it is not the intent of  20
Thus, it is not the intent of  58
To protect your rights, we need  04
To protect your rights, we need  45
We protect your rights with two  06
We protect your rights with two  46
When we speak of free software,  03
When we speak of free software,  44
You may charge a fee for the  14
You may charge a fee for the  53
a) Accompany it with the  23
a) You must cause the modified  16
b) Accompany it with a written  24
b) Accompany it with a written  60
b) You must cause any work that  17
b) You must cause any work that  55
c) Accompany it with the  25
c) Accompany it with the  61
c) If the modified program  18
c) If the modified program  56
- Now, we open the Replace dialog (Ctrl + H)
- SEARCH ^(.+)(\x20+\d+\R)(\1(?2))+
- REPLACE Leave EMPTY
- Tick the Wrap around option
- Select the Regular expression search mode
- Click on the Replace All button
- => You should get the status message "33 occurrences were replaced", leading to this text :
0. This License applies to any  11
11. BECAUSE THE PROGRAM IS  40
3. You may copy and distribute  22
7. If, as a consequence of a  31
Each version is given a  37
For example, if you distribute  05
Preamble  01
a) Accompany it with the  23
a) You must cause the modified  16
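For readers who want to check the effect of this search outside Notepad++, here is a minimal Python sketch. It is only an approximation of the regex above: Python's `re` module has no `(?2)` subroutine call, so the repeated sub-pattern is spelled out, and `\n` stands in for `\R`, which assumes Unix line endings. The function name `drop_duplicated_lines` is hypothetical.

```python
import re

# Rough equivalent of ^(.+)(\x20+\d+\R)(\1(?2))+ for Python's re module.
# Assumption: lines end with "\n" and the text is already sorted, so that
# lines sharing the same leading text are adjacent.
DUPLICATE_RUN = re.compile(r"^(.+)(\x20+\d+\n)(?:\1\x20+\d+\n)+", re.MULTILINE)


def drop_duplicated_lines(sorted_text: str) -> str:
    """Delete every run of 2+ consecutive lines that differ only by their trailing number."""
    return DUPLICATE_RUN.sub("", sorted_text)


sample = (
    "Activities other copying,  12\n"
    "Activities other copying,  51\n"
    "Each version is given a  37\n"
    "NO WARRANTY  39\n"
    "NO WARRANTY  73\n"
    "Preamble  01\n"
)
result = drop_duplicated_lines(sample)
```

As in the Notepad++ version, a line is only matched if it ends with a line break, so the last line of the text needs a trailing newline to be considered.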
- Although it would be possible to use a column mode selection to sort the lines by the number at the end of each line, I’m not sure it would work properly with a huge list. So, I prefer to take a safer method and perform another regex S/R :
- SEARCH ^(.+?)\x20{2,}(\d+)
- REPLACE \2\t\t\1
- And we end with :
11		0. This License applies to any
40		11. BECAUSE THE PROGRAM IS
22		3. You may copy and distribute
31		7. If, as a consequence of a
37		Each version is given a
05		For example, if you distribute
01		Preamble
23		a) Accompany it with the
16		a) You must cause the modified
- Again, we use the Edit > Line Operations > Sort Lines Lexicographically Ascending menu option, to restore the initial file order, giving :
01		Preamble
05		For example, if you distribute
11		0. This License applies to any
16		a) You must cause the modified
22		3. You may copy and distribute
23		a) Accompany it with the
31		7. If, as a consequence of a
37		Each version is given a
40		11. BECAUSE THE PROGRAM IS
And, finally, we perform a last regex S/R, below, to get rid of the temporary numbering !
- SEARCH ^\d+\t+
- REPLACE Leave EMPTY
=> Our expected text, with the 9 unique lines :
Preamble
For example, if you distribute
0. This License applies to any
a) You must cause the modified
3. You may copy and distribute
a) Accompany it with the
7. If, as a consequence of a
Each version is given a
11. BECAUSE THE PROGRAM IS
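The last three steps (move the number to the front, sort, strip the number) can be sketched in Python as below. This is only an illustration of the same idea, not a replacement for the Notepad++ steps; the function name `restore_original_order` is hypothetical, and the sketch assumes that every line ends with at least two spaces followed by a zero-padded number of the same width, as in the example.

```python
import re


def restore_original_order(text: str) -> str:
    """Reorder the surviving lines by their trailing number, then drop that number."""
    # Step 1: same S/R as above, move the trailing number in front of the line,
    # separated by two tabs.
    numbered_first = re.sub(
        r"^(.+?)\x20{2,}(\d+)", r"\2\t\t\1", text, flags=re.MULTILINE
    )
    # Step 2: a plain string sort, like "Sort Lines Lexicographically Ascending".
    # This only restores numeric order because the numbers are zero-padded.
    lines = sorted(numbered_first.splitlines())
    # Step 3: remove the temporary numbering and the tabs that follow it.
    return "\n".join(re.sub(r"^\d+\t+", "", line) for line in lines)


remaining = (
    "0. This License applies to any  11\n"
    "Each version is given a  37\n"
    "Preamble  01"
)
restored = restore_original_order(remaining)
```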
Best Regards,
guy038
-
Perhaps that regex-intensive solution becomes the de facto standard way of solving this problem.
But, it might be nice to see in Notepad++ itself, a command to “Remove lines from primary view tab that occur in secondary view tab”, or some such less-wordy verbiage.
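To make the wished-for behaviour concrete, here is a minimal sketch of what such a command would compute, written in Python rather than as a Notepad++ feature. No such command is claimed to exist; the function name `remove_lines_present_in` is hypothetical, and the sketch assumes both lists fit in memory and that lines are compared exactly, whitespace included.

```python
def remove_lines_present_in(list1_lines, list2_lines):
    """Return the lines of LIST 1 that do not occur anywhere in LIST 2, in their original order."""
    # A set gives constant-time membership tests, so the cost grows roughly
    # linearly with the total number of lines.
    unwanted = set(list2_lines)
    return [line for line in list1_lines if line not in unwanted]


kept = remove_lines_present_in(["1", "2", "3", "2", "5"], ["2", "4"])
```

Unlike the sort-based method, this keeps LIST 1 in its original order without any temporary numbering.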
-
@Alan-Kilborn said in Deleting numbers from LIST 1, that also appear in LIST 2:
Perhaps that regex-intensive solution becomes the de facto standard way of solving this problem.
Will be for small volumes.
The volume of data dictates its own terms.
PS. It is better to wrap all actions in a macro.
-
@TroshinDV said in Deleting numbers from LIST 1, that also appear in LIST 2:
Will be for small volumes.
The volume of data dictates its own terms.
That doesn’t make sense, as the solution crafted by @guy038 was specifically designed with large “volumes” in mind.
PS. It is better to wrap all actions in a macro.
I don’t believe @guy038’s solution can be made into a macro; can you explain how you think it can be?
-
Well, there were definitely some problems. LIST 1 has 7.4 million numbers, whilst LIST 2 has 1.2 million. I believe that this is definitely too much to deal with. I’ll try @guy038’s method now, even though I’m not sure I understood it all. Let’s try it at least.
-
Hello, @m-p, @peterjones, @alan-kilborn, @troshindv and All,
Oh…! Indeed, dealing with two files of 7,400,000 and 1,200,000 lines is not an easy task ! So you will have to work with an 8,600,000-line file : good luck !
Do not hesitate to ask me for more information if you encounter difficulties in implementing my method !
-
First, I would advise you to repeat my own tiny example, to get the general idea
-
Regarding your real example, I would say that :
- The N++ sort feature is very quick, in all cases
- I suppose that the numbering operation, with the column editor, should not take very long, either !
- The first of the three regex S/R will probably take some time. Just be patient : it should work in the end !
-
Best Regards
guy038