Deleting numbers from LIST 1, that also appear in LIST 2

26 Posts 5 Posters 2.9k Views
  • M
    M P
    last edited by Dec 9, 2020, 3:23 PM

    thank you! hopefully there won’t be any problems when working with the millions of numbers but i would come back in that case. for now everything went super smoothly.

    • A
      Alan Kilborn @Alan Kilborn
      last edited by Dec 9, 2020, 3:23 PM

      @M-P

      BTW, before you finish up your task, I would be very cautious: more problems similar to the 9 problem discussed earlier might have occurred. You should check for that; let me know if so, because there's an easy way to avoid it.

      • A
        Alan Kilborn
        last edited by Alan Kilborn Dec 9, 2020, 8:26 PM Dec 9, 2020, 8:24 PM

        I got thinking more about how this problem would be better solved.
        If we allow ourselves to dream about features that Notepad++ maybe itself should have, here’s how I think I’d solve it:

        1. Add a delimiter line at the bottom of the file from which lines are to be removed (the delimiter line contains data that doesn't otherwise occur in either file)
        2. Paste all lines from the second file (containing the list of things to be removed from the first file), below the delimiter line in the first file
        3. Choose Delete all non-unique lines from the Edit menu’s Line Operations submenu <— special note: fantasy Notepad++ feature that does not currently exist!!
        4. Remove the delimiter line added earlier and any lines that remain after it

        After that the first file would contain the desired data.
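To make the four steps above concrete, here is a minimal Python sketch of what such a command would do. It is only an illustration outside Notepad++: the `DELIMITER` value and both function names are hypothetical, and the "Delete all non-unique lines" behaviour is assumed to mean "keep only lines that occur exactly once".

```python
from collections import Counter

# Hypothetical delimiter; it must not occur in either list.
DELIMITER = "=====DELIMITER====="


def delete_non_unique_lines(lines):
    """Keep only the lines that occur exactly once, preserving their order."""
    counts = Counter(lines)
    return [line for line in lines if counts[line] == 1]


def remove_listed(first, second):
    """Mimic the four steps: append the delimiter and the second list,
    drop every non-unique line, then cut at the delimiter."""
    combined = first + [DELIMITER] + second
    survivors = delete_non_unique_lines(combined)
    return survivors[:survivors.index(DELIMITER)]


print(remove_listed(["1", "2", "3", "4"], ["2", "4", "9"]))  # ['1', '3']
```

Note that, under this assumption, a line that occurs twice inside the first list would also be removed, even if it never appears in the second list.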

        I’m fairly certain I’ve seen a “Delete all non-unique lines” (or maybe a “Keep only unique lines”) in a different editor, but I can’t remember for sure which one. UltraEdit? Hmm.

        Anyway, Delete Duplicate Lines functionality was added recently; how about the addition of another new command?

        @PeterJones Yep, don’t say it…I will…FEATURE REQUEST

        • G
          guy038
          last edited by guy038 Dec 9, 2020, 11:12 PM Dec 9, 2020, 11:10 PM

          Hello, @m-p, @peterjones, @alan-kilborn, @troshindv and All,

          As @peterJones said, a simple regex S/R could work for moderately sized files. But with files of about 2,000,000 lines, this S/R would probably give totally wrong results because of the regex engine’s overflow issue :-((

          But all is not lost! The problem is that, in huge files, there may be a very large gap between a line and its first duplicate. Luckily, this problem can be eliminated with the following steps:

          • First, number all the lines

          • Then, sort the lines in an ascending order

          • Delete all lines which exist in more than 1 copy, which should be easy as these lines are, now, consecutive

          • Re-sort all the remaining unique lines to restore their initial list order
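As a rough illustration of why these four steps work, the same idea can be sketched in Python. This is not the Notepad++ procedure itself, only an outline of the logic; the function name is made up for the example.

```python
from itertools import groupby


def remove_duplicated_lines(lines):
    """Number, sort, drop every group with more than one copy, re-sort."""
    numbered = list(enumerate(lines))                  # step 1: number all lines
    by_text = sorted(numbered, key=lambda p: p[1])     # step 2: sort by line text
    survivors = []
    # After sorting, identical lines are consecutive, so groupby sees them together.
    for _, group in groupby(by_text, key=lambda p: p[1]):
        group = list(group)
        if len(group) == 1:                            # step 3: keep single copies only
            survivors.extend(group)
    survivors.sort()                                   # step 4: restore the original order
    return [text for _, text in survivors]


print(remove_duplicated_lines(["x", "y", "z", "y"]))  # ['x', 'z']
```

Because the sort brings duplicates next to each other, the distance between a line and its duplicate in the original file no longer matters.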

          Below, I’ll try to explain these steps with a short text. However, I am quite confident that this method should work with huge lists too, apart from the time needed, of course, to perform the sorts and regex search/replacements!


          Let’s go :

          • From the license.txt file, I extracted only the non-blank lines and shortened them to, roughly, their first 32 characters, ending with this 42-line text:
          Preamble
          The licenses for most software
          When we speak of free software,
          To protect your rights, we need
          For example, if you distribute
          We protect your rights with two
          Also, for each author's protect
          Finally, any free program is
          The precise terms and condition
          TERMS AND CONDITIONS FOR COPYING
          0. This License applies to any
          Activities other copying,
          1. You may copy and distribute
          You may charge a fee for the
          2. You may modify your copy or
          a) You must cause the modified
          b) You must cause any work that
          c) If the modified program
          These requirements apply to the
          Thus, it is not the intent of
          In addition, mere aggregation
          3. You may copy and distribute
          a) Accompany it with the
          b) Accompany it with a written
          c) Accompany it with the
          The source code for a work mean
          If distribution of executable
          4. You may not copy, modify,
          5. You are not required to
          6. Each time you redistribute
          7. If, as a consequence of a
          If any portion of this section
          It is not the purpose of this
          This section is intended to make
          8. If the distribution and/or
          9. The Free Software Foundation
          Each version is given a
          10. If you wish to incorporate
          NO WARRANTY
          11. BECAUSE THE PROGRAM IS
          12. IN NO EVENT UNLESS REQUIRED
          END OF TERMS AND CONDITIONS
          
          • Then I appended to this list 33 of these 42 lines (about 80 % of the total, i.e. the same proportion as your lists: 1,600,000 / 2,000,000)

          No separation line is needed. Thus, we now start with this text, where the added lines begin at line 43 :

          Preamble
          The licenses for most software
          When we speak of free software,
          To protect your rights, we need
          For example, if you distribute
          We protect your rights with two
          Also, for each author's protect
          Finally, any free program is
          The precise terms and condition
          TERMS AND CONDITIONS FOR COPYING
          0. This License applies to any
          Activities other copying,
          1. You may copy and distribute
          You may charge a fee for the
          2. You may modify your copy or
          a) You must cause the modified
          b) You must cause any work that
          c) If the modified program
          These requirements apply to the
          Thus, it is not the intent of
          In addition, mere aggregation
          3. You may copy and distribute
          a) Accompany it with the
          b) Accompany it with a written
          c) Accompany it with the
          The source code for a work mean
          If distribution of executable
          4. You may not copy, modify,
          5. You are not required to
          6. Each time you redistribute
          7. If, as a consequence of a
          If any portion of this section
          It is not the purpose of this
          This section is intended to make
          8. If the distribution and/or
          9. The Free Software Foundation
          Each version is given a
          10. If you wish to incorporate
          NO WARRANTY
          11. BECAUSE THE PROGRAM IS
          12. IN NO EVENT UNLESS REQUIRED
          END OF TERMS AND CONDITIONS
          The licenses for most software
          When we speak of free software,
          To protect your rights, we need
          We protect your rights with two
          Also, for each author's protect
          Finally, any free program is
          The precise terms and condition
          TERMS AND CONDITIONS FOR COPYING
          Activities other copying,
          1. You may copy and distribute
          You may charge a fee for the
          2. You may modify your copy or
          b) You must cause any work that
          c) If the modified program
          These requirements apply to the
          Thus, it is not the intent of
          In addition, mere aggregation
          b) Accompany it with a written
          c) Accompany it with the
          The source code for a work mean
          If distribution of executable
          4. You may not copy, modify,
          5. You are not required to
          6. Each time you redistribute
          If any portion of this section
          It is not the purpose of this
          This section is intended to make
          8. If the distribution and/or
          9. The Free Software Foundation
          10. If you wish to incorporate
          NO WARRANTY
          12. IN NO EVENT UNLESS REQUIRED
          END OF TERMS AND CONDITIONS
          

          Note: So, you will agree that, once all this is done, we should be left with a text of 9 unique lines ( 42 - 33 )!

          • At the end of the first line, we add space characters up to, let’s say, column 110

          • We open the column editor ( Alt + C )

            • We select the Number to Insert option

            • Type the value 1 in each field

            • Tick the Leading zeros box

            • Verify that the Dec format is ticked

            • Click on the OK button

            • Delete the last, otherwise empty, line 76

          => We get this text :

          Preamble                                                                                                     01
          The licenses for most software                                                                               02
          When we speak of free software,                                                                              03
          To protect your rights, we need                                                                              04
          For example, if you distribute                                                                               05
          We protect your rights with two                                                                              06
          Also, for each author's protect                                                                              07
          Finally, any free program is                                                                                 08
          The precise terms and condition                                                                              09
          TERMS AND CONDITIONS FOR COPYING                                                                             10
          0. This License applies to any                                                                               11
          Activities other copying,                                                                                    12
          1. You may copy and distribute                                                                               13
          You may charge a fee for the                                                                                 14
          2. You may modify your copy or                                                                               15
          a) You must cause the modified                                                                               16
          b) You must cause any work that                                                                              17
          c) If the modified program                                                                                   18
          These requirements apply to the                                                                              19
          Thus, it is not the intent of                                                                                20
          In addition, mere aggregation                                                                                21
          3. You may copy and distribute                                                                               22
          a) Accompany it with the                                                                                     23
          b) Accompany it with a written                                                                               24
          c) Accompany it with the                                                                                     25
          The source code for a work mean                                                                              26
          If distribution of executable                                                                                27
          4. You may not copy, modify,                                                                                 28
          5. You are not required to                                                                                   29
          6. Each time you redistribute                                                                                30
          7. If, as a consequence of a                                                                                 31
          If any portion of this section                                                                               32
          It is not the purpose of this                                                                                33
          This section is intended to make                                                                             34
          8. If the distribution and/or                                                                                35
          9. The Free Software Foundation                                                                              36
          Each version is given a                                                                                      37
          10. If you wish to incorporate                                                                               38
          NO WARRANTY                                                                                                  39
          11. BECAUSE THE PROGRAM IS                                                                                   40
          12. IN NO EVENT UNLESS REQUIRED                                                                              41
          END OF TERMS AND CONDITIONS                                                                                  42
          The licenses for most software                                                                               43
          When we speak of free software,                                                                              44
          To protect your rights, we need                                                                              45
          We protect your rights with two                                                                              46
          Also, for each author's protect                                                                              47
          Finally, any free program is                                                                                 48
          The precise terms and condition                                                                              49
          TERMS AND CONDITIONS FOR COPYING                                                                             50
          Activities other copying,                                                                                    51
          1. You may copy and distribute                                                                               52
          You may charge a fee for the                                                                                 53
          2. You may modify your copy or                                                                               54
          b) You must cause any work that                                                                              55
          c) If the modified program                                                                                   56
          These requirements apply to the                                                                              57
          Thus, it is not the intent of                                                                                58
          In addition, mere aggregation                                                                                59
          b) Accompany it with a written                                                                               60
          c) Accompany it with the                                                                                     61
          The source code for a work mean                                                                              62
          If distribution of executable                                                                                63
          4. You may not copy, modify,                                                                                 64
          5. You are not required to                                                                                   65
          6. Each time you redistribute                                                                                66
          If any portion of this section                                                                               67
          It is not the purpose of this                                                                                68
          This section is intended to make                                                                             69
          8. If the distribution and/or                                                                                70
          9. The Free Software Foundation                                                                              71
          10. If you wish to incorporate                                                                               72
          NO WARRANTY                                                                                                  73
          12. IN NO EVENT UNLESS REQUIRED                                                                              74
          END OF TERMS AND CONDITIONS                                                                                  75
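For readers who want to reproduce this numbering outside Notepad++, here is a small Python sketch of the padding and numbering step. It assumes the number starts at a fixed 1-based column (110, as above) and that the zero-padded width is the number of digits of the last line number; the function name is invented for the example.

```python
def number_lines(lines, column=110):
    """Pad each line with spaces and append a zero-padded line number.

    `column` is the 1-based column where the number starts (an assumption
    taken from the "column 110" used in the post).
    """
    width = len(str(len(lines)))  # enough digits for the largest number
    return [
        f"{line.ljust(column - 1)}{number:0{width}d}"
        for number, line in enumerate(lines, start=1)
    ]


sample = ["Preamble", "The licenses for most software"]
for row in number_lines(sample, column=40):
    print(row)
```

With 75 lines the width comes out as 2, which matches the 01 to 75 numbering shown above.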
          

          More in the next post !

          guy038

          • G
            guy038
            last edited by Dec 9, 2020, 11:10 PM

            Hello, @m-p, @peterjones, @alan-kilborn, @troshindv and All,

            Continuation of my previous post !

            • Then, we use the Edit > Line Operations > Sort Lines Lexicographically Ascending menu option, without any selection

            => The example text becomes :

            0. This License applies to any                                                                               11
            1. You may copy and distribute                                                                               13
            1. You may copy and distribute                                                                               52
            10. If you wish to incorporate                                                                               38
            10. If you wish to incorporate                                                                               72
            11. BECAUSE THE PROGRAM IS                                                                                   40
            12. IN NO EVENT UNLESS REQUIRED                                                                              41
            12. IN NO EVENT UNLESS REQUIRED                                                                              74
            2. You may modify your copy or                                                                               15
            2. You may modify your copy or                                                                               54
            3. You may copy and distribute                                                                               22
            4. You may not copy, modify,                                                                                 28
            4. You may not copy, modify,                                                                                 64
            5. You are not required to                                                                                   29
            5. You are not required to                                                                                   65
            6. Each time you redistribute                                                                                30
            6. Each time you redistribute                                                                                66
            7. If, as a consequence of a                                                                                 31
            8. If the distribution and/or                                                                                35
            8. If the distribution and/or                                                                                70
            9. The Free Software Foundation                                                                              36
            9. The Free Software Foundation                                                                              71
            Activities other copying,                                                                                    12
            Activities other copying,                                                                                    51
            Also, for each author's protect                                                                              07
            Also, for each author's protect                                                                              47
            END OF TERMS AND CONDITIONS                                                                                  42
            END OF TERMS AND CONDITIONS                                                                                  75
            Each version is given a                                                                                      37
            Finally, any free program is                                                                                 08
            Finally, any free program is                                                                                 48
            For example, if you distribute                                                                               05
            If any portion of this section                                                                               32
            If any portion of this section                                                                               67
            If distribution of executable                                                                                27
            If distribution of executable                                                                                63
            In addition, mere aggregation                                                                                21
            In addition, mere aggregation                                                                                59
            It is not the purpose of this                                                                                33
            It is not the purpose of this                                                                                68
            NO WARRANTY                                                                                                  39
            NO WARRANTY                                                                                                  73
            Preamble                                                                                                     01
            TERMS AND CONDITIONS FOR COPYING                                                                             10
            TERMS AND CONDITIONS FOR COPYING                                                                             50
            The licenses for most software                                                                               02
            The licenses for most software                                                                               43
            The precise terms and condition                                                                              09
            The precise terms and condition                                                                              49
            The source code for a work mean                                                                              26
            The source code for a work mean                                                                              62
            These requirements apply to the                                                                              19
            These requirements apply to the                                                                              57
            This section is intended to make                                                                             34
            This section is intended to make                                                                             69
            Thus, it is not the intent of                                                                                20
            Thus, it is not the intent of                                                                                58
            To protect your rights, we need                                                                              04
            To protect your rights, we need                                                                              45
            We protect your rights with two                                                                              06
            We protect your rights with two                                                                              46
            When we speak of free software,                                                                              03
            When we speak of free software,                                                                              44
            You may charge a fee for the                                                                                 14
            You may charge a fee for the                                                                                 53
            a) Accompany it with the                                                                                     23
            a) You must cause the modified                                                                               16
            b) Accompany it with a written                                                                               24
            b) Accompany it with a written                                                                               60
            b) You must cause any work that                                                                              17
            b) You must cause any work that                                                                              55
            c) Accompany it with the                                                                                     25
            c) Accompany it with the                                                                                     61
            c) If the modified program                                                                                   18
            c) If the modified program                                                                                   56
            
            • Now, we open the Replace dialog ( Ctrl + H )

              • SEARCH ^(.+)(\x20+\d+\R)(\1(?2))+

              • REPLACE Leave EMPTY

              • Tick the Wrap around option

              • Select the Regular expression search mode

              • Click on the Replace All button

            => You should get the status message 33 occurrences were replaced, leading to this text :

            0. This License applies to any                                                                               11
            11. BECAUSE THE PROGRAM IS                                                                                   40
            3. You may copy and distribute                                                                               22
            7. If, as a consequence of a                                                                                 31
            Each version is given a                                                                                      37
            For example, if you distribute                                                                               05
            Preamble                                                                                                     01
            a) Accompany it with the                                                                                     23
            a) You must cause the modified                                                                               16
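The effect of this search/replace can be tried on a small sample in Python. Python's `re` module has no `\R` and no `(?2)` subroutine call, so the sketch below adapts the SEARCH pattern instead of copying it: it assumes lines end with `\n`, and it captures the text up to its last non-space character. The names `DUPLICATE_RUN` and `drop_duplicate_runs` are made up for the example.

```python
import re

# A line "text<spaces>number", immediately followed by one or more lines
# with the same text and any number.
DUPLICATE_RUN = re.compile(r"^(.*\S)\x20+\d+\n(?:\1\x20+\d+\n)+", re.MULTILINE)


def drop_duplicate_runs(text):
    """Delete every run of consecutive lines that share the same text."""
    return DUPLICATE_RUN.sub("", text)


sample = (
    "apple   01\n"
    "apple   03\n"
    "banana  02\n"
    "cherry  04\n"
    "cherry  05\n"
)
print(drop_duplicate_runs(sample), end="")  # banana  02
```

As in the Notepad++ version, the whole run is removed, not just the extra copies, so only lines that had a single copy survive.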
            
            • Although it would be possible to use a column-mode selection to sort the lines by the number at the end of each line, I’m not sure it would work properly with a huge list. So I prefer a safer method and perform another regex S/R:

              • SEARCH ^(.+?)\x20{2,}(\d+)

              • REPLACE \2\t\t\1

            And we end with :

            11		0. This License applies to any
            40		11. BECAUSE THE PROGRAM IS
            22		3. You may copy and distribute
            31		7. If, as a consequence of a
            37		Each version is given a
            05		For example, if you distribute
            01		Preamble
            23		a) Accompany it with the
            16		a) You must cause the modified
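A Python equivalent of this second search/replace, as a sketch for experimenting on a small sample (the function name is hypothetical; the pattern is the SEARCH regex above with an added `$` anchor):

```python
import re


def move_number_to_front(text):
    """Turn 'text<2+ spaces>number' into 'number<TAB><TAB>text' on every line."""
    return re.sub(
        r"^(.+?)\x20{2,}(\d+)$",
        lambda m: f"{m.group(2)}\t\t{m.group(1)}",
        text,
        flags=re.MULTILINE,
    )


print(move_number_to_front("Preamble      01\nNO WARRANTY   39"))
```

A function is used for the replacement here only to keep the tab characters explicit.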
            
            • Again, we use the Edit > Line Operations > Sort Lines Lexicographically Ascending menu option, to restore the initial file order, giving :
            01		Preamble
            05		For example, if you distribute
            11		0. This License applies to any
            16		a) You must cause the modified
            22		3. You may copy and distribute
            23		a) Accompany it with the
            31		7. If, as a consequence of a
            37		Each version is given a
            40		11. BECAUSE THE PROGRAM IS
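This lexicographic sort restores the original order only because the numbers were inserted with leading zeros. A short Python check of that point, using a few of the numbers above:

```python
unpadded = ["1", "5", "11", "16", "22", "40"]
padded = [n.zfill(2) for n in unpadded]

# Sorting as text: without padding, "5" lands after "40".
print(sorted(unpadded))  # ['1', '11', '16', '22', '40', '5']
print(sorted(padded))    # ['01', '05', '11', '16', '22', '40']
```

For a file of several million lines the same reasoning applies, just with more digits.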
            

            And, finally, we perform a last regex S/R, below, to get rid of the temporary numbering !

            • SEARCH ^\d+\t+

            • REPLACE Leave EMPTY

            => Our expected text, with the 9 unique lines :

            Preamble
            For example, if you distribute
            0. This License applies to any
            a) You must cause the modified
            3. You may copy and distribute
            a) Accompany it with the
            7. If, as a consequence of a
            Each version is given a
            11. BECAUSE THE PROGRAM IS
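The last search/replace maps directly onto Python's `re.sub`, shown here as a small sketch with a made-up function name:

```python
import re


def strip_numbering(text):
    """Remove the leading number and the tab(s) after it from every line."""
    return re.sub(r"^\d+\t+", "", text, flags=re.MULTILINE)


print(strip_numbering("01\t\tPreamble\n11\t\t0. This License applies to any"))
```

Only digits that are followed by a tab are removed, so a line whose own text begins with a number, such as `0. This License applies to any`, keeps that number.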
            

            Best Regards,

            guy038

            • A
              Alan Kilborn @guy038
              last edited by Dec 10, 2020, 12:50 PM

              @guy038

              Perhaps that regex-intensive solution becomes the de facto standard way of solving this problem.

              But it might be nice to see, in Notepad++ itself, a command to “Remove lines from primary view tab that occur in secondary view tab”, or some such less-wordy verbiage.
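As a sketch of what such a command would compute (no such Notepad++ command exists; the function name and the two-list interface are assumptions for illustration), in Python:

```python
def remove_lines_present_in(primary, secondary):
    """Return the lines of `primary` that do not occur in `secondary`,
    keeping their original order."""
    unwanted = set(secondary)  # set membership tests are fast on average
    return [line for line in primary if line not in unwanted]


print(remove_lines_present_in(["1", "2", "3", "2", "4"], ["2", "9"]))  # ['1', '3', '4']
```

Unlike the sort-and-regex method, this keeps lines that are duplicated only inside the primary list.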

              • T
                TroshinDV @Alan Kilborn
                last edited by Dec 11, 2020, 8:11 AM

                @Alan-Kilborn said in Deleting numbers from LIST 1, that also appear in LIST 2:

                Perhaps that regex-intensive solution becomes the de facto standard way of solving this problem.

                Will be for small volumes.
                The volume of data dictates its own terms.
                PS. It is better to wrap all actions in a macro.

                • A
                  Alan Kilborn @TroshinDV
                  last edited by Dec 11, 2020, 1:09 PM

                  @TroshinDV said in Deleting numbers from LIST 1, that also appear in LIST 2:

                  Will be for small volumes.
                  The volume of data dictates its own terms.

                  That doesn’t make sense as the solution crafted by @guy038 was specifically considering large “volumes”.

                  PS. It is better to wrap all actions in a macro.

                  I don’t believe @guy038’s solution can be made into a macro; can you explain how you think it can be?

                  • M
                    M P
                    last edited by Dec 11, 2020, 4:06 PM

                    Well, there were definitely some problems. LIST 1 has 7.4 million numbers whilst LIST 2 has 1.2 million numbers. I believe that this is definitely too much to deal with. I’ll try @guy038’s method now, even though I’m not sure I understood it all. Let’s try it at least.

                    • G
                      guy038
                      last edited by guy038 Dec 11, 2020, 5:16 PM Dec 11, 2020, 5:12 PM

                      Hello, @m-p, @peterjones, @alan-kilborn, @troshindv and All,

                      Oh…! Indeed, dealing with two files of 7,400,000 and 1,200,000 lines is not an easy task! So you will have to work with an 8,600,000-line file: good luck!

                      Do not hesitate to ask me for more information if you encounter difficulties in implementing my method !

                      • First, I would advise you to repeat my own tiny example, to get the general idea

                      • Regarding your real example, I would say that :

                        • The N++ sort feature is very quick, in all cases

                        • I suppose that the numbering operation, with the column editor, should not take very long either!

                        • The first of the three regex S/R will probably take some time. Just be patient: it should work in the end!

                      Best Regards

                      guy038

                      • T Terry R referenced this topic on Mar 25, 2023, 2:08 AM