    Use Text File to Remove Lines?

    • mrmagnum8841M
      mrmagnum8841
      last edited by mrmagnum8841

      Hi. This is sorta hard to explain, but here goes.
      I have a big text file filled with a lot of ID’s. Is it possible to use the text file to remove lines?
      In other words is it possible to use it like a database of lines to remove?

      • PeterJonesP
        PeterJones @mrmagnum8841
        last edited by PeterJones

        @mrmagnum8841 said in Use Text File to Remove Lines?:

        In other words is it possible to use it like a database of lines to remove?

        Kind of. I am going to assume you have two files – the huge file that has lines you want to remove (database.txt), and another smaller file with a list of IDs that will indicate which lines to delete (list.txt).

        Most important, back up both files – I recommend working from a copy rather than the original.

        example list.txt:

        ID005
        ID007
        ID011
        ID013
        

        example database.txt:

        This has many things, ID001, yada
        This has many things, ID002, yada
        This has many things, ID003, yada
        This has many things, ID004, yada
        This has many things, ID005, yada
        This has many things, ID006, yada
        This has many things, ID007, yada
        This has many things, ID008, yada
        This has many things, ID009, yada
        This has many things, ID010, yada
        This has many things, ID011, yada
        This has many things, ID012, yada
        This has many things, ID013, yada
        This has many things, ID014, yada
        This has many things, ID015, yada
        This has many things, ID016, yada
        

        desired result (5, 7, 11, 13 deleted):

        This has many things, ID001, yada
        This has many things, ID002, yada
        This has many things, ID003, yada
        This has many things, ID004, yada
        This has many things, ID006, yada
        This has many things, ID008, yada
        This has many things, ID009, yada
        This has many things, ID010, yada
        This has many things, ID012, yada
        This has many things, ID014, yada
        This has many things, ID015, yada
        This has many things, ID016, yada
        
        1. In list.txt,
          1. Ctrl+A, Ctrl+J: this joins everything into one long line, space-separated
          2. Search > Replace (or Ctrl+H): goal is to make the list |-separated
            • FIND = \h+
            • REPLACE = |
            • MODE = regular expression
            • Replace All
          3. Another Replacement: goal is to get the lines down to less than 1000 characters each (assuming no ID is greater than 50 characters)
            • FIND = (?-s)^(.{950,}?)\|
            • REPLACE = $1\r\n
            • MODE = regular expression
            • Replace All
            • You don’t need this step 3 if your line is less than 1000 characters after step 2.
          4. Next replacement: goal is to make each line look like (ID#|ID#|...|ID#), with a bit of stuff at the beginning and end
            • FIND = (?-s)^.*$
            • REPLACE = \(?-s\)^.*\($0\).*\\R?
            • MODE = regular expression
            • Replace All
        2. Now, for each line in list.txt:
          1. Copy the line from list.txt
          2. Switch to database.txt window
          3. Search > Replace (or Ctrl+H)
            • paste the line into the FIND box, so it looks like FIND = (?-s)^.*(ID#|ID#|...|ID#).*\R?
            • make the REPLACE box empty
            • MODE = regular expression
            • Replace All
          4. Repeat as necessary for each line from list.txt
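
For anyone who would rather script this than click through the Replace dialog, the same idea can be approximated outside Notepad++. What follows is only a minimal Python sketch, not part of the procedure above: the function names `build_patterns` and `remove_listed_lines` are made up for illustration, the 950-character chunk size simply mirrors step 1.3, and each ID is treated as literal text via `re.escape`.

```python
import re

def build_patterns(ids, max_chars=950):
    """Group the IDs into alternations of at most about max_chars characters each."""
    chunks, current, size = [], [], 0
    for ident in ids:
        ident = ident.strip()
        if not ident:                      # skip blank lines from list.txt
            continue
        escaped = re.escape(ident)         # treat the ID as literal text
        if current and size + len(escaped) + 1 > max_chars:
            chunks.append(current)
            current, size = [], 0
        current.append(escaped)
        size += len(escaped) + 1           # +1 for the "|" separator
    if current:
        chunks.append(current)
    # Same shape as the FIND expression above: ^.*(ID#|ID#|...|ID#).*\R?
    return [
        re.compile(r"^.*(?:" + "|".join(chunk) + r").*(?:\r?\n)?", re.MULTILINE)
        for chunk in chunks
    ]

def remove_listed_lines(database_text, ids, max_chars=950):
    """Delete every line of database_text that contains one of the IDs."""
    for pattern in build_patterns(ids, max_chars):
        database_text = pattern.sub("", database_text)
    return database_text
```

Given the text of database.txt and the lines of list.txt, `remove_listed_lines` returns the filtered text. The chunking is kept only to mirror the manual steps.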

        -----

        caveat emptor

        This sequence seemed to work for me, based on my understanding of your issue, and is published here to help you learn how to do this. I make no guarantees or warranties as to the functionality for you. You are responsible for saving and backing up all data before and after running this sequence. If you want to use it long term, I recommend investing time in adding error checking and verifying with edge cases.

        • mrmagnum8841M
          mrmagnum8841
          last edited by

          Thank you. However I’ve already seen something similar and the only real issue with it is the length.
          As of right now, I have over 23738 lines with 32 characters being the length of each line and more and more get added.

          • guy038G
            guy038
            last edited by guy038

            Hello, @mrmagnum8841, @peterjones and All,

            Hi Peter, we have already seen this type of request, many times, on our forum !

            So @mrmagnum8841, here is the road map :

            • Open a N++ new tab

            • First, paste the contents of the database.txt file in that new tab

            • Secondly, add a line containing, at least, 3 equal signs ( === )

            • Thirdly, append the contents of the list.txt file

            • Now open the Replace dialog ( Ctrl + H )

              • SEARCH (?s-i)^(?-s:.*(\w+).*\R)(?=.*^===+.+?^\1$)|^===.+

              • REPLACE Leave EMPTY

              • Tick the Wrap around option, if necessary

              • Select the Regular expression search mode

              • Click on the Replace All button

            Voila, that’s all !

            Notes :

            • Overall, this regex checks whether a word of the current line also appears, with the same case, as a complete line in the 2nd part of the file, after the line of equal signs ===== ( the nearest such line is the one matched )

            • If so, all the current line contents, with its line-break, are selected and, as the replacement zone is empty, this line is just deleted
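
The logic described in these notes can also be sketched without any lookahead. The Python below is only an illustration under some assumptions: the name `filter_combined` is hypothetical, the separator is taken to be a line made only of 3 or more equal signs, and whole words are compared, which is slightly stricter than the `(\w+)` group in the regex since that group may also match part of a word.

```python
import re

def filter_combined(buffer):
    """Split the buffer at the first ===-style line, then drop every data line
    that contains a word which is also a complete line of the second part."""
    lines = buffer.splitlines()
    sep = next(
        (i for i, line in enumerate(lines) if re.fullmatch(r"={3,}", line)),
        None,
    )
    if sep is None:
        raise ValueError("no separator line of 3 or more equal signs found")
    data, listed = lines[:sep], set(lines[sep + 1:])
    kept = [
        line for line in data
        if not any(word in listed for word in re.findall(r"\w+", line))
    ]
    # The separator and the list itself are dropped, as with the |^===.+ part
    return "".join(line + "\n" for line in kept)
```

Because the listed lines go into a set, the amount of data after the separator does not affect how each line is tested.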

            Best Regards,

            guy038

            • PeterJonesP
              PeterJones
              last edited by

              @guy038 said,

              Hi Peter, we have already seen this type of request,

              I know. I just couldn’t quickly find any of them to link. Unfortunately, you have so many excellent regex posts on this forum, but I could never bookmark enough of them in an organized fashion to be able to always find the one I am thinking of for any given future reply. :-) I tried my best to recreate it from memory (not knowing if you were going to be answering over the weekend or not), but I forgot your trick for doing it in the same file rather than in 1000-character chunks. I guess I should have just waited for you to reply. ;-)

              @mrmagnum8841 ,

              As of right now, I have over 23738 lines with 32 characters being the length of each line and more and more get added.

              You’ve now told us you have 23738 lines of data, which is helpful information; but you haven’t told us how many IDs there are to delete from those 23738 lines (is it a few IDs, a few hundred IDs, a few thousand IDs, more than half the IDs)?

              Using the nomenclature I did in my first reply: with my solution, it shouldn’t be a problem if database.txt has 23k lines; with my solution, it became an issue if list.txt had more than ~1000 characters in the list of IDs… which is why my procedure grouped it into multiple groups of IDs if there were too many characters in your list.txt.

              That’s one of the benefits of @guy038’s solution: his solution can have as many IDs as you want to delete from the main database.txt, and it will still work – Unfortunately, historically, if there are too many characters in the lookahead expression, Notepad++ occasionally gives up and just selects everything, which would have the unfortunate side effect of deleting everything.

              If you try @guy038’s solution, and if it deletes everything or otherwise deletes too much, let us know, and we will try to help you through. But if that happens, it will help us to help you if you could give us a better representation of what you have, such as whether you really have two files. You are going to have to stop making us guess how your data is structured. I assumed two files: one as a database.txt that had the main data you wanted to process, and a second list.txt that just listed the IDs you wanted to delete from database.txt. But that was just an assumption, which you have neither confirmed nor denied. So if you need more help from us, you will need to provide more information, including dummy data. You can use the </> button on the post toolbar to format data as text (like I did in my original reply); give us an excerpt (not the full 23000 lines) of your data – if there is sensitive/secret information, just make up a handful of lines of dummy data that looks similar but with fake names, numbers, etc; and give us an example of the IDs you’d like to delete from your dummy data.

              • guy038G
                guy038
                last edited by guy038

                Hello, @mrmagnum8841, @peterjones and All,

                I re-tested my regex S/R with a substantial number of lines and I must say that this S/R, proposed in my previous post, failed miserably, even with very little data :-(((

                Assuming that each line of the database.txt file contains 32 characters, it stops working beyond about 160 lines :-(( A pity !

                Even this modified S/R, where I use delimiters to better catch the identifier ID### :

                SEARCH (?s-i)^(?-s:.*,\x20(\w+),.*\R)(?=.*^===+.+?^\1$)|^===.+

                With ,\x20 as the start delimiter and , as an end delimiter, can support about 2,200 lines but not more :-(

                I also used this other version without the line delimiter ======= :

                SEARCH (?-si)^.*,\x20(\w+),.*\R(?=(?s).+?^\1$)|^(ID...\R)+

                But, though the regex seems simpler, the result is worse as it can only handle about 1,850 lines !

                And, anyway, all of these regex S/R end up selecting all the file contents, which is, obviously, not the desired goal !


                Finally, it seems that @peterjones’s solution is the more efficient one ! The only drawback of his method is when the list.txt file contains too many identifiers OR when a lot of these identifiers do not exist in the database.txt file ! In this latter case, this leads to a resulting regex (?-s)^.*(ID#|ID#|...|ID#).*\R? containing too many useless ID# alternatives !

                So, here is my new attempt :

                • It should support large database.txt and list.txt files alike

                • The list.txt file may list either the identifiers whose lines have to be deleted from the database.txt file or, on the contrary, the identifiers whose lines have to be retained !

                • Of course, it does not alter the initial order of lines of the database.txt file

                • It minimizes the number of alternatives of the final regex, created in order to process the database.txt contents

                Here is the road map, assuming the following examples of :

                • The initial list.txt file ( 12 lines, not sorted ) :
                ID011
                ID000
                ID005
                ID037
                ID008
                ID013
                ID024
                ID043
                ID003
                ID026
                ID028
                ID016
                
                • The initial database.txt file ( 50 lines, of 32 chars, not sorted ) :
                This a simple test, ID007, ABCDE
                This a simple test, ID024, ABCDE
                This a simple test, ID011, ABCDE
                This a simple test, ID001, ABCDE
                This a simple test, ID002, ABCDE
                This a simple test, ID004, ABCDE
                This a simple test, ID005, ABCDE
                This a simple test, ID006, ABCDE
                This a simple test, ID007, ABCDE
                This a simple test, ID007, ABCDE
                This a simple test, ID024, ABCDE
                This a simple test, ID009, ABCDE
                This a simple test, ID010, ABCDE
                This a simple test, ID024, ABCDE
                This a simple test, ID011, ABCDE
                This a simple test, ID007, ABCDE
                This a simple test, ID011, ABCDE
                This a simple test, ID003, ABCDE
                This a simple test, ID012, ABCDE
                This a simple test, ID013, ABCDE
                This a simple test, ID014, ABCDE
                This a simple test, ID017, ABCDE
                This a simple test, ID011, ABCDE
                This a simple test, ID018, ABCDE
                This a simple test, ID019, ABCDE
                This a simple test, ID020, ABCDE
                This a simple test, ID021, ABCDE
                This a simple test, ID022, ABCDE
                This a simple test, ID023, ABCDE
                This a simple test, ID012, ABCDE
                This a simple test, ID007, ABCDE
                This a simple test, ID023, ABCDE
                This a simple test, ID024, ABCDE
                This a simple test, ID024, ABCDE
                This a simple test, ID011, ABCDE
                This a simple test, ID025, ABCDE
                This a simple test, ID024, ABCDE
                This a simple test, ID027, ABCDE
                This a simple test, ID023, ABCDE
                This a simple test, ID023, ABCDE
                This a simple test, ID028, ABCDE
                This a simple test, ID029, ABCDE
                This a simple test, ID012, ABCDE
                This a simple test, ID029, ABCDE
                This a simple test, ID029, ABCDE
                This a simple test, ID029, ABCDE
                This a simple test, ID030, ABCDE
                This a simple test, ID007, ABCDE
                This a simple test, ID003, ABCDE
                This a simple test, ID024, ABCDE
                

                • First, copy the database.txt contents as, let’s say, the file dummy.txt

                • Now, open the dummy.txt file

                • Perform the regex S/R :

                  • SEARCH (?-s)^.*(ID\d\d\d).*\R

                  • REPLACE \1\t$0

                => We copy the identifier at beginning of all lines, followed with a tab separator and get :

                ID007	This a simple test, ID007, ABCDE
                ID024	This a simple test, ID024, ABCDE
                ID011	This a simple test, ID011, ABCDE
                ID001	This a simple test, ID001, ABCDE
                ID002	This a simple test, ID002, ABCDE
                ID004	This a simple test, ID004, ABCDE
                ID005	This a simple test, ID005, ABCDE
                ID006	This a simple test, ID006, ABCDE
                ID007	This a simple test, ID007, ABCDE
                ID007	This a simple test, ID007, ABCDE
                ID024	This a simple test, ID024, ABCDE
                ID009	This a simple test, ID009, ABCDE
                ID010	This a simple test, ID010, ABCDE
                ID024	This a simple test, ID024, ABCDE
                ID011	This a simple test, ID011, ABCDE
                ID007	This a simple test, ID007, ABCDE
                ID011	This a simple test, ID011, ABCDE
                ID003	This a simple test, ID003, ABCDE
                ID012	This a simple test, ID012, ABCDE
                ID013	This a simple test, ID013, ABCDE
                ID014	This a simple test, ID014, ABCDE
                ID017	This a simple test, ID017, ABCDE
                ID011	This a simple test, ID011, ABCDE
                ID018	This a simple test, ID018, ABCDE
                ID019	This a simple test, ID019, ABCDE
                ID020	This a simple test, ID020, ABCDE
                ID021	This a simple test, ID021, ABCDE
                ID022	This a simple test, ID022, ABCDE
                ID023	This a simple test, ID023, ABCDE
                ID012	This a simple test, ID012, ABCDE
                ID007	This a simple test, ID007, ABCDE
                ID023	This a simple test, ID023, ABCDE
                ID024	This a simple test, ID024, ABCDE
                ID024	This a simple test, ID024, ABCDE
                ID011	This a simple test, ID011, ABCDE
                ID025	This a simple test, ID025, ABCDE
                ID024	This a simple test, ID024, ABCDE
                ID027	This a simple test, ID027, ABCDE
                ID023	This a simple test, ID023, ABCDE
                ID023	This a simple test, ID023, ABCDE
                ID028	This a simple test, ID028, ABCDE
                ID029	This a simple test, ID029, ABCDE
                ID012	This a simple test, ID012, ABCDE
                ID029	This a simple test, ID029, ABCDE
                ID029	This a simple test, ID029, ABCDE
                ID029	This a simple test, ID029, ABCDE
                ID030	This a simple test, ID030, ABCDE
                ID007	This a simple test, ID007, ABCDE
                ID003	This a simple test, ID003, ABCDE
                ID024	This a simple test, ID024, ABCDE
                
                • Append the list.txt contents, at the end of the dummy.txt file, giving these 62 lines, below :
                ID007	This a simple test, ID007, ABCDE
                ID024	This a simple test, ID024, ABCDE
                ID011	This a simple test, ID011, ABCDE
                ID001	This a simple test, ID001, ABCDE
                ID002	This a simple test, ID002, ABCDE
                ID004	This a simple test, ID004, ABCDE
                ID005	This a simple test, ID005, ABCDE
                ID006	This a simple test, ID006, ABCDE
                ID007	This a simple test, ID007, ABCDE
                ID007	This a simple test, ID007, ABCDE
                ID024	This a simple test, ID024, ABCDE
                ID009	This a simple test, ID009, ABCDE
                ID010	This a simple test, ID010, ABCDE
                ID024	This a simple test, ID024, ABCDE
                ID011	This a simple test, ID011, ABCDE
                ID007	This a simple test, ID007, ABCDE
                ID011	This a simple test, ID011, ABCDE
                ID003	This a simple test, ID003, ABCDE
                ID012	This a simple test, ID012, ABCDE
                ID013	This a simple test, ID013, ABCDE
                ID014	This a simple test, ID014, ABCDE
                ID017	This a simple test, ID017, ABCDE
                ID011	This a simple test, ID011, ABCDE
                ID018	This a simple test, ID018, ABCDE
                ID019	This a simple test, ID019, ABCDE
                ID020	This a simple test, ID020, ABCDE
                ID021	This a simple test, ID021, ABCDE
                ID022	This a simple test, ID022, ABCDE
                ID023	This a simple test, ID023, ABCDE
                ID012	This a simple test, ID012, ABCDE
                ID007	This a simple test, ID007, ABCDE
                ID023	This a simple test, ID023, ABCDE
                ID024	This a simple test, ID024, ABCDE
                ID024	This a simple test, ID024, ABCDE
                ID011	This a simple test, ID011, ABCDE
                ID025	This a simple test, ID025, ABCDE
                ID024	This a simple test, ID024, ABCDE
                ID027	This a simple test, ID027, ABCDE
                ID023	This a simple test, ID023, ABCDE
                ID023	This a simple test, ID023, ABCDE
                ID028	This a simple test, ID028, ABCDE
                ID029	This a simple test, ID029, ABCDE
                ID012	This a simple test, ID012, ABCDE
                ID029	This a simple test, ID029, ABCDE
                ID029	This a simple test, ID029, ABCDE
                ID029	This a simple test, ID029, ABCDE
                ID030	This a simple test, ID030, ABCDE
                ID007	This a simple test, ID007, ABCDE
                ID003	This a simple test, ID003, ABCDE
                ID024	This a simple test, ID024, ABCDE
                ID011
                ID000
                ID005
                ID037
                ID008
                ID013
                ID024
                ID043
                ID003
                ID026
                ID028
                ID016
                
                • Select the option Edit > Line operations > Sort Lines Lexicographically Descending

                We get the following text :

                ID043
                ID037
                ID030	This a simple test, ID030, ABCDE
                ID029	This a simple test, ID029, ABCDE
                ID029	This a simple test, ID029, ABCDE
                ID029	This a simple test, ID029, ABCDE
                ID029	This a simple test, ID029, ABCDE
                ID028	This a simple test, ID028, ABCDE
                ID028
                ID027	This a simple test, ID027, ABCDE
                ID026
                ID025	This a simple test, ID025, ABCDE
                ID024	This a simple test, ID024, ABCDE
                ID024	This a simple test, ID024, ABCDE
                ID024	This a simple test, ID024, ABCDE
                ID024	This a simple test, ID024, ABCDE
                ID024	This a simple test, ID024, ABCDE
                ID024	This a simple test, ID024, ABCDE
                ID024	This a simple test, ID024, ABCDE
                ID024
                ID023	This a simple test, ID023, ABCDE
                ID023	This a simple test, ID023, ABCDE
                ID023	This a simple test, ID023, ABCDE
                ID023	This a simple test, ID023, ABCDE
                ID022	This a simple test, ID022, ABCDE
                ID021	This a simple test, ID021, ABCDE
                ID020	This a simple test, ID020, ABCDE
                ID019	This a simple test, ID019, ABCDE
                ID018	This a simple test, ID018, ABCDE
                ID017	This a simple test, ID017, ABCDE
                ID016
                ID014	This a simple test, ID014, ABCDE
                ID013	This a simple test, ID013, ABCDE
                ID013
                ID012	This a simple test, ID012, ABCDE
                ID012	This a simple test, ID012, ABCDE
                ID012	This a simple test, ID012, ABCDE
                ID011	This a simple test, ID011, ABCDE
                ID011	This a simple test, ID011, ABCDE
                ID011	This a simple test, ID011, ABCDE
                ID011	This a simple test, ID011, ABCDE
                ID011	This a simple test, ID011, ABCDE
                ID011
                ID010	This a simple test, ID010, ABCDE
                ID009	This a simple test, ID009, ABCDE
                ID008
                ID007	This a simple test, ID007, ABCDE
                ID007	This a simple test, ID007, ABCDE
                ID007	This a simple test, ID007, ABCDE
                ID007	This a simple test, ID007, ABCDE
                ID007	This a simple test, ID007, ABCDE
                ID007	This a simple test, ID007, ABCDE
                ID006	This a simple test, ID006, ABCDE
                ID005	This a simple test, ID005, ABCDE
                ID005
                ID004	This a simple test, ID004, ABCDE
                ID003	This a simple test, ID003, ABCDE
                ID003	This a simple test, ID003, ABCDE
                ID003
                ID002	This a simple test, ID002, ABCDE
                ID001	This a simple test, ID001, ABCDE
                ID000
                
                • Now, perform the regex S/R :

                  • SEARCH (?-s)ID(.{3}).+\RID\1\R|.+\R

                  • REPLACE ?1|\1

                => You should always obtain a single line, like below :

                |028|024|013|011|005|003
                

                Remark : This line should not exceed 2,010 characters. This should generally be the case, as we collect only the identifiers present in the database.txt file. I also omitted the common part ID to get a smaller expression !

                • At this penultimate step, we’ll use a regex S/R to … create a new search regex ! So :

                  • SEARCH ^\|(.+)

                  • REPLACE \(?-s\)^\(?=.*ID\(\1\)\).+\\R

                which gives us the regex :

                (?-s)^(?=.*ID(028|024|013|011|005|003)).+\R
                
                • Save the one-line file dummy.txt

                • Finally, open your database.txt file

                • And, here is the final regex S/R to perform. Two cases :

                  • (A)   If the list.txt file contains all the identifiers to be deleted, in database.txt, use the new search regex :

                    • SEARCH (?-s)^(?=.*ID(028|024|013|011|005|003)).+\R

                    • REPLACE Leave EMPTY

                  • (B)   If the list.txt file contains the identifiers which must be only retained, in database.txt, add the part |^.+\R at end and modify the replacement part :

                    • SEARCH (?-s)^(?=.*ID(028|024|013|011|005|003)).+\R|^.+\R

                    • REPLACE ?1$0


                • With the regex S/R (A), we get a final database.txt file of 33 lines, below :
                This a simple test, ID007, ABCDE
                This a simple test, ID001, ABCDE
                This a simple test, ID002, ABCDE
                This a simple test, ID004, ABCDE
                This a simple test, ID006, ABCDE
                This a simple test, ID007, ABCDE
                This a simple test, ID007, ABCDE
                This a simple test, ID009, ABCDE
                This a simple test, ID010, ABCDE
                This a simple test, ID007, ABCDE
                This a simple test, ID012, ABCDE
                This a simple test, ID014, ABCDE
                This a simple test, ID017, ABCDE
                This a simple test, ID018, ABCDE
                This a simple test, ID019, ABCDE
                This a simple test, ID020, ABCDE
                This a simple test, ID021, ABCDE
                This a simple test, ID022, ABCDE
                This a simple test, ID023, ABCDE
                This a simple test, ID012, ABCDE
                This a simple test, ID007, ABCDE
                This a simple test, ID023, ABCDE
                This a simple test, ID025, ABCDE
                This a simple test, ID027, ABCDE
                This a simple test, ID023, ABCDE
                This a simple test, ID023, ABCDE
                This a simple test, ID029, ABCDE
                This a simple test, ID012, ABCDE
                This a simple test, ID029, ABCDE
                This a simple test, ID029, ABCDE
                This a simple test, ID029, ABCDE
                This a simple test, ID030, ABCDE
                This a simple test, ID007, ABCDE
                
                • With the regex S/R (B), we get a final database.txt file of 17 lines, below :
                This a simple test, ID024, ABCDE
                This a simple test, ID011, ABCDE
                This a simple test, ID005, ABCDE
                This a simple test, ID024, ABCDE
                This a simple test, ID024, ABCDE
                This a simple test, ID011, ABCDE
                This a simple test, ID011, ABCDE
                This a simple test, ID003, ABCDE
                This a simple test, ID013, ABCDE
                This a simple test, ID011, ABCDE
                This a simple test, ID024, ABCDE
                This a simple test, ID024, ABCDE
                This a simple test, ID011, ABCDE
                This a simple test, ID024, ABCDE
                This a simple test, ID028, ABCDE
                This a simple test, ID003, ABCDE
                This a simple test, ID024, ABCDE
                

                Note that, if we had followed the @peterjones’s method, we would have ended up with this search regex, a bit longer :

                (?-s)^.*(ID011|ID000|ID005|ID037|ID008|ID013|ID024|ID043|ID003|ID026|ID028|ID016).*\R?

                Of course, here, there is no notable difference but, depending on the list.txt contents, it could be of some importance !
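
For comparison, the road map above can be approximated in a few lines of script. This is a hedged Python sketch rather than the Notepad++ procedure itself: `shared_ids` and `filter_database` are hypothetical names, a set intersection stands in for the tag, sort and collapse steps, and identifiers are assumed to look like ID### as in the sample data. Unlike the tagging regex, which captures only the last ID of a line, this collects every ID found.

```python
import re

# Assumption: identifiers look like ID### (three digits), as in the sample data.
ID_RE = re.compile(r"ID(\d{3})")

def shared_ids(database_text, list_text):
    """Numeric parts of the IDs present in both texts, in descending order."""
    in_database = {m.group(1) for m in ID_RE.finditer(database_text)}
    in_list = {m.group(1) for m in ID_RE.finditer(list_text)}
    return sorted(in_database & in_list, reverse=True)

def filter_database(database_text, list_text, keep=False):
    """Case (A), keep=False: delete the lines containing a listed ID.
    Case (B), keep=True: retain only those lines.
    Every line, including the last, is expected to end with a line-break."""
    ids = shared_ids(database_text, list_text)
    if not ids:
        return "" if keep else database_text
    # Same shape as the generated regex: ^(?=.*ID(028|024|...)).+\R
    search = r"^(?=.*ID(" + "|".join(ids) + r")).+\n"
    if not keep:
        return re.sub(search, "", database_text, flags=re.MULTILINE)
    # Counterpart of the ?1$0 replacement: rewrite the match only if group 1 is set
    return re.sub(
        search + r"|^.+\n",
        lambda m: m.group(0) if m.group(1) else "",
        database_text,
        flags=re.MULTILINE,
    )
```

With the sample files above, `shared_ids` yields the same six alternatives as the generated regex, and the two modes of `filter_database` correspond to cases (A) and (B).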

                Best Regards,

                guy038

                • PeterJonesP
                  PeterJones @guy038
                  last edited by

                  @guy038 ,

                  Great effort, hopefully not wasted.

                  Unfortunately, @mrmagnum8841 has never come back and answered whether my interpretation of the problem is in any way accurate.

                  I am the one who introduced the database.txt and list.txt idea (as the most reasonable way I could come up with of interpreting what was asked for, but never explicitly stated). And I am the one who used “ID###” – @mrmagnum8841 just said “big text file filled with a lot of ID’s”. There may not be anything so easy to capture as the “ID” prefix before each ID. It may be that each ID is really a UUID, or it may be that each ID is really exactly 32 hexadecimal characters, or it may be that each ID is really someone’s name with all spaces and special characters removed, or it may be that each ID just appears to our eyes to be a random set of characters with a random length. Making too many optimizations without any feedback from @mrmagnum8841 might be an interesting mental exercise, but we have no idea if we’ve ever been answering @mrmagnum8841’s actual need.

                  @mrmagnum8841 , if you want more help than we have provided, please actually respond with answers to the questions raised, and let us know how close, or far, we are to actually solving your problem.

                  • Alan KilbornA
                    Alan Kilborn
                    last edited by

                    I think that what I now call “the Dail-ism” is an applicable comment at this point.

                    • guy038G
                      guy038
                      last edited by guy038

                      Hi, @mrmagnum8841, @peterjones, @alan-kilborn and All,

                      When elaborating my previous post, I remembered, from this post :

                      https://community.notepad-plus-plus.org/post/51385

                      the following regex (?-s)^(.+\R)(?=(?s).+?^\1), which, indeed, could work with a 5 MB file, containing more than 200,000 lines ! Much better, isn’t it ?

                      Seemingly, the fact that, in this regex, the group 1 corresponds to an entire line, with its line-break, whereas the (?-si)^.*,\x20(\w+),.*\R(?=(?s).+?^\1$) syntax stores, only, the ID### part, of each line, in group 1 ( which fails with a file over 82 Kb - 2,500 lines ! ) makes all the difference !! Why ?

                      As you said, Peter, it was a mental exercise, not specifically intended for the OP, in order to find a correct way to filter fairly large files, as I’m rather irritated by the limitations of my various regular expression attempts :-((

                      Cheers,

                      guy038
