    How to delete all lines found in another txt document

    13 Posts 5 Posters 2.9k Views
    • DimakSerpg @guy038

      @guy038 why is it so complicated?
      You are saying “add a new line beginning with, at least, three = equal symbols”
      But then you are saying “after the line =====…”

      So it’s eight symbols now, and for some reason there are 3 dots??
      What?
      I’m unfamiliar with notepad and this doesn’t work.

      • PeterJones @DimakSerpg

        @DimakSerpg said in How to delete all lines found in another txt document:

        why is it so complicated?

        Because it’s essentially trying to recreate a full programming language or database system in something that’s meant for text editing, not database manipulation. I have never heard of a text editor in which “delete all lines found in another txt document” is implemented natively.

        If you search the forum, there are also examples of using the PythonScript plugin to programmatically do essentially the same thing.

        I’m unfamiliar with notepad

        The application is Notepad++, not notepad. There’s a difference (the latter being the simple app that Microsoft has included with Windows for decades, the former being the high-powered text editor that we talk about in this Forum).

        and this doesn’t work.
        …
        So it’s eight symbols now, and for some reason there are 3 dots??

        Guy is this forum’s acknowledged regex guru, but even a guru can sometimes make mistakes or not explain things well (especially when they are communicating technical information in a language other than their native language).

        I believe the ... was supposed to indicate that there could be more beyond the initial three equals symbols. And I believe that showing five equals ===== instead of three equals === was just enthusiasm on Guy’s part.

        If it helps, think of those instructions as

        • At the end of source.txt, add a new line beginning with at least three = equal symbols
        • Then append the contents of the delete.txt file after the line you just added

        And given the instructions above, the SEARCH line needs to change as well:

        • SEARCH (?s-i)^((?-s).+\R)(?=.*^===+\R.*^\1)|^=+\R.+
          (it should only have 3 = in a row, not the 4 that Guy originally showed)

        So assuming
        original source.txt:

        this is okay
        delete me
        this was good
        i should be deleted
        fine
        

        and original delete.txt:

        i should be deleted
        delete me
        

        those would be merged into

        this is okay
        delete me
        this was good
        i should be deleted
        fine
        ===
        i should be deleted
        delete me
        
        

        then running a replace with these settings:

        • FIND WHAT: (?s-i)^((?-s).+\R)(?=.*^===+\R.*^\1)|^=+\R.+
        • REPLACE WITH: <empty>
        • SEARCH MODE: regular expression
        • click REPLACE ALL

        I get:

        this is okay
        this was good
        fine
        
        

        This sequence successfully removed from source.txt every line that also appears in delete.txt, along with the === separator and the appended delete.txt list.

        As with all search/replace instructions that you get from a forum, I highly recommend having a backup copy of any data before you run a REPLACE ALL that you don’t understand.
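For readers who want to see what this search does outside Notepad++, the merge-and-replace sequence can be mimicked in Python. This is only a sketch under assumptions: Python's re module has no \R, so it is replaced with \n (LF line endings assumed), the unscoped (?-s) is written as the scoped group (?-s:...), and the helper names merge and remove_listed_lines are made up for illustration.

```python
import re

# Assumption: "\n" line endings. Python's `re` lacks \R, so \n stands in for it,
# and (?-s:...) plays the role of the unscoped (?-s) in the Notepad++ expression.
PATTERN = re.compile(
    r"(?s)^((?-s:.+)\n)(?=.*^===+\n.*^\1)|^=+\n.+",
    re.MULTILINE,
)


def merge(source_text: str, delete_text: str) -> str:
    """Append a === separator line and the delete list to the source text."""
    if not source_text.endswith("\n"):
        source_text += "\n"
    if not delete_text.endswith("\n"):
        delete_text += "\n"
    return source_text + "===\n" + delete_text


def remove_listed_lines(merged_text: str) -> str:
    """Apply the search with an empty replacement, like REPLACE ALL."""
    return PATTERN.sub("", merged_text)


# Sample data copied from the example in this post.
source = "this is okay\ndelete me\nthis was good\ni should be deleted\nfine\n"
delete = "i should be deleted\ndelete me\n"
result = remove_listed_lines(merge(source, delete))
print(result, end="")
```

With the sample data from this post, it prints the three surviving lines (this is okay, this was good, fine). Every candidate line makes the lookahead search the rest of the text, which is likely part of why multi-million-line inputs take so long.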

        • DimakSerpg @PeterJones

          @PeterJones for some reason it works with your examples.

          But when I use the same method with my text, it doesn’t work.

          Maybe it’s because my text is big? There are like 4.4 million lines when the source and delete files are merged.
          It’s all just numbers. So I want to delete 2 million numbers that are in my source file with 2.4 million numbers.

          After I click “replace all” it just deletes everything.

          But it works without any problems when I pick like 100 lines. So the problem is in 4.4 million lines.

          • PeterJones @DimakSerpg

            @DimakSerpg said in How to delete all lines found in another txt document:

            It’s all just numbers. So I want to delete 2 million numbers that are in my source file with 2.4 million numbers.
            After I click “replace all” it just deletes everything.

            That’s a different problem than we normally see with big files and such activity. Normally, big files make it so that there’s not enough space in the regex memory, and the regex will thus not run… But Guy’s regex was intended to be immune to long files (and my modification should have been, too), since the capture-memory of the regex should only be one line’s worth.

            I’m really surprised that its fallback would be to delete everything. (Well, unless the 2.4M in source.txt aren’t unique, and it just so happens that every line in source.txt is also contained in the 2M lines of delete.txt. It might be worth trying Edit > Line Operations > Remove Duplicate Lines on a copy of source.txt, and seeing if there are still more than 2M lines after the removal; if there are 2M or fewer lines, then it’s entirely possible that every unique line matches a line from delete.txt.)
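The duplicate-line check suggested above can also be approximated with a short script outside Notepad++. A minimal sketch in Python, assuming both line lists fit in memory; the function name survivors_report and the sample values are hypothetical:

```python
def survivors_report(source_lines, delete_lines):
    """Count total, unique, and surviving source lines for a given delete list."""
    delete_set = set(delete_lines)
    kept = [line for line in source_lines if line not in delete_set]
    return {
        "source_lines": len(source_lines),
        "unique_source_lines": len(set(source_lines)),
        "lines_kept": len(kept),
    }


# Tiny made-up sample: "2" is duplicated in the source, and only "1" survives.
report = survivors_report(["1", "2", "2", "3"], ["2", "3", "4"])
print(report)
```

If lines_kept comes out as 0 on the real data, then deleting everything would be the correct result rather than a failure of the regex.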

            But it works without any problems when I pick like 100 lines. So the problem is in 4.4 million lines.

            If it’s not multiples of the same line in source.txt, then it’s beyond me. Maybe when Guy or one of the other regex greats has a chance, they can come try to give an alternative that will work with your data.

            It would help if you could provide a list of like 20 lines of source.txt and 5 lines of delete.txt – you can use fake numbers, if there’s something confidential about the numbers, but they should “look like” real data. Someone that has the time and ability could then take those examples, and make huge datafiles that have lots of numbers that are similar to those examples, and see if they can come up with something that works for deleting 2M lines from 2.4M lines of source.

            But I hinted at it before, and will phrase it differently to make it explicit: a text editor is the wrong tool for the job. You are essentially trying to delete a huge number of records from a database – this could probably be done in a database application, and it could be easily done in a few lines of code with a good programming language – but we cannot help you with either database or programming solutions here, because this forum is about Notepad++.

            • Alan Kilborn @PeterJones

              @PeterJones said in How to delete all lines found in another txt document:

              but even a guru can sometimes make mistakes or not explain things well (especially when they are communicating technical information in a language other than their native language)

              I believe the … was supposed to indicate that there could be more beyond the initial three equals symbols.

              And I think that the posters receiving information need to actually do some THINKING about what they’re being given…

              • DimakSerpg @Alan Kilborn

                @PeterJones I updated notepad, thought it might help, and now there’s an error. [image: error]

                @Alan-Kilborn said in How to delete all lines found in another txt document:

                And I think that the posters receiving information need to actually do some THINKING about what they’re being given…

                Uhh… no? It’s pretty simple.

                1. do this
                2. then this
                3. done

                I don’t need to know exactly what this command means, I don’t need to learn regex for this. It’s a simple command that would work as it is, but the problem is on my side because of the large text.
                • guy038

                  Hello, @dimakserpg and All,

                  Could you provide us a small part of your source.txt and delete.txt ( let’s say about 50 lines of each ) ?

                  Try to insert these sections as raw text, using the </> button when writing your post !

                  I will try to find out a new method, suitable for big files !

                  Best Regards,

                  guy038

                  BTW, in my regex, I used this part ^===+\R which represents a complete line of, at least, 3 equal signs, followed by its line-break

                  Thus, as long as this line begins with ===, it doesn’t matter if more equal signs are written right after !

                  • PeterJones @DimakSerpg

                    @DimakSerpg said in How to delete all lines found in another txt document:

                    Uhh… no? It’s pretty simple.

                    If it were simple, you would’ve figured it out without help.

                    I don’t need to know exactly what this command means, I don’t need to learn regex for this.

                    That’s a poor attitude. So essentially you are saying, “I don’t need to learn because I can dupe other people into doing it for free for me”. See how much help you receive if you continue with that attitude in life. I have already given you a working solution for reasonable quantities of data, and given you alternate suggestions of non-Notepad++ ideas that you might want to pursue; after this post, I’ve had my say.

                    I notice you also didn’t bother showing any example data, like I requested. And now Guy has requested it as well. If you don’t at least put in that much thought and effort, it will be virtually impossible for someone to help you, even if they were willing to look beyond your attitude.

                    It’s a simple command that would work as it is, but the problem is on my side because of the large text.

                    It’s not a simple command, but it does work correctly with smaller datasets.

                    ----

                    Please note: This Community Forum is not a data transformation service; you should not expect to be able to always say “I have data like X and want it to look like Y” and have us do all the work for you. If you are new to the Forum, and new to regular expressions, we will often give help on the first one or two data-transformation questions, especially if they are well-asked and you show a willingness to learn; and we will point you to the documentation where you can learn how to do the data transformations for yourself in the future. But if you repeatedly ask us to do your work for you, you will find that the patience of usually-helpful Community members wears thin. The best way to learn regular expressions is by experimenting with them yourself, and getting a feel for how they work; having us spoon-feed you the answers without you putting in the effort doesn’t help you in the long term and is uninteresting and annoying for us.

                    • DimakSerpg @guy038

                      @guy038 said in How to delete all lines found in another txt document:

                      Could you provide us a small part of your source.txt and delete.txt ( let’s say about 50 lines of each ) ?

                      @PeterJones said in How to delete all lines found in another txt document:

                      I notice you also didn’t bother showing any example data, like I requested. And now Guy has requested it as well. If you don’t at least put in that much thought and effort, it will be virtually impossible for someone to help you, even if they were willing to look beyond your attitude.

                      [image: 149f9210-c899-43a8-a6c0-484d99e9ef93-image.png]

                      • PeterJones @DimakSerpg

                        @DimakSerpg ,

                        [image: fca34970-75b6-4cc5-91a9-00a15ecc195d-image.png]

                        Did you notice the part where I said, “Someone ... could then take those examples, and make huge datafiles”? I wasn’t claiming that they would use just the small example; I was saying they needed that small example as a starting point, to try to replicate the problem with the original regex and try to solve it using the extended data.

                        I don’t understand why you are unwilling to provide even that much. Guy has said he’s willing to help you, and all you have to do to receive that help is to provide example data that he can start from. If you choose not to share a small amount of example data, I think even Guy’s willingness to help you will not be able to overcome your lack of effort.

                        • PeterJones @DimakSerpg

                          @DimakSerpg ,

                          REGEX IN NOTEPAD++ IS THE WRONG TOOL FOR THIS JOB!

                          I created three sets of files:

                          1. 100,000 7-digit numbers in each, where it will delete about 1/3 of the ones from source.txt
                          2. 1,000,000 7-digit numbers in each, where it will delete about 1/2 of the ones from source.txt
                          3. 10,000,000 9-digit numbers in each, where it will delete about 1/3 of the ones from source.txt
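The post does not say how these files were produced. One way to build comparable test data is sketched below in Python; the random-number approach and the names make_test_lists, overlap, and seed are assumptions made for illustration:

```python
import random


def make_test_lists(count, digits, overlap, seed=0):
    """Return (source, delete) lists of `count` numbers with `digits` digits each.

    Roughly `overlap` of the delete list is sampled from the source list, so
    at least that fraction of source lines will be removed.
    """
    rng = random.Random(seed)
    low, high = 10 ** (digits - 1), 10 ** digits - 1
    source = [str(rng.randint(low, high)) for _ in range(count)]
    n_shared = int(count * overlap)
    shared = rng.sample(source, n_shared)
    fresh = [str(rng.randint(low, high)) for _ in range(count - n_shared)]
    delete = shared + fresh
    rng.shuffle(delete)
    return source, delete


# Small-scale stand-in for the first set described above.
source, delete = make_test_lists(count=1000, digits=7, overlap=1 / 3)
print(len(source), len(delete))
```

Each list can then be written out with "\n".join(...) plus a trailing newline to get files like the ones described.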

                          I launched notepad++ -nosession -multiInst -noPlugin src1e5.txt del1e5.txt and started the regex running on the smallest of those sets.
                          Then, in another Notepad++ session, I spent about 10 minutes coding up a script in Perl, and made sure it worked on the 100,000-line file in under a second. It then worked on the 1,000,000-line file in about 4 seconds. And then it processed the 10,000,000-line file in 4 minutes.

                          I then wrote up this post. By the time I was done with that, it still hadn’t finished running the regex in Notepad++.

                          IyFwZXJsDQp1c2UgNS4wMTI7DQp1c2Ugd2FybmluZ3M7DQp1c2Ugc3RyaWN0Ow0KdXNlIFRpbWU6OkhpUmVzIHF3L3RpbWUvOw0KDQpwcmludCBTVERFUlIgc2NhbGFyIHRpbWUsICJcbiI7DQpteSBAc3JjID0gZG8geyBvcGVuIG15ICRmaCwgJzwnLCAnc3JjMWU3LnR4dCc7IDwkZmg+IH07DQpteSBAZGVsID0gZG8geyBvcGVuIG15ICRmaCwgJzwnLCAnZGVsMWU3LnR4dCc7IDwkZmg+IH07DQpteSAlaDsgQGh7QGRlbH0gPSBAZGVsOw0Kb3BlbiBteSAkZmgsICc+JywgJ291dDFlNy50eHQnOw0Kc2VsZWN0ICRmaDsNCiRcID0gIiI7DQpwcmludCBmb3IgZ3JlcCB7IWV4aXN0cyAkaHskX319IEBzcmM7DQpwcmludCBTVERFUlIgc2NhbGFyIHRpbWUsICJcbiI7DQo
                          

                          If you can figure out how to decode that text box using Notepad++, and run a perl script (not in Notepad++), it’s yours, for free, no tech support provided. Good luck,
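The general idea behind a scripted solution can be sketched in a few lines of Python. This is not the Perl script from this post, just one common approach under assumptions: the delete list fits in memory as a set, and the function name remove_listed and the file paths are placeholders.

```python
def remove_listed(source_path, delete_path, output_path):
    """Copy source lines to output, skipping any line present in the delete file.

    Returns the number of lines written. Line endings are compared with
    trailing CR/LF stripped, and source lines are written back unchanged.
    """
    with open(delete_path, encoding="utf-8", newline="") as fh:
        unwanted = {line.rstrip("\r\n") for line in fh}

    kept = 0
    with open(source_path, encoding="utf-8", newline="") as src, \
         open(output_path, "w", encoding="utf-8", newline="") as out:
        for line in src:  # streams the source instead of loading it whole
            if line.rstrip("\r\n") not in unwanted:
                out.write(line)
                kept += 1
    return kept
```

A call such as remove_listed("source.txt", "delete.txt", "out.txt") does one set lookup per source line, so the work grows roughly with the combined size of the two files rather than with their product.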
