Find specific lines in a txt file and, if found, delete them.

    Shyvering Barhard
    last edited by May 15, 2018, 7:56 PM

    Basically, I have 2 txt files, let’s call them Txt1 and Txt2:

    Keep in mind that each string is on its own line.

    Txt1 contains over 100,000 strings taken from a process’s memory, and Txt2 contains 400,000 strings taken from a process’s memory.

    Think of Txt2 as strings that may include infected ones. I want to search for each of the 400,000 Txt2 strings among the 100,000 strings in Txt1. Any Txt2 string found in Txt1 is clean; the ones not found in Txt1 are the infected ones. Is there any way I can filter them?

    TL;DR:

    clean.txt has 100,000 strings
    infected.txt has 400,000 strings

    -> Search for each infected.txt string in clean.txt.
    -> If a string is found, delete it from infected.txt; it is a clean string.
    -> The strings that remain in infected.txt (those not found in clean.txt) are the infected ones.

    How can I filter out these infected ones?
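Since each string sits on its own line, the task is a set difference: keep the infected.txt lines that do not occur in clean.txt. A minimal Python sketch, assuming both files have already been read into lists of lines; the function name `find_infected` is hypothetical, not something from the thread. Loading the clean strings into a set makes each of the 400,000 lookups an average constant-time check instead of a scan over 100,000 strings.

```python
def find_infected(clean_lines, infected_lines):
    """Return the lines of infected.txt that never occur in clean.txt.

    clean_lines is turned into a set, so each membership test is O(1)
    on average instead of a scan over all clean strings. The original
    order of infected_lines is preserved, and duplicates are kept.
    """
    clean = set(clean_lines)
    return [line for line in infected_lines if line not in clean]
```

For example, `find_infected(["XYZ", "ABC"], ["999", "XYZ", "KLM"])` returns `["999", "KLM"]`. Unlike a sort-based approach, this keeps the original line order.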

      guy038
      posted May 15, 2018, 8:46 PM, last edited by guy038 May 15, 2018, 8:52 PM

      Hi, @shyvering-barhard, and All,

      First of all, many thanks for the clear description of your problem ;-)) However, some questions still remain!

      So, here is how I understand it:

      Assume the two files below, containing some simple strings:

      Clean.txt:
      
      XYZ
      ABC
      DEF
      000
      HIJ
      

      and:

      Infected.txt:
      
      999
      KLM
      XYZ
      UVW
      ABC
      000
      HIJ
      DEF
      PQR
      

      I would create a third, temporary file containing the contents of both files, in any order:

      Total.txt:
      
      XYZ
      ABC
      DEF
      000
      HIJ
      999
      KLM
      XYZ
      UVW
      ABC
      000
      HIJ
      DEF
      PQR
      

      Then, I would perform a classical sort (Edit > Line Operations > Sort Lines Lexicographically Ascending):

      000
      000
      999
      ABC
      ABC
      DEF
      DEF
      HIJ
      HIJ
      KLM
      PQR
      UVW
      XYZ
      XYZ
      

      Then, using this simple regex S/R (Regular expression search mode):

      SEARCH ^(.+\R)\1

      REPLACE Leave EMPTY

      we easily obtain the expected result:

      999
      KLM
      PQR
      UVW
      

      Indeed, each of these 4 remaining strings appears in the Infected.txt file only!
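For anyone who would rather script these steps, the merge, sort and S/R above can be mimicked in Python. This is a minimal sketch, not guy038's own method: Python's `re` module does not support `\R`, so `\n` stands in for it, and `sorted()` uses plain code-point order, which may differ from Notepad++'s sort but still puts identical lines next to each other, which is all the S/R needs. The function name is hypothetical.

```python
import re


def merge_sort_remove_pairs(clean_lines, infected_lines):
    """Mimic the Notepad++ steps: merge both files, sort, delete duplicate pairs.

    Lines occurring in only one of the files survive. Under the stated
    hypotheses (no duplicates inside either file, and every Clean.txt
    line also present in Infected.txt) these are the infected-only lines.
    """
    # Total.txt: both files together, sorted so identical lines are adjacent.
    total = sorted(clean_lines + infected_lines)
    # Give every line a trailing newline so the last pair can match too.
    text = "".join(line + "\n" for line in total)
    # ^(.+\n)\1 matches a line immediately followed by an identical line;
    # replacing the match with "" deletes both copies.
    text = re.sub(r"^(.+\n)\1", "", text, flags=re.MULTILINE)
    return text.splitlines()
```

With the Clean.txt and Infected.txt sample lines above, it returns `["999", "KLM", "PQR", "UVW"]`, matching the Notepad++ result.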

      Note that I assume the following hypotheses:

      • The Clean.txt file does not contain duplicates

      • The Infected.txt file does not contain duplicates either

      • You don’t mind the lines being sorted
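These hypotheses matter because the S/R deletes lines in pairs: three identical lines leave one behind, and a string that appears twice in Infected.txt but never in Clean.txt would be deleted even though it is infected-only. Likewise, a line present only in Clean.txt would survive and look infected. A minimal Python sketch (hypothetical helper names) for checking and removing duplicates before merging the files:

```python
from collections import Counter


def duplicated_lines(lines):
    """Return {line: count} for every line that occurs more than once."""
    return {line: n for line, n in Counter(lines).items() if n > 1}


def dedupe(lines):
    """Drop repeated lines, keeping the first occurrence and the original order."""
    # dict keys are unique and keep insertion order (Python 3.7+).
    return list(dict.fromkeys(lines))
```

An empty result from `duplicated_lines` means a file satisfies the first two hypotheses; otherwise `dedupe` can be applied to it first.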

      Some questions:

      Is there one memory-process string per line in both files?

      Could you show us some lines of each file, to give us a general idea of the strings that must be matched and then deleted?

      See you later!

      Best Regards,

      guy038

        Shyvering Barhard
        posted May 15, 2018, 9:59 PM, last edited by Shyvering Barhard May 15, 2018, 10:01 PM

        @guy038 Thanks a lot for the detailed answer; it gets me one step closer to what I’m trying to do.
        I tried asking elsewhere but had no luck getting an answer, and I still have some missing links in my equation.

        This is the process I want to carry out; it basically involves three .txt files.

        I’m looking for something that lets me compare the strings of two processes, with each process’s strings coming from a .txt file. Let me elaborate.

        • I have a .txt file with over 400,000 strings from a process’s memory (saved with Process Hacker 2).
          My goal is to open two instances of process X: X1 will be a clean (vanilla) process, and X2 will be the same process, but infected/hacked/modified.

        • As mentioned, process X2 is infected, but it will try to self-destruct and restore itself to look like the original X1 process. Of course, this is not 100% perfect: it leaves behind strings that should never exist in an original X1 process.

        • Basically, I have to test each of the 400,000+ strings one by one. If a string shows up in X1 (the clean file), the process legitimately contains that string and all is good (the string is discarded). If a string does not show up in X1 but does show up in X2, the process is infected.

        Why is it so complicated to find the non-vanilla strings in process X2?

        • Because process X2 self-destructs the infected files and tries to revert to the original process, so that it looks like X1.

        What I’m looking for is the small number of strings that are not erased by the self-destruct, so that whenever you compare both processes you find the ones that don’t belong to the original process and can conclude that it was infected.

        You can think of the three .txt files as clean.txt, infected.txt and suspicious_strings.txt.
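The three-file workflow can be sketched in Python as follows. This is a minimal sketch under assumptions the thread does not state: the files are UTF-8 text (undecodable bytes are replaced rather than raising an error), surrounding whitespace is not significant, blank lines are ignored, and each suspicious string only needs to be reported once. The function names are hypothetical.

```python
from pathlib import Path


def _read_lines(path):
    """Read a text file as a list of lines; undecodable bytes become U+FFFD."""
    text = Path(path).read_text(encoding="utf-8", errors="replace")
    return text.splitlines()


def write_suspicious(clean_path, infected_path, out_path):
    """Write each infected-file string that is absent from the clean file.

    Strings are compared after stripping surrounding whitespace, blank
    lines are skipped, and each suspicious string is written once, in
    the order it first appears. Returns the number of strings written.
    """
    clean = {line.strip() for line in _read_lines(clean_path)}
    seen = set()
    suspicious = []
    for line in _read_lines(infected_path):
        s = line.strip()
        if s and s not in clean and s not in seen:
            seen.add(s)
            suspicious.append(s)
    Path(out_path).write_text("".join(s + "\n" for s in suspicious), encoding="utf-8")
    return len(suspicious)
```

Usage: `write_suspicious("clean.txt", "infected.txt", "suspicious_strings.txt")` writes the X2-only strings and returns how many there were. If leading or trailing spaces are meaningful in the dumped strings, drop the `.strip()` calls.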

          Claudia Frank @Shyvering Barhard
          last edited by May 15, 2018, 11:15 PM

          @Shyvering-Barhard

          Do you know Python and the Python Script plugin?
          If so, it might make your life easier; see here.

          Cheers
          Claudia

            Shyvering Barhard
            last edited by May 15, 2018, 11:55 PM

            @Claudia-Frank
            I’ll take a look at it. Right now I’m trying to remove all the 0x… prefixes. I used the regex ^............. (14 chars) and some other variations, but the prefixes are not all the same length. Any tips? I just want to keep the strings, which are likely the ones marked in red (not all of them were marked).
            Screenshot: https://i.imgur.com/7So1FcM.jpg
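Because the prefixes vary in length, a fixed run of dots cannot match all of them; matching `0x` followed by any number of hex digits avoids counting characters. The line layout assumed below is a guess, since the exact export format in the screenshot isn't known here: an address, then spaces, tabs or a colon, then the string. In Notepad++, the same pattern `^0x[0-9A-Fa-f]+[ \t:]*` with an empty replacement should work in Regular expression mode. A minimal Python sketch with hypothetical names:

```python
import re

# Assumed layout: "0x" + hex digits, then spaces/tabs or a colon, then the string.
# The real Process Hacker export may differ; adjust the separator class if needed.
# [ \t:] is used instead of \s so the pattern can never swallow a line break.
ADDRESS_PREFIX = re.compile(r"^0x[0-9A-Fa-f]+[ \t:]*")


def strip_address(line):
    """Remove one leading 0x... address of any length, keeping the rest."""
    return ADDRESS_PREFIX.sub("", line, count=1)


def strip_all(lines):
    """Apply strip_address to every line."""
    return [strip_address(line) for line in lines]
```

For example, `strip_address("0x7ffd1234   hello world")` returns `"hello world"`, and a line without a `0x` prefix is returned unchanged.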

              Shyvering Barhard
              posted May 16, 2018, 1:46 AM, last edited by Shyvering Barhard May 16, 2018, 1:47 AM

              Never mind my previous question, I found the solution @scott-summer

              The Community of users of the Notepad++ text editor.