    Show (or keep) subsets of a file

    • Mark Boonie

      I have a file that contains a great many blocks of text. Each block starts with a line containing <start string> and ends with a line containing <end string>. I’m only interested in those blocks that have <target string> somewhere in the block; there are far fewer occurrences of <target string> than <start string>, but still too many to process manually. What I’d like to do is find each occurrence of <target string> and keep only the lines from the preceding <start string> to the next <end string>. I don’t really care whether everything else is hidden or deleted. Is there a way to do something like this? Thanks for any suggestions.

      • guy038

        Hello, @mark-boonie and All,

        Mark, I think there is a way to achieve what you want with regular expressions!

        Could you provide one or two examples of these <start string>......<end string> blocks?

        When posting, first click the </> Code button so that your text is copied literally.

        See you later,

        Best Regards,

        guy038

        • Mark Boonie

          Hi @guy038. I’m changing it a bit for business reasons, but basically it would look like this:

          *Block start                                                          
          00000000013FC200     00200280     00010000     00000000     00000000  
          00000000013FC210     00000002     CC5CDDA0     00000000     00000000  
          00000000013FC220     00000000     00000000     01266100     01266100  
          00000000013FC230     00808000     013FC2B8     00000000     00000000  
          00000000013FC240     0003D000     03A8A1A0     03A8A670     03A8A710  
          00000000013FC250     00000000     0003DD88     013FD020     00000000  
          00000000013FC260     00000000     00000000     00000000     0C000002  
          00000000013FC270     11804017     03A8A718     0E000000     00800000  
          00000000013FC280     40000020     013FC280     00000000     00000000  
          00000000013FC290     00000000     00000000     00000000     00000000  
          00000000013FC2A0     00000000     0C000002     10800011     00000000  
          00000000013FC2B0     06000000     00800000     40000020     013FC2B8  
          00000000013FC2C0     00000000     00000000     00000000     00000000  
          00000000013FC320     00000000     01421800     00000000     00000000  
          00000000013FC330     00000000     00000000     00000000     00000000  
          00000000013FC3C0     00000000     00000000     00000000     00000571  
          00000000013FC3D0     00000000     00000000     00000000     00000000  
          00000000013FC3F0     00000000     00000000     00000000     00000000  
                                                                                
          *Block end                                                            
          

          The first and last lines shown are the delimiter lines. The target string would vary, but obviously it’s another hex string. Thanks for any suggestions.

          • guy038

            Hi, @mark-boonie,

            Hmm, I’m a bit puzzled by the example that you provided!

            Indeed, I had already found a regex solution that follows exactly what you said in your initial post.

            So I created the sample text below:

            bla bla
            
            <start string>
            
            blo blo
            
            
            <target string> 
            
            <end string>
            
            bla bal
            blah blah
            
            <start string>
            <end string>
            
            <start string>
            
            <target string> 
            
            
            blu blu
            
            <end string>
            
            bla bla
            
            <start string>
            bla bla
            blah blah
            <end string>
            
            bla bla
            
            <start string>
            <target string> 
            <end string>
            
            bla bla
            

            Now:

            • Open the Replace dialog ( Ctrl + H )

            • Uncheck all box options

            • Find (?s)^<start string>((?!<start string>).)+?<target string>.+?^<end string>\R(*SKIP)(*F)|(?-s)^.*\R

            • Replace Leave EMPTY

            • Check the Wrap around option

            • Select the Regular expression search mode

            • Click on the Replace All button

            => You should be left with this text:

            <start string>
            
            blo blo
            
            
            <target string> 
            
            <end string>
            <start string>
            
            <target string> 
            
            
            blu blu
            
            <end string>
            <start string>
            <target string> 
            <end string>
            

            You can verify that, in this OUTPUT:

            • Any text outside the <start string>....<end string> blocks has been deleted

            • Blocks <start string>....<end string> that do not contain a <target string> line have been deleted, too

            • Only the blocks that start with a <start string> line, end with an <end string> line and contain a <target string> line remain

            • And note that, in this last case, all the lines of these blocks are kept!
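
For anyone who would rather script this outside Notepad++, the same filtering can be sketched in Python as a line-by-line pass instead of a regex. This is only a rough illustration, not part of the Replace dialog recipe: the function name `keep_blocks` and the shortened sample are made up here, and the sketch assumes blocks are not nested and that each delimiter sits on its own line.

```python
def keep_blocks(lines, start, end, target):
    """Return only the lines of start..end blocks that contain target.

    Assumption: blocks do not nest. A new start line simply discards
    any block that was still open.
    """
    kept = []
    block = None  # None while we are outside a block
    for line in lines:
        if start in line:
            block = [line]  # open a new block
        elif block is not None:
            block.append(line)
            if end in line:
                # Block is complete: keep it only if some line has the target
                # (the delimiter lines themselves are checked as well).
                if any(target in item for item in block):
                    kept.extend(block)
                block = None
    return kept


# Shortened version of the sample text, for illustration only.
sample = """bla bla
<start string>
blo blo
<target string>
<end string>
bla bal
<start string>
<end string>
<start string>
<target string>
blu blu
<end string>
""".splitlines()

result = keep_blocks(sample, "<start string>", "<end string>", "<target string>")
print("\n".join(result))
```

Lines outside any block and the block without a target are dropped, which mirrors what the Replace All operation leaves behind.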


            However, with the text you provided:

            • The delimiters *Block start and *Block end differ from those in your initial post, but this is not a problem

            • But the fact that the <target string> cannot be clearly identified is a BIG problem

            Indeed, from your example, how can I know whether this block of text must be kept or not?

            I am probably missing something…
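
To illustrate how the real delimiters could be plugged in once a target is known, here is a hedged sketch using Python's `re` module rather than the Notepad++ engine. Python's `re` has neither `(*SKIP)(*F)` nor `\R`, so the sketch matches the blocks to keep instead of deleting everything else. The function name `blocks_with_target` is hypothetical, and the target value `03A8A718` is picked arbitrarily from the dump purely as an example. Note that `*` is a regex metacharacter, so the delimiters are escaped with `re.escape`.

```python
import re


def blocks_with_target(text, start, end, target):
    """Return every start..end block of text that contains target.

    Assumption: delimiters begin at the start of a line, as in the dump.
    """
    s, e, t = (re.escape(part) for part in (start, end, target))
    pattern = re.compile(
        # From a start line, advance lazily without crossing another
        # delimiter line until the target is found, then run to the
        # first end line and consume the rest of that line.
        rf"^{s}(?:(?!^{s}|^{e}).)*?{t}.*?^{e}[^\n]*\n?",
        re.MULTILINE | re.DOTALL,
    )
    return pattern.findall(text)


# Two tiny blocks built from lines of the dump, for illustration only.
dump = (
    "*Block start\n"
    "00000000013FC270     11804017     03A8A718     0E000000     00800000\n"
    "*Block end\n"
    "noise between blocks\n"
    "*Block start\n"
    "00000000013FC290     00000000     00000000     00000000     00000000\n"
    "*Block end\n"
)

kept = blocks_with_target(dump, "*Block start", "*Block end", "03A8A718")
print("".join(kept))
```

Unlike the Find expression above, this variant also refuses to cross an end delimiter before reaching the target, so a stray target between two blocks does not glue them together.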

            BR

            guy038

            The Community of users of the Notepad++ text editor.