Show (or keep) subsets of a file
-
I have a file that contains a great many blocks of text. Each block starts with a line containing <start string> and ends with a line containing <end string>. I’m only interested in those blocks that have <target string> somewhere in the block; there are far fewer occurrences of <target string> than of <start string>, but still too many to process manually. What I’d like to do is find each occurrence of <target string> and keep only the lines from the preceding <start string> to the next <end string>. I don’t really care whether everything else is hidden or deleted. Is there a way to do something like this? Thanks for any suggestions.
-
Hello, @mark-boonie and All,
Mark, I think there is a way to achieve what you want with regular expressions!
Could you provide one or two examples of these <start string>......<end string> blocks?
When posting, try to first hit the </> Code button to ensure that your text is copied literally.
See you later,
Best Regards,
guy038
-
Hi @guy038. I’m changing it a bit for business reasons, but basically it would look like this:
*Block start
00000000013FC200 00200280 00010000 00000000 00000000
00000000013FC210 00000002 CC5CDDA0 00000000 00000000
00000000013FC220 00000000 00000000 01266100 01266100
00000000013FC230 00808000 013FC2B8 00000000 00000000
00000000013FC240 0003D000 03A8A1A0 03A8A670 03A8A710
00000000013FC250 00000000 0003DD88 013FD020 00000000
00000000013FC260 00000000 00000000 00000000 0C000002
00000000013FC270 11804017 03A8A718 0E000000 00800000
00000000013FC280 40000020 013FC280 00000000 00000000
00000000013FC290 00000000 00000000 00000000 00000000
00000000013FC2A0 00000000 0C000002 10800011 00000000
00000000013FC2B0 06000000 00800000 40000020 013FC2B8
00000000013FC2C0 00000000 00000000 00000000 00000000
00000000013FC320 00000000 01421800 00000000 00000000
00000000013FC330 00000000 00000000 00000000 00000000
00000000013FC3C0 00000000 00000000 00000000 00000571
00000000013FC3D0 00000000 00000000 00000000 00000000
00000000013FC3F0 00000000 00000000 00000000 00000000
*Block end
The first and last lines shown are the delimiter lines. The target string would vary, but obviously it’s another hex string. Thanks for any suggestions.
-
Hi, @mark-boonie,
Hmm, I’m a bit puzzled by the example that you provided!
Indeed, I had already found a regex solution that follows exactly what you said in your initial post.
So, I created the sample text below:
bla bla
<start string>
blo blo
<target string>
<end string>
bla bal
blah blah
<start string>
<end string>
<start string>
<target string>
blu blu
<end string>
bla bla
<start string>
bla bla
blah blah
<end string>
bla bla
<start string>
<target string>
<end string>
bla bla
Now:
- Open the Replace dialog (Ctrl + H)
- Uncheck all box options
- Find: (?s)^<start string>((?!<start string>).)+?<target string>.+?^<end string>\R(*SKIP)(*F)|(?-s)^.*\R
- Replace: Leave EMPTY
- Check the Wrap around option
- Select the Regular expression search mode
- Click on the Replace All button
=> You should be left with this text:
<start string>
blo blo
<target string>
<end string>
<start string>
<target string>
blu blu
<end string>
<start string>
<target string>
<end string>
You can verify that, in this OUTPUT:
- Any text outside the <start string>....<end string> blocks has been deleted
- Text within <start string>....<end string> blocks that do not contain a <target string> line has been deleted, too
- Only the blocks that start with a <start string> line, end with an <end string> line and do contain a <target string> line remain
And note that, in this last case, all the lines of these blocks are kept, too!
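For anyone who prefers to run the same filter outside Notepad++, here is a minimal sketch in Python. It is not the regex above but a plain line-by-line scan, and it assumes that the delimiters and the target are simple substrings of a line. The function name `keep_blocks_with_target` is made up for this illustration.

```python
def keep_blocks_with_target(lines, start, end, target):
    """Return only the lines of the blocks (start line .. end line) that contain target."""
    kept = []
    block = None  # None while we are outside any block
    for line in lines:
        if start in line:
            # A new block begins; an unterminated previous block is dropped.
            block = [line]
        elif block is not None:
            block.append(line)
            if end in line:
                # The block is complete: keep it only if some line holds the target.
                if any(target in item for item in block):
                    kept.extend(block)
                block = None
    return kept


# The sample text from this post, one item per line.
sample = """bla bla
<start string>
blo blo
<target string>
<end string>
bla bal
blah blah
<start string>
<end string>
<start string>
<target string>
blu blu
<end string>
bla bla
<start string>
bla bla
blah blah
<end string>
bla bla
<start string>
<target string>
<end string>
bla bla""".splitlines()

result = keep_blocks_with_target(
    sample, "<start string>", "<end string>", "<target string>"
)
print("\n".join(result))
```

On the sample, this should leave the same three blocks as the Replace All operation. Lines outside any block are simply never collected, so nothing has to be deleted explicitly.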
However, with the text you provided:
- The delimiters *Block start and *Block end differ from those in your initial post, but this is not a problem
- But the fact that the <target string> cannot be clearly identified is a BIG problem
Indeed, from your example, how can I know whether this block of text must be kept or not?
I’m probably missing something…
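In the meantime, here is a small sketch of how the Find expression could be adapted to the real delimiters. Note that the leading `*` of `*Block start` is a regex quantifier, so it presumably has to be written `\*`. The helper names `escape` and `build_find` are made up for this illustration, and the target `013FD020` is only a hypothetical value picked from the posted dump, not something Mark specified. The Python code only builds the text to paste into the Find field; it does not run the search, and the result has not been checked against the real file.

```python
# Characters that have a special meaning in the regex; each gets a backslash prefix.
SPECIAL = set(".^$*+?()[]{}|\\")


def escape(text):
    """Backslash-escape regex metacharacters so that text is matched literally."""
    return "".join("\\" + ch if ch in SPECIAL else ch for ch in text)


def build_find(start, end, target):
    """Build the Find expression from the steps above for the given strings."""
    s, e, t = escape(start), escape(end), escape(target)
    return (
        "(?s)^" + s + "((?!" + s + ").)+?" + t + ".+?^" + e
        + r"\R(*SKIP)(*F)|(?-s)^.*\R"
    )


# "013FD020" is a hypothetical target taken from the sample dump.
find = build_find("*Block start", "*Block end", "013FD020")
print(find)
```

The printed text can then be pasted into the Find field, with the other options set as in the steps above.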
BR
guy038
-