Show (or keep) subsets of a file
-
I have a file that contains a great many blocks of text. Each block starts with a line containing <start string> and ends with a line containing <end string>. I’m only interested in those blocks that have <target string> somewhere in the block; there are far fewer occurrences of <target string> than <start string> but still too many to process manually. What I’d like to be able to do is find each occurrence of <target string> and keep only those lines from the preceding <start string> to the next <end string>. I don’t really care whether everything else is hidden or deleted. Is there a way to do something like this? Thanks for any suggestions.
-
Hello, @mark-boonie and All,
Mark, I think there is a way to achieve what you want with regular expressions !
Could you provide one or two examples of these <start string>......<end string> blocks ?
When posting, try to first hit the </> Code button to ensure that your text is copied literally.
See you later,
Best Regards,
guy038
-
Hi @guy038. I’m changing it a bit for business reasons, but basically it would look like this:
*Block start
...
*Block end
The first and last lines shown are the delimiter lines. The target string would vary, but obviously it’s another hex string. Thanks for any suggestions.
-
Hi, @mark-boonie,
Hum, I’m a bit upset with the example that you provided !
Indeed, I had already found a regex solution that follows exactly what you said in your initial post.
So, I created the sample text below :
bla bla
<start string>
blo blo
<target string>
<end string>
bla bal blah blah
<start string>
<end string>
<start string>
<target string>
blu blu
<end string>
bla bla
<start string>
bla bla blah blah
<end string>
bla bla
<start string>
<target string>
<end string>
bla bla
Now :
- Open the Replace dialog (Ctrl + H)
- Uncheck all box options
- Find: (?s)^<start string>((?!<start string>).)+?<target string>.+?^<end string>\R(*SKIP)(*F)|(?-s)^.*\R
- Replace: Leave EMPTY
- Check the Wrap around option
- Select the Regular expression search mode
- Click on the Replace All button
=> You should be left with this text :
<start string>
blo blo
<target string>
<end string>
<start string>
<target string>
blu blu
<end string>
<start string>
<target string>
<end string>
You can verify that, in this OUTPUT :
- Any text outside the <start string>....<end string> blocks has been deleted
- Text within <start string>....<end string> blocks which do not contain a <target string> line has been deleted, too
- Only the blocks starting with a <start string> line and ending with an <end string> line, which do contain a <target string> line, remain
- And note that, in this last case, all the lines of these blocks are kept, too !
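For anyone who would rather script this outside Notepad++, here is a minimal Python sketch of the same "match what must be kept first, delete every other line" idea. It is not the regex above: Python's `re` module has no `(*SKIP)(*F)`, so the sketch swaps in a replacement callback that returns kept blocks unchanged and replaces everything else with nothing. The function name `keep_matching_blocks` is hypothetical, and the sketch assumes `\n` line endings and an end delimiter that fills its whole line.

```python
import re


def keep_matching_blocks(text, start, target, end):
    """Keep only start..end blocks that contain target; drop all other lines.

    Hypothetical helper, assuming "\n" line endings and that the end
    delimiter occupies an entire line.
    """
    s, t, e = (re.escape(x) for x in (start, target, end))
    pattern = re.compile(
        # Alternative 1 (named "keep"): a start line, then text that never
        # reaches a second start delimiter, then the target, then the
        # closest following end line. Dot matches newlines only in here.
        rf"(?s:(?P<keep>^{s}(?:(?!{s}).)+?{t}.+?^{e}$\n?))"
        # Alternative 2: any single line that alternative 1 did not claim.
        rf"|^.*\n?",
        re.MULTILINE,
    )
    # Kept blocks are put back as-is; every other line becomes "".
    return pattern.sub(lambda m: m.group("keep") or "", text)


if __name__ == "__main__":
    sample = "\n".join([
        "bla bla",
        "<start string>", "blo blo", "<target string>", "<end string>",
        "bla bal blah blah",
        "<start string>", "<end string>",
        "<start string>", "<target string>", "blu blu", "<end string>",
        "bla bla",
    ]) + "\n"
    print(keep_matching_blocks(
        sample, "<start string>", "<target string>", "<end string>"), end="")
```

Because the delimiters are passed through `re.escape`, literal strings such as `*Block start` can be supplied without hand-escaping the `*`.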
However, with the text provided :
- The delimiters *Block start and *Block end are different from those in your initial post, but this is not a problem
- But the fact that the <target string> cannot be clearly identified is a BIG problem
Indeed, from your example, how can I know whether this block of text must be kept or not ??
I am probably missing something …
BR
guy038
-