How to delete blocks of similar text which don't contain a specific word?
-
Fellow Notepad++ Users,
Could you please help me with the following search-and-replace problem I am having?
I have a large txt file which contains blocks of very similar text (the blocks begin and end with the same words), and I would like to delete those blocks that don't contain a specific word. Alternatively, I would like to select the blocks which contain the desired word and copy them into a new file.
I don't know how to make Notepad++ recognize these blocks of similar text as separate entities in the same file.
Is it possible to do? I am using the latest version of this software.
Thanks a lot
-
@veruc-w ,
Start off by reading the FAQs about how to use these forums and the markup language to describe and show your problem, and then read the FAQ on how to use regex and the Search and Replace capability of Notepad++. This isn't a mind-reading forum, nor is it a one-stop answering service for vague descriptions. We are users helping users, not doing their work for them. Start here by reading the Online User Manual.
-
Hello, @veruc-w, @lycan-thrope and All,
In addition to @lycan-thrope’s advice, just one hint to begin with.
Simply replace the zones BEGIN_BOUNDARY and END_BOUNDARY with your current boundaries, and the string ABSENT_WORD with the word which must not appear in the blocks to delete. Then, follow the steps below:
- Start N++ and select the tab or open your file
- Open the Replace dialog (Ctrl + H)
- Untick all box options
- SEARCH (?s-i)^\h*BEGIN_BOUNDARY((?!ABSENT_WORD).)+?END_BOUNDARY.*?$\R
- REPLACE Leave EMPTY
- Check the Wrap around box option
- Select the Regular expression search mode
- Click, once only, on the Replace All button
Here you are !
Best Regards,
guy038
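For anyone who wants to try the idea outside Notepad++, here is a minimal Python sketch of the same search. It is an approximation, not the regex as Notepad++ runs it: Python's re module has no \h or \R, so they are spelled out as [ \t] and an explicit line-break alternation, and the leading (?s-i) is expressed through flags instead. The sample text is made up for illustration.

```python
import re

# Placeholder boundaries and word from the post; substitute your own.
BEGIN = "BEGIN_BOUNDARY"
END = "END_BOUNDARY"
WORD = "ABSENT_WORD"

# Python's re lacks \h and \R, so both are written out explicitly.
# DOTALL stands in for (?s); MULTILINE makes ^ and $ work per line.
pattern = re.compile(
    r"^[ \t]*" + re.escape(BEGIN)
    + r"(?:(?!" + re.escape(WORD) + r").)+?"
    + re.escape(END) + r".*?$(?:\r\n|\n|\r)",
    re.DOTALL | re.MULTILINE,
)

# Made-up sample: the first block contains the word, the second does not.
sample = (
    "BEGIN_BOUNDARY\n"
    "first block, keeps ABSENT_WORD\n"
    "END_BOUNDARY\n"
    "BEGIN_BOUNDARY\n"
    "second block, nothing special\n"
    "END_BOUNDARY\n"
)

# Blocks without the word are deleted; blocks with it are left alone.
result = pattern.sub("", sample)
print(result)
```

The part (?!ABSENT_WORD). consumes one character at a time, but only where the word does not start, so a match can never run across the word. A block that contains it therefore fails to match and survives the replacement.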
-
@guy038 said:
SEARCH (?s-i)^\h*BEGIN_BOUNDARY((?!ABSENT_WORD).)+?END_BOUNDARY.*?$\R
Maybe this is better without capturing into group 1, i.e.:
SEARCH (?s-i)^\h*BEGIN_BOUNDARY(?:(?!ABSENT_WORD).)+?END_BOUNDARY.*?$\R
Also, the OP said nothing about what comes before the word that begins the block, nor what comes after the word that ends it (the assumption being that the begin-word and the end-word are different), so maybe this is an even better expression:
SEARCH (?s-i)BEGIN_BOUNDARY(?:(?!ABSENT_WORD).)+?END_BOUNDARY
It’s a pity the OP never returned to either provide more specifics (and sample data) or to say whether or not the originally proposed solution was successful.
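To illustrate the difference between the line-anchored and the unanchored expressions, here is a small Python sketch under the same caveats as any port to another engine: \h and \R are written out because Python's re module lacks them, and flags replace (?s-i). The one-line sample, in which the boundaries share a line with other text, is invented for the comparison.

```python
import re

# Invented sample: both blocks sit in the middle of a single line.
sample = (
    "intro BEGIN_BOUNDARY a END_BOUNDARY mid "
    "BEGIN_BOUNDARY ABSENT_WORD END_BOUNDARY outro\n"
)

# Line-anchored form: the block must start its line (after optional blanks).
anchored = re.compile(
    r"^[ \t]*BEGIN_BOUNDARY(?:(?!ABSENT_WORD).)+?END_BOUNDARY.*?$(?:\r\n|\n|\r)",
    re.DOTALL | re.MULTILINE,
)

# Unanchored form: matches the block wherever it occurs.
unanchored = re.compile(
    r"BEGIN_BOUNDARY(?:(?!ABSENT_WORD).)+?END_BOUNDARY",
    re.DOTALL,
)

anchored_result = anchored.sub("", sample)
unanchored_result = unanchored.sub("", sample)

print(repr(anchored_result))    # unchanged: no block starts a line
print(repr(unanchored_result))  # first block removed, second kept
```

On this input the anchored form deletes nothing, while the unanchored form removes only the block that lacks the word and leaves the surrounding text, including the line break, in place. Which behaviour is wanted depends on data the OP never posted.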