Regex: Can I delete the content of files that don't contain certain words?

Robin Cruise

Good day, everyone. Just a question. I have these words in many files, but not in all of them. For example:

++++++++++++++++±
text text
my baby goes away
text text
++++++++++++++++±

I want to delete all the contents of the files that don’t contain these words.

I tried something, but it doesn’t work too well.

I checked the ". matches newline" option and searched for ^(?!.*\s(my baby goes away)\s).*

Any suggestion?

guy038

Hello @robin-cruise,

First of all, it would be better to back up all the files, concerned by the Search/Replacement ;-))

Now, if all these files are located in a specific folder :

Open the Find in Files dialog ( Ctrl +Shift +F )
In the Find what: zone, type (?s).*\s(my baby goes away)\s?.*|.+
In the Replace with: zone, type ?1$0
In the Filters zone, enter *.txt or else…
In the Directory zone, specify the folder, containing all the concerned files
If the string to search for must have this exact case, select the Match case option
Select, of course, the Regular expression search mode
Click on the Replace in Files button
Please, verify, one more time, that the FOUR zones, Find what:, Replace with:, Filters: and Directory:, are correctly filled !
Click on the Yes button, of the dialog Are you sure?

Et voilà !

=> All the contents of the files, that do NOT contain the string my baby goes away ( not embedded in a larger word ), are deleted
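Before clicking on Replace in Files, it can help to see which files would be emptied. Here is a minimal Python sketch, outside Notepad++ : the name files_missing_phrase, the UTF-8 encoding, the *.txt filter and the exact-case comparison are assumptions made for illustration. It only looks at the top level of the folder, lists the files and modifies nothing.

```python
import re
from pathlib import Path


def files_missing_phrase(folder, phrase, pattern="*.txt", encoding="utf-8"):
    """Return the files of `folder` that do NOT contain `phrase`.

    Hypothetical helper: like the search regex above, it requires a
    whitespace character right before the phrase.
    """
    needle = re.compile(r"\s" + re.escape(phrase))
    missing = []
    for path in sorted(Path(folder).glob(pattern)):
        text = path.read_text(encoding=encoding)
        if not needle.search(text):
            # This file would be emptied by the S/R
            missing.append(path)
    return missing
```

Calling files_missing_phrase on the folder, with the string my baby goes away, returns the list of paths whose contents would be deleted, so that they can be checked or backed up first.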

Notes :

The (?s) syntax, at the very beginning of the search regex, ensures that the regex engine considers the dot regex symbol as matching any single character ( standard or EOL character )
Then, the remainder is an alternative between :
- .*\s(my baby goes away)\s?.* : All the contents of the current file scanned, containing, at least, one string my baby goes away, not glued in a larger expression. So, the last string my baby goes away is stored as group 1
- .+ : All the contents of the current file scanned, which do NOT contain the string my baby goes away
In replacement, the syntax ?1$0, strictly (?1$0), is a conditional replacement that means :
- If group 1 exists ( your specific string found ), all the contents of the current file are replaced with the entire searched string ( $0 ), that is to say all the contents matched !
- If group 1 does not exist ( NO specific string found ), the replacement is an empty string => All the contents of the current file are, simply, deleted
A question mark ? , after the final syntax \s , is necessary, for the unique case, where the string my baby goes away ends the current file, without any final line break !
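The logic of these notes can be tried outside Notepad++. Python's re module has no conditional replacement such as ?1$0, so the sketch below emulates it with a callback. It is only an illustration, not what Notepad++ runs internally, and the name keep_or_empty is hypothetical.

```python
import re

# Same search regex as in the Find what: zone
SEARCH = re.compile(r"(?s).*\s(my baby goes away)\s?.*|.+")


def keep_or_empty(text):
    """Emulate the replacement ?1$0 on the whole contents of one file."""
    def replacement(match):
        # Group 1 defined => rewrite the whole match ( $0 ), else empty string
        return match.group(0) if match.group(1) is not None else ""
    return SEARCH.sub(replacement, text)
```

A text containing the string, even at the very end without a final line break, comes back unchanged, whereas a text without it comes back empty. In this sketch, a text that begins with the string, with no whitespace character before it, also comes back empty, because the regex requires \s before the string.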

Best Regards,

guy038

P.S :

As described above, sometimes, it’s easier to use the general template of a list of alternatives : (NOT This|NOT That|.....)|(This)|(That)......

All the alternatives to EXCLUDE, are re-written, with the syntax \1, in the replacement part
All the alternatives to INCLUDE, are replaced, thanks to each syntax (?#....), in the replacement part, where # is the group number ( # > 1 ), OR deleted if this syntax is absent

Consider, for instance, the original text, below :

Jane said to Tarzan : "Tarzan" is a very strong person, much more than "Jane" is !
    
"Tarzan and Jane"  or "Jane and Tarzan"

And suppose that we would like to convert to uppercase the first names Tarzan and Jane, ONLY IF they are NOT surrounded by double quotes !

Then, we could use the simple S/R :

SEARCH ("Tarzan"|"Jane")|(Tarzan)|(Jane)

REPLACE \1(?2TARZAN)(?3JANE)

As the replacement action is identical, for each first name, we could also use :

SEARCH ("Tarzan"|"Jane")|(Tarzan|Jane)

REPLACE \1(?2\U\2)

Note that when group 2 is defined, group 1 is NOT defined. Then, in replacement, the form \1 stands for an empty string !
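The same template can be sketched in Python. Its re module supports neither the conditional (?2....) nor \U in the replacement part, so a callback plays that role here. The name upper_unquoted is hypothetical and the code only illustrates the idea; it is not what Notepad++ does internally.

```python
import re

# Group 1 : alternatives to EXCLUDE, group 2 : alternatives to INCLUDE
SEARCH = re.compile(r'("Tarzan"|"Jane")|(Tarzan|Jane)')


def upper_unquoted(text):
    def replacement(match):
        if match.group(1) is not None:
            # Quoted first name : rewritten as is, like \1
            return match.group(1)
        # Unquoted first name : converted to uppercase, like (?2\U\2)
        return match.group(2).upper()
    return SEARCH.sub(replacement, text)
```

Because the quoted alternatives come first, a name such as "Tarzan" is consumed by group 1 before the bare alternative gets a chance to match it.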

Of course, the two following S/R, more complicated, may be used and produce the same replacements :

SEARCH (?<!")(Tarzan|Jane)|(Tarzan|Jane)(?!")

REPLACE \U\1\2

or

SEARCH (?<!")(Tarzan|Jane)|((?1))(?!")

REPLACE \U\1\2

After replacement, we get, in all cases, the new text, below :

JANE said to TARZAN : "Tarzan" is a very strong person, much more than "Jane" is !

"TARZAN and JANE"  or "JANE and TARZAN"
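For comparison, the first look-around variant can also be tried in Python. This is a sketch under assumptions : Python's re module supports neither the subroutine call (?1) of the second variant nor \U, so only the first search regex is used and the uppercase conversion is done in a callback. The name upper_with_lookarounds is hypothetical.

```python
import re

# A first name NOT preceded by a double quote, OR NOT followed by one
SEARCH = re.compile(r'(?<!")(Tarzan|Jane)|(Tarzan|Jane)(?!")')


def upper_with_lookarounds(text):
    # Look-arounds are zero-width, so the whole match is just the first name
    return SEARCH.sub(lambda m: m.group(0).upper(), text)
```

A name is left alone only when it has a double quote on BOTH sides, which is why the names inside "Tarzan and Jane" are still converted.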

If you are new to regular expression concepts and syntax, begin with this article, in the N++ Wiki :

http://docs.notepad-plus-plus.org/index.php/Regular_Expressions

In addition, you’ll find good documentation, about the Boost C++ Regex library, v1.55.0 ( similar to the Perl regular expression syntax; the links below point to the v1.48.0 documentation ), used by Notepad++, since its 6.0 version, at the TWO addresses below :

http://www.boost.org/doc/libs/1_48_0/libs/regex/doc/html/boost_regex/syntax/perl_syntax.html

http://www.boost.org/doc/libs/1_48_0/libs/regex/doc/html/boost_regex/format/boost_format_syntax.html

The FIRST link explains the syntax, of regular expressions, in the SEARCH part
The SECOND link explains the syntax, of regular expressions, in the REPLACEMENT part

You may, also, look for valuable information, on the sites, below :

http://www.regular-expressions.info

http://www.rexegg.com

http://perldoc.perl.org/perlre.html

Be aware that, as any documentation, it may contain some errors ! Anyway, if you detected one, that’s good news : you’re improving ;-))