Regex: Can I delete the content of files that don’t contain some words?
-
Good day, everyone. Just a question: I have these words in many files, but not in all of them. For example:
text text
my baby goes away
text text

I want to delete all the contents of those files that do NOT contain these unique words.
I tried something, but it doesn’t work very well. With the ". matches newline" option checked, I searched for:
^(?!.*\s(my baby goes away)\s).*
Any suggestion?
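To see why this attempt misbehaves, here is a small sketch using Python’s re module (not the Boost engine inside Notepad++). It assumes that ^ matches at every line start, as it does in Notepad++ (emulated with re.MULTILINE), and that the ". matches newline" option corresponds to re.DOTALL. The apply_user_regex name and the sample texts are hypothetical:

```python
import re

# The pattern from the question. MULTILINE: ^ matches at each line start.
# DOTALL: "." also matches line breaks (". matches newline" checked).
USER_PATTERN = re.compile(r"^(?!.*\s(my baby goes away)\s).*", re.MULTILINE | re.DOTALL)


def apply_user_regex(content: str) -> str:
    """Replace every match of the question's pattern with nothing."""
    return USER_PATTERN.sub("", content)


with_phrase = "text text\nmy baby goes away\ntext text\n"
without_phrase = "text text\nother text\n"

# The file WITH the phrase keeps only its first line: from the start of the
# phrase's own line, no *later* whitespace + phrase + whitespace exists, so the
# negative look-ahead succeeds and .* swallows the rest of the file.
print(repr(apply_user_regex(with_phrase)))
# The file WITHOUT the phrase is emptied, as intended.
print(repr(apply_user_regex(without_phrase)))
```

So, under these assumptions, the look-ahead is re-evaluated at each line start, and files that DO contain the words still lose their content from that line onward.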
-
Hello @robin-cruise,
First of all, it would be wise to back up all the files concerned by the Search/Replacement ;-))
Now, if all these files are located in a specific folder:
- Open the Find in Files dialog (Ctrl + Shift + F)
- In the Find what: zone, type
  (?s).*\s(my baby goes away)\s?.*|.+
- In the Replace with: zone, type
  ?1$0
- In the Filters zone, enter
  *.txt
  or else…
- In the Directory zone, specify the folder containing all the concerned files
- If necessary, select the Match case option, if the string to search for must have this exact case
- Select, of course, the Regular expression search mode
- Click on the Replace in Files button
- Please verify, one more time, that the FOUR zones, Find what:, Replace with:, Filters: and Directory:, are correctly filled!
- Click on the Yes button of the Are you sure? dialog

Et voilà!
=> All the contents of the files that do NOT contain the string my baby goes away (preceded by a whitespace character, so not glued to the end of a larger word) are deleted.

Notes:
- The (?s) syntax, at the very beginning of the search regex, ensures that the regex engine considers the dot regex symbol as matching any single character (standard or EOL character)
- Then, the remainder is an alternative between:
  - .*\s(my baby goes away)\s?.* : all the contents of the currently scanned file, when it contains at least one string my baby goes away preceded by a whitespace character. As the first .* is greedy, the LAST string my baby goes away is the one stored as group 1
  - .+ : all the contents of the currently scanned file, when it does NOT contain the string my baby goes away
- In replacement, the syntax ?1$0 (strictly speaking, (?1$0)) is a conditional replacement that means:
  - If group 1 exists (your specific string found), all the contents of the current file are replaced with the entire matched string ($0), that is to say with all the matched contents, so the file is left unchanged!
  - If group 1 does not exist (NO specific string found), no replacement text is written => all the contents of the current file are simply deleted
- A question mark ?, after the final \s syntax, is necessary for the unique case where the string my baby goes away ends the current file, without any final line break!
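As a rough sketch only, the same S/R can be emulated outside Notepad++ with Python’s re module. Python has no Boost conditional format string such as ?1$0, so a replacement callable stands in for it. The names clean_text, clean_folder and conditional_replacement are hypothetical, and clean_folder merely imitates the Directory and Filters zones. The demo also runs the regex without the final ? to show why it matters:

```python
import re
import tempfile
from pathlib import Path

# Search regex from the answer; Python's re also accepts the leading (?s) flag.
SEARCH = re.compile(r"(?s).*\s(my baby goes away)\s?.*|.+")
# Same regex WITHOUT the "?" after the final \s, for comparison.
SEARCH_NO_QMARK = re.compile(r"(?s).*\s(my baby goes away)\s.*|.+")


def conditional_replacement(match: re.Match) -> str:
    """Emulate the Boost format (?1$0): return the whole match ($0)
    when group 1 participated, otherwise return an empty string."""
    return match.group(0) if match.group(1) is not None else ""


def clean_text(content: str, pattern: re.Pattern = SEARCH) -> str:
    """Apply the S/R to the contents of one file."""
    return pattern.sub(conditional_replacement, content)


def clean_folder(folder: Path, glob: str = "*.txt") -> None:
    """Hypothetical stand-in for Find in Files (Directory + Filters zones)."""
    for path in folder.glob(glob):
        text = path.read_text(encoding="utf-8")
        new_text = clean_text(text)
        if new_text != text:  # only rewrite the files that actually change
            path.write_text(new_text, encoding="utf-8")


# Demo on a throw-away folder (back up real files first, as advised above).
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    samples = {
        "keep.txt": "text text\nmy baby goes away\ntext text\n",
        "last.txt": "text text\nmy baby goes away",  # phrase ends the file
        "drop.txt": "text text\nnothing to see\n",
    }
    for name, body in samples.items():
        (folder / name).write_text(body, encoding="utf-8")
    clean_folder(folder)
    results = {name: (folder / name).read_text(encoding="utf-8") for name in samples}

print(results)
# Without the "?", a file ending with the phrase would be emptied by mistake.
print(repr(clean_text(samples["last.txt"], SEARCH_NO_QMARK)))
```

Because the whole file is consumed by a single match, either alternative produces exactly one replacement per file, which is what makes the conditional ?1$0 trick work.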
Best Regards,
guy038
P.S.:
As shown above, it is sometimes easier to use the general template of a list of alternatives:
(NOT This|NOT That|.....)|(This)|(That)......
- All the alternatives to EXCLUDE are rewritten unchanged, with the \1 syntax, in the replacement part
- All the alternatives to INCLUDE are replaced, thanks to each (?#....) syntax (# > 1), in the replacement part, OR deleted if this syntax is absent
Consider, for instance, the original text below:
Jane said to Tarzan : "Tarzan" is a very strong person, much more than "Jane" is ! "Tarzan and Jane" or "Jane and Tarzan"
And suppose that we would like to convert the first names Tarzan and Jane to uppercase, ONLY IF they are NOT surrounded by double quotes!
Then, we could use the simple S/R:
SEARCH
("Tarzan"|"Jane")|(Tarzan)|(Jane)
REPLACE
\1(?2TARZAN)(?3JANE)
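A minimal Python sketch of this S/R, assuming Python’s re module as a stand-in for Boost: the (?2TARZAN)(?3JANE) conditionals have no direct equivalent in Python replacement strings, so the hypothetical replace callable tests which group participated:

```python
import re

TEXT = ('Jane said to Tarzan : "Tarzan" is a very strong person, much more than '
        '"Jane" is ! "Tarzan and Jane" or "Jane and Tarzan"')

# Group 1 = alternatives to EXCLUDE, groups 2 and 3 = alternatives to INCLUDE.
PATTERN = re.compile(r'("Tarzan"|"Jane")|(Tarzan)|(Jane)')


def replace(match: re.Match) -> str:
    r"""Emulate the Boost format \1(?2TARZAN)(?3JANE)."""
    if match.group(1) is not None:  # \1 : write the quoted name back unchanged
        return match.group(1)
    if match.group(2) is not None:  # (?2TARZAN)
        return "TARZAN"
    return "JANE"                   # (?3JANE)


result = PATTERN.sub(replace, TEXT)
print(result)
```

The quoted forms are consumed by the first alternative, so the engine never gets a chance to match the bare name inside them.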
As the replacement action is identical for each first name, we could also use:
SEARCH
("Tarzan"|"Jane")|(Tarzan|Jane)
REPLACE
\1(?2\U\2)
Note that when group 2 is defined, group 1 is NOT defined. So, in replacement, the form \1 stands for an empty string!
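The same idea as a Python sketch, again only an approximation of the Boost behaviour: Python’s re has no \U case conversion in replacement strings, so the hypothetical replace callable uppercases group 2 itself and treats an unset group 1 as an empty string:

```python
import re

TEXT = ('Jane said to Tarzan : "Tarzan" is a very strong person, much more than '
        '"Jane" is ! "Tarzan and Jane" or "Jane and Tarzan"')

# Group 1 = quoted names to EXCLUDE, group 2 = bare names to INCLUDE.
PATTERN = re.compile(r'("Tarzan"|"Jane")|(Tarzan|Jane)')


def replace(match: re.Match) -> str:
    r"""Emulate the Boost format \1(?2\U\2)."""
    # \1 : an unset group 1 contributes an empty string.
    kept = match.group(1) or ""
    # (?2\U\2) : only when group 2 is set, output it in uppercase.
    converted = match.group(2).upper() if match.group(2) is not None else ""
    return kept + converted


result = PATTERN.sub(replace, TEXT)
print(result)
```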
Of course, the two following, more complicated, S/R may also be used and produce the same replacements:
SEARCH
(?<!")(Tarzan|Jane)|(Tarzan|Jane)(?!")
REPLACE
\U\1\2
or
SEARCH
(?<!")(Tarzan|Jane)|((?1))(?!")
REPLACE
\U\1\2
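Only the first look-around variant is sketched here in Python, because the re module does not support the (?1) subroutine call used by the second one. As before, a callable (hypothetically named to_upper) replaces the \U case conversion, which Python replacement strings lack:

```python
import re

TEXT = ('Jane said to Tarzan : "Tarzan" is a very strong person, much more than '
        '"Jane" is ! "Tarzan and Jane" or "Jane and Tarzan"')

# A name NOT preceded by a double quote, OR a name NOT followed by one.
# A name is skipped only when it is quoted on BOTH sides, e.g. "Tarzan".
PATTERN = re.compile(r'(?<!")(Tarzan|Jane)|(Tarzan|Jane)(?!")')


def to_upper(match: re.Match) -> str:
    r"""Emulate \U\1\2: exactly one of the two groups is set per match."""
    return (match.group(1) or match.group(2)).upper()


result = PATTERN.sub(to_upper, TEXT)
print(result)
```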
After replacement, in all cases, we get the new text below:
JANE said to TARZAN : "Tarzan" is a very strong person, much more than "Jane" is ! "TARZAN and JANE" or "JANE and TARZAN"
For newcomers to regular expression concepts and syntax, begin with this article in the N++ Wiki:
http://docs.notepad-plus-plus.org/index.php/Regular_Expressions
In addition, you’ll find good documentation about the new Boost C++ Regex library, v1.55.0 (similar to the PERL Regular Common Expressions, v1.48.0), used by Notepad++ since version 6.0, at the TWO addresses below:
http://www.boost.org/doc/libs/1_48_0/libs/regex/doc/html/boost_regex/syntax/perl_syntax.html
http://www.boost.org/doc/libs/1_48_0/libs/regex/doc/html/boost_regex/format/boost_format_syntax.html
- The FIRST link explains the syntax of regular expressions in the SEARCH part
- The SECOND link explains the syntax of regular expressions in the REPLACEMENT part
You may also find valuable information on the sites below:
http://www.regular-expressions.info
http://perldoc.perl.org/perlre.html
Be aware that, like any documentation, it may contain some errors! Anyway, if you detect one, that’s good news: you’re improving ;-))