Regex: Can I delete the content of files that don't contain certain words?



  • Good day, everyone. Just a question. I have these words in many files, but not in all of them. For example:

    ++++++++++++++++±
    text text
    my baby goes away
    text text
    ++++++++++++++++±

    I want to delete all the contents of those files that don't contain these unique words.

    I tried something, but it doesn't work too well:

    I checked ". matches newline" and searched for ^(?!.*\s(my baby goes away)\s).*

    Any suggestions?



  • Hello @robin-cruise,

    First of all, it would be better to back up all the files concerned by the Search/Replacement ;-))

    Now, if all these files are located in a specific folder:

    • Open the Find in Files dialog (Ctrl + Shift + F)

    • In the Find what: zone, type (?s).*\s(my baby goes away)\s?.*|.+

    • In the Replace with: zone, type ?1$0

    • In the Filters zone, enter *.txt, or whatever filter matches your files

    • In the Directory zone, specify the folder containing all the files concerned

    • If the string to search for must have this exact case, select the Match case option

    • Select, of course, the Regular expression search mode

    • Click on the Replace in Files button

    • Please verify, one more time, that the FOUR zones, Find what:, Replace with:, Filters: and Directory:, are correctly filled in!

    • Click on the Yes button of the Are you sure? dialog

    Et voilà !

    => All the contents of the files that do NOT contain the string my baby goes away (not embedded in a larger word) are deleted
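    For readers who would rather run a script than use the Find in Files dialog, here is a minimal Python sketch of the same clean-up. It is not what Notepad++ does internally: the function name empty_files_without, the UTF-8 encoding and the whitespace-based test are all assumptions made for illustration. The demo only touches throw-away files in a temporary folder.

```python
import re
import tempfile
from pathlib import Path


def empty_files_without(folder, phrase, file_glob="*.txt"):
    """Empty each matching file that lacks `phrase`; return the emptied names."""
    # Assumption: the phrase counts only when no non-whitespace character
    # touches it on either side (a little different from the regex in the post).
    wanted = re.compile(r"(?<!\S)" + re.escape(phrase) + r"(?!\S)")
    emptied = []
    for path in sorted(Path(folder).glob(file_glob)):
        if not path.is_file():
            continue
        text = path.read_text(encoding="utf-8")  # assumed encoding
        if text and not wanted.search(text):
            path.write_text("", encoding="utf-8")
            emptied.append(path.name)
    return emptied


# Demo on throw-away files, so nothing real is modified.
with tempfile.TemporaryDirectory() as folder:
    Path(folder, "keep.txt").write_text(
        "text text\nmy baby goes away\ntext text\n", encoding="utf-8"
    )
    Path(folder, "drop.txt").write_text(
        "text text\nsomething else\n", encoding="utf-8"
    )
    emptied = empty_files_without(folder, "my baby goes away")
    kept_text = Path(folder, "keep.txt").read_text(encoding="utf-8")
    dropped_text = Path(folder, "drop.txt").read_text(encoding="utf-8")

print(emptied)
```

    As with the dialog, back up the real folder before pointing such a script at it.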

    Notes :

    • The (?s) syntax, at the very beginning of the search regex, ensures that the regex engine treats the dot as matching any single character (standard or EOL character)

    • Then, the remainder is an alternation between:

      • .*\s(my baby goes away)\s?.* : All the contents of the current file, when it contains at least one string my baby goes away preceded by a whitespace character. As the leading .* is greedy, the last string my baby goes away is stored as group 1

      • .+ : All the contents of the current file, when it does NOT contain the string my baby goes away

    • In the replacement, the syntax ?1$0 (strictly, (?1$0)) is a conditional replacement that means:

      • If group 1 exists (your specific string was found), all the contents of the current file are replaced with the entire match ($0), that is to say with themselves, so the file is left unchanged!

      • If group 1 does not exist (NO specific string found), the replacement is empty => All the contents of the current file are simply deleted

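    • The question mark ? after the final \s is necessary for the one case where the string my baby goes away ends the current file, without any final line break! As a side effect, since that \s is optional, nothing prevents the string from being followed directly by other characters. Only the leading \s is enforced, so a file that begins with the string, with nothing before it, is not recognized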
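    To see the mechanics of the notes above outside Notepad++, here is a small Python sketch. Python's re module is not the Boost engine and has no ?1$0 replacement syntax, so the conditional is emulated with a callback (conditional_replace is a name chosen for this example, and the sample strings are made up).

```python
import re

# The search regex from the post; (?s) lets the dot match line breaks too.
SEARCH = re.compile(r"(?s).*\s(my baby goes away)\s?.*|.+")


def conditional_replace(match):
    # Emulates the replacement ?1$0: keep the whole match when group 1
    # took part in it, otherwise replace the match with nothing.
    return match.group(0) if match.group(1) is not None else ""


with_phrase = "++++\ntext text\nmy baby goes away\ntext text\n++++\n"
without_phrase = "++++\ntext text\nsomething else\ntext text\n++++\n"
phrase_at_eof = "text text\nmy baby goes away"  # no final line break

kept = SEARCH.sub(conditional_replace, with_phrase)
emptied = SEARCH.sub(conditional_replace, without_phrase)
kept_at_eof = SEARCH.sub(conditional_replace, phrase_at_eof)

print(repr(kept), repr(emptied), repr(kept_at_eof))
```

    The third sample is the end-of-file case that the optional \s? is there to handle.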

    Best Regards,

    guy038

    P.S :

    As described above, it is sometimes easier to use the general template of a list of alternatives: (NOT This|NOT That|.....)|(This)|(That)......

    • All the alternatives to EXCLUDE are rewritten unchanged, with the syntax \1, in the replacement part

    • All the alternatives to INCLUDE are replaced, thanks to each syntax (?#....) in the replacement part (# > 1), OR deleted if this syntax is absent


    Consider, for instance, the original text, below :

    Jane said to Tarzan : "Tarzan" is a very strong person, much more than "Jane" is !
        
    "Tarzan and Jane"  or "Jane and Tarzan"
    

    And suppose that we would like to convert to uppercase the first names Tarzan and Jane, ONLY IF they are NOT surrounded by double quotes!

    Then, we could use the simple S/R:

    SEARCH ("Tarzan"|"Jane")|(Tarzan)|(Jane)

    REPLACE \1(?2TARZAN)(?3JANE)

    As the replacement action is identical, for each first name, we could also use :

    SEARCH ("Tarzan"|"Jane")|(Tarzan|Jane)

    REPLACE \1(?2\U\2)

    Note that when group 2 is defined, group 1 is NOT defined. Then, in replacement, the form \1 stands for an empty string !
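    The same exclude-then-include idea can be tried in Python, as a rough sketch. Python's re has neither the (?2...) conditional nor \U, so both are emulated in a callback (the name replace_name is an assumption for this example). The text is the sample from this post.

```python
import re

TEXT = (
    'Jane said to Tarzan : "Tarzan" is a very strong person, much more than "Jane" is !\n'
    "\n"
    '"Tarzan and Jane"  or "Jane and Tarzan"\n'
)

# Group 1: quoted names to leave alone. Group 2: bare names to convert.
SEARCH = re.compile(r'("Tarzan"|"Jane")|(Tarzan|Jane)')


def replace_name(match):
    if match.group(1) is not None:
        return match.group(1)       # the \1 part: rewritten unchanged
    return match.group(2).upper()   # the (?2\U\2) part: upper-cased


result = SEARCH.sub(replace_name, TEXT)
print(result)
```

    Putting the quoted forms first matters: at a double quote, the engine tries "Tarzan" and "Jane" before it can reach a bare name.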


    Of course, the two following S/R, more complicated, may be used and produce the same replacements :

    SEARCH (?<!")(Tarzan|Jane)|(Tarzan|Jane)(?!")

    REPLACE \U\1\2

    or

    SEARCH (?<!")(Tarzan|Jane)|((?1))(?!")

    REPLACE \U\1\2


    After replacement, we get, in all cases, the new text, below :

    JANE said to TARZAN : "Tarzan" is a very strong person, much more than "Jane" is !
    
    "TARZAN and JANE"  or "JANE and TARZAN"
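    The first look-around variant can also be checked with a short Python sketch. As far as I know, the second variant cannot, because the (?1) subroutine call is not supported by Python's built-in re module. The \U of the replacement is emulated with a callback, whose name upper_name is made up for this example.

```python
import re

TEXT = (
    'Jane said to Tarzan : "Tarzan" is a very strong person, much more than "Jane" is !\n'
    "\n"
    '"Tarzan and Jane"  or "Jane and Tarzan"\n'
)

# A name matches when it is NOT preceded by a double quote,
# or when it is NOT followed by one.
LOOKAROUND = re.compile(r'(?<!")(Tarzan|Jane)|(Tarzan|Jane)(?!")')


def upper_name(match):
    # Exactly one of the two groups takes part in any match.
    return (match.group(1) or match.group(2)).upper()


lookaround_result = LOOKAROUND.sub(upper_name, TEXT)
print(lookaround_result)
```

    Only a name with a double quote on both sides fails both alternatives, which is why "Tarzan" and "Jane" stay untouched.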
    

    For newcomers to regular expression concepts and syntax, begin with this article in the N++ Wiki:

    http://docs.notepad-plus-plus.org/index.php/Regular_Expressions

    In addition, you'll find good documentation about the Boost C++ Regex library, v1.55.0 (similar to the PERL Regular Common Expressions, v1.48.0), used by Notepad++ since version 6.0, at the TWO addresses below:

    http://www.boost.org/doc/libs/1_48_0/libs/regex/doc/html/boost_regex/syntax/perl_syntax.html

    http://www.boost.org/doc/libs/1_48_0/libs/regex/doc/html/boost_regex/format/boost_format_syntax.html

    • The FIRST link explains the syntax, of regular expressions, in the SEARCH part

    • The SECOND link explains the syntax, of regular expressions, in the REPLACEMENT part


    You may also look for valuable information on the sites below:

    http://www.regular-expressions.info

    http://www.rexegg.com

    http://perldoc.perl.org/perlre.html

    Be aware that, like any documentation, it may contain some errors! Anyway, if you detect one, that's good news: you're improving ;-))

