    Regex: Can I delete the contents of files that don't contain certain words?

    • Robin Cruise

      Good day, everyone. Just a question: I have these words in many files, but not in all of them. For example:

      ++++++++++++++++±
      text text
      my baby goes away
      text text
      ++++++++++++++++±

      I want to delete the entire contents of the files that don’t contain these exact words.

      I tried something, but it doesn’t work very well:

      I checked . matches newline and searched for ^(?!.*\s(my baby goes away)\s).*

      Any suggestion?
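To see why this attempt misbehaves, here is a small Python sketch. Python's re module is only a stand-in for the Boost engine used by Notepad++ (an assumption: both treat these constructs the same way), and ^ is assumed to match at the start of every line, as it does in Notepad++. In a file that contains the phrase, the lookahead fails at the first line, but it succeeds at the start of the phrase's own line, because no whitespace precedes the phrase there, so .* then deletes that line and everything after it.

```python
import re

# The asker's regex. Assumptions: ". matches newline" is modeled by re.DOTALL,
# and ^ matching at every line start (as in Notepad++) by re.MULTILINE.
ATTEMPT = re.compile(r"^(?!.*\s(my baby goes away)\s).*", re.MULTILINE | re.DOTALL)


def apply_attempt(text: str) -> str:
    """Replace every match of the attempted regex with nothing."""
    return ATTEMPT.sub("", text)


with_phrase = "text text\nmy baby goes away\ntext text\n"
without_phrase = "text text\nsomething else\ntext text\n"

print(repr(apply_attempt(with_phrase)))     # 'text text\n'  (phrase line and the rest deleted)
print(repr(apply_attempt(without_phrase)))  # ''  (emptied, as wanted)
```

So the files that should be emptied are emptied, but the files that should be kept lose their end. A negative lookahead tested at each line start cannot express "the whole file lacks the phrase"; guy038's reply works on the whole file contents instead.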

      • guy038

        Hello @robin-cruise,

        First of all, it’s a good idea to back up all the files concerned by the search/replacement ;-))

        Now, if all these files are located in a specific folder:

        • Open the Find in Files dialog (Ctrl + Shift + F)

        • In the Find what: zone, type (?s).*\s(my baby goes away)\s?.*|.+

        • In the Replace with: zone, type ?1$0

        • In the Filters: zone, enter *.txt (or any other filter you need)

        • In the Directory: zone, specify the folder containing all the files concerned

        • If the string must match with this exact case, select the Match case option

        • Select, of course, the Regular expression search mode

        • Click the Replace in Files button

        • Check once more that the FOUR zones, Find what:, Replace with:, Filters: and Directory:, are correctly filled!

        • Click the Yes button of the Are you sure? dialog

        Et voilà !

        => The entire contents of every file that does NOT contain the string my baby goes away (not glued to a preceding word) are deleted
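For readers who want to dry-run the logic before touching real files, the Find in Files steps above can be sketched in Python. This is an illustrative emulation, not Notepad++ itself: Python's re has no conditional replacement syntax, so ?1$0 is emulated with a replacement function, and replace_in_files, conditional_replace and the demo file names are hypothetical. The sketch only scans the top level of the folder and assumes UTF-8 files; as said above, work on a backup.

```python
import re
import tempfile
from pathlib import Path

# guy038's search regex, unchanged; Python's re also accepts the inline (?s) flag.
SEARCH = re.compile(r"(?s).*\s(my baby goes away)\s?.*|.+")


def conditional_replace(match: re.Match) -> str:
    """Emulate the Boost replacement ?1$0 (Python's re has no conditional syntax)."""
    # Group 1 set: write the whole match ($0) back. Otherwise: write nothing.
    return match.group(0) if match.group(1) is not None else ""


def replace_in_files(directory: str, pattern: str = "*.txt") -> list[str]:
    """Apply the S/R to each file matching `pattern` directly inside `directory`.

    Sub-folders are not visited. Returns the names of the files that were emptied.
    """
    emptied = []
    for path in sorted(Path(directory).glob(pattern)):
        # newline="" keeps the original line endings (\r\n or \n) untouched.
        with open(path, encoding="utf-8", newline="") as f:
            text = f.read()
        new_text = SEARCH.sub(conditional_replace, text)
        if new_text != text:
            with open(path, "w", encoding="utf-8", newline="") as f:
                f.write(new_text)
            if new_text == "":
                emptied.append(path.name)
    return emptied


# Demo on throwaway files in a temporary folder (file names are hypothetical).
with tempfile.TemporaryDirectory() as folder:
    Path(folder, "keep.txt").write_text("text text\nmy baby goes away\ntext text\n", encoding="utf-8")
    Path(folder, "drop.txt").write_text("text text\nnothing to see\n", encoding="utf-8")
    print(replace_in_files(folder))  # ['drop.txt']
```

Returning the list of emptied files makes it easy to review what happened before the backup is discarded.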

        Notes:

        • The (?s) modifier, at the very beginning of the search regex, makes the regex engine treat the dot as matching any single character (standard or EOL character)

        • Then, the rest of the regex is an alternative between:

          • .*\s(my baby goes away)\s?.* : all the contents of the current file, provided it contains at least one string my baby goes away preceded by a whitespace character. As the first .* is greedy, the LAST such string is the one stored as group 1

          • .+ : all the contents of the current file, when it does NOT contain the string my baby goes away

        • In the replacement, the syntax ?1$0, or more strictly (?1$0), is a conditional replacement which means:

          • If group 1 is defined (your specific string was found), all the contents of the current file are replaced with the entire match ($0), that is to say, rewritten unchanged!

          • If group 1 is not defined (your specific string was NOT found), nothing is written back => all the contents of the current file are simply deleted

        • The question mark ? after the final \s is necessary for the one case where the string my baby goes away ends the current file without any final line break. As that \s is then optional, only the whitespace BEFORE the string is actually enforced!
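The notes above can be checked on a few edge cases with another Python sketch (again assuming Python's re behaves like Boost for these constructs). SEARCH is the regex of the answer; SEARCH_B is a hypothetical variant, not part of the original answer, that uses \b word boundaries instead of \s on both sides.

```python
import re

# Regex from the answer above.
SEARCH = re.compile(r"(?s).*\s(my baby goes away)\s?.*|.+")
# Hypothetical variant: word boundaries (\b) instead of whitespace on both sides.
SEARCH_B = re.compile(r"(?s).*\b(my baby goes away)\b.*|.+")


def keep_or_empty(regex: re.Pattern, text: str) -> str:
    """Apply the ?1$0 logic: keep the text if group 1 matched, else empty it."""
    return regex.sub(lambda m: m.group(0) if m.group(1) is not None else "", text)


cases = {
    "phrase ends the file": "text\nmy baby goes away",
    "phrase starts the file": "my baby goes away\ntext\n",
    "phrase glued on the right": "text my baby goes awayXYZ\n",
}
for name, text in cases.items():
    kept = keep_or_empty(SEARCH, text) == text
    kept_b = keep_or_empty(SEARCH_B, text) == text
    print(f"{name}: kept={kept}, kept with SEARCH_B={kept_b}")
# phrase ends the file: kept=True, kept with SEARCH_B=True
# phrase starts the file: kept=False, kept with SEARCH_B=True
# phrase glued on the right: kept=True, kept with SEARCH_B=False
```

The second case is the one to watch: a file that STARTS with my baby goes away has no whitespace before the string, so the first alternative fails, .+ matches instead and that file is emptied. The third case shows that, since the final \s is optional, my baby goes awayXYZ still counts as found. If either case can occur in your files, the \b variant may suit better, but test it on copies first.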

        Best Regards,

        guy038

        P.S.:

        As shown above, it’s sometimes easier to use the general template of a list of alternatives: (NOT This|NOT That|.....)|(This)|(That)......

        • All the alternatives to EXCLUDE are rewritten unchanged, with the syntax \1, in the replacement part

        • All the alternatives to INCLUDE are replaced, thanks to a (?#....) syntax in the replacement part (where # is the group number, # > 1), OR deleted if this syntax is absent


        Consider, for instance, the original text below:

        Jane said to Tarzan : "Tarzan" is a very strong person, much more than "Jane" is !
            
        "Tarzan and Jane"  or "Jane and Tarzan"
        

        And suppose that we would like to convert the first names Tarzan and Jane to uppercase, ONLY IF they are NOT surrounded by double quotes!

        Then, we could use the simple S/R :

        SEARCH ("Tarzan"|"Jane")|(Tarzan)|(Jane)

        REPLACE \1(?2TARZAN)(?3JANE)
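Python's re module has no conditional replacement syntax, so the following sketch emulates \1(?2TARZAN)(?3JANE) with a replacement function. It is only meant to show the exclude/include logic of the template, not how Notepad++ processes it internally.

```python
import re

TEXT = ('Jane said to Tarzan : "Tarzan" is a very strong person, much more than "Jane" is !\n'
        '"Tarzan and Jane"  or "Jane and Tarzan"\n')

# Group 1 = alternatives to EXCLUDE, groups 2 and 3 = alternatives to INCLUDE.
SEARCH = re.compile(r'("Tarzan"|"Jane")|(Tarzan)|(Jane)')


def replace(m: re.Match) -> str:
    r"""Emulate \1(?2TARZAN)(?3JANE); an unset group contributes nothing."""
    out = m.group(1) or ""          # \1: the quoted name, rewritten as-is
    if m.group(2) is not None:      # (?2TARZAN)
        out += "TARZAN"
    if m.group(3) is not None:      # (?3JANE)
        out += "JANE"
    return out


print(SEARCH.sub(replace, TEXT))
```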

        As the replacement action is identical for each first name, we could also use:

        SEARCH ("Tarzan"|"Jane")|(Tarzan|Jane)

        REPLACE \1(?2\U\2)

        Note that when group 2 is defined, group 1 is NOT defined. So, in the replacement, the \1 syntax stands for an empty string!
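The same kind of Python sketch for this second S/R. Python's replacement strings have no \U case conversion, so str.upper() plays that role here (an emulation of the idea, not Boost's behavior verbatim).

```python
import re

TEXT = ('Jane said to Tarzan : "Tarzan" is a very strong person, much more than "Jane" is !\n'
        '"Tarzan and Jane"  or "Jane and Tarzan"\n')

# Group 1 = quoted names to keep, group 2 = bare names to uppercase.
SEARCH = re.compile(r'("Tarzan"|"Jane")|(Tarzan|Jane)')


def replace(m: re.Match) -> str:
    r"""Emulate \1(?2\U\2): only one of the two groups is ever set."""
    kept = m.group(1) or ""  # \1 is an empty string when group 1 is unset
    upper = m.group(2).upper() if m.group(2) is not None else ""  # (?2\U\2)
    return kept + upper


print(SEARCH.sub(replace, TEXT))
```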


        Of course, the two following, more complicated, S/R may also be used and produce the same replacements:

        SEARCH (?<!")(Tarzan|Jane)|(Tarzan|Jane)(?!")

        REPLACE \U\1\2

        or

        SEARCH (?<!")(Tarzan|Jane)|((?1))(?!")

        REPLACE \U\1\2
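The first of these two lookaround S/R can also be tried in Python, whose re module supports the fixed-width lookbehind (?<!") and the lookahead (?!"). The second one relies on the (?1) subroutine call, which Python's built-in re module does not support, so it is not reproduced here. \U\1\2 is emulated by uppercasing the concatenation of the two groups, an unset group counting as empty.

```python
import re

TEXT = ('Jane said to Tarzan : "Tarzan" is a very strong person, much more than "Jane" is !\n'
        '"Tarzan and Jane"  or "Jane and Tarzan"\n')

# Alternative 1: a name NOT preceded by a double quote.
# Alternative 2: a name NOT followed by a double quote.
SEARCH = re.compile(r'(?<!")(Tarzan|Jane)|(Tarzan|Jane)(?!")')


def replace(m: re.Match) -> str:
    r"""Emulate \U\1\2: uppercase both groups, an unset group being empty."""
    return ((m.group(1) or "") + (m.group(2) or "")).upper()


print(SEARCH.sub(replace, TEXT))
```

A name like "Tarzan" inside quotes fails both alternatives (preceded AND followed by a quote), which is exactly why it survives unchanged.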


        After replacement, we get, in all cases, the new text below:

        JANE said to TARZAN : "Tarzan" is a very strong person, much more than "Jane" is !
        
        "TARZAN and JANE"  or "JANE and TARZAN"
        

        For newcomers to the concept and syntax of regular expressions, begin with this article in the N++ Wiki:

        http://docs.notepad-plus-plus.org/index.php/Regular_Expressions

        In addition, you’ll find good documentation about the new Boost C++ Regex library, v1.55.0 (similar to the Perl-style regular expressions documented for v1.48.0), used by Notepad++ since its 6.0 version, at the TWO addresses below:

        http://www.boost.org/doc/libs/1_48_0/libs/regex/doc/html/boost_regex/syntax/perl_syntax.html

        http://www.boost.org/doc/libs/1_48_0/libs/regex/doc/html/boost_regex/format/boost_format_syntax.html

        • The FIRST link explains the syntax of regular expressions in the SEARCH part

        • The SECOND link explains the syntax of the format strings used in the REPLACEMENT part


        You may also find valuable information on the sites below:

        http://www.regular-expressions.info

        http://www.rexegg.com

        http://perldoc.perl.org/perlre.html

        Be aware that, like any documentation, it may contain some errors! Anyway, if you detect one, that’s good news: you’re improving ;-))

        The Community of users of the Notepad++ text editor.