    Regex: Can I Delete the content of files that doesn't have some words?

    • Robin Cruise

      Good day, everyone. Just a question. I have these words in many files, but not in all of them. For example:

      ++++++++++++++++±
      text text
      my baby goes away
      text text
      ++++++++++++++++±

      I want to delete the entire contents of every file that doesn’t contain these unique words.

      I tried something, but it doesn’t work too well:

      I checked ". matches newline" and searched for ^(?!.*\s(my baby goes away)\s).*

      Any suggestions?

      • guy038

        Hello @robin-cruise,

        First of all, it would be better to back up all the files concerned by the Search/Replacement ;-))

        Now, if all these files are located in a specific folder :

        • Open the Find in Files dialog ( Ctrl + Shift + F )

        • In the Find what: zone, type (?s).*\s(my baby goes away)\s?.*|.+

        • In the Replace with: zone, type ?1$0

        • In the Filters zone, enter *.txt or any other file pattern

        • In the Directory zone, specify the folder containing all the files concerned

        • If the string to search for must have this exact case, select the Match case option

        • Select, of course, the Regular expression search mode

        • Click on the Replace in Files button

        • Please verify, one more time, that the FOUR zones, Find what:, Replace with:, Filters: and Directory:, are correctly filled !

        • Click on the Yes button of the Are you sure? dialog

        Et voilà !

        => The contents of every file that does NOT contain the string my baby goes away ( not embedded in a larger word ) are deleted
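For readers who would rather script the same operation outside Notepad++, here is a minimal Python sketch of the idea, not of the Find in Files feature itself. The function name `blank_files_without_phrase`, its parameters and the UTF-8 assumption are all hypothetical. The check reuses only the `\s` + phrase part of the search regex, and the default is a dry run, so nothing is written until you ask for it.

```python
import re
from pathlib import Path


def blank_files_without_phrase(folder, phrase, glob="*.txt", dry_run=True):
    """Empty every file matching `glob` in `folder` whose text lacks `phrase`.

    Returns the files that were (or, in a dry run, would be) emptied.
    Hypothetical helper: files are assumed to be UTF-8 text.
    """
    # Same condition as the search regex: the phrase preceded by whitespace.
    wanted = re.compile(r"\s" + re.escape(phrase))
    emptied = []
    for path in sorted(Path(folder).glob(glob)):
        if not path.is_file():
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        # Skip files that are already empty, and files that contain the phrase.
        if text and wanted.search(text) is None:
            emptied.append(path)
            if not dry_run:
                path.write_text("", encoding="utf-8")
    return emptied
```

A call such as `blank_files_without_phrase("some_folder", "my baby goes away")` only lists the candidates; pass `dry_run=False` once the list looks right, and keep the backup advice above in mind.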

        Notes :

        • The (?s) syntax, at the very beginning of the search regex, ensures that the regex engine treats the dot as matching any single character ( standard or EOL character )

        • Then, the remainder is an alternation between :

          • .*\s(my baby goes away)\s?.* : All the contents of the current file, when it contains at least one string my baby goes away, not glued into a larger expression. As the first .* is greedy, it is the last string my baby goes away that is stored as group 1

          • .+ : All the contents of the current file, when it does NOT contain the string my baby goes away

        • In replacement, the syntax ?1$0, strictly (?1$0), is a conditional replacement that means :

          • If group 1 exists ( your specific string was found ), all the contents of the current file are replaced with the entire match ( $0 ), that is to say, with themselves => The file is left unchanged !

          • If group 1 does not exist ( NO specific string found ), the conditional part produces nothing => All the contents of the current file are, simply, deleted

        • The question mark ? after the final \s is necessary for the single case where the string my baby goes away ends the current file, without any final line break !
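The notes above can be tried out on plain strings with a small Python sketch. Python's `re` module has no conditional replacement like `?1$0`, so it is emulated here with a replacement function; the names `SEARCH`, `conditional_replace` and `filter_text` are illustrative only, and this mimics the logic rather than the Boost engine used by Notepad++.

```python
import re

# Same search regex as above; (?s) lets the dot match line breaks too.
SEARCH = re.compile(r"(?s).*\s(my baby goes away)\s?.*|.+")


def conditional_replace(match):
    # Emulates ?1$0 : keep the whole match when group 1 took part in it,
    # otherwise replace the match with nothing.
    return match.group(0) if match.group(1) is not None else ""


def filter_text(text):
    """Return `text` unchanged if it contains the phrase, else an empty string."""
    return SEARCH.sub(conditional_replace, text)
```

In this sketch, a phrase sitting at the very start of the text, with no whitespace before it, is not recognized, because the pattern requires `\s` in front of it. Since the trailing `\s?` is optional, the pattern places no constraint on what follows the phrase.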

        Best Regards,

        guy038

        P.S :

        As described above, it’s sometimes easier to use the general template of a list of alternatives : (NOT This|NOT That|.....)|(This)|(That)......

        • All the alternatives to EXCLUDE are rewritten unchanged, with the syntax \1, in the replacement part

        • All the alternatives to INCLUDE are replaced, thanks to each syntax (?#....), in the replacement part ( where # is the group number, > 1 ) OR deleted if this syntax is absent


        Consider, for instance, the original text, below :

        Jane said to Tarzan : "Tarzan" is a very strong person, much more than "Jane" is !
            
        "Tarzan and Jane"  or "Jane and Tarzan"
        

        And suppose that we would like to convert to uppercase the first names Tarzan and Jane, ONLY IF they are NOT surrounded by double quotes !

        Then, we could use the simple S/R :

        SEARCH ("Tarzan"|"Jane")|(Tarzan)|(Jane)

        REPLACE \1(?2TARZAN)(?3JANE)

        As the replacement action is identical, for each first name, we could also use :

        SEARCH ("Tarzan"|"Jane")|(Tarzan|Jane)

        REPLACE \1(?2\U\2)

        Note that when group 2 is defined, group 1 is NOT defined. Then, in replacement, the form \1 stands for an empty string !
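As a rough illustration of this exclude/include template outside Notepad++, the second S/R can be mimicked in Python. The `re` module supports neither `\U` nor the `(?2...)` conditional, so a replacement function stands in for `\1(?2\U\2)`; the function name `uppercase_unquoted` is made up for this sketch.

```python
import re

# Group 1 : the quoted names to EXCLUDE ; group 2 : the bare names to INCLUDE.
PATTERN = re.compile(r'("Tarzan"|"Jane")|(Tarzan|Jane)')


def uppercase_unquoted(text):
    """Uppercase Tarzan / Jane unless the name is directly wrapped in double quotes."""
    def replace(match):
        if match.group(1) is not None:
            # Excluded alternative : rewritten as is ( the \1 part ).
            return match.group(1)
        # Included alternative : uppercased ( the (?2\U\2) part ).
        return match.group(2).upper()
    return PATTERN.sub(replace, text)
```

The quoted alternative comes first in the alternation, so at a double quote the engine tries "Tarzan" or "Jane" as a whole before it can reach the bare name inside.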


        Of course, the two following S/R, which are more complicated, may also be used and produce the same replacements :

        SEARCH (?<!")(Tarzan|Jane)|(Tarzan|Jane)(?!")

        REPLACE \U\1\2

        or

        SEARCH (?<!")(Tarzan|Jane)|((?1))(?!")

        REPLACE \U\1\2
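The first of these two lookaround variants can also be checked in Python, as a sketch only. Python's `re` module does not support the `(?1)` subroutine call of the second variant, nor `\U`, so only the first search regex is reproduced and the uppercasing is done in a replacement function. The name `uppercase_with_lookarounds` is hypothetical.

```python
import re

# A name NOT preceded by a double quote, OR a name NOT followed by one.
LOOKAROUND = re.compile(r'(?<!")(Tarzan|Jane)|(Tarzan|Jane)(?!")')


def uppercase_with_lookarounds(text):
    """Uppercase Tarzan / Jane unless a double quote sits on BOTH sides of the name."""
    def replace(match):
        # Exactly one of the two groups is set; \U\1\2 uppercases that one.
        return (match.group(1) or match.group(2)).upper()
    return LOOKAROUND.sub(replace, text)
```

A name is skipped only when both alternatives fail, which is exactly the case where a double quote precedes it and another one follows it.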


        After replacement, we get, in all cases, the new text, below :

        JANE said to TARZAN : "Tarzan" is a very strong person, much more than "Jane" is !
        
        "TARZAN and JANE"  or "JANE and TARZAN"
        

        For newcomers to regular expression concepts and syntax, begin with this article in the N++ Wiki :

        http://docs.notepad-plus-plus.org/index.php/Regular_Expressions

        In addition, you’ll find good documentation about the new Boost C++ Regex library, v1.55.0 ( similar to the PERL Regular Common Expressions, v1.48.0 ), used by Notepad++ since its 6.0 version, at the TWO addresses below :

        http://www.boost.org/doc/libs/1_48_0/libs/regex/doc/html/boost_regex/syntax/perl_syntax.html

        http://www.boost.org/doc/libs/1_48_0/libs/regex/doc/html/boost_regex/format/boost_format_syntax.html

        • The FIRST link explains the syntax of regular expressions in the SEARCH part

        • The SECOND link explains the syntax of regular expressions in the REPLACEMENT part


        You may also look for valuable information on the sites below :

        http://www.regular-expressions.info

        http://www.rexegg.com

        http://perldoc.perl.org/perlre.html

        Be aware that, like any documentation, these sources may contain some errors ! Anyway, if you detect one, that’s good news : you’re improving ;-))
