    Delete the entire content of all files with less than 100 words

    • guy038G Offline
      guy038
      last edited by guy038

      Hello, @rodica-f and All,

      Oh… Yes ! I was wrong about it ! The correct regex S/R is, of course :

      SEARCH (?s)\A.*<START>[[:space:]]*(?:[^[:space:]]+[[:space:]]+){0,8}[^[:space:]]+[[:space:]]*<FINAL>.*\z|\A.*<START>[[:space:]]+<FINAL>.*\z

      REPLACE Leave EMPTY

      And the general formula for deleting all file contents, if there are fewer than N words between the two boundaries <START> and <FINAL>, becomes :

      SEARCH (?s)\A.*<START>[[:space:]]*(?:[^[:space:]]+[[:space:]]+){0,N-2}[^[:space:]]+[[:space:]]*<FINAL>.*\z|\A.*<START>[[:space:]]+<FINAL>.*\z

      REPLACE Leave EMPTY


      This regex will delete all file contents in all these cases :

      • If there is no non-space char ( 0 words ), and only some space chars => the regex is \A.*<START>[[:space:]]+<FINAL>.*\z ( the part after the | symbol )

      • If there are one or more non-space chars ( one word ), possibly surrounded by space chars => quantifier = 0 and the regex becomes (?s)\A.*<START>[[:space:]]*[^[:space:]]+[[:space:]]*<FINAL>.*\z

      • If there are one or more non-space chars followed by space chars, once, before the last word ( so two words ) => quantifier = 1 and the regex becomes (?s)\A.*<START>[[:space:]]*(?:[^[:space:]]+[[:space:]]+)[^[:space:]]+[[:space:]]*<FINAL>.*\z

      • If there are one or more non-space chars followed by space chars, twice, before the last word ( so three words ) => quantifier = 2 and the regex becomes (?s)\A.*<START>[[:space:]]*(?:[^[:space:]]+[[:space:]]+){2}[^[:space:]]+[[:space:]]*<FINAL>.*\z

      and so on… till :

      • If there are one or more non-space chars followed by space chars, eight times, before the last word ( so nine words ) => quantifier = 8 and the regex becomes (?s)\A.*<START>[[:space:]]*(?:[^[:space:]]+[[:space:]]+){8}[^[:space:]]+[[:space:]]*<FINAL>.*\z
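The cases listed above can be checked outside Notepad++ with a minimal Python sketch. This is a translation and not the Notepad++ regex itself: Python's `re` module has no `[[:space:]]` class, so `\s` and `\S` stand in for `[[:space:]]` and `[^[:space:]]`, and `\Z` stands in for `\z`. The names `build_pattern` and `sample`, and the sample text, are made up for the illustration.

```python
import re

def build_pattern(n: int) -> re.Pattern:
    """Match a whole text holding fewer than n words between <START> and <FINAL>."""
    if n < 2:
        raise ValueError("n must be at least 2, so that {0,n-2} is a valid quantifier")
    words = r"\s*(?:\S+\s+){0,%d}\S+\s*" % (n - 2)  # 1 to n-1 words
    no_word = r"\s+"                                 # 0 words, space chars only
    return re.compile(r"(?s)\A.*<START>(?:%s|%s)<FINAL>.*\Z" % (words, no_word))

def sample(word_count: int) -> str:
    """Made-up test text with word_count words between the two boundaries."""
    words = " ".join("w%d" % i for i in range(word_count))
    return "intro\n<START> %s <FINAL>\noutro\n" % words

pattern = build_pattern(10)  # Python counterpart of the {0,8} regex
for count in range(12):
    # Prints the word count and whether the whole text would be selected.
    print(count, bool(pattern.search(sample(count))))
```

With N = 10, the counts 0 to 9 should print True and the counts 10 and 11 should print False, which corresponds to "fewer than 10 words".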

      Now, to answer your question, I would say :

      SEARCH (?s)\A.*BSR(FR)ESR.*\z

      where FR = [[:space:]]*(?:[^[:space:]]+[[:space:]]+){0,N-2}[^[:space:]]+[[:space:]]*    OR    FR = [[:space:]]+ ( case no word )
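As a rough sketch of how the BSR(FR)ESR template and the empty replacement fit together, here is a Python version. BSR and ESR are read here as placeholders for the start and end boundary regexes, which is an assumption since the post does not spell them out. The helper name `make_regex` and the sample texts are hypothetical, and `\s`, `\S` and `\Z` again replace `[[:space:]]`, `[^[:space:]]` and `\z`, which Python's `re` module lacks.

```python
import re

def make_regex(bsr: str, esr: str, n: int) -> re.Pattern:
    """Assemble (?s)\\A.*BSR(FR)ESR.*\\Z for 'fewer than n words' (n >= 2 assumed)."""
    fr_words = r"\s*(?:\S+\s+){0,%d}\S+\s*" % (n - 2)  # FR, case of 1 to n-1 words
    fr_empty = r"\s+"                                   # FR, case of no word
    return re.compile(r"(?s)\A.*%s(?:%s|%s)%s.*\Z" % (bsr, fr_words, fr_empty, esr))

# Literal boundaries are escaped so that they are not read as regex syntax.
regex = make_regex(re.escape("<START>"), re.escape("<FINAL>"), 4)

short = "a\n<START> one two <FINAL>\nb"            # 2 words: fewer than 4
long_ = "a\n<START> one two three four <FINAL>\nb"  # 4 words: not fewer than 4

print(repr(regex.sub("", short)))  # whole text replaced with nothing
print(repr(regex.sub("", long_)))  # text left untouched
```

Replacing with an empty string mirrors the "REPLACE Leave EMPTY" setting: the match either covers the whole text or does not occur at all.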

      Best Regards,

      guy038

      • rodica FR Offline
        rodica F @guy038
        last edited by

        @guy038 thank you very much !

        • rodica FR Offline
          rodica F @rodica F
          last edited by

          @rodica-f

          Delete the entire content of all files with less than 6 words

          FIND:
          \A(?i)[^\w+]*(?:[\w*]+[^\w*]+){0,5}(?:[\w*]+[^\w+]*)?\z

          REPLACE: (LEAVE EMPTY)

          • guy038G Offline
            guy038
            last edited by guy038

            Hi, @rodica-f and All,

            I’m sorry to tell you that your last regex does not exactly follow the previous rules and is rather erroneous !

            First, and just anecdotal, the (?i) modifier is useless as no range of letters occurs in your regex

            Secondly, this regex will delete all file contents if there are fewer than 7 words ( so up to 6 words ), and not fewer than 6 words as announced

            Thirdly, let’s consider this simple phrase :

            let abc - xyz
            

            It contains 4 non-space expressions ( let, abc, - and xyz )

            Your regex seems OK as it correctly selects all the text, which contains fewer than 7 words

            Now, replace the - sign with a + sign :

            let abc + xyz
            

            This time, your regex does not match anything although there are, still, 4 non-space expressions :((


            Why does this behaviour occur ? Well, the different sub-expressions that you used in your regex are erroneous !

            [^\w+]* means “find a char different from a word char and different from the + sign”, repeated from 0 to any

            [\w*]+ means “find a word char or a * symbol”, repeated from 1 to any

            [^\w*]+ means “find a char different from a word char and different from the * symbol”, repeated from 1 to any
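To make the three readings above concrete, here is a small Python check of the character classes alone, without their outer quantifiers. Inside square brackets, `+` and `*` are literal characters and not quantifiers; this holds for Python's `re` module as well as for the syntax used in the thread. The constant names are made up for the illustration.

```python
import re

NOT_WORD_NOT_PLUS = re.compile(r"[^\w+]")  # not a word char and not '+'
WORD_OR_STAR = re.compile(r"[\w*]")        # a word char or '*'
NOT_WORD_NOT_STAR = re.compile(r"[^\w*]")  # not a word char and not '*'

for ch in "a+*- ":
    # One row per test character: does each class accept it ?
    print(repr(ch),
          bool(NOT_WORD_NOT_PLUS.fullmatch(ch)),
          bool(WORD_OR_STAR.fullmatch(ch)),
          bool(NOT_WORD_NOT_STAR.fullmatch(ch)))
```

The row for '+' shows that the first class rejects it while the third accepts it, and the row for '*' shows that the second class treats it like a word char.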

            So, an almost-correct solution would be \A[^\w]*(?:\w+[^\w]+){0,4}(?:\w+[^\w]*)?\z. However, note that it also matches a truly empty file, which does not need any replacement as it is already empty !!
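A quick way to try the almost-correct regex is the following Python sketch. The only change is `\Z` in place of `\z`, because Python's `re` module does not accept `\z` in every version; the test strings are made up.

```python
import re

# The almost-correct "word" regex, with \Z standing in for \z.
WORD_REGEX = re.compile(r"\A[^\w]*(?:\w+[^\w]+){0,4}(?:\w+[^\w]*)?\Z")

print(bool(WORD_REGEX.search("one two three four five")))      # 5 words
print(bool(WORD_REGEX.search("one two three four five six")))  # 6 words
print(bool(WORD_REGEX.search("")))                              # truly empty text
```

The first and third lines should print True and the second False: up to 5 words are accepted, and so is the empty text mentioned above.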


            Now, the important drawback of using word chars \w and non-word chars [^\w] is that any symbol met in the text will increase the number of words ! For instance, see the difference between :

            This is a simple example
            

            and :

            This is a sim-ple example
            

            If I use my last “word” version \A[^\w]*(?:\w+[^\w]+){0,4}(?:\w+[^\w]*)?\z, it matches the text This is a simple example and not the text This is a sim-ple example ! Because, in the former case, it counts 5 words and, in the latter case, it counts 6 words
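The difference between the two ways of counting can be seen with a short Python sketch. It counts runs of word chars ( `\w+` ) and runs of non-space chars ( `\S+`, the Python stand-in for `[^[:space:]]+` ) in the two example sentences; the variable names are made up for the illustration.

```python
import re

for text in ("This is a simple example", "This is a sim-ple example"):
    word_runs = re.findall(r"\w+", text)       # runs of word chars
    non_space_runs = re.findall(r"\S+", text)  # runs of non-space chars
    print(text, "|", len(word_runs), "word runs,", len(non_space_runs), "non-space runs")
```

The hyphen splits sim-ple into two runs of word chars, so the first count goes from 5 to 6, while the count of non-space runs stays at 5 for both sentences.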

            That’s why my previous version and @terry-r’s version, using non-space characters [[:^space:]] and space chars [[:space:]], seem more rigorous and practical ;-))

            Best Regards

            guy038

            • rodica FR Offline
              rodica F @guy038
              last edited by

              @guy038 said in Delete the entire content of all files with less than 100 words:

              \A[^\w]*(?:\w+[^\w]+){0,4}(?:\w+[^\w]*)?\z

              I am glad that, thanks to my regex, a rather good alternative method has been found.

              thank you @guy038

              The Community of users of the Notepad++ text editor.