    Delete the entire content of all files with less than 100 words

    25 Posts 6 Posters 1.4k Views
    • guy038

      Hello, @rodica-f and All,

      Oh… Yes ! I was wrong about it ! The correct regex S/R is, of course :

      SEARCH (?s)\A.*<START>[[:space:]]*(?:[^[:space:]]+[[:space:]]+){0,8}[^[:space:]]+[[:space:]]*<FINAL>.*\z|\A.*<START>[[:space:]]+<FINAL>.*\z

      REPLACE Leave EMPTY

      And the general formula for deleting all file contents, if there are fewer than N words ( with N >= 2 ) between the two boundaries <START> and <FINAL>, becomes :

      SEARCH (?s)\A.*<START>[[:space:]]*(?:[^[:space:]]+[[:space:]]+){0,N-2}[^[:space:]]+[[:space:]]*<FINAL>.*\z|\A.*<START>[[:space:]]+<FINAL>.*\z

      REPLACE Leave EMPTY


      This regex will delete all file contents in all these cases :

      • If there is no non-space char ( 0 word ), and only some space chars => the regex is \A.*<START>[[:space:]]+<FINAL>.*\z ( the part after the | symbol )

      • If there is a single run of non-space chars ( one word ), possibly surrounded with space chars => quantifier = 0 and the regex becomes (?s)\A.*<START>[[:space:]]*[^[:space:]]+[[:space:]]*<FINAL>.*\z

      • If there are two runs of non-space chars, separated by space chars ( so two words ) => quantifier = 1 and the regex becomes (?s)\A.*<START>[[:space:]]*(?:[^[:space:]]+[[:space:]]+)[^[:space:]]+[[:space:]]*<FINAL>.*\z

      • If there are three runs of non-space chars, separated by space chars ( so three words ) => quantifier = 2 and the regex becomes (?s)\A.*<START>[[:space:]]*(?:[^[:space:]]+[[:space:]]+){2}[^[:space:]]+[[:space:]]*<FINAL>.*\z

      and so on… till :

      • If there are nine runs of non-space chars, separated by space chars ( so nine words ) => quantifier = 8 and the regex becomes (?s)\A.*<START>[[:space:]]*(?:[^[:space:]]+[[:space:]]+){8}[^[:space:]]+[[:space:]]*<FINAL>.*\z
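
As a rough illustration of the cases listed above, here is a minimal Python sketch of the general formula. It is only an approximation of what Notepad++ does: Python's re module has no [[:space:]] class, so the sketch assumes that \s, \S and \Z are close enough stand-ins for [[:space:]], [^[:space:]] and \z. The names build_pattern and sample_text are hypothetical helpers, not something from this thread.

```python
import re

def build_pattern(n: int) -> re.Pattern:
    """Full-text pattern that matches when fewer than n words (n >= 2)
    lie between <START> and <FINAL>."""
    few_words = rf"\s*(?:\S+\s+){{0,{n - 2}}}\S+\s*"  # 1 to n-1 words
    no_word = r"\s+"                                   # 0 word, space chars only
    return re.compile(
        rf"(?s)\A.*<START>{few_words}<FINAL>.*\Z"
        rf"|\A.*<START>{no_word}<FINAL>.*\Z"
    )

def sample_text(words: int) -> str:
    """Hypothetical test file with the given number of words in the zone."""
    zone = " ".join(["word"] * words)
    return f"header\n<START> {zone} <FINAL>\nfooter"

if __name__ == "__main__":
    pattern = build_pattern(10)  # N = 10, so the quantifier is {0,8}
    for count in range(12):
        # An empty replacement wipes the whole text when the pattern matches
        emptied = pattern.sub("", sample_text(count)) == ""
        print(count, "words ->", "emptied" if emptied else "kept")
```

With N = 10, the texts holding 0 to 9 words should be reported as emptied and those holding 10 or 11 words as kept.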

      Now, to answer your question, I would say :

      SEARCH (?s)\A.*BSR(FR)ESR.*\z

      where FR = [[:space:]]*(?:[^[:space:]]+[[:space:]]+){0,N-2}[^[:space:]]+[[:space:]]*    OR    FR = [[:space:]]+ ( case no word )
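
A possible way to read this generic form, sketched in Python: it assumes that BSR and ESR stand for the regexes of the zone's start and end boundaries, as their position in the formula suggests, and that \s, \S and \Z can stand in for [[:space:]], [^[:space:]] and \z. The function name compose and the <body> boundaries are made up for the example.

```python
import re

def compose(bsr: str, esr: str, n: int) -> re.Pattern:
    """Assemble (?s)\\A.*BSR(FR)ESR.*\\Z, where FR covers both cases:
    1 to n-1 words, or no word at all (n >= 2)."""
    fr_words = rf"\s*(?:\S+\s+){{0,{n - 2}}}\S+\s*"  # 1 to n-1 words
    fr_empty = r"\s+"                                 # case "no word"
    return re.compile(rf"(?s)\A.*{bsr}(?:{fr_words}|{fr_empty}){esr}.*\Z")

if __name__ == "__main__":
    pattern = compose(r"<body>", r"</body>", 4)  # fewer than 4 words
    for text in (
        "<html><body> one two three </body></html>",
        "<html><body> one two three four </body></html>",
        "<html><body>\n</body></html>",
    ):
        print(repr(text), "->", "match" if pattern.search(text) else "no match")
```

Note that bsr and esr are inserted as regexes, not as literal strings, so any special character in a boundary has to be escaped by the caller.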

      Best Regards,

      guy038

      • rodica F @guy038

        @guy038 thank you very much !

        • rodica F @rodica F

          @rodica-f

          Delete the entire content of all files with less than 6 words

          FIND:
          \A(?i)[^\w+]*(?:[\w*]+[^\w*]+){0,5}(?:[\w*]+[^\w+]*)?\z

          REPLACE: (LEAVE EMPTY)

          • guy038

            Hi, @rodica-f and All,

            I'm sorry to tell you that your last regex does not exactly meet the previous rules and is rather erroneous !

            First, and just anecdotal, the (?i) modifier is useless as no letter or range of letters occurs in your regex

            Secondly, this regex will delete all file contents if they hold from 0 to 6 words ( so fewer than 7 words, not fewer than 6 )

            Thirdly, let’s consider this simple phrase :

            let abc - xyz
            

            It contains 4 non-space expressions ( let, abc, - and xyz )

            Your regex seems OK as it correctly selects all text which contains fewer than 7 words

            Now, remove the - sign and insert a + sign at the very beginning :

            + let abc xyz
            

            This time, your regex does not match anything although there are, still, 4 non-space expressions :((


            Why does this behaviour occur ? Well, the different sub-expressions, that you used in your regex, are erroneous !

            [^\w+]* means “find a char different from a word char and different from the + sign”, repeated from 0 to any

            [\w*]+ means “find a word char or a * symbol”, repeated from 1 to any

            [^\w*]+ means “find a char different from a word char and different from the * symbol”, repeated from 1 to any
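
To make these three readings concrete, here is a small Python sketch that lists which characters of a sample each class accepts. It assumes that Python's \w behaves like the \w of Notepad++ for these ASCII characters; accepted_chars and the sample string are invented for the illustration.

```python
import re

def accepted_chars(char_class: str, sample: str = "a_1 +*-") -> str:
    """Return the characters of sample that the given class matches."""
    return "".join(ch for ch in sample if re.fullmatch(char_class, ch))

if __name__ == "__main__":
    for char_class in (r"[^\w+]", r"[\w*]", r"[^\w*]"):
        print(char_class, "accepts", repr(accepted_chars(char_class)))
```

Inside a character class, + and * are literal symbols and not quantifiers, so they only add or remove one character from the set.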

            So, an almost-correct solution would be \A[^\w]*(?:\w+[^\w]+){0,4}(?:\w+[^\w]*)?\z. However, note that it also matches a true empty file which does not need any replacement as already empty !!
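
A quick way to check the limits of this almost-correct regex is the following Python sketch. It assumes \Z as the Python counterpart of \z; is_short is a hypothetical helper name.

```python
import re

# Same regex as above, with \Z standing in for \z
WORD_VERSION = re.compile(r"\A[^\w]*(?:\w+[^\w]+){0,4}(?:\w+[^\w]*)?\Z")

def is_short(text: str) -> bool:
    """True when the whole text holds from 0 to 5 runs of word chars."""
    return WORD_VERSION.search(text) is not None

if __name__ == "__main__":
    for count in range(8):
        text = " ".join(["word"] * count)
        print(count, "words ->", "match" if is_short(text) else "no match")
```

The count 0 corresponds to the empty text, which matches too, as noted above.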


            Now, the important drawback of using word chars \w and non-word chars [^\w], is that any symbol, met in text, will increase the number of words ! For instance, see the difference between :

            This is a simple example
            

            and :

            This is a sim-ple example
            

            If I use my last “word” version \A[^\w]*(?:\w+[^\w]+){0,4}(?:\w+[^\w]*)?\z, it matches the text This is a simple example and not the text This is a sim-ple example ! Because, in the former case, it counts 5 words and, in the latter case, it counts 6 words

            That’s why my previous and @terry-r’s version, using non-space characters [[:^space:]] and space chars [[:space:]], seems more rigorous and practical ;-))
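
The difference between the two ways of counting can be sketched in Python as follows, assuming \w+ and \S+ as stand-ins for the word-char and non-space approaches. Both function names are hypothetical.

```python
import re

def count_word_runs(text: str) -> int:
    """Count runs of word chars: any symbol inside a word splits it."""
    return len(re.findall(r"\w+", text))

def count_non_space_runs(text: str) -> int:
    """Count runs of non-space chars: only space chars separate words."""
    return len(re.findall(r"\S+", text))

if __name__ == "__main__":
    for text in ("This is a simple example", "This is a sim-ple example"):
        print(repr(text), count_word_runs(text), count_non_space_runs(text))
```

For the second text, the word-char count rises to 6 while the non-space count stays at 5.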

            Best Regards

            guy038

              • rodica F @guy038

              @guy038 said in Delete the entire content of all files with less than 100 words:

              \A[^\w]*(?:\w+[^\w]+){0,4}(?:\w+[^\w]*)?\z

              I am glad that, thanks to my regex, a quite good alternative method has been discovered.

              thank you @guy038
