    Delete the entire content of all files with less than 100 words

    25 Posts 6 Posters 1.4k Views
    • Paul WormerP
      Paul Wormer @rodica F
      last edited by

      @rodica-f
      Npp user manual

      • rodica FR
        rodica F @guy038
        last edited by rodica F

        @guy038 said in Delete the entire content of all files with less than 100 words:
        (?s)\A[[:space:]]*(?:[^[:space:]]+[[:space:]]+){0,98}[^[:space:]]+[[:space:]]*\z|\A[[:space:]]+\z

        One more question for @guy038: I want to use one of your GENERIC S/R for this case. I need to delete the content of a file that has fewer than 10 words in the section between <START> and <FINAL>

        <START>
        
        The first, thing to note when
        
        <FINAL>
        

        So, I tested all the GENERIC regex formulas you wrote a long time ago.

        BSR = <START>
        ESR = <FINAL>
        FR = (?s)\A[[:space:]]*(?:[^[:space:]]+[[:space:]]+){0,10}[^[:space:]]+[[:space:]]*\z|\A[[:space:]]+\z

        REGEX:

        (?-si:BSR|(?!\A)\G)(?s-i:(?!ESR).)*?\x20\K(FR)

        (?-si:BSR|(?!\A)\G)(?s-i:(?!ESR).)*?\x20\KFR(?=\x20)

        (?-si:BSR|(?!\A)\G)(?s-i:(?!ESR).)*?\x20\KFR

        (?-si:BSR|(?!\A)\G)(?s-i:(?!ESR).)*?\x20\KFR(?=\x20)

        (?-i:BSR|\G(?!^))(?s:(?!ESR).)*?\K(?-i:FR)

        (?-i:BSR|(?!\A)\G)(?s:(?!ESR).)*?\K(?-i:FR)

        (?-i:BSR|(?!^)\G)(?s:(?!ESR).)*?\K(?-i:FR)

        (?-i:BSR|(?!\A)\G)(?s:(?!ESR).)*?\K(?-i:FR)

        It is not working, in any of the cases. I get the same message on F/R: “Cannot find the text…”

        • guy038G
          guy038
          last edited by guy038

          Hi, @rodica-f and All,

          EDIT : The regexes, below, are incomplete. See the correct solution in my next post

          You do not need to use these generic regexes at all !

          Simply replace \A with <START> and \z with <FINAL> and, of course, change the upper bound of the quantifier of the non-capturing group from 98 to 8, giving the functional regex S/R below :

          SEARCH (?s)<START>[[:space:]]*(?:[^[:space:]]+[[:space:]]+){0,8}[^[:space:]]+[[:space:]]*<FINAL>|<START>[[:space:]]+<FINAL>

          REPLACE Leave EMPTY


          So, the general formula for deleting all file contents, if there are less than N words between the two boundaries <START> and <FINAL>, is :

          SEARCH (?s)<START>[[:space:]]*(?:[^[:space:]]+[[:space:]]+){0,N-2}[^[:space:]]+[[:space:]]*<FINAL>|<START>[[:space:]]+<FINAL>

          REPLACE Leave EMPTY

          BR

          guy038

          • rodica FR
            rodica F @guy038
            last edited by

            @guy038 correct me if I’m wrong. The GENERIC formula in this case will be:

            (?s)BSR(FR)*ESR|BSR+ESR

            I think I’m wrong somewhere.

            • rodica FR
              rodica F @rodica F
              last edited by

              @guy038 by the way, I tested the generic formula you wrote for me.

              (?s)<START>[[:space:]]*(?:[^[:space:]]+[[:space:]]+){0,8}[^[:space:]]+[[:space:]]*<FINAL>|<START>[[:space:]]+<FINAL>

              In the context below, it deletes only what is framed by <START> and <FINAL>.

              It does not delete the entire file, I mean the other words around it.

              blah blah     blah
              
              
              <START>
              
              The first, thing to note when
              
              <FINAL>
              
                 blah blah
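
The behaviour reported above can be reproduced outside Notepad++. This is only a rough Python sketch, under two assumptions: Python's `re` has no POSIX classes, so `\s` and `\S` stand in for `[[:space:]]` and `[^[:space:]]`, and the names `BLOCK_ONLY` and `sample` are made up for the illustration.

```python
import re

# Approximation of the first S/R: \s and \S replace the POSIX classes,
# which Python's re module does not support.
BLOCK_ONLY = re.compile(
    r"(?s)<START>\s*(?:\S+\s+){0,8}\S+\s*<FINAL>|<START>\s+<FINAL>"
)

# Same layout as the sample text in the post above.
sample = (
    "blah blah     blah\n\n\n"
    "<START>\n\nThe first, thing to note when\n\n<FINAL>\n\n"
    "   blah blah\n"
)

# Only the <START>...<FINAL> block is removed; the surrounding text stays.
result = BLOCK_ONLY.sub("", sample)
print(repr(result))
```

Because the pattern is not anchored to the start and end of the file, the replacement can only ever remove the delimited block itself.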
              
              • guy038G
                guy038
                last edited by guy038

                Hello, @rodica-f and All,

                Oh… Yes ! I was wrong about it ! The correct regex S/R is, of course :

                SEARCH (?s)\A.*<START>[[:space:]]*(?:[^[:space:]]+[[:space:]]+){0,8}[^[:space:]]+[[:space:]]*<FINAL>.*\z|\A.*<START>[[:space:]]+<FINAL>.*\z

                REPLACE Leave EMPTY

                And the general formula for deleting all file contents, if there are less than N words between the two boundaries <START> and <FINAL>, becomes :

                SEARCH (?s)\A.*<START>[[:space:]]*(?:[^[:space:]]+[[:space:]]+){0,N-2}[^[:space:]]+[[:space:]]*<FINAL>.*\z|\A.*<START>[[:space:]]+<FINAL>.*\z

                REPLACE Leave EMPTY


                This regex will delete all file contents in all these cases :

                • If there is no non-space char ( 0 word ), and only some space chars => the regex is \A.*<START>[[:space:]]+<FINAL>.*\z ( the part after the | symbol )

                • If there are several non-space chars ( one word ), possibly surrounded with space chars => quantifier = 0 and the regex becomes (?s)\A.*<START>[[:space:]]*[^[:space:]]+[[:space:]]*<FINAL>.*\z

                • If there are several non-space chars followed by space chars, twice ( so two words ) => quantifier = 1 and the regex becomes (?s)\A.*<START>[[:space:]]*(?:[^[:space:]]+[[:space:]]+)[^[:space:]]+[[:space:]]*<FINAL>.*\z

                • If there are several non-space chars followed by space chars, three times ( so three words ) => quantifier = 2 and the regex becomes (?s)\A.*<START>[[:space:]]*(?:[^[:space:]]+[[:space:]]+){2}[^[:space:]]+[[:space:]]*<FINAL>.*\z

                and so on… till :

                • If there are several non-space chars followed by space chars, nine times ( so nine words ) => quantifier = 8 and the regex becomes (?s)\A.*<START>[[:space:]]*(?:[^[:space:]]+[[:space:]]+){8}[^[:space:]]+[[:space:]]*<FINAL>.*\z

                Now, to answer your question, I would say :

                SEARCH (?s)\A.*BSR(FR)ESR.*\z

                where FR = [[:space:]]*(?:[^[:space:]]+[[:space:]]+){0,N-2}[^[:space:]]+[[:space:]]*    OR    FR = [[:space:]]+ ( case no word )
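
For readers who want to check the general formula for a given N, here is a minimal Python sketch. It is an approximation, not the Notepad++ regex itself: `\s`, `\S` and `\Z` replace `[[:space:]]`, `[^[:space:]]` and `\z`, and the helper name `whole_file_pattern` and its parameters are hypothetical.

```python
import re


def whole_file_pattern(n: int, start: str = "<START>", final: str = "<FINAL>") -> re.Pattern:
    """Build a pattern that matches the whole text when fewer than n words
    sit between the start and final boundaries (hypothetical helper)."""
    if n < 2:
        raise ValueError("n must be at least 2")
    bsr, esr = re.escape(start), re.escape(final)
    words = rf"\s*(?:\S+\s+){{0,{n - 2}}}\S+\s*"  # 1 to n-1 words
    no_word = r"\s+"                               # only space chars, no word
    return re.compile(rf"(?s)\A.*{bsr}(?:{words}|{no_word}){esr}.*\Z")


pattern = whole_file_pattern(10)

nine_words = "x\n<START> " + " ".join(["w"] * 9) + " <FINAL>\ny"
ten_words = "x\n<START> " + " ".join(["w"] * 10) + " <FINAL>\ny"

print(repr(pattern.sub("", nine_words)))      # whole text is replaced
print(pattern.sub("", ten_words) == ten_words)  # text is left untouched
```

The two FR alternatives are folded into one non-capturing group here, which should be equivalent to writing the full regex twice around the | symbol.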

                Best Regards,

                guy038

                • rodica FR
                  rodica F @guy038
                  last edited by

                  @guy038 thank you very much !

                  • rodica FR
                    rodica F @rodica F
                    last edited by

                    @rodica-f

                    Delete the entire content of all files with less than 6 words

                    FIND:
                    \A(?i)[^\w+]*(?:[\w*]+[^\w*]+){0,5}(?:[\w*]+[^\w+]*)?\z

                    REPLACE: (LEAVE EMPTY)

                    • guy038G
                      guy038
                      last edited by guy038

                      Hi, @rodica-f and All,

                      I'm sorry to tell you that your last regex does not exactly meet the previous rules and is rather erroneous !

                      First, and just anecdotal, the (?i) modifier is useless as no range of letters occurs in your regex

                      Secondly, this regex will delete all file contents if the file holds from 0 to 6 words, i.e. less than 7 words rather than less than 6

                      Thirdly, let’s consider this simple phrase :

                      let abc - xyz
                      

                      It contains 4 non-space expressions ( let, abc, - and xyz )

                      Your regex seems OK as it correctly selects all text which contains less than 7 words

                      Now, replace the - sign with a + sign :

                      let abc + xyz
                      

                      This time, your regex does not match anything although there are, still, 4 non-space expressions :((


                      Why does this behaviour occur ? Well, the different sub-expressions that you used in your regex are erroneous !

                      [^\w+]* means “find a char different from a word char and different from the + sign”, repeated from 0 to any

                      [\w*]+ means “find a word char or a * symbol”, repeated from 1 to any

                      [^\w*]+ means “find a char different from a word char and different from the * symbol”, repeated from 1 to any
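
The reading of these three character classes can be checked with a small Python sketch. It assumes Python's `re` treats `+` and `*` inside `[...]` as literals, as Notepad++ does; the helper `class_accepts` is a made-up name for the illustration.

```python
import re


def class_accepts(char_class: str, ch: str) -> bool:
    """Return True when the single character ch is matched by char_class."""
    return re.fullmatch(char_class, ch) is not None


# Inside [...] the characters + and * are literals, not quantifiers.
print(class_accepts(r"[^\w+]", "-"))  # True: '-' is neither a word char nor '+'
print(class_accepts(r"[^\w+]", "+"))  # False: '+' is excluded by the class
print(class_accepts(r"[\w*]", "*"))   # True: '*' is accepted like a word char
print(class_accepts(r"[^\w*]", "*"))  # False: '*' is excluded by the class
print(class_accepts(r"[^\w*]", "+"))  # True: '+' is not excluded here
```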

                      So, an almost-correct solution would be \A[^\w]*(?:\w+[^\w]+){0,4}(?:\w+[^\w]*)?\z. However, note that it also matches a true empty file which does not need any replacement as already empty !!
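
A quick way to see both properties of this “word” version (at most 5 words, and a match on a truly empty text) is the Python sketch below. It is an approximation: Python's `re` has no `\z`, so `\Z` is used instead, and Python's `\w` may not cover exactly the same characters as in Notepad++. The name `WORD_VERSION` is made up.

```python
import re

# The "word" version from the post, with \Z in place of \z for Python.
WORD_VERSION = re.compile(r"\A[^\w]*(?:\w+[^\w]+){0,4}(?:\w+[^\w]*)?\Z")

samples = [
    "",                             # truly empty text
    "one two three four five",      # 5 words
    "one two three four five six",  # 6 words
]

for text in samples:
    # Empty text and 5 words match; 6 words do not.
    print(repr(text), bool(WORD_VERSION.search(text)))
```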


                      Now, the important drawback of using word chars \w and non-word chars [^\w] is that any symbol met in the text will increase the number of words ! For instance, see the difference between :

                      This is a simple example
                      

                      and :

                      This is a sim-ple example
                      

                      If I use my last “word” version \A[^\w]*(?:\w+[^\w]+){0,4}(?:\w+[^\w]*)?\z, it matches the text This is a simple example and not the text This is a sim-ple example ! Because, in the former case, it counts 5 words and, in the latter case, it counts 6 words

                      That’s why my previous and @terry-r’s version, using non-space characters [[:^space:]] and space chars [[:space:]], seems more rigorous and practical ;-))
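
The difference between the two counting methods can be sketched in Python. This is an illustration under assumptions: `\s`, `\S` and `\Z` stand in for `[[:space:]]`, `[^[:space:]]` and `\z`, the space-based pattern is the whole-file regex from earlier in the thread with its quantifier set to {0,4} (less than 6 words), and the names `WORD_BASED` and `SPACE_BASED` are made up.

```python
import re

# "Word" version: every run of \w chars counts as one word.
WORD_BASED = re.compile(r"\A[^\w]*(?:\w+[^\w]+){0,4}(?:\w+[^\w]*)?\Z")

# Space-based version: every run of non-space chars counts as one word.
SPACE_BASED = re.compile(r"(?s)\A\s*(?:\S+\s+){0,4}\S+\s*\Z|\A\s+\Z")

simple = "This is a simple example"
hyphen = "This is a sim-ple example"

for text in (simple, hyphen):
    print(
        text,
        "| word-based:", bool(WORD_BASED.search(text)),
        "| space-based:", bool(SPACE_BASED.search(text)),
    )
```

With the hyphen, the word-based pattern sees sim and ple as two words and stops matching, while the space-based pattern still sees 5 words in both texts.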

                      Best Regards

                      guy038

                      • rodica FR
                        rodica F @guy038
                        last edited by

                        @guy038 said in Delete the entire content of all files with less than 100 words:

                        \A[^\w]*(?:\w+[^\w]+){0,4}(?:\w+[^\w]*)?\z

                        My joy is that, thanks to my regex, an alternative method has been discovered, quite good.

                        thank you @guy038
