    Delete the entire content of all files with less than 100 words

    • guy038G
      guy038
      last edited by guy038

      Hello, @rodica-f, @neil-schipper, @alan-kilborn, @terry-r and All,

      @terry-r :

      I found a variant, based on your use of the [[:space:]] POSIX character class !

      SEARCH (?s)\A[[:space:]]*(?:[^[:space:]]+[[:space:]]+){0,98}[^[:space:]]+[[:space:]]*\z|\A[[:space:]]+\z

      REPLACE Leave EMPTY

      This regex S/R deletes the whole content of any file containing fewer than 100 words ( 1 to 99 runs of non-space chars ), OR containing no word at all but only some [[:space:]] chars
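
For anyone who wants to try the counting logic outside Notepad++, here is a minimal Python sketch. It is a translation, not the exact pattern above: Python's re module has no [[:space:]] class and spells the end-of-text anchor \Z, so \s, \S and \Z stand in here, on the assumption that \s covers the same characters for your files. The name would_be_emptied is made up for illustration.

```python
import re

# Translation of the thread's regex for Python's re module (an assumption:
# \s / \S are used in place of [[:space:]] / [^[:space:]], \Z in place of \z).
FEWER_THAN_100_WORDS = re.compile(
    r"\A\s*(?:\S+\s+){0,98}\S+\s*\Z"  # 1 to 99 whitespace-separated words
    r"|\A\s+\Z"                        # or whitespace only, no word at all
)

def would_be_emptied(text: str) -> bool:
    """Return True when the whole text matches, i.e. the S/R would empty it."""
    return FEWER_THAN_100_WORDS.search(text) is not None

ninety_nine = " ".join(["word"] * 99) + "\n"
one_hundred = " ".join(["word"] * 100) + "\n"

print(would_be_emptied(ninety_nine))  # True
print(would_be_emptied(one_hundred))  # False
print(would_be_emptied(" \n\t\n"))    # True  (whitespace only)
print(would_be_emptied(""))           # False (already empty, nothing to match)
```

The upper bound 98 plus the final mandatory word gives at most 99 words, which is why a 100-word text is left alone.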

      Best Regards,

      guy038

      • rodica FR
        rodica F @guy038
        last edited by

        @guy038 @Terry-R @Alan-Kilborn @Neil-Schipper

        Thank you all. It is always a challenge to discover regex solutions.

        By the way, I didn’t know the method with [[:punct:]]. Where can I read about this regex method on the internet? I don’t know how to search for it…

        • Paul WormerP
          Paul Wormer @rodica F
          last edited by

          @rodica-f
          See the Notepad++ (Npp) user manual

          • rodica FR
            rodica F @guy038
            last edited by rodica F

            @guy038 said in Delete the entire content of all files with less than 100 words:
            (?s)\A[[:space:]]*(?:[^[:space:]]+[[:space:]]+){0,98}[^[:space:]]+[[:space:]]*\z|\A[[:space:]]+\z

            I have one more question for @guy038 : I want to use one of your GENERIC S/R for this case. I need to delete the content of a file that has fewer than 10 words between the boundaries <START> and <FINAL>

            <START>
            
            The first, thing to note when
            
            <FINAL>
            

            So, I tested all the GENERIC regex formulas you wrote a long time ago.

            BSR = <START>
            ESR = <FINAL>
            FR = (?s)\A[[:space:]]*(?:[^[:space:]]+[[:space:]]+){0,10}[^[:space:]]+[[:space:]]*\z|\A[[:space:]]+\z

            REGEX:

            (?-si:BSR|(?!\A)\G)(?s-i:(?!ESR).)*?\x20\K(FR)

            (?-si:BSR|(?!\A)\G)(?s-i:(?!ESR).)*?\x20\KFR(?=\x20)

            (?-si:BSR|(?!\A)\G)(?s-i:(?!ESR).)*?\x20\KFR

            (?-si:BSR|(?!\A)\G)(?s-i:(?!ESR).)*?\x20\KFR(?=\x20)

            (?-i:BSR|\G(?!^))(?s:(?!ESR).)*?\K(?-i:FR)

            (?-i:BSR|(?!\A)\G)(?s:(?!ESR).)*?\K(?-i:FR)

            (?-i:BSR|(?!^)\G)(?s:(?!ESR).)*?\K(?-i:FR)

            (?-i:BSR|(?!\A)\G)(?s:(?!ESR).)*?\K(?-i:FR)

            It is not working, in any of the cases. I get the same message on F/R: “Cannot find the text…”

            • guy038G
              guy038
              last edited by guy038

              Hi, @rodica-f and All,

              EDIT : The regexes, below, are incomplete. See the correct solution in my next post

              You do not need to use these generic regexes at all !

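              Simply replace \A with <START> and \z with <FINAL> and, of course, change the upper bound of the quantifier of the non-capturing group from 98 to 8 ( fewer than 10 words means at most 9 words : up to 8 words each followed with space chars, then a last word ), giving the functional regex S/R below :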

              SEARCH (?s)<START>[[:space:]]*(?:[^[:space:]]+[[:space:]]+){0,8}[^[:space:]]+[[:space:]]*<FINAL>|<START>[[:space:]]+<FINAL>

              REPLACE Leave EMPTY


              So, the general formula for deleting all file contents, if there are fewer than N words between the two boundaries <START> and <FINAL>, is ( write the computed value of N-2 in the quantifier, e.g. 8 for N = 10 ) :

              SEARCH (?s)<START>[[:space:]]*(?:[^[:space:]]+[[:space:]]+){0,N-2}[^[:space:]]+[[:space:]]*<FINAL>|<START>[[:space:]]+<FINAL>

              REPLACE Leave EMPTY

              BR

              guy038

              • rodica FR
                rodica F @guy038
                last edited by

                @guy038 correct me if I’m wrong. The GENERIC formula in this case will be:

                (?s)BSR(FR)*ESR|BSR+ESR

                I think I’m wrong somewhere.

                • rodica FR
                  rodica F @rodica F
                  last edited by

                  @guy038 by the way, I tested the regex you wrote for me.

                  (?s)<START>[[:space:]]*(?:[^[:space:]]+[[:space:]]+){0,8}[^[:space:]]+[[:space:]]*<FINAL>|<START>[[:space:]]+<FINAL>

                  In the context below, it deletes only what is framed by <START> and <FINAL>, boundaries included.

                  But it does not delete the entire file, I mean the other words around it remain.

                  blah blah     blah
                  
                  
                  <START>
                  
                  The first, thing to note when
                  
                  <FINAL>
                  
                     blah blah
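
The behaviour described above can be reproduced with a small Python sketch. It assumes \s and \S as stand-ins for [[:space:]] and [^[:space:]], since Python's re module lacks POSIX classes, and the names BLOCK_ONLY and sample are made up for illustration.

```python
import re

# Translation of the first (incomplete) regex: nothing anchors it to the
# start or end of the file, so only the <START>...<FINAL> block is matched.
BLOCK_ONLY = re.compile(
    r"<START>\s*(?:\S+\s+){0,8}\S+\s*<FINAL>|<START>\s+<FINAL>"
)

sample = (
    "blah blah     blah\n\n\n"
    "<START>\n\nThe first, thing to note when\n\n<FINAL>\n\n"
    "   blah blah\n"
)

result = BLOCK_ONLY.sub("", sample)
print(repr(result))  # the surrounding 'blah' lines are still there
```

Only the six words and their two boundaries disappear; the text before and after the block is untouched, which is the problem reported here.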
                  
                  • guy038G
                    guy038
                    last edited by guy038

                    Hello, @rodica-f and All,

                    Oh… Yes ! I was wrong about it ! The correct regex S/R is, of course :

                    SEARCH (?s)\A.*<START>[[:space:]]*(?:[^[:space:]]+[[:space:]]+){0,8}[^[:space:]]+[[:space:]]*<FINAL>.*\z|\A.*<START>[[:space:]]+<FINAL>.*\z

                    REPLACE Leave EMPTY

                    And the general formula for deleting all file contents, if there are fewer than N words between the two boundaries <START> and <FINAL>, becomes :

                    SEARCH (?s)\A.*<START>[[:space:]]*(?:[^[:space:]]+[[:space:]]+){0,N-2}[^[:space:]]+[[:space:]]*<FINAL>.*\z|\A.*<START>[[:space:]]+<FINAL>.*\z

                    REPLACE Leave EMPTY


                    This regex will delete all file contents in all these cases :

                    • If there is no non-space char ( 0 word ) but only some space chars => the regex is (?s)\A.*<START>[[:space:]]+<FINAL>.*\z ( the part after the | symbol, still under the leading (?s) modifier )

                    • If there is a single run of non-space chars ( one word ), possibly surrounded with space chars => quantifier = 0 and the regex becomes (?s)\A.*<START>[[:space:]]*[^[:space:]]+[[:space:]]*<FINAL>.*\z

                    • If a run of non-space chars followed with space chars occurs once before the last word ( so two words ) => quantifier = 1 and the regex becomes (?s)\A.*<START>[[:space:]]*(?:[^[:space:]]+[[:space:]]+)[^[:space:]]+[[:space:]]*<FINAL>.*\z

                    • If it occurs twice before the last word ( so three words ) => quantifier = 2 and the regex becomes (?s)\A.*<START>[[:space:]]*(?:[^[:space:]]+[[:space:]]+){2}[^[:space:]]+[[:space:]]*<FINAL>.*\z

                    and so on… till :

                    • If it occurs eight times before the last word ( so nine words ) => quantifier = 8 and the regex becomes (?s)\A.*<START>[[:space:]]*(?:[^[:space:]]+[[:space:]]+){8}[^[:space:]]+[[:space:]]*<FINAL>.*\z
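
The list of cases above can be checked with a short Python sketch: a group quantifier of k matches exactly k + 1 words between the boundaries. This is only an illustration under assumptions: \s, \S and \Z replace [[:space:]], [^[:space:]] and \z, re.DOTALL replaces (?s), and the helpers exact_count_regex and sample_file are hypothetical names.

```python
import re

def exact_count_regex(quantifier: int) -> re.Pattern:
    """Hypothetical helper: pin the group quantifier to a single value."""
    return re.compile(
        r"\A.*<START>\s*(?:\S+\s+){%d}\S+\s*<FINAL>.*\Z" % quantifier,
        re.DOTALL,  # plays the role of the (?s) modifier
    )

def sample_file(word_count: int) -> str:
    """Build a small text with word_count words between the boundaries."""
    words = " ".join(["word"] * word_count)
    return "before\n<START>\n" + words + "\n<FINAL>\nafter\n"

for quantifier in (0, 1, 2, 8):
    regex = exact_count_regex(quantifier)
    matched = [n for n in range(1, 12) if regex.search(sample_file(n))]
    print(quantifier, matched)  # each quantifier matches one word count only
```

Letting the quantifier range over {0,8} therefore covers every word count from 1 to 9, i.e. fewer than 10 words.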

                    Now, to answer your question, I would say :

                    SEARCH (?s)\A.*BSR(FR)ESR.*\z

                    where FR = [[:space:]]*(?:[^[:space:]]+[[:space:]]+){0,N-2}[^[:space:]]+[[:space:]]*    OR    FR = [[:space:]]+ ( case no word )
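
As a rough sketch of how this generic form could be assembled for any N, here is a Python version. It is not the Notepad++ regex itself: \s, \S and \Z stand in for [[:space:]], [^[:space:]] and \z, re.DOTALL stands in for (?s), and whole_file_regex is a hypothetical helper name.

```python
import re

def whole_file_regex(bsr: str, esr: str, n: int) -> re.Pattern:
    """Match the entire text when fewer than n words lie between bsr and esr."""
    if n < 2:
        raise ValueError("n must be at least 2 so that N-2 is not negative")
    fr_words = r"\s*(?:\S+\s+){0,%d}\S+\s*" % (n - 2)  # 1 to n-1 words
    fr_blank = r"\s+"                                   # the 'no word' case
    return re.compile(
        r"\A.*" + re.escape(bsr) + "(?:" + fr_words + "|" + fr_blank + ")"
        + re.escape(esr) + r".*\Z",
        re.DOTALL,
    )

text = (
    "blah blah     blah\n\n\n"
    "<START>\n\nThe first, thing to note when\n\n<FINAL>\n\n"
    "   blah blah\n"
)

# 6 words between the boundaries: emptied for N = 10, untouched for N = 6.
print(repr(whole_file_regex("<START>", "<FINAL>", 10).sub("", text)))  # ''
print(whole_file_regex("<START>", "<FINAL>", 6).sub("", text) == text)  # True
```

The two FR variants are simply joined with | inside one non-capturing group, so a single pattern covers both the "some words" and the "no word" cases.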

                    Best Regards,

                    guy038

                    • rodica FR
                      rodica F @guy038
                      last edited by

                      @guy038 thank you very much !

                      • rodica FR
                        rodica F @rodica F
                        last edited by

                        @rodica-f

                        Delete the entire content of all files with less than 6 words

                        FIND:
                        \A(?i)[^\w+]*(?:[\w*]+[^\w*]+){0,5}(?:[\w*]+[^\w+]*)?\z

                        REPLACE: (LEAVE EMPTY)

                        • guy038G
                          guy038
                          last edited by guy038

                          Hi, @rodica-f and All,

                          I’m sorry to tell you that your last regex does not exactly follow the previous rules and is rather erroneous !

                          First, and just anecdotally, the (?i) modifier is useless as no letter or range of letters occurs in your regex

                          Secondly, this regex will delete all file contents if there are up to 6 words ( so less than 7 words ) instead of less than 6 words

                          Thirdly, let’s consider this simple phrase :

                          let abc - xyz
                          

                          It contains 4 non-space expressions ( let, abc, - and xyz )

                          Your regex seems OK as it correctly selects all the text, which contains less than 7 words

                          Now, replace the - sign with a + sign :

                          let abc + xyz
                          

                          This time, your regex does not match anything although there are, still, 4 non-space expressions :((


                          Why does this behaviour occur ? Well, the different sub-expressions that you used in your regex are erroneous !

                          [^\w+]* means “find a char different from a word char and different from the + sign”, repeated 0 or more times

                          [\w*]+ means “find a word char or a * symbol”, repeated 1 or more times

                          [^\w*]+ means “find a char different from a word char and different from the * symbol”, repeated 1 or more times

                          So, an almost-correct solution would be \A[^\w]*(?:\w+[^\w]+){0,4}(?:\w+[^\w]*)?\z. However, note that it also matches a truly empty file, which does not need any replacement as it is already empty !!
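
The meaning of these bracket expressions is easy to verify one character at a time. Below is a minimal Python sketch: inside [...] the + and * symbols are literal characters in Python's re module as well, so the classes can be copied unchanged, while \Z replaces \z. The helper name matches_char and the constant CORRECTED are made up for illustration.

```python
import re

def matches_char(char_class: str, char: str) -> bool:
    """True when the single character belongs to the bracket expression."""
    return re.fullmatch(char_class, char) is not None

print(matches_char(r"[^\w+]", "+"))  # False: + is excluded, not a quantifier
print(matches_char(r"[^\w+]", "-"))  # True
print(matches_char(r"[\w*]", "*"))   # True: * counts as a "word" char here
print(matches_char(r"[^\w*]", "*"))  # False
print(matches_char(r"[^\w]", "+"))   # True: the plain class accepts any symbol

# The almost-correct version also matches a completely empty text.
CORRECTED = re.compile(r"\A[^\w]*(?:\w+[^\w]+){0,4}(?:\w+[^\w]*)?\Z")
print(CORRECTED.search("") is not None)  # True
```

A quantifier only acts as a quantifier when it is written after the closing bracket, as in [^\w]* or \w+.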


                          Now, the important drawback of using word chars \w and non-word chars [^\w] is that any symbol met inside a word splits it and so increases the number of words ! For instance, see the difference between :

                          This is a simple example
                          

                          and :

                          This is a sim-ple example
                          

                          If I use my last “word” version \A[^\w]*(?:\w+[^\w]+){0,4}(?:\w+[^\w]*)?\z, it matches the text This is a simple example and not the text This is a sim-ple example ! Because, in the former case, it counts 5 words and, in the latter case, it counts 6 words
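
To see the two ways of counting side by side, here is a small Python sketch. The "word" version is copied as is except for \Z in place of \z; the "space" version is an assumed translation that uses \s and \S for [[:space:]] and [^[:space:]], with the same limit of fewer than 6 words. The names WORD_VERSION and SPACE_VERSION are made up for illustration.

```python
import re

WORD_VERSION = re.compile(r"\A[^\w]*(?:\w+[^\w]+){0,4}(?:\w+[^\w]*)?\Z")
SPACE_VERSION = re.compile(r"\A\s*(?:\S+\s+){0,4}\S+\s*\Z")

plain = "This is a simple example\n"
hyphen = "This is a sim-ple example\n"

for text in (plain, hyphen):
    word_runs = len(re.findall(r"\w+", text))  # runs of word chars
    space_runs = len(text.split())             # runs of non-space chars
    print(
        word_runs,
        space_runs,
        WORD_VERSION.search(text) is not None,
        SPACE_VERSION.search(text) is not None,
    )
# prints: 5 5 True True
#         6 5 False True
```

The hyphen turns sim-ple into two runs of word chars but leaves it as one run of non-space chars, so only the "word" version changes its answer.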

                          That’s why my previous version and @terry-r’s version, using non-space characters [[:^space:]] and space chars [[:space:]], seem more rigorous and practical ;-))

                          Best Regards

                          guy038

                          • rodica FR
                            rodica F @guy038
                            last edited by

                            @guy038 said in Delete the entire content of all files with less than 100 words:

                            \A[^\w]*(?:\w+[^\w]+){0,4}(?:\w+[^\w]*)?\z

                            My joy is that, thanks to my regex, an alternative method has been discovered, quite good.

                            thank you @guy038
