Delete the entire content of all files with less than 100 words

  • P
    Paul Wormer @rodica F
    last edited by Apr 7, 2022, 6:57 AM

    @rodica-f
    Npp user manual

    • R
      rodica F @guy038
      last edited by rodica F Apr 7, 2022, 8:15 AM Apr 7, 2022, 8:14 AM

      @guy038 said in Delete the entire content of all files with less than 100 words:
      (?s)\A[[:space:]]*(?:[^[:space:]]+[[:space:]]+){0,98}[^[:space:]]+[[:space:]]*\z|\A[[:space:]]+\z
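For readers who want to try the quoted regex outside Notepad++, here is a minimal Python sketch. It is only an approximation: Python's `re` module has no POSIX classes, so `[[:space:]]` and `[^[:space:]]` are replaced by `\s` and `\S`, and `\z` by `\Z`. The helper name `short_file_pattern` is hypothetical and does not come from the thread.

```python
import re


def short_file_pattern(n: int) -> re.Pattern:
    """Build a pattern matching a whole text that holds fewer than n words.

    A "word" is a run of non-whitespace characters. As in the quoted regex,
    the quantifier upper bound is n - 2, because one more word is matched
    after the repeated group. A text made only of whitespace also matches;
    a truly empty text does not.
    """
    if n < 2:
        raise ValueError("n must be at least 2")
    return re.compile(rf"\A\s*(?:\S+\s+){{0,{n - 2}}}\S+\s*\Z|\A\s+\Z")


pattern = short_file_pattern(100)  # same bounds as the {0,98} regex above
ninety_nine_words = " ".join(["word"] * 99)
one_hundred_words = " ".join(["word"] * 100)

print(bool(pattern.search(ninety_nine_words)))  # fewer than 100 words
print(bool(pattern.search(one_hundred_words)))  # exactly 100 words
```

The first call should report a match and the second should not, which mirrors the "fewer than 100 words" rule.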

      One more question for @guy038: I want to use one of your GENERIC S/R for this case. I need to delete the content of a file that has fewer than 10 words between the sections <START> and <FINAL>:

      <START>
      
      The first, thing to note when
      
      <FINAL>
      

      So I tested all the GENERIC regex formulas you wrote a long time ago.

      BSR = <START>
      ESR = <FINAL>
      FR = (?s)\A[[:space:]]*(?:[^[:space:]]+[[:space:]]+){0,10}[^[:space:]]+[[:space:]]*\z|\A[[:space:]]+\z

      REGEX:

      (?-si:BSR|(?!\A)\G)(?s-i:(?!ESR).)*?\x20\K(FR)

      (?-si:BSR|(?!\A)\G)(?s-i:(?!ESR).)*?\x20\KFR(?=\x20)

      (?-si:BSR|(?!\A)\G)(?s-i:(?!ESR).)*?\x20\KFR

      (?-si:BSR|(?!\A)\G)(?s-i:(?!ESR).)*?\x20\KFR(?=\x20)

      (?-i:BSR|\G(?!^))(?s:(?!ESR).)*?\K(?-i:FR)

      (?-i:BSR|(?!\A)\G)(?s:(?!ESR).)*?\K(?-i:FR)

      (?-i:BSR|(?!^)\G)(?s:(?!ESR).)*?\K(?-i:FR)

      (?-i:BSR|(?!\A)\G)(?s:(?!ESR).)*?\K(?-i:FR)

      None of them works. I get the same message on each F/R: “Cannot find the text…”

      • G
        guy038
        last edited by guy038 Apr 7, 2022, 10:11 AM Apr 7, 2022, 8:53 AM

        Hi, @rodica-f and All,

        EDIT : The regexes, below, are incomplete. See the correct solution in my next post

        You do not need to use these generic regexes at all !

        Simply, replace \A by <START> and \z by <FINAL> and, of course, change the value of the quantifier of the non-capturing group from 98 to 8, giving the functional regex S/R below :

        SEARCH (?s)<START>[[:space:]]*(?:[^[:space:]]+[[:space:]]+){0,8}[^[:space:]]+[[:space:]]*<FINAL>|<START>[[:space:]]+<FINAL>

        REPLACE Leave EMPTY


        So, the general formula for deleting all file contents, if there are fewer than N words between the two boundaries <START> and <FINAL>, is :

        SEARCH (?s)<START>[[:space:]]*(?:[^[:space:]]+[[:space:]]+){0,N-2}[^[:space:]]+[[:space:]]*<FINAL>|<START>[[:space:]]+<FINAL>

        REPLACE Leave EMPTY

        BR

        guy038

        • R
          rodica F @guy038
          last edited by Apr 7, 2022, 9:10 AM

          @guy038 correct me if I’m wrong. The GENERIC formula in this case will be:

          (?s)BSR(FR)*ESR|BSR+ESR

          I think I’m wrong somewhere.

          • R
            rodica F @rodica F
            last edited by Apr 7, 2022, 9:21 AM

            @guy038 by the way, I tested the regex you wrote for me:

            (?s)<START>[[:space:]]*(?:[^[:space:]]+[[:space:]]+){0,8}[^[:space:]]+[[:space:]]*<FINAL>|<START>[[:space:]]+<FINAL>

            In the context below, it deletes only what is framed by <START> and <FINAL>.

            It does not delete the entire file, I mean the other words around it.

            blah blah     blah
            
            
            <START>
            
            The first, thing to note when
            
            <FINAL>
            
               blah blah
            
            • G
              guy038
              last edited by guy038 Apr 7, 2022, 10:26 AM Apr 7, 2022, 10:07 AM

              Hello, @rodica-f and All,

              Oh… Yes ! I was wrong about it ! The correct regex S/R is, of course :

              SEARCH (?s)\A.*<START>[[:space:]]*(?:[^[:space:]]+[[:space:]]+){0,8}[^[:space:]]+[[:space:]]*<FINAL>.*\z|\A.*<START>[[:space:]]+<FINAL>.*\z

              REPLACE Leave EMPTY

              And the general formula for deleting all file contents, if there are fewer than N words between the two boundaries <START> and <FINAL>, becomes :

              SEARCH (?s)\A.*<START>[[:space:]]*(?:[^[:space:]]+[[:space:]]+){0,N-2}[^[:space:]]+[[:space:]]*<FINAL>.*\z|\A.*<START>[[:space:]]+<FINAL>.*\z

              REPLACE Leave EMPTY


              This regex will delete all file contents in all these cases :

              • If there is no non-space char ( 0 word ), and only some space chars => the regex is \A.*<START>[[:space:]]+<FINAL>.*\z ( the part after the | symbol )

              • If there are several non-space chars ( one word ), possibly surrounded with space chars => quantifier = 0 and the regex becomes (?s)\A.*<START>[[:space:]]*[^[:space:]]+[[:space:]]*<FINAL>.*\z

              • If there are several non-space chars followed with space chars, twice ( so two words) => quantifier = 1 and the regex becomes (?s)\A.*<START>[[:space:]]*(?:[^[:space:]]+[[:space:]]+)[^[:space:]]+[[:space:]]*<FINAL>.*\z

              • If there are several non-space chars followed with space chars, three times ( so three words) => quantifier = 2 and the regex becomes (?s)\A.*<START>[[:space:]]*(?:[^[:space:]]+[[:space:]]+){2}[^[:space:]]+[[:space:]]*<FINAL>.*\z

              and so on… till :

              • If there are several non-space chars followed with space chars, nine times ( so nine words) => quantifier = 8 and the regex becomes (?s)\A.*<START>[[:space:]]*(?:[^[:space:]]+[[:space:]]+){8}[^[:space:]]+[[:space:]]*<FINAL>.*\z

              Now, to answer your question, I would say :

              SEARCH (?s)\A.*BSR(FR)ESR.*\z

              where FR = [[:space:]]*(?:[^[:space:]]+[[:space:]]+){0,N-2}[^[:space:]]+[[:space:]]*    OR    FR = [[:space:]]+ ( case no word )
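The BSR / FR / ESR formula above can be sketched in Python as follows. This is a rough translation under stated assumptions, not the Notepad++ behaviour itself: `\s` and `\S` stand in for the POSIX classes, `\Z` stands in for `\z`, `re.DOTALL` plays the role of `(?s)`, and the names `marker_pattern` and `blank_if_short` are hypothetical.

```python
import re


def marker_pattern(n: int, start: str = "<START>", final: str = "<FINAL>") -> re.Pattern:
    """Match the whole text when fewer than n words sit between the markers."""
    if n < 2:
        raise ValueError("n must be at least 2")
    bsr, esr = re.escape(start), re.escape(final)
    few_words = rf"\s*(?:\S+\s+){{0,{n - 2}}}\S+\s*"  # FR, 1 to n-1 words
    no_words = r"\s+"                                  # FR, case "no word"
    return re.compile(rf"\A.*{bsr}(?:{few_words}|{no_words}){esr}.*\Z", re.DOTALL)


def blank_if_short(text: str, n: int) -> str:
    """Return an empty string if the marked section has fewer than n words."""
    return marker_pattern(n).sub("", text, count=1)


sample = (
    "blah blah     blah\n\n\n"
    "<START>\n\nThe first, thing to note when\n\n<FINAL>\n\n"
    "   blah blah\n"
)

print(repr(blank_if_short(sample, 10)))      # 6 words is fewer than 10
print(blank_if_short(sample, 6) == sample)   # 6 words is not fewer than 6
```

With the six-word sample, the text is emptied for N = 10 and left untouched for N = 6.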

              Best Regards,

              guy038

              • R
                rodica F @guy038
                last edited by Apr 7, 2022, 10:51 AM

                @guy038 thank you very much !

                • R
                  rodica F @rodica F
                  last edited by Apr 7, 2022, 2:33 PM

                  @rodica-f

                  Delete the entire content of all files with less than 6 words

                  FIND:
                  \A(?i)[^\w+]*(?:[\w*]+[^\w*]+){0,5}(?:[\w*]+[^\w+]*)?\z

                  REPLACE: (LEAVE EMPTY)

                  • G
                    guy038
                    last edited by guy038 Apr 8, 2022, 5:24 AM Apr 8, 2022, 4:59 AM

                    Hi, @rodica-f and All,

                    I’m sorry to tell you that your last regex does not exactly meet the previous rules and is rather erroneous !

                    First, and just anecdotally, the (?i) modifier is useless, as no letter or range of letters occurs in your regex

                    Secondly, this regex will delete all file contents if the file contains fewer than 7 words ( so up to 6 words ), instead of fewer than 6 words

                    Thirdly, let’s consider this simple phrase :

                    let abc - xyz
                    

                    It contains 4 non-space expressions ( let, abc, - and xyz )

                    Your regex seems OK, as it correctly selects all the text, which contains fewer than 7 words

                    Now, replace the - sign with a + sign :

                    let abc + xyz
                    

                    This time, your regex does not match anything although there are, still, 4 non-space expressions :((


                    Why does this behaviour occur ? Well, the different sub-expressions that you used in your regex are erroneous !

                    [^\w+]* means “find a char different from a word char and different from the + sign”, repeated from 0 to any number of times

                    [\w*]+ means “find a word char or a * symbol”, repeated from 1 to any number of times

                    [^\w*]+ means “find a char different from a word char and different from the * symbol”, repeated from 1 to any number of times

                    So, an almost-correct solution would be \A[^\w]*(?:\w+[^\w]+){0,4}(?:\w+[^\w]*)?\z. However, note that it also matches a truly empty file, which does not need any replacement as it is already empty !!
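The meaning of the three bracket expressions can be checked one character at a time. The sketch below uses Python's `re` module, on the assumption that it reads these simple ASCII bracket expressions the same way as the Notepad++ engine. The helper `matches_one` is a hypothetical name.

```python
import re


def matches_one(char_class: str, ch: str) -> bool:
    """Return True if the single character ch belongs to the bracket expression."""
    return re.fullmatch(char_class, ch) is not None


# Inside [...], the characters + and * are literals, not quantifiers.
plus_excluded = matches_one(r"[^\w+]", "+")   # + is shut out of this class
dash_allowed = matches_one(r"[^\w+]", "-")    # - is neither a word char nor +
star_as_word = matches_one(r"[\w*]", "*")     # * is accepted like a word char
star_excluded = matches_one(r"[^\w*]", "*")   # * is shut out of this class

print(plus_excluded, dash_allowed, star_as_word, star_excluded)
```

So a `+` or `*` written inside the brackets changes which characters the class accepts, instead of acting as a quantifier.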


                    Now, the important drawback of using word chars \w and non-word chars [^\w] is that any symbol met inside a word splits it and so increases the number of words ! For instance, see the difference between :

                    This is a simple example
                    

                    and :

                    This is a sim-ple example
                    

                    If I use my last “word” version \A[^\w]*(?:\w+[^\w]+){0,4}(?:\w+[^\w]*)?\z, it matches the text This is a simple example and not the text This is a sim-ple example ! Because, in the former case, it counts 5 words and, in the latter case, it counts 6 words

                    That’s why my previous version and @terry-r’s version, using non-space characters [[:^space:]] and space chars [[:space:]], seem more rigorous and practical ;-))
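The difference between the two ways of counting can be reproduced with a short Python sketch. It assumes Python's `\w` and `\S` behave like the Notepad++ classes on this plain ASCII text, and it uses `\Z` in place of `\z`. The function names are hypothetical.

```python
import re

# The "word" version from above, with \z written as \Z for Python.
WORD_VERSION = re.compile(r"\A[^\w]*(?:\w+[^\w]+){0,4}(?:\w+[^\w]*)?\Z")


def count_word_runs(text: str) -> int:
    """Count runs of word characters (the \\w approach)."""
    return len(re.findall(r"\w+", text))


def count_non_space_runs(text: str) -> int:
    """Count runs of non-space characters (the [[:space:]] approach)."""
    return len(re.findall(r"\S+", text))


plain = "This is a simple example"
hyphenated = "This is a sim-ple example"

for text in (plain, hyphenated):
    print(
        text,
        count_word_runs(text),
        count_non_space_runs(text),
        bool(WORD_VERSION.search(text)),
    )
```

The hyphen splits sim-ple into two word runs, so the `\w` count rises from 5 to 6 while the non-space count stays at 5.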

                    Best Regards

                    guy038

                    • R
                      rodica F @guy038
                      last edited by Apr 8, 2022, 8:33 AM

                      @guy038 said in Delete the entire content of all files with less than 100 words:

                      \A[^\w]*(?:\w+[^\w]+){0,4}(?:\w+[^\w]*)?\z

                      My joy is that, thanks to my regex, an alternative method has been discovered, quite good.

                      thank you @guy038
