Delete the entire content of all files with less than 100 words
-
@guy038 @Terry-R @Alan-Kilborn @Neil-Schipper
thank you all. It is always a challenge to discover regex solutions.
by the way, I didn’t know the method with
[[:punct:]]Where can I find about this regex method on internet? I don’t know how to search about it… -
-
@guy038 said in Delete the entire content of all files with less than 100 words:
(?s)\A[[:space:]]*(?:[^[:space:]]+[[:space:]]+){0,98}[^[:space:]]+[[:space:]]*\z|\A[[:space:]]+\zOne more question I have for @guy038 I want to use one of your GENERIC S/R for this case. SO I need to delete the content of a file that have less then 10 words between section <START> and <FINAL>
<START> The first, thing to note when <FINAL>So, I test with all your GENERIC regex formulas you done a long time ago.
BSR =
<START>
ESR =<FINAL>
FR =(?s)\A[[:space:]]*(?:[^[:space:]]+[[:space:]]+){0,10}[^[:space:]]+[[:space:]]*\z|\A[[:space:]]+\zREGEX:
(?-si:BSR|(?!\A)\G)(?s-i:(?!ESR).)*?\x20\K(FR)(?-si:BSR|(?!\A)\G)(?s-i:(?!ESR).)*?\x20\KFR(?=\x20)(?-si:BSR|(?!\A)\G)(?s-i:(?!ESR).)*?\x20\KFR(?-si:BSR|(?!\A)\G)(?s-i:(?!ESR).)*?\x20\KFR(?=\x20)(?-i:BSR|\G(?!^))(?s:(?!ESR).)*?\K(?-i:FR)(?-i:BSR|(?!\A)\G)(?s:(?!ESR).)*?\K(?-i:FR)(?-i:BSR|(?!^)\G)(?s:(?!ESR).)*?\K(?-i:FR)(?-i:BSR|(?!\A)\G)(?s:(?!ESR).)*?\K(?-i:FR)It is not working, in any of the cases. I get the same message on F/R: “Cannot find the text…”
-
Hi, @rodica-f and All,
EDIT : The regexes, below, are incomplete. See the correct solution in my next post
You do not need to use these generic regexes at all !
Simply, replace
\Aby<START>and\zby<FINAL>and, of course, change the value of the quantifier of the non-capturing group from98to8, giving the functional regex S/R below :SEARCH
(?s)<START>[[:space:]]*(?:[^[:space:]]+[[:space:]]+){0,8}[^[:space:]]+[[:space:]]*<FINAL>|<START>[[:space:]]+<FINAL>REPLACE
Leave EMPTY
So, the general formula for deleting all file contents, if there are less than
Nwords between the two boundaries<START>and<FINAL>, is :SEARCH
(?s)<START>[[:space:]]*(?:[^[:space:]]+[[:space:]]+){0,N-2}[^[:space:]]+[[:space:]]*<FINAL>|<START>[[:space:]]+<FINAL>REPLACE
Leave EMPTYBR
guy038
-
@guy038 correct me if I’m wrong. The GENERIC formula in this case will be:
(?s)BSR(FR)*ESR|BSR+ESRI think I’m wrong somewhere.
-
@guy038 by the way I test your generic formula you done for me.
(?s)<START>[[:space:]]*(?:[^[:space:]]+[[:space:]]+){0,8}[^[:space:]]+[[:space:]]*<FINAL>|<START>[[:space:]]+<FINAL>In the context below, delete only everything that is framed in <START> and <FINAL>
But does not delete the entire file, I mean the other words around it.
blah blah blah <START> The first, thing to note when <FINAL> blah blah -
Hello, @rodica-f and All,
Oh… Yes ! I was wrong about it ! The correct regex S/R is, of course :
SEARCH
(?s)\A.*<START>[[:space:]]*(?:[^[:space:]]+[[:space:]]+){0,8}[^[:space:]]+[[:space:]]*<FINAL>.*\z|\A.*<START>[[:space:]]+<FINAL>.*\zREPLACE
Leave EMPTYAnd the general formula for deleting all file contents, if there are less than
Nwords between the two boundaries<START>and<FINAL>, becomes :SEARCH
(?s)\A.*<START>[[:space:]]*(?:[^[:space:]]+[[:space:]]+){0,N-2}[^[:space:]]+[[:space:]]*<FINAL>.*\z|\A.*<START>[[:space:]]+<FINAL>.*\zREPLACE
Leave EMPTY
This regex will delete all file contents in all these cases :
-
If there no
non-spacechar (0word ), and only somespacechars => the regex is\A.*<START>[[:space:]]+<FINAL>.*\z( the part after the|symbol ) -
If there are several
non-spacechars ( one word ), possibly surrounded withspacechars => quantifier =0and the regex becomes(?s)\A.*<START>[[:space:]]*[^[:space:]]+[[:space:]]*<FINAL>.*\z -
If there are several
non-spacechars followed withspacechars, twice ( so two words) => quantifier =1and the regex becomes(?s)\A.*<START>[[:space:]]*(?:[^[:space:]]+[[:space:]]+)[^[:space:]]+[[:space:]]*<FINAL>.*\z -
If there are several
non-spacechars followed withspacechars, third times ( so three words) => quantifier =2and the regex becomes(?s)\A.*<START>[[:space:]]*(?:[^[:space:]]+[[:space:]]+){2}[^[:space:]]+[[:space:]]*<FINAL>.*\z
and so on… till :
- If there are several
non-spacechars followed withspacechars, ninth times ( so nine words) => quantifier =8and the regex becomes(?s)\A.*<START>[[:space:]]*(?:[^[:space:]]+[[:space:]]+){8}[^[:space:]]+[[:space:]]*<FINAL>.*\z
Now, to answer your question, I would say :
SEARCH
(?s)\A.*BSR(FR)ESR.*\zwhere FR =
[[:space:]]*(?:[^[:space:]]+[[:space:]]+){0,N-2}[^[:space:]]+[[:space:]]*OR FR =[[:space:]]+( case no word )Best Regards,
guy038
-
-
@guy038 thank you very much !
-
Delete the entire content of all files with less than 6 words
FIND:
\A(?i)[^\w+]*(?:[\w*]+[^\w*]+){0,5}(?:[\w*]+[^\w+]*)?\zREPLACE:
(LEAVE EMPTY) -
Hi, @rodica-f and All,
I sorry to tell you that your last regex does not meet exactly the previous rules and is rather erroneous !
First, and just anecdotal, the
(?i)modifier is useless as no range of letters occurs in your regexSecondly, this regex will delete all file contents if more than
0word char and less than7word charsThirdly, let’s consider this somple phrase :
let abc - xyzIt contains
4non-space expressions (let,abc,-andxyz)Your regex seems OK as it correctly select all text which contains less than
7wordsNow, change the
-sign by a+sign :let abc + xyzThis time, your regex does not match anything although there are, still,
4non-space expressions :((
Why this behaviour occurs ? Well, the different sub-expressions, that you used in your regex, are erroneous !
[^\w+]*means “find a a char different from a word char and different from the + sign”, repeated from0to any[\w*]+means “find a word char or a * symbol”, repeated from1to any[^\w*]+means “find a char different from a word char and different from the * symbol”, repeated from1to anySo, an almost-correct solution would be
\A[^\w]*(?:\w+[^\w]+){0,4}(?:\w+[^\w]*)?\z. However, note that it also matches a true empty file which does not need any replacement as already empty !!
Now, the important drawback of using word chars
\wand non-word chars[^\w], is that any symbol, met in text, will increase the number of words !. For instance, see the difference betwen :This is a simple exampleand :
This is a sim-ple exampleIf I use my last “word” version
\A[^\w]*(?:\w+[^\w]+){0,4}(?:\w+[^\w]*)?\z, it matches the textThis is a simple exampleand not the textThis is a sim-ple example! Because, in the former case, it counts5words and, in the later case, it counts6wordsThat’s why my previous and @terry-r’s version, using non-space characters
[[:^space:]]and space chars[[:space:]], seems more rigorous and practical ;-))Best Regards
guy038
-
@guy038 said in Delete the entire content of all files with less than 100 words:
\A[^\w]*(?:\w+[^\w]+){0,4}(?:\w+[^\w]*)?\zMy joy is that, thanks to my regex, an alternative method has been discovered, quite good.
thank you @guy038