Delete the entire content of all files with less than 100 words
-
Hello, @rodica-f and All,
Oh… Yes ! I was wrong about it ! The correct regex S/R is, of course :
SEARCH
(?s)\A.*<START>[[:space:]]*(?:[^[:space:]]+[[:space:]]+){0,8}[^[:space:]]+[[:space:]]*<FINAL>.*\z|\A.*<START>[[:space:]]+<FINAL>.*\z
REPLACE
Leave EMPTY
And the general formula for deleting all file contents, if there are fewer than N words between the two boundaries <START> and <FINAL>, becomes :
SEARCH
(?s)\A.*<START>[[:space:]]*(?:[^[:space:]]+[[:space:]]+){0,N-2}[^[:space:]]+[[:space:]]*<FINAL>.*\z|\A.*<START>[[:space:]]+<FINAL>.*\z
REPLACE
Leave EMPTY
This regex will delete all file contents in all these cases :
- If there is no non-space char ( 0 word ), and only some space chars => the regex is
\A.*<START>[[:space:]]+<FINAL>.*\z
( the part after the | symbol )
- If there are one or more non-space chars ( one word ), possibly surrounded with space chars => quantifier = 0 and the regex becomes
(?s)\A.*<START>[[:space:]]*[^[:space:]]+[[:space:]]*<FINAL>.*\z
- If there are non-space chars followed with space chars, twice ( so two words ) => quantifier = 1 and the regex becomes
(?s)\A.*<START>[[:space:]]*(?:[^[:space:]]+[[:space:]]+)[^[:space:]]+[[:space:]]*<FINAL>.*\z
- If there are non-space chars followed with space chars, three times ( so three words ) => quantifier = 2 and the regex becomes
(?s)\A.*<START>[[:space:]]*(?:[^[:space:]]+[[:space:]]+){2}[^[:space:]]+[[:space:]]*<FINAL>.*\z
and so on… till :
- If there are non-space chars followed with space chars, nine times ( so nine words ) => quantifier = 8 and the regex becomes
(?s)\A.*<START>[[:space:]]*(?:[^[:space:]]+[[:space:]]+){8}[^[:space:]]+[[:space:]]*<FINAL>.*\z
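The cases listed above can be checked with a small script. This is only a sketch in Python, not the Boost engine used by Notepad++ : Python's re module has no POSIX bracket classes and no \z, so [[:space:]], [^[:space:]] and \z are replaced with \s, \S and \Z, which is assumed to be equivalent for plain ASCII text. The names build_pattern and sample, and the layout of the fake file, are hypothetical and only serve the illustration.

```python
import re

def build_pattern(n, start="<START>", final="<FINAL>"):
    # Whole-file regex that matches when FEWER than n words (n >= 2)
    # sit between the two boundaries.
    # Assumption: \s, \S and \Z stand in for [[:space:]], [^[:space:]] and \z.
    s, f = re.escape(start), re.escape(final)
    some_words = rf"\A.*{s}\s*(?:\S+\s+){{0,{n - 2}}}\S+\s*{f}.*\Z"  # 1 to n-1 words
    no_word = rf"\A.*{s}\s+{f}.*\Z"                                   # 0 word, only space chars
    return re.compile(f"(?s){some_words}|{no_word}")

def sample(words):
    # Hypothetical file content with the given number of words between the boundaries.
    body = " ".join(f"w{i}" for i in range(words)) or " "
    return f"header line\n<START> {body} <FINAL>\nfooter line\n"

pattern = build_pattern(10)  # N = 10, so the quantifier is {0,8}
for words in (0, 1, 2, 3, 9, 10):
    # Expected: True for 0 to 9 words, False for 10 words
    print(words, bool(pattern.search(sample(words))))

# Replacing the match with nothing empties the whole "file"
print(repr(pattern.sub("", sample(3))))
```

With N = 10, a file holding 10 words between the boundaries is left untouched, as only 9 runs of non-space chars are allowed at most.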
Now, to answer your question, I would say :
SEARCH
(?s)\A.*BSR(FR)ESR.*\z
where FR =
[[:space:]]*(?:[^[:space:]]+[[:space:]]+){0,N-2}[^[:space:]]+[[:space:]]*
OR FR =
[[:space:]]+
( case no word )
Best Regards,
guy038
-
@guy038 thank you very much !
-
Delete the entire content of all files with less than 6 words
FIND:
\A(?i)[^\w+]*(?:[\w*]+[^\w*]+){0,5}(?:[\w*]+[^\w+]*)?\z
REPLACE:
(LEAVE EMPTY)
-
Hi, @rodica-f and All,
I'm sorry to tell you that your last regex does not exactly meet the previous rules and is rather erroneous !
First, and just anecdotal, the (?i) modifier is useless as no range of letters occurs in your regex
Secondly, this regex will delete all file contents if there are more than 0 and fewer than 7 words
Thirdly, let’s consider this simple phrase :
let abc - xyz
It contains 4 non-space expressions ( let, abc, - and xyz )
Your regex seems OK as it correctly selects all text which contains fewer than 7 words
Now, change the - sign by a + sign :
let abc + xyz
This time, your regex does not match anything although there are, still, 4 non-space expressions :((
Why does this behaviour occur ? Well, the different sub-expressions, that you used in your regex, are erroneous !
[^\w+]* means “find a char different from a word char and different from the + sign”, repeated from 0 to any number of times
[\w*]+ means “find a word char or a * symbol”, repeated from 1 to any number of times
[^\w*]+ means “find a char different from a word char and different from the * symbol”, repeated from 1 to any number of times
So, an almost-correct solution would be
\A[^\w]*(?:\w+[^\w]+){0,4}(?:\w+[^\w]*)?\z
However, note that it also matches a true empty file which does not need any replacement as already empty !!
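The meaning of these character classes can be probed one character at a time. Here is a minimal sketch in Python, assuming that its re module reads these bracket expressions like the Boost engine does for ASCII input, and with \Z standing in for \z ( Python has no \z ). The variable names are illustrative only.

```python
import re

# Which single characters does each class accept ?
classes = {
    r"[^\w+]": "not a word char and not '+'",
    r"[\w*]": "a word char or '*'",
    r"[^\w*]": "not a word char and not '*'",
}
for cls, meaning in classes.items():
    accepted = [ch for ch in "a+*- " if re.fullmatch(cls, ch)]
    print(cls, accepted, "->", meaning)

# The almost-correct version : at most 5 runs of word chars in the whole file
almost = re.compile(r"\A[^\w]*(?:\w+[^\w]+){0,4}(?:\w+[^\w]*)?\Z")
print(bool(almost.search("one two three four five")))      # 5 words
print(bool(almost.search("one two three four five six")))  # 6 words
print(bool(almost.search("")))                             # a truly empty file
```

The last call shows the side effect mentioned above : as every part of the regex is optional, the empty string matches too.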
Now, the important drawback of using word chars \w and non-word chars [^\w] is that any symbol, met inside a word, will increase the number of words ! For instance, see the difference between :
This is a simple example
and :
This is a sim-ple example
If I use my last “word” version
\A[^\w]*(?:\w+[^\w]+){0,4}(?:\w+[^\w]*)?\z
it matches the text This is a simple example and not the text This is a sim-ple example ! Because, in the former case, it counts 5 words and, in the latter case, it counts 6 words
That’s why my previous and @terry-r’s version, using non-space characters [[:^space:]] and space chars [[:space:]], seems more rigorous and practical ;-))
Best Regards
guy038
-
@guy038 said in Delete the entire content of all files with less than 100 words:
\A[^\w]*(?:\w+[^\w]+){0,4}(?:\w+[^\w]*)?\z
My joy is that, thanks to my regex, an alternative method has been discovered, quite good.
thank you @guy038