Delete the entire content of all files with less than 100 words
-
Hello, @rodica-f, @neil-schipper, @alan-kilborn, @terry-r and All,
@terry-r :
I found a variant, based on your use of the [[:space:]] POSIX character class !
SEARCH
(?s)\A[[:space:]]*(?:[^[:space:]]+[[:space:]]+){0,98}[^[:space:]]+[[:space:]]*\z|\A[[:space:]]+\z
REPLACE
Leave EMPTY
This regex S/R will delete the whole content of any file containing fewer than 100 words, OR even 0 non-space chars and only some [[:space:]] chars.
Best Regards,
guy038
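For anyone who wants to check the word-count logic outside Notepad++, here is a minimal Python sketch. It is only a translation made under assumptions, not the Boost regex itself: Python's re module has no POSIX character classes, so [[:space:]] becomes \s, [^[:space:]] becomes \S and \z becomes \Z. The helper name has_fewer_than_100_words is hypothetical.

```python
import re

# Python translation of the SEARCH regex above.
# Assumptions: \s stands in for [[:space:]], \S for [^[:space:]], \Z for \z.
FEWER_THAN_100 = re.compile(r"\A\s*(?:\S+\s+){0,98}\S+\s*\Z|\A\s+\Z")


def has_fewer_than_100_words(text: str) -> bool:
    """Return True when the whole text would be matched, and so deleted."""
    return FEWER_THAN_100.search(text) is not None


if __name__ == "__main__":
    print(has_fewer_than_100_words("word " * 99))   # 99 words
    print(has_fewer_than_100_words("word " * 100))  # 100 words
    print(has_fewer_than_100_words(" \n\t "))       # only space chars
```

The group repeated 0 to 98 times accounts for up to 98 words, and the final run of non-space chars adds one more, so at most 99 words can match.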
-
@guy038 @Terry-R @Alan-Kilborn @Neil-Schipper
Thank you all. It is always a challenge to discover regex solutions.
By the way, I didn’t know the method with [[:punct:]].
Where can I find out about this regex method on the internet? I don’t know how to search for it…
-
@guy038 said in Delete the entire content of all files with less than 100 words:
(?s)\A[[:space:]]*(?:[^[:space:]]+[[:space:]]+){0,98}[^[:space:]]+[[:space:]]*\z|\A[[:space:]]+\z
One more question for @guy038: I want to use one of your GENERIC S/R for this case. I need to delete the content of a file that has fewer than 10 words between the <START> and <FINAL> sections:
<START> The first, thing to note when <FINAL>
So, I tested all the GENERIC regex formulas you wrote a long time ago.
BSR = <START>
ESR = <FINAL>
FR = (?s)\A[[:space:]]*(?:[^[:space:]]+[[:space:]]+){0,10}[^[:space:]]+[[:space:]]*\z|\A[[:space:]]+\z
REGEX:
(?-si:BSR|(?!\A)\G)(?s-i:(?!ESR).)*?\x20\K(FR)
(?-si:BSR|(?!\A)\G)(?s-i:(?!ESR).)*?\x20\KFR(?=\x20)
(?-si:BSR|(?!\A)\G)(?s-i:(?!ESR).)*?\x20\KFR
(?-si:BSR|(?!\A)\G)(?s-i:(?!ESR).)*?\x20\KFR(?=\x20)
(?-i:BSR|\G(?!^))(?s:(?!ESR).)*?\K(?-i:FR)
(?-i:BSR|(?!\A)\G)(?s:(?!ESR).)*?\K(?-i:FR)
(?-i:BSR|(?!^)\G)(?s:(?!ESR).)*?\K(?-i:FR)
(?-i:BSR|(?!\A)\G)(?s:(?!ESR).)*?\K(?-i:FR)
It is not working, in any of the cases. I get the same message on F/R: “Cannot find the text…”
-
Hi, @rodica-f and All,
EDIT : The regexes below are incomplete. See the correct solution in my next post.
You do not need to use these generic regexes at all !
Simply replace \A by <START> and \z by <FINAL> and, of course, change the value of the quantifier of the non-capturing group from 98 to 8, giving the functional regex S/R below :
SEARCH
(?s)<START>[[:space:]]*(?:[^[:space:]]+[[:space:]]+){0,8}[^[:space:]]+[[:space:]]*<FINAL>|<START>[[:space:]]+<FINAL>
REPLACE
Leave EMPTY
So, the general formula for deleting all file contents, if there are fewer than N words between the two boundaries <START> and <FINAL>, is :
SEARCH
(?s)<START>[[:space:]]*(?:[^[:space:]]+[[:space:]]+){0,N-2}[^[:space:]]+[[:space:]]*<FINAL>|<START>[[:space:]]+<FINAL>
REPLACE
Leave EMPTY
BR
guy038
-
@guy038 correct me if I’m wrong. The GENERIC formula in this case will be:
(?s)BSR(FR)*ESR|BSR+ESR
I think I’m wrong somewhere.
-
@guy038 by the way, I tested the generic formula you made for me.
(?s)<START>[[:space:]]*(?:[^[:space:]]+[[:space:]]+){0,8}[^[:space:]]+[[:space:]]*<FINAL>|<START>[[:space:]]+<FINAL>
In the context below, it deletes only what is framed by <START> and <FINAL>.
It does not delete the entire file, I mean the other words around it.
blah blah blah <START> The first, thing to note when <FINAL> blah blah
-
Hello, @rodica-f and All,
Oh… Yes ! I was wrong about it ! The correct regex S/R is, of course :
SEARCH
(?s)\A.*<START>[[:space:]]*(?:[^[:space:]]+[[:space:]]+){0,8}[^[:space:]]+[[:space:]]*<FINAL>.*\z|\A.*<START>[[:space:]]+<FINAL>.*\z
REPLACE
Leave EMPTY
And the general formula for deleting all file contents, if there are fewer than N words between the two boundaries <START> and <FINAL>, becomes :
SEARCH
(?s)\A.*<START>[[:space:]]*(?:[^[:space:]]+[[:space:]]+){0,N-2}[^[:space:]]+[[:space:]]*<FINAL>.*\z|\A.*<START>[[:space:]]+<FINAL>.*\z
REPLACE
Leave EMPTY
This regex will delete all file contents in all these cases :
- If there is no non-space char ( 0 words ), and only some space chars => the regex is \A.*<START>[[:space:]]+<FINAL>.*\z ( the part after the | symbol )
- If there are several non-space chars ( one word ), possibly surrounded with space chars => quantifier = 0 and the regex becomes (?s)\A.*<START>[[:space:]]*[^[:space:]]+[[:space:]]*<FINAL>.*\z
- If there are several non-space chars followed with space chars, twice ( so two words ) => quantifier = 1 and the regex becomes (?s)\A.*<START>[[:space:]]*(?:[^[:space:]]+[[:space:]]+)[^[:space:]]+[[:space:]]*<FINAL>.*\z
- If there are several non-space chars followed with space chars, three times ( so three words ) => quantifier = 2 and the regex becomes (?s)\A.*<START>[[:space:]]*(?:[^[:space:]]+[[:space:]]+){2}[^[:space:]]+[[:space:]]*<FINAL>.*\z
and so on… till :
- If there are several non-space chars followed with space chars, nine times ( so nine words ) => quantifier = 8 and the regex becomes (?s)\A.*<START>[[:space:]]*(?:[^[:space:]]+[[:space:]]+){8}[^[:space:]]+[[:space:]]*<FINAL>.*\z
Now, to answer your question, I would say :
SEARCH
(?s)\A.*BSR(FR)ESR.*\z
where FR = [[:space:]]*(?:[^[:space:]]+[[:space:]]+){0,N-2}[^[:space:]]+[[:space:]]*
OR FR = [[:space:]]+ ( case with no word )
Best Regards,
guy038
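The general formula above can be sketched in Python, for those who prefer to test it in a script before running it on real files. This is only an illustration under assumptions: \s, \S and \Z stand in for [[:space:]], [^[:space:]] and \z, which Python's re module does not offer, and the function name build_pattern and its parameters are hypothetical.

```python
import re


def build_pattern(n: int, start: str = "<START>", final: str = "<FINAL>") -> re.Pattern:
    """Build a whole-file pattern that matches when fewer than n words
    sit between the start and final boundaries.

    Assumption: n >= 2, so that the upper bound n - 2 is never negative.
    """
    if n < 2:
        raise ValueError("n must be at least 2")
    bsr, esr = re.escape(start), re.escape(final)
    words = rf"\s*(?:\S+\s+){{0,{n - 2}}}\S+\s*"  # FR for 1 to n-1 words
    no_word = r"\s+"                               # FR for 0 words, only space chars
    # re.DOTALL plays the role of the (?s) modifier.
    return re.compile(rf"\A.*{bsr}(?:{words}|{no_word}){esr}.*\Z", re.DOTALL)


if __name__ == "__main__":
    text = "blah blah blah <START> The first, thing to note when <FINAL> blah blah"
    # Six words between the boundaries: deleted for n = 10, kept for n = 6.
    print(repr(build_pattern(10).sub("", text)))
    print(repr(build_pattern(6).sub("", text)))
```

Both FR cases are merged into one non-capturing alternation, so the \A.* and .*\z parts are written only once.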
-
@guy038 thank you very much !
-
Delete the entire content of all files with less than 6 words
FIND:
\A(?i)[^\w+]*(?:[\w*]+[^\w*]+){0,5}(?:[\w*]+[^\w+]*)?\z
REPLACE:
(LEAVE EMPTY)
-
Hi, @rodica-f and All,
I’m sorry to tell you that your last regex does not exactly meet the previous rules and is rather erroneous !
First, and just anecdotally, the (?i) modifier is useless as no range of letters occurs in your regex.
Secondly, this regex will delete all file contents if there are more than 0 words and fewer than 7 words.
Thirdly, let’s consider this simple phrase :
let abc - xyz
It contains 4 non-space expressions ( let, abc, - and xyz ).
Your regex seems OK as it correctly selects all text which contains fewer than 7 words.
Now, change the - sign to a + sign :
let abc + xyz
This time, your regex does not match anything although there are, still, 4 non-space expressions :((
Why does this behaviour occur ? Well, the different sub-expressions that you used in your regex are erroneous !
[^\w+]* means “find a char different from a word char and different from the + sign”, repeated from 0 to any times
[\w*]+ means “find a word char or a * symbol”, repeated from 1 to any times
[^\w*]+ means “find a char different from a word char and different from the * symbol”, repeated from 1 to any times
So, an almost-correct solution would be \A[^\w]*(?:\w+[^\w]+){0,4}(?:\w+[^\w]*)?\z. However, note that it also matches a truly empty file, which does not need any replacement as it is already empty !!
Now, the important drawback of using word chars \w and non-word chars [^\w] is that any symbol met in the text will increase the number of words ! For instance, see the difference between :
This is a simple example
and :
This is a sim-ple example
If I use my last “word” version \A[^\w]*(?:\w+[^\w]+){0,4}(?:\w+[^\w]*)?\z, it matches the text This is a simple example and not the text This is a sim-ple example ! Because, in the former case, it counts 5 words and, in the latter case, it counts 6 words.
That’s why my previous version and @terry-r’s, using non-space characters [[:^space:]] and space chars [[:space:]], seem more rigorous and practical ;-))
Best Regards
guy038
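The difference between the two ways of counting can be reproduced with a small Python sketch. It is only an illustration: \S stands in for [[:^space:]] and \Z for \z, which is an assumption about equivalence with the Notepad++ syntax, and the helper names are hypothetical.

```python
import re

# The "word" version from the post, with \Z standing in for \z.
WORD_VERSION = re.compile(r"\A[^\w]*(?:\w+[^\w]+){0,4}(?:\w+[^\w]*)?\Z")


def count_word_runs(text: str) -> int:
    """Count runs of word chars, which is what the \\w version counts."""
    return len(re.findall(r"\w+", text))


def count_non_space_runs(text: str) -> int:
    """Count runs of non-space chars, which is what the space-based version counts."""
    return len(re.findall(r"\S+", text))


if __name__ == "__main__":
    for text in ("This is a simple example", "This is a sim-ple example"):
        matched = WORD_VERSION.search(text) is not None
        print(text, count_word_runs(text), count_non_space_runs(text), matched)
```

The hyphen splits sim-ple into two runs of word chars, while it stays a single run of non-space chars.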
-
@guy038 said in Delete the entire content of all files with less than 100 words:
\A[^\w]*(?:\w+[^\w]+){0,4}(?:\w+[^\w]*)?\z
I’m glad that, thanks to my regex, a quite good alternative method has been discovered.
Thank you @guy038