How to remove paragraphs with a specific pattern?
- 
 I work with hundreds of txt files that are formatted as follows:

 LOG #2: 2020/04/14 0:48:55 CUST [2020/04/14 13:48:55 ET]
 Room ‘xxx’ Seat #2 is occupied
 Seat 1: Mr.Hotseat
 Seat 2: könönen84
 *** NOTES ***
 seated at, amount ($$)

 LOG #7: 2020/04/15 0:48:55 CUST [2020/04/15 13:48:55 ET]
 Room ‘xxx’ Seat #2 is occupied
 Seat 1: Mr.Hotseat
 Seat 2: könönen84
 Seat 3: -
 Seat 4: -
 *** NOTES ***
 seated at, amount ($$)

 I wish to delete entire paragraphs in which the word ‘Seat’ occurs 3 times or fewer (in this case, the 1st paragraph).
 Can someone please provide some suggestions and thoughts on this? Thank you very much.
- 
 Hi @Harl-Xu, All

 Try this: Place the caret at the beginning of the file. Then open the Find panel (Control + F) and copy the following line into the Find what: field:

 (?-s)^LOG #.*\R^Room.*\R(?:^Seat \d+:.*\R){1,2}^\*.*\R^seated.*\R\R?

 Leave the Replace with: field empty. Select the Regular expression search mode, and click on the Replace All button.

 The regex will delete all paragraphs not containing the Seat 3: string.

 Hope this helps.
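
 For anyone who wants to try the same idea outside Notepad++, here is a rough Python sketch of the regex above. It is an approximation, not the posted regex itself: Python's standard `re` module has no `\R`, so line breaks are written as `\r?\n`, and the names `FIXED_LAYOUT` and `SAMPLE` are made up for this illustration.

```python
import re

# Rough port of the posted regex: \R is written as \r?\n, and re.MULTILINE
# lets ^ match at every line start. Without re.DOTALL, "." stops at line
# ends, which is what (?-s) requests in Notepad++.
FIXED_LAYOUT = re.compile(
    r"^LOG #.*\r?\n"
    r"^Room.*\r?\n"
    r"(?:^Seat \d+:.*\r?\n){1,2}"   # one or two "Seat N:" lines only
    r"^\*.*\r?\n"
    r"^seated.*\r?\n"
    r"(?:\r?\n)?",                  # optional blank separator line
    re.MULTILINE,
)

# The two paragraphs from the first post (hypothetical test input).
SAMPLE = (
    "LOG #2: 2020/04/14 0:48:55 CUST [2020/04/14 13:48:55 ET]\n"
    "Room ‘xxx’ Seat #2 is occupied\n"
    "Seat 1: Mr.Hotseat\n"
    "Seat 2: könönen84\n"
    "*** NOTES ***\n"
    "seated at, amount ($$)\n"
    "\n"
    "LOG #7: 2020/04/15 0:48:55 CUST [2020/04/15 13:48:55 ET]\n"
    "Room ‘xxx’ Seat #2 is occupied\n"
    "Seat 1: Mr.Hotseat\n"
    "Seat 2: könönen84\n"
    "Seat 3: -\n"
    "Seat 4: -\n"
    "*** NOTES ***\n"
    "seated at, amount ($$)\n"
)

cleaned = FIXED_LAYOUT.sub("", SAMPLE)
print(cleaned)  # only the LOG #7 paragraph is left
```

 Like the original, this only works when every paragraph has exactly this line layout (LOG, Room, Seat lines, NOTES, seated).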
- 
 It’s not exactly to the OP’s spec, but it may be fulfilling the OP’s need! We will see. :-) 
- 
 I guess we are reading again a message that is ambiguous in a different way. I count three times the term Seat in the paragraph to be deleted, but OP may have meant that the three seats should be at the beginning of a line. It doesn’t matter much anyway, since the regex is very easy to adapt to how many times Seat should appear. Let’s see :) 
- 
 Hello Sir, Thank you for the help. I’m sorry if you found my post ambiguous. I’m trying hard to compose my post in English. The only thing that is constant between the paragraphs I’m working on is that they always start with ‘LOG #’, and there is always a blank line to separate them. The wording and the number of lines in a paragraph will vary, hence the code doesn’t work with other paragraphs. ‘Seat’ could be placed anywhere. All I want is to select from ‘LOG #’ until the blank line, count the word ‘Seat’, then delete the entire selection if it matches my criteria. Thank you.
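
 The procedure described in this post (take each block from ‘LOG #’ to the blank line, count ‘Seat’, delete the block if the count is low enough) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `remove_sparse_paragraphs`, its parameters and the `SAMPLE` text are hypothetical, ‘Seat’ is counted as a case-sensitive substring anywhere in the block, and blocks are assumed to be separated by one or more empty lines.

```python
import re


def remove_sparse_paragraphs(text: str, word: str = "Seat", max_count: int = 3) -> str:
    """Drop 'LOG #' paragraphs that contain `word` at most `max_count` times."""
    # Paragraphs are separated by one or more empty lines.
    paragraphs = re.split(r"(?:\r?\n){2,}", text.strip())
    kept = []
    for paragraph in paragraphs:
        is_log = paragraph.startswith("LOG #")
        # str.count is case-sensitive, so "Hotseat" and "seated" are ignored.
        if is_log and paragraph.count(word) <= max_count:
            continue
        kept.append(paragraph)
    return "\n\n".join(kept) + "\n"


SAMPLE = (
    "LOG #2: 2020/04/14 0:48:55 CUST [2020/04/14 13:48:55 ET]\n"
    "Room ‘xxx’ Seat #2 is occupied\n"
    "Seat 1: Mr.Hotseat\n"
    "Seat 2: könönen84\n"
    "*** NOTES ***\n"
    "seated at, amount ($$)\n"
    "\n"
    "LOG #7: 2020/04/15 0:48:55 CUST [2020/04/15 13:48:55 ET]\n"
    "Room ‘xxx’ Seat #2 is occupied\n"
    "Seat 1: Mr.Hotseat\n"
    "Seat 2: könönen84\n"
    "Seat 3: -\n"
    "Seat 4: -\n"
    "*** NOTES ***\n"
    "seated at, amount ($$)\n"
)

result = remove_sparse_paragraphs(SAMPLE)
print(result)  # the LOG #2 paragraph (3 x "Seat") is gone, LOG #7 (5 x "Seat") stays
```

 Because the counting rule lives in one line, it is easy to swap in a stricter rule (for example, only lines that start with ‘Seat’) once the exact criterion is settled.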
- 
 Hi @Harl-Xu

 Don’t worry about language issues, as English isn’t my first language either. When in trouble, try a translation service such as DeepL.com, if it is available for your language.

 Your message is ambiguous in a crucial sense, because we aren’t sure how to count the Seat instances. Let me show you what I mean, say:

 LOG #7: 2020/04/15 0:48:55 CUST [2020/04/15 13:48:55 ET]
 Room ‘xxx’ Seat #2 is occupied
 Seat 1: Mr.Hotseat
 Seat 2: könönen84
 Seat 3: -
 *** NOTES ***
 seated at, amount ($$)

 If I take into account Seat #2 (mentioned in line 2), then the paragraph includes 4 instances of the word Seat, so, applying the provided rule, the paragraph LOG #7 should not be deleted. However, if Seat #2 should not be counted, then LOG #7 includes only 3 instances of the word Seat and by the rule it should be deleted. See our problem?

 So, in order to better help you, I (we) need to know exactly how to count those instances. Also, please provide at least 3 examples of paragraphs that match the posted regex and 3 examples that fail to match. The examples are necessary to try to catch some regularity in them, which in turn will make a regex approach possible.

 Best Regards.
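
 To make the two readings concrete, here is a small Python check on the LOG #7 example above. It is only a sketch: the variable names are invented for the illustration, and the two patterns are one possible way to express "Seat anywhere" versus "Seat <number>: at the start of a line".

```python
import re

PARAGRAPH = (
    "LOG #7: 2020/04/15 0:48:55 CUST [2020/04/15 13:48:55 ET]\n"
    "Room ‘xxx’ Seat #2 is occupied\n"
    "Seat 1: Mr.Hotseat\n"
    "Seat 2: könönen84\n"
    "Seat 3: -\n"
    "*** NOTES ***\n"
    "seated at, amount ($$)"
)

# Reading 1: every case-sensitive occurrence of "Seat", wherever it is.
anywhere = len(re.findall(r"Seat", PARAGRAPH))

# Reading 2: only "Seat <number>:" entries that begin a line.
line_start = len(re.findall(r"^Seat \d+:", PARAGRAPH, flags=re.MULTILINE))

print(anywhere, line_start)  # 4 3
```

 With a threshold of 3, the first reading keeps the paragraph and the second one deletes it, which is exactly the ambiguity described above.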
- 
 Hello, @harl-xu, @Astrosofista, @alan-kilborn and All,

 @harl-xu, one of @astrosofista’s statements is fundamental. He said:

 Also, please provide at least 3 examples of paragraphs that match the posted regex and 3 examples that fail to match. The examples are necessary to try to catch some regularity in them, which in turn will make a regex approach possible.

 This statement could be simplified as: a fair number of examples of WHAT must be caught and WHAT must be ignored, to find some regularity in these two sets of examples! This approach helps us build the perfect regular expression, adapted to your personal case!

 Now, I was waiting for @astrosofista’s reply before proposing my own solution.

 I tried to guess your needs and I supposed that you want to count the Seat words only if they begin a line and are followed by a space char.

 If we also assume that all the Seat <number>: lines, in a LOG # section, are consecutive, here is my first version:

 SEARCH (?s-i)^LOG\x20#((?:(?!^Seat\x20).)+?)(?-s:^Seat.+\R){0,3}?(?1)\R{2,}

 REPLACE Leave EMPTY

 Later, I found a second, improved version which allows the Seat <Number>: lines to be located anywhere in a section, after the LOG #....... line:

 SEARCH (?s-i)^LOG\x20#(?:((?:(?!^Seat\x20).)+?)^Seat\x20){0,3}?(?1)\R{2,}

 REPLACE Leave EMPTY

 Notes:

 - This 2nd version still counts ONLY the lines which begin with Seat <number>:
 - You may modify the number of required lines by changing the lazy quantifier {0,3}?. Note that this regex S/R will also delete any section without any line beginning with Seat <Number>, with that exact case. If this is not desired, change the quantifier to {1,3}?
 - Moreover, any LOG # section can be separated from another section by any positive number of pure empty lines!
 
 Here is an extended version of the second version, using the FREE-spacing regex mode, with some explanations in comments:

 (?xs-i)                      #  Search in FREE-SPACING, SINGLE line and NON-INSENSITIVE modes
 ^LOG\x20\#                   #  String "LOG #", BEGINNING of line
 (?:                          #  START of the first NON-CAPTURING group
   (                          #    START of Group 1
     (?: (?!^Seat\x20) . )+?  #      SHORTEST NON-NULL range of ANY char, WITHOUT "Seat\x20" at BEGINNING of line
   )                          #    END of Group 1
   ^Seat\x20                  #    followed with the STRING "Seat " at BEGINNING of line
 )                            #  END of the first NON-CAPTURING group
 {0,3}?                       #  present a MINIMUM of 0 to 3 TIMES
 (?1)                         #  followed, again, with ANOTHER group 1 ( a SUBROUTINE CALL to the group 1 REGEX )
 \R{2,}                       #  ENDING with, at least, TWO CONSECUTIVE line-breaks

 Finally, following @astrosofista’s last post, if we consider that we must count any Seat <Number> string, whatever its location in a section, after the LOG # string, here is my third regex version:

 SEARCH (?s-i)^LOG\x20#(?:((?:(?!Seat\x20).)+?)Seat\x20){0,3}?(?1)\R{2,}

 REPLACE Leave EMPTY

 Best Regards, guy038
- 
- 
 Hi, @harl-xu, @Astrosofista, @alan-kilborn and All,

 To simplify and understand the general architecture, we can decompose, for instance, the second version of the search regex, according to this schema:

          ANY char                  ANY char                  ANY char                  ANY char
              V                         V                         V                         V
 LOG #.................^Seat\x20.................^Seat\x20.................^Seat\x20.................\R{2,}
      \_______________/         \_______________/         \_______________/         \_______________/
              v                         v                         v                         v
           Group 1                   Group 1                   Group 1               (?1) = Group 1
      \________________________/\________________________/\________________________/\_______________/
                  v                         v                         v
         NON-capturing group       NON-capturing group       NON-capturing group
      ______________________________________________________________________________
                       REPEATED a MINIMUM, from ZERO to THREE times

 Note: ALL the GROUP 1 do NOT contain any string "^Seat ", due to the LOOK-AHEAD structure (?!^Seat\x20)

 Hope you like it!

 Cheers, guy038
- 
 Hi @guy038, @astrosofista, All

 I want to match 'Seat ', wherever it is positioned, so I am going with solution #3. Upon testing, though, solution #2 seems to produce the same hits as solution #3. At least I can continue with my project now… @astrosofista, the words ROOM and seated in my explanation are irrelevant, because they might not be there. That’s my bad, sorry. You all are my saviors. Thank you so much.
- 
 Hi, @harl-xu, @astrosofista, @alan-kilborn, @ekopalypse, @michael-vincent and All,

 @harl-xu, there is, indeed, a difference between solutions 2 and 3, below:

 Regex 2: (?s-i)^LOG\x20#(?:((?:(?!^Seat\x20).)+?)^Seat\x20){0,3}?(?1)\R{2,}
 Regex 3: (?s-i)^LOG\x20#(?:((?:(?!Seat\x20).)+?)Seat\x20){0,3}?(?1)\R{2,}

 For instance, against this short example, below, which contains four LOG # sections:

 LOG #2: 2020/04/14 0:48:55 CUST [2020/04/14 13:48:55 ET]
 Room ‘xxx’ Seat #2 is occupied
 *** NOTES ***
 seated at, amount ($$)

 LOG #2: 2020/04/14 0:48:55 CUST [2020/04/14 13:48:55 ET]
 Room ‘xxx’ Seat #2 is occupied
 Seat 1: Mr.Hotseat
 *** NOTES ***
 seated at, amount ($$)

 LOG #2: 2020/04/14 0:48:55 CUST [2020/04/14 13:48:55 ET]
 Room ‘xxx’ Seat #2 is occupied
 Seat 1: Mr.Hotseat
 Seat 2: könönen84
 *** NOTES ***
 seated at, amount ($$)

 LOG #2: 2020/04/14 0:48:55 CUST [2020/04/14 13:48:55 ET]
 Room ‘xxx’ Seat #2 is occupied
 Seat 1: Mr.Hotseat
 Seat 2: könönen84
 Seat 3: Blah blah
 *** NOTES ***
 seated at, amount ($$)

 The regex 2 matches all the 4 sections whereas the regex 3 does not match the last section! Why?

 - Regex 2 looks for not more than 3 strings "Seat ", beginning a line, within a LOG # section
 - Regex 3 looks for not more than 3 strings "Seat ", anywhere in a line, within a LOG # section

 So, because of the line Room ‘xxx’ Seat #2 is occupied, present in all sections, which contains the string Seat #2, the last LOG # section has, finally, FOUR strings "Seat ". Thus, the regex 3 cannot match the last LOG # section. Elementary!

 Best Regards, guy038

 P.S.: With regexes 2 or 3, a LOG # section will be considered as having 3 "Seat " strings even if the "Seat " lines are not consecutive. The regex 1, below, was more restrictive because the strings "Seat " must begin a line and all these lines must also be consecutive!

 Regex 1: (?s-i)^LOG\x20#((?:(?!^Seat\x20).)+?)(?-s:^Seat.+\R){0,3}?(?1)\R{2,}

 For instance, the regex 1 would only match the second LOG # section, below:

 LOG #2: 2020/04/14 0:48:55 CUST [2020/04/14 13:48:55 ET]
 Seat 1: Mr.Hotseat
 *** NOTES ***
 Seat 2: könönen84
 seated at, amount ($$)
 Seat 3: Blah blah

 LOG #2: 2020/04/14 0:48:55 CUST [2020/04/14 13:48:55 ET]
 Seat 1: Mr.Hotseat
 Seat 2: könönen84
 Seat 3: Blah blah
 *** NOTES ***
 seated at, amount ($$)
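
 The difference between regexes 2 and 3 can also be reproduced outside Notepad++. The Python sketch below is an adaptation, not the regexes as posted: the standard `re` module supports neither `\R` nor the `(?1)` subroutine call, so the line break is spelled out and the body of Group 1 is simply repeated inline. The helper `build` and all the constant names are hypothetical.

```python
import re

LINE_BREAK = r"(?:\r\n|\n|\r)"  # stand-in for \R


def build(seat: str) -> re.Pattern:
    """Port one search regex; `seat` is the token that gets counted."""
    filler = rf"(?:(?!{seat}).)+?"  # body of Group 1, repeated inline
    return re.compile(
        rf"^LOG\x20#(?:{filler}{seat}){{0,3}}?{filler}{LINE_BREAK}{{2,}}",
        re.DOTALL | re.MULTILINE,  # (?s) plus ^ matching at each line start
    )


REGEX_2 = build(r"^Seat\x20")  # "Seat " only at the beginning of a line
REGEX_3 = build(r"Seat\x20")   # "Seat " anywhere

# Rebuild the four-section example: 0, 1, 2 and 3 "Seat <n>:" lines.
HEAD = (
    "LOG #2: 2020/04/14 0:48:55 CUST [2020/04/14 13:48:55 ET]\n"
    "Room ‘xxx’ Seat #2 is occupied\n"
)
SEATS = ["Seat 1: Mr.Hotseat\n", "Seat 2: könönen84\n", "Seat 3: Blah blah\n"]
TAIL = "*** NOTES ***\nseated at, amount ($$)"
SECTIONS = [HEAD + "".join(SEATS[:n]) + TAIL for n in range(4)]
TEXT = "\n\n".join(SECTIONS) + "\n\n"

print(len(REGEX_2.findall(TEXT)))  # 4
print(len(REGEX_3.findall(TEXT)))  # 3
```

 Note that the last section must itself be followed by two line breaks to be matched, in this port as in the original search regexes.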
- 
- 
 Hi @guy038, and All

 Thank you for taking on some extra work to explain the differences. Those schematics and details… You are so cool… :)
 Then, like I said in my post above, regex 3 is what I need.

 Best Regards, Harl



