Help with Regex to delete a block in paragraph/line

du p

Hi all,
Let's say I have the paragraph below. Is it possible to use Notepad++ to delete just the <block…>? The <block> can vary in length, but it is shorter than 20 characters.

Answer <block01>misery adieus add wooded how nay men before though. <block0002 > Pretended belonging contented mrs suffering favourite you the continual. Mrs civil nay least means tried drift. Natural <block0000005>end law whether but and towards certain. Furnished unfeeling his sometimes see day promotion. Quitting informed concerns can men now.

Thank you

Scott Sumner

@du-p

I would try a Replace operation as follows:

Find-what zone: <block\d+>
Replace-with zone: make sure this is empty
Search mode: Regular expression

\d+ stands for one or more digit characters, 0-9. From your description it is hard to tell what you might need to delete as far as whitespace goes, on either side of the bracketed text…
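For anyone who wants to check this pattern outside Notepad++, here is a minimal sketch using Python's re module. It assumes that \d+ behaves the same way there as in the Notepad++ regex engine for this input, and the sample text is a shortened version of the paragraph from the question.

```python
import re

# Shortened sample based on the paragraph in the question (trimmed for brevity).
text = "Answer <block01>misery adieus. <block0002 > Pretended belonging. Natural <block0000005>end law."

# <block\d+> needs one or more digits immediately followed by ">".
result = re.sub(r"<block\d+>", "", text)
print(result)
```

Note that <block0002 > is left in place, because the space before the > is not a digit. That is the whitespace question raised above.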

du p

Forgot to mention that there can be any character, not just digits, after the word “block”. Ex: <block a> , <block bac> . I’m wondering whether you can delete the block based on the start character “<” and the end character “>”. If I use the wildcard "*", like <(.*)>, it selects all the characters until the end of the paragraph. I’m wondering if there is a way to define a range for the wildcard. For example, something arbitrary like <(20)> would look for 20 characters within the <>.

Scott Sumner

@du-p

Sorry, if you can’t describe your data well with the first go-round, I lose interest; maybe somebody else can pick it up and help…?

guy038

Hello @du-p, and All,

Ah, du-p! I understood your problem !

Let’s start with the text below, where I added three ranges <block....> to your initial text :

Answer <block01>misery adieus add wooded how nay men before though. <block0002 > Pretended belonging contented mrs suffering<block> favourite you the continual. Mrs civil nay least means tried drift. Natural <block0000005>end law whether but and towards certain. Furnished unfeeling his sometimes see day promotion. Quitting informed <block12345678901234567890>concerns can men <block123456789012345678901>now.

You’ve certainly tried the regex <block.*> and were surprised to notice that it matches everything from <block01> up to and including the last block, <block123456789012345678901>, weren’t you ?

Simply because the dot ( . ), followed by the quantifier * ( or its equivalent {0,} ), matches the longest possible range of standard characters, up to the last > symbol on the line. That’s the default “greedy” behaviour.
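The greedy behaviour can be reproduced with a small sketch in Python's re module. This assumes that Python's defaults (case-sensitive, dot does not match line breaks) stand in for the Notepad++ settings discussed here. The sample text is shortened.

```python
import re

# Shortened, single-line sample with three <block...> ranges.
text = "Answer <block01>misery adieus. <block0002 > Pretended. Natural <block0000005>end law."

# Greedy: .* grabs as much as possible, so the match ends at the LAST ">".
greedy = re.search(r"<block.*>", text).group(0)

# {0,} is the long form of the * quantifier and gives the same match.
greedy_long_form = re.search(r"<block.{0,}>", text).group(0)

print(greedy)
```

The single match starts at <block01> and runs through <block0000005>, swallowing the ordinary text in between.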

And I guess that you wanted to limit the search to the next 20 characters, after <block, to get individual blocks, only !

This is not necessary, as you can get the “lazy” behaviour by adding a question mark after the * quantifier !

So, the final regex S/R would be :

SEARCH (?-si)<block.*?>

REPLACE Leave EMPTY

OPTIONS Regular expression

ACTION : Replace or Replace All

Notes :

The first part (?-si) ( equivalent to (?-s)(?-i) ) means that :
- The dot special character matches any single standard character only, and NOT End of Line chars
- The search is performed in a case-sensitive way. So, it would not match, for instance, the string <BlocK...>

Then it matches the exact string <block, followed by the smallest range of standard characters, till an ending symbol >

As the replacement zone is empty, these ranges <block....> are simply deleted
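As a rough cross-check of the lazy version, here is a sketch in Python's re module. Two assumptions: the leading (?-si) is dropped, because Python's re module is already case-sensitive and its dot already stops at line breaks by default, and the <BlocK9> tag is a hypothetical addition to the sample, there only to illustrate case sensitivity.

```python
import re

# Shortened sample; <BlocK9> is a hypothetical tag added to show case sensitivity.
text = (
    "Answer <block01>misery. <block0002 > Pretended suffering<block> favourite. "
    "Informed <block12345678901234567890>concerns can men "
    "<block123456789012345678901>now. <BlocK9>kept"
)

# Lazy: .*? stops at the FIRST ">" after each "<block".
matches = re.findall(r"<block.*?>", text)

# Empty replacement, i.e. delete every match.
result = re.sub(r"<block.*?>", "", text)

print(matches)
print(result)
```

Each range is matched individually, whatever its length, and the mixed-case <BlocK9> is left untouched.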

Note that you may, as well, use the 20-character limit, if you prefer to ! This time, the regex becomes :

(?-si)<block.{0,20}?>

Applied against the text below :

Answer <block01>misery adieus add wooded how nay men before though. <block0002 > Pretended belonging contented mrs suffering<block> favourite you the continual. Mrs civil nay least means tried drift. Natural <block0000005>end law whether but and towards certain. Furnished unfeeling his sometimes see day promotion. Quitting informed <block12345678901234567890>concerns can men <block123456789012345678901>now.

It would miss the last <block...>, because 21 digits are located between <block and > !
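The effect of the 20-character limit can be sketched the same way in Python's re module. As an assumption, the (?-si) prefix is omitted because it matches Python's default behaviour. Only the last two ranges of the sample are used.

```python
import re

# The last two ranges from the sample: 20 digits, then 21 digits.
text = (
    "Informed <block12345678901234567890>concerns can men "
    "<block123456789012345678901>now."
)

# At most 20 characters are allowed between "<block" and ">".
result = re.sub(r"<block.{0,20}?>", "", text)
print(result)
```

The range with 20 digits is deleted, while the one with 21 digits stays, because no > is found within 20 characters of <block.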

Best Regards,

guy038

du p

@guy038
Thank you for the thorough explanation

(?-si)<block.*?> works better than expected for me, even when the content inside the <block…> exceeds 20 characters.