Help with Regex to delete a block in paragraph/line
-
Hi all,
Let's say I have the paragraph below. Is it possible to use Notepad++ to delete just the <block…> parts? The <block> could vary in length, but it is shorter than 20 characters.
Answer <block01>misery adieus add wooded how nay men before though. <block0002 > Pretended belonging contented mrs suffering favourite you the continual. Mrs civil nay least means tried drift. Natural <block0000005>end law whether but and towards certain. Furnished unfeeling his sometimes see day promotion. Quitting informed concerns can men now.
Thank you
-
I would try a Replace operation as follows:
Find-what zone: <block\d+>
Replace-with zone: make sure this is empty
Search mode: Regular expression
\d+ stands for one or more digit characters, 0-9. From your description it is hard to tell what you might need to delete as far as whitespace goes, on either side of the bracketed text…
-
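For anyone who wants to check the digits-only pattern outside Notepad++, here is a minimal sketch in Python. The choice of Python and the shortened sample text are assumptions for illustration; the thread itself only uses Notepad++'s Replace dialog. On this sample, the pattern removes the digit-only tags but leaves <block0002 > alone, because of the space before the closing >.

```python
import re

# Shortened copy of the sample paragraph from the question (illustrative only).
text = ("Answer <block01>misery adieus add wooded. "
        "<block0002 > Pretended belonging. "
        "Natural <block0000005>end law whether.")

# Same pattern as the Find-what zone: "<block", one or more digits, ">".
pattern = re.compile(r"<block\d+>")

# Substituting an empty string mirrors the empty Replace-with zone.
result = pattern.sub("", text)
print(result)
```

So whether this pattern is enough depends on whether whitespace or other characters can appear inside the brackets.
-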
I forgot to mention that any character, not just digits, can follow the word “block”. Ex: <block a>, <block bac>. I’m wondering whether you can delete the block based on the start character “<” and the end character “>”. If I use the wildcard “*”, as in <(.*)>, it selects all the characters until the end of the paragraph. I’m wondering if there is a way to define a range for the wildcard. For example, something arbitrary like <(20)> would look for 20 characters within the <>.
-
Sorry, if you can’t describe your data well with the first go-round, I lose interest; maybe somebody else can pick it up and help…?
-
Hello @du-p, and All,
Ah, du-p! I understood your problem !
Let’s start with the text below, where I added three <block....> ranges to your initial text:
Answer <block01>misery adieus add wooded how nay men before though. <block0002 > Pretended belonging contented mrs suffering<block> favourite you the continual. Mrs civil nay least means tried drift. Natural <block0000005>end law whether but and towards certain. Furnished unfeeling his sometimes see day promotion. Quitting informed <block12345678901234567890>concerns can men <block123456789012345678901>now.
You have certainly tried the regex <block.*> and were surprised to notice that it matches from <block01> to the last block <block123456789012345678901>, included! Weren’t you?
That is simply because the dot ( . ), followed by the quantifier * ( or its equivalent {0,} ), represents the greatest range of any standard characters, up to the last > symbol on the line. That’s the default “greedy” behaviour.
And I guess that you wanted to limit the search to the next 20 characters after <block, to get individual blocks only! This is unnecessary, as you may use the “lazy” behaviour, by adding a question mark after the * quantifier!
So, the final regex S/R would be:
SEARCH (?-si)<block.*?>
REPLACE Leave EMPTY
OPTIONS Regular expression
ACTION : Replace or Replace All
Notes :
- The first part (?-si) ( equivalent to (?-s)(?-i) ) means that:
  - The dot special character matches any single standard character only, and NOT End of Line chars
  - The search is performed in a case-sensitive way. So, it would not match, for instance, the string <BlocK...>
- Then it matches the exact string <block, followed by the smallest range of standard characters, till an ending symbol >
- As the replacement zone is empty, these <block....> ranges are simply deleted
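The greedy versus lazy difference can also be seen outside Notepad++. Below is a minimal sketch in Python, which is an assumption made for illustration, as is the shortened sample text. The (?-si) prefix is left out because Python's defaults are already case-sensitive, with a dot that stops at line breaks.

```python
import re

# Shortened, illustrative sample with digits, a space, an empty tag and letters.
text = ("Answer <block01>misery adieus. <block0002 > Pretended "
        "suffering<block> favourite. Quitting <block a>concerns.")

# Greedy: ".*" runs on to the LAST ">" of the line, giving one big match.
greedy = re.findall(r"<block.*>", text)

# Lazy: ".*?" stops at the FIRST ">" found after each "<block".
lazy = re.findall(r"<block.*?>", text)

# Replacing each lazy match with an empty string deletes the tags.
cleaned = re.sub(r"<block.*?>", "", text)

# The search is case-sensitive, so a differently cased tag is left alone.
untouched = re.sub(r"<block.*?>", "", "<BlocK1>text")

print(len(greedy), lazy)
print(cleaned)
```

The greedy search returns a single match that swallows everything between the first <block and the last >, while the lazy one returns the four tags individually.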
Note that you may, as well, use the 20-character limit, if you prefer to! This time, the regex becomes:
(?-si)<block.{0,20}?>
Applied against the text below:
Answer <block01>misery adieus add wooded how nay men before though. <block0002 > Pretended belonging contented mrs suffering<block> favourite you the continual. Mrs civil nay least means tried drift. Natural <block0000005>end law whether but and towards certain. Furnished unfeeling his sometimes see day promotion. Quitting informed <block12345678901234567890>concerns can men <block123456789012345678901>now.
it would miss the last <block...>, because 21 digits are located between <block and >!
Best Regards,
guy038
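To see the 20-character limit at work, here is a small sketch in Python (again an assumption; the thread only uses Notepad++). The names tag20 and tag21 are hypothetical helpers that rebuild the two longest tags of the sample text, and the (?-si) prefix is omitted since Python's defaults already behave that way.

```python
import re

# Hypothetical helpers: the two longest tags from the sample text.
tag20 = "<block" + "1234567890" * 2 + ">"        # 20 characters inside
tag21 = "<block" + "1234567890" * 2 + "1" + ">"  # 21 characters inside
text = f"Quitting informed {tag20}concerns can men {tag21}now."

# At most 20 characters are allowed between "<block" and ">".
bounded = re.compile(r"<block.{0,20}?>")

matches = bounded.findall(text)
cleaned = bounded.sub("", text)
print(matches)
print(cleaned)
```

Only the tag with 20 characters inside is removed; the one with 21 stays in the text, which is why the unbounded lazy form .*? is the safer choice when the length is not guaranteed.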
-
-
@guy038
Thank you for the thorough explanation. (?-si)<block.*?> works better than expected for me, even when the content inside the <block…> exceeds 20 characters.