Help with Regex to delete a block in paragraph/line



  • Hi all,
    Let's say I have the paragraph below. Is it possible to use Notepad++ to delete just the <block…> parts? The <block> can vary in length, but it is always shorter than 20 characters.

    Answer <block01>misery adieus add wooded how nay men before though. <block0002 > Pretended belonging contented mrs suffering favourite you the continual. Mrs civil nay least means tried drift. Natural <block0000005>end law whether but and towards certain. Furnished unfeeling his sometimes see day promotion. Quitting informed concerns can men now.

    Thank you



  • @du-p

    I would try a Replace operation as follows:

    Find-what zone: <block\d+>
    Replace-with zone: make sure this is empty
    Search mode: Regular expression

    \d+ stands for one or more digit characters, 0-9. From your description it is hard to tell what you might need to delete as far as whitespace goes, on either side of the bracketed text…
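
For readers who want to check the digits-only pattern outside Notepad++, here is a minimal sketch using Python's `re` module. This is an assumption for illustration only: Notepad++ uses a different regex engine, and the sample text is a shortened copy of the paragraph from the question. It suggests that `<block\d+>` leaves `<block0002 >` alone, because of the space before the `>`.

```python
import re

# Shortened copy of the sample paragraph from the question (illustrative only).
text = ("Answer <block01>misery adieus add wooded how nay men before though. "
        "<block0002 > Pretended belonging contented mrs suffering favourite. "
        "Natural <block0000005>end law whether but and towards certain.")

# \d+ requires one or more digits directly between "<block" and ">".
digits_only = re.findall(r"<block\d+>", text)
print(digits_only)

# Replacing with an empty string deletes only the blocks that matched.
cleaned = re.sub(r"<block\d+>", "", text)
print(cleaned)
```

The block with the trailing space survives the replacement, which is the whitespace question raised above.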



  • I forgot to mention that any characters, not just digits, can follow the word “block”. Ex: <block a> , <block bac> . I’m wondering whether you can delete the block based on the start character “<” and the end character “>”. If I use the wildcard "*", as in <(.*)>, it selects all the characters until the end of the paragraph. I’m wondering if there is a way to define a range for the wildcard. For example, something arbitrary like <(20)> would look for up to 20 characters within the <>.



  • @du-p

    Sorry, if you can’t describe your data well with the first go-round, I lose interest; maybe somebody else can pick it up and help…?



  • Hello @du-p, and All,

    Ah, du-p! I understand your problem!

    Let’s start with the text below, where I added three <block....> ranges to your initial text:

    Answer <block01>misery adieus add wooded how nay men before though. <block0002 > Pretended belonging contented mrs suffering<block> favourite you the continual. Mrs civil nay least means tried drift. Natural <block0000005>end law whether but and towards certain. Furnished unfeeling his sometimes see day promotion. Quitting informed <block12345678901234567890>concerns can men <block123456789012345678901>now.
    

    You have certainly tried the regex <block.*> and were surprised to notice that it matches from <block01> up to and including the last block, <block123456789012345678901>, weren’t you?

    This is simply because the dot (.), followed by the quantifier * (or its equivalent {0,}), matches the longest possible range of standard characters that is still followed by a > symbol. That’s the default “greedy” behaviour.
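
The greedy behaviour can be reproduced with a small sketch in Python's `re` module. This is an assumption for illustration, since Notepad++ uses a different regex engine, and the `(?-si)` prefix is not needed here because case-sensitive matching and a dot that skips line breaks are already Python's defaults.

```python
import re

# The sample paragraph with the three extra <block....> ranges, as one line.
text = ("Answer <block01>misery adieus add wooded how nay men before though. "
        "<block0002 > Pretended belonging contented mrs suffering<block> "
        "favourite you the continual. Mrs civil nay least means tried drift. "
        "Natural <block0000005>end law whether but and towards certain. "
        "Furnished unfeeling his sometimes see day promotion. Quitting informed "
        "<block12345678901234567890>concerns can men "
        "<block123456789012345678901>now.")

# Greedy .* runs to the LAST ">" on the line, so there is one huge match.
greedy = re.search(r"<block.*>", text)
before = text[:greedy.start()]   # text left of the match
after = text[greedy.end():]      # text right of the match
print(repr(before), repr(after))
```

Only the text before the first block and after the last block is left outside the match, so a Replace All with this pattern would delete nearly the whole paragraph.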

    And I guess that you wanted to limit the search to the next 20 characters after <block, to get individual blocks only!

    This is unnecessary, as you can use the “lazy” behaviour instead, by adding a question mark after the * quantifier!

    So, the final regex S/R would be :

    SEARCH (?-si)<block.*?>

    REPLACE Leave EMPTY

    OPTIONS Regular expression

    ACTION : Replace or Replace All

    Notes :

    • The first part (?-si) ( equivalent to (?-s)(?-i) ) means that :

      • The dot special character matches any single standard character only, and NOT End of Line chars

      • The search is performed in a case-sensitive way. So, it would not match, for instance, the string <BlocK...>

    • Then it matches the exact string <block, followed by the smallest range of standard characters, till an ending symbol >

    • As the replacement zone is empty, these ranges <block....> are simply deleted
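
The lazy search and replace described in these notes can be sketched in Python's `re` module. This is an assumption for illustration, since Notepad++ uses a different regex engine. The `(?-si)` prefix is left out because case-sensitive matching and a dot that does not match line breaks are already the defaults in Python.

```python
import re

# The sample paragraph with the three extra <block....> ranges, as one line.
text = ("Answer <block01>misery adieus add wooded how nay men before though. "
        "<block0002 > Pretended belonging contented mrs suffering<block> "
        "favourite you the continual. Mrs civil nay least means tried drift. "
        "Natural <block0000005>end law whether but and towards certain. "
        "Furnished unfeeling his sometimes see day promotion. Quitting informed "
        "<block12345678901234567890>concerns can men "
        "<block123456789012345678901>now.")

# Lazy .*? stops at the FIRST ">" after "<block", so each block matches alone.
# subn returns the new text and the number of replacements made.
cleaned, count = re.subn(r"<block.*?>", "", text)
print(count)
print(cleaned)
```

Every block is removed individually, including the empty `<block>` and the one with 21 digits.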


    Note that you may also use the 20-character limit, if you prefer! This time, the regex becomes:

    (?-si)<block.{0,20}?>

    Applied against the text below :

    Answer <block01>misery adieus add wooded how nay men before though. <block0002 > Pretended belonging contented mrs suffering<block> favourite you the continual. Mrs civil nay least means tried drift. Natural <block0000005>end law whether but and towards certain. Furnished unfeeling his sometimes see day promotion. Quitting informed <block12345678901234567890>concerns can men <block123456789012345678901>now.
    

    It would miss the last <block...>, because 21 digits are located between <block and >!
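
The effect of the 20-character limit can be checked with a short sketch in Python's `re` module. This is an assumption for illustration, since Notepad++ uses a different regex engine, and the `(?-si)` prefix is omitted because it describes Python's default behaviour.

```python
import re

# The sample paragraph with the three extra <block....> ranges, as one line.
text = ("Answer <block01>misery adieus add wooded how nay men before though. "
        "<block0002 > Pretended belonging contented mrs suffering<block> "
        "favourite you the continual. Mrs civil nay least means tried drift. "
        "Natural <block0000005>end law whether but and towards certain. "
        "Furnished unfeeling his sometimes see day promotion. Quitting informed "
        "<block12345678901234567890>concerns can men "
        "<block123456789012345678901>now.")

# .{0,20}? allows at most 20 characters between "<block" and ">".
bounded = re.findall(r"<block.{0,20}?>", text)
print(len(bounded))

# Whatever is still in the text after deletion was too long to match.
remaining = re.sub(r"<block.{0,20}?>", "", text)
print(remaining)
```

The block with 20 digits is still deleted, while the one with 21 digits stays in the text.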

    Best Regards,

    guy038



  • @guy038
    Thank you for the thorough explanation

    (?-si)<block.*?> works better than I expected, even when the content inside the <block…> exceeds 20 characters.

