Notepad++ How to delete sections of text starting with a line containing a certain phrase



  • I’m trying to edit a few calibre files that have tags attached, but the tag line is not always formatted the same.

    Eg.

    div class=“pcalibre1 pcalibre2 pcalibre tags-list”

    div class=“pcalibre1 pcalibre2 tags-list pcalibre”.

    I want to delete everything including and between the lines containing tags-list and entry-speaker.

    Is there an easy way to do this with regex?



  • @Banjo-G said in Notepad++ How to delete sections of text starting with a line containing a certain phrase:

    delete everything including and between the lines containing tags-list and entry-speaker

    Your example doesn’t show any entry-speaker so I suggest you have a look HERE.



  • @Alan-Kilborn Sorry, those were examples of the lines not being formatted the same.
    A better example would be

    <div class=“pcalibre2 pcalibre1 pcalibre tags-list”>







    <h4 class=“pcalibre2 pcalibre1 pcalibre entry-speaker”>

    All of these lines should be deleted. The problem is that the class names on the tags-list and entry-speaker lines often appear in a different order.

    Eg.
    <div class=“pcalibre2 pcalibre tags-list pcalibre1”>
    <div class=“pcalibre1 pcalibre2 tags-list pcalibre”>
    <div class=“pcalibre1 pcalibre2 pcalibre tags-list”>
    <div class=“pcalibre2 pcalibre1 pcalibre tags-list”>



  • Hello, @banjo-g, @alan-kilborn and All,

    I think you could test this regex S/R, below :

    SEARCH (?-s)^.+?tags-list(?s).+?entry-speaker.+?$\R

    REPLACE Leave EMPTY

    against this sample text :

    blabla
    bhahblah
    blabla
    <div class=“pcalibre2 pcalibre1 pcalibre tags-list”>
    …
    …
    …
    …
    <h4 class=“pcalibre2 entry-speaker pcalibre1 pcalibre”>
    blabla
    bhahblah
    blabla
    <div class=“pcalibre1 tags-list pcalibre2 pcalibre”>
    …
    …
    …
    …
    <h4 class=“pcalibre2 pcalibre1 pcalibre entry-speaker”>
    blabla
    bhahblah
    blabla
    <div class=“pcalibre2 pcalibre tags-list pcalibre1”>
    …
    …
    …
    …
    <h4 class=“entry-speaker pcalibre2 pcalibre1 pcalibre”>
    blabla
    bhahblah
    blabla
    rt
    <div class=“tags-list pcalibre1 pcalibre2 pcalibre”>
    …
    …
    …
    …
    <h4 class=“pcalibre2 pcalibre1 entry-speaker pcalibre”>
    blabla
    bhahblah
    blabla
    

    should be OK ;-))
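
    For anyone who wants to try the same idea outside Notepad++, here is a rough Python sketch of the search above. It is a translation under assumptions, not the Notepad++ regex itself: Python's re module has no \R and, as far as I know, does not allow (?s) to be switched on partway through a pattern, so character classes stand in for the two dot behaviours. LF line endings are assumed, and the name strip_tag_blocks and the shortened sample are made up for illustration.

```python
import re

# Assumption: this is a Python translation, not the Notepad++ (Boost) regex itself.
# [^\n] plays the role of the dot under (?-s), [\s\S] the dot under (?s),
# and the optional \n stands in for \R (LF line endings assumed).
TAG_BLOCK = re.compile(
    r"^[^\n]*?tags-list[\s\S]*?entry-speaker[^\n]*\n?",
    re.MULTILINE,
)


def strip_tag_blocks(text: str) -> str:
    """Delete every block running from a tags-list line to an entry-speaker line."""
    return TAG_BLOCK.sub("", text)


# Shortened version of the sample text, with the class names in varying order.
sample = (
    "blabla\n"
    '<div class="pcalibre2 pcalibre1 pcalibre tags-list">\n'
    "...\n"
    '<h4 class="pcalibre2 entry-speaker pcalibre1 pcalibre">\n'
    "bhahblah\n"
    '<div class="pcalibre1 tags-list pcalibre2 pcalibre">\n'
    "...\n"
    '<h4 class="entry-speaker pcalibre2 pcalibre1 pcalibre">\n'
    "blabla\n"
)

print(strip_tag_blocks(sample))
```

    Only the three plain lines should survive, whatever the order of the class names on the div and h4 lines.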

    Best Regards,

    guy038



  • @guy038 That works wonders, thanks!



  • Hi, @banjo-g, @alan-kilborn and All,

    A more general form of the regex described in my previous post could be :

    SEARCH (?-is)^.*?Expression A(?s).*?Expression B.*?$\R

    Basically, this regex :

    • Searches for two lines :

      • A line A, containing Expression A, at any location of line A

      • A line B, containing Expression B, at any location of line B

    • Selects the whole range of characters, generally spanning several lines, from the beginning of line A to the end of line B, including its EOL characters

    • Lines A and B may be the same line. However, in that case, Expression B must come after Expression A on that line !

    Notes :

    • First, the in-line modifier (?-is)

      • Makes the search case-sensitive ( the -i part )

      • Forces the regex engine to interpret the regex dot symbol . as matching a single standard character ( not EOL ones )

    • Then, the part ^.*?Expression A matches, from the beginning of a line, the shortest range, possibly null, of standard characters, followed by Expression A, with that exact case

    • Now, the part (?s).*?Expression B looks for the shortest range, possibly null, of characters, including EOL, followed by Expression B, with that exact case

    • Finally, the part .*?$\R searches for the shortest range, possibly null, of characters up to an end of line, followed by its line-break
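
    The generic form can also be sketched in Python, again as a hedged translation rather than the Notepad++ regex itself. The helper name delete_between is hypothetical, the two expressions are treated as literal text via re.escape, and [^\r\n] / [\s\S] / an explicit line-break group stand in for the dot under (?-s), the dot under (?s) and \R respectively.

```python
import re


def delete_between(text: str, expr_a: str, expr_b: str) -> str:
    """Delete from the start of a line containing expr_a through the end
    (line-break included) of the next line containing expr_b.

    Hypothetical helper: expr_a and expr_b are matched literally and
    case-sensitively, mirroring the (?-i) part of the original regex.
    """
    pattern = re.compile(
        r"^[^\r\n]*?" + re.escape(expr_a)   # line A, up to Expression A
        + r"[\s\S]*?" + re.escape(expr_b)   # shortest span, EOLs allowed
        + r"[^\r\n]*(?:\r\n|\n|\r)?",       # rest of line B plus its line-break
        re.MULTILINE,
    )
    return pattern.sub("", text)


multi_line = "keep 1\nfoo START bar\nmid\nbaz END qux\nkeep 2\n"
same_line = "a START b END c\nkeep\n"
wrong_order = "END a START\nkeep\n"

print(delete_between(multi_line, "START", "END"))
print(delete_between(same_line, "START", "END"))
print(delete_between(wrong_order, "START", "END"))
```

    The third call leaves the text untouched, which illustrates the point above: when A and B sit on the same line, B has to come after A, otherwise the search keeps looking for B on later lines.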

    Cheers,

    guy038

