Regex: Find all between strings. Select everything that is extra after `html` on the same line



  • Hi, I have a strange situation. I have a lot of lines that contain links ending in html. But in some cases, a line contains more than a simple link.

    the-book-is-here.html
    yes-I-love-you.html
                     contact.html
                     continuation-of-the-last-harmony.html"><img src="frrr/flag_lsd_en.jpg" title="en" alt="en" width="28" height="19" /></a>&nbsp; create-a-beautiful-team.html
                     create-a-new-vision-of-the-art-down.html
                     the-cat-i-like-here.html
                     important-dates-here.html
    

    The output should be

    "><img src="frrr/flag_lsd_en.jpg" title="en" alt="en" width="28" height="19" /></a>&nbsp;
    

    So I must select everything that is extra after html on the same line.

    I tried this regex, but it doesn't work:

    FIND: (html)(.*?)(html)$
    REPLACE BY: \2



  • Hello, @robin-cruise and All,

    Try this regex S/R :

    SEARCH (?-s)\.html.+</a>&nbsp;\K.+html$

    REPLACE Leave EMPTY


    Or, maybe, this simpler one:

    SEARCH (?-s)\.html.+\K\x20.+html$

    REPLACE Leave EMPTY

    BR

    guy038
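
To see what these two searches do outside Notepad++, here is a minimal Python sketch. It rests on some assumptions: Python's built-in `re` module has no `\K`, so a capture group written back with `\1` stands in for it; the leading `(?-s)` is dropped because `.` already stops at line breaks by default in Python. The names `LINE`, `WITH_ANCHOR_TEXT`, `WITH_SPACE` and `drop_trailing_link` are made up for illustration.

```python
import re

# The one "long" line from the question, without its leading indentation.
LINE = ('continuation-of-the-last-harmony.html"><img src="frrr/flag_lsd_en.jpg" '
        'title="en" alt="en" width="28" height="19" /></a>&nbsp; '
        'create-a-beautiful-team.html')

# First search: keep everything up to and including </a>&nbsp;
# (the part before \K), drop the rest of the line.
WITH_ANCHOR_TEXT = re.compile(r'(\.html.+</a>&nbsp;).+html$', re.MULTILINE)

# Second search: keep everything up to the last space that is
# still followed by "...html" at the end of the line.
WITH_SPACE = re.compile(r'(\.html.+)\x20.+html$', re.MULTILINE)


def drop_trailing_link(text: str, pattern: re.Pattern = WITH_ANCHOR_TEXT) -> str:
    """Replace each match by its first group, i.e. delete what followed \\K."""
    return pattern.sub(r'\1', text)


if __name__ == '__main__':
    print(drop_trailing_link(LINE))
    print(drop_trailing_link(LINE, WITH_SPACE))
```

Both calls print the first link followed by the `"><img ... /></a>&nbsp;` fragment: the search removes only the second link, and lines with a single link are left alone.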



  • Thank you @guy038, but I wanted to select something else. The regex should select exactly this text, which is framed by the two occurrences of html:

    "><img src="frrr/flag_lsd_en.jpg" title="en" alt="en" width="28" height="19" /></a>&nbsp;
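
One way to pull out only that fragment can be sketched in Python. This is an illustration under assumptions, not a tested Notepad++ recipe: the pattern captures whatever sits between the first `.html` and a final, whitespace-separated link at the end of the same line, and the names `EXTRA` and `extra_after_first_link` are hypothetical.

```python
import re

# Capture the text between the first ".html" and the last link on the line.
# [ \t]+ is used instead of \s+ so the match can never spill onto the next line.
EXTRA = re.compile(r'\.html(.+?)[ \t]+\S+\.html$', re.MULTILINE)


def extra_after_first_link(text: str) -> list[str]:
    """Return the extra fragment for every line that carries one."""
    return [m.group(1) for m in EXTRA.finditer(text)]


if __name__ == '__main__':
    sample = (
        'the-book-is-here.html\n'
        '    contact.html\n'
        '    continuation-of-the-last-harmony.html"><img src="frrr/flag_lsd_en.jpg" '
        'title="en" alt="en" width="28" height="19" /></a>&nbsp; '
        'create-a-beautiful-team.html\n'
        '    the-cat-i-like-here.html\n'
    )
    for fragment in extra_after_first_link(sample):
        print(fragment)
```

In Notepad++ itself, the analogous search would presumably be `(?-s)\.html\K.+?(?=[ \t]+\S+\.html$)`, which relies on `\K` and a lookahead so that only the fragment is selected; that form is untested here.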
    


  • SEARCH: (")(.*?;)(.*?)\s+
    REPLACE BY: \1\r

    OR

    SEARCH: (")(.*?;)(.*?)\s+
    REPLACE BY: \1\r\t\t\t\t\t
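
What this last replacement does can be checked with a short Python sketch, assuming the regex behaves the same way in Python's `re` module as in Notepad++. It cuts the fragment out of the line and puts the second link on a line of its own. The function name `split_links` and its `indent` parameter are hypothetical.

```python
import re

# Group 1 is the first double quote, group 2 runs lazily up to the first ";",
# and the trailing \s+ swallows the space before the second link.
QUOTE_TO_SEMICOLON = re.compile(r'(")(.*?;)(.*?)\s+')


def split_links(line: str, indent: str = '') -> str:
    """Mimic REPLACE BY \\1\\r (plus optional tabs): keep the quote, break the line."""
    return QUOTE_TO_SEMICOLON.sub(lambda m: m.group(1) + '\r' + indent, line)


if __name__ == '__main__':
    line = ('continuation-of-the-last-harmony.html"><img src="frrr/flag_lsd_en.jpg" '
            'title="en" alt="en" width="28" height="19" /></a>&nbsp; '
            'create-a-beautiful-team.html')
    print(repr(split_links(line)))
    print(repr(split_links(line, indent='\t' * 5)))
```

Note that the replacement keeps the opening `"` from group 1, so the first link ends in a stray quote, and that `\r` inserts a bare carriage return.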

