save hyperlinks in large html which are underlined?



  • Hey out there,
    I was wondering: in Notepad++, hyperlinks appear underlined, which makes it easy to navigate to them. I want to filter out everything except these underlined links, so that only the http address(es) remain. Is there a way to do this in Notepad++ alone? I know a few tools such as sed and grep, but how do you search for the underlining? That may be a different question, I suppose. Thanks in advance. Doug (water buoy)



  • @water-buoy

    The “underscoring” you are talking about is only visual.

    You would have to go after – via search – the text of the http addresses themselves.

    This is easier said than done, as it gets into “what is a regular expression for finding http addresses”?

    Notepad++ itself uses this, which may not be perfect (I suspect this because sometimes Notepad++ interprets URLs in my text files incorrectly, usually by “spilling over” the underlining into adjacent non-URL text):

    #define URL_REG_EXPR "[A-Za-z]+://[A-Za-z0-9_\\-\\+~.:?&@=/%#,;\\{\\}\\(\\)\\[\\]\\|\\*\\!\\\\]+"
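
    To illustrate the "spilling over" mentioned above, here is a rough sketch in Python. It assumes the pattern can be carried over to Python's `re` module with the C string escapes unwrapped; `find_urls` and the sample text are made up for illustration, and Notepad++'s own regex engine may behave differently in edge cases.

```python
import re

# Assumption: a Python rendering of the URL_REG_EXPR pattern quoted above,
# with the C string escapes unwrapped. Python's `re` stands in for the
# regex engine Notepad++ actually uses.
URL_PATTERN = re.compile(r"[A-Za-z]+://[A-Za-z0-9_\-+~.:?&@=/%#,;{}()\[\]|*!\\]+")


def find_urls(text):
    """Return every substring the pattern treats as a URL (hypothetical helper)."""
    return URL_PATTERN.findall(text)


sample = "Docs are here (see http://example.com/page). More text follows."
matches = find_urls(sample)
print(matches)  # the match also swallows the trailing ")."
```

    Because `)` and `.` are both in the allowed character set, the match only stops at the next space, so the closing parenthesis and the full stop end up as part of the "URL".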
    

    Anyway, finding the text of URLs isn’t quite what you’d want; you’d want to find non-URL text so that you can delete it.



  • I just noticed that I ended my previous posting early. It should have read:

    “Anyway, finding the text of URLs isn’t quite what you’d want; you’d want to find non-URL text so that you can delete it, leaving only the desired URL text.

    If you want further help in pursuing this type of solution, please indicate that.”
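
    If working outside Notepad++ is acceptable, the same goal can be reached from the other direction: collect the URL matches and discard everything else. The following is only a sketch under assumptions. The pattern is a Python rendering of the one Notepad++ uses, `keep_only_urls` is a hypothetical helper name, and the HTML snippet is invented sample input.

```python
import re

# Assumption: a Python rendering of the Notepad++ URL pattern quoted
# earlier in this thread; it is not guaranteed to catch every valid URL.
URL_PATTERN = re.compile(r"[A-Za-z]+://[A-Za-z0-9_\-+~.:?&@=/%#,;{}()\[\]|*!\\]+")


def keep_only_urls(text):
    """Drop all non-URL text and return the matched URLs, one per line."""
    seen = set()
    urls = []
    for url in URL_PATTERN.findall(text):
        if url not in seen:  # skip repeats, keep first-seen order
            seen.add(url)
            urls.append(url)
    return "\n".join(urls)


html = (
    '<p><a href="http://example.com/a">A</a> and '
    '<a href="https://example.org/b?x=1">B</a></p>'
)
print(keep_only_urls(html))
```

    In HTML the addresses usually sit inside quoted `href` attributes, and the quote character is not in the pattern's character set, so each match ends cleanly at the closing quote. Links embedded in running prose may still pick up trailing punctuation.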

