Keep links and discard the rest of the page source code

  • How can I keep links similar to this

    and delete the rest of the source code?
    The source code text file has 55 links like this.

  • @Ravi-K,

    Some questions:

    • Did you intend for that to be two separate lines (i.e., you want to keep both normal URLs and filenames-with-extension), or was that meant to be a single line with the filename immediately after the trailing slash in the URL?
      • For the URL protocol, will they all be https://, or will there also be http://, ftp://, file://, irc://, or other such links?
      • If there really are links that are filename.ext, does it have to recognize any .ext, or just .jpg, or just some small set of extensions? (If the last, please list them.)
    • Can a single line of text contain more than one link? The search/replace expression can be very different if there is only ever one link per line versus potentially needing to recognize multiple links on the same line.
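
    To illustrate why the protocol and links-per-line questions matter, here is a small Python sketch, with Python's re module standing in for the Notepad++ regex engine (they may differ in edge cases). The sample text and the broader protocol list are assumptions for illustration, not your data. The https-only pattern misses the http:// and ftp:// links, while the alternation (?:https?|ftp|file|irc):// catches them, and findall returns both links from the first line rather than just one.

```python
import re

# Made-up sample; the real page source was not posted. Note that the first
# line contains two links, so a one-link-per-line assumption would fail here.
SAMPLE = (
    '<a href="https://example.com/a/">A</a> <a href="http://example.com/b">B</a>\n'
    "<p>no link here</p>\n"
    '<img src="ftp://example.com/c.jpg">\n'
)

# https:// only, ending at whitespace, parentheses, angle brackets, or quotes.
HTTPS_ONLY = re.compile(r"""https://[^\s()<>'"]*""")

# Assumed broader variant: also accept http, ftp, file, and irc links.
MANY_PROTOCOLS = re.compile(r"""(?:https?|ftp|file|irc)://[^\s()<>'"]*""")


def find_links(pattern, text):
    # findall returns every non-overlapping match, even several on one line.
    return pattern.findall(text)


if __name__ == "__main__":
    print(find_links(HTTPS_ONLY, SAMPLE))
    print(find_links(MANY_PROTOCOLS, SAMPLE))
```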

    If you follow the advice below and give us more information (especially “before” and “after” data, with examples of data that will stay and data that will go), you will be more likely to get an answer that meets your needs. But I will start by showing what I would do under one such circumstance.

    If I were solving this problem for just https://... links, where there may or may not be more than one link per line, I would probably do it as a multi-step process:

    1. Get each link onto a line by itself. Open the Search > Replace dialog:
      FIND = https://[^\s()<>'"]* – assume that spaces, tabs, newlines, parentheses, angle brackets, and double or single quotes will all end a URL
      REPLACE = \r\n$0\r\n – put CRLF newlines on both sides of the matched link
      SEARCH MODE = regular expression
    2. Get rid of everything except those URLs-on-a-line. Open the Search > Mark dialog (or, if the Replace dialog is still open, just switch to its Mark tab):
      FIND = ^https://[^\s()<>'"]*$
      BOOKMARK LINE = enabled
      Click MARK ALL
      Then use Search > Bookmark > Remove unmarked lines
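
    If you want to check what the two steps do outside Notepad++, the following Python sketch emulates them: re.sub stands in for the Replace step (Python writes $0 as \g<0>), and keeping only lines that fully match the URL pattern stands in for Mark + Bookmark line + Remove unmarked lines. It is an approximation that assumes Python's re treats this character class the same way Notepad++ does, and the sample input is made up.

```python
import re

# Same pattern as the FIND expressions above: a URL is assumed to end at
# whitespace, parentheses, angle brackets, or single/double quotes.
URL = r"""https://[^\s()<>'"]*"""

# Made-up page source for illustration only.
SAMPLE = (
    '<p>See <a href="https://example.com/x">x</a> and (https://example.org/y).</p>\r\n'
    "<div>nothing</div>\r\n"
)


def put_links_on_own_lines(text: str) -> str:
    # Step 1: like REPLACE = \r\n$0\r\n; Python spells $0 (whole match) as \g<0>.
    return re.sub(URL, r"\r\n\g<0>\r\n", text)


def keep_only_link_lines(text: str) -> str:
    # Step 2: like marking ^URL$ with "Bookmark line" and then
    # "Remove unmarked lines": keep only lines consisting of one whole URL.
    url_line = re.compile(URL)
    kept = [line for line in text.splitlines() if url_line.fullmatch(line)]
    return "\r\n".join(kept)


def extract_links(page_source: str) -> str:
    return keep_only_link_lines(put_links_on_own_lines(page_source))


if __name__ == "__main__":
    print(extract_links(SAMPLE))
```

    With the made-up SAMPLE, this leaves only the two https:// links, each on its own line; the <div> line and all the surrounding markup are dropped.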

    Given the number of open questions about your data, I doubt that will immediately solve your problem, but it would be my first step. I recommend figuring out how each piece of my recommendation works and then tweaking the pieces to match your actual data. If you have trouble, follow the advice below: share what you tried (and why you thought your tweaked regular expressions would work), and ask specific questions.


