Keep links and discard the rest of the page source code



  • How can I keep links similar to this one

    https://d1a0n9gptf7ayu.cloudfront.net/photos/
    9fc6d4cc85ce7b5e33d0ae73ecddcb.jpg
    

    and delete the rest of the source code?
    The source code text file has 55 links like this one.



  • @Ravi-K,

    Some questions:

    • Did you intend for that to be two separate lines (i.e., you want to keep both normal URLs and filenames-with-extension), or was that meant to be a single line with the filename immediately after the trailing slash in the URL?
      • For the URL protocol, will they all be https://, or will there also be http://, ftp://, file://, irc://, or other such links?
      • If there really are links that are filename.ext, does it have to recognize any .ext, or just .jpg, or just some small set of extensions? (If the last choice, please list them.)
    • Is there only ever one link per line of text, or might there be multiple links on the same line? The search/replace expression can be very different in those two cases.
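    To illustrate why the one-link-per-line question matters, here is a small sketch in Python rather than Notepad++. Python's `re` module is not the Boost engine Notepad++ uses, but it behaves the same for these simple patterns. The sample lines and example.com URLs are hypothetical, not taken from your file. A pattern anchored with `^...$` only matches a line that is nothing but one URL, while an unanchored pattern finds every URL, even several on one line:

```python
import re

# Hypothetical sample text (not from the original post): the first line holds
# a single bare URL, the second holds two URLs embedded in HTML markup.
SAMPLE = (
    "https://example.com/photos/a.jpg\n"
    '<img src="https://example.com/photos/b.jpg"><a href="https://example.com/photos/c.jpg">'
)

# Line-anchored: matches only a line that consists of exactly one URL.
ANCHORED = re.compile(r"^https://[^\s()<>'\"]*$", re.MULTILINE)

# Unanchored: finds every URL, including several on the same line.
UNANCHORED = re.compile(r"https://[^\s()<>'\"]*")

anchored_hits = ANCHORED.findall(SAMPLE)
unanchored_hits = UNANCHORED.findall(SAMPLE)

print(anchored_hits)    # only the bare URL from the first line
print(unanchored_hits)  # all three URLs
```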

    If you follow the advice below and give us more information (especially “before” and “after” data, with examples of data that will stay and data that will go), you are more likely to get an answer that meets your needs. But I will start by showing what I would do in one such circumstance.

    If I were solving this problem for just https://... links, where there may or may not be more than one link per line, I would probably do it as a multi-step process:

    1. Get them onto lines by themselves: open the Search > Replace dialog:
      FIND = https://[^\s()<>'"]* – assume that spaces, tabs, newlines, parentheses, angle brackets, and double or single quotes will all end a URL
      REPLACE = \r\n$0\r\n – put CRLF newlines on both sides of the matched link
      SEARCH MODE = regular expression
      REPLACE ALL
    2. Get rid of everything but those URLs-on-a-line: open the Search > Mark dialog (or, if it is still open, just go to the Mark tab of the Replace dialog you were already in):
      FIND = ^https://[^\s()<>'"]*$
      BOOKMARK LINE = enabled
      MARK ALL
      Search > Bookmark > Remove unmarked lines
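    If you want to check what the two steps do outside the editor, here is a rough Python equivalent. It is a sketch that assumes Python's `re` behaves like Notepad++'s Boost regex for this pattern, which it should for a simple character class like this one. The function name `keep_only_links` and the sample page are hypothetical. Note that Python writes the whole-match backreference as `\g<0>` where Notepad++ uses `$0`:

```python
import re

# Same URL pattern as the FIND fields above: a URL runs until whitespace,
# a parenthesis, an angle bracket, or a single/double quote.
URL = r"https://[^\s()<>'\"]*"


def keep_only_links(source: str) -> str:
    """Hypothetical helper mirroring the two Notepad++ steps."""
    # Step 1: surround each URL with CRLF so it sits on its own line
    # (Notepad++: REPLACE = \r\n$0\r\n; Python spells $0 as \g<0>).
    split = re.sub(URL, "\r\n\\g<0>\r\n", source)
    # Step 2: keep only lines that are entirely a URL, like marking with
    # ^https://...$ and then running "Remove Unmarked Lines".
    kept = [line for line in split.splitlines() if re.fullmatch(URL, line)]
    return "\r\n".join(kept)


# Hypothetical page source with two links buried in markup.
page = ('<html><img src="https://example.com/photos/a.jpg"> text '
        '<a href="https://example.com/photos/b.jpg">x</a></html>')
result = keep_only_links(page)
print(result)  # the two URLs, one per line
```

    Splitting first and filtering second is what makes the multiple-links-per-line case work: once every URL is alone on its line, the line-anchored `^...$` test can no longer miss a URL that shares a line with markup.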

    Given the number of open questions about your data, I doubt that will immediately solve your problem. But it would be my first step. I recommend figuring out how all the pieces of my recommendation worked, and trying to tweak them to match your actual data. If you have trouble, follow the advice below, share what you tried and why you thought it would work when you tweaked my regular expressions, and ask specific questions.

    ----

    Do you want regex search/replace help? Then please be patient and polite, show some effort, and be willing to learn; answer questions and requests for clarification that are made of you. All example text should be marked as literal text using the </> toolbar button or manual Markdown syntax. To make regex appear in red (and so they keep their special characters like *), use backticks, like `^.*?blah.*?\z`. Screenshots can be pasted from the clipboard to your post using Ctrl+V to show graphical items, but any text should be included as literal text in your post so we can easily copy/paste your data. Show the data you have and the text you want to get from that data; include examples of things that should match and be transformed, and things that don’t match and should be left alone; show edge cases and make sure your examples are as varied as your real data. Show the regex you already tried, and why you thought it should work; tell us what’s wrong with what you do get. Read the official NPP Searching / Regex docs and the forum’s Regular Expression FAQ. If you follow these guidelines, you’re much more likely to get helpful replies that solve your problem in the shortest number of tries.