Keep links and discard the rest of the page source code
How can I keep links similar to this
https://d1a0n9gptf7ayu.cloudfront.net/photos/ 9fc6d4cc85ce7b5e33d0ae73ecddcb.jpg
and delete the rest of the source code?
The source code text file has 55 links like this.
@Ravi-K,
Some questions:
- Did you intend for that to be two separate lines (i.e., you want to keep both normal URLs and filenames-with-extension), or was that meant to be a single line with the filename immediately after the trailing slash in the URL?
- For the URL protocol, will they all be `https://`, or will there also be `http://`, `ftp://`, `file://`, `irc://`, or other such links?
- If there really are links that are `filename.ext`, does it have to be able to recognize any `.ext`, or just `.jpg`, or just some small set of extensions (if the last choice, please list them)?
- The search/replace expression can be very different if there is only ever one link per line of text vs. potentially needing to recognize multiple links in the same line.
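To illustrate why the one-link-per-line question matters, here is a minimal sketch in Python (outside Notepad++, purely for illustration). It assumes URLs start with `https://` and end at whitespace, parentheses, angle brackets, or quotes; the sample line and the `example.com` URLs are hypothetical, not the poster's data. An unanchored pattern finds both links on a line, while a pattern anchored to the whole line finds neither.

```python
import re

# Unanchored: finds every https:// URL anywhere in the text, even several per line.
# Assumption: a URL ends at whitespace, ( ) < > or a single/double quote.
URL_ANYWHERE = re.compile(r"""https://[^\s()<>'"]*""")

# Anchored: only matches a line that consists of exactly one URL and nothing else.
URL_WHOLE_LINE = re.compile(r"""^https://[^\s()<>'"]*$""", re.MULTILINE)

# Hypothetical sample: two links on the same line of page source.
sample = '<img src="https://example.com/a.jpg"><img src="https://example.com/b.jpg">'

print(URL_ANYWHERE.findall(sample))
# ['https://example.com/a.jpg', 'https://example.com/b.jpg']
print(URL_WHOLE_LINE.findall(sample))
# []
```

The anchored form only becomes useful after each link has first been moved onto a line of its own.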
If you follow the advice below and give us some more information (especially "before" and "after" data, with examples of data that will stay and data that will go), it will make it more likely that you will get an answer that meets your needs. But I will start by showing what I would do under one such circumstance.
If I were solving this problem for just `https://...` links, where there may or may not be more than one link per line, I would probably do it as a multi-step process:

1. Get them onto lines by themselves: open the Search > Replace dialog:
   - FIND = `https://[^\s()<>'"]*`
     (assume that spaces, tabs, newlines, parentheses, angle brackets, and double or single quotes will all end a URL)
   - REPLACE = `\r\n$0\r\n`
     (put CRLF newlines on both sides of the matched link)
   - SEARCH MODE = Regular expression
   - REPLACE ALL
2. Get rid of everything but those URLs-on-a-line: open the Search > Mark dialog (or, if it is still open, just go to the Mark tab of the Replace dialog you were already in):
   - FIND = `^https://[^\s()<>'"]*$`
   - BOOKMARK LINE = enabled
   - MARK ALL
3. Search > Bookmark > Remove unmarked lines
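For anyone who would rather script this outside Notepad++, here is a minimal Python sketch of the same two steps. It is an illustration only, not part of the Notepad++ procedure. It keeps the same assumption about where a URL ends, uses `\n` instead of CRLF, and the function name `keep_only_urls` and the sample input are hypothetical. Python's `re` and Notepad++'s regex engine differ in some details, but these particular patterns should behave the same.

```python
import re

# Same assumption as the Notepad++ recipe: a URL starts with https:// and ends
# at whitespace, a parenthesis, an angle bracket, or a single/double quote.
URL = r"""https://[^\s()<>'"]*"""


def keep_only_urls(text: str) -> str:
    # Step 1: put newlines on both sides of every URL so each one lands on its
    # own line (mirrors REPLACE = \r\n$0\r\n, but with plain \n).
    spread = re.sub(URL, lambda m: "\n" + m.group(0) + "\n", text)
    # Step 2: keep only lines that are exactly one URL
    # (mirrors marking ^...$ lines, then removing unmarked lines).
    kept = [line for line in spread.splitlines() if re.fullmatch(URL, line)]
    return "\n".join(kept)


# Hypothetical page source with two links in one line.
page = '<a href="https://example.com/x.jpg">x</a> text (https://example.com/y.jpg)'
print(keep_only_urls(page))
# https://example.com/x.jpg
# https://example.com/y.jpg
```

As with the Notepad++ version, the URL-ending character set is the part most likely to need tweaking for real data.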
Given the number of open questions about your data, I doubt that will immediately solve your problem. But it would be my first step. I recommend figuring out how all the pieces of my recommendation worked, and trying to tweak them to match your actual data. If you have trouble, follow the advice below, share what you tried and why you thought it would work when you tweaked my regular expressions, and ask specific questions.
----
Do you want regex search/replace help? Then please be patient and polite, show some effort, and be willing to learn; answer questions and requests for clarification that are made of you. All example text should be marked as literal text using the `</>` toolbar button or manual Markdown syntax. To make regex in red (and so they keep their special characters like `*`), use backticks, like `^.*?blah.*?\z`. Screenshots can be pasted from the clipboard to your post using Ctrl+V to show graphical items, but any text should be included as literal text in your post so we can easily copy/paste your data. Show the data you have and the text you want to get from that data; include examples of things that should match and be transformed, and things that don't match and should be left alone; show edge cases and make sure your examples are as varied as your real data. Show the regex you already tried, and why you thought it should work; tell us what's wrong with what you do get. Read the official NPP Searching / Regex docs and the forum's Regular Expression FAQ. If you follow these guidelines, you're much more likely to get helpful replies that solve your problem in the shortest number of tries.