[nsfw] Help extracting links from page source code
- 
 [NSFW] 
 Hi guys, I have the source code of a page: https://workupload.com/file/mGszjxtb7vB
 What I want to achieve:
- Extract all direct image file links like this one from the source-code text file above, and discard the rest of the source code:
 https://cdn04.atkingdom-network.com/secure/content/a/amy038/232716/amy038SRS_232716004.jpg
- Then add /3000/ before the .jpg file name in each link, so the link above becomes:
 https://cdn04.atkingdom-network.com/secure/content/a/amy038/232716/3000/amy038SRS_232716004.jpg
- Put all links on separate lines, and finally save.
 I want to do this on multiple text files I have, but I failed at the first step: regex is my weakness.
 In other files, /a/ can be any letter, and /amy038/ and /232716/ can also be different. I'd be grateful if anyone could help me create a macro for this. The cherry on top would be if the final text file were saved with the name amy038-232716.txt. Thanks
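The two transformation rules in the request (insert a 3000 folder before the picture name, and name the output after the /amy038/ and /232716/ segments) can be sketched in Python. This is a minimal sketch, assuming every wanted link has the shape .../<set>/<number>/<picture>.jpg as in the example; `add_3000` and `output_name` are hypothetical helper names.

```python
def add_3000(link: str) -> str:
    """Insert a '3000' folder right before the picture name of a link."""
    folder, picture = link.rsplit("/", 1)  # split at the last "/" only
    return f"{folder}/3000/{picture}"


def output_name(link: str) -> str:
    """Build 'amy038-232716.txt' from .../amy038/232716/<picture>.jpg.

    Assumes the two path segments right before the picture name are the
    set name and the set number, as in the example link.
    """
    segments = link.split("/")
    return f"{segments[-3]}-{segments[-2]}.txt"


link = "https://cdn04.atkingdom-network.com/secure/content/a/amy038/232716/amy038SRS_232716004.jpg"
print(add_3000(link))
print(output_name(link))  # amy038-232716.txt
```

Note that `output_name` must be given the original link: after `add_3000`, the segment before the picture name is `3000`.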
- 
 Hello, @ravi-k and All,
 After downloading your page.txt file, I could identify, with regexes and sorting, 763 links to a .jpg picture, divided into four classes:
- 254 lines <a href="https ............... .jpg ...............> <img src="http ..... /thumbs/ ........ .jpg .............></a>, each containing 2 links, so 2 × 254 = 508 links
- 254 lines Content-Location: http ... cdn04 .............. .jpg ....., so 254 links
- 1 line https .... www.atkpetites.com .................... .jpg, so 1 link
 After sorting, we get:

<a href="https://cdn04.atkingdom-network.com/secure/content/a/amy038/232716/amy038SRS_232716001.jpg
....
....        (A)
....
<a href="https://cdn04.atkingdom-network.com/secure/content/a/amy038/232716/amy038SRS_232716254.jpg

<img src="http://cdn04.atkingdom-network.com/secure/content/a/amy038/232716/thumbs/amy038SRS_232716001.jpg
....
....        (B)
....
<img src="http://cdn04.atkingdom-network.com/secure/content/a/amy038/232716/thumbs/amy038SRS_232716254.jpg

Content-Location: http://cdn04.atkingdom-network.com/secure/content/a/amy038/232716/thumbs/amy038SRS_232716001.jpg
....
....        (C)
....
Content-Location: http://cdn04.atkingdom-network.com/secure/content/a/amy038/232716/thumbs/amy038SRS_232716254.jpg

Content-Location: https://www.atkpetites.com/css/images/america_flag.jpg        (D)
 My question is: in order to build the right search regex and isolate the right links, which kind of links are you looking for among the four lists A, B, C or D above?
 Best Regards,
 guy038
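The four classes differ only in the text that introduces each URL, so they can be told apart mechanically. Here is a small Python sketch that counts them; the three prefixes are assumptions read off the sorted list above, not a verified description of the real page, and `classify` is a hypothetical name.

```python
import re
from collections import Counter

# Every .jpg URL together with the text that introduces it. The three
# prefixes are assumptions read off guy038's sorted list (A, B, C/D).
JPG_LINK = re.compile(r'(<a href="|<img src="|Content-Location: )(https?://[^\s"<>]+?\.jpg)')


def classify(source: str) -> Counter:
    """Count the .jpg links of a page source in classes A, B, C and D."""
    counts = Counter()
    for prefix, url in JPG_LINK.findall(source):
        if prefix == '<a href="':
            counts["A"] += 1  # full-size pictures (the wanted links)
        elif prefix == '<img src="':
            counts["B"] += 1  # thumbnails displayed in the page
        elif "/thumbs/" in url:
            counts["C"] += 1  # Content-Location lines of the thumbnails
        else:
            counts["D"] += 1  # any other Content-Location picture
    return counts
```

On the sample page, the expected result would be A = 254, B = 254, C = 254 and D = 1, matching the 763 links counted above.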
- 
- 
 @guy038 A, and adding /3000/ before the .jpg file name:
 https://cdn04.atkingdom-network.com/secure/content/a/amy038/232716/3000/amy038SRS_232716001.jpg
- 
 Hi, @ravi-k and All,
 Assuming that:
- All your files are located in a specific folder
- The 3 digits before the string .jpg are reserved for numbering
- You want to add the 3000 folder in every link, right before the picture name
 Here is the road map:
- Duplicate this folder, so the duplicated folder contains exactly the same files as the initial folder
- Open the Find in Files dialog (Ctrl + Shift + F)
- SEARCH (?s-i)(?:(\A)|).+?<a\x20href="(?-s)(https?[^>\r\n]+/)((.+?)...\.jpg)|(?s).+
- REPLACE (?1\4.txt\r\n\r\n)?2\23000/\3\r\n
- FILTERS *.txt
- DIRECTORY The absolute path to the duplicated folder
- Select the Regular expression search mode
- Click on the Replace in Files button and accept the confirmation dialog
 From your downloaded example, you should get this kind of output, which should be similar for all files of your duplicated folder:

amy038SRS_232716.txt

https://cdn04.atkingdom-network.com/secure/content/a/amy038/232716/3000/amy038SRS_232716001.jpg
https://cdn04.atkingdom-network.com/secure/content/a/amy038/232716/3000/amy038SRS_232716002.jpg
https://cdn04.atkingdom-network.com/secure/content/a/amy038/232716/3000/amy038SRS_232716003.jpg
https://cdn04.atkingdom-network.com/secure/content/a/amy038/232716/3000/amy038SRS_232716004.jpg
.....
.....
.....
https://cdn04.atkingdom-network.com/secure/content/a/amy038/232716/3000/amy038SRS_232716251.jpg
https://cdn04.atkingdom-network.com/secure/content/a/amy038/232716/3000/amy038SRS_232716252.jpg
https://cdn04.atkingdom-network.com/secure/content/a/amy038/232716/3000/amy038SRS_232716253.jpg
https://cdn04.atkingdom-network.com/secure/content/a/amy038/232716/3000/amy038SRS_232716254.jpg
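For readers who want to check what this S/R does outside Notepad++, here is a hedged Python transcription of it for one file's text. Python's `re` has no conditional replacement like `(?1...)`, so the header line is built in code instead. The regex keeps guy038's assumption that the last 3 characters before .jpg are the picture number; `rewrite` is a hypothetical name.

```python
import re

# Python transcription of guy038's search regex for one <a href link:
#   group 1: the URL up to the last "/" before the picture name
#   group 2: the picture name, e.g. amy038SRS_232716001.jpg
#   group 3: the picture name minus its last 3 characters and ".jpg"
HREF_JPG = re.compile(r'<a href="(https?[^>\r\n]+/)((.+?)...\.jpg)')


def rewrite(source: str, newline: str = "\r\n") -> str:
    """Return the text guy038's S/R would leave in a file with this content."""
    matches = HREF_JPG.findall(source)
    if not matches:
        return ""  # the "|(?s).+" branch then deletes the whole file
    lines = [matches[0][2] + ".txt", ""]  # header line, then an empty line
    lines += [folder + "3000/" + picture for folder, picture, _ in matches]
    return newline.join(lines) + newline
```

Passing `newline="\n"` mirrors the Unix-only variant of the replacement.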
 Now, the remaining goal is to rename each file in your duplicated folder, with the Python, Lua or NppExec script plugin, using the name found on the very first line of that file! This seems fairly easy, and I bet some scripting gurus in the N++ community will find a solution very soon! However, test my regex S/R against all your files first, to check for possible issues and/or improvements!
 Best Regards,
 guy038
 P.S.: BTW, I noticed that your sample file contains both:
- Some lines with Windows line endings (CRLF)
- Some lines with Unix line endings (LF)
 Fortunately, this does not perturb the regex S/R. If you prefer to deal with Unix files only, simply change the replacement regex to: REPLACE (?1\4.txt\n\n)?2\23000/\3\n
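The renaming step is left to script writers above. As a hedged sketch only, here is a standalone Python 3.9+ version (plain Python run outside Notepad++, not code for the PythonScript plugin), assuming the S/R has already been run so that the first line of every file holds the wanted name. `rename_by_first_line` is a hypothetical name.

```python
from pathlib import Path


def rename_by_first_line(folder: str) -> list[str]:
    """Rename each *.txt file of folder to the name written on its first line.

    Returns the new names. Skips empty files and files already correctly
    named, and refuses to overwrite an existing file.
    """
    renamed = []
    for path in sorted(Path(folder).glob("*.txt")):  # list built before any rename
        with path.open(encoding="utf-8", errors="replace") as f:
            new_name = f.readline().strip()  # strip() drops the CRLF or LF ending
        if not new_name or new_name == path.name:
            continue
        target = path.with_name(new_name)
        if target.exists():
            raise FileExistsError(target)
        path.rename(target)
        renamed.append(new_name)
    return renamed
```

Run it on the duplicated folder only. Because `readline` works in universal-newline mode, files with CRLF, LF or mixed line endings are handled alike.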
- 
- 
 @guy038 This is exactly what I wanted. Thanks for that complex RegEx. 
 No issues.
