Is there a way to export text files with URLs of the same domain extracted from a big text file with tab-separated URLs?
-
I have a big text file with over 5000 tab-separated URLs in it, and I want a mini program that extracts (cuts) all URLs of the same domain and puts them into a new text file named after that domain. For example, if the text file contains the following URLs (sorted in ascending lexicographic order):
https://www.topuniversities.com/
https://www.topuniversities.com/university-rankings
https://www.topuniversities.com/university-rankings/world-university-rankings/2018
https://www.translate.com/

I want the first three URLs to be cut and pasted into a new file named “topuniversities”. The 4th URL should go into another text file called “Translate”, and so on.
I know Notepad++ is very complex. Is there any way to do this?
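A minimal Python sketch of such a mini program, assuming every host has the form hostname.sub.domain (as in the example) so that the site name is the second-to-last label of the hostname (www.topuniversities.com → topuniversities). The function names and the file names `urls.txt` and `by_domain` are hypothetical. Note that it copies rather than cuts: the original file is left untouched, tokens without a recognizable host are skipped, and names come out in lowercase (translate.txt).

```python
import os
import re
from collections import defaultdict
from urllib.parse import urlsplit


def site_name(url):
    """Return the second-to-last hostname label, or None if there is no host.

    Assumes hosts look like hostname.sub.domain, e.g.
    'www.topuniversities.com' -> 'topuniversities'.
    """
    host = urlsplit(url).hostname  # lowercased by urllib
    if not host:
        return None
    labels = host.split(".")
    return labels[-2] if len(labels) >= 2 else host


def group_urls(text):
    """Split the text on tabs/whitespace and group the URLs by site name."""
    groups = defaultdict(list)
    for token in re.split(r"\s+", text.strip()):
        name = site_name(token) if token else None
        if name:
            groups[name].append(token)
    return dict(groups)


def write_groups(groups, out_dir):
    """Write each group to <out_dir>/<name>.txt, one URL per line."""
    os.makedirs(out_dir, exist_ok=True)
    for name, urls in groups.items():
        path = os.path.join(out_dir, name + ".txt")
        with open(path, "w", encoding="utf-8") as f:
            f.write("\n".join(urls) + "\n")


def split_file(in_path, out_dir):
    """Read in_path and write one text file per site into out_dir."""
    with open(in_path, encoding="utf-8") as f:
        groups = group_urls(f.read())
    write_groups(groups, out_dir)
    return groups
```

Usage: `split_file("urls.txt", "by_domain")` would create `by_domain/topuniversities.txt` with the first three URLs and `by_domain/translate.txt` with the fourth.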
-
I know Notepad++ is very complex. Is there any way to do this?
Yes, a script, e.g. a Python script, could do this, BUT can we rely on your example data?
I mean, it is relatively easy to identify a URL, but getting the name part of the subdomain isn’t that easy. So, does every subdomain follow your example,
i.e. hostname.sub.domain like www.translate.com?
Or could you also have something like
http://x15w.sub.text.org/…
If the latter is the case, what should be used as the filename?
sub or text or sub.text or subtext or …

Cheers
Claudia
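To answer the question above before settling on a naming rule, a small check could list every hostname that does not have exactly three labels, and show the candidate names for one such host. This is a sketch only; `unusual_hosts` and `candidate_names` are hypothetical helper names, and it assumes the first label (www, x15w) and the last label (com, org) should always be dropped.

```python
import re
from collections import Counter
from urllib.parse import urlsplit


def unusual_hosts(text):
    """Count hostnames that do not have exactly three dot-separated labels.

    Hosts such as 'www.translate.com' are skipped; anything reported
    (e.g. 'x15w.sub.text.org') needs its own naming rule.
    """
    counts = Counter()
    for token in re.split(r"\s+", text.strip()):
        host = urlsplit(token).hostname
        if host and host.count(".") != 2:
            counts[host] += 1
    return counts


def candidate_names(url):
    """Return possible file names for a host like x15w.sub.text.org.

    Drops the first label (e.g. 'x15w') and the last one (e.g. 'org');
    the remaining labels can be combined in several ways.
    """
    labels = (urlsplit(url).hostname or "").split(".")
    middle = labels[1:-1]
    if not middle:
        return {}
    return {
        "first": middle[0],          # 'sub'
        "last": middle[-1],          # 'text'
        "dotted": ".".join(middle),  # 'sub.text'
        "joined": "".join(middle),   # 'subtext'
    }
```

If `unusual_hosts` returns an empty result for the whole file, the simple second-to-last-label rule is enough. Hosts under two-part suffixes such as .co.uk would need extra handling, for example based on the Public Suffix List.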