Community

    Is there a way to extract URLs of the same domain from a big text file of tab-separated URLs and export each group to its own text file?

    Help wanted
    2 Posts 2 Posters 1.4k Views
    This topic has been deleted. Only users with topic management privileges can see it.
    • Sepehr E

      I have a big text file with over 5000 tab-separated URLs in it, and I want a mini program that extracts and cuts the URLs of the same domain and puts them into a new text file named after the domain. For example, suppose the text file contains the following URLs (sorted in ascending lexicographic order):

      https://www.topuniversities.com/
      https://www.topuniversities.com/university-rankings
      https://www.topuniversities.com/university-rankings/world-university-rankings/2018
      https://www.translate.com/

      I want the first three URLs to be cut and pasted into a new file named “topuniversities”. The 4th URL should go into another text file called “Translate”, and so on.

      I know Notepad++ is very complex. Is there any way to do this?
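      A minimal Python sketch of the splitting described above, under stated assumptions: the file name is taken to be the host label just before the top-level domain (www.topuniversities.com gives topuniversities), and URLs are separated by tabs or newlines. The names split_urls.py, urls.txt, out_dir, domain_name, group_urls and write_groups are hypothetical. It writes one <name>.txt per domain and leaves the source file unchanged rather than cutting lines out of it; file names come out lowercase because the host is lowercased.

```python
import sys
from collections import defaultdict
from pathlib import Path
from urllib.parse import urlsplit


def domain_name(url: str) -> str:
    """Return the host label just before the TLD, e.g. www.translate.com -> translate.

    Assumption: hosts look like hostname.name.tld. Returns "" when no host is found.
    """
    host = urlsplit(url).hostname or ""
    labels = host.split(".")
    return labels[-2] if len(labels) >= 2 else ""


def group_urls(text: str) -> dict[str, list[str]]:
    """Split tab- or newline-separated text into URLs grouped by domain name."""
    groups: dict[str, list[str]] = defaultdict(list)
    for token in text.split():  # split() breaks on any whitespace, tabs included
        name = domain_name(token)
        if name:  # skip tokens that are not URLs with a host
            groups[name].append(token)
    return dict(groups)


def write_groups(groups: dict[str, list[str]], out_dir: Path) -> None:
    """Write each group to <out_dir>/<name>.txt, one URL per line."""
    out_dir.mkdir(parents=True, exist_ok=True)
    for name, urls in groups.items():
        (out_dir / f"{name}.txt").write_text("\n".join(urls) + "\n", encoding="utf-8")


if __name__ == "__main__":
    # Hypothetical usage: python split_urls.py urls.txt out_dir
    source, out_dir = Path(sys.argv[1]), Path(sys.argv[2])
    write_groups(group_urls(source.read_text(encoding="utf-8")), out_dir)
```

      With the four example URLs this would produce topuniversities.txt (three lines) and translate.txt (one line). Hosts with more labels, such as multi-level subdomains, would need a different naming rule.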

      • Claudia Frank @Sepehr E
        last edited by Claudia Frank

        @Sepehr-E

        I know Notepad++ is very complex. Is there any way to do this?

        Yes, a script (e.g. a Python script) could do this, BUT can we rely on your example data?
        I mean, it is relatively easy to identify a URL, but extracting the name part
        of the subdomain isn’t that easy.

        So, does every subdomain follow your example,
        i.e. hostname.sub.domain like www.translate.com?
        Or could you also have something like
        http://x15w.sub.text.org/…
        If the latter is the case, what should be used as the filename:
        sub, text, sub.text, subtext or …
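        To make the question concrete, here is a small Python sketch showing how each of those filename choices could be derived from the host. The function name_for and its keep/sep parameters are hypothetical. It assumes the top-level domain is exactly one label, which does not hold for suffixes such as .co.uk; handling those reliably would need something like the Public Suffix List.

```python
from urllib.parse import urlsplit


def name_for(url: str, keep: int = 1, sep: str = ".") -> str:
    """Build a file name from the host labels just before the TLD.

    keep: how many labels to take, counting leftwards from the TLD.
    sep:  string used to join them ("." gives sub.text, "" gives subtext).
    Assumes the TLD is a single label, which is false for e.g. .co.uk.
    """
    if keep < 1:
        raise ValueError("keep must be at least 1")
    host = urlsplit(url).hostname or ""
    labels = host.split(".")[:-1]  # drop the (assumed single-label) TLD
    return sep.join(labels[-keep:])


if __name__ == "__main__":
    url = "http://x15w.sub.text.org/"
    for keep, sep in [(1, "."), (2, "."), (2, "")]:
        print(keep, repr(sep), name_for(url, keep, sep))
```

        For the host x15w.sub.text.org this gives text, sub.text and subtext respectively, while www.translate.com with the default keep=1 gives translate. Which setting is right depends on the answer to the question above.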

        Cheers
        Claudia

        The Community of users of the Notepad++ text editor.
        Powered by NodeBB | Contributors