    Regex or something for hosts lines sorting

    • dultojorze

      Hello,

      Could you please help me with the following search-and-replace problem I am having?

      I have a hosts file with a list of domains from ZeroDot1.

      Here is the data I currently have (“before” data):

      0.0.0.0 0.0.0.0ah.api.cryptopool.eu
      0.0.0.0 0.0.0.0air.api.cryptopool.eu
      0.0.0.0 0.0.0.0air.cryptopool.eu
      0.0.0.0 0.0.0.0am.dashminer.com
      0.0.0.0 0.0.0.0amsterdam.bytecoins.world
      0.0.0.0 0.0.0.0antispam.api.cryptopool.eu
      0.0.0.0 0.0.0.0antispam.cryptopool.eu
      0.0.0.0 0.0.0.0antispam.suremine.com
      0.0.0.0 0.0.0.0ap.veriblockpool.com
      0.0.0.0 0.0.0.0api.bcn.unipool.pro
      0.0.0.0 0.0.0.0api.bytecoins.world
      

      Here is how I would like that data to look (“after” data):

      0.0.0.0 cryptopool.eu
      0.0.0.0 dashminer.com
      ...
      etc
      
      OR
      cryptopool.eu
      dashminer.com
      ...
      etc
      

      I tried doing it manually: search for .*cryptopool.eu \n, ctrl+c/ctrl+v, replace all with blank, but there are about 250k lines.
      Could you please help me with a simple regex that leaves only one domain per line? There could be lines like “hostmaster.hostmaster.hostmaster.hostmaster.hostmaster.hostmaster.hostmaster.pon4ek.triplemining.com”,
      which needs to become “triplemining.com”. I can add 0.0.0.0 later.
      I tried to understand the FAQ and the README, but I am not good at English.
      Thank you.

      • Neil Schipper @dultojorze

        @dultojorze

        Hi. If you’re confident every line starts with that exact 15-character sequence, we could start by matching that (for removal) with one of these:
        ^\Q0.0.0.0 0.0.0.0\E
        ^.{15}

        From there, we want to match the trailing “word1.word2” on the right (you’re confident that in every case exactly that form is what needs to be preserved, right? So zero cases like “hello.world.com”, right?), so we’d want to match (and capture) that after skipping everything to the left, up to the start of the second-to-last word: .*\<(\w+\.\w+)

        So altogether, and enforcing start and end of line boundaries, we can use this regex for Find
        ^.{15}.*\<(\w+\.\w+)$
        and then we’ll replace it either without the prefix
        \1
        or we’ll replace it with the prefix
        0.0.0.0 \1

        After that, you can use Remove Duplicate Lines, a command in the Line Operations group under the Edit menu. And you should be good.
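
For anyone who prefers to script this outside Notepad++, here is a minimal Python sketch of the same capture-group idea. It assumes every line starts with the literal prefix seen in the sample data; the names `PREFIX`, `PATTERN` and `last_two_labels` are made up for illustration. Python's `re` module has no `\<`, so `\b` stands in for it, and `re.escape` plays the role of `\Q...\E`.

```python
import re

# Assumption: every line starts with this literal prefix, as in the sample data.
PREFIX = "0.0.0.0 0.0.0.0"

# re.escape() stands in for \Q...\E; \b stands in for \<, which Python lacks.
PATTERN = re.compile(r"^" + re.escape(PREFIX) + r".*\b(\w+\.\w+)$")


def last_two_labels(line: str) -> str:
    """Return the trailing 'word1.word2' of a line, or the line unchanged
    when it does not have the expected shape."""
    match = PATTERN.match(line)
    return match.group(1) if match else line


sample = [
    "0.0.0.0 0.0.0.0ah.api.cryptopool.eu",
    "0.0.0.0 0.0.0.0am.dashminer.com",
]
for line in sample:
    print(last_two_labels(line))
```

Note that `\w` does not match a hyphen, so a domain such as `my-pool.com` would be cut down to `pool.com`; if the list contains hyphenated names, `[\w-]+` in place of `\w+` is one possible adjustment.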

        • guy038

          Hello, @dultojorze, @neil-schipper and All,

          Personally, from your INPUT text below:

          0.0.0.0 0.0.0.0ah.api.cryptopool.eu
          0.0.0.0 0.0.0.0air.api.cryptopool.eu
          0.0.0.0 0.0.0.0air.cryptopool.eu
          0.0.0.0 0.0.0.0am.dashminer.com
          0.0.0.0 0.0.0.0amsterdam.bytecoins.world
          0.0.0.0 0.0.0.0antispam.api.cryptopool.eu
          0.0.0.0 0.0.0.0antispam.cryptopool.eu
          0.0.0.0 0.0.0.0antispam.suremine.com
          0.0.0.0 0.0.0.0ap.veriblockpool.com
          0.0.0.0 0.0.0.0api.bcn.unipool.pro
          0.0.0.0 0.0.0.0api.bytecoins.world
          

          I would use this regex search/replace:

          • SEARCH ^.*\.(?=\w+\.\w+$)

          • REPLACE 0.0.0.0\x20

          • Tick the Wrap around option

          • Select the Regular expression search mode

          • Click once only, on the Replace All button

          giving the temporary OUTPUT text:

          0.0.0.0 cryptopool.eu
          0.0.0.0 cryptopool.eu
          0.0.0.0 cryptopool.eu
          0.0.0.0 dashminer.com
          0.0.0.0 bytecoins.world
          0.0.0.0 cryptopool.eu
          0.0.0.0 cryptopool.eu
          0.0.0.0 suremine.com
          0.0.0.0 veriblockpool.com
          0.0.0.0 unipool.pro
          0.0.0.0 bytecoins.world
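
The same lookahead search/replace can be tried in Python; this is only a sketch under the assumption that the whole hosts file fits in one string, and `strip_to_domain` is a hypothetical helper name. The `re.MULTILINE` flag makes `^` and `$` work per line, as they do in Notepad++. Python's replacement syntax does not accept `\x20`, so a plain space is used instead.

```python
import re

# Same search expression as above; MULTILINE makes ^ and $ match at each line.
SEARCH = re.compile(r"^.*\.(?=\w+\.\w+$)", re.MULTILINE)

# Plain trailing space instead of \x20, which Python rejects in replacements.
REPLACE = "0.0.0.0 "


def strip_to_domain(text: str) -> str:
    """Replace everything before the last 'word1.word2' of each line
    with the '0.0.0.0 ' prefix."""
    return SEARCH.sub(REPLACE, text)


text = "\n".join([
    "0.0.0.0 0.0.0.0ah.api.cryptopool.eu",
    "0.0.0.0 0.0.0.0am.dashminer.com",
    "0.0.0.0 0.0.0.0api.bcn.unipool.pro",
])
print(strip_to_domain(text))
```

Lines that do not end in `word1.word2` (for example names containing a hyphen) are left untouched by this expression, so they are easy to spot afterwards.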
          

          Then, I would use the menu option Edit > Line operations > Remove Duplicate Lines

          And you get your expected OUTPUT:

          0.0.0.0 cryptopool.eu
          0.0.0.0 dashminer.com
          0.0.0.0 bytecoins.world
          0.0.0.0 suremine.com
          0.0.0.0 veriblockpool.com
          0.0.0.0 unipool.pro
          

          Voilà !

          The nice thing to remember is that this command can also act on a selection of lines only!
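
Outside Notepad++, the duplicate-removal step could be approximated as below. This is a sketch, not a description of how Notepad++ implements the command: it keeps the first occurrence of each line, which matches the order of the expected OUTPUT shown above. `remove_duplicate_lines` is a made-up name.

```python
def remove_duplicate_lines(lines):
    """Drop repeated lines, keeping the first occurrence of each.

    dict.fromkeys() preserves insertion order (Python 3.7+), so the
    surviving lines stay in their original relative order.
    """
    return list(dict.fromkeys(lines))


temporary = [
    "0.0.0.0 cryptopool.eu",
    "0.0.0.0 cryptopool.eu",
    "0.0.0.0 dashminer.com",
    "0.0.0.0 bytecoins.world",
    "0.0.0.0 cryptopool.eu",
    "0.0.0.0 bytecoins.world",
]
for line in remove_duplicate_lines(temporary):
    print(line)
```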

          Best Regards,

          guy038

          • Neil Schipper @Neil Schipper

            @dultojorze and @guy038,

            Shame on me! My intention was that my reply be completely embedded in a literal text box!

            If you're confident every line starts with that exact 15-character sequence, we could start by matching that (for removal) with one of these:
            ^\Q0.0.0.0 0.0.0.0\E
            ^.{15}
            
            From there, we want to match the trailing "word1.word2" on the right (you're confident that in every case exactly that form is what needs to be preserved, right? So zero cases like "hello.world.com", right?), so we'd want to match (and capture) that after skipping everything to the left, up to the start of the second-to-last word: .*\<(\w+\.\w+)
            
            So altogether, and enforcing start and end of line boundaries, we can use this regex for Find
            ^.{15}.*\<(\w+\.\w+)$
            and then we'll replace it either without the prefix
            \1
            or we'll replace it with the prefix
            0.0.0.0 \1
            
            After that, you can use Remove Duplicate Lines, a command in the Line Operations group under the Edit menu. And you should be good.
            

            Anyway, now there are two tested solutions.

            The Community of users of the Notepad++ text editor.