Regex or something for hosts lines sorting
-
Hello,
Could you please help me with the following search-and-replace problem I am having?
I have a hosts file with a list of domains from ZeroDot1.
Here is the data I currently have (“before” data):
0.0.0.0 0.0.0.0ah.api.cryptopool.eu
0.0.0.0 0.0.0.0air.api.cryptopool.eu
0.0.0.0 0.0.0.0air.cryptopool.eu
0.0.0.0 0.0.0.0am.dashminer.com
0.0.0.0 0.0.0.0amsterdam.bytecoins.world
0.0.0.0 0.0.0.0antispam.api.cryptopool.eu
0.0.0.0 0.0.0.0antispam.cryptopool.eu
0.0.0.0 0.0.0.0antispam.suremine.com
0.0.0.0 0.0.0.0ap.veriblockpool.com
0.0.0.0 0.0.0.0api.bcn.unipool.pro
0.0.0.0 0.0.0.0api.bytecoins.world
Here is how I would like that data to look (“after” data):
0.0.0.0 cryptopool.eu
0.0.0.0 dashminer.com
... etc
OR
cryptopool.eu
dashminer.com
... etc
I tried manually with .*cryptopool.eu \n and ctrl+c/ctrl+v, replacing it all with blank, but there are about 250k lines.
Could you please help me with a simple regex that keeps only one domain per line? There could be lines like “hostmaster.hostmaster.hostmaster.hostmaster.hostmaster.hostmaster.hostmaster.pon4ek.triplemining.com”,
which needs to become “triplemining.com”. I can add 0.0.0.0 later.
I tried to understand the FAQ and README, but I am not good at English.
Thank you. -
Hi. If you’re confident every line starts with that exact 17 character sequence, we could start by matching that (for removal) with one of these:
^\Q0.0.0.0 0.0.0.0.\E
^.{17}
From there, we want to match the trailing “word1.word2” on the right (you’re confident that in every case exactly that form is what needs to be preserved, right? so zero cases like “hello.world.com”, right?), so we’d want to match (and capture) that after skipping everything to the left just before the second last word:
.*\<(\w+\.\w+)
So altogether, and enforcing start & end of line boundaries, we can use this regex for Find
^.{17}.*\<(\w+\.\w+)$
and then we’ll replace it either without the prefix
\1
or we’ll replace it with the prefix
0.0.0.0 \1
After that, you can use Remove Duplicate Lines, a command in the Line Op group under the Edit menu. And you should be good.
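For readers who would rather script this than use the Replace dialog, here is a minimal Python sketch of the same capture-and-replace idea followed by duplicate removal. It is an illustration under assumptions, not a tested replacement for the Notepad++ steps: Python's re module has no \< anchor, so \b is swapped in for the start-of-word test, and the function names last_two_labels and dedupe are made up for this example.

```python
import re

# Assumption: \b stands in for \< because Python's re module has no \< anchor.
# The .{17} prefix is kept from the reply above; lines shorter than that do not match.
FIND = re.compile(r"^.{17}.*\b(\w+\.\w+)$")


def last_two_labels(line: str, prefix: str = "") -> str:
    """Return prefix + the captured 'word1.word2', or the line unchanged if no match."""
    return FIND.sub(lambda m: prefix + m.group(1), line)


def dedupe(lines):
    """Keep the first occurrence of each line, preserving the original order."""
    return list(dict.fromkeys(lines))


lines = [
    "0.0.0.0 0.0.0.0ah.api.cryptopool.eu",
    "0.0.0.0 0.0.0.0antispam.cryptopool.eu",
    "0.0.0.0 0.0.0.0am.dashminer.com",
]
print(dedupe(last_two_labels(line, "0.0.0.0 ") for line in lines))
```

Passing an empty prefix gives the bare "cryptopool.eu" form instead of the "0.0.0.0 cryptopool.eu" form.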
-
Hello, @dultojorze, @neil-schipper and All,
Personally, from your INPUT text below:
0.0.0.0 0.0.0.0ah.api.cryptopool.eu
0.0.0.0 0.0.0.0air.api.cryptopool.eu
0.0.0.0 0.0.0.0air.cryptopool.eu
0.0.0.0 0.0.0.0am.dashminer.com
0.0.0.0 0.0.0.0amsterdam.bytecoins.world
0.0.0.0 0.0.0.0antispam.api.cryptopool.eu
0.0.0.0 0.0.0.0antispam.cryptopool.eu
0.0.0.0 0.0.0.0antispam.suremine.com
0.0.0.0 0.0.0.0ap.veriblockpool.com
0.0.0.0 0.0.0.0api.bcn.unipool.pro
0.0.0.0 0.0.0.0api.bytecoins.world
I would use this regex S/R :
- SEARCH ^.*\.(?=\w+\.\w+$)
- REPLACE 0.0.0.0\x20
- Tick the Wrap around option
- Select the Regular expression search mode
- Click once only on the Replace All button
giving the temporary OUTPUT text :
0.0.0.0 cryptopool.eu
0.0.0.0 cryptopool.eu
0.0.0.0 cryptopool.eu
0.0.0.0 dashminer.com
0.0.0.0 bytecoins.world
0.0.0.0 cryptopool.eu
0.0.0.0 cryptopool.eu
0.0.0.0 suremine.com
0.0.0.0 veriblockpool.com
0.0.0.0 unipool.pro
0.0.0.0 bytecoins.world
Then, I would use the menu option
Edit > Line operations > Remove Duplicate Lines
And you get your expected OUTPUT :
0.0.0.0 cryptopool.eu
0.0.0.0 dashminer.com
0.0.0.0 bytecoins.world
0.0.0.0 suremine.com
0.0.0.0 veriblockpool.com
0.0.0.0 unipool.pro
Voilà !
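The same two steps (look-ahead replace, then duplicate removal) can also be sketched outside Notepad++, for example in Python. This is only an illustration: the function name reduce_hosts is invented here, a literal space replaces \x20 in the replacement text, and duplicates are dropped by keeping the first occurrence of each line.

```python
import re

# The SEARCH pattern from above; re.MULTILINE makes ^ and $ apply to each line.
PATTERN = re.compile(r"^.*\.(?=\w+\.\w+$)", re.MULTILINE)


def reduce_hosts(text: str, prefix: str = "0.0.0.0 ") -> str:
    """Replace everything before the last 'word1.word2' with prefix, then drop repeats."""
    replaced = PATTERN.sub(prefix, text)
    # dict.fromkeys keeps the first occurrence of each line, in order.
    unique = dict.fromkeys(replaced.splitlines())
    return "\n".join(unique)


sample = """0.0.0.0 0.0.0.0ah.api.cryptopool.eu
0.0.0.0 0.0.0.0air.cryptopool.eu
0.0.0.0 0.0.0.0am.dashminer.com
0.0.0.0 0.0.0.0api.bytecoins.world"""

print(reduce_hosts(sample))
```

Calling reduce_hosts(text, prefix="") yields the bare domains without the 0.0.0.0 prefix.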
The nice thing to remember is that this command may act upon a selection of lines only !
Best Regards,
guy038
-
@dultojorze and @guy038,
Shame on me! My intention was that my reply be completely embedded in a literal text box!
If you're confident every line starts with that exact 17 character sequence, we could start by matching that (for removal) with one of these:
^\Q0.0.0.0 0.0.0.0.\E
^.{17}
From there, we want to match the trailing "word1.word2" on the right (you're confident that in every case exactly that form is what needs to be preserved, right? so zero cases like "hello.world.com", right?), so we'd want to match (and capture) that after skipping everything to the left just before the second last word:
.*\<(\w+\.\w+)
So altogether, and enforcing start & end of line boundaries, we can use this regex for Find
^.{17}.*\<(\w+\.\w+)$
and then we'll replace it either without the prefix
\1
or we'll replace it with the prefix
0.0.0.0 \1
After that, you can use Remove Duplicate Lines, a command in the Line Op group under the Edit menu. And you should be good.
Anyway, now there are two tested solutions.
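Both solutions rely on every line ending in a "word1.word2" pair made only of \w characters (letters, digits, underscore), so with 250k lines it may be worth checking which lines would be left untouched before running Replace All. Here is a small Python sketch of such a check; the function name unmatched and the hyphenated host in the sample are hypothetical, chosen only to show a line that \w+ cannot cover.

```python
import re

# A line fits the assumption when it ends in ".word1.word2" built from \w characters.
TAIL = re.compile(r"\.(\w+\.\w+)$")


def unmatched(lines):
    """Return the lines whose ending does not fit the '.word1.word2' shape."""
    return [line for line in lines if not TAIL.search(line)]


sample = [
    "0.0.0.0 0.0.0.0ah.api.cryptopool.eu",
    "0.0.0.0 0.0.0.0a.my-pool.example",  # hypothetical: the hyphen is not matched by \w
    "# comment",
]
print(unmatched(sample))
```

Anything this reports would need to be handled by hand or with a wider character class. It cannot detect cases like "hello.world.com" where three labels should be kept, since those still match the pattern.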