Type of duplicate lines
-
My text file:
:
:
4xtrader@tpg.com.au:
2emajnllc@gmail.com:
2emajnllc@gmail.com:
2emajnllc@gmail.com:
127victor@cox.net:
1talo@bluewin.ch:
10241024simon@gmail.com:
10241024simon@gmail.com:
4xtrader@tpg.com.au:
4xtrader@tpg.com.au:
10241024simon@gmail.com:
10241024simon@gmail.com:
4xtrader@tpg.com.au:
:
4xtrader@tpg.com.au:
abradbery@gmail.com: q74Xpc0O
abradbery@gmail.com: q74Xpc0O
abradbery@gmail.com: q74Xpc0O
abradbery@gmail.com: q74Xpc0O
abrahamvthomas@hotmail.com:
abrahamvthomas@hotmail.com:
tomaszt1969@hotmail.com:u0Pct21s
vernon.lee.888@gmail.com:bFVi04Ly
v.o.l8@mail.ru:kof6Iut2
vaughan089@hotmail.com:wc61VihU
virtexed@gmail.com:
voytek@comcast.net:
vlado@3crew.net:g3Bn165d
…
My file has over 30,000 lines.
1 / How to delete duplicate lines? I have tried many expressions, but they seem to work only when my file is limited to about 10,000 lines. When my file has about 20,000 lines or more, the expressions almost never respond. For example:
^ (. *?) $ \ s +? ^ (? =. * ^ \ 1 $)
(? -s) ^ (. + \ R) (? = (? s:. *) ^ \ 1)
(? -s) ^ (. +) \ R (? s) (? =. * \ R \ 1 \ R?)
(? -s) ^ … (. + \ R) (? s) (?!. * \ R (? - s) … \ 1)
Why can’t there be a single expression that eliminates the duplicate lines in my example format?
2 / How to remove lines that contain only “:”? (I think the answer to this question is similar to the answer to question 1. I ask it separately because I am not sure that the expressions above match only this line format, and I need to be able to use the solution in another file.)
3 / How to choose the lines that have characters after “:”?
I use Google to translate into English, so I hope you can understand my question. Thanks
-
@Sarah-Duong said in Type of duplicate lines:
How to delete duplicate lines?
I’m wondering if you noticed there is a built-in feature that will “remove consecutive duplicate lines”. The option is under Edit, then Line operations.
As for the 2nd question, I presume the line ONLY contains the : character. If so, use the replace function (Search, then Replace) with
Find What: ^:\R
Replace With: nothing in this field.
Not sure what your intention is with question #3. As a suggestion, supply some examples so we may better help you.
Hopefully these answers will help.
Terry
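For readers who want to check the `^:\R` idea outside Notepad++, here is a minimal Python sketch. It is an illustration only: Python's `re` module has no `\R`, so the line break is spelled `\r?\n`, and the function name and sample addresses are hypothetical.

```python
import re

def drop_colon_only_lines(text: str) -> str:
    """Remove every line whose entire content is a single ':'."""
    # '^:' anchors the colon at the start of a line (MULTILINE mode);
    # '\r?\n' stands in for Notepad++'s '\R' (assumption: LF or CRLF endings).
    return re.sub(r"^:\r?\n", "", text, flags=re.MULTILINE)

# Hypothetical sample data in the same "address:value" shape as the thread.
sample = ":\nuser1@example.com:\n:\nuser2@example.com:secret\n"
cleaned = drop_colon_only_lines(sample)
print(cleaned)
```

Because the colon must sit directly after the start of the line and directly before the line break, lines such as `user1@example.com:` are left untouched.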
-
When I follow your instructions, it does not help me eliminate the duplicate lines at all. I only noticed that it removed the lines containing just “:”. As for the second thing you said, the replace method only helps to replace a single character. What I mean is that I want to remove those lines with an expression. My question 3 is how to find the lines that have characters after the “:”. From the example above, I would select only these lines:
abradbery@gmail.com: q74Xpc0O
tomaszt1969@hotmail.com: u0Pct21s
vernon.lee.888@gmail.com: bFVi04Ly
v.o.l8@mail.ru: kof6Iut2
vaughan089@hotmail.com: wc61VihU
vlado@3crew.net: g3Bn165d
I do not know how to post my text file. If you can guide me so I can post it here, you will soon understand my question.
-
@Terry-R If I use the replace method as you said, all the other lines containing “:” will be removed too. I want to remove only the lines that contain nothing but “:”. The lines that have characters before and after the “:” must all be kept. Do you understand me? Take a closer look at my example: it also contains lines that consist only of “:”.
-
@Sarah-Duong said in Type of duplicate lines:
doesn’t help me eliminate duplicate lines
I copied some of your examples and the built-in function “remove consecutive line” DID remove the duplicates, leaving just 1 of each group. If it’s not working for you then something else (NPP environment maybe, your examples aren’t the same as you have in the file, or the file type) is affecting the result.
To post examples without the editor changing it use the information within the FAQ, specifically FAQ Desk: Request for Help without sufficient information to help you
Often issues like you are having are found to be data related, such as information in the real data is not reflected in the examples. You will see some posts have their examples in a black window, the FAQ tells you how to do that.
Terry
-
@Terry-R Many thanks for your support. However, as a newbie, I still make many mistakes when posting questions. What I posted in this topic is copied from my text file. I really want to find a way to upload it in full, but I don’t know how to do it. I do not really understand your answers, and so far they have not really helped me. Hope you understand. Thanks again
-
@Sarah-Duong said in Type of duplicate lines:
what I posted on this topic is copied from my text file
I understand that you may have copied the examples, however the editor you type in can interpret some characters and change the way examples show. See what I’ve done with your example below.
4xtrader@tpg.com.au:
2emajnllc@gmail.com:
2emajnllc@gmail.com:
2emajnllc@gmail.com:
127victor@cox.net:
1talo@bluewin.ch:
10241024simon@gmail.com:
10241024simon@gmail.com:
4xtrader@tpg.com.au:
4xtrader@tpg.com.au:
10241024simon@gmail.com:
10241024simon@gmail.com:
4xtrader@tpg.com.au:
I created this black window by using the ` character 3 times on a line before and a line after the examples. On an English keyboard it will generally be the key right before the number 1 key.
We (all) on this forum are happy to help, however to do so you do need to be able to understand the questions we ask and be able to give good responses.
Terry
-
@Terry-R So, if I want to post part of my text file here, I need to type the ` character 3 times on a line before and on a line after the examples? And will it then be displayed in a black window, the same as yours?
-
@Sarah-Duong said in Type of duplicate lines:
ype 3 times the " "followed
You will see that your post has been changed by the editor and the special character is missing. The character is like a backward facing quote character, often on the same key as ~ is.
So 3 of this character on a line by themselves
next line is the examples
next line by themselves the same 3 characters.
We don’t need all 20000 lines, but about 20-30 lines showing all 3 types of lines you want changed will be sufficient.
Terry
-
@Terry-R If I have multiple lines, do I have to type the ` character on every line? Or do I just need to type it at the beginning and at the end of the text file content?
-
Put the 3 characters on a line. The next line starts the examples, for as many lines as necessary. The line below the last example line is the 3 special characters again. So between the 2 lines of special characters are as many lines as you want.
Terry
-
@Terry-R and here my text :
1 / How to delete duplicate lines?
2/ Remove lines that have only “:”?
3/ How to choose the lines that have characters after “:”?: : : 4xtrader@tpg.com.au: 2emajnllc@gmail.com: 127victor@cox.net: 1talo@bluewin.ch: 10241024simon@gmail.com: 4xtrader@tpg.com.au: a.tworowski@o2.pl:sXOa61Dq 4xtrader@tpg.com.au: 1talo@bluewin.ch: a.tworowski@o2.pl:sXOa61Dq a_cameronsse@hotmail.com:jof6IutH 4xtrader@tpg.com.au: a_nizam2032@yahoo.com: aaaerealty@yahoo.com: a.tworowski@o2.pl:sXOa61Dq aaaerealty@yahoo.com: 10241024simon@gmail.com: 4xtrader@tpg.com.au: 10241024simon@gmail.com: 4xtrader@tpg.com.au: : 4xtrader@tpg.com.au: 10241024simon@gmail.com: a.tworowski@o2.pl:sXOa61Dq 4xtrader@tpg.com.au: 1talo@bluewin.ch: 2emajnllc@gmail.com: 4xtrader@tpg.com.au: 10241024simon@gmail.com: 4xtrader@tpg.com.au: abradbery@gmail.com:q74Xpc0O abrahamvthomas@hotmail.com: a.tworowski@o2.pl:sXOa61Dq : aaron.r.cameron@gmail.com: aaaerealty@yahoo.com: aaron.r.cameron@gmail.com: : abradbery@gmail.com:q74Xpc0O aaron.r.cameron@gmail.com: abrahamvthomas@hotmail.com: abradbery@gmail.com:q74Xpc0O 4xtrader@tpg.com.au: a.tworowski@o2.pl:sXOa61Dq abrahamvthomas@hotmail.com: 10241024simon@gmail.com: 1talo@bluewin.ch: 4xtrader@tpg.com.au: 10241024simon@gmail.com: a-al-khaledi@hotmail.com: abdullah.al.hajri0001@gmail.co: a_cameronsse@hotmail.com:jof6IutH abdullah.al.hajri0001@gmail.co: abrahamvthomas@hotmail.com: abdullah.al.hajri0001@gmail.co: aaaerealty@yahoo.com: abrahamvthomas@hotmail.com: abdullah.al.hajri0001@gmail.co: aaaerealty@yahoo.com: abrahamvthomas@hotmail.com: abrarahmed325@yahoo.com: 4xtrader@tpg.com.au: abrarahmed325@yahoo.com: 4xtrader@tpg.com.au: abrahamvthomas@hotmail.com: 4xtrader@tpg.com.au: abrarahmed325@yahoo.com: ac5.thomas@btinternet.com: AccountingQB@brilloco.com: abrahamvthomas@hotmail.com: 4xtrader@tpg.com.au: abrahamvthomas@hotmail.com: abrarahmed325@yahoo.com: 10241024simon@gmail.com: abrahamvthomas@hotmail.com: adelaideairportshuttles@gmail.: adgrant6180@yahoo.com.au: abrarahmed325@yahoo.com: abrahamvthomas@hotmail.com: ac5.thomas@btinternet.com: 
AccountingQB@brilloco.com: abrarahmed325@yahoo.com: AccountingQB@brilloco.com: abrahamvthomas@hotmail.com: AccountingQB@brilloco.com: abrarahmed325@yahoo.com: AccountingQB@brilloco.com: abrahamvthomas@hotmail.com: a.tworowski@o2.pl:sXOa61Dq abrarahmed325@yahoo.com: aaaerealty@yahoo.com: agarwalgaura@gmail.com: abrahamvthomas@hotmail.com: agarwalgaura@gmail.com: agaskill@maalnet.com: ageorgiev86@yandex.ru:dIYk0ONb abrarahmed325@yahoo.com: adgrant6180@yahoo.com.au: adelaideairportshuttles@gmail.: agarwalgaura@gmail.com: abrarahmed325@yahoo.com: agarwalgaura@gmail.com: adgrant6180@yahoo.com.au: abrahamvthomas@hotmail.com: agilbert@hixworks.com: adgrant6180@yahoo.com.au: afoto@optonline.net: agilbert@hixworks.com: afoto@optonline.net: agilbert@hixworks.com: afoto@optonline.net: adelaideairportshuttles@gmail.: abrahamvthomas@hotmail.com: agilbert@hixworks.com: abrahamvthomas@hotmail.com: adgrant6180@yahoo.com.au: adelaideairportshuttles@gmail.: advanced80@xtra.co.nz: agilbert@hixworks.com: agarwalgaura@gmail.com: abrahamvthomas@hotmail.com: abrarahmed325@yahoo.com: agilbert@hixworks.com: aipunts@yahoo.co.uk:pul8OBa4 agilbert@hixworks.com: aaaerealty@yahoo.com: agilbert@hixworks.com: AccountingQB@brilloco.com: agilbert@hixworks.com: AccountingQB@brilloco.com: aipunts@yahoo.co.uk:pul8OBa4 afoto@optonline.net: agilbert@hixworks.com: akisa5577@gmail.com: AccountingQB@brilloco.com: agilbert@hixworks.com: aipunts@yahoo.co.uk:pul8OBa4 agilbert@hixworks.com: abrarahmed325@yahoo.com: agilbert@hixworks.com: aj0312@my.bristol.ac.uk: akisa5577@gmail.com: agilbert@hixworks.com: akisa5577@gmail.com: aipunts@yahoo.co.uk:pul8OBa4 akisa5577@gmail.com: agilbert@hixworks.com: alagha.ahmad@gmail.com: alain_delongchamp@yahoo.com: agilbert@hixworks.com: alain_delongchamp@yahoo.com: agilbert@hixworks.com: akisa5577@gmail.com: alain_delongchamp@yahoo.com: akisa5577@gmail.com: alagha.ahmad@gmail.com: AccountingQB@brilloco.com: agilbert@hixworks.com: akolanupaka@gmail.com: 
alagha.ahmad@gmail.com: agilbert@hixworks.com: akolanupaka@gmail.com: alamrozek@interia.eu: alain_delongchamp@yahoo.com: alamrozek@interia.eu: alagha.ahmad@gmail.com: alain_delongchamp@yahoo.com: akisa5577@gmail.com: alain_delongchamp@yahoo.com: akisa5577@gmail.com: alain_delongchamp@yahoo.com: akisa5577@gmail.com: akolanupaka@gmail.com: alain_delongchamp@yahoo.com: alamrozek@interia.eu: alain_delongchamp@yahoo.com: alan.james68@icloud.com: Albert.Lau@eastwestbank.com: alain_delongchamp@yahoo.com: Albert.Lau@eastwestbank.com: alain_delongchamp@yahoo.com: abrarahmed325@yahoo.com: alain_delongchamp@yahoo.com: alan.james68@icloud.com: alamrozek@interia.eu: alan.james68@icloud.com: ajurkovic@iinet.net.au: Albert.Lau@eastwestbank.com: alan.james68@icloud.com: alamrozek@interia.eu: ageorgiev86@yandex.ru:dIYk0ONb alamrozek@interia.eu: Alemannia@gmx.com: alamrozek@interia.eu: akolanupaka@gmail.com: Alemannia@gmx.com: alert@infoplasticsurgery.com: alain_delongchamp@yahoo.com: Albert.Lau@eastwestbank.com: alain_delongchamp@yahoo.com: Albert.Lau@eastwestbank.com: albertrodriguez28@yahoo.com: Alemannia@gmx.com: alain_delongchamp@yahoo.com: albertrodriguez28@yahoo.com: aldis@hostnet.lv: alan.james68@icloud.com: alexrossouw196@gmail.com: alan.james68@icloud.com: alexrossouw196@gmail.com: Alemannia@gmx.com: alexrossouw196@gmail.com: akisa5577@gmail.com: Albert.Lau@eastwestbank.com: aldis@hostnet.lv: Albert.Lau@eastwestbank.com: alexrossouw196@gmail.com: aldis@hostnet.lv: alain_delongchamp@yahoo.com: alexrossouw196@gmail.com: alert@infoplasticsurgery.com: alexrossouw196@gmail.com: Alemannia@gmx.com: akisa5577@gmail.com: Alemannia@gmx.com: alexrossouw196@gmail.com: alert@infoplasticsurgery.com: akisa5577@gmail.com: alert@infoplasticsurgery.com: alektron@aol.com: althielman@live.com: altumbabicnahid@gmail.com: albertrodriguez28@yahoo.com: alexrossouw196@gmail.com: albertrodriguez28@yahoo.com: alexrossouw196@gmail.com: alfred.kum@gmail.com: alexrossouw196@gmail.com: 
alfred.kum@gmail.com: alert@infoplasticsurgery.com: alfred.kum@gmail.com: aman.di@hotmail.com: amendol1@verizon.net: alexrossouw196@gmail.com: alistair@hexcollective.co.uk: alfred.kum@gmail.com: alistair@hexcollective.co.uk: alfred.kum@gmail.com: aman.di@hotmail.com: abrarahmed325@yahoo.com: aman.di@hotmail.com: althielman@live.com: AMERAHMED19@GMAIL.COM: altumbabicnahid@gmail.com: andreas.toerpel@web.de alexrossouw196@gmail.com: andreaszerbes@gmail.com: ALJOAMAYA@GMAIL.COM: alert@infoplasticsurgery.com: aman.di@hotmail.com: altumbabicnahid@gmail.com: alexrossouw196@gmail.com: andpanagiotop@gmail.com: alfred.kum@gmail.com: andpanagiotop@gmail.com: alistair@hexcollective.co.uk: alizenel@outlook.com: aldis@hostnet.lv: althielman@live.com: alfred.kum@gmail.com: ALJOAMAYA@GMAIL.COM: alistair@hexcollective.co.uk: aman.di@hotmail.com: andpanagiotop@gmail.com: aman.di@hotmail.com: alan.james68@icloud.com: andrewdonnellyjr@aol.com:qu48OcaN andrzej.wencel@yahoo.com: alfred.kum@gmail.com: andrew.harnaga@hotmail.com: andreas.toerpel@web.de: alexrossouw196@gmail.com: andrew.harnaga@hotmail.com: andreaszerbes@gmail.com: andrew@ezestream.com.au: andrew.harnaga@hotmail.com: altumbabicnahid@gmail.com: andreas.toerpel@web.de: andrewdonnellyjr@aol.com:qu48OcaN andrzej.wencel@yahoo.com: andrew.harnaga@hotmail.com: anglinpaul@hotmail.com: andrew.chaveriat@gmail.com: alexrossouw196@gmail.com: aman.di@hotmail.com: andreas.toerpel@web.de: antydoe@gmail.com: anisessaid5@gmail.com: andrew@ezestream.com.au: andrew.harnaga@hotmail.com: andrewdonnellyjr@aol.com:qu48OcaN andreas.toerpel@web.de: antydoe@gmail.com: arash@42uag.com: arolaxinvestor@gmail.com: antydoe@gmail.com: arolaxinvestor@gmail.com: artallison@aol.com: anisessaid5@gmail.com: andreas.toerpel@web.de: anisessaid5@gmail.com: anglinpaul@hotmail.com: andrew.harnaga@hotmail.com: antuzla@outlook.com: antydoe@gmail.com: andpanagiotop@gmail.com: ascrowe@wyoming.com: arunasaste@gmail.com: ash-1989-@hotmail.com: andrzej.wencel@yahoo.com: 
anglinpaul@hotmail.com: ash-1989-@hotmail.com: arash@42uag.com: anuvu@ymail.com: andrew.harnaga@hotmail.com: antydoe@gmail.com: artallison@aol.com: andrew.harnaga@hotmail.com: andrewdonnellyjr@aol.com:qu48OcaN anglinpaul@hotmail.com: ash-1989-@hotmail.com: arunasaste@gmail.com: argoman@hotmail.co.uk: attention109@yahoo.com: alexrossouw196@gmail.com: antuzla@outlook.com: attention109@yahoo.com: andrzej.wencel@yahoo.com: arunasaste@gmail.com: arolaxinvestor@gmail.com: antuzla@outlook.com: asmoonlight@yandex.ru: attention109@yahoo.com: asmoonlight@yandex.ru: ash-1989-@hotmail.com: atinton@hotmail.com: avysotsky@ukr.net: arkadyokrezna@gmail.com: axel@aadaum.de: avysotsky@ukr.net: arunasaste@gmail.com: azyk1@yahoo.com: ash-1989-@hotmail.com: azyk1@yahoo.com: anglinpaul@hotmail.com: azyk1@yahoo.com: b.costin23@gmail.com: arunasaste@gmail.com: ash-1989-@hotmail.com: avysotsky@ukr.net: attention109@yahoo.com: avysotsky@ukr.net: ash-1989-@hotmail.com: attention109@yahoo.com: avysotsky@ukr.net: azyk1@yahoo.com: Badykshanov@gmail.com: b.costin23@gmail.com: Badykshanov@gmail.com: arunasaste@gmail.com: avysotsky@ukr.net: balsara@icloud.com: banking5150@gmail.com: antydoe@gmail.com: alistair@hexcollective.co.uk: avysotsky@ukr.net: arunasaste@gmail.com: ash-1989-@hotmail.com: b.costin23@gmail.com: ashley.brown@hushmail.com: Badykshanov@gmail.com: b.costin23@gmail.com: avysotsky@ukr.net: attention109@yahoo.com: banking5150@gmail.com: b.costin23@gmail.com: axel@aadaum.de: b.costin23@gmail.com: banking5151@gmail.com: azeezb22@gmail.com: artallison@aol.com: b.costin23@gmail.com: b.rowsell@bell.net: avysotsky@ukr.net: banking5150@gmail.com: Badykshanov@gmail.com: banking5150@gmail.com: avysotsky@ukr.net: Badykshanov@gmail.com: andrew@ezestream.com.au: attention109@yahoo.com: ash-1989-@hotmail.com: baratina@gmx.net: barnettos@yahoo.com:e38Ldp5C bartekkuchnik@gmail.com: baratina@gmx.net: bartir@hotmail.com: banking5150@gmail.com: b.costin23@gmail.com: banking5151@gmail.com: 
banksdw@slu.edu: azyk1@yahoo.com: banking5150@gmail.com: banking5151@gmail.com: barakgr@live.com: banksdw@slu.edu: arunasaste@gmail.com: b.rowsell@bell.net: banking5151@gmail.com: barakgr@live.com: avysotsky@ukr.net: banking5151@gmail.com: barakgr@live.com: Berganphoto@aol.com: bertfrigo@gmail.com: bengel1975@msn.com: bertfrigo@gmail.com: banksdw@slu.edu: bartir@hotmail.com: banking5151@gmail.com: banking5150@gmail.com: banksdw@slu.edu: bimleshkumar@live.in: bjh@yesyes.net: bartir@hotmail.com: banking5150@gmail.com: bcteo@pegasus-it.com.sg: BBJMcorp@aol.com: banking5151@gmail.com: BEDONEISM@HOTMAIL.COM: bengel1975@msn.com: BEDONEISM@HOTMAIL.COM: beamugt@yahoo.com: bddoliveiro@gmail.com: beamugt@yahoo.com: bartir@hotmail.com: BEDONEISM@HOTMAIL.COM: bjh@yesyes.net: baratina@gmx.net: blansford@lrshouston.com:fKBm16Pd barakgr@live.com: bcteo@pegasus-it.com.sg: bjh@yesyes.net: arunasaste@gmail.com: bjh@yesyes.net: blansford@LAMTexas.trade: blansford@lrshouston.com:fKBm16Pd bengel1975@msn.com: blansford@lrshouston.com:fKBm16Pd BEDONEISM@HOTMAIL.COM: bobs114@yahoo.com.au: bimleshkumar@live.in: blansford@lrshouston.com:fKBm16Pd bjh@yesyes.net: barakgr@live.com: bobs114@yahoo.com.au: bertfrigo@gmail.com: bengel1975@msn.com: bobs114@yahoo.com.au: blansford@lrshouston.com:fKBm16Pd bobsoneau@yahoo.com.au: bobwhite1946@yahoo.com: barakgr@live.com: blberger9@comcast.net: blansford@lrshouston.com:fKBm16Pd bohdarom@sbcglobal.net: bobrabcd@frontier.com: baratina@gmx.net: bobsoneau@yahoo.com.au: blansford@lrshouston.com:fKBm16Pd bobsoneau@yahoo.com.au: bertfrigo@gmail.com: bigblckdg@aol.com: bobwhite1946@yahoo.com: bleda2_ju21@hotmail.com: bohdarom@sbcglobal.net: boonwee.hong@gmail.com: boss_yuran@mail.ru: bertfrigo@gmail.com: boss_yuran@mail.ru: billsilk@ozemail.com.au: bobmedanovic@yahoo.com: bobsoneau@yahoo.com.au: bohetsj@gmail.com: bobs114@yahoo.com.au: banking5150@gmail.com: bobs114@yahoo.com.au: boonwee.hong@gmail.com: bohdarom@sbcglobal.net: boss_yuran@mail.ru: 
boothmark71@hotmail.com:bFVi84Kx boss_yuran@mail.ru: bobs114@yahoo.com.au: blansford@lrshouston.com:fKBm16Pd boss_yuran@mail.ru: botha.qatar@yahoo.com: bobwhite1946@yahoo.com: botha.qatar@yahoo.com: blansford@lrshouston.com:fKBm16Pd boothmark71@hotmail.com:bFVi84Kx boss_yuran@mail.ru: bobsoneau@yahoo.com.au: boss_yuran@mail.ru: bobwhite1946@yahoo.com: boss_yuran@mail.ru: botha.qatar@yahoo.com: bowwybowwy@gmail.com: brooksforex1529@yahoo.com: bru.nico@alice.it: boss_yuran@mail.ru: bru.nico@alice.it: brumbypat@hotmail.com: bohdarom@sbcglobal.net: bjh@yesyes.net: boss_yuran@mail.ru: camillopoland@gmail.com: barakgr@live.com: boss_yuran@mail.ru: brchio@hotmail.com: braykm01@yahoo.com: bru.nico@alice.it: brchio@hotmail.com: brooksforex1529@yahoo.com: bsrsolutions10@gmail.com: carlo.paniccia@hotmail.com: carlplunkett@hotmail.com: bobwhite1946@yahoo.com: brianchatting@yahoo.co.uk: carlplunkett@hotmail.com: brchio@hotmail.com: botha.qatar@yahoo.com: carlplunkett@hotmail.com: boonwee.hong@gmail.com: bowwybowwy@gmail.com: boonwee.hong@gmail.com: bobs114@yahoo.com.au: cagoldman2005@yahoo.com: boss_yuran@mail.ru: beamugt@yahoo.com: botha.qatar@yahoo.com: carlplunkett@hotmail.com: botha.qatar@yahoo.com: cary.northup@gmail.com: carlplunkett@hotmail.com: cary.northup@gmail.com: carlplunkett@hotmail.com: carlcrabill@yahoo.com: carlplunkett@hotmail.com: camillopoland@gmail.com: bowwybowwy@gmail.com: carlplunkett@hotmail.com: ccollins@semo.net:yd72XkjW carlplunkett@hotmail.com: booner2k@gmail.com: casstlem@yahoo.com.au: camillopoland@gmail.com: cary.northup@gmail.com: carlplunkett@hotmail.com: cary.northup@gmail.com: ccollins@semo.net:yd72XkjW cary.northup@gmail.com: ccollins@semo.net:yd72XkjW cary.northup@gmail.com: ccollins@semo.net:yd72XkjW cdb07d@gmail.com: botha.qatar@yahoo.com: cbenjamin@cisolaw.com: casstlem@yahoo.com.au: botha.qatar@yahoo.com: bobs114@yahoo.com.au: bsrsolutions10@gmail.com: ccollins@semo.net:yd72XkjW bstarling@gmx.com: cary.northup@gmail.com: 
ccollins@semo.net:yd72XkjW cary.northup@gmail.com: ccollins@semo.net:yd72XkjW casstlem@yahoo.com.au: ccollins@semo.net:yd72XkjW botha.qatar@yahoo.com: cary.northup@gmail.com: cdb07d@gmail.com: bsrsolutions10@gmail.com: cdb07d@gmail.com: cdudek60@gmail.com: ccollins@semo.net:yd72XkjW cdudek60@gmail.com: ccollins@semo.net:yd72XkjW cdudek60@gmail.com: cdb07d@gmail.com: boss_yuran@mail.ru: cdb07d@gmail.com: cdudek60@gmail.com: cdb07d@gmail.com: cdudek60@gmail.com: ccollins@semo.net:yd72XkjW cemedia@aol.com: cdudek60@gmail.com: cdb07d@gmail.com: cgsinvest@aol.com: huynhngoccuong@gmail.com: idrzewicz@icloud.com:w0Re72Ht info@simmtec.com: ia_sho@abv.bg: idrzewicz@icloud.com:w0Re72Ht Hassamqazi7@gmail.com: ihssass@hotmail.com: idrzewicz@icloud.com:w0Re72Ht haleelg@gmail.com: gratica@att.net:gKb4EQp1 george@georgeharrison1.com:cgw3AMl8 hasco@personainternet.com: idrzewicz@icloud.com:w0Re72Ht Hassamqazi7@gmail.com: ihssass@hotmail.com: idrzewicz@icloud.com:w0Re72Ht```
-
I used the line removal method as you instructed, but it only shortened the file. It seems that it only deletes duplicate lines that are next to each other. If a duplicate is not adjacent to its copy, it is not removed. It only helps to remove the lines that contain just “:”. What I need is to delete the lines that contain only “:” and keep the lines that have characters before and after the “:” sign. I cannot use replace in this case, because it would replace all the other lines, and I absolutely do not want that.
-
My suggestion for removing duplicate lines does NOT work as your data is different to the examples. In examples the duplicates are beside each other, your real data does not have that. If you were to sort your file lexicographically then my suggestion will work.
I’m not on my PC so cannot confirm the function, but I think it’s in the same area as remove duplicate lines.
I’m sure we can have you sorted with the remaining problems very quickly. Can you answer, what do you want to do with the lines that have characters after the :, that was your #3 question.
Terry
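The difference between "remove consecutive duplicate lines" on unsorted and on sorted data can be sketched in a few lines of Python. This only mimics the behaviour described above, it is not how Notepad++ implements the feature, and the sample lines are hypothetical.

```python
from itertools import groupby

def remove_consecutive_duplicates(lines):
    """Collapse each run of identical neighbouring lines into a single line."""
    # groupby() groups only adjacent equal items, which is exactly the limitation
    # discussed above: non-adjacent duplicates end up in separate groups.
    return [key for key, _group in groupby(lines)]

lines = ["b:", "a:", "b:", "b:", "a:"]

unsorted_result = remove_consecutive_duplicates(lines)
sorted_result = remove_consecutive_duplicates(sorted(lines))

print(unsorted_result)  # ['b:', 'a:', 'b:', 'a:']  duplicates remain
print(sorted_result)    # ['a:', 'b:']              all duplicates removed
```

Sorting first brings identical lines next to each other, at the price of losing the original line order.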
-
@Terry-R In my text file, there are some lines that contain only “:”. How do I eliminate them? I cannot use the replace method
^: \ R
Because it replaces the “:” in other lines. You don’t seem to understand what I mean by this sentence.
In question 3, how to select only the lines with the characters after the “:” example.a_cameronsse@hotmail.com: jof6IutH abradbery@gmail.com: q74Xpc0O a.tworowski@o2.pl: sXOa61Dq ageorgiev86@yandex.ru: dIYk0ONb aipunts@yahoo.co.uk: pul8OBa4
I don’t know how to make you understand what I mean. Sorry for my limited English.
-
Hello, @sarah-duong, @Terry-r and All,
First of all, thanks for posting a real example of your text. However, I noticed that the very last line is :
idrzewicz@icloud.com:w0Re72Ht```
And I suppose that the correct line is rather :
idrzewicz@icloud.com:w0Re72Ht
- Regarding your question 2, to delete lines with a unique colon char, the Terry’s regex S/R is NOT :
SEARCH ^: \ R
REPLACE Leave EMPTY
but, indeed :
SEARCH ^:\R
REPLACE Leave EMPTY
Assuming your example, after clicking on the Replace All button, 6 lines, with a unique :, are deleted !
- Regarding your question 1, to delete duplicate lines, you could use the following regex S/R :
SEARCH (?-s)^(.+\R)(?=(?s).+?^\1)
REPLACE Leave EMPTY
Again, after clicking on the Replace All button, from your initial text of 623 mail addresses, we get, at once, the expected text of 152
mail addresses, all different !127victor@cox.net: a_nizam2032@yahoo.com: 2emajnllc@gmail.com: aaron.r.cameron@gmail.com: abradbery@gmail.com:q74Xpc0O 1talo@bluewin.ch: a-al-khaledi@hotmail.com: a_cameronsse@hotmail.com:jof6IutH abdullah.al.hajri0001@gmail.co: 4xtrader@tpg.com.au: 10241024simon@gmail.com: ac5.thomas@btinternet.com: a.tworowski@o2.pl:sXOa61Dq agaskill@maalnet.com: adgrant6180@yahoo.com.au: adelaideairportshuttles@gmail.: advanced80@xtra.co.nz: agarwalgaura@gmail.com: abrahamvthomas@hotmail.com: aaaerealty@yahoo.com: afoto@optonline.net: aj0312@my.bristol.ac.uk: aipunts@yahoo.co.uk:pul8OBa4 AccountingQB@brilloco.com: agilbert@hixworks.com: alagha.ahmad@gmail.com: ajurkovic@iinet.net.au: ageorgiev86@yandex.ru:dIYk0ONb alamrozek@interia.eu: akolanupaka@gmail.com: Albert.Lau@eastwestbank.com: alain_delongchamp@yahoo.com: Alemannia@gmx.com: akisa5577@gmail.com: alektron@aol.com: albertrodriguez28@yahoo.com: amendol1@verizon.net: abrarahmed325@yahoo.com: AMERAHMED19@GMAIL.COM: andreas.toerpel@web.de alert@infoplasticsurgery.com: alizenel@outlook.com: aldis@hostnet.lv: althielman@live.com: ALJOAMAYA@GMAIL.COM: alan.james68@icloud.com: alfred.kum@gmail.com: andreaszerbes@gmail.com: altumbabicnahid@gmail.com: andrew.chaveriat@gmail.com: aman.di@hotmail.com: andreas.toerpel@web.de: anisessaid5@gmail.com: andpanagiotop@gmail.com: ascrowe@wyoming.com: arash@42uag.com: anuvu@ymail.com: andrew.harnaga@hotmail.com: andrewdonnellyjr@aol.com:qu48OcaN argoman@hotmail.co.uk: alexrossouw196@gmail.com: andrzej.wencel@yahoo.com: arolaxinvestor@gmail.com: antuzla@outlook.com: asmoonlight@yandex.ru: atinton@hotmail.com: arkadyokrezna@gmail.com: anglinpaul@hotmail.com: balsara@icloud.com: antydoe@gmail.com: alistair@hexcollective.co.uk: ashley.brown@hushmail.com: axel@aadaum.de: azeezb22@gmail.com: artallison@aol.com: Badykshanov@gmail.com: andrew@ezestream.com.au: attention109@yahoo.com: ash-1989-@hotmail.com: barnettos@yahoo.com:e38Ldp5C bartekkuchnik@gmail.com: 
b.costin23@gmail.com: azyk1@yahoo.com: b.rowsell@bell.net: avysotsky@ukr.net: Berganphoto@aol.com: banksdw@slu.edu: BBJMcorp@aol.com: banking5151@gmail.com: bddoliveiro@gmail.com: bartir@hotmail.com: bcteo@pegasus-it.com.sg: arunasaste@gmail.com: blansford@LAMTexas.trade: BEDONEISM@HOTMAIL.COM: bimleshkumar@live.in: bengel1975@msn.com: blberger9@comcast.net: bobrabcd@frontier.com: baratina@gmx.net: bigblckdg@aol.com: bleda2_ju21@hotmail.com: bertfrigo@gmail.com: billsilk@ozemail.com.au: bobmedanovic@yahoo.com: bohetsj@gmail.com: banking5150@gmail.com: blansford@lrshouston.com:fKBm16Pd boothmark71@hotmail.com:bFVi84Kx bobsoneau@yahoo.com.au: brumbypat@hotmail.com: bohdarom@sbcglobal.net: bjh@yesyes.net: barakgr@live.com: braykm01@yahoo.com: bru.nico@alice.it: brooksforex1529@yahoo.com: carlo.paniccia@hotmail.com: bobwhite1946@yahoo.com: brianchatting@yahoo.co.uk: brchio@hotmail.com: boonwee.hong@gmail.com: cagoldman2005@yahoo.com: beamugt@yahoo.com: carlcrabill@yahoo.com: bowwybowwy@gmail.com: booner2k@gmail.com: camillopoland@gmail.com: carlplunkett@hotmail.com: cbenjamin@cisolaw.com: bobs114@yahoo.com.au: bstarling@gmx.com: casstlem@yahoo.com.au: botha.qatar@yahoo.com: cary.northup@gmail.com: bsrsolutions10@gmail.com: boss_yuran@mail.ru: ccollins@semo.net:yd72XkjW cemedia@aol.com: cdudek60@gmail.com: cdb07d@gmail.com: cgsinvest@aol.com: huynhngoccuong@gmail.com: info@simmtec.com: ia_sho@abv.bg: haleelg@gmail.com: gratica@att.net:gKb4EQp1 george@georgeharrison1.com:cgw3AMl8 hasco@personainternet.com: Hassamqazi7@gmail.com: ihssass@hotmail.com: idrzewicz@icloud.com:w0Re72Ht
- Regarding your question 3, the Terry’s request seems justified :
Can you answer, what do you want to do with the lines that have characters after the :, that was your #3 question.
Indeed, you said :
3/ How to choose the lines that have characters after “:”?
But, once your lines are “chosen”, what next ?!
Now, if you want to easily point out these specific lines, you could use the Mark feature :
- Click on the Search > Mark... menu option
- SEARCH (?-s):.+
- Tick the Bookmark line, Purge for each search and Wrap around options
- Of course, select the Regular expression search mode
- Click on the Mark All button
=> The lines, containing text after the : char, are bookmarked with a blue circle, and the text matched is highlighted in red !
- Then, some operations are possible on these bookmarked lines. Just select the sub-menu Search > Bookmark
For instance, using the Copy Bookmark Lines option, then a paste operation, here is the 14
-lines list, from the modified text, without duplicate lines :abradbery@gmail.com:q74Xpc0O a_cameronsse@hotmail.com:jof6IutH a.tworowski@o2.pl:sXOa61Dq aipunts@yahoo.co.uk:pul8OBa4 ageorgiev86@yandex.ru:dIYk0ONb andrewdonnellyjr@aol.com:qu48OcaN barnettos@yahoo.com:e38Ldp5C blansford@lrshouston.com:fKBm16Pd boothmark71@hotmail.com:bFVi84Kx ccollins@semo.net:yd72XkjW cgsinvest@aol.com: gratica@att.net:gKb4EQp1 george@georgeharrison1.com:cgw3AMl8 idrzewicz@icloud.com:w0Re72Ht
Best Regards,
guy038
P.S. :
Once we are sure that your goals are achieved, we can give you some explanations on the regular expressions used ;-))
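As a cross-check outside Notepad++, the three goals of this thread (drop the colon-only lines, remove duplicates, pick the lines with text after the colon) can be sketched in Python. This is a set-based alternative, not the regex technique above: it keeps the last copy of each line, which is roughly the effect of the look-ahead regex, and every name and sample line here is hypothetical.

```python
def dedupe_keep_last(lines):
    """Drop a line whenever an identical line occurs later; order is preserved."""
    seen = set()
    kept = []
    for line in reversed(lines):      # walk backwards so the LAST copy wins
        if line not in seen:
            seen.add(line)
            kept.append(line)
    kept.reverse()
    return kept

def has_text_after_colon(line):
    """True when at least one character follows the first ':' (like ':.+')."""
    _before, sep, after = line.partition(":")
    return sep == ":" and after != ""

sample = [
    ":",
    "user1@example.com:",
    "user2@example.com:pw1",
    "user1@example.com:",
    ":",
    "user2@example.com:pw1",
    "user3@example.com:",
]

no_colon_only = [line for line in sample if line != ":"]           # question 2
unique_lines = dedupe_keep_last(no_colon_only)                      # question 1
with_value = [l for l in unique_lines if has_text_after_colon(l)]   # question 3

print(unique_lines)
print(with_value)
```

A set lookup costs about the same for every line, so this approach does not slow down as the distance between duplicates grows, unlike a look-ahead that has to scan the rest of the file.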
-
@guy038 said in Type of duplicate lines:
Regarding your question 2, to delete line with an unique colon char, the Terry’s regex S/R is NOT :
@Sarah-Duong I think I see a problem you are having with the regexes. Google Translate is introducing spaces between any characters it does not determine to be words. Thus I took the original regexes in your first post, translated them from English to Italian, and this is what I got:
We can see that the original does NOT contain spaces, the translation does. So you need to be careful copying the regexes back to your language. By all means copy and translate so you can read our words to you. BUT, do not try to do the same for the regexes. Copy those and paste directly into NPP!
@guy038 I like your regex to remove duplicates, I had considered that but as I’ve said before I hate forward lookups due to the issue of it possibly failing completely. As a test I copied the examples (600+ lines) and made lots of copies in the same file. I got up to just shy of 200K lines and still the regex worked, I gave up trying to determine the limit at that point. Perhaps I’m being a bit harsh on that function!
Cheers
Terry
-
Hello, @sarah-duong, @Terry-r and All,
Terry, I would like to emphasize, in this post, the importance of choosing the right type of quantifier ( greedy, lazy or atomic ) in a regular expression !
- From the initial text of @sarah-duong, above, which contains 623 lines, I duplicated it 325 times and I added a final line-break, at the very end of this test file. So, I obtained a file of 202,475 lines, for 5,016,375 bytes
- Then, applying the regex S/R, with the lazy quantifier +? :
SEARCH (?-s)^(.+\R)(?=(?s).+?^\1)
REPLACE Leave EMPTY
against this large text, I did get, after clicking on the Replace All button and 202,323 replacements ( In fact, suppressions ! ), in 3m 57s, on my old XP laptop, the very short expected text, 3,711 bytes long, containing the 152 lines, all different !
- Then, applying the regex S/R, where I changed the lazy quantifier +?, in the look-ahead, with the usual greedy quantifier + :
SEARCH (?-s)^(.+\R)(?=(?s).+^\1)
REPLACE Leave EMPTY
against this same text, even after about a 1-hour process, no result occurred, although Notepad++ did not seem to get stuck !?
So, I decided to run this regex S/R again, at about 10h45, expecting a correct result, after some hours, when I was back home ! By chance, the process did stop, in the evening, and had correctly deleted 202,323 lines, giving the expected final file, 3,711 bytes long, with 152 lines ;-))
In order to know the exact time used by Notepad++ to execute these 202,323 replacements, I simply opened the Process-Hacker v2.39.124 utility, double-clicked on the Notepad++ process to get its properties and, then, clicked, in the Threads tab, on the one with start address = notepad++0x12ab7b. See, below :
=> The sum of the Kernel and User times, minus about 2s, due to N++ startup, indicates the time of the S/R : 6h 17m 52s !! Compare with the previous time of 3m 57s, as shown below ;-))
So, guys, could you repeat these two regex S/R, to know, even approximately, with your configuration and OS, the time to process this UTF-8 test file, containing 325 times the initial text of @sarah-duong. So, a total of 202,475 lines and 5,016,375 bytes ! I am quite curious about the results ;-))
Best Regards,
guy038
-
@guy038 said in Type of duplicate lines:
So, guys, could you repeat these two regex S/R
The PC config I tested on is Windows 10 64bit version 1607 (2016 LTSB) with a Intel i5-8600 and 8GB RAM.
I used NPP 7.8.5 64bit.
As requested I ran both your “lazy” and the “greedy” regexes. I had the exact same test file as you (same lines and byte size). The lazy S/R produced a time of 1m 23s. The greedy S/R produced the same result with a time of 1hr 54m (accurate to a minute only).
Given your “old XP” system produced a time of 237s for the lazy S/R and mine 83s, that’s a ratio of 0.35. If I apply the same ratio to the greedy S/R, given your time of 6h 17m 52s, mine should be around 2hr 12m. My actual time of 1hr 54m is not too dissimilar from that. So perhaps we can consider the speed of the regexes is “mostly” independent of OS version or even possibly CPU type, apart from possibly small efficiencies in newer OS or CPU builds.
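The arithmetic in the paragraph above can be reproduced with a short Python sketch. The timings are the ones quoted in this thread; the helper and variable names are only illustrative.

```python
def to_seconds(hours=0, minutes=0, seconds=0):
    """Convert an h/m/s duration to seconds."""
    return hours * 3600 + minutes * 60 + seconds

lazy_xp = to_seconds(minutes=3, seconds=57)               # 237 s on the XP laptop
lazy_i5 = to_seconds(minutes=1, seconds=23)               # 83 s on the i5-8600
greedy_xp = to_seconds(hours=6, minutes=17, seconds=52)   # 22,672 s on the XP laptop
greedy_i5_actual = to_seconds(hours=1, minutes=54)        # measured, to the minute

ratio = lazy_i5 / lazy_xp                  # speed ratio seen with the lazy regex
predicted = round(greedy_xp * ratio)       # expected greedy time if the ratio holds
hours, remainder = divmod(predicted, 3600)
minutes = remainder // 60

print(f"ratio = {ratio:.2f}")                          # ratio = 0.35
print(f"predicted greedy time = {hours}h {minutes}m")  # predicted greedy time = 2h 12m
print(f"actual / predicted = {greedy_i5_actual / predicted:.2f}")
```

The measured greedy time comes out somewhat below the prediction, which fits the "not too dissimilar" reading above.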
I suppose the testing phase in the lookaheads is what takes the time. For the greedy regex, read all characters until end of file, test then drop 1 character, repeat until a solution found. Whereas the lazy regex just grabs 1 character then tests, continue until a solution found.
As this test file has “at least” 325 copies of each and every “original” line the test is possibly not a very accurate one. Indeed the lookahead is “guaranteed” of finding a match no more than 623 lines ahead. In this situation the lookahead is never going to “fail” as we have seen in bigger files with a sparse number of duplicates.
Terry
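The lazy and greedy look-aheads can also be compared with a small Python sketch. Two assumptions apply: Python's `re` has no `\R` and handles inline flags differently, so the patterns below are a translation using `\n` and a scoped `(?s:...)` group, and Python's engine is not the Boost engine that Notepad++ uses, so any timings are only indicative. The sample text is hypothetical and avoids adjacent duplicates, because `.+` needs at least one character between the two copies.

```python
import re
import time

# Translations of the two searches discussed above (assumption: LF line endings).
LAZY = re.compile(r"^(.+\n)(?=(?s:.+?)^\1)", re.MULTILINE)
GREEDY = re.compile(r"^(.+\n)(?=(?s:.+)^\1)", re.MULTILINE)

block = "a:\nb:\na:\nc:x\nb:\n"

# Small case: both variants delete every line that has a later identical copy.
lazy_small = LAZY.sub("", block)
greedy_small = GREEDY.sub("", block)

# Larger case: repeat the block, as in the 325-copy test file (300 copies here).
big = block * 300

start = time.perf_counter()
lazy_big = LAZY.sub("", big)
lazy_time = time.perf_counter() - start

start = time.perf_counter()
greedy_big = GREEDY.sub("", big)
greedy_time = time.perf_counter() - start

print(repr(lazy_small), repr(greedy_small))
print(f"lazy: {lazy_time:.4f}s  greedy: {greedy_time:.4f}s")
```

For each candidate line, the greedy `.+` first runs to the end of the text and then backs up, while the lazy `.+?` stops at the nearest later copy, so the gap between the two would be expected to widen as the file grows.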
-
@Terry-R said in Type of duplicate lines:
So perhaps we can consider the speed of the regexes is “mostly” independent of OS version or even possibly CPU type
I should perhaps elaborate on this statement. I realise that the mere fact that my test times are only about 1/3rd of those of @guy038 means there are efficiencies in OS and CPU, but I contend that they are mostly to do with the GHz speed of the CPU, rather than microcode efficiencies.
An “old XP” system would likely have a dual core (maybe a quad core) CPU with speed in the low GHz range. The i5-8600 has 6 cores with a speed of 3.1GHz.
Could we suggest that the results have more to do with the number of cores and GHz speed, than efficiencies in microcode or the actual CPU design?
Or is the question irrelevant as the answer to everything is “more speed/horsepower”!
Terry