Type of duplicate lines



  • Type of duplicate lines
    My text file:
    :
    :
    4xtrader@tpg.com.au:
    2emajnllc@gmail.com:
    2emajnllc@gmail.com:
    2emajnllc@gmail.com:
    127victor@cox.net:
    1talo@bluewin.ch:
    10241024simon@gmail.com:
    10241024simon@gmail.com:
    4xtrader@tpg.com.au:
    4xtrader@tpg.com.au:
    10241024simon@gmail.com:
    10241024simon@gmail.com:
    4xtrader@tpg.com.au:
    :
    4xtrader@tpg.com.au:
    abradbery@gmail.com: q74Xpc0O
    abradbery@gmail.com: q74Xpc0O
    abradbery@gmail.com: q74Xpc0O
    abradbery@gmail.com: q74Xpc0O
    abrahamvthomas@hotmail.com:
    abrahamvthomas@hotmail.com:
    tomaszt1969@hotmail.com:u0Pct21s
    vernon.lee.888@gmail.com:bFVi04Ly
    v.o.l8@mail.ru:kof6Iut2
    vaughan089@hotmail.com:wc61VihU
    virtexed@gmail.com:
    voytek@comcast.net:
    vlado@3crew.net:g3Bn165d

    My file has over 30,000 lines.

    1 / How do I delete duplicate lines? I have tried many expressions, but they only seem to work when my file has no more than about 10,000 lines. When the file has about 20,000 lines or more, the search almost stops responding. For example:
     ^(.*?)$\s+?^(?=.*^\1$)
      (?-s)^(.+\R)(?=(?s:.*)^\1)
      (?-s)^(.+)\R(?s)(?=.*\R\1\R?)
      (?-s)^…(.+\R)(?s)(?!.*\R(?-s)…\1)
    Why isn’t there a single expression that removes duplicate lines in my example format?
    2 / How do I remove lines that contain only “:”? (I think the answer is similar to the answer to question 1. I ask it separately because I suspect the expressions above only match one line format, and I need something I can also use in another file.)
    3 / How do I select the lines that have characters after the “:”?
    I use Google Translate to write in English, so I hope you can understand my question. Thanks



  • @Sarah-Duong said in Type of duplicate lines:

    How to delete duplicate lines?

    I’m wondering if you noticed there is a built-in feature that will “remove consecutive duplicate lines”. The option is under Edit, then Line operations.

    As for the 2nd question I presume the line ONLY contains the : character. If so use the replace function (Search, then Replace) with
    Find What:^:\R
    Replace With: nothing in this field.

    Not sure what your intention is with question #3. As a suggestion, supply some examples so we may better help you.

    Hopefully these answers will help.

    Terry
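
For readers who would rather script this outside Notepad++, here is a minimal Python sketch of the two operations described in the reply above: collapsing runs of identical adjacent lines, and dropping lines that contain only a colon. The function names and the sample data are invented for illustration, and this is not a description of how Notepad++ implements either feature.

```python
from itertools import groupby


def remove_consecutive_duplicates(lines):
    """Keep one line from each run of identical adjacent lines."""
    return [key for key, _run in groupby(lines)]


def remove_colon_only_lines(lines):
    """Drop lines whose entire content is a single ':'."""
    return [line for line in lines if line != ":"]


# Hypothetical sample data, not taken from the thread.
sample = [
    ":",
    "a@example.com:",
    "a@example.com:",
    "b@example.com:",
    "a@example.com:",
    ":",
]
deduped = remove_consecutive_duplicates(sample)
cleaned = remove_colon_only_lines(deduped)
```

Note that the last `a@example.com:` survives the first step because it is not adjacent to the earlier copies; only neighbouring duplicates are collapsed.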



  • When I follow your instructions, it does not eliminate the duplicate lines at all. I only noticed that it removed the lines containing just “:”. As for your second suggestion, the Replace method: it only helps to replace one other character, whereas I want to remove these lines with an expression. My question 3 is about finding the lines that have characters after the “:”. From the example above, I would select only these lines:
    abradbery@gmail.com: q74Xpc0O
    tomaszt1969@hotmail.com: u0Pct21s
    vernon.lee.888@gmail.com: bFVi04Ly
    v.o.l8@mail.ru: kof6Iut2
    vaughan089@hotmail.com: wc61VihU
    vlado@3crew.net: g3Bn165d
    I do not know how to post my text file. If you can tell me how to post it here, you will soon understand my question.



  • @Terry-R If I use the Replace method as you described, all the other lines containing “:” will be removed too. I want to remove only the lines that contain nothing but “:”. The lines that have characters before and after the “:” must all be kept. Do you understand me? Take a closer look at my example: it contains some lines with only “:”.



  • @Sarah-Duong said in Type of duplicate lines:

    doesn’t help me eliminate duplicate lines

    I copied some of your examples and the built-in function “remove consecutive line” DID remove the duplicates, leaving just 1 of each group. If it’s not working for you then something else (NPP environment maybe, your examples aren’t the same as you have in the file, or the file type) is affecting the result.

    To post examples without the editor changing it use the information within the FAQ, specifically FAQ Desk: Request for Help without sufficient information to help you
    Often issues like you are having are found to be data related, such as information in the real data is not reflected in the examples. You will see some posts have their examples in a black window, the FAQ tells you how to do that.

    Terry



  • @Terry-R Many thanks for your support. As a newbie, I still make many mistakes when posting questions, but what I posted in this topic was copied from my text file. I would really like to upload the whole file, but I don’t know how to do it. I do not fully understand your answers, and so far they have not really helped me. I hope you understand. Thanks again



  • @Sarah-Duong said in Type of duplicate lines:

    what I posted on this topic is copied from my text file

    I understand that you may have copied the examples, however the editor you type in can interpret some characters and change the way examples show. See what I’ve done with your example below.

    4xtrader@tpg.com.au:
    2emajnllc@gmail.com:
    2emajnllc@gmail.com:
    2emajnllc@gmail.com:
    127victor@cox.net:
    1talo@bluewin.ch:
    10241024simon@gmail.com:
    10241024simon@gmail.com:
    4xtrader@tpg.com.au:
    4xtrader@tpg.com.au:
    10241024simon@gmail.com:
    10241024simon@gmail.com:
    4xtrader@tpg.com.au:
    

    I created this black window by using the ` character 3 times on a line before and on a line after the examples. On an English keyboard it is generally the key just to the left of the number 1 key.

    We (all) on this forum are happy to help, however to do so you do need to be able to understand the questions we ask and be able to give good responses.

    Terry



  • @Terry-R So, if I want to post part of my text file here, I need to type the ` character 3 times on a line before and on a line after the examples? And will it then be displayed in a black window, the same as yours?



  • @Sarah-Duong said in Type of duplicate lines:

    ype 3 times the " "followed

    You will see that your post has been changed by the editor and the special character is missing. The character is like a backward facing quote character, often on the same key as ~ is.
    So 3 of this character on a line by themselves
    next line is the examples
    next line by themselves the same 3 characters.

    We don’t need all 20000 lines, but about 20-30 lines showing all 3 types of lines you want changed will be sufficient.

    Terry



  • @Terry-R If I have multiple lines, do I have to type the ` characters on every line? Or do I just need to type them at the beginning and at the end of the text file content?



  • The 3 characters go on a line by themselves. The next line starts the examples, for as many lines as necessary. The line below the last example line is the 3 special characters again. So between the 2 lines of special characters you can have as many lines as you want.

    Terry



  • @Terry-R and here my text :
    1 / How to delete duplicate lines?
    2/ Remove lines that have only “:”?
    3/ How to choose the lines that have characters after “:”?

    :
    :
    :
    4xtrader@tpg.com.au:
    2emajnllc@gmail.com:
    127victor@cox.net:
    1talo@bluewin.ch:
    10241024simon@gmail.com:
    4xtrader@tpg.com.au:
    a.tworowski@o2.pl:sXOa61Dq
    4xtrader@tpg.com.au:
    1talo@bluewin.ch:
    a.tworowski@o2.pl:sXOa61Dq
    a_cameronsse@hotmail.com:jof6IutH
    4xtrader@tpg.com.au:
    a_nizam2032@yahoo.com:
    aaaerealty@yahoo.com:
    a.tworowski@o2.pl:sXOa61Dq
    aaaerealty@yahoo.com:
    10241024simon@gmail.com:
    4xtrader@tpg.com.au:
    10241024simon@gmail.com:
    4xtrader@tpg.com.au:
    :
    4xtrader@tpg.com.au:
    10241024simon@gmail.com:
    a.tworowski@o2.pl:sXOa61Dq
    4xtrader@tpg.com.au:
    1talo@bluewin.ch:
    2emajnllc@gmail.com:
    4xtrader@tpg.com.au:
    10241024simon@gmail.com:
    4xtrader@tpg.com.au:
    abradbery@gmail.com:q74Xpc0O
    abrahamvthomas@hotmail.com:
    a.tworowski@o2.pl:sXOa61Dq
    :
    aaron.r.cameron@gmail.com:
    aaaerealty@yahoo.com:
    aaron.r.cameron@gmail.com:
    :
    abradbery@gmail.com:q74Xpc0O
    aaron.r.cameron@gmail.com:
    abrahamvthomas@hotmail.com:
    abradbery@gmail.com:q74Xpc0O
    4xtrader@tpg.com.au:
    a.tworowski@o2.pl:sXOa61Dq
    abrahamvthomas@hotmail.com:
    10241024simon@gmail.com:
    1talo@bluewin.ch:
    4xtrader@tpg.com.au:
    10241024simon@gmail.com:
    a-al-khaledi@hotmail.com:
    abdullah.al.hajri0001@gmail.co:
    a_cameronsse@hotmail.com:jof6IutH
    abdullah.al.hajri0001@gmail.co:
    abrahamvthomas@hotmail.com:
    abdullah.al.hajri0001@gmail.co:
    aaaerealty@yahoo.com:
    abrahamvthomas@hotmail.com:
    abdullah.al.hajri0001@gmail.co:
    aaaerealty@yahoo.com:
    abrahamvthomas@hotmail.com:
    abrarahmed325@yahoo.com:
    4xtrader@tpg.com.au:
    abrarahmed325@yahoo.com:
    4xtrader@tpg.com.au:
    abrahamvthomas@hotmail.com:
    4xtrader@tpg.com.au:
    abrarahmed325@yahoo.com:
    ac5.thomas@btinternet.com:
    AccountingQB@brilloco.com:
    abrahamvthomas@hotmail.com:
    4xtrader@tpg.com.au:
    abrahamvthomas@hotmail.com:
    abrarahmed325@yahoo.com:
    10241024simon@gmail.com:
    abrahamvthomas@hotmail.com:
    adelaideairportshuttles@gmail.:
    adgrant6180@yahoo.com.au:
    abrarahmed325@yahoo.com:
    abrahamvthomas@hotmail.com:
    ac5.thomas@btinternet.com:
    AccountingQB@brilloco.com:
    abrarahmed325@yahoo.com:
    AccountingQB@brilloco.com:
    abrahamvthomas@hotmail.com:
    AccountingQB@brilloco.com:
    abrarahmed325@yahoo.com:
    AccountingQB@brilloco.com:
    abrahamvthomas@hotmail.com:
    a.tworowski@o2.pl:sXOa61Dq
    abrarahmed325@yahoo.com:
    aaaerealty@yahoo.com:
    agarwalgaura@gmail.com:
    abrahamvthomas@hotmail.com:
    agarwalgaura@gmail.com:
    agaskill@maalnet.com:
    ageorgiev86@yandex.ru:dIYk0ONb
    abrarahmed325@yahoo.com:
    adgrant6180@yahoo.com.au:
    adelaideairportshuttles@gmail.:
    agarwalgaura@gmail.com:
    abrarahmed325@yahoo.com:
    agarwalgaura@gmail.com:
    adgrant6180@yahoo.com.au:
    abrahamvthomas@hotmail.com:
    agilbert@hixworks.com:
    adgrant6180@yahoo.com.au:
    afoto@optonline.net:
    agilbert@hixworks.com:
    afoto@optonline.net:
    agilbert@hixworks.com:
    afoto@optonline.net:
    adelaideairportshuttles@gmail.:
    abrahamvthomas@hotmail.com:
    agilbert@hixworks.com:
    abrahamvthomas@hotmail.com:
    adgrant6180@yahoo.com.au:
    adelaideairportshuttles@gmail.:
    advanced80@xtra.co.nz:
    agilbert@hixworks.com:
    agarwalgaura@gmail.com:
    abrahamvthomas@hotmail.com:
    abrarahmed325@yahoo.com:
    agilbert@hixworks.com:
    aipunts@yahoo.co.uk:pul8OBa4
    agilbert@hixworks.com:
    aaaerealty@yahoo.com:
    agilbert@hixworks.com:
    AccountingQB@brilloco.com:
    agilbert@hixworks.com:
    AccountingQB@brilloco.com:
    aipunts@yahoo.co.uk:pul8OBa4
    afoto@optonline.net:
    agilbert@hixworks.com:
    akisa5577@gmail.com:
    AccountingQB@brilloco.com:
    agilbert@hixworks.com:
    aipunts@yahoo.co.uk:pul8OBa4
    agilbert@hixworks.com:
    abrarahmed325@yahoo.com:
    agilbert@hixworks.com:
    aj0312@my.bristol.ac.uk:
    akisa5577@gmail.com:
    agilbert@hixworks.com:
    akisa5577@gmail.com:
    aipunts@yahoo.co.uk:pul8OBa4
    akisa5577@gmail.com:
    agilbert@hixworks.com:
    alagha.ahmad@gmail.com:
    alain_delongchamp@yahoo.com:
    agilbert@hixworks.com:
    alain_delongchamp@yahoo.com:
    agilbert@hixworks.com:
    akisa5577@gmail.com:
    alain_delongchamp@yahoo.com:
    akisa5577@gmail.com:
    alagha.ahmad@gmail.com:
    AccountingQB@brilloco.com:
    agilbert@hixworks.com:
    akolanupaka@gmail.com:
    alagha.ahmad@gmail.com:
    agilbert@hixworks.com:
    akolanupaka@gmail.com:
    alamrozek@interia.eu:
    alain_delongchamp@yahoo.com:
    alamrozek@interia.eu:
    alagha.ahmad@gmail.com:
    alain_delongchamp@yahoo.com:
    akisa5577@gmail.com:
    alain_delongchamp@yahoo.com:
    akisa5577@gmail.com:
    alain_delongchamp@yahoo.com:
    akisa5577@gmail.com:
    akolanupaka@gmail.com:
    alain_delongchamp@yahoo.com:
    alamrozek@interia.eu:
    alain_delongchamp@yahoo.com:
    alan.james68@icloud.com:
    Albert.Lau@eastwestbank.com:
    alain_delongchamp@yahoo.com:
    Albert.Lau@eastwestbank.com:
    alain_delongchamp@yahoo.com:
    abrarahmed325@yahoo.com:
    alain_delongchamp@yahoo.com:
    alan.james68@icloud.com:
    alamrozek@interia.eu:
    alan.james68@icloud.com:
    ajurkovic@iinet.net.au:
    Albert.Lau@eastwestbank.com:
    alan.james68@icloud.com:
    alamrozek@interia.eu:
    ageorgiev86@yandex.ru:dIYk0ONb
    alamrozek@interia.eu:
    Alemannia@gmx.com:
    alamrozek@interia.eu:
    akolanupaka@gmail.com:
    Alemannia@gmx.com:
    alert@infoplasticsurgery.com:
    alain_delongchamp@yahoo.com:
    Albert.Lau@eastwestbank.com:
    alain_delongchamp@yahoo.com:
    Albert.Lau@eastwestbank.com:
    albertrodriguez28@yahoo.com:
    Alemannia@gmx.com:
    alain_delongchamp@yahoo.com:
    albertrodriguez28@yahoo.com:
    aldis@hostnet.lv:
    alan.james68@icloud.com:
    alexrossouw196@gmail.com:
    alan.james68@icloud.com:
    alexrossouw196@gmail.com:
    Alemannia@gmx.com:
    alexrossouw196@gmail.com:
    akisa5577@gmail.com:
    Albert.Lau@eastwestbank.com:
    aldis@hostnet.lv:
    Albert.Lau@eastwestbank.com:
    alexrossouw196@gmail.com:
    aldis@hostnet.lv:
    alain_delongchamp@yahoo.com:
    alexrossouw196@gmail.com:
    alert@infoplasticsurgery.com:
    alexrossouw196@gmail.com:
    Alemannia@gmx.com:
    akisa5577@gmail.com:
    Alemannia@gmx.com:
    alexrossouw196@gmail.com:
    alert@infoplasticsurgery.com:
    akisa5577@gmail.com:
    alert@infoplasticsurgery.com:
    alektron@aol.com:
    althielman@live.com:
    altumbabicnahid@gmail.com:
    albertrodriguez28@yahoo.com:
    alexrossouw196@gmail.com:
    albertrodriguez28@yahoo.com:
    alexrossouw196@gmail.com:
    alfred.kum@gmail.com:
    alexrossouw196@gmail.com:
    alfred.kum@gmail.com:
    alert@infoplasticsurgery.com:
    alfred.kum@gmail.com:
    aman.di@hotmail.com:
    amendol1@verizon.net:
    alexrossouw196@gmail.com:
    alistair@hexcollective.co.uk:
    alfred.kum@gmail.com:
    alistair@hexcollective.co.uk:
    alfred.kum@gmail.com:
    aman.di@hotmail.com:
    abrarahmed325@yahoo.com:
    aman.di@hotmail.com:
    althielman@live.com:
    AMERAHMED19@GMAIL.COM:
    altumbabicnahid@gmail.com:
    andreas.toerpel@web.de
    alexrossouw196@gmail.com:
    andreaszerbes@gmail.com:
    ALJOAMAYA@GMAIL.COM:
    alert@infoplasticsurgery.com:
    aman.di@hotmail.com:
    altumbabicnahid@gmail.com:
    alexrossouw196@gmail.com:
    andpanagiotop@gmail.com:
    alfred.kum@gmail.com:
    andpanagiotop@gmail.com:
    alistair@hexcollective.co.uk:
    alizenel@outlook.com:
    aldis@hostnet.lv:
    althielman@live.com:
    alfred.kum@gmail.com:
    ALJOAMAYA@GMAIL.COM:
    alistair@hexcollective.co.uk:
    aman.di@hotmail.com:
    andpanagiotop@gmail.com:
    aman.di@hotmail.com:
    alan.james68@icloud.com:
    andrewdonnellyjr@aol.com:qu48OcaN
    andrzej.wencel@yahoo.com:
    alfred.kum@gmail.com:
    andrew.harnaga@hotmail.com:
    andreas.toerpel@web.de:
    alexrossouw196@gmail.com:
    andrew.harnaga@hotmail.com:
    andreaszerbes@gmail.com:
    andrew@ezestream.com.au:
    andrew.harnaga@hotmail.com:
    altumbabicnahid@gmail.com:
    andreas.toerpel@web.de:
    andrewdonnellyjr@aol.com:qu48OcaN
    andrzej.wencel@yahoo.com:
    andrew.harnaga@hotmail.com:
    anglinpaul@hotmail.com:
    andrew.chaveriat@gmail.com:
    alexrossouw196@gmail.com:
    aman.di@hotmail.com:
    andreas.toerpel@web.de:
    antydoe@gmail.com:
    anisessaid5@gmail.com:
    andrew@ezestream.com.au:
    andrew.harnaga@hotmail.com:
    andrewdonnellyjr@aol.com:qu48OcaN
    andreas.toerpel@web.de:
    antydoe@gmail.com:
    arash@42uag.com:
    arolaxinvestor@gmail.com:
    antydoe@gmail.com:
    arolaxinvestor@gmail.com:
    artallison@aol.com:
    anisessaid5@gmail.com:
    andreas.toerpel@web.de:
    anisessaid5@gmail.com:
    anglinpaul@hotmail.com:
    andrew.harnaga@hotmail.com:
    antuzla@outlook.com:
    antydoe@gmail.com:
    andpanagiotop@gmail.com:
    ascrowe@wyoming.com:
    arunasaste@gmail.com:
    ash-1989-@hotmail.com:
    andrzej.wencel@yahoo.com:
    anglinpaul@hotmail.com:
    ash-1989-@hotmail.com:
    arash@42uag.com:
    anuvu@ymail.com:
    andrew.harnaga@hotmail.com:
    antydoe@gmail.com:
    artallison@aol.com:
    andrew.harnaga@hotmail.com:
    andrewdonnellyjr@aol.com:qu48OcaN
    anglinpaul@hotmail.com:
    ash-1989-@hotmail.com:
    arunasaste@gmail.com:
    argoman@hotmail.co.uk:
    attention109@yahoo.com:
    alexrossouw196@gmail.com:
    antuzla@outlook.com:
    attention109@yahoo.com:
    andrzej.wencel@yahoo.com:
    arunasaste@gmail.com:
    arolaxinvestor@gmail.com:
    antuzla@outlook.com:
    asmoonlight@yandex.ru:
    attention109@yahoo.com:
    asmoonlight@yandex.ru:
    ash-1989-@hotmail.com:
    atinton@hotmail.com:
    avysotsky@ukr.net:
    arkadyokrezna@gmail.com:
    axel@aadaum.de:
    avysotsky@ukr.net:
    arunasaste@gmail.com:
    azyk1@yahoo.com:
    ash-1989-@hotmail.com:
    azyk1@yahoo.com:
    anglinpaul@hotmail.com:
    azyk1@yahoo.com:
    b.costin23@gmail.com:
    arunasaste@gmail.com:
    ash-1989-@hotmail.com:
    avysotsky@ukr.net:
    attention109@yahoo.com:
    avysotsky@ukr.net:
    ash-1989-@hotmail.com:
    attention109@yahoo.com:
    avysotsky@ukr.net:
    azyk1@yahoo.com:
    Badykshanov@gmail.com:
    b.costin23@gmail.com:
    Badykshanov@gmail.com:
    arunasaste@gmail.com:
    avysotsky@ukr.net:
    balsara@icloud.com:
    banking5150@gmail.com:
    antydoe@gmail.com:
    alistair@hexcollective.co.uk:
    avysotsky@ukr.net:
    arunasaste@gmail.com:
    ash-1989-@hotmail.com:
    b.costin23@gmail.com:
    ashley.brown@hushmail.com:
    Badykshanov@gmail.com:
    b.costin23@gmail.com:
    avysotsky@ukr.net:
    attention109@yahoo.com:
    banking5150@gmail.com:
    b.costin23@gmail.com:
    axel@aadaum.de:
    b.costin23@gmail.com:
    banking5151@gmail.com:
    azeezb22@gmail.com:
    artallison@aol.com:
    b.costin23@gmail.com:
    b.rowsell@bell.net:
    avysotsky@ukr.net:
    banking5150@gmail.com:
    Badykshanov@gmail.com:
    banking5150@gmail.com:
    avysotsky@ukr.net:
    Badykshanov@gmail.com:
    andrew@ezestream.com.au:
    attention109@yahoo.com:
    ash-1989-@hotmail.com:
    baratina@gmx.net:
    barnettos@yahoo.com:e38Ldp5C
    bartekkuchnik@gmail.com:
    baratina@gmx.net:
    bartir@hotmail.com:
    banking5150@gmail.com:
    b.costin23@gmail.com:
    banking5151@gmail.com:
    banksdw@slu.edu:
    azyk1@yahoo.com:
    banking5150@gmail.com:
    banking5151@gmail.com:
    barakgr@live.com:
    banksdw@slu.edu:
    arunasaste@gmail.com:
    b.rowsell@bell.net:
    banking5151@gmail.com:
    barakgr@live.com:
    avysotsky@ukr.net:
    banking5151@gmail.com:
    barakgr@live.com:
    Berganphoto@aol.com:
    bertfrigo@gmail.com:
    bengel1975@msn.com:
    bertfrigo@gmail.com:
    banksdw@slu.edu:
    bartir@hotmail.com:
    banking5151@gmail.com:
    banking5150@gmail.com:
    banksdw@slu.edu:
    bimleshkumar@live.in:
    bjh@yesyes.net:
    bartir@hotmail.com:
    banking5150@gmail.com:
    bcteo@pegasus-it.com.sg:
    BBJMcorp@aol.com:
    banking5151@gmail.com:
    BEDONEISM@HOTMAIL.COM:
    bengel1975@msn.com:
    BEDONEISM@HOTMAIL.COM:
    beamugt@yahoo.com:
    bddoliveiro@gmail.com:
    beamugt@yahoo.com:
    bartir@hotmail.com:
    BEDONEISM@HOTMAIL.COM:
    bjh@yesyes.net:
    baratina@gmx.net:
    blansford@lrshouston.com:fKBm16Pd
    barakgr@live.com:
    bcteo@pegasus-it.com.sg:
    bjh@yesyes.net:
    arunasaste@gmail.com:
    bjh@yesyes.net:
    blansford@LAMTexas.trade:
    blansford@lrshouston.com:fKBm16Pd
    bengel1975@msn.com:
    blansford@lrshouston.com:fKBm16Pd
    BEDONEISM@HOTMAIL.COM:
    bobs114@yahoo.com.au:
    bimleshkumar@live.in:
    blansford@lrshouston.com:fKBm16Pd
    bjh@yesyes.net:
    barakgr@live.com:
    bobs114@yahoo.com.au:
    bertfrigo@gmail.com:
    bengel1975@msn.com:
    bobs114@yahoo.com.au:
    blansford@lrshouston.com:fKBm16Pd
    bobsoneau@yahoo.com.au:
    bobwhite1946@yahoo.com:
    barakgr@live.com:
    blberger9@comcast.net:
    blansford@lrshouston.com:fKBm16Pd
    bohdarom@sbcglobal.net:
    bobrabcd@frontier.com:
    baratina@gmx.net:
    bobsoneau@yahoo.com.au:
    blansford@lrshouston.com:fKBm16Pd
    bobsoneau@yahoo.com.au:
    bertfrigo@gmail.com:
    bigblckdg@aol.com:
    bobwhite1946@yahoo.com:
    bleda2_ju21@hotmail.com:
    bohdarom@sbcglobal.net:
    boonwee.hong@gmail.com:
    boss_yuran@mail.ru:
    bertfrigo@gmail.com:
    boss_yuran@mail.ru:
    billsilk@ozemail.com.au:
    bobmedanovic@yahoo.com:
    bobsoneau@yahoo.com.au:
    bohetsj@gmail.com:
    bobs114@yahoo.com.au:
    banking5150@gmail.com:
    bobs114@yahoo.com.au:
    boonwee.hong@gmail.com:
    bohdarom@sbcglobal.net:
    boss_yuran@mail.ru:
    boothmark71@hotmail.com:bFVi84Kx
    boss_yuran@mail.ru:
    bobs114@yahoo.com.au:
    blansford@lrshouston.com:fKBm16Pd
    boss_yuran@mail.ru:
    botha.qatar@yahoo.com:
    bobwhite1946@yahoo.com:
    botha.qatar@yahoo.com:
    blansford@lrshouston.com:fKBm16Pd
    boothmark71@hotmail.com:bFVi84Kx
    boss_yuran@mail.ru:
    bobsoneau@yahoo.com.au:
    boss_yuran@mail.ru:
    bobwhite1946@yahoo.com:
    boss_yuran@mail.ru:
    botha.qatar@yahoo.com:
    bowwybowwy@gmail.com:
    brooksforex1529@yahoo.com:
    bru.nico@alice.it:
    boss_yuran@mail.ru:
    bru.nico@alice.it:
    brumbypat@hotmail.com:
    bohdarom@sbcglobal.net:
    bjh@yesyes.net:
    boss_yuran@mail.ru:
    camillopoland@gmail.com:
    barakgr@live.com:
    boss_yuran@mail.ru:
    brchio@hotmail.com:
    braykm01@yahoo.com:
    bru.nico@alice.it:
    brchio@hotmail.com:
    brooksforex1529@yahoo.com:
    bsrsolutions10@gmail.com:
    carlo.paniccia@hotmail.com:
    carlplunkett@hotmail.com:
    bobwhite1946@yahoo.com:
    brianchatting@yahoo.co.uk:
    carlplunkett@hotmail.com:
    brchio@hotmail.com:
    botha.qatar@yahoo.com:
    carlplunkett@hotmail.com:
    boonwee.hong@gmail.com:
    bowwybowwy@gmail.com:
    boonwee.hong@gmail.com:
    bobs114@yahoo.com.au:
    cagoldman2005@yahoo.com:
    boss_yuran@mail.ru:
    beamugt@yahoo.com:
    botha.qatar@yahoo.com:
    carlplunkett@hotmail.com:
    botha.qatar@yahoo.com:
    cary.northup@gmail.com:
    carlplunkett@hotmail.com:
    cary.northup@gmail.com:
    carlplunkett@hotmail.com:
    carlcrabill@yahoo.com:
    carlplunkett@hotmail.com:
    camillopoland@gmail.com:
    bowwybowwy@gmail.com:
    carlplunkett@hotmail.com:
    ccollins@semo.net:yd72XkjW
    carlplunkett@hotmail.com:
    booner2k@gmail.com:
    casstlem@yahoo.com.au:
    camillopoland@gmail.com:
    cary.northup@gmail.com:
    carlplunkett@hotmail.com:
    cary.northup@gmail.com:
    ccollins@semo.net:yd72XkjW
    cary.northup@gmail.com:
    ccollins@semo.net:yd72XkjW
    cary.northup@gmail.com:
    ccollins@semo.net:yd72XkjW
    cdb07d@gmail.com:
    botha.qatar@yahoo.com:
    cbenjamin@cisolaw.com:
    casstlem@yahoo.com.au:
    botha.qatar@yahoo.com:
    bobs114@yahoo.com.au:
    bsrsolutions10@gmail.com:
    ccollins@semo.net:yd72XkjW
    bstarling@gmx.com:
    cary.northup@gmail.com:
    ccollins@semo.net:yd72XkjW
    cary.northup@gmail.com:
    ccollins@semo.net:yd72XkjW
    casstlem@yahoo.com.au:
    ccollins@semo.net:yd72XkjW
    botha.qatar@yahoo.com:
    cary.northup@gmail.com:
    cdb07d@gmail.com:
    bsrsolutions10@gmail.com:
    cdb07d@gmail.com:
    cdudek60@gmail.com:
    ccollins@semo.net:yd72XkjW
    cdudek60@gmail.com:
    ccollins@semo.net:yd72XkjW
    cdudek60@gmail.com:
    cdb07d@gmail.com:
    boss_yuran@mail.ru:
    cdb07d@gmail.com:
    cdudek60@gmail.com:
    cdb07d@gmail.com:
    cdudek60@gmail.com:
    ccollins@semo.net:yd72XkjW
    cemedia@aol.com:
    cdudek60@gmail.com:
    cdb07d@gmail.com:
    cgsinvest@aol.com: 
    huynhngoccuong@gmail.com:
    idrzewicz@icloud.com:w0Re72Ht
    info@simmtec.com:
    ia_sho@abv.bg:
    idrzewicz@icloud.com:w0Re72Ht
    Hassamqazi7@gmail.com:
    ihssass@hotmail.com:
    idrzewicz@icloud.com:w0Re72Ht
    haleelg@gmail.com:
    gratica@att.net:gKb4EQp1
    george@georgeharrison1.com:cgw3AMl8
    hasco@personainternet.com:
    idrzewicz@icloud.com:w0Re72Ht
    Hassamqazi7@gmail.com:
    ihssass@hotmail.com:
    idrzewicz@icloud.com:w0Re72Ht```


  • I used the line removal method as you instructed, but it only shortened the file. It seems to delete only duplicate lines that are next to each other; a duplicate that is not adjacent to its copy is not removed. The other suggestion only helps to remove the lines that contain just “:”. What I need is to delete the lines that contain only “:” and keep the lines that have characters before and after the “:” sign. I cannot use Replace in this case, because it would change all the other lines, and I absolutely do not want that.



  • My suggestion for removing duplicate lines does NOT work because your real data is different from the examples. In the examples the duplicates are next to each other; in your real data they are not. If you sort your file lexicographically first, my suggestion will work.
    I’m not at my PC, so I cannot confirm where the sort function is, but I think it’s in the same menu area as the remove duplicate lines option.

    I’m sure we can have you sorted with the remaining problems very quickly. Can you answer, what do you want to do with the lines that have characters after the :, that was your #3 question.

    Terry
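
The sort-then-remove-adjacent-duplicates idea from the post above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `sort_and_dedupe` is a made-up helper name, the sample lines are invented, and Python's default string ordering (by code point) may not match the order produced by the editor's own sort command.

```python
def sort_and_dedupe(lines):
    """Sort the lines, then drop adjacent duplicates.

    After sorting, identical lines sit next to each other, so comparing
    each line with the previously kept one is enough. The original line
    order is lost.
    """
    result = []
    for line in sorted(lines):
        if not result or result[-1] != line:
            result.append(line)
    return result


# Hypothetical sample data with non-adjacent duplicates.
sample = [
    "b@example.com:",
    "a@example.com:",
    "b@example.com:",
    "a@example.com:x1",
    "a@example.com:",
]
unique_sorted = sort_and_dedupe(sample)
```

The trade-off is that the file ends up sorted; if the original order matters, a different approach is needed.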



  • @Terry-R In my text file, there are some lines that contain only “:”. How do I eliminate them? I cannot use the Replace method with

    ^: \ R
    

    because it replaces the “:” in the other lines. You don’t seem to understand what I mean by this.
    As for question 3: how do I select only the lines that have characters after the “:”? For example:

    a_cameronsse@hotmail.com: jof6IutH
    abradbery@gmail.com: q74Xpc0O
    a.tworowski@o2.pl: sXOa61Dq
    ageorgiev86@yandex.ru: dIYk0ONb
    aipunts@yahoo.co.uk: pul8OBa4
    

    I don’t know how to make you understand what I mean. Sorry for my limited English



  • Hello, @sarah-duong, @Terry-r and All,

    First of all, thanks for posting a real example of your text. However, I noticed that the very last line is :

    idrzewicz@icloud.com:w0Re72Ht```
    

    And I suppose that the correct line is rather :

    idrzewicz@icloud.com:w0Re72Ht
    

    • Regarding your question 2, to delete lines containing only a colon character, Terry’s regex S/R is NOT:
    SEARCH   ^: \ R
    
    REPLACE  Leave EMPTY
    

    but rather:

    SEARCH   ^:\R
    
    REPLACE  Leave EMPTY
    

    With your example, clicking on the Replace All button deletes the 6 lines that contain only a : character!


    • Regarding your question 1, to delete duplicate lines, you could use the following regex S/R:

    SEARCH (?-s)^(.+\R)(?=(?s).+?^\1)

    REPLACE Leave EMPTY
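
As a rough cross-check outside Notepad++, the effect of this S/R (delete every line that occurs again further down, so that the last copy of each line is the one kept) can be sketched in Python. This is an assumption-laden illustration rather than a translation of the regex: `dedupe_keep_last` is a hypothetical helper, the sample lines are invented, and it compares whole lines after they have been split, so it does not reproduce edge cases such as a final line with no line break.

```python
def dedupe_keep_last(lines):
    """Remove every line that appears again later; keep the last copy.

    Walking the list backwards means the first time a line is seen is
    its last occurrence in the original order. A set gives constant-time
    membership checks, so the work grows linearly with the line count.
    """
    seen = set()
    kept_reversed = []
    for line in reversed(lines):
        if line not in seen:
            seen.add(line)
            kept_reversed.append(line)
    return kept_reversed[::-1]


# Hypothetical sample data with scattered duplicates.
sample = ["a:", "b:", "a:", "c:x", "b:"]
result = dedupe_keep_last(sample)
```

Unlike sorting first, this keeps the surviving lines in their original relative order.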

    Again, after clicking on the Replace All button, from your initial text of 623 mail addresses, we get, at once, the expected text of 152 mail addresses, all different !

    127victor@cox.net:
    a_nizam2032@yahoo.com:
    2emajnllc@gmail.com:
    aaron.r.cameron@gmail.com:
    abradbery@gmail.com:q74Xpc0O
    1talo@bluewin.ch:
    a-al-khaledi@hotmail.com:
    a_cameronsse@hotmail.com:jof6IutH
    abdullah.al.hajri0001@gmail.co:
    4xtrader@tpg.com.au:
    10241024simon@gmail.com:
    ac5.thomas@btinternet.com:
    a.tworowski@o2.pl:sXOa61Dq
    agaskill@maalnet.com:
    adgrant6180@yahoo.com.au:
    adelaideairportshuttles@gmail.:
    advanced80@xtra.co.nz:
    agarwalgaura@gmail.com:
    abrahamvthomas@hotmail.com:
    aaaerealty@yahoo.com:
    afoto@optonline.net:
    aj0312@my.bristol.ac.uk:
    aipunts@yahoo.co.uk:pul8OBa4
    AccountingQB@brilloco.com:
    agilbert@hixworks.com:
    alagha.ahmad@gmail.com:
    ajurkovic@iinet.net.au:
    ageorgiev86@yandex.ru:dIYk0ONb
    alamrozek@interia.eu:
    akolanupaka@gmail.com:
    Albert.Lau@eastwestbank.com:
    alain_delongchamp@yahoo.com:
    Alemannia@gmx.com:
    akisa5577@gmail.com:
    alektron@aol.com:
    albertrodriguez28@yahoo.com:
    amendol1@verizon.net:
    abrarahmed325@yahoo.com:
    AMERAHMED19@GMAIL.COM:
    andreas.toerpel@web.de
    alert@infoplasticsurgery.com:
    alizenel@outlook.com:
    aldis@hostnet.lv:
    althielman@live.com:
    ALJOAMAYA@GMAIL.COM:
    alan.james68@icloud.com:
    alfred.kum@gmail.com:
    andreaszerbes@gmail.com:
    altumbabicnahid@gmail.com:
    andrew.chaveriat@gmail.com:
    aman.di@hotmail.com:
    andreas.toerpel@web.de:
    anisessaid5@gmail.com:
    andpanagiotop@gmail.com:
    ascrowe@wyoming.com:
    arash@42uag.com:
    anuvu@ymail.com:
    andrew.harnaga@hotmail.com:
    andrewdonnellyjr@aol.com:qu48OcaN
    argoman@hotmail.co.uk:
    alexrossouw196@gmail.com:
    andrzej.wencel@yahoo.com:
    arolaxinvestor@gmail.com:
    antuzla@outlook.com:
    asmoonlight@yandex.ru:
    atinton@hotmail.com:
    arkadyokrezna@gmail.com:
    anglinpaul@hotmail.com:
    balsara@icloud.com:
    antydoe@gmail.com:
    alistair@hexcollective.co.uk:
    ashley.brown@hushmail.com:
    axel@aadaum.de:
    azeezb22@gmail.com:
    artallison@aol.com:
    Badykshanov@gmail.com:
    andrew@ezestream.com.au:
    attention109@yahoo.com:
    ash-1989-@hotmail.com:
    barnettos@yahoo.com:e38Ldp5C
    bartekkuchnik@gmail.com:
    b.costin23@gmail.com:
    azyk1@yahoo.com:
    b.rowsell@bell.net:
    avysotsky@ukr.net:
    Berganphoto@aol.com:
    banksdw@slu.edu:
    BBJMcorp@aol.com:
    banking5151@gmail.com:
    bddoliveiro@gmail.com:
    bartir@hotmail.com:
    bcteo@pegasus-it.com.sg:
    arunasaste@gmail.com:
    blansford@LAMTexas.trade:
    BEDONEISM@HOTMAIL.COM:
    bimleshkumar@live.in:
    bengel1975@msn.com:
    blberger9@comcast.net:
    bobrabcd@frontier.com:
    baratina@gmx.net:
    bigblckdg@aol.com:
    bleda2_ju21@hotmail.com:
    bertfrigo@gmail.com:
    billsilk@ozemail.com.au:
    bobmedanovic@yahoo.com:
    bohetsj@gmail.com:
    banking5150@gmail.com:
    blansford@lrshouston.com:fKBm16Pd
    boothmark71@hotmail.com:bFVi84Kx
    bobsoneau@yahoo.com.au:
    brumbypat@hotmail.com:
    bohdarom@sbcglobal.net:
    bjh@yesyes.net:
    barakgr@live.com:
    braykm01@yahoo.com:
    bru.nico@alice.it:
    brooksforex1529@yahoo.com:
    carlo.paniccia@hotmail.com:
    bobwhite1946@yahoo.com:
    brianchatting@yahoo.co.uk:
    brchio@hotmail.com:
    boonwee.hong@gmail.com:
    cagoldman2005@yahoo.com:
    beamugt@yahoo.com:
    carlcrabill@yahoo.com:
    bowwybowwy@gmail.com:
    booner2k@gmail.com:
    camillopoland@gmail.com:
    carlplunkett@hotmail.com:
    cbenjamin@cisolaw.com:
    bobs114@yahoo.com.au:
    bstarling@gmx.com:
    casstlem@yahoo.com.au:
    botha.qatar@yahoo.com:
    cary.northup@gmail.com:
    bsrsolutions10@gmail.com:
    boss_yuran@mail.ru:
    ccollins@semo.net:yd72XkjW
    cemedia@aol.com:
    cdudek60@gmail.com:
    cdb07d@gmail.com:
    cgsinvest@aol.com: 
    huynhngoccuong@gmail.com:
    info@simmtec.com:
    ia_sho@abv.bg:
    haleelg@gmail.com:
    gratica@att.net:gKb4EQp1
    george@georgeharrison1.com:cgw3AMl8
    hasco@personainternet.com:
    Hassamqazi7@gmail.com:
    ihssass@hotmail.com:
    idrzewicz@icloud.com:w0Re72Ht
    

    • Regarding your question 3, Terry’s request seems justified:

    Can you answer, what do you want to do with the lines that have characters after the :, that was your #3 question.

    Indeed, you said :

    3/ How to choose the lines that have characters after “:”?

    But, once your lines are “chosen”, what next ?!


    Now, if you want to easily point out these specific lines, you could use the Mark feature:

    • Click on the Search > Mark... menu option

    • SEARCH (?-s):.+

    • Tick the Bookmark line, Purge for each search and Wrap around options

    • Of course, select the Regular expression search mode

    • Click on the Mark All button

    => The lines, containing text after the : char, are bookmarked with a blue circle, and the text matched is highlighted in red !

    • Then, some operations are possible on these bookmarked lines. Just select the sub-menu Search > Bookmark

    For instance, using the Copy Bookmark Lines option, then a paste operation, here is the 14-line list, from the modified text, without duplicate lines :

    abradbery@gmail.com:q74Xpc0O
    a_cameronsse@hotmail.com:jof6IutH
    a.tworowski@o2.pl:sXOa61Dq
    aipunts@yahoo.co.uk:pul8OBa4
    ageorgiev86@yandex.ru:dIYk0ONb
    andrewdonnellyjr@aol.com:qu48OcaN
    barnettos@yahoo.com:e38Ldp5C
    blansford@lrshouston.com:fKBm16Pd
    boothmark71@hotmail.com:bFVi84Kx
    ccollins@semo.net:yd72XkjW
    cgsinvest@aol.com: 
    gratica@att.net:gKb4EQp1
    george@georgeharrison1.com:cgw3AMl8
    idrzewicz@icloud.com:w0Re72Ht
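
For anyone who prefers to make the same selection outside Notepad++, here is a minimal Python sketch. It is only an illustration, not part of the Notepad++ procedure above: the function name `lines_with_text_after_colon` and the three sample lines are assumptions. Python's `.` already excludes line breaks by default, so the `(?-s)` prefix is not needed there.

```python
import re

# Python counterpart of the Notepad++ search (?-s):.+
# In Python, "." does not match a line break unless re.DOTALL is set.
HAS_TEXT_AFTER_COLON = re.compile(r":.+")


def lines_with_text_after_colon(text):
    """Return the lines that contain at least one character after a colon."""
    return [line for line in text.splitlines() if HAS_TEXT_AFTER_COLON.search(line)]


# Hypothetical three-line sample; the last line ends with a single space.
sample = "cemedia@aol.com:\nccollins@semo.net:yd72XkjW\ncgsinvest@aol.com: \n"
print(lines_with_text_after_colon(sample))
```

The line `cgsinvest@aol.com: ` is kept because the trailing space counts as a character after the colon, which is also why it shows up in the bookmarked list.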
    

    Best Regards,

    guy038

    P.S. :

    Once we are sure that your goals are achieved, we can give you some explanations on the regular expressions used ;-))



  • @guy038 said in Type of duplicate lines:

    Regarding your question 2, to delete line with an unique colon char, the Terry’s regex S/R is NOT :

    @Sarah-Duong I think I see a problem you are having with the regexes. Google Translate is inserting spaces around any characters it does not recognise as words. To check, I took the original regexes from your first post, translated them from English to Italian, and this is what I got:
    25fbef85-3da4-4ed5-b692-5db7e7472317-image.png

    We can see that the original does NOT contain spaces, but the translation does. So you need to be careful when copying the regexes back to your language. By all means copy and translate our posts so you can read our words to you. BUT do not do the same for the regexes. Copy those and paste them directly into NPP!
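
To show why the extra spaces matter, here is a small Python sketch. The `translated` pattern is a hypothetical example of what a translator might produce, not one taken from the screenshot. In Python, as in Notepad++, a space inside a regex is a literal character unless free-spacing mode `(?x)` is switched on.

```python
import re

line = "ccollins@semo.net:yd72XkjW"

original = r":.+"      # the regex as it should be typed
translated = r": . +"  # hypothetical result after a translator inserted spaces

# The original finds the text after the colon.
print(bool(re.search(original, line)))
# The spaced version now demands literal spaces that the line does not contain.
print(bool(re.search(translated, line)))
```

The first search succeeds and the second fails, although the two patterns differ only by spaces.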

    @guy038 I like your regex to remove duplicates, I had considered that but as I’ve said before I hate forward lookups due to the issue of it possibly failing completely. As a test I copied the examples (600+ lines) and made lots of copies in the same file. I got up to just shy of 200K lines and still the regex worked, I gave up trying to determine the limit at that point. Perhaps I’m being a bit harsh on that function!

    Cheers
    Terry



  • Hello, @sarah-duong, @Terry-r and All,

    Terry, I would like to emphasize, in this post, the importance of choosing the right type of quantifier ( greedy, lazy or atomic ) in a regular expression !

    • From the initial text of @sarah-duong, above, which contains 623 lines, I duplicated it 325 times and I added a final line-break, at the very end of this test file. So, I obtained a file of 202,475 lines, for 5,016,375 bytes

    • Then, applying the regex S/R, with the lazy quantifier +? :

      • SEARCH (?-s)^(.+\R)(?=(?s).+?^\1)

      • REPLACE Leave EMPTY

    against this large text, I did get, after clicking on the Replace All button and 202,323 replacements ( in fact, suppressions ! ), in 3m 57s, on my old XP laptop, the very short expected text, 3,711 bytes long, containing the 152 lines, all different !

    • Then, applying the regex S/R, where I changed the lazy quantifier +?, in the look-ahead, with the usual greedy quantifier + :

      • SEARCH (?-s)^(.+\R)(?=(?s).+^\1)

      • REPLACE Leave EMPTY

    against this same text, even after about 1 hour of processing, no result occurred, although Notepad++ did not seem to be stuck !?


    So, I decided to run this regex S/R again, at about 10h45, expecting a correct result after some hours, when I was back home ! Luckily, the process did stop in the evening and had correctly deleted 202,323 lines, giving the expected final file of 3,711 bytes and 152 lines ;-))
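
For readers who want to try both variants outside Notepad++, here is a rough Python translation. It is a sketch under stated assumptions, not the Notepad++ syntax itself: Python's `re` module has no `\R`, so `\n` is used instead, and the `(?-s)` / `(?s)` switches are replaced by a scoped `(?s:...)` group plus `re.MULTILINE`. The function name and the sample text are made up for the example.

```python
import re

# Assumed translations of the two Notepad++ searches:
#   \R            -> \n  (the sample below uses "\n" line breaks only)
#   (?-s)...(?s)  -> a scoped (?s:...) group; re.MULTILINE makes ^ match at line starts
LAZY = re.compile(r"^(.+\n)(?=(?s:.+?)^\1)", re.MULTILINE)
GREEDY = re.compile(r"^(.+\n)(?=(?s:.+)^\1)", re.MULTILINE)


def drop_earlier_duplicates(text, pattern=LAZY):
    """Delete every line for which the lookahead finds an identical line further down."""
    return pattern.sub("", text)


# Hypothetical sample: lines 1 and 2 are repeated later on.
sample = "a@x.com:\nb@x.com:pw1\na@x.com:\nc@x.com:\nb@x.com:pw1\n"

print(drop_earlier_duplicates(sample, LAZY))
print(drop_earlier_duplicates(sample, LAZY) == drop_earlier_duplicates(sample, GREEDY))
```

Both variants delete the same lines and keep the last copy of each; only the way the lookahead searches differs. One caveat of this Python translation: because `.+?` must consume at least one character, a line whose only later copy is the very next line is not matched.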

    In order to know the exact time, used to execute these 202,323 replacements, by Notepad++, I simply opened the Process-Hacker v2.39.124 utility, double-clicked on the Notepad++ process to get its properties and, then, clicked, in the Threads tab, on the one with start address = notepad++0x12ab7b. See, below :

    a19e86ce-65cf-4fa9-ab2c-7c47da142597-image.png

    => The sum of the Kernel and User times, minus 2s about, due to N++ startup, indicates the time of the S/R : 6h 17m 52s !! Compare with the previous time of 3m 57s, as shown below ;-))

    129b0555-a3e7-45e9-869a-b881ece7fee7-image.png


    So, guys, could you repeat these two regex S/R to find out, even approximately, with your configuration and OS, the time needed to process this UTF-8 test file, containing 325 times the initial text of @sarah-duong ? So, a total of 202,475 lines and 5,016,375 bytes ! I’m quite curious about the results ;-))
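
One possible way to build such a test file is sketched below in Python. The function name `build_test_file` and the file names are hypothetical, and the demonstration uses a 3-line stand-in rather than the real 623-line text. The sketch works on bytes so that the existing line endings are left untouched, and it makes sure every copy ends with a line break.

```python
import tempfile
from pathlib import Path


def build_test_file(source, target, copies=325):
    """Write `copies` repetitions of `source` to `target` and return the line count."""
    block = Path(source).read_bytes()
    if not block.endswith(b"\n"):
        block += b"\n"  # each copy must end with a line break
    data = block * copies
    Path(target).write_bytes(data)
    return data.count(b"\n")


# Tiny demonstration with a 3-line stand-in for the real text.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp, "sample.txt")    # hypothetical file name
    dst = Path(tmp, "test_325.txt")  # hypothetical file name
    src.write_bytes(b"a@x.com:\nb@x.com:pw1\nc@x.com:")
    print(build_test_file(src, dst))  # 3 lines x 325 copies
```

With the real 623-line text, the same call would give 623 x 325 = 202,475 lines, the figure quoted above.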

    Best Regards,

    guy038



  • @guy038 said in Type of duplicate lines:

    So, guys, could you repeat these two regex S/R

    The PC config I tested on is Windows 10 64bit version 1607 (2016 LTSB) with an Intel i5-8600 and 8GB RAM.

    I used NPP 7.8.5 64bit.

    As requested I ran both your “lazy” and the “greedy” regexes. I had the exact same test file as you (same lines and byte size). The lazy S/R produced a time of 1m 23s. The greedy S/R produced the same result with a time of 1hr 54m (accurate to a minute only).

    Given your “old XP” system produced the 237s time for lazy and mine 83s, that’s a ratio of 0.35. If I do the same with the greedy S/R, given your time of 6h 17m 52s mine should be around 2hr 12m. My actual time of 1hr 54m is not too dissimilar from that. So perhaps we can consider the speed of the regexes is “mostly” independent of OS version or even possibly CPU type, apart, possibly, from small efficiencies in newer OS or CPU builds.
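
The arithmetic behind that estimate can be checked with a few lines of Python. This only replays the four timings quoted in this thread; the helper `to_seconds` and the variable names are made up for the example.

```python
def to_seconds(h=0, m=0, s=0):
    """Convert hours, minutes and seconds to a number of seconds."""
    return h * 3600 + m * 60 + s


# Timings quoted in the thread
lazy_xp, lazy_i5 = to_seconds(m=3, s=57), to_seconds(m=1, s=23)
greedy_xp, greedy_i5 = to_seconds(h=6, m=17, s=52), to_seconds(h=1, m=54)

ratio = lazy_i5 / lazy_xp      # 83 / 237
predicted = greedy_xp * ratio  # greedy time expected on the i5, in seconds

print(round(ratio, 2))                   # 0.35
print(divmod(round(predicted), 60))      # (132, 20), i.e. about 2hr 12m
print(round(greedy_i5 / greedy_xp, 2))   # 0.3, the ratio actually observed
```

So the greedy run on the i5 came out a little faster than the lazy ratio predicts (0.30 against 0.35), but in the same range.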

    I suppose the testing phase in the lookaheads is what takes the time. For the greedy regex, read all characters until end of file, test then drop 1 character, repeat until a solution found. Whereas the lazy regex just grabs 1 character then tests, continue until a solution found.
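
The description above can be turned into a toy cost model in Python. This is a deliberately simplified sketch of that description, not a measurement of what the regex engine inside Notepad++ really does: the function `lookahead_cost`, its way of counting and the 3-line sample block are all assumptions made for illustration.

```python
def lookahead_cost(text, start, line):
    """Return (lazy_cost, greedy_cost), counted in characters taken or given back.

    `start` is the offset just after the captured line and `line` includes its
    line break. A cost of None means that no later copy exists.
    """
    n = len(text)

    def copy_starts_at(pos):
        # A copy must begin at a line start, i.e. right after a line break.
        return text[pos - 1] == "\n" and text.startswith(line, pos)

    # Lazy: take one more character, test, repeat until a copy is found.
    lazy = next(
        (pos - start for pos in range(start + 1, n) if copy_starts_at(pos)),
        None,
    )

    # Greedy: take everything up to the end (n - start characters),
    # then give back one character at a time until a copy is found.
    greedy = next(
        ((n - start) + (n - pos) for pos in range(n - 1, start, -1) if copy_starts_at(pos)),
        None,
    )
    return lazy, greedy


block = "a@x.com:\nb@x.com:\nc@x.com:\n"  # hypothetical 3-line block, 27 characters
first_line = "a@x.com:\n"
for copies in (4, 40, 400):
    text = block * copies
    print(copies, lookahead_cost(text, len(first_line), first_line))
```

In this model the lazy cost for the first line stays at 18 characters however many copies are appended, while the greedy cost grows with the size of the file (126, then 1098, then 10818 characters). That is the same trend as in the two sets of timings reported above.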

    As this test file has “at least” 325 copies of each and every “original” line the test is possibly not a very accurate one. Indeed the lookahead is “guaranteed” of finding a match no more than 623 lines ahead. In this situation the lookahead is never going to “fail” as we have seen in bigger files with a sparse number of duplicates.

    Terry



  • @Terry-R said in Type of duplicate lines:

    So perhaps we can consider the speed of the regexes is “mostly” independent of OS version or even possibly CPU type

    I should perhaps elaborate on this statement. I realise that the mere fact that my test times were only about 1/3rd of those of @guy038 means there are efficiencies in OS and CPU, but I contend that they are mostly to do with the GHz speed of the CPU, rather than microcode efficiencies.

    An “old XP” system would likely have a dual core (maybe a quad core) CPU with speed in the low GHz range. The i5-8600 has 6 cores with a speed of 3.1GHz.

    Could we suggest that the results have more to do with the number of cores and GHz speed, than efficiencies in microcode or the actual CPU design?

    Or is the question irrelevant as the answer to everything is “more speed/horsepower”!

    Terry

