Remove duplicate strings with comma separator



  • Hi all,
    I would like to find and remove duplicate records of some archives I have in a text file.
    The text file looks like this:
    AA,
    OA,PA,PC,TA,TB,TC,TG,
    OA,PA,PB,PC,RK,TA,TB,TC,TG,X0,X1,
    AA,ED,OA,RK,PA,PB,PT,PC,TA,TB,TC,TG,
    AA,OA,RK,PA,PB,PC,TA,TB,TC,TG,X0,X1,
    AA,ZD,
    AA,
    AA,
    HA,
    AA,HA,
    HA,
    RA,RB,RD,RE,RF,RG,RH,RI,Y1,Y2,Y3,
    RA,RB,RD,RE,RF,RG,RH,RI,RL,RK,X0,X1,Y1,Y2,Y3,
    AA,RA,RB,RD,RE,RF,RG,RH,RI,RK,Y1,Y2,Y3,YA,
    RA,RB,RD,RE,RF,RG,RH,RI,RK,Y1,Y2,Y3,
    CA,CB,EA,EB,EC,ED,PB,VA,
    CA,CB,AA,K1,EA,EB,EC,ED,VA,X0,X1,
    AA,CA,CB,RK,EA,EB,EC,ED,PB,VA,
    AA,CA,CB,RK,EA,EB,EC,ED,VA,X5,X6,
    FA,FB,
    K1,CA,RA,
    AA,FA,FB,K1,CA,RA,
    FA,FB,

    I haven't found a solution in the existing posts, so I'm starting a new one.
    Maybe someone can help.

    Preferably, I would like to remove the duplicate archives (AA, for example) and keep only the first or the last occurrence.
    It would also be nice to delete the commas and place each remaining record on its own line,
    like this:
    AA
    FA
    FB
    The last two are only luxury wishes, so they are not very important.
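    For anyone who can run a script outside Notepad++, the requested result (each archive code once, commas removed, one code per line, keeping the first or the last occurrence) can be sketched in Python. This is a minimal sketch, not a Notepad++ feature; the function name `unique_codes` and its `keep` parameter are hypothetical, and it assumes codes are separated only by commas and line breaks.

```python
def unique_codes(text: str, keep: str = "first") -> list[str]:
    # Hypothetical helper: assumes codes are separated only by commas and line breaks.
    # Split every line on commas and drop the empty pieces left by trailing commas.
    codes = [c.strip() for line in text.splitlines() for c in line.split(",")]
    codes = [c for c in codes if c]
    if keep == "last":
        # Reverse, keep the first of each code, then reverse back:
        # the net effect is keeping each code's last occurrence.
        return list(reversed(list(dict.fromkeys(reversed(codes)))))
    # dict.fromkeys preserves insertion order, so the first occurrence wins.
    return list(dict.fromkeys(codes))


sample = "AA,\nOA,PA,PC,\nAA,ZD,\nHA,\nAA,HA,\n"
print("\n".join(unique_codes(sample)))
```

    With this sample, the script prints AA, OA, PA, PC, ZD and HA, each on its own line; `keep="last"` orders the codes by their last appearance instead.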

    Thx in advance



  • @Saki-Soulimenas

    There are many posts in this community about removing duplicate lines from unsorted files. Here’s one… see my post that starts out “I’m glad you have a solution…”. A Regular Expression replacement using the Find what expression found there, with an EMPTY Replace with box, should do what you want.
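    The regular expression from the linked post is not reproduced here. As a rough illustration of the same idea (removing repeated lines from an unsorted file while keeping the first occurrence), here is a Python sketch; `dedupe_lines` is a hypothetical name, and this is a swapped-in technique, not the Notepad++ regex.

```python
def dedupe_lines(text: str) -> str:
    # Hypothetical helper: keeps the first occurrence of each exact line, in original order.
    seen = set()
    kept = []
    for line in text.splitlines():
        if line not in seen:  # first time this exact line appears
            seen.add(line)
            kept.append(line)
    return "\n".join(kept)


print(dedupe_lines("AA,\nHA,\nAA,\nAA,HA,\nHA,"))
```

    Only whole identical lines count as duplicates, so `AA,HA,` survives even though both of its codes already appeared on earlier lines.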

    For the second part (your luxury wish), you can turn this:

    K1,CA,RA,
    

    into

    K1
    CA
    RA
    

    with this replace operation:

    Find what zone: ,
    Replace with zone: \r\n
    Search mode: Regular expression
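    The effect of that replacement can be checked with a small Python sketch. The helper name `commas_to_newlines` and its `newline` parameter are hypothetical; the default CRLF mirrors the `\r\n` in the Replace with box.

```python
def commas_to_newlines(text: str, newline: str = "\r\n") -> str:
    # Hypothetical helper: every comma becomes a line break, like Find "," / Replace "\r\n".
    # A trailing comma also becomes a line break, which leaves an empty line
    # before the line ending that was already in the file.
    return text.replace(",", newline)


print(repr(commas_to_newlines("K1,CA,RA,\nFA,FB,\n", newline="\n")))
```

    The printed string is `'K1\nCA\nRA\n\nFA\nFB\n\n'`: each trailing comma produces a blank line, which may need to be removed afterwards.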



  • Hi Scott,
    thanks very much for your answer.
    I did find a solution like yours for removing duplicate lines.
    But as I said, I would like to remove duplicate records of my archives.

    I'll point out the duplicates of one archive (AA) from my example:
    AA,
    AA,
    HA,
    AA,HA,
    HA,
    RA,RB,RD,RE,RF,RG,RH,RI,Y1,Y2,Y3,
    RA,RB,RD,RE,RF,RG,RH,RI,RL,RK,X0,X1,Y1,Y2,Y3,
    AA,RA,RB,RD,RE,RF,RG,RH,RI,RK,Y1,Y2,Y3,YA,
    RA,RB,RD,RE,RF,RG,RH,RI,RK,Y1,Y2,Y3,
    CA,CB,EA,EB,EC,ED,PB,VA,
    CA,CB,AA,K1,EA,EB,EC,ED,VA,X0,X1,
    AA,CA,CB,RK,EA,EB,EC,ED,PB,VA,
    AA,CA,CB,RK,EA,EB,EC,ED,VA,X5,X6,
    FA,FB,
    K1,CA,RA,
    AA,FA,FB,K1,CA,RA,
    FA,FB,

    As you can see, the lines where the archive AA appears are not all the same.
    Some of the duplicates are not even at the start of a line.

    So I would need an expression that finds duplicate codes of exactly 2 characters, ignoring the commas.
    That makes it a bit more difficult than the other problems.
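    Finding codes that occur more than once, wherever they sit on a line, can be sketched in Python with a regular expression plus a counter. This assumes every archive code is exactly two uppercase letters or digits (codes such as X0, Y1 and K1 contain digits); `duplicate_codes` is a hypothetical name, and this is not a Notepad++ search expression.

```python
import re
from collections import Counter


def duplicate_codes(text: str) -> dict[str, int]:
    # Hypothetical helper: assumes each code is exactly two uppercase letters or digits.
    # \b anchors the match at word boundaries, so the commas are ignored.
    codes = re.findall(r"\b[A-Z0-9]{2}\b", text)
    counts = Counter(codes)
    # Keep only codes seen more than once, in order of first appearance.
    return {code: n for code, n in counts.items() if n > 1}


print(duplicate_codes("AA,\nAA,\nHA,\nAA,HA,\nHA,\nFA,FB,\nAA,FA,FB,\n"))
```

    For this excerpt, the script reports AA four times, HA three times, and FA and FB twice each, regardless of where each code appears on its line.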

    Thank you also for your other tip.



  • OK, problem solved… I needed to wake up first. :)
    I’ll use your comma replacer first, and then I can remove the duplicate lines.

    Perfect! Thank you very much.
    Have a nice day!

