Delete duplicate lines when not sorted alphabetically



  • Hi.

    How do I delete lines that exist more than once, without sorting the lines alphabetically?

    I use this regex:

    ^(.+?)\R(\1\R?)+

    with this replace:

    \1\r\n

    But at least one duplicate is still left afterwards.

    Also, if I sort the lines alphabetically first and then use the regex above, it treats lines like these as duplicates even though they aren’t:

    https://archiveofourown.org/works/113030
    https://archiveofourown.org/works/11303034
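    Both problems come from how the pattern is built. It can only collapse duplicates that sit on adjacent lines, so in an unsorted file any copies separated by other lines survive. And because `\1\R?` makes the line break after each repeat optional, `\1` can match just the beginning of the next line: `.../113030` matches the start of `.../11303034`, and the leftover `34` ends up on a line of its own. The sketch below is a Python illustration, not something Notepad++ runs: Python's `re` has no `\R`, so `\n` stands in for it (assuming LF line endings), and the pattern `^(.+)\n(?:\1\n)+` is an assumed fix that requires every repeat to end at a line break.

```python
import re

# The two URLs from the question, one per line.
SAMPLE = (
    "https://archiveofourown.org/works/113030\n"
    "https://archiveofourown.org/works/11303034\n"
)

# Python's re has no \R, so \n stands in for it (assumes LF line endings).
# (?m) makes ^ match at every line start, as it does in Notepad++.
BUGGY = re.compile(r"(?m)^(.+?)\n(\1\n?)+")

# Assumed fix: each repeat must be followed by a line break,
# so \1 can only match a whole line, never a prefix of one.
FIXED = re.compile(r"(?m)^(.+)\n(?:\1\n)+")


def collapse(pattern: re.Pattern, text: str) -> str:
    """Replace every match of a duplicate-line pattern with one copy of the line."""
    return pattern.sub(r"\1\n", text)


print(repr(collapse(BUGGY, SAMPLE)))       # second line is cut down to '34'
print(repr(collapse(FIXED, SAMPLE)))       # both lines are kept unchanged
print(repr(collapse(FIXED, "a\nb\na\n")))  # non-adjacent copies survive
```

    Sorting first (or using an order-preserving approach that is not regex-based) is what handles the non-adjacent case.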

    Can someone help me with this problem?

    Thanks in advance.



  • Hello, @t-c and All,

    No problem, as long as your file has been sorted first ;-)

    • First, preferably add an empty line after the last item of your list, so that the last item also ends with a line break. Then:

    • To keep just one copy of each duplicated line, use: SEARCH (?-s)(^.+\R)\1+ and REPLACE \1

    • To delete every copy of the duplicated lines, use: SEARCH (?-s)(^.+\R)\1+ and REPLACE (leave it EMPTY)
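    To see what these two replacements do, here is a rough Python translation run on a small sorted sample. It is an approximation, not Notepad++ itself: `\n` replaces `\R` (assuming LF line endings), `(?m)` makes `^` match at every line start, and Python's `.` already excludes line breaks, which is what `(?-s)` ensures. The function names `keep_one` and `drop_all` are hypothetical. The last call shows why the trailing empty line matters: without a line break after the final copy, `\1` cannot match it.

```python
import re

# Approximate Python form of (?-s)(^.+\R)\1+ :
#  - \n stands in for \R, which Python's re lacks (assumes LF endings);
#  - (?m) lets ^ match at each line start, as in Notepad++;
#  - Python's . never matches a newline, like (?-s).
DUPES = re.compile(r"(?m)(^.+\n)\1+")


def keep_one(text: str) -> str:
    """Keep one copy of each run of identical adjacent lines."""
    return DUPES.sub(r"\1", text)


def drop_all(text: str) -> str:
    """Delete every copy of each run of identical adjacent lines."""
    return DUPES.sub("", text)


sorted_text = "apple\napple\nbanana\ncherry\ncherry\n"
print(repr(keep_one(sorted_text)))  # 'apple\nbanana\ncherry\n'
print(repr(drop_all(sorted_text)))  # 'banana\n'

# No line break after the last item: its final copy is not matched.
print(repr(keep_one("cherry\ncherry")))  # 'cherry\ncherry'
```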

    Best Regards,

    guy038



  • Hi guy038,

    I’m code-illiterate; I just use Notepad++ as an end user, so to speak, and I don’t really understand how to carry out the search you propose in your answer. Obviously it is not the regular Find command, because that would search for the characters literally. Could you please add a short reply for dummies so that I can get rid of all my duplicates (and not just one of each pair)?

    Thank you so much in advance.



  • @Mireia-Dos

    @guy038 just didn’t say explicitly (because the OP already knew it) that you need to open the Replace dialog (Ctrl+H) and set the Search Mode to Regular expression before doing the steps he outlined.

    You could also try @Claudia-Frank’s excellent Pythonscript for sorting (that part is optional, and I see you don’t want it) and removing duplicate lines; find it here and a discussion about it here.
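    The linked script is not reproduced here. As a rough idea of what removing duplicates without sorting involves, here is a plain-Python sketch; the function name `dedupe_lines` and its `keep_one` flag are hypothetical. It keeps lines in their original order, so it covers the unsorted case from the original question, and with `keep_one=False` it deletes every copy of a repeated line.

```python
from collections import Counter


def dedupe_lines(text: str, keep_one: bool = True) -> str:
    """Remove duplicate lines without sorting (hypothetical helper).

    keep_one=True  keeps the first occurrence of each line.
    keep_one=False deletes every line that occurs more than once.
    Lines are compared without their line breaks, so a last line with
    no trailing newline still counts as a duplicate. Output uses LF.
    """
    lines = text.splitlines()
    if keep_one:
        # dict keys are unique and keep insertion order (Python 3.7+).
        kept = list(dict.fromkeys(lines))
    else:
        counts = Counter(lines)
        kept = [line for line in lines if counts[line] == 1]
    if not kept:
        return ""
    trailing = "\n" if text.endswith(("\r", "\n")) else ""
    return "\n".join(kept) + trailing


sample = "c\na\nb\na\nc\nd\n"
print(repr(dedupe_lines(sample)))                  # 'c\na\nb\nd\n'
print(repr(dedupe_lines(sample, keep_one=False)))  # 'b\nd\n'
```

    Running something like this inside Notepad++ would need the PythonScript plugin; reading the document with `editor.getText()` and writing the result back with `editor.setText()` is an assumption about that wiring, not shown or tested here.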



  • It works! This is awesome. Thank you both so much hahaha. I’m so happy. Thanks.

    Oh, and I won’t be using that sorting script; it is so far beyond my capabilities that I wouldn’t know where to start. I usually sort and delete duplicates (keeping one copy of each line) with TextPad. Alternatively, I learned today that I can do it like this in Notepad++:

    In the TextFX menu, click TextFX Tools and then check these options:

    T: + Sort ascending
    T: + Sort outputs only UNIQUE (at column) lines

    and then run the sort command (one of the first two options at the top of that menu).
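    In other words, that TextFX combination sorts the lines and keeps one copy of each. A minimal Python sketch of the same idea follows; `sort_unique` is a hypothetical name, and Python sorts by Unicode code point, which may not match TextFX’s ordering (for example with mixed upper and lower case). The “(at column)” part of the option is not modeled.

```python
def sort_unique(text: str) -> str:
    """Sort lines ascending and keep one copy of each (hypothetical helper)."""
    # A set drops repeated lines; sorted() then orders by code point.
    lines = sorted(set(text.splitlines()))
    return "\n".join(lines) + ("\n" if lines else "")


print(repr(sort_unique("pear\napple\npear\nbanana\n")))  # 'apple\nbanana\npear\n'
```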

    Thanks again!



  • How do I run the script?



  • @T-C

    For running Pythonscripts in general, there is some info here. In that thread, see my post that starts with “Scripting languages aren’t built in; they’re plugins…”.

    For the script discussed in this thread specifically, there are some hints here about what else is needed.



    Is there an easier way for me to weed out duplicates without having to learn how to run a script, which I have no idea how to do?



  • @T-C

    Well, some other editors probably support this functionality directly. Or, if you know how to program, you could write a program to do it. There are probably other ways as well…

    I’d say running a prewritten script is usually pretty easy, but the user interface toolkit this one uses (Tk) sometimes causes installation problems… :-(

