How to remove the content from file1 or list1 (e.g. 4000 usernames) from file2 or list2 (e.g. 50000 usernames)



  • Hello,

    how can the content of file1 or list1 (e.g. 4000 usernames) be removed from file2 or list2 (e.g. 50000 usernames)?

    file1 or list1 looks like:
    user3
    user9
    user10
    user2
    user30

    file2 or list2 looks like:
    [role]
    rid=2000
    rdesc=Members
    user=user100;user70;user4030;user10;user300;user33;user9;user2;user35;user290;user30;user158;user3;user893;user10;

    First, I tried to modify file1 with:
    Find what: \r\n
    Replace with: -
    Search Mode: Regular expression

    Then file1 looks like:
    user3-user9-user10-user2-user30

    Then I changed it with:
    Find what: -
    Replace with: ;)|(
    Search mode: Normal

    Then file1 looks like:
    user3;)|(user9;)|(user10;)|(user2;)|(user30

    Then I add a “(” at the start and a “;)” at the end.

    Then file1 looks like:
    (user3;)|(user9;)|(user10;)|(user2;)|(user30;)

    I open file2 in Notepad++ and copy’n’paste the file1 content to the “Find what:” field.
    Find what: (user3;)|(user9;)|(user10;)|(user2;)|(user30;)
    Replace with: <empty>
    Search mode: Regular expression

    Then file2 looks like:
    [role]
    rid=2000
    rdesc=Members
    user=user100;user70;user4030;user300;user33;user35;user290;user158;user893;user10;

    Is there another possibility to solve this?

    Best regards
    Armin
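
    For comparison, the manual steps above (wrap each name as (name;) and join the pieces with |) can be sketched outside Notepad++ in plain Python. This is only an illustration: the helper names build_pattern and remove_users are made up here, and it assumes every name in file2 is followed by a semicolon, as in the sample.

```python
import re


def build_pattern(usernames):
    # One alternative per name, each ending in ";" so that user3 cannot
    # match inside user33; or user300; (hypothetical helper).
    alternatives = "|".join("(" + re.escape(name) + ";)" for name in usernames)
    return re.compile(alternatives)


def remove_users(file2_text, file1_text):
    # file1 holds one username per line; blank lines are ignored.
    usernames = [line.strip() for line in file1_text.splitlines() if line.strip()]
    if not usernames:
        return file2_text
    return build_pattern(usernames).sub("", file2_text)


file1 = "user3\nuser9\nuser10\nuser2\nuser30\n"
file2_line = ("user=user100;user70;user4030;user10;user300;user33;user9;user2;"
              "user35;user290;user30;user158;user3;user893;user10;")
result = remove_users(file2_line, file1)
print(result)
```

    Because re.sub replaces every match, both occurrences of user10; are removed in this run.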



  • Hello, @armir-irger and All,

    Indeed, there is another way to manage this task, using just one regex S/R :-))

    As usual, though obvious, just work on copies of your two files File1 and File2 !

    In order to simulate a real case, I simply split your set of members, in File2, into two parts, with some “rubbish” text in between !

    Regarding your list of members, in File1, I assume that each string user####, which must disappear from File2 :

    • Begins a line, or is preceded by a semicolon ( ; )

    • Ends a line, or is followed by a semicolon ( ; )

    Note that the File1 contents must be added at the end of File2, after a separation line, not used in File2. I personally chose the ### string but any string with, preferably, non-regex characters would be OK !

    So, following your example, we get the initial sample text, below :

    [role]
    rid=2000
    rdesc=Members
    user=user100;user70;user4030;user10;user300;user33;user9;
    
    bla
    bla blah
    
    blaaah
    bla bla blah
    
    [role]
    rid=1000
    rdesc=Members
    user=user2;user35;user290;user30;user158;user3;user893;user10;
    bla blah
    
    blaaah
    
    ###
    
    user3
    user9;user10;user2
    user30
    

    Now, we run the following regex S/R :

    SEARCH (user\d+);(?=(?s).*###.*(^|;)\1(;|\R))|(?s)###.+

    REPLACE Leave EMPTY

    And we get the expected text, below ( the user#### members, present in File1 are removed from File2 as well as all the File1 contents, which have been temporarily added ) Voilà !

    [role]
    rid=2000
    rdesc=Members
    user=user100;user70;user4030;user300;user33;
    
    bla
    bla blah
    
    blaaah
    bla bla blah
    
    [role]
    rid=1000
    rdesc=Members
    user=user35;user290;user158;user893;
    bla blah
    
    blaaah
    

    Best Regards,

    guy038

    P.S. :

    Note that, in File2, the last user10 member is also removed, because it was already present, twice, in File2 !
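
    The same idea can be tried outside Notepad++ with Python's re module. This is a rough port under stated assumptions, not the exact Notepad++ expression: Python has no \R, so \r?\n is used instead, and the (?s) switches become the re.S and re.M flags. The function name remove_listed is hypothetical.

```python
import re

# Rough Python port of the search expression (an approximation, see the notes above).
PATTERN = re.compile(
    r"(user\d+);(?=.*###.*(?:^|;)\1(?:;|\r?\n))"  # a member that reappears after ###
    r"|###.+",                                      # or the separator and all that follows
    re.S | re.M,
)


def remove_listed(text):
    # text is File2, then a ### separator line, then the File1 contents.
    return PATTERN.sub("", text)


sample = (
    "user=user100;user10;user33;user9;\n"
    "bla\n"
    "user=user2;user35;user3;user10;\n"
    "###\n"
    "user3\n"
    "user9;user10;user2\n"
)
result = remove_listed(sample)
print(result)
```

    As in the Notepad++ version, the last File1 line needs a trailing line break (or semicolon) for its final name to be recognized.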



  • This seems a common request. Maybe it is time to script it?



  • This question comes up rather frequently, and it is always tackled with a regular-expression solution. This is okay… but sometimes people have trouble with that, so how about this time we throw down a PythonScript solution? [Thanks, Alan, for the hint…]

    A variant of this question is “How do I replace a list of words (and corresponding replacement values) in one document and have the replace act upon another document?”. The question in this thread is just a special case of that: Deleting is simply where a replacement value is zero length.

    So I propose that the word list should have the following form, and be present in the clipboard when the script is run, with the data file to be operated on in the active Notepad++ editor tab:

    DELIMITERsearchtextDELIMITERreplacementtext
    where replacementtext can be empty in order to do a deletion

    Here’s an example:

    Text to be copied to clipboard prior to running the script:

    :silver:golden
    @silently@sqwalkingly
    .sqwalkingly .
    

    I purposefully used a different delimiter character on each line to show that that is possible…hmmm, maybe this just confuses things? Oh, well…

    Text to be operated on, all by itself in a fresh editor tab:

    Six silver swans swam silently seaward

    After running the script with that editor tab active, its text should be changed to:

    Six golden swans swam seaward

    Note that in the second line of the list, silently is changed to sqwalkingly. But…in the third line, sqwalkingly followed by a space is deleted (no text follows the final delimiter, meaning change the search text to nothing, i.e., delete it).

    Hopefully the reader can follow the progression of replacements in this case.

    So…to perform the OP’s original deletion of data, one would create the word list as follows and copy it to the clipboard:

    !user3;!
    !user9;!
    !user10;!
    !user2;!
    !user30;!
    

    Remember, the format is: DELIMITERsearchtextDELIMITERreplacementtext where replacementtext can be empty in order to do a deletion (as we are doing here). This time I have arbitrarily used the ! character as the delimiter, and I was consistent about it in each line, as one usually would be.

    Then, running the script in the file of the original data:

    [role]
    rid=2000
    rdesc=Members
    user=user100;user70;user4030;user10;user300;user33;user9;user2;user35;user290;user30;user158;user3;user893;user10;
    

    One obtains, with the desired users eliminated:

    [role]
    rid=2000
    rdesc=Members
    user=user100;user70;user4030;user300;user33;user35;user290;user158;user893;
    

    Here’s the script code for ReplaceUsingListInClipboard.py:

    from Npp import editor  # usually already available in PythonScript; harmless if repeated

    def RULIC__main():
        if not editor.canPaste(): return  # nothing usable in the clipboard
        cp = editor.getCurrentPos()
        editor.setSelection(cp, cp)  # cancel any active selection(s)
        doc_orig_len = editor.getTextLength()
        editor.paste()  # paste so we can get easy access to the clipboard text
        cp = editor.getCurrentPos()  # this has moved because of the paste
        pasted_len = editor.getTextLength() - doc_orig_len
        clipboard_lines_list = editor.getTextRange(cp - pasted_len, cp).splitlines()
        editor.undo()  # revert the paste action, but sadly, this puts it in the undo buffer...so it can be redone
        editor.beginUndoAction()  # group all replacements into a single undo step
        for line in clipboard_lines_list:
            # line format: DELIMITERsearchtextDELIMITERreplacementtext
            try: (search_text, replace_text) = line[1:].split(line[0])
            except (ValueError, IndexError): continue  # skip blank or malformed lines
            if not search_text: continue  # skip lines with an empty search text
            editor.replace(search_text, replace_text)
        editor.endUndoAction()

    RULIC__main()
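
    To check a word list before running it against a real document, the list format can also be exercised in plain Python, without Notepad++. The sketch below is an assumption-laden stand-in, not part of the script above: the function names parse_replacement_list and apply_replacement_list are invented for the example, and str.replace takes the place of editor.replace.

```python
def parse_replacement_list(list_text):
    """Yield (search, replacement) pairs from lines shaped like
    DELIMITERsearchtextDELIMITERreplacementtext."""
    for line in list_text.splitlines():
        if not line:
            continue  # skip blank lines
        parts = line[1:].split(line[0])  # the first character is the delimiter
        if len(parts) != 2 or not parts[0]:
            continue  # skip malformed lines and empty search strings
        yield parts[0], parts[1]


def apply_replacement_list(text, list_text):
    """Apply the replacements in order; later lines see earlier results."""
    for search, replacement in parse_replacement_list(list_text):
        text = text.replace(search, replacement)
    return text


word_list = ":silver:golden\n@silently@sqwalkingly\n.sqwalkingly .\n"
print(apply_replacement_list("Six silver swans swam silently seaward", word_list))
```

    Run on the swans example, this prints Six golden swans swam seaward, matching the result described earlier.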
    
