How to remove the content from file1 or list1 (e.g. 4000 usernames) from file2 or list2 (e.g. 50000 usernames)

Armin Irger

Hello,

How can the content of file1 or list1 (e.g. 4000 usernames) be removed from file2 or list2 (e.g. 50000 usernames)?

file1 or list1 looks like:
user3
user9
user10
user2
user30

file2 or list2 looks like:
[role]
rid=2000
rdesc=Members
user=user100;user70;user4030;user10;user300;user33;user9;user2;user35;user290;user30;user158;user3;user893;user10;

First, I tried to modify file1 with:
Find what: \r\n
Replace with: -
Search Mode: Regular expression

Then file1 looks like:
user3-user9-user10-user2-user30

Then I changed it with:
Find what: -
Replace with: ;)|(
Search mode: Normal

Then file1 looks like:
user3;)|(user9;)|(user10;)|(user2;)|(user30

Then I added a “(” at the start and a “;)” at the end.

Then file1 looks like:
(user3;)|(user9;)|(user10;)|(user2;)|(user30;)

I opened file2 in Notepad++ and copied the file1 content into the “Find what:” field.
Find what: (user3;)|(user9;)|(user10;)|(user2;)|(user30;)
Replace with: <empty>
Search mode: Regular expression

The file2 looks like
[role]
rid=2000
rdesc=Members
user=user100;user70;user4030;user300;user33;user35;user290;user158;user893;user10;
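
For readers who prefer to see the steps above as code, here is a rough Python sketch of the same idea, run outside Notepad++. The function name build_pattern and the use of re.escape are assumptions added for illustration; they are not part of the manual steps.

```python
import re

def build_pattern(usernames):
    # Mirrors the manual steps: each name becomes "(name;)", joined by "|".
    # re.escape is an addition that guards against regex metacharacters in names.
    return "|".join("(" + re.escape(name) + ";)" for name in usernames)

file1 = ["user3", "user9", "user10", "user2", "user30"]
user_line = ("user=user100;user70;user4030;user10;user300;user33;user9;user2;"
             "user35;user290;user30;user158;user3;user893;user10;")

pattern = build_pattern(file1)
# re.sub removes every occurrence of a listed name, repeated entries included.
cleaned = re.sub(pattern, "", user_line)
print(pattern)
print(cleaned)
```

With several thousand names the alternation becomes very long, which is one reason to look for a different approach.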

Is there another way to solve this?

Best regards
Armin

guy038

Hello, @armir-irger and All,

Indeed, there is another way to manage this task, using just one regex S/R :-))

As usual, though it may seem obvious, work on copies of your two files, File1 and File2!

To simulate a real case, I simply split your set of members in File2 into two parts, with some “rubbish” text in between.

Regarding your list of members in File1, I suppose that each string user####, which must disappear from File2:

Begins a line, or is preceded by a semicolon ( ; )
Ends a line, or is followed by a semicolon ( ; )

Note that the File1 contents must be added at the end of File2, after a separator line that does not occur anywhere in File2. I personally chose the ### string, but any string made, preferably, of non-regex characters would be OK!

So, following your example, we get the initial sample text, below :

[role]
rid=2000
rdesc=Members
user=user100;user70;user4030;user10;user300;user33;user9;

bla
bla blah

blaaah
bla bla blah

[role]
rid=1000
rdesc=Members
user=user2;user35;user290;user30;user158;user3;user893;user10;
bla blah

blaaah

###

user3
user9;user10;user2
user30

Now, with the Regular expression search mode selected, we run the following regex S/R, using Replace All:

SEARCH (user\d+);(?=(?s).*###.*(^|;)\1(;|\R))|(?s)###.+

REPLACE Leave EMPTY

And we get the expected text below (the user#### members present in File1 are removed from File2, as are the File1 contents that were temporarily added). Voilà!

[role]
rid=2000
rdesc=Members
user=user100;user70;user4030;user300;user33;

bla
bla blah

blaaah
bla bla blah

[role]
rid=1000
rdesc=Members
user=user35;user290;user158;user893;
bla blah

blaaah
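
The same S/R can be tried outside Notepad++ with a small Python sketch. This is an adaptation under assumptions, not the exact expression above: Python's re module has no \R and does not accept (?s) in the middle of a pattern, so the flags are passed to re.compile and \R is replaced by $. The function name remove_listed_users and the shortened sample are made up for illustration.

```python
import re

# Adapted search expression: MULTILINE makes ^ and $ work per line,
# DOTALL lets "." cross line breaks (the role of (?s) in the original).
PATTERN = re.compile(
    r"(user\d+);(?=.*###.*(?:^|;)\1(?:;|$))|###.+",
    re.MULTILINE | re.DOTALL,
)

def remove_listed_users(combined_text):
    # First alternative: a "userNNN;" entry that also appears after the ### line.
    # Second alternative: the ### line and everything that follows it.
    return PATTERN.sub("", combined_text)

sample = (
    "user=user100;user10;user300;user9;\n"
    "bla\n"
    "user=user2;user35;user30;user3;user10;\n"
    "###\n"
    "user3\n"
    "user9;user10;user2\n"
    "user30\n"
)
result = remove_listed_users(sample)
print(result)
```

Because the lookahead rescans the rest of the text for every candidate, this is likely to be slow on very large inputs; it is meant only to show how the expression behaves.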

Best Regards,

guy038

P.S. :

Note that, in File2, the last user10 member is also removed, because user10 was present twice in File2!

Alan Kilborn

This seems a common request. Maybe it is time to script it?

Scott Sumner

This question comes up rather frequently, and it is always tackled with a regular-expression solution. This is okay… but sometimes people have trouble with that, so how about this time we throw down a PythonScript solution? [Thanks, Alan, for the hint…]

A variant of this question is “How do I take a list of words (and corresponding replacement values) from one document and have the replacements act upon another document?”. The question in this thread is just a special case of that: deleting is simply the case where the replacement value has zero length.

So I propose that the word list should have the following form, and be present in the clipboard when the script is run, with the data file to be operated on in the active Notepad++ editor tab:

DELIMITERsearchtextDELIMITERreplacementtext
where replacementtext can be empty in order to do a deletion

Here’s an example:

Text to be copied to clipboard prior to running the script:

:silver:golden
@silently@sqwalkingly
.sqwalkingly .

I purposefully used a different delimiter character on each line to show that that is possible…hmmm, maybe this just confuses things? Oh, well…

Text to be operated on, all by itself in a fresh editor tab:

Six silver swans swam silently seaward

After running the script with that editor tab active, its text should be changed to:

Six golden swans swam seaward

Note that in the second line of the list, silently is changed to sqwalkingly. But…in the third line, sqwalkingly followed by a space is deleted (no text follows the final delimiter, meaning change the search text to nothing, i.e., delete it).

Hopefully the reader can follow the progression of replacements in this case.

So…to perform the OP’s original deletion of data, one would create the word list as follows and copy it to the clipboard:

!user3;!
!user9;!
!user10;!
!user2;!
!user30;!

Remember, the format is: DELIMITERsearchtextDELIMITERreplacementtext where replacementtext can be empty in order to do a deletion (as we are doing here). This time I have arbitrarily used the ! character as the delimiter, and I was consistent about it in each line, as one usually would be.

Then, running the script in the file of the original data:

[role]
rid=2000
rdesc=Members
user=user100;user70;user4030;user10;user300;user33;user9;user2;user35;user290;user30;user158;user3;user893;user10;

One obtains, with the desired users eliminated:

[role]
rid=2000
rdesc=Members
user=user100;user70;user4030;user300;user33;user35;user290;user158;user893;

Here’s the script code for ReplaceUsingListInClipboard.py:

from Npp import editor  # PythonScript's editor object (usually already available to scripts)

def RULIC__main():
    if not editor.canPaste(): return  # nothing usable in the clipboard
    cp = editor.getCurrentPos()
    editor.setSelection(cp, cp)  # cancel any active selection(s)
    doc_orig_len = editor.getTextLength()
    editor.paste()  # paste so we can get easy access to the clipboard text
    cp = editor.getCurrentPos()  # this has moved because of the paste
    # the pasted text runs from (cp - pasted length) up to cp
    clipboard_lines_list = editor.getTextRange(cp - editor.getTextLength() + doc_orig_len, cp).splitlines()
    editor.undo()  # revert the paste action, but sadly, this puts it in the undo buffer...so it can be redone
    editor.beginUndoAction()  # group all replacements into a single undo step
    for line in clipboard_lines_list:
        # line[0] is the delimiter; the rest must split into exactly (search, replacement)
        try: (search_text, replace_text) = line.rstrip('\n\r')[1:].split(line[0])
        except (ValueError, IndexError): continue  # skip blank or malformed lines
        editor.replace(search_text, replace_text)
    editor.endUndoAction()

RULIC__main()