How to: Delete all lines in a .txt-document that occur in another .txt-document
-
Hello,
dunno if this can be done easily but I will try to ask anyway:
I have a list of words in a foreign language and there might be a few English words
in there as well…these I want to delete.
My idea was: I have already found a list of 5000 most common English words.
So now my question: how can I delete those entries/lines in my list that also
occur in the English language list? To make it clearer:
I have a list called “foreign_language_vocab.txt” and a list called “English_vocab.txt”.
Now I want to delete all lines in “foreign_language_vocab.txt” that also occur in “English_vocab.txt”. Thank you for the help!
Best,
Iskandar
-
Is there always one word per line in both files? If yes, a while ago I wrote a script for the NppExec plugin that does exactly what you want.
Which version of Notepad++ do you use? If it is a version prior to v7.6, you can install the NppExec plugin using Plugin Manager. If you use v7.6, you can use the new built-in Plugin Admin.
Once you have installed the plugin, come back for further instructions.
-
Hello, @iskandar-the-pupsi, @dinkumoil and All,
Nothing is impossible with regular expressions ;-))
So, in a new N++ tab (Ctrl + N):
- Copy all the contents of the foreign_language_vocab.txt file
- Add a line of at least 3 tilde characters (~~~)
- Copy all the contents of the English_vocab.txt file
Below is an example, with a mix of French and American English words in the first part:
# foreign_language_vocab.txt
table
church
poisson
girl
couteau
maison
orange
town
world
day
école
garçon
car
lit
plate
voiture
star
~~~~~~~~~~~~~~~~~~~~
# English_vocab.txt
table
man
church
girl
knife
town
fork
world
country
car
house
plate
road
light
hammer
box
paper
book
vegetable
orange
castle
forest
wood
bed
desk
water
glass
cat
farm
Now:
- Open the Replace dialog
- SEARCH (?-s)^(.+)\R(?s)(?=.+^\1$)|~~~.+
- REPLACE Leave EMPTY
- Tick the Wrap around option
- Select the Regular expression search mode
- Click on the Replace All button
Et voilà ;-)) You get the expected result, below :
# foreign_language_vocab.txt
poisson
couteau
maison
day
école
garçon
lit
voiture
star
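For anyone who wants to try the same idea outside Notepad++, here is a minimal Python sketch of the search/replace above. It is only an approximate port, under a few assumptions: Python's re module has no \R and cannot switch (?s) on or off in the middle of a pattern, so the sketch uses \n (Unix line endings assumed) and scoped (?s:...) groups instead. The name remove_shared_lines and the shortened sample text are made up for illustration.

```python
import re

# Approximate port of  (?-s)^(.+)\R(?s)(?=.+^\1$)|~~~.+  to Python's re syntax.
# Assumptions: "\n" line endings; (?s:...) replaces the mid-pattern (?s) switch.
PATTERN = re.compile(r"^(.+)\n(?=(?s:.+)^\1$)|~~~(?s:.+)", re.MULTILINE)


def remove_shared_lines(combined: str) -> str:
    """Delete every line that reappears later as a whole line, then drop
    everything from the tilde separator to the end of the text."""
    return PATTERN.sub("", combined)


# Shortened version of the example from this thread (hypothetical sample).
sample = "\n".join([
    "# foreign_language_vocab.txt",
    "table",
    "poisson",
    "day",
    "~~~",
    "# English_vocab.txt",
    "table",
    "man",
])

print(remove_shared_lines(sample))
```

With this sample, "table" is deleted because it reappears after the separator, while "poisson" and "day" stay. As in Notepad++, the lookahead rescans the rest of the text for every line, so this is likely to be slow on very long lists.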
Remarks:
- The data in the two parts does not need to be sorted first!
- If a word has the same spelling in both languages, it is removed (case of the words “table” and “orange”)
- If a word in the foreign list is not part of the English_vocab.txt file, it is not removed, even when it is actually English (case of the remaining words “day” and “star” in the foreign_language_vocab.txt file)
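The same remarks hold for a plain set-based filter, which may be easier to reason about for large lists. The following is a rough Python sketch, not part of the Notepad++ method above: the function names filter_lines and filter_file are hypothetical, and it assumes one word per line, UTF-8 files, and exact (case-sensitive) matching.

```python
from pathlib import Path


def filter_lines(foreign_lines, english_lines):
    """Return the foreign lines, in their original order, that do not
    occur as a whole line in the English list. No sorting is needed."""
    english = {line.strip() for line in english_lines}
    return [line for line in foreign_lines if line.strip() not in english]


def filter_file(foreign_path, english_path, out_path, encoding="utf-8"):
    """File-based wrapper (hypothetical helper): reads both lists and
    writes the filtered foreign list to out_path."""
    foreign = Path(foreign_path).read_text(encoding=encoding).splitlines()
    english = Path(english_path).read_text(encoding=encoding).splitlines()
    kept = filter_lines(foreign, english)
    Path(out_path).write_text("\n".join(kept) + "\n", encoding=encoding)


# Small in-memory check using words from the example above.
foreign = ["table", "poisson", "day", "orange", "star"]
english = ["orange", "table", "man"]
print(filter_lines(foreign, english))
```

Here "table" and "orange" are dropped, while "day" and "star" survive because they are missing from the English list. Unlike the regex, this sketch only compares against the English list, so a word repeated inside the foreign list is left alone.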
Best Regards
guy038
-
@guy038 said:
Nothing is impossible with regular expressions
There should be a qualifier: …unless your regular expression happens to select all the text in your document. :-)
-
Hi, @scott-sumner and All,
Note that I did not say “Nothing is impossible with N++ regular expressions” ;-))
Cheers,
guy038