Find matching words between two text files
-
Hi guys, I need some help here.
Let's say I have 2 text files, each with hundreds of lines.
Text files A and B.
I want to match all words in A against B, so that any word from A that is found in B gets highlighted.
Example: let's say file A has the word CAR and file B has the word CARPOOL.
It would match the word CAR and highlight it (so only CAR would be highlighted, not the whole word CARPOOL).
Or all matching words could be saved to a new file. Either would be great.
So file A is the "source": match any/all words from file A against file B (if they exist).
I tried Compare, but it shows the differences. I would need the matches.
Thanks for your help in advance.
Frederick
-
Hi @Frederick Smith
your problem was interesting. It was very similar in nature to a solution presented by @guy038, namely:
https://notepad-plus-plus.org/community/topic/16335/multiline-replace-multiple-hosts-in-hostsfile
In that instance the question was how to remove lines when duplicates were found. In essence, though, the search method here works very much like that one.
I’m going to assume that file A contains 1 word per line; if not, we need to get file A into that format (when you copy lines you ONLY want the word which is duplicated, not additional words on the same line). So you would need file A opened first, then put a “---” line (three hyphens) at the bottom, making it the last line. Then add file B below it.
Open the Mark function and use the following:
Find What: (?is)^(.+)\R(?=.*---.*\1)
You need search mode set to regular expression (very important) and wrap around ticked. Also tick Bookmark Lines; this will help later.
Have the cursor set at the top-left-most position of the file, so at the top of the file A contents, otherwise the result will be unpredictable. You will only need to click on the Mark All button once. Any of the file A contents which also appear in the file B area (below the --- line) will be marked, and the line will also be bookmarked (blue circle in the margin). The --- line stops attempts to find duplicates in the file B area.
Now use the “Search” menu option, select “Bookmark”, then “Copy Bookmarked Lines”. Put the copied lines elsewhere, which is what you requested.
My regex includes the (?is) modifiers: s means the dot also matches line-break characters (carriage return and line feed), so they are treated like ALL other characters; i means a case-insensitive search, so “CAR” would also find “car”, “Car”, “cAr” etc.
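For readers who would rather script the same idea outside Notepad++, here is a minimal Python sketch of what the steps above achieve. It is not Terry's regex: it swaps in a plain case-insensitive substring test, and the function name `words_found_in` is hypothetical. The sample words are the ones Frederick posts later in this thread.

```python
def words_found_in(a_words, b_words):
    """Return the file A words that occur inside at least one file B word.

    The comparison ignores case, mirroring the (?i) modifier.
    """
    b_lower = [b.lower() for b in b_words]
    return [a for a in a_words if any(a.lower() in b for b in b_lower)]


# Sample data taken from later in the thread (one word per line in each file).
file_a = ["car", "apple", "beach", "hello", "down", "sun", "question"]
file_b = ["city", "whatever", "carpool", "san", "beachcity", "cornel", "downpillow"]

matches = words_found_in(file_a, file_b)
print(matches)  # ['car', 'beach', 'down']
```

The resulting list corresponds to the bookmarked lines, so writing it out one word per line would give the "new file" variant of the request.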
I hope this helps, otherwise come back with more info including samples of actual file A and B contents if you can.
Terry
-
Hi Terry,
Thanks a lot for taking the time and responding to my question.
First - you’re correct in your assumption.
ALMOST THERE…
At first I tried it and it didn’t work. Then, looking at your regex, I realized it calls for “---” (3 hyphens), not “-”, so once I changed that it WORKED!
With one exception!
The only thing is that it marks the file A part, not the file B part
(and I would need the file B part to be marked). I tried flipping the files around, but that didn’t work.
These are not real files, just a sample to illustrate…
This is file A:
car
apple
beach
hello
down
sun
question
This is file B:
city
whatever
carpool
san
beachcity
cornel
downpillow
I opened file A and changed it to this:
car
apple
beach
hello
down
sun
question
---
city
whatever
carpool
san
beachcity
cornel
downpillow
So, instead of marking: car, beach, down
I would need it to mark: carpool, beachcity, downpillow
So “car” would be highlighted in “carpool”.
So how do I change the “Mark” function to get that result?
Thanks again Terry!
-
Hello, @frederick-Smith; @terry-r and All,
Of course, with your additional information, it becomes easier to point out the suitable regex ! I hope that Terry won’t mind if I reply to you, first ;-))
Actually, you have two files: File_A, which contains a list of strings which are, possibly, subsets of some words contained in the File_B list!
Then, we’re going to reverse the logic:
- First, in a new N++ tab, copy/paste the File_B.txt contents
- Add the single line ---
- Then, under this line, insert the File_A contents
- Open the Mark dialog
- Use the regex search: (?si)(.+)(?=.*^---\R.*^\1$)
- Preferably, tick the Purge for each search option
- Click on the Mark All button
So, given the File_B contents below:
city
whatever
carpool
san
beachcity
cornel
downpillow
and the File_A contents below:
car
cornel
apple
beach
hello
ever
down
sun
it
question
Just note that I added 3 words, ever, cornel and it, in order to show that “subset-words” can also be marked in the middle or at the end of the whole word, or that the entire word can be highlighted!
Now, we add, in a new tab, the following text:
city
whatever
carpool
san
beachcity
cornel
downpillow
---
car
cornel
apple
beach
hello
ever
down
sun
it
question
Finally, using the Mark dialog and the regex (?si)(.+)(?=.*^---\R.*^\1$), it should highlight the parts of the words below that match a File_A entry :-))
city
whatever
carpool
san
beachcity
cornel
downpillow
Notes:
- As usual, the (?si) modifiers mean a case-insensitive search and that any dot (.) will match any single character (standard and EOL)
- Then, the main part (.+) tries to match the longest, non-null, range of characters, even across several lines, stored as group 1, but ONLY IF the positive look-ahead (?=.*^---\R.*^\1$) is TRUE. That is to say, IF it detects:
  - a range of any characters, possibly empty, .*,
  - followed by a line with only 3 dashes and its line-break, ^---\R,
  - followed, again, by the longest range, possibly null, of any characters, .*,
  - and ended with the contents of group 1, alone on its line, ^\1$
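To see which parts get marked without opening Notepad++, the Mark regex above can be tried in Python. This is only a rough emulation under stated assumptions: Python's `re` module has no `\R`, so `\n` is used instead, and `re.MULTILINE` is added so that `^` and `$` match at line boundaries as they do in Notepad++. The function name `mark_matches` and the `[brackets]` used to show the marked parts are hypothetical choices for this sketch.

```python
import re

# Assumption: "\n" stands in for \R, which Python's re module does not support.
# re.MULTILINE makes ^ and $ match at line boundaries.
MARK_RE = re.compile(r"(?si)(.+)(?=.*^---\n.*^\1$)", re.MULTILINE)


def mark_matches(b_words, a_words):
    """Return the File_B words with each marked part wrapped in [brackets]."""
    # Same layout as in the post: File_B contents, a "---" line, File_A contents.
    text = "\n".join(b_words + ["---"] + a_words)
    marked = MARK_RE.sub(lambda m: "[" + m.group(1) + "]", text)
    # Keep only the File_B part, which sits above the "---" line.
    return marked.split("\n---\n")[0].split("\n")


file_b = ["city", "whatever", "carpool", "san", "beachcity", "cornel", "downpillow"]
file_a = ["car", "cornel", "apple", "beach", "hello", "ever", "down", "sun", "it", "question"]

for line in mark_matches(file_b, file_a):
    print(line)
# c[it]y
# what[ever]
# [car]pool
# san
# [beach]c[it]y
# [cornel]
# [down]pillow
```

Nothing below the `---` line is bracketed, because the look-ahead needs a `---` line somewhere after the current position.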
Remark: if you prefer a case-sensitive search, simply use the first part (?s-i), instead!
Cheers,
guy038
-
@Frederick-Smith said:
It would match the word: CAR - and highlight it. (so only CAR would be highlighted - not: word: CARPOOL.
I interpreted that as the word in file A being highlighted, whereas what you really meant was that the letters CAR in carpool would be highlighted, because CAR also exists in file A. Sorry about that, and about the confusion over the 3 “-”; sometimes characters don’t show well, and it’s the interpreter (behind the compose window) that causes most of the issues. As @guy038 has given you another solution to fit your requirements, I’ll let it be.
Be sure to come back if you need anyone to elaborate or help further.
Terry
-