Find matching words between two text files

Frederick Smith

Hi guys, I need some help here.
Let’s say I have 2 text files, each with hundreds of lines:
text file A and text file B.
I want to match all words in A against B, so that any word from A that is found in B gets highlighted.
Example: let’s say file A has the word CAR and file B has the word CARPOOL.
It would match the word CAR and highlight it (so only CAR would be highlighted, not the whole word CARPOOL).
Or all matching words could be saved to a new file; either would be great.
So file A is the “source”: match any/all words from file A to file B (if they exist).
I tried Compare, but it shows the differences; I need the matches.
Thanks in advance for your help.
Frederick

Terry R

Hi @Frederick Smith
Your problem was interesting. It is very similar in nature to one solved by @guy038, namely:
https://notepad-plus-plus.org/community/topic/16335/multiline-replace-multiple-hosts-in-hostsfile
In that instance the question was how to remove lines when duplicates were found. In essence, though, the search method here works very much like that one.

I’m going to assume that file A contains 1 word per line; if not, we need to get file A into that format (when you copy lines you ONLY want the word which is duplicated, not additional words on the same line). Open file A first, then add a “---” line (three hyphens) at the bottom as the last line. Then add the contents of file B below it.

Open the Mark function and use the following:
Find What: (?is)^(.+)\R(?=.*---.*\1)
You need search mode set to regular expression (very important) and wrap around ticked. Also tick Bookmark Lines, this will help later.

Have the cursor at the top left-most position of the file, i.e. at the start of the file A contents, otherwise the result will be unpredictable. You only need to click the Mark All button once. Any file A line which also appears in the file B area (below the --- line) will be marked, and the line will also be bookmarked (blue circle in the margin). The --- line stops attempts to find duplicates within the file B area.

Now use the “Search” menu option, select “Bookmark”, then “Copy Bookmarked Lines”. Put the copied lines elsewhere, which is what you requested.
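
For anyone who prefers to script this step, here is a minimal Python sketch of the same idea. It uses a plain case-insensitive substring test instead of the regex above, and the function name `words_found_in` is made up for illustration. It assumes file A holds one word per line, as stated earlier.

```python
def words_found_in(words_a, lines_b):
    """Return the words from A that occur, ignoring case, inside any line of B."""
    haystack = "\n".join(lines_b).lower()
    # Skip blank lines; keep the original order and spelling from file A.
    return [w for w in words_a if w.strip() and w.strip().lower() in haystack]


# Sample data taken from this thread.
file_a = ["car", "apple", "beach", "hello", "down", "sun", "question"]
file_b = ["city", "whatever", "carpool", "san", "beachcity", "cornel", "downpillow"]

matches = words_found_in(file_a, file_b)
print(matches)  # ['car', 'beach', 'down']
```

The resulting list can then be written to a new file of your choice, one word per line, which covers the "save matches to a new file" variant of the request.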

My regex includes the (?is) modifiers. The s means that the dot ( . ) also matches line-break characters (carriage return and line feed), so they are treated like ALL other characters. The i means a case-insensitive search, so “CAR” would also find “car”, “Car”, “cAr” etc.
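
The effect of the two modifiers can be checked with a short Python sketch. Python's `re` module is only a stand-in here; Notepad++ uses a different regex engine, so treat this as an illustration of the flags rather than of Notepad++ itself. The sample text is made up.

```python
import re

text = "CAR\npool"

# No modifiers: case must match and "." stops at the line break.
no_flags = re.search(r"car.pool", text)

# (?i) only: case is ignored, but "." still cannot cross the line break.
only_i = re.search(r"(?i)car.pool", text)

# (?is): case is ignored and "." also matches the line break.
both = re.search(r"(?is)car.pool", text)

print(no_flags)             # None
print(only_i)               # None
print(repr(both.group()))   # 'CAR\npool'
```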

I hope this helps, otherwise come back with more info including samples of actual file A and B contents if you can.

Terry

Frederick Smith

Hi Terry,
Thanks a lot for taking the time and responding to my question.
First, you’re correct in your assumption.
ALMOST THERE…
At first it didn’t work; then, looking at your regex, I realized it calls for “---” (3 hyphens), not “-”, so once I changed that it WORKED!
With one exception!
The only thing is that it marks the file A part, not the file B part
(and I would need the file B part to be marked).

I tried flipping the files around, but that didn’t work.

These are not real files, just a sample to illustrate.

This is file A:
car
apple
beach
hello
down
sun
question

This is file B:
city
whatever
carpool
san
beachcity
cornel
downpillow

I opened file A and changed it to this:
car
apple
beach
hello
down
sun
question
---
city
whatever
carpool
san
beachcity
cornel
downpillow

So, instead of marking: car, beach, down
it would need to mark: carpool, beachcity, downpillow
So “car” would be highlighted in “carpool”.

So how do I change the “Mark” function to get that result?

Thanks again Terry!

guy038

Hello, @frederick-Smith; @terry-r and All,

Of course, with your additional information, it becomes easier to point out the suitable regex ! I hope that Terry won’t mind if I reply to you, first ;-))

Actually, you have two files : File_A which contains a list of strings, which, possibly, are subsets of some words contained in the File_B list !

Then, we’re going to reverse the logic :

- First, in a new N++ tab, copy/paste the File_B.txt contents
- Add the single line ---
- Then, under this line, insert the File_A contents
- Open the Mark dialog
- Use the regex search :

(?si)(.+)(?=.*^---\R.*^\1$)

- Preferably, tick the Purge for each search option
- Click on the Mark All button

So, given File_B contents, below :

city
whatever
carpool
san
beachcity
cornel
downpillow

and File_A contents, below :

car
cornel
apple
beach
hello
ever
down
sun
it
question

Note that I added 3 words to File_A ( ever, cornel and it ), in order to show that “subset-words” can also be marked in the middle or at the end of a whole word, and that an entire word can be highlighted !

Now, we add, in a new tab, the following text :

city
whatever
carpool
san
beachcity
cornel
downpillow
---
car
cornel
apple
beach
hello
ever
down
sun
it
question

Finally, using the Mark dialog and the regex (?si)(.+)(?=.*^---\R.*^\1$), it should highlight the parts shown in square brackets, below :-))

c[it]y
what[ever]
[car]pool
san
[beach]c[it]y
[cornel]
[down]pillow

Notes :

As usual, the (?si) modifiers mean a case-insensitive search and that any dot ( . ) will match any single character ( standard and EOL )
Then, the main part (.+) tries to match the longest, non-empty, range of characters, even across several lines, stored as group 1, but ONLY IF the positive look-ahead (?=.*^---\R.*^\1$) is TRUE. That is to say, IF it detects :
- A range of any characters, possibly empty, .* ,
- followed by a line with only 3 dashes and its line-break, ^---\R ,
- followed, again, by the longest range, possibly empty, of any characters, .* ,
- and ended with the contents of group 1, alone on its line, ^\1$

Remark : if you prefer a case-sensitive search, simply use the first part (?s-i), instead !
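
To try the look-ahead logic outside Notepad++, here is a hedged Python port of the regex, run on the sample text from this post. Two changes are assumptions needed for Python's `re` module: \R is replaced by \n, and the m flag is added so that ^ and $ match at line boundaries. The name `MARK` is made up for illustration.

```python
import re

# Python port of the regex above: \R becomes \n, and (?m) is added because
# Python's ^ and $ only match at line boundaries in multiline mode.
MARK = re.compile(r"(?sim)(.+)(?=.*^---\n.*^\1$)")

file_b = ["city", "whatever", "carpool", "san", "beachcity", "cornel", "downpillow"]
file_a = ["car", "cornel", "apple", "beach", "hello", "ever", "down", "sun", "it", "question"]

# File_B first, then the --- separator, then File_A, as in the steps above.
combined = "\n".join(file_b + ["---"] + file_a)

# Each match is a part of File_B that equals a whole line of File_A.
marked = MARK.findall(combined)
print(marked)  # ['it', 'ever', 'car', 'beach', 'it', 'cornel', 'down']

# Show the marks inline by wrapping each match in square brackets.
highlighted = MARK.sub(lambda m: "[" + m.group(1) + "]", combined)
print(highlighted)
```

Because the look-ahead requires the --- line to come after the match, nothing below the separator is ever marked, so only the File_B part gets brackets.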

Cheers,

guy038

Terry R

@Frederick-Smith said:

It would match the word CAR and highlight it (so only CAR would be highlighted, not the whole word CARPOOL).

I interpreted that as the word in file A being highlighted, but what you really meant was that the letters CAR in carpool would be highlighted because CAR also exists in file A. Sorry about that, and about the confusion over the 3 “-”; sometimes characters don’t show well, and it’s the interpreter (behind the compose window) that causes most of the issues. As @guy038 has given you another solution that fits your requirements, I’ll let it be.

Be sure to come back if you need anyone to elaborate or help further.

Terry

Frederick Smith

Hi @terry-r, @guy038 and All

First I want to thank you both: @terry-r and @guy038 - for taking your time and giving me help.

Both solutions work, maybe a bit differently, but both give the good results I was looking for.

Let me say how much I appreciate the community. Thank you!

Thanks again guys!
