I want to compare two files and bookmark the lines containing similar words

Bader Alharbi · Sep 25, 2021, 3:29 AM

Hi,
I want to compare two files and bookmark the lines containing similar words, for example:
file1.txt

Ahmed:12321
Ali:22432
Khalid:567643

file2.txt

Ahmed
Ali

I found a method that could be used here but the lines have to be identical for it to work.
Basically, you go to the bottom of file1, add a line containing #####, paste the contents of file2 below it, then press Ctrl+M and use the regex (?-s)^(.+\R)(?=(?s).*#####.*?\1) with the “Regular expression” search mode selected and the “Bookmark line” box checked, then click Mark All.
If you know regular expressions, please help me make it ignore whatever comes after the : and compare only what comes before it with the contents of file2.

PeterJones · Sep 25, 2021, 4:58 PM

@Bader-Alharbi said in I want to compare two files and bookmark the lines containing similar words:

(?-s)^(.+\R)(?=(?s).*#####.*?\1)

Change the (.+\R) to (.+?):.*?\R so that only the text before the colon is captured and compared; everything else should stay the same.
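
For anyone who wants to check the logic outside Notepad++, here is a rough Python analogue of that idea, offered as a sketch rather than an exact port. Python's re module does not support \R, so the sketch assumes plain \n line endings and uses [\s\S] where the Notepad++ regex switches to (?s). The colon is kept outside the capture group so that the captured name can match the colon-free entries of file2. The variable names are only illustrative.

```python
import re

# Sample data from the question: file1, then the ##### separator, then file2.
file1 = "Ahmed:12321\nAli:22432\nKhalid:567643\n"
file2 = "Ahmed\nAli\n"
combined = file1 + "#####\n" + file2

# Group 1 holds only the text before the first colon of a line.
# The lookahead then requires that same text somewhere after #####.
pattern = re.compile(r"^(.+?):.*\n(?=[\s\S]*#####[\s\S]*?\1)", re.MULTILINE)

# 1-based numbers of the lines that would receive a bookmark.
bookmarked = [
    combined.count("\n", 0, match.start()) + 1
    for match in pattern.finditer(combined)
]
print(bookmarked)  # [1, 2]
```

Note that the backreference matches the captured text anywhere after the separator, so a short name would also match inside a longer word of file2.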

guy038 · Sep 25, 2021, 7:11 PM

Hello, @bader-alharbi, @peterjones and All;

Here is a general solution which marks every word of File 1 ONLY IF this specific word is also present in File 2 :

MARK (?s-i)\b(\w+)\b(?=.+#####.+?\b\1\b)

So, for instance, from this initial text :

Ahmed:12321
12345,56789
Ali:22432
Khalid:567643
Alone sentence
Queen Elisabeth
This is a 789 test
ali
Mary Thompson
#####
Ahmed
Mary	789
Elisabeth
567643
test,a is:This

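You would obtain, after the Mark process, the text below, where the File 1 words Ahmed, 567643, Elisabeth, This, is, a, 789, test and Mary are marked :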

Ahmed:12321
12345,56789
Ali:22432
Khalid:567643
Alone sentence
Queen Elisabeth
This is a 789 test
ali
Mary Thompson
#####
Ahmed
Mary 789
Elisabeth
567643
test,a is:This
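
To see which words the Mark regex selects without relying on highlighting, the search can be approximated in Python. This is only a sketch: the (?s-i) prefix is expressed here with re.DOTALL plus Python's default case-sensitive matching, and the sample text is the one shown above.

```python
import re

# File 1, the ##### separator, then File 2, exactly as in the example above.
combined = (
    "Ahmed:12321\n12345,56789\nAli:22432\nKhalid:567643\nAlone sentence\n"
    "Queen Elisabeth\nThis is a 789 test\nali\nMary Thompson\n"
    "#####\n"
    "Ahmed\nMary\t789\nElisabeth\n567643\ntest,a is:This\n"
)

# A whole word is matched only if the same whole word occurs after #####.
pattern = re.compile(r"\b(\w+)\b(?=.+#####.+?\b\1\b)", re.DOTALL)

marked = [match.group(1) for match in pattern.finditer(combined)]
print(marked)
# ['Ahmed', '567643', 'Elisabeth', 'This', 'is', 'a', '789', 'test', 'Mary']
```

Words below the separator are never matched themselves, because no ##### follows them.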

Notes :

The words in File 2 can be in any order ;-)) I could have used :

#####
Mary	789 Ahmed
Elisabeth
567643
test
a
is!This

or even :

#####
Mary,a,789,is,Ahmed,Elisabeth,This,567643,test

The present search is sensitive to case. If you prefer to match identical words, whatever their case, change the beginning of the regex from (?s-i) to (?si).
By default, the part \b(\w+)\b looks for the greatest range of word characters, between 2 non-word chars. A word character is any single letter, accented letter, digit or the _ character. If you want to modify or add other characters to be considered as word characters, just go to Settings > Preferences... > Delimiter > Word character list
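
The first two notes (word order in File 2 and case sensitivity) can be checked with a small Python sketch. The helper name marked_words and its ignore_case parameter are hypothetical, and the shortened sample text is made up for this illustration.

```python
import re

def marked_words(text, ignore_case=False):
    """Return the words before ##### that also occur as whole words after it."""
    flags = re.DOTALL | (re.IGNORECASE if ignore_case else 0)
    pattern = re.compile(r"\b(\w+)\b(?=.+#####.+?\b\1\b)", flags)
    return [match.group(1) for match in pattern.finditer(text)]

file1 = "Queen Elisabeth\nThis is a 789 test\nali\n"

# Two layouts of the same File 2 words: one per line, or a single comma list.
one_per_line = file1 + "#####\nElisabeth\n789\ntest\nAli\n"
comma_list = file1 + "#####\nAli,test,789,Elisabeth\n"

print(marked_words(one_per_line))                    # ['Elisabeth', '789', 'test']
print(marked_words(comma_list))                      # ['Elisabeth', '789', 'test']
print(marked_words(one_per_line, ignore_case=True))  # ['Elisabeth', '789', 'test', 'ali']
```

With case ignored, ali in File 1 is matched against Ali in File 2, which corresponds to switching (?s-i) to (?si).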

Best Regards,

guy038

Bader Alharbi · Sep 25, 2021, 10:56 PM

Hi, @guy038
That worked perfectly for me. Thanks a lot.
I still have one more question: is there a regex to add the line number in multiple places in the line?
For example, I want to use it like this:
mkvmerge -o “line number”.mkv “line number”.mp4 “line number”.srt
mkvmerge -o “line number”.mkv “line number”.mp4 “line number”.srt
mkvmerge -o “line number”.mkv “line number”.mp4 “line number”.srt

I’m doing it now using the column editor but I’d like to use it in a macro and apply it to different files with a different number of lines.

Bader Alharbi · Sep 25, 2021, 11:08 PM

@PeterJones
I tried the one you posted, and it works too. Thank you.
Please let me know if you can help me with the other question I mentioned in my earlier reply. I would appreciate it.

PeterJones · Sep 25, 2021, 11:21 PM

@Bader-Alharbi said in I want to compare two files and bookmark the lines containing similar words:

is there a regex to add the line number in multiple places in the line?

Regular expressions cannot count (they have no concept of “increment a number”). Your two options inside Notepad++ are using the Column Editor like you’ve already discovered, or using a scripting plugin like PythonScript and using the full power of a programming language to influence the text in the open file. (I actually just answered a question earlier today on that same concept.)
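
As an illustration of what a script can do that a regex cannot, here is a minimal plain-Python sketch that fills a placeholder with each line's number. The function name fill_line_numbers and the {n} placeholder are hypothetical, and the sketch works on an ordinary string. Inside Notepad++ the same loop would have to read and write the editor buffer through the PythonScript plugin, which is not shown here.

```python
def fill_line_numbers(text, placeholder="{n}"):
    """Replace every placeholder on a line with that line's 1-based number."""
    lines = text.split("\n")
    return "\n".join(
        line.replace(placeholder, str(number))
        for number, line in enumerate(lines, start=1)
    )

# Three identical template lines, as in the question ({n} is an assumed token).
template = "\n".join(["mkvmerge -o {n}.mkv {n}.mp4 {n}.srt"] * 3)
numbered = fill_line_numbers(template)
print(numbered)
# mkvmerge -o 1.mkv 1.mp4 1.srt
# mkvmerge -o 2.mkv 2.mp4 2.srt
# mkvmerge -o 3.mkv 3.mp4 3.srt
```

Because the numbers come from enumerate, the same function works for files with any number of lines.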

apply it to different files with a different number of lines.

As linked in that other topic (and the links referenced there), you can make a macro that does the begin/end-select for column mode. If you combine that with other controls, like Ctrl+Home to go to the start of the file and Ctrl+End to go to the end, the macro can do the zeroth-column select for you. You would then manually type Alt+C to bring up the Column Editor and insert the numbers. Finally, you could use another regex (with multiple capture groups) to distribute the number from the beginning of each line to the various locations in the line where you need it.
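
The last step, distributing a leading number with capture groups, might look like the following Python sketch. It assumes the Column Editor has put an unpadded number directly in front of each line, and the pattern is written only for the mkvmerge example; both are assumptions made for illustration. In Notepad++ the same search expression could be tried in the Replace dialog with \2\1\3\1\4\1\5 as the replacement.

```python
import re

# Assumed state after the Column Editor inserted numbers at column zero.
numbered = "\n".join(
    f"{n}mkvmerge -o .mkv .mp4 .srt" for n in range(1, 4)
)

# Group 1 is the leading number; groups 2 to 5 are the fixed pieces of the line.
pattern = re.compile(r"^(\d+)(mkvmerge -o )(\.mkv )(\.mp4 )(\.srt)$", re.MULTILINE)

# Re-emit the pieces with the number copied in front of each extension.
distributed = pattern.sub(r"\2\1\3\1\4\1\5", numbered)
print(distributed)
# mkvmerge -o 1.mkv 1.mp4 1.srt
# mkvmerge -o 2.mkv 2.mp4 2.srt
# mkvmerge -o 3.mkv 3.mp4 3.srt
```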