I want to compare two files and bookmark the lines containing similar words



  • Hi,
    I want to compare two files and bookmark the lines containing similar words, for example:
    file1.txt

    Ahmed:12321
    Ali:22432
    Khalid:567643

    file2.txt

    Ahmed
    Ali

    I found a method that could be used here but the lines have to be identical for it to work.
    Basically, you go to the bottom of file1, add a line containing #####, paste the contents of file2 below it, press Ctrl+M, and use the regex (?-s)^(.+\R)(?=(?s).*#####.*?\1) with the search mode set to regular expressions and the “bookmark line” box checked, then click Mark All.
    If you know regular expressions, please help me make it ignore whatever comes after the : and compare only what comes before it with the contents of file2.



  • @Bader-Alharbi said in I want to compare two files and bookmark the lines containing similar words:

    (?-s)^(.+\R)(?=(?s).*#####.*?\1)

    Change the (.+\R) to (.+?):.*?\R so that group 1 captures only the text before the colon – everything else should stay the same
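To see what that change does outside Notepad++, here is a minimal Python sketch of the same idea. It is only an approximation: Python's `re` module has no `\R`, so `\r?\n` stands in for it, and the scoped flag `(?s:...)` replaces the inline `(?s)` switch. The variable names are made up for the illustration.

```python
import re

# file1, a ##### separator line, then file2, as in the question.
text = (
    "Ahmed:12321\n"
    "Ali:22432\n"
    "Khalid:567643\n"
    "#####\n"
    "Ahmed\n"
    "Ali\n"
)

# Group 1 is the text before the first colon on a line. The lookahead
# requires that same text to appear somewhere after the ##### separator.
pattern = re.compile(
    r"^(.+?):.*\r?\n(?=(?s:.*)#####(?s:.*?)\1)",
    re.MULTILINE,
)

# 1-based numbers of the lines that would be bookmarked.
bookmarked = [text.count("\n", 0, m.start()) + 1 for m in pattern.finditer(text)]
print(bookmarked)  # prints [1, 2]
```

Note that the backreference is a plain substring match, so a file2 entry such as `Alibaba` would also cause the `Ali:` line to be bookmarked.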




  • Hello, @bader-alharbi, @peterjones and All;

    Here is a general solution which marks every word of File 1 ONLY IF this specific word is also present in File 2 :

    MARK (?s-i)\b(\w+)\b(?=.+#####.+?\b\1\b)

    So, for instance, from this initial text :

    Ahmed:12321
    12345,56789
    Ali:22432
    Khalid:567643
    Alone sentence
    Queen Elisabeth
    This is a 789 test
    ali
    Mary Thompson
    #####
    Ahmed
    Mary	789
    Elisabeth
    567643
    test,a is:This
    

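    You would obtain, after the Mark process, the text below, in which the File 1 words Ahmed, 567643, Elisabeth, This, is, a, 789, test and Mary are marked :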

    Ahmed:12321
    12345,56789
    Ali:22432
    Khalid:567643
    Alone sentence
    Queen Elisabeth
    This is a 789 test
    ali
    Mary Thompson
    #####
    Ahmed
    Mary 789
    Elisabeth
    567643
    test,a is:This
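The same regex can be tried in Python to list which words it would mark. This is a rough sketch, assuming that Python's `re` treats `\b`, `\w` and the backreference inside the lookahead closely enough to Notepad++'s engine for this sample; the variable names are illustrative only.

```python
import re

# File 1, the ##### separator, then File 2 (same sample as above).
text = """Ahmed:12321
12345,56789
Ali:22432
Khalid:567643
Alone sentence
Queen Elisabeth
This is a 789 test
ali
Mary Thompson
#####
Ahmed
Mary\t789
Elisabeth
567643
test,a is:This
"""

# (?s) lets . cross line breaks; matching is case-sensitive by default,
# which corresponds to the -i part of the original (?s-i).
pattern = re.compile(r"(?s)\b(\w+)\b(?=.+#####.+?\b\1\b)")

# Words of File 1 that also occur, as whole words, after the separator.
marked = [m.group(1) for m in pattern.finditer(text)]
print(marked)
```

Words such as `Ali`, `ali` and `56789` are not in the list: File 2 contains no `Ali` in any case, and `789` only counts as a whole word, not as the tail of `56789`.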


    Notes :

    • The words in File 2 can be in any order ;-)) I could have used :
    #####
    Mary	789 Ahmed
    Elisabeth
    567643
    test
    a
    is!This
    

    or even :

    #####
    Mary,a,789,is,Ahmed,Elisabeth,This,567643,test
    

    • This search is case-sensitive. If you prefer to match identical words regardless of their case, change the beginning of the regex from (?s-i) to (?si)

    • By default, the part \b(\w+)\b looks for the longest run of word characters between two non-word characters. A word character is any single letter, accented letter, digit or the _ character. If you want to change which characters are considered part of a word, go to Settings > Preferences... > Delimiter > Word character list

    Best Regards,

    guy038



  • Hi, @guy038
    That worked perfectly for me. Thanks a lot.
    I still have one more question: is there a regex to add the line number in multiple places in the line?
    For example, if I want to use it like this
    mkvmerge -o “line number”.mkv “line number”.mp4 “line number”.srt
    mkvmerge -o “line number”.mkv “line number”.mp4 “line number”.srt
    mkvmerge -o “line number”.mkv “line number”.mp4 “line number”.srt

    I’m doing it now using the column editor but I’d like to use it in a macro and apply it to different files with a different number of lines.



  • @PeterJones
    I tried the one you posted, and it works too. Thank you.
    Please let me know if you can help me with the other question I mentioned in my earlier reply. I would appreciate it.



  • @Bader-Alharbi said in I want to compare two files and bookmark the lines containing similar words:

    is there a regex to add the line number in multiple places in the line?

    Regular expressions cannot count (they have no concept of “increment a number”). Your two options inside Notepad++ are using the Column Editor like you’ve already discovered, or using a scripting plugin like PythonScript and using the full power of a programming language to influence the text in the open file. (I actually just answered a question earlier today on that same concept.)
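As a rough illustration of the scripting route, here is a plain-Python sketch that fills a placeholder with the line number on every line. The `{n}` placeholder and the function name `fill_line_numbers` are invented for this example. Inside PythonScript the text would presumably be read from and written back to the open document through the plugin's `editor` object rather than a string literal, which is an assumption not shown here.

```python
def fill_line_numbers(text, placeholder="{n}", start=1):
    """Replace every placeholder on each line with that line's number."""
    lines = text.splitlines()
    return "\n".join(
        line.replace(placeholder, str(number))
        for number, line in enumerate(lines, start)
    )


# Hypothetical template: {n} marks each spot where the line number goes.
template = "mkvmerge -o {n}.mkv {n}.mp4 {n}.srt"
result = fill_line_numbers("\n".join([template] * 3))
print(result)
```

Because the number comes from `enumerate`, the same script works unchanged on files with any number of lines.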

    apply it to different files with a different number of lines.

    As linked in that other topic (and the links referenced there), you can make a macro that does the begin/end-select for column mode. If you combine that with other controls, like Ctrl+Home to go to the start of the file and Ctrl+End to go to the end, the macro can do the zeroth-column select for you. You then manually type Alt+C to bring up the Column Editor and insert the numbers. Finally, you could use another regex (using multiple capture groups) to distribute the number from the beginning of the line to the various locations throughout the line where you need it.
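The last step, distributing a number from the start of each line with capture groups, can be sketched in Python as follows. The sample lines assume the Column Editor has already put the number in column zero and that the rest of each line looks like the mkvmerge command from the question without its numbers. The replacement uses Python's `\g<1>` syntax, whereas in Notepad++ the groups would be written as `$1` or `\1`.

```python
import re

# Lines as they might look after the Column Editor inserted the numbers.
numbered = "\n".join(
    f"{n}mkvmerge -o .mkv .mp4 .srt" for n in (1, 2, 10)
)

# Group 1 is the leading number; groups 2-5 are the fixed pieces of the
# command between which the number has to be repeated.
pattern = re.compile(
    r"^(\d+)(mkvmerge -o )(\.mkv )(\.mp4 )(\.srt)$",
    re.MULTILINE,
)

distributed = pattern.sub(r"\g<2>\g<1>\g<3>\g<1>\g<4>\g<1>\g<5>", numbered)
print(distributed)
```

The number is captured once and reused three times in the replacement, which is the part a regex can do even though it cannot generate the numbers itself.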

