    I want to compare two files and bookmark the lines containing similar words

    • Bader Alharbi

      Hi,
      I want to compare two files and bookmark the lines containing similar words, for example:
      file1.txt

      Ahmed:12321
      Ali:22432
      Khalid:567643

      file2.txt

      Ahmed
      Ali

      I found a method that could be used here but the lines have to be identical for it to work.
      Basically, go to the bottom of file1, add a line containing #####, then paste the contents of file2 below it. Press Ctrl+M to open the Mark dialog, enter the regex (?-s)^(.+\R)(?=(?s).*#####.*?\1), select the Regular expression search mode, check the “Bookmark line” box, and click Mark All.
      If you know regular expressions, please help me make it ignore whatever comes after the : and compare only what comes before it with the contents of file2.

      • PeterJones @Bader Alharbi

        @Bader-Alharbi said in I want to compare two files and bookmark the lines containing similar words:

        (?-s)^(.+\R)(?=(?s).*#####.*?\1)

        Change the (.+\R) to (.+?):.*?\R so that only the text before the : is captured and compared against file2; everything else should stay the same
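
        For anyone who wants to check the logic outside Notepad++, here is a minimal Python sketch of the same idea. It is a translation under assumptions, not the Notepad++ regex itself: Python's re module has no \R and no mid-pattern (?s) switch, so \n and a scoped (?s:...) group stand in for them, the capture group is assumed to stop before the :, and the helper name bookmarked_lines is hypothetical.

```python
import re

# Approximate Python translation of the Notepad++ search:
#   \n          stands in for \R
#   (?s:...)    stands in for switching (?s) on inside the lookahead
# Group 1 holds only the text before the first ":" on the line.
PATTERN = re.compile(r'^(.+?):.*\n(?=(?s:.*#####.*?)\1)', re.MULTILINE)

def bookmarked_lines(combined_text):
    """Return the 1-based numbers of the lines the search would bookmark."""
    return [combined_text.count("\n", 0, m.start()) + 1
            for m in PATTERN.finditer(combined_text)]

# file1, then the ##### separator, then file2, as described in the first post.
file1 = "Ahmed:12321\nAli:22432\nKhalid:567643\n"
file2 = "Ahmed\nAli\n"
print(bookmarked_lines(file1 + "#####\n" + file2))  # prints [1, 2]
```

        Note that the backreference is a plain substring test, so a file1 key such as Ali would also be found inside a longer file2 entry that merely contains it.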

          • guy038

          Hello, @bader-alharbi, @peterjones and All;

          Here is a general solution which marks every word of File 1 ONLY IF this specific word is also present in File 2 :

          MARK (?s-i)\b(\w+)\b(?=.+#####.+?\b\1\b)

          So, for instance, from this initial text :

          Ahmed:12321
          12345,56789
          Ali:22432
          Khalid:567643
          Alone sentence
          Queen Elisabeth
          This is a 789 test
          ali
          Mary Thompson
          #####
          Ahmed
          Mary	789
          Elisabeth
          567643
          test,a is:This
          

          After the Mark process, the words Ahmed, 567643, Elisabeth, This, is, a, 789, test and Mary of File 1 would be marked :

          Ahmed:12321
          12345,56789
          Ali:22432
          Khalid:567643
          Alone sentence
          Queen Elisabeth
          This is a 789 test
          ali
          Mary Thompson
          #####
          Ahmed
          Mary 789
          Elisabeth
          567643
          test,a is:This
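
          The marking behaviour can also be reproduced outside Notepad++ with a short Python sketch. This is only an approximation under assumptions: Python's re module does not accept the leading (?s-i), so re.DOTALL plays the role of (?s) and case sensitivity is Python's default; the helper name marked_words is hypothetical.

```python
import re

# Same idea as the Mark regex: a whole word that also occurs,
# as a whole word, somewhere after the ##### separator.
WORD_IN_BOTH = re.compile(r'\b(\w+)\b(?=.+#####.+?\b\1\b)', re.DOTALL)

def marked_words(combined_text):
    """Return, in order, the words of File 1 that would be marked."""
    return [m.group(1) for m in WORD_IN_BOTH.finditer(combined_text)]

sample = "\n".join([
    "Ahmed:12321",
    "12345,56789",
    "Ali:22432",
    "Khalid:567643",
    "Alone sentence",
    "Queen Elisabeth",
    "This is a 789 test",
    "ali",
    "Mary Thompson",
    "#####",
    "Ahmed",
    "Mary\t789",
    "Elisabeth",
    "567643",
    "test,a is:This",
])

print(marked_words(sample))
# prints ['Ahmed', '567643', 'Elisabeth', 'This', 'is', 'a', '789', 'test', 'Mary']
```

          Words located after the ##### line are never marked, because the lookahead needs a ##### somewhere after the candidate word.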


          Notes :

          • The words in File 2 can be in any order ;-)) I could have used :
          #####
          Mary	789 Ahmed
          Elisabeth
          567643
          test
          a
          is!This
          

          or even :

          #####
          Mary,a,789,is,Ahmed,Elisabeth,This,567643,test
          

          • The present search is sensitive to case. If you prefer to search identical words, whatever their case, change the beginning of the regex from (?s-i) to (?si)

          • By default, the part \b(\w+)\b looks for the longest run of word characters between two non-word characters. A word character is any single letter, accented letter, digit or the _ character. If you want other characters to be treated as word characters, just go to Settings > Preferences... > Delimiter > Word character list

          Best Regards,

          guy038

            • Bader Alharbi

            Hi, @guy038
            That worked perfectly for me. Thanks a lot.
            I still have one more question: is there a regex to add the line number in multiple places in the line?
            For example, if I want to use it like this:
            mkvmerge -o “line number”.mkv “line number”.mp4 “line number”.srt
            mkvmerge -o “line number”.mkv “line number”.mp4 “line number”.srt
            mkvmerge -o “line number”.mkv “line number”.mp4 “line number”.srt

            I’m doing it now using the column editor but I’d like to use it in a macro and apply it to different files with a different number of lines.

              • Bader Alharbi @PeterJones

              @PeterJones
              I tried the one you posted, and it works too. Thank you.
              Please let me know if you can help me with the other question I mentioned in my earlier reply. I would appreciate it.

                • PeterJones @Bader Alharbi

                @Bader-Alharbi said in I want to compare two files and bookmark the lines containing similar words:

                is there a regex to add the line number in multiple places in the line?

                Regular expressions cannot count (they have no concept of “increment a number”). Your two options inside Notepad++ are using the Column Editor like you’ve already discovered, or using a scripting plugin like PythonScript and using the full power of a programming language to influence the text in the open file. (I actually just answered a question earlier today on that same concept.)
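
                To give an idea of what the scripting route involves, here is a minimal sketch in plain Python. It deliberately does not use the PythonScript editor API, so it would still need adapting to run inside Notepad++. The {n} placeholder and the function name fill_line_numbers are hypothetical choices made for this illustration.

```python
def fill_line_numbers(text, placeholder="{n}"):
    """Replace every placeholder on each line with that line's 1-based number."""
    return "\n".join(
        line.replace(placeholder, str(number))
        for number, line in enumerate(text.splitlines(), start=1)
    )

# Three identical template lines; {n} marks where the line number goes.
template = "mkvmerge -o {n}.mkv {n}.mp4 {n}.srt"
print(fill_line_numbers("\n".join([template] * 3)))
# prints:
# mkvmerge -o 1.mkv 1.mp4 1.srt
# mkvmerge -o 2.mkv 2.mp4 2.srt
# mkvmerge -o 3.mkv 3.mp4 3.srt
```

                Because the counting is done by enumerate rather than by the regex engine, the same script works for a file with any number of lines.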

                apply it to different files with a different number of lines.

                As linked in that other topic (and the links referenced there), you can make a macro that does the begin/end-select for column mode. If you combine that with other controls, like Ctrl+Home to go to the start of the file and Ctrl+End to go to the end, you could have a macro that does the zeroth-column select. You would then manually type Alt+C to bring up the Column Editor and insert the numbers. After that, you could use another regex (using multiple capture groups) to distribute the number from the beginning of the line to the various locations throughout the line that you need.
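
                The final "distribute the number" step can be sketched as follows, written in Python so it can be tried outside Notepad++. It assumes the Column Editor has already put the number at the very start of each line, and that each line otherwise looks like mkvmerge -o .mkv .mp4 .srt; that layout and the name distribute_numbers are assumptions made for this illustration.

```python
import re

# Group 1 is the number inserted at column 0; groups 2 to 5 are the
# pieces of the line that the number must be placed between.
DISTRIBUTE = re.compile(
    r'^(\d+)(mkvmerge -o )(\.mkv )(\.mp4 )(\.srt)$', re.MULTILINE
)

def distribute_numbers(text):
    """Move the leading number of each line in front of each extension."""
    return DISTRIBUTE.sub(r'\2\g<1>\3\g<1>\4\g<1>\5', text)

numbered = "1mkvmerge -o .mkv .mp4 .srt\n2mkvmerge -o .mkv .mp4 .srt"
print(distribute_numbers(numbered))
# prints:
# mkvmerge -o 1.mkv 1.mp4 1.srt
# mkvmerge -o 2.mkv 2.mp4 2.srt
```

                The same capture-group layout should carry over to a Notepad++ Replace operation, with the replacement written in Notepad++'s own backreference syntax.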

                The Community of users of the Notepad++ text editor.