    Comparing two txt files. Finding differences.

    • Scott Sumner @Rafal Jonca

      @Rafal-Jonca

      Don’t think in terms of “comparing” the files. Although that can be made to work, there is an easier way.

      Try the following:

      Combine the contents of the two files into one file, in the order you’ve shown them (“A” first at the top of the new file, “B” at the bottom of the new file).

      Invoke the Mark… feature (Search menu) and set up the following:

      Find what zone: ([\w-]+\.LAZ)(?s)(?=.*?^\1)
      Mark line checkbox: ticked
      Wrap around checkbox: ticked
      Search mode radio-button: Regular expression

      Press the Mark All button.

      This will highlight in red and will bookmark all of the occurrences of the files that you have already downloaded. It is a simple matter from there to delete the bookmarked lines (Search (menu) -> Bookmark -> Remove Bookmarked Lines) to get the list of URLs yet to download.
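For anyone who would rather script this than use the Mark dialog, here is a minimal Python sketch of the same idea. It assumes list “A” holds URLs ending in a xxxx.LAZ file name and list “B” holds one downloaded file name at the start of each line. The function name and the example.com URLs are hypothetical, and Python's `re` engine is not the one Notepad++ uses, so treat it as an illustration rather than an exact equivalent.

```python
import re

# Same pattern as in the Mark dialog. In Python, MULTILINE is needed so that ^
# matches at the start of every line, and DOTALL stands in for the inline (?s).
PATTERN = re.compile(r"([\w-]+\.LAZ)(?=.*?^\1)", re.DOTALL | re.MULTILINE)


def urls_still_to_download(list_a: str, list_b: str) -> list[str]:
    """Return the lines of list A whose file name does not start a later line."""
    a_text = list_a.rstrip("\n")
    combined = a_text + "\n" + list_b          # "A" on top, "B" at the bottom
    a_line_count = len(a_text.split("\n"))

    # Line index of every match: the scripted version of "Mark line".
    marked = {combined.count("\n", 0, m.start()) for m in PATTERN.finditer(combined)}

    # Scripted version of "Remove Bookmarked Lines", restricted to list A.
    a_lines = combined.split("\n")[:a_line_count]
    return [line for i, line in enumerate(a_lines) if i not in marked]


if __name__ == "__main__":
    a = (
        "http://example.com/data/tile-001.LAZ\n"
        "http://example.com/data/tile-002.LAZ\n"
        "http://example.com/data/tile-003.LAZ\n"
    )
    b = "tile-001.LAZ\ntile-003.LAZ\n"
    print(urls_still_to_download(a, b))
    # prints ['http://example.com/data/tile-002.LAZ']
```

Unlike the editor procedure, this sketch returns only the unmarked lines from list “A” and discards list “B”.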

      If this (or ANY posting on the Notepad++ Community site) is useful, don’t reply with a “thanks”, simply up-vote ( click the ^ in the ^ 0 v area on the right ).

      Sample of the marking:

      [Image hosted on Imgur]

      • Rafal Jonca

        Hmm, it works excellently with all sets, with the exception of one set of URLs.

        Every time, it gets to line 1013 and, as you can see, marks everything below it red.

        https://www.sendspace.com/file/3ytn9k

        • Scott Sumner @Rafal Jonca

          @Rafal-Jonca

          Hmmm…well, I’m not opposed to using new (to me) hosting sites, but sendspace thinks I’m going to give it a credit card number that it “won’t charge”, so, ah, No, sorry… Suggest putting your file on a different hosting site (e.g. http://textuploader.com/) and I’ll have a look.

          There was some discussion in another thread about this general technique causing all the text in the document to be redmarked, so I guess I’m now starting to question this technique, or at least my usage of it (maybe the regular expression is not restrictive enough).

          • Rafal Jonca

            ??? Sendspace is free for everyone?

            OK, I know how to use imgur now. It looks like this:

            https://imgur.com/a/Afvof

            • Scott Sumner @Rafal Jonca

              @Rafal-Jonca

              Okay, I guess I did the wrong thing on sendspace…oops. :-)

              I see the redmarking but to diagnose further I think I need the WHOLE file if you can share it as TEXT, not an image…

              • Rafal Jonca

                I think the suffix “000.LAZ” is causing my problems :) It is different on line 1014.

                I will check it carefully and let you know.

                • Rafal Jotski

                  Yes, I confirm. The URLs with 000.LAZ were causing the problems.

                  I had later been converting the URLs into <a href="http_ form, and I had broken links at these points. As a result, all the ones with 000.LAZ were left out.

                  So, your method helped me to find the error spots :) It works excellently now.

                  Could I ask you for a detailed explanation of how ([\w-]+\.LAZ)(?s)(?=.*?^\1) works?

                  • Scott Sumner @Rafal Jotski

                    @Rafal-Jotski

                    Could I ask you for detailed explanation…

                    Sure.

                    Look for any string of one or more word characters (defined as A-Z, a-z, 0-9, or _) or -, followed by .LAZ. The wrapping parentheses cause the matching string to be remembered as capture group #1. The (?s) means that any following . characters in the expression can match across line boundaries (usually a line boundary will stop the match possibility). Next comes a partial expression that starts with (?=.*? and ends a bit later with ). This is merely an assertion that whatever is inside occurs at some point later in the document. In this case what is inside that wrapper is a ^ which means “start of a line”, followed by \1 which is the same text as matched earlier (your xxxx.LAZ).

                    Since what occurs inside the (?= and ) is just an assertion it must match but does not contribute to the match, thus it isn’t colored red.

                    I think this may be fairly easy to understand, but maybe not to write from the ground up, and it definitely isn’t easy to describe as per the above. I hope this helps in some way…
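The pieces described above can also be poked at one by one in a script. This is a rough Python sketch, not Notepad++ itself: the three-line sample text is made up, and Python needs the MULTILINE flag for ^ to mean “start of a line” and the DOTALL flag in place of the inline (?s).

```python
import re

# Hypothetical sample: the first file name appears again at the start of line 3.
text = "alpha-1.LAZ\nbeta_2.LAZ\nalpha-1.LAZ\n"

pattern = re.compile(r"([\w-]+\.LAZ)(?=.*?^\1)", re.DOTALL | re.MULTILINE)

first = pattern.search(text)
captured = first.group(1)   # what capture group #1 remembered
span = first.span()         # covers only the file name; the lookahead adds nothing

# Without DOTALL the . cannot cross a line boundary, so the assertion never
# reaches the later line and nothing matches.
no_dotall = re.search(r"([\w-]+\.LAZ)(?=.*?^\1)", text, re.MULTILINE)

# Only the first alpha-1.LAZ has a copy at the start of a LATER line.
all_matches = [m.group(1) for m in pattern.finditer(text)]

print(captured, span, no_dotall, all_matches)
# prints alpha-1.LAZ (0, 11) None ['alpha-1.LAZ']
```

The span ending at 11 is the scripted counterpart of the lookahead text not being colored red: the assertion has to hold, but it is not part of the match.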

                      • chcg

                        https://github.com/pnedev/compare-plugin might help you with simple ordered file lists, as might standalone diff programs such as kdiff3, winmerge, …
