Community


    Merge 2 Files - Lines Containing Same

    • Shesh Nioice
      Shesh Nioice last edited by

      Hello, I need to merge 2 different text files so that lines containing the same hash end up on one line.

      Example

      Text File Nr 1

      Test0 000027e23c421aeec283c0b491adb97a
      Test1 0000660f57cad07bc2d56a752ab1b051
      Test2 0000f78a8b2c0a5d71c1f651f7959bab
      Test3 0001034369cf3f54df7b537b9459d978

      Text File Nr 2
      4424 000027e23c421aeec283c0b491adb97a
      5678 0000660f57cad07bc2d56a752ab1b051
      9101 0000f78a8b2c0a5d71c1f651f7959bab
      3442 0001034369cf3f54df7b537b9459d978

      Result Should be


      Test0 000027e23c421aeec283c0b491adb97a 4424 (If more matches ->here all other text)
      Test1 0000660f57cad07bc2d56a752ab1b051 5678
      Test2 0000f78a8b2c0a5d71c1f651f7959bab 9101
      Test3 0001034369cf3f54df7b537b9459d978 3442


      It should match on the hash, which is always 32 characters long.

      Thank you for your help

      • Terry R
        Terry R last edited by

        @Shesh-Nioice said in Merge 2 Files - Lines Containing Same:

        merge 2 different textfiles and lines containing the same should be in one line

        If you were able to change the order of each line such that the data was first followed by the “Test0” and “4424” portions, like this:
        000027e23c421aeec283c0b491adb97a Test0
        then the data could be merged and then sorted. The same data would then appear together on consecutive lines. Then it is a simple matter to process them into the format you wish. However, if the lines need to remain in the same order (Test0, Test1, Test2 etc.), then it is still possible but it will require more steps.

        Terry

        • Terry R
          Terry R last edited by Terry R

          @Terry-R said in Merge 2 Files - Lines Containing Same:

          However, if the lines need to remain in the same order

          Actually, just thinking a bit more: if the processed lines need to end up in the same order as the original “Text File Nr 1”, and that file is already in alphabetical order (as represented by Test0, Test1, Test2, etc.), then no extra steps are necessary. The final formatting of the data (putting “Test0”, “Test1” etc. back at the start of the line) allows one last sort that returns the lines to the correct order.

          So:

          1. is the final order of lines important or not?
          2. is the original order in Text File Nr 1 already in order (alphabetical, numerical?)?

          Terry

          • Shesh Nioice
            Shesh Nioice @Terry R last edited by

            @Terry-R Hey Terry thank you for your reply

            The format can be changed to:
            000027e23c421aeec283c0b491adb97a Test0
            for both files; this shouldn’t be a problem.

            The final order of the lines is not important, and the text in the files is not sorted.

            What matters to me is that text containing the same hash is merged in the end result.
            Sorting by hash doesn’t help me because file nr 1 has fewer lines than file nr 2.

            So sorting would not help, because if I sorted them and put the lines together they wouldn’t match up.
            Thank you.

            • Terry R
              Terry R last edited by

              @Shesh-Nioice said in Merge 2 Files - Lines Containing Same:

              So sorting would not help because if I would sort them and put the lines together it wouldn’t match

              True if you are referring to the original line makeup (Test0 at the front). If however the makeup is changed (you just suggested it is OK to do so and put the Test0 and 4424 at the end) then a sort WILL match Hash data.

              So if you like I can mock up some steps and regular expressions to do most of the work for you. You just need to press some buttons and do some minor key entry work.

              Just as a matter of interest, how many lines does each of the files contain?

              Terry

              • Shesh Nioice
                Shesh Nioice @Terry R last edited by

                @Terry-R This would be a huge help for me, if you have free time left. Thank you.
                File 1 contains 28k lines
                File 2 contains 900k lines

                I only need the 28k matching lines; the rest can be deleted.
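For comparison, outside Notepad++ the same requirement could be sketched as a keyed lookup in Python. This is not the method worked out in this thread, just a rough alternative. The function name `merge_by_hash` is made up for illustration, and the sketch assumes every line has exactly two whitespace-separated fields with the hash second, as in the example data.

```python
from collections import defaultdict


def merge_by_hash(file1_lines, file2_lines):
    """Append every file-2 value to the file-1 line sharing its hash.

    Assumes "<name> <hash>" lines in both inputs (hypothetical layout,
    taken from the example data). File-1 lines without a match are dropped.
    """
    # Index the larger file by hash; one hash may carry several values.
    extras = defaultdict(list)
    for line in file2_lines:
        parts = line.split()
        if len(parts) == 2:
            value, digest = parts
            extras[digest].append(value)

    merged = []
    for line in file1_lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        name, digest = parts
        if digest in extras:
            merged.append(" ".join([name, digest, *extras[digest]]))
    return merged


file1 = [
    "Test0 000027e23c421aeec283c0b491adb97a",
    "Test1 0000660f57cad07bc2d56a752ab1b051",
]
file2 = [
    "4424 000027e23c421aeec283c0b491adb97a",
    "5678 0000660f57cad07bc2d56a752ab1b051",
]
for row in merge_by_hash(file1, file2):
    print(row)
```

With files of 28k and 900k lines, reading each file into a list of lines first should be workable on most machines, though that is an assumption rather than something measured here.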

                • Terry R
                  Terry R last edited by Terry R

                  @Shesh-Nioice said in Merge 2 Files - Lines Containing Same:

                  This would be a huge help for me.

                  First off, combine the 2 files, so copy the contents of one file into the other file, doesn’t matter where. I’m assuming “Test0” and “4424” are real representations of the “names” at the start of the lines.

                  Every regex (regular expression) below requires that the “search mode” is set to regular expression, VERY IMPORTANT!

                  Second, we need to reformat the lines so the “Test0” etc. is at the end of the line. So use the following regex in the “Replace” function.
                  Find What: (?-s)^(\w+)(\s)(\w+)$
                  Replace With: \3\2\1
                  Hit the “Replace All” button
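For readers who want to try this step outside Notepad++, a rough Python analogue might look like the sketch below. The name `swap_columns` is hypothetical. Python's `re` module is not the engine Notepad++ uses, so the leading `(?-s)` is dropped (in Python `.` already stops at newlines by default) and `re.MULTILINE` is used so that `^` and `$` match at each line.

```python
import re

# Same three groups as the Find What expression: word, one whitespace, word.
SWAP = re.compile(r"^(\w+)(\s)(\w+)$", re.MULTILINE)


def swap_columns(text: str) -> str:
    """Move the first field of each two-field line to the end."""
    return SWAP.sub(r"\3\2\1", text)


sample = (
    "Test0 000027e23c421aeec283c0b491adb97a\n"
    "4424 000027e23c421aeec283c0b491adb97a\n"
)
print(swap_columns(sample))
```

Lines that do not consist of exactly two fields are left untouched by this sketch.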

                  Now we need to sort the lines so that lines with the same Hash end up next to each other. Use the built-in function under the “Edit” menu, then “Line Operations”, then “Sort Lines Lexicographically Descending” (this puts Test0, Test1 etc. first within any group of duplicates).
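To see why a descending sort puts the “Test” line first within each hash, here is a small Python sketch. It assumes a plain code-point comparison behaves like the Notepad++ sort for this ASCII data; `sort_descending` is a made-up name.

```python
def sort_descending(lines):
    """Sort lines from highest to lowest by code point.

    Within one hash, "T" sorts above any digit, so the "TestN" line
    comes before the numeric line when the order is descending.
    """
    return sorted(lines, reverse=True)


lines = [
    "0000660f57cad07bc2d56a752ab1b051 5678",
    "000027e23c421aeec283c0b491adb97a Test0",
    "000027e23c421aeec283c0b491adb97a 4424",
    "0000660f57cad07bc2d56a752ab1b051 Test1",
]
for line in sort_descending(lines):
    print(line)
```

This only holds when the names really do sort above the numeric values, as they do in the example data.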

                  Next we combine lines when the Hash is the same. So again use the “Replace” function.
                  Find What: (?-s)^(\w+)(\s.+)(\R)\1(\s.+)(\R|\z)
                  Replace With: \1\2\4\3
                  Hit the “Replace All” button
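A hedged Python sketch of the same merge idea follows. It is an adaptation rather than a literal port: it uses `\n` in place of `\R`, and `merge_adjacent` is a hypothetical name. One substitution pass joins lines pairwise, so the sketch repeats the pass until nothing changes in order to handle a hash that occurs three or more times. Whether “Replace All” needs repeating in that case is not stated in this post.

```python
import re

# A line, a newline, then a line starting with the same first word.
MERGE = re.compile(r"^(\w+)(\s.+)\n\1(\s.+)$", re.MULTILINE)


def merge_adjacent(text: str) -> str:
    """Join consecutive lines that begin with the same hash."""
    while True:
        text, count = MERGE.subn(r"\1\2\3", text)
        if count == 0:
            return text


h3 = "0001034369cf3f54df7b537b9459d978"
h2 = "0000f78a8b2c0a5d71c1f651f7959bab"
sorted_text = (
    f"{h3} Test3\n{h3} 3678\n{h3} 3442\n"
    f"{h2} Test2\n{h2} 9101\n"
)
print(merge_adjacent(sorted_text))
```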

                  So now that the lines are combined we need to bring the “Test0”, “Test1” etc. to the front. So again use the “Replace” function:
                  Find What: (?-s)(\w+)(\s)(\w+)(.+)*$
                  Replace With: \3\2\1\4
                  Hit the “Replace All” button
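As a rough Python counterpart to this last swap, the sketch below uses `(.*)` where the expression above has `(.+)*`, and anchors with `^` under `re.MULTILINE`. Both are small adaptations for Python's `re`, and `swap_back` is a made-up name.

```python
import re

# hash, one whitespace, name, then whatever values were appended.
SWAP_BACK = re.compile(r"^(\w+)(\s)(\w+)(.*)$", re.MULTILINE)


def swap_back(text: str) -> str:
    """Put the second field in front of the hash, keeping the tail."""
    return SWAP_BACK.sub(r"\3\2\1\4", text)


merged = (
    "0001034369cf3f54df7b537b9459d978 Test3 3678 3442\n"
    "0000f78a8b2c0a5d71c1f651f7959bab 9101\n"
)
print(swap_back(merged))
```

Note that lines which were never merged get swapped back as well, which is what the later clean-up step relies on.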

                  At this point I have (given your example data plus 1 additional line to form a three-way match):

                  Test3 0001034369cf3f54df7b537b9459d978 3442 3678
                  Test2 0000f78a8b2c0a5d71c1f651f7959bab 9101
                  Test1 0000660f57cad07bc2d56a752ab1b051 5678
                  Test0 000027e23c421aeec283c0b491adb97a 4424
                  

                  I see your late additional step about removing any lines which were NOT duplicated. I can supply an additional step shortly but wanted to give you what I was working on while waiting for your reply.

                  I hope this works for you. It does rely heavily on your example data being a “real” representation of the data you are working with. If it is not then you need to portray the real data, or at least identify why my processes failed. We can then work on any changes to help get you the answer you seek.

                  Terry

                  • Terry R
                    Terry R last edited by

                    @Terry-R said in Merge 2 Files - Lines Containing Same:

                    I see your late additional step about removing any lines which were NOT duplicated. I can supply an additional step shortly

                    So to remove the NON-duplicated data lines I’ve used the “Mark” function this time. It can also be done with a “Replace” function, but this will give you more insight into the NPP functions and how they can help you with various tasks.

                    The Mark function is under the “Search” menu, below the “Replace” option. Don’t select the “Mark All” menu entry; that’s a different option again.
                    In the Mark dialog, insert the regex:
                    Find What: (?-s)^(\w+)\s(\w+)$
                    Make sure “Bookmark Lines” is ticked and the search mode is set to “regular expression”, then click the “Mark All” button in the dialog. Now close this window and you will see that some of the lines are marked with a blue dot at the start (default icon). These are the lines with only two fields, i.e. the ones that were never merged, so remove them with “Remove Bookmarked Lines”, which is under “Search”, then “Bookmark”.
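The bookmark-and-remove step amounts to filtering out every line that has exactly two fields. A minimal Python sketch of that filter, with the hypothetical name `drop_unmerged`, could be:

```python
import re

# Exactly two words separated by one whitespace character.
TWO_FIELDS = re.compile(r"\w+\s\w+")


def drop_unmerged(lines):
    """Keep only lines with more than two fields (the merged ones)."""
    return [line for line in lines if not TWO_FIELDS.fullmatch(line)]


rows = [
    "Test3 0001034369cf3f54df7b537b9459d978 3442 3678",
    "7777 00aa34369cf3f54df7b537b9459d9abc",
    "Test2 0000f78a8b2c0a5d71c1f651f7959bab 9101",
]
for row in drop_unmerged(rows):
    print(row)
```

The second row in the sample uses an invented hash purely to show an unmatched line being dropped.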

                    Terry

                    • Shesh Nioice
                      Shesh Nioice last edited by

                      Hey Terry, again I really have no words; everything worked as expected.
                      It saved a lot of time for me. Big thanks, great community forum, and I should start learning regular expressions; very interesting and helpful.

                      Thank you a lot for taking the time to help me. :)
                      Great work, nice to see that you take time for noobs like me.
                      How can I give you reputation or positive feedback? I’m new here.

                      • Terry R
                        Terry R last edited by

                        @Shesh-Nioice said in Merge 2 Files - Lines Containing Same:

                        How can I give you reputation or positive feedback? I’m new here.

                        Well, I see you’ve already up-voted my solution and that’s all we need to see. It is nice when posters do come back and give us feedback (of any kind). I’m glad it worked out so well on the first go. We do sometimes have to adjust if the examples provided did not give a true representation and I was concerned your “Test0”, “Test1” etc might have fallen into that.

                        Thanks
                        Terry

                        • Shesh Nioice
                          Shesh Nioice @Terry R last edited by

                          @Terry-R Yes, I was surprised myself. I first tested whether the regex matched my case at https://regex101.com/
                          and it did. Everything worked on the first try, big thanks again.

                          • Terry R
                            Terry R @Shesh Nioice last edited by

                            @Shesh-Nioice said in Merge 2 Files - Lines Containing Same:

                            I should start learning regular expressions; very interesting and helpful.

                            To learn about regexes (regular expressions) start with our FAQ section. There are a number of helpful links to sites (I see you already found one, regex101.com). There is the manual for NPP which is on the NPP homepage. Other references are also linked within the FAQ posts.

                            I was where you are now not too long ago, but I too found the regulars here were very helpful to me and now I pass that forward. As always we strive not only to help users, but to guide them so they can gain more knowledge. Regexes are brilliant but don’t forget to start small at first as it can be quite daunting to attempt to read some of the more complex regexes that are provided on this forum. The beauty of regex101.com is that it does explain what the regex is doing so that will give you some insight.

                            Cheers
                            Terry

                            Copyright © 2014 NodeBB Forums | Contributors