Concatenate corresponding lines from two files

  • J
    Jeff Caldwell
    last edited by Aug 21, 2017, 12:10 AM

    Let’s say I have two txt files. I want to concatenate the corresponding lines from one onto the other. How can this be automated in Notepad++?

    Example

    File1:

    word1
    word2
    word3
    etc…

    File2:
    2243
    2314
    4231
    3241

    And end up with either a new file or the first file containing

    word12243
    word22314
    word34231
    word53241

    I’ve tried using column editing, but not all the lines are the same length, so it doesn’t seem to work for me.

    Thanks.

    • J
      Jeff Caldwell
      last edited by Aug 21, 2017, 1:13 AM

      And I found a solution online but can’t delete my post, so if anyone is looking: https://www.gillmeister-software.com/online-tools/text/merge-lists-line-by-line.aspx
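
For readers who would rather script the merge than paste into an online tool, here is a minimal Python sketch of the same line-by-line concatenation. The function names `merge_lines` and `merge_files`, the `sep` parameter, and the choice to pad the shorter list with empty strings are illustrative assumptions, not something stated in the thread.

```python
from itertools import zip_longest


def merge_lines(first, second, sep=""):
    """Concatenate corresponding lines of two sequences of strings.

    Assumption: if one sequence is longer, its extra lines are paired
    with an empty string instead of being dropped.
    """
    return [a + sep + b for a, b in zip_longest(first, second, fillvalue="")]


def merge_files(path_a, path_b, path_out, sep=""):
    """Read two text files, merge them line by line, write the result."""
    with open(path_a, encoding="utf-8") as fa, open(path_b, encoding="utf-8") as fb:
        lines_a = fa.read().splitlines()
        lines_b = fb.read().splitlines()
    with open(path_out, "w", encoding="utf-8") as out:
        out.write("\n".join(merge_lines(lines_a, lines_b, sep)) + "\n")
```

A call such as `merge_files("File1.txt", "File2.txt", "merged.txt")` (hypothetical file names) would write the merged lines to a new file and leave both inputs untouched.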

      • G
        guy038
        posted Aug 21, 2017, 3:40 PM, last edited by guy038 Aug 22, 2017, 9:05 AM

        Hello, @jeff-caldwell and All,

        Regex updated and section remark added, on 08/22/2017 - 11.07am ( French TZ )

        Of course, Jeff, your on-line tool, which merges two lists with the same number of elements, works fine !

        But, as I’m “mad” about regular expressions, I just tried, as a holiday exercise ( indeed, I’m on holiday ! ), to find a way to merge two lists of identical length, with a regex Search/Replacement

        We’ll just need two extra symbols, which are not found in your two original lists. I chose the # and @ symbols


        So, for instance, copy/paste the list of the 20 most common family names in the United Kingdom, below, into a new tab :

        Smith
        Jones
        Taylor
        Brown
        Williams
        Wilson
        Johnson
        Davies
        Robinson
        Wright
        Thompson
        Evans
        Walker
        White
        Roberts
        Green
        Hall
        Wood
        Jackson
        Clarke
        

        And copy/paste the list of the 20 most common given names ( 10 male / 10 female ) in the United Kingdom, after the first list and possible blank lines :

        Oliver
        Amelia
        Jack
        Olivia
        Harry
        Emily
        George
        Isla
        Charlie
        Ava
        Jacob
        Jessica
        Thomas
        Ella
        Noah
        Isabella
        William
        Poppy
        Oscar
        Mia
        

        We, first, add :

        • A # symbol, in front of any line, containing a family name

        • A @ symbol, in front of any line, containing a given name

        NOTE : This may be done, using the N++ column mode feature OR the simple regex S/R ( SEARCH ^ and REPLACE # ( or @ ) ), applied to each list in turn
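
The marker step can also be scripted. Below is a small Python sketch, assuming each list is held in its own string with LF line endings; the helper name `add_marker` is hypothetical. The lookahead `(?=.)` is an addition of this sketch so that blank lines stay unmarked, since a bare `^` would also match at the start of an empty line.

```python
import re


def add_marker(block, marker):
    """Prefix every non-empty line of `block` with `marker`.

    Assumes LF line endings: Python's dot also matches a carriage
    return, so a CRLF blank line would still receive the marker.
    """
    # ^ matches at each line start (MULTILINE); (?=.) skips empty lines.
    # A function is used as replacement so the marker is inserted literally.
    return re.sub(r"^(?=.)", lambda _match: marker, block, flags=re.MULTILINE)
```

For example, `add_marker(family_names, "#") + add_marker(given_names, "@")` (with both variable names assumed) would produce a combined, marked text like the complete list shown in this post.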

        Thus, we obtain the complete list, below :

        #Smith
        #Jones
        #Taylor
        #Brown
        #Williams
        #Wilson
        #Johnson
        #Davies
        #Robinson
        #Wright
        #Thompson
        #Evans
        #Walker
        #White
        #Roberts
        #Green
        #Hall
        #Wood
        #Jackson
        #Clarke
        
        
        @Oliver
        @Amelia
        @Jack
        @Olivia
        @Harry
        @Emily
        @George
        @Isla
        @Charlie
        @Ava
        @Jacob
        @Jessica
        @Thomas
        @Ella
        @Noah
        @Isabella
        @William
        @Poppy
        @Oscar
        @Mia
        

        Now :

        • Move back to the beginning of the first list, or on a blank line, above

        • Open the Replace dialog ( CTRL + H )

        • Type the regex (?-s)^#(.+)\R((?s).*?)@(.+\R?), in the Find what: zone

        • Type the regex \1 \3\2, in the Replace with: zone, with a space character, after \1

        • Select the Regular expression search mode

        • Press, repeatedly, on the ALT + A shortcut ( i.e. the Replace All button ), till no other occurrence can be found
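
The steps above can be sketched outside Notepad++ as a loop that repeats the replacement until nothing matches. This is a Python approximation, not the exact Notepad++ regex: Python's `re` module has no `\R` and its dot also matches a carriage return, so the sketch assumes LF or CRLF line endings and uses `[^\r\n]+` and `\r?\n` instead. The function name `merge_marked` is hypothetical. In this sketch each pass pairs only the first remaining # line with the first remaining @ line, because the match consumes everything in between, which is why several passes are needed.

```python
import re

# Approximate Python port of (?-s)^#(.+)\R((?s).*?)@(.+\R?)
# [^\r\n]+ stands in for .+ and \r?\n for \R (LF or CRLF assumed).
PATTERN = re.compile(
    r"^#([^\r\n]+)\r?\n((?s:.*?))@([^\r\n]+(?:\r?\n)?)",
    re.MULTILINE,
)


def merge_marked(text):
    """Repeat the replacement until no match is left.

    Returns the merged text and the number of passes that changed it.
    """
    passes = 0
    while True:
        # Same replacement as in the dialog: \1, a space, \3, then \2.
        text, count = PATTERN.subn(r"\1 \3\2", text)
        if count == 0:
            return text, passes
        passes += 1
```

Blank lines that separated the two lists are left in place at the end of the result, as they are part of group 2 and are rewritten unchanged.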

        Et voilà !

        After 20 Replace All actions, you should get the expected list :

        Smith Oliver
        Jones Amelia
        Taylor Jack
        Brown Olivia
        Williams Harry
        Wilson Emily
        Johnson George
        Davies Isla
        Robinson Charlie
        Wright Ava
        Thompson Jacob
        Evans Jessica
        Walker Thomas
        White Ella
        Roberts Noah
        Green Isabella
        Hall William
        Wood Poppy
        Jackson Oscar
        Clarke Mia
        

        Remark :

        I previously built the search regex as (?-s)^#(.+)\R(?s)(.*?)(?-s)@(.+\R?)

        Then, I understood that the modifier (?s), in the middle of the regex, could be embedded inside the second group, as ((?s).*?), in order to limit its action to group 2 only

        By that means, we don’t have to repeat the (?-s) modifier before getting the next item of the second list, and we get the final regex (?-s)^#(.+)\R((?s).*?)@(.+\R?)
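
The effect of limiting the dot-all behaviour to group 2 can be checked with a short experiment. The sketch below uses Python's `re` module, which spells a group-limited flag as `(?s:...)` instead of placing `(?s)` inside the group; the sample text and variable names are assumptions made for illustration.

```python
import re

text = "#a\n#b\n@x\n@y\n"

# Dot-all limited to group 2 only, mirroring the final regex of the remark.
scoped = re.compile(r"^#(.+)\n((?s:.*?))@(.+)", re.MULTILINE)

# Dot-all switched on for the whole pattern, for contrast.
global_dotall = re.compile(r"^#(.+)\n(.*?)@(.+)", re.MULTILINE | re.DOTALL)

# Group 1 stays on one line and group 2 spans the lines in between.
scoped_groups = scoped.search(text).groups()

# Here the greedy group 1 runs across several lines before backtracking.
global_groups = global_dotall.search(text).groups()

print(scoped_groups)
print(global_groups)
```

With the scoped form, groups 1 and 3 each stay within a single line. With dot-all applied globally, the greedy `.+` of group 1 swallows later lines, which is the behaviour the embedded modifier avoids.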

        Best Regards,

        guy038

        P.S. : The family and given names lists, above, are extracted from the two addresses, below :

        https://en.wikipedia.org/wiki/Lists_of_most_common_surnames

        https://en.wikipedia.org/wiki/List_of_most_popular_given_names

        • G
          guy038
          posted Aug 21, 2017, 4:37 PM, last edited by guy038 Aug 22, 2017, 8:43 AM

          Hi, All,

          Regex updated and section Notes, added, on 08/22/2017 - 10.45am ( French TZ )

          Just realized that my previous regex can be extended to reorganize more than two lists !

          For instance, let’s suppose I double the complete list of family and given names, from my previous post, adding the new symbols = and _, in order to get the text, below :

          #Smith
          #Jones
          #Taylor
          #Brown
          #Williams
          #Wilson
          #Johnson
          #Davies
          #Robinson
          #Wright
          #Thompson
          #Evans
          #Walker
          #White
          #Roberts
          #Green
          #Hall
          #Wood
          #Jackson
          #Clarke
          
          
          @Oliver
          @Amelia
          @Jack
          @Olivia
          @Harry
          @Emily
          @George
          @Isla
          @Charlie
          @Ava
          @Jacob
          @Jessica
          @Thomas
          @Ella
          @Noah
          @Isabella
          @William
          @Poppy
          @Oscar
          @Mia
          
          =Smith
          =Jones
          =Taylor
          =Brown
          =Williams
          =Wilson
          =Johnson
          =Davies
          =Robinson
          =Wright
          =Thompson
          =Evans
          =Walker
          =White
          =Roberts
          =Green
          =Hall
          =Wood
          =Jackson
          =Clarke
          
          
          _Oliver
          _Amelia
          _Jack
          _Olivia
          _Harry
          _Emily
          _George
          _Isla
          _Charlie
          _Ava
          _Jacob
          _Jessica
          _Thomas
          _Ella
          _Noah
          _Isabella
          _William
          _Poppy
          _Oscar
          _Mia
          

          Then, the regex S/R :

          SEARCH (?-s)^#(.+)\R((?s).*?)@(.+)\R((?s).*?)=(.+)\R((?s).*?)_(.+\R?)

          REPLACE \1 \3 \5 \7\2\4\6, with a space character after \1, \3 and \5

          would return, after 20 hits, on the ALT + A shortcut ( Replace All ), the single shortened list :

          Smith Oliver Smith Oliver
          Jones Amelia Jones Amelia
          Taylor Jack Taylor Jack
          Brown Olivia Brown Olivia
          Williams Harry Williams Harry
          Wilson Emily Wilson Emily
          Johnson George Johnson George
          Davies Isla Davies Isla
          Robinson Charlie Robinson Charlie
          Wright Ava Wright Ava
          Thompson Jacob Thompson Jacob
          Evans Jessica Evans Jessica
          Walker Thomas Walker Thomas
          White Ella White Ella
          Roberts Noah Roberts Noah
          Green Isabella Green Isabella
          Hall William Hall William
          Wood Poppy Wood Poppy
          Jackson Oscar Jackson Oscar
          Clarke Mia Clarke Mia
          

          Notes :

          • The first part (?-s) means that, by default, the dot meta-character matches any single standard character only, and never an End of Line character

          • Then the part ^#(.+)\R represents the first complete line beginning with the # symbol, followed by its End of Line character(s), with the part after the # symbol stored as group 1

          • Any part of the form ((?s).*?) is the smallest multi-line range of characters ( standard or EOL ones ) up to the next User-symbol ( @, = or _ ), stored as group 2, 4 or 6

          • The parts @(.+)\R and =(.+)\R represent the first complete line beginning with the @ or = symbol, followed by its End of Line character(s), with the part after the symbol stored as group 3 or 5

          • The last part _(.+\R?) stands for the first complete line beginning with the _ symbol, followed by optional End of Line character(s), with the part after the _ symbol stored as group 7

          • In replacement, the first part, \1 \3 \5 \7, rewrites these four lines, without their initial User-symbols and separated by space characters, as a single line, ended by End of Line character(s)

          • Then, the remainder of the four lists, \2\4\6, is simply rewritten, without any change !

          • The table, below, marks the beginning of each of the seven defined groups :

          ----------------1-----2---------3-----4---------5-----6---------7------
          SEARCH   (?-s)^#(.+)\R((?s).*?)@(.+)\R((?s).*?)=(.+)\R((?s).*?)_(.+\R?)
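
The group layout described in the notes generalizes to any number of lists: odd groups hold the items, even groups hold the gaps between them. As a hedged illustration, the Python sketch below builds an equivalent search and replacement for an arbitrary sequence of marker symbols. The function names are hypothetical, and the pattern is an approximation of the Notepad++ regex ( `[^\r\n]+` for `.+`, `\r?\n` for `\R`, `(?s:...)` for the embedded `(?s)` ), assuming LF or CRLF line endings.

```python
import re


def build_search(markers):
    """Build the search regex for lists prefixed by the given marker symbols."""
    pieces = []
    for i, marker in enumerate(markers):
        if i < len(markers) - 1:
            # Marker line plus its EOL, then the lazy multi-line gap
            # running up to the next marker symbol.
            pieces.append(re.escape(marker) + r"([^\r\n]+)\r?\n((?s:.*?))")
        else:
            # Last marker line, with an optional EOL kept inside the group.
            pieces.append(re.escape(marker) + r"([^\r\n]+(?:\r?\n)?)")
    return re.compile("^" + "".join(pieces), re.MULTILINE)


def build_replacement(n):
    """Odd groups (the items) joined by spaces, then even groups (the gaps)."""
    items = " ".join(rf"\g<{2 * i + 1}>" for i in range(n))
    gaps = "".join(rf"\g<{2 * i + 2}>" for i in range(n - 1))
    return items + gaps


def merge_lists(text, markers):
    """Apply the search/replace repeatedly until no match is left."""
    search = build_search(markers)
    replacement = build_replacement(len(markers))
    while True:
        text, count = search.subn(replacement, text)
        if count == 0:
            return text
```

Calling `merge_lists(text, "#@=_")` corresponds to the four-list case of this post, and `merge_lists(text, "#@")` to the two-list case of the previous one. The `\g<n>` form is used in the replacement so that group numbers of 10 and above stay unambiguous.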
          

          Cheers,

          guy038
