Concatenate corresponding lines from two files



  • Let's say I have two txt files. I want to concatenate the corresponding lines from one onto the other. How can this be automated in Notepad++?

    Example

    File1:

    word1
    word2
    word3
    etc…

    File2:
    2243
    2314
    4231
    3241

    And end up with either a new file or the first file containing

    word12243
    word22314
    word34231
    word43241

    I’ve tried using column editing, but not all the lines are the same length, so it doesn’t seem to work for me.

    Thanks.



  • And I found a solution online but can’t delete my post, so if anyone is looking: https://www.gillmeister-software.com/online-tools/text/merge-lists-line-by-line.aspx
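
For readers who would rather script this than use an online tool, here is a minimal Python sketch of the same line-by-line merge. The function names (merge_lines, merge_files) and the sep parameter are made up for illustration and do not come from the thread. It assumes both files are UTF-8 text, and it pads the shorter list with empty strings rather than dropping lines.

```python
from itertools import zip_longest
from pathlib import Path


def merge_lines(left, right, sep=""):
    """Join the i-th item of `left` with the i-th item of `right`."""
    # zip_longest keeps going when one list is shorter; the missing side
    # is treated as an empty string instead of dropping the line.
    return [a + sep + b for a, b in zip_longest(left, right, fillvalue="")]


def merge_files(path1, path2, out_path, sep=""):
    """Write the line-by-line concatenation of two text files to out_path."""
    left = Path(path1).read_text(encoding="utf-8").splitlines()
    right = Path(path2).read_text(encoding="utf-8").splitlines()
    merged = merge_lines(left, right, sep)
    Path(out_path).write_text("\n".join(merged) + "\n", encoding="utf-8")


if __name__ == "__main__":
    # Prints ['word12243', 'word22314', 'word34231']
    print(merge_lines(["word1", "word2", "word3"], ["2243", "2314", "4231"]))
```

Because the pairing is done by position, lines of different lengths are not a problem here, which is the limitation that column editing ran into.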



  • Hello, @jeff-caldwell and All,

    Regex updated and section remark added, on 08/22/2017 - 11.07am ( French TZ )

    Of course, Jeff, your on-line tool, to merge two lists with the same number of elements, works fine !

    But, as I’m “mad” about regular expressions, I just tried, as a holiday exercise ( indeed, I’m on holiday ! ), to find a way to merge two lists of the same length with a regex Search/Replacement

    We’ll just need two extra symbols, which must not occur in your two original lists. I chose the # and @ symbols


    So, for instance, copy/paste the list of the 20 most common family names in the United Kingdom, below, in a new tab :

    Smith
    Jones
    Taylor
    Brown
    Williams
    Wilson
    Johnson
    Davies
    Robinson
    Wright
    Thompson
    Evans
    Walker
    White
    Roberts
    Green
    Hall
    Wood
    Jackson
    Clarke
    

    And copy/paste the list of the 20 most common given names ( 10 male / 10 female ) in the United Kingdom, after the first list and any blank lines :

    Oliver
    Amelia
    Jack
    Olivia
    Harry
    Emily
    George
    Isla
    Charlie
    Ava
    Jacob
    Jessica
    Thomas
    Ella
    Noah
    Isabella
    William
    Poppy
    Oscar
    Mia
    

    We, first, add :

    • A # symbol, in front of any line, containing a family name

    • A @ symbol, in front of any line, containing a given name

    NOTE : This may be done using the N++ column mode feature OR the simple regex S/R ( SEARCH ^ and REPLACE # or @ ), applied to each list separately

    Thus, we obtain the complete list, below :

    #Smith
    #Jones
    #Taylor
    #Brown
    #Williams
    #Wilson
    #Johnson
    #Davies
    #Robinson
    #Wright
    #Thompson
    #Evans
    #Walker
    #White
    #Roberts
    #Green
    #Hall
    #Wood
    #Jackson
    #Clarke
    
    
    @Oliver
    @Amelia
    @Jack
    @Olivia
    @Harry
    @Emily
    @George
    @Isla
    @Charlie
    @Ava
    @Jacob
    @Jessica
    @Thomas
    @Ella
    @Noah
    @Isabella
    @William
    @Poppy
    @Oscar
    @Mia
    

    Now :

    • Move back to the beginning of the first list, or to a blank line above it

    • Open the Replace dialog ( CTRL + H )

    • Type the regex (?-s)^#(.+)\R((?s).*?)@(.+\R?), in the Find what: zone

    • Type the regex \1 \3\2, in the Replace with: zone, with a space character, after \1

    • Select the Regular expression search mode

    • Press ALT + A ( the shortcut for the Replace All button ) repeatedly, until no further occurrence is found

    Et voilà !

    After 20 Replace All actions, you should get the expected list :

    Smith Oliver
    Jones Amelia
    Taylor Jack
    Brown Olivia
    Williams Harry
    Wilson Emily
    Johnson George
    Davies Isla
    Robinson Charlie
    Wright Ava
    Thompson Jacob
    Evans Jessica
    Walker Thomas
    White Ella
    Roberts Noah
    Green Isabella
    Hall William
    Wood Poppy
    Jackson Oscar
    Clarke Mia
    

    Remark :

    I previously built the search regex as (?-s)^#(.+)\R(?s)(.*?)(?-s)@(.+\R?)

    Then, I realized that the (?s) modifier, in the middle of the regex, could be embedded inside the second group, as ((?s).*?), in order to limit its action to group 2 only

    That way, we don’t have to repeat the (?-s) modifier before matching the next item of the second list, and we get the final regex (?-s)^#(.+)\R((?s).*?)@(.+\R?)

    Best Regards,

    guy038

    P.S. : The family and given names lists, above, are extracted from the two addresses, below :

    https://en.wikipedia.org/wiki/Lists_of_most_common_surnames

    https://en.wikipedia.org/wiki/List_of_most_popular_given_names
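
The repeated Replace All described in this post can be simulated outside Notepad++. The sketch below is a translation, not the original regex. Python's re module has no \R and does not accept (?s) in the middle of a pattern, so \n stands in for \R (assuming LF line endings) and the scoped form (?s:...) limits the "dot matches newline" behaviour to group 2. The names PAIR and merge_marked are hypothetical.

```python
import re

# Python's `re` has no \R and no mid-pattern (?s), so this is a translation:
# \n stands in for \R (assumes LF line endings) and (?s:...) scopes the
# "dot matches newline" flag to group 2 only.
PAIR = re.compile(r"^#(.+)\n((?s:.*?))@(.+\n?)", re.MULTILINE)


def merge_marked(text):
    """Repeat the search/replace until no '#' line is left to pair up."""
    passes = 0
    while True:
        # One subn() call plays the role of one Replace All click.
        text, count = PAIR.subn(r"\1 \3\2", text)
        if count == 0:
            return text, passes
        passes += 1


if __name__ == "__main__":
    sample = "#Smith\n#Jones\n#Taylor\n\n@Oliver\n@Amelia\n@Jack\n"
    merged, passes = merge_marked(sample)
    print(merged.strip())  # three merged lines: Smith Oliver, Jones Amelia, Taylor Jack
    print(passes)          # 3, one pass per pair of lines
```

Each pass finds only one match, because a match runs from the first # line through the first @ line and no # line is left after it. That is why the number of passes equals the number of lines in each list.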



  • Hi, All,

    Regex updated and section Notes, added, on 08/22/2017 - 10.45am ( French TZ )

    Just realized that my previous regex can be extended to reorganize more than two lists !

    For instance, let’s suppose I double the complete list of family and given names, from my previous post, adding the new symbols = and _, in order to get the text, below :

    #Smith
    #Jones
    #Taylor
    #Brown
    #Williams
    #Wilson
    #Johnson
    #Davies
    #Robinson
    #Wright
    #Thompson
    #Evans
    #Walker
    #White
    #Roberts
    #Green
    #Hall
    #Wood
    #Jackson
    #Clarke
    
    
    @Oliver
    @Amelia
    @Jack
    @Olivia
    @Harry
    @Emily
    @George
    @Isla
    @Charlie
    @Ava
    @Jacob
    @Jessica
    @Thomas
    @Ella
    @Noah
    @Isabella
    @William
    @Poppy
    @Oscar
    @Mia
    
    =Smith
    =Jones
    =Taylor
    =Brown
    =Williams
    =Wilson
    =Johnson
    =Davies
    =Robinson
    =Wright
    =Thompson
    =Evans
    =Walker
    =White
    =Roberts
    =Green
    =Hall
    =Wood
    =Jackson
    =Clarke
    
    
    _Oliver
    _Amelia
    _Jack
    _Olivia
    _Harry
    _Emily
    _George
    _Isla
    _Charlie
    _Ava
    _Jacob
    _Jessica
    _Thomas
    _Ella
    _Noah
    _Isabella
    _William
    _Poppy
    _Oscar
    _Mia
    

    Then, the regex S/R :

    SEARCH (?-s)^#(.+)\R((?s).*?)@(.+)\R((?s).*?)=(.+)\R((?s).*?)_(.+\R?)

    REPLACE \1 \3 \5 \7\2\4\6, with a space character after \1, \3 and \5

    would return, after pressing the ALT + A shortcut ( Replace All ) 20 times, the single shortened list :

    Smith Oliver Smith Oliver
    Jones Amelia Jones Amelia
    Taylor Jack Taylor Jack
    Brown Olivia Brown Olivia
    Williams Harry Williams Harry
    Wilson Emily Wilson Emily
    Johnson George Johnson George
    Davies Isla Davies Isla
    Robinson Charlie Robinson Charlie
    Wright Ava Wright Ava
    Thompson Jacob Thompson Jacob
    Evans Jessica Evans Jessica
    Walker Thomas Walker Thomas
    White Ella White Ella
    Roberts Noah Roberts Noah
    Green Isabella Green Isabella
    Hall William Hall William
    Wood Poppy Wood Poppy
    Jackson Oscar Jackson Oscar
    Clarke Mia Clarke Mia
    

    Notes :

    • The first part (?-s) means that, by default, the dot meta-character matches any single standard character only, i.e. not an End of Line character

    • Then the part ^#(.+)\R represents the first complete line, beginning with the # symbol and followed by its End of Line character(s), with the part after the # symbol stored as group 1

    • Any part of the form ((?s).*?) is the smallest multi-line range of characters ( standard or EOL ones ) up to the next User-symbol ( @, = or _ ), stored as group 2, 4 or 6

    • The parts @(.+)\R and =(.+)\R represent the first complete line, beginning with the @ or = symbol and followed by its End of Line character(s), with the part after the symbol stored as groups 3 and 5, respectively

    • The last part _(.+\R?) stands for the first complete line, beginning with the _ symbol and followed by optional End of Line character(s), with the part after the _ symbol stored as group 7

    • In replacement, the first part, \1 \3 \5 \7, rewrites the four lines, without their initial User-symbols and separated by a space character, as a single line, ended by End of Line character(s)

    • Then, the remainder of the four lists, \2\4\6, is simply rewritten, without any change !

    • The table below marks the beginning of each of the seven defined groups :

    ----------------1-----2---------3-----4---------5-----6---------7------
    SEARCH   (?-s)^#(.+)\R((?s).*?)@(.+)\R((?s).*?)=(.+)\R((?s).*?)_(.+\R?)
    

    Cheers,

    guy038
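
The same idea extends to any number of marked lists by building the pattern and the replacement from the list of markers. The Python sketch below makes the same assumptions as any port to Python's re module: \n replaces \R (LF line endings) and (?s:...) replaces the embedded (?s). The helper names build_rule and merge_lists are hypothetical, and each marker is assumed to be a symbol that occurs nowhere else in the text.

```python
import re


def build_rule(markers):
    """Return (compiled pattern, replacement) for one marker per list."""
    *head, last = [re.escape(m) for m in markers]
    # Each leading list contributes "marker(line)\n(lazy multi-line gap)".
    body = "".join(m + r"(.+)\n((?s:.*?))" for m in head)
    pattern = re.compile("^" + body + last + r"(.+\n?)", re.MULTILINE)
    n = len(markers)
    # Odd groups hold the captured lines, even groups hold the gaps.
    lines = " ".join(rf"\g<{2 * i + 1}>" for i in range(n))
    gaps = "".join(rf"\g<{2 * i + 2}>" for i in range(n - 1))
    return pattern, lines + gaps


def merge_lists(text, markers):
    """Apply the rule repeatedly, like pressing Replace All until no match."""
    pattern, replacement = build_rule(markers)
    while True:
        text, count = pattern.subn(replacement, text)
        if count == 0:
            return text


if __name__ == "__main__":
    sample = (
        "#Smith\n#Jones\n\n@Oliver\n@Amelia\n"
        "=Smith\n=Jones\n\n_Oliver\n_Amelia\n"
    )
    print(merge_lists(sample, ["#", "@", "=", "_"]).strip())
```

With the four markers above, the generated replacement has the same group order as \1 \3 \5 \7\2\4\6, written with \g<n> references so that group numbers of 10 or more stay unambiguous.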

