Concatenate corresponding lines from two files

Jeff Caldwell

Let's say I have two txt files. I want to concatenate the corresponding lines from one onto the other. How can this be automated in Notepad++?

Example

File1:

word1
word2
word3
etc…

File2:
2243
2314
4231
3241

And end up with either a new file or the first file containing

word12243
word22314
word34231
word43241

I’ve tried using column editing, but not all the lines are the same length, so it doesn't seem to work for me.

Thanks.

Jeff Caldwell

And I found a solution online, but I can't delete my post, so if anyone is looking: https://www.gillmeister-software.com/online-tools/text/merge-lists-line-by-line.aspx
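For anyone who would rather script the merge than paste text into an online tool, here is a minimal Python sketch. It is only an illustration: the function name merge_lines and the inline sample data are assumptions, not something from this thread, and with real files you would pass the two open file objects instead of the sample lists.

```python
from itertools import zip_longest


def merge_lines(first, second, sep=""):
    """Join line i of `first` with line i of `second`.

    Both arguments are iterables of strings (lists or open files).
    If one input is shorter, its missing lines are treated as empty.
    """
    merged = []
    for left, right in zip_longest(first, second, fillvalue=""):
        # Drop the line endings before gluing the two halves together
        merged.append(left.rstrip("\r\n") + sep + right.rstrip("\r\n"))
    return merged


if __name__ == "__main__":
    # Hypothetical sample data standing in for File1 and File2
    words = ["word1\n", "word2\n", "word3\n"]
    numbers = ["2243\n", "2314\n", "4231\n"]
    print("\n".join(merge_lines(words, numbers)))
```

Passing sep=" " would put a space between the two halves of each line.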

guy038

Hello, @jeff-caldwell and All,

Regex updated and section remark added, on 08/22/2017 - 11.07am ( French TZ )

Of course, Jeff, your on-line tool, which merges two lists with the same number of elements, works fine !

But, as I’m “mad” about regular expressions, I just tried, as a holiday exercise ( indeed, I’m on holiday ! ), to find a way to merge two lists of identical length, with a regex Search/Replacement

We’ll just need two extra symbols, which are not found in your two original lists. I chose the # and @ symbols

So, for instance, copy/paste the list of the 20 most common family names, in the United Kingdom, below, in a new tab :

Smith
Jones
Taylor
Brown
Williams
Wilson
Johnson
Davies
Robinson
Wright
Thompson
Evans
Walker
White
Roberts
Green
Hall
Wood
Jackson
Clarke

And copy/paste the list of the 20 most common given names ( 10 male / 10 female ), in the United Kingdom, after the first list and possible blank lines :

Oliver
Amelia
Jack
Olivia
Harry
Emily
George
Isla
Charlie
Ava
Jacob
Jessica
Thomas
Ella
Noah
Isabella
William
Poppy
Oscar
Mia

We, first, add :

A # symbol, in front of any line, containing a family name
A @ symbol, in front of any line, containing a given name

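NOTE : This may be done, using the N++ column mode feature OR a simple regex S/R ( SEARCH ^ and REPLACE # or @ ), applied to each list in turn, for instance by selecting the list and ticking the In selection option, so that each list gets its own symbol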
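Outside Notepad++, the same marking step can be sketched in a few lines of Python. This is an illustration only: the helper name prefix_lines is an assumption, and the lookahead (?=.) is an addition of mine so that blank lines are left unmarked.

```python
import re


def prefix_lines(block, marker):
    """Put `marker` in front of every non-empty line of `block`."""
    # With MULTILINE, ^ matches at the start of each line. The lookahead
    # (?=.) rejects blank lines, because the dot does not match a line break.
    # A function is used as replacement so `marker` is inserted literally.
    return re.sub(r"^(?=.)", lambda _match: marker, block, flags=re.MULTILINE)


family = prefix_lines("Smith\nJones\n\nTaylor\n", "#")
given = prefix_lines("Oliver\nAmelia\nJack\n", "@")
print(family + "\n" + given)
```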

Thus, we obtain the complete list, below :

#Smith
#Jones
#Taylor
#Brown
#Williams
#Wilson
#Johnson
#Davies
#Robinson
#Wright
#Thompson
#Evans
#Walker
#White
#Roberts
#Green
#Hall
#Wood
#Jackson
#Clarke


@Oliver
@Amelia
@Jack
@Olivia
@Harry
@Emily
@George
@Isla
@Charlie
@Ava
@Jacob
@Jessica
@Thomas
@Ella
@Noah
@Isabella
@William
@Poppy
@Oscar
@Mia

Now :

Move back to the beginning of the first list, or to a blank line above it
Open the Replace dialog ( CTRL + H )
Type the regex (?-s)^#(.+)\R((?s).*?)@(.+\R?), in the Find what: zone
Type the regex \1 \3\2, in the Replace with: zone, with a space character, after \1
Select the Regular expression search mode
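Press, repeatedly, the ALT + A shortcut ( i.e. the Replace All button ), till no further occurrence can be found. Each press merges one pair of lines, only, because the match consumes all the text between the two lines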

Et voilà !

After 20 Replace All actions, you should get the expected list :

Smith Oliver
Jones Amelia
Taylor Jack
Brown Olivia
Williams Harry
Wilson Emily
Johnson George
Davies Isla
Robinson Charlie
Wright Ava
Thompson Jacob
Evans Jessica
Walker Thomas
White Ella
Roberts Noah
Green Isabella
Hall William
Wood Poppy
Jackson Oscar
Clarke Mia
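The repeated Replace All can be imitated with a small loop. The sketch below is a Python translation, under some assumptions: Python's re module has no \R, so \n is used and the text is assumed to have Unix line endings, and the inner (?s) is written as the scoped group (?s:...). The names PAIR and merge_marked are made up for the example.

```python
import re

# Python translation of (?-s)^#(.+)\R((?s).*?)@(.+\R?)
# \R becomes \n, and the inner (?s) becomes the scoped group (?s:...).
PAIR = re.compile(r"^#(.+)\n((?s:.*?))@(.+\n?)", re.MULTILINE)


def merge_marked(text):
    """Repeat the replacement until nothing matches; return (text, passes)."""
    passes = 0
    while True:
        # One pass plays the role of one press on Replace All
        text, count = PAIR.subn(r"\1 \3\2", text)
        if count == 0:
            return text, passes
        passes += 1


sample = "#Smith\n#Jones\n#Taylor\n\n@Oliver\n@Amelia\n@Jack\n"
merged, passes = merge_marked(sample)
print(merged)
print("passes:", passes)
```

With three names per list the loop needs three passes, which mirrors the 20 Replace All actions needed for the 20 names above. The blank line that separated the two lists is left at the end of the result.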

Remark :

I previously built the search regex as (?-s)^#(.+)\R(?s)(.*?)(?-s)@(.+\R?)

Then, I understood that the modifier (?s), in the middle of the regex, could be embedded, inside the second group (.*?), in order to limit its action to group 2, only

By that means, we don’t have to repeat the (?-s) modifier, which is necessary to get the next item of the second list, and we get the final regex (?-s)^#(.+)\R((?s).*?)@(.+\R?)
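The effect of limiting the modifier to one group can be checked with a short Python sketch. Note the assumptions: recent versions of Python's re module do not accept a bare (?s) in the middle of a pattern, so the scoped form (?s:...) stands in for the embedded modifier, and the sample text is invented for the test.

```python
import re

text = "#Smith\n#Jones\n\n@Oliver\n"

# Without the modifier, the dot stops at each line break, so the regex
# can never reach the @ symbol from a # symbol on an earlier line.
single_line = re.search(r"#(.*?)@", text)

# Inside (?s:...) the dot also matches line breaks, but only there:
# the final (.+) is outside the group and still stops at the end of line.
multi_line = re.search(r"#((?s:.*?))@(.+)", text)

print(single_line)
print(repr(multi_line.group(1)), repr(multi_line.group(2)))
```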

Best Regards,

guy038

P.S. : The family and given names lists, above, are extracted from the two addresses, below :

https://en.wikipedia.org/wiki/Lists_of_most_common_surnames

https://en.wikipedia.org/wiki/List_of_most_popular_given_names

guy038

Hi, All,

Regex updated and section Notes, added, on 08/22/2017 - 10.45am ( French TZ )

Just realized that my previous regex can be extended to reorganize more than two lists !

For instance, let’s suppose I double the complete list of family and given names, from my previous post, adding the new symbols = and _, in order to get the text, below :

#Smith
#Jones
#Taylor
#Brown
#Williams
#Wilson
#Johnson
#Davies
#Robinson
#Wright
#Thompson
#Evans
#Walker
#White
#Roberts
#Green
#Hall
#Wood
#Jackson
#Clarke


@Oliver
@Amelia
@Jack
@Olivia
@Harry
@Emily
@George
@Isla
@Charlie
@Ava
@Jacob
@Jessica
@Thomas
@Ella
@Noah
@Isabella
@William
@Poppy
@Oscar
@Mia

=Smith
=Jones
=Taylor
=Brown
=Williams
=Wilson
=Johnson
=Davies
=Robinson
=Wright
=Thompson
=Evans
=Walker
=White
=Roberts
=Green
=Hall
=Wood
=Jackson
=Clarke


_Oliver
_Amelia
_Jack
_Olivia
_Harry
_Emily
_George
_Isla
_Charlie
_Ava
_Jacob
_Jessica
_Thomas
_Ella
_Noah
_Isabella
_William
_Poppy
_Oscar
_Mia

Then, the regex S/R :

SEARCH (?-s)^#(.+)\R((?s).*?)@(.+)\R((?s).*?)=(.+)\R((?s).*?)_(.+\R?)

REPLACE \1 \3 \5 \7\2\4\6, with a space character after \1, \3 and \5

would return, after 20 hits, on the ALT + A shortcut ( Replace All ), the single shortened list :

Smith Oliver Smith Oliver
Jones Amelia Jones Amelia
Taylor Jack Taylor Jack
Brown Olivia Brown Olivia
Williams Harry Williams Harry
Wilson Emily Wilson Emily
Johnson George Johnson George
Davies Isla Davies Isla
Robinson Charlie Robinson Charlie
Wright Ava Wright Ava
Thompson Jacob Thompson Jacob
Evans Jessica Evans Jessica
Walker Thomas Walker Thomas
White Ella White Ella
Roberts Noah Roberts Noah
Green Isabella Green Isabella
Hall William Hall William
Wood Poppy Wood Poppy
Jackson Oscar Jackson Oscar
Clarke Mia Clarke Mia
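Since the search and replacement regexes follow a regular pattern, they could be generated for any number of lists. Here is a rough Python sketch of that idea, not a tested Notepad++ recipe: the names build_merge_regex and merge_lists are hypothetical, \n replaces \R ( Unix line endings assumed ) and (?s:...) replaces the embedded (?s).

```python
import re


def build_merge_regex(markers):
    """Return (compiled search regex, replacement) for a list of markers."""
    last = len(markers) - 1
    parts = ["^"]
    for index, marker in enumerate(markers):
        parts.append(re.escape(marker))
        # Every list but the last is followed by a lazy multi-line gap group
        parts.append(r"(.+\n?)" if index == last else r"(.+)\n((?s:.*?))")
    group_count = 2 * len(markers) - 1
    # Odd groups hold the items, even groups hold the text in between
    items = " ".join(r"\g<%d>" % g for g in range(1, group_count + 1, 2))
    gaps = "".join(r"\g<%d>" % g for g in range(2, group_count, 2))
    return re.compile("".join(parts), re.MULTILINE), items + gaps


def merge_lists(text, markers):
    """Apply the replacement until no further occurrence is found."""
    pattern, replacement = build_merge_regex(markers)
    while True:
        text, count = pattern.subn(replacement, text)
        if count == 0:
            return text


sample = ("#Smith\n#Jones\n\n@Oliver\n@Amelia\n\n"
          "=Smith\n=Jones\n\n_Oliver\n_Amelia\n")
print(merge_lists(sample, ["#", "@", "=", "_"]))
```

The \g<n> form is used in the replacement so that group numbers above 9 would stay unambiguous with longer marker lists.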

Notes :

The first part (?-s) means that the regex engine will consider, by default, that the dot meta-character matches any single standard character, only ( so, never an End of Line character )
Then the part ^#(.+)\R represents the first complete line, beginning with the # symbol and followed by its End of Line character(s), with the part, after the # symbol, stored as group 1
Any part, of the form ((?s).*?), is the smallest multi-line range of characters ( standard or EOL ones ) till a User-symbol ( @, = or _ ) and is stored as group 2, 4 or 6
The parts @(.+)\R and =(.+)\R represent the first complete line, beginning with the @ or = symbol and followed by its End of Line character(s), with the part, after the symbol, stored as groups 3 and 5
The last part _(.+\R?) stands for the first complete line, beginning with the _ symbol, followed by optional End of Line character(s), and the part, after the _ symbol, is stored as group 7
In replacement, the first part, \1 \3 \5 \7, rewrites the four lines, without their initial User-symbol, separated by a space character, as a single line, ended by the End of Line character(s) captured in group 7
Then, the remainder of the four lists, \2\4\6, is, simply, rewritten, without any change !
The table, below, marks the beginning of each of the seven defined groups :

----------------1-----2---------3-----4---------5-----6---------7------
SEARCH   (?-s)^#(.+)\R((?s).*?)@(.+)\R((?s).*?)=(.+)\R((?s).*?)_(.+\R?)
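To see what each of the seven groups captures on the first hit, the regex can be run once in Python and its groups printed. As before, this is a translation under assumptions: \n stands for \R, (?s:...) for the embedded (?s), and the two-name sample lists are invented to keep the output short.

```python
import re

# Python translation of the four-list search regex
FOUR = re.compile(
    r"^#(.+)\n((?s:.*?))@(.+)\n((?s:.*?))=(.+)\n((?s:.*?))_(.+\n?)",
    re.MULTILINE,
)

sample = ("#Smith\n#Jones\n\n@Oliver\n@Amelia\n\n"
          "=Smith\n=Jones\n\n_Oliver\n_Amelia\n")

groups = FOUR.search(sample).groups()
for number, value in enumerate(groups, start=1):
    # Odd groups are the merged items, even groups the text kept unchanged
    print(number, repr(value))
```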

Cheers,

guy038