Concatenate corresponding lines from two files
-
Let’s say I have two txt files. I want to concatenate the corresponding lines from one into the other. How can this be automated in Notepad++?
Example
File1:
word1
word2
word3
etc…
File2:
2243
2314
4231
3241
And end up with either a new file or the first file containing:
word12243
word22314
word44231
word53241
I’ve tried using column editing, but not all the lines are the same length, so it doesn’t seem to work for me.
Thanks.
-
And I found a solution online but can’t delete my post, so if anyone is looking: https://www.gillmeister-software.com/online-tools/text/merge-lists-line-by-line.aspx
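For anyone who would rather script this than use an online tool, a minimal Python sketch of the same line-by-line merge could look like the one below. The function name merge_lines and its sep parameter are made up for the illustration, and the sketch assumes both lists have the same number of lines.

```python
def merge_lines(first_text, second_text, sep=""):
    """Join line i of first_text with line i of second_text (hypothetical helper)."""
    first = first_text.splitlines()
    second = second_text.splitlines()
    # Assumption: the two lists must line up one-to-one, so refuse anything else.
    if len(first) != len(second):
        raise ValueError("both lists must have the same number of lines")
    return "\n".join(a + sep + b for a, b in zip(first, second))


print(merge_lines("word1\nword2\nword3", "2243\n2314\n4231"))
```

To work on real files, the two arguments could simply be the contents read from each file, and the result written back to a new file.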
-
Hello, @jeff-caldwell and All,
Regex updated and section remark added, on 08/22/2017 - 11.07am ( French TZ )
Of course, Jeff, your on-line tool, to merge two lists with the same number of elements, works fine !
But, as I’m “mad” about regular expressions, I just tried, as a holiday exercise ( Indeed, I’m on holidays ! ), to find a way to merge two lists of the same length with a regex Search/Replacement
We’ll just need two extra symbols, which are not found in your two original lists. I chose the # and @ symbols
So, for instance, copy/paste the list of the 20 most common family names in the United Kingdom, below, in a new tab ( one name per line ) :
Smith
Jones
Taylor
Brown
Williams
Wilson
Johnson
Davies
Robinson
Wright
Thompson
Evans
Walker
White
Roberts
Green
Hall
Wood
Jackson
Clarke
And copy/paste the list of the 20 most common given names ( 10 male / 10 female ) in the United Kingdom, after the first list and possible blank lines :
Oliver
Amelia
Jack
Olivia
Harry
Emily
George
Isla
Charlie
Ava
Jacob
Jessica
Thomas
Ella
Noah
Isabella
William
Poppy
Oscar
Mia
We first add :
- A # symbol in front of every line containing a family name
- A @ symbol in front of every line containing a given name
NOTE : This may be done using the N++ column mode feature OR the simple regex S/R ( SEARCH ^ and REPLACE # ( or @ ) ), applied to each list in turn
Thus, we obtain the complete list, below :
#Smith
#Jones
#Taylor
#Brown
#Williams
#Wilson
#Johnson
#Davies
#Robinson
#Wright
#Thompson
#Evans
#Walker
#White
#Roberts
#Green
#Hall
#Wood
#Jackson
#Clarke
@Oliver
@Amelia
@Jack
@Olivia
@Harry
@Emily
@George
@Isla
@Charlie
@Ava
@Jacob
@Jessica
@Thomas
@Ella
@Noah
@Isabella
@William
@Poppy
@Oscar
@Mia
Now :
- Move back to the beginning of the first list, or to a blank line above it
- Open the Replace dialog ( CTRL + H )
- Type the regex (?-s)^#(.+)\R((?s).*?)@(.+\R?) in the Find what: zone
- Type the regex \1 \3\2 in the Replace with: zone, with a space character after \1
- Select the Regular expression search mode
- Press the ALT + A shortcut ( same as the Replace All button ) repeatedly, until no further occurrence can be found
Et voilà !
After 20 Replace All actions, you should get the expected list :
Smith Oliver
Jones Amelia
Taylor Jack
Brown Olivia
Williams Harry
Wilson Emily
Johnson George
Davies Isla
Robinson Charlie
Wright Ava
Thompson Jacob
Evans Jessica
Walker Thomas
White Ella
Roberts Noah
Green Isabella
Hall William
Wood Poppy
Jackson Oscar
Clarke Mia
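For readers who want to try the logic outside Notepad++, here is a minimal Python sketch of the repeated Replace All. It is a translation, not the regex itself: Python’s re module has no \R and does not accept a bare (?s) in the middle of a pattern, so the sketch assumes \n line endings and uses the scoped group (?s:...) instead. The names PATTERN and merge_tagged are hypothetical.

```python
import re

# Notepad++ (Boost) regex from the post:  (?-s)^#(.+)\R((?s).*?)@(.+\R?)
# Python counterpart, assuming "\n" line endings (an assumption of this sketch).
PATTERN = re.compile(r"^#(.+)\n((?s:.*?))@(.+\n?)", re.MULTILINE)


def merge_tagged(text):
    """Repeat the replacement until nothing matches, like pressing ALT + A again and again."""
    passes = 0
    while True:
        # Each pass pairs the first remaining "#" line with the first remaining "@" line.
        text, replaced = PATTERN.subn(r"\1 \3\2", text)
        if replaced == 0:
            return text, passes
        passes += 1


sample = "#Smith\n#Jones\n#Taylor\n@Oliver\n@Amelia\n@Jack\n"
merged, passes = merge_tagged(sample)
print(merged, end="")
print("passes:", passes)
```

With this three-name sample the loop needs three passes, one per pair, which mirrors the 20 Replace All actions needed for the 20-name lists.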
Remark :
I previously built the search regex as (?-s)^#(.+)\R(?s)(.*?)(?-s)@(.+\R?)
Then, I understood that the modifier (?s), in the middle of the regex, could be embedded inside the second group (.*?), in order to limit its action to group 2 only
By that means, we don’t have to repeat the (?-s) modifier needed to get the next item of the second list, and we get the final regex (?-s)^#(.+)\R((?s).*?)@(.+\R?)
Best Regards,
guy038
P.S. : The family and given names lists, above, are extracted from the two addresses, below :
https://en.wikipedia.org/wiki/Lists_of_most_common_surnames
https://en.wikipedia.org/wiki/List_of_most_popular_given_names
-
-
Hi, All,
Regex updated and section Notes, added, on 08/22/2017 - 10.45am ( French TZ )
I just realized that my previous regex can be extended to reorganize more than two lists !
For instance, let’s suppose I double the complete list of family and given names from my previous post, adding the new symbols = and _ , in order to get the text, below :
#Smith
#Jones
#Taylor
#Brown
#Williams
#Wilson
#Johnson
#Davies
#Robinson
#Wright
#Thompson
#Evans
#Walker
#White
#Roberts
#Green
#Hall
#Wood
#Jackson
#Clarke
@Oliver
@Amelia
@Jack
@Olivia
@Harry
@Emily
@George
@Isla
@Charlie
@Ava
@Jacob
@Jessica
@Thomas
@Ella
@Noah
@Isabella
@William
@Poppy
@Oscar
@Mia
=Smith
=Jones
=Taylor
=Brown
=Williams
=Wilson
=Johnson
=Davies
=Robinson
=Wright
=Thompson
=Evans
=Walker
=White
=Roberts
=Green
=Hall
=Wood
=Jackson
=Clarke
_Oliver
_Amelia
_Jack
_Olivia
_Harry
_Emily
_George
_Isla
_Charlie
_Ava
_Jacob
_Jessica
_Thomas
_Ella
_Noah
_Isabella
_William
_Poppy
_Oscar
_Mia
Then, the regex S/R :
SEARCH (?-s)^#(.+)\R((?s).*?)@(.+)\R((?s).*?)=(.+)\R((?s).*?)_(.+\R?)
REPLACE \1 \3 \5 \7\2\4\6 , with a space character after \1, \3 and \5
would return, after 20 hits on the ALT + A shortcut ( Replace All ), the single shortened list :
Smith Oliver Smith Oliver
Jones Amelia Jones Amelia
Taylor Jack Taylor Jack
Brown Olivia Brown Olivia
Williams Harry Williams Harry
Wilson Emily Wilson Emily
Johnson George Johnson George
Davies Isla Davies Isla
Robinson Charlie Robinson Charlie
Wright Ava Wright Ava
Thompson Jacob Thompson Jacob
Evans Jessica Evans Jessica
Walker Thomas Walker Thomas
White Ella White Ella
Roberts Noah Roberts Noah
Green Isabella Green Isabella
Hall William Hall William
Wood Poppy Wood Poppy
Jackson Oscar Jackson Oscar
Clarke Mia Clarke Mia
Notes :
- The first part (?-s) means that the regex engine will consider, by default, that the dot meta-character matches any single standard character only ( not End of Line characters )
- Then the part ^#(.+)\R represents the first complete line beginning with the # symbol, followed by its End of Line character(s), with the part after the # symbol stored as group 1
- Any part of the form ((?s).*?) is the smallest multi-line range of characters ( standard or EOL ones ) up to a User-symbol ( @, = or _ ), stored as group 2, 4 or 6
- The parts @(.+)\R and =(.+)\R represent the first complete line beginning with the @ or = symbol, followed by its End of Line character(s), with the part after the symbol stored as group 3 or 5
- The last part _(.+\R?) stands for the first complete line beginning with the _ symbol, followed by optional End of Line character(s), with the part after the _ symbol stored as group 7
- In replacement, the first part, \1 \3 \5 \7, rewrites each of these lines, without its initial User-symbol and separated by a space character, as a single line ended by End of Line character(s)
- Then, the remainder of the four lists, \2\4\6, is simply rewritten without any change !
- The table, below, marks the beginning of each of the seven defined groups :
----------------1-----2---------3-----4---------5-----6---------7------
SEARCH   (?-s)^#(.+)\R((?s).*?)@(.+)\R((?s).*?)=(.+)\R((?s).*?)_(.+\R?)
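The seven groups described in the notes can be inspected with a short Python sketch. As before, this is a hedged translation rather than the Notepad++ regex itself: it assumes \n line endings, replaces each ((?s).*?) with Python’s scoped (?s:.*?), and the names PATTERN4 and sample are invented for the example. The sample holds only two names per list to keep the output readable.

```python
import re

# Python counterpart of the four-list SEARCH regex (assumption: "\n" line endings).
PATTERN4 = re.compile(
    r"^#(.+)\n((?s:.*?))@(.+)\n((?s:.*?))=(.+)\n((?s:.*?))_(.+\n?)",
    re.MULTILINE,
)

sample = "#Smith\n#Jones\n@Oliver\n@Amelia\n=Smith\n=Jones\n_Oliver\n_Amelia\n"

# Show what each of the seven groups captures on the first match.
first = PATTERN4.search(sample)
for number, value in enumerate(first.groups(), start=1):
    print(number, repr(value))

# Apply the REPLACE part until nothing matches any more.
result = sample
while True:
    result, replaced = PATTERN4.subn(r"\1 \3 \5 \7\2\4\6", result)
    if replaced == 0:
        break
print(result, end="")
```

Groups 1, 3, 5 and 7 hold the four items that end up on one line, while groups 2, 4 and 6 hold the untouched remainder of each list, which is why the replacement simply writes them back.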
Cheers,
guy038
-