Easy way to align lines from different files by line numbers and tabs
-
Hello
Is there an easy way to align lines from different files by matching line numbers, joined with tabs? Normally I would do that in Excel: copy the lines from file 1 into column A and the lines from file 2 into column B, select and copy the two columns, and paste them back into Notepad. But Excel has a limit of about one million rows, and I have two sets of files, each set having 22 million lines in total. I want to align them all, preferably in one go or at most in a couple of steps.
What I want to accomplish:
In file 1: Line 1: This is the first line
In file 2: Line 1: Das ist die erste Zeile
In the aligned file: Line 1: This is the first line TAB (\t) Das ist die erste Zeile
Thanks in advance!
-
Hello Glossar,
I found a solution, not too difficult, which needs the use of :
-
The Column editor ( Alt + C ), run FOUR times
-
A classical ascending sort, run ONCE only
-
One regex S/R, ( Ctrl + H ), run ONCE only
Well, let’s go :
Just one hypothesis : your two files must have the SAME number of lines.
-
File A refers to the file, whose contents will begin each line
-
File B refers to the file, whose contents will be added, in each line, after the tabulation character
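The same-number-of-lines assumption can be checked before starting. Here is a minimal Python sketch of such a check. The file names, the demo contents and the UTF-8 encoding are all assumptions made for illustration, not something taken from the thread.

```python
import tempfile
from pathlib import Path


def count_lines(path, encoding="utf-8"):
    """Count lines by streaming the file, so a large file is never loaded at once."""
    with open(path, "r", encoding=encoding) as handle:
        return sum(1 for _ in handle)


# Hypothetical demo files standing in for file A and file B.
with tempfile.TemporaryDirectory() as tmp:
    file_a = Path(tmp) / "file_a.txt"
    file_b = Path(tmp) / "file_b.txt"
    file_a.write_text("This is the first line\nThis is the second line\n", encoding="utf-8")
    file_b.write_text("Das ist die erste Zeile\nDas ist die zweite Zeile\n", encoding="utf-8")

    lines_a = count_lines(file_a)
    lines_b = count_lines(file_b)
    same_length = lines_a == lines_b
    print(lines_a, lines_b, same_length)
```

If the two counts differ, the procedure would pair up the wrong lines from the first mismatch onwards, so it is worth fixing the files first.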
-
Move the caret at the very beginning of file A ( CTRL + Home )
-
Open the Column Editor ( Alt + C )
-
Select the Number to Insert option
-
Type 1 as Initial number
-
Type 1 as Increase by
-
Check the Leading zeros option
-
Click on the OK button
-
Move the caret, again, to the first line of file A, JUST BETWEEN the initial number 0..01 and the text of line 1
-
Re-Open the Column Editor ( Alt + C )
-
Select the Text to Insert option
-
Type the single upper-case letter A as text
-
Click on the OK button
-
Select all the contents of file A ( Ctrl + A ) and copy them to the clipboard ( Ctrl + C )
-
Open a new tab ( Ctrl + N ), which will stand for the resulting file C
-
Paste the clipboard, in that new file C ( Ctrl + V )
-
REPLAY the first 13 operations, above, for file B, typing the upper-case letter B instead of A
-
Move to the very end of file C and press the ENTER key
-
Paste the clipboard, in that new file C ( Ctrl + V )
-
Run the menu option Edit > Line Operations > Sort Lines Lexicographically Ascending
-
Move back to the very beginning of file C ( CTRL + Home )
-
In file C, open the Replace dialog ( Ctrl + H )
-
Type ^\d+A|(\R\d+B) in the Find what: zone
-
Type ?1\t in the Replace with: zone
-
UNCHECK the Wrap around option
-
Select the Regular expression search mode
-
Click on the Replace All button, ONCE only !
-
Save your changed file C ( Ctrl + S )
Et voilà :-))
One example :
Contents of file A :
This is
small example
Let's see
has
After the first use of the Column editor, adding numbers :
1This is
2small example
3Let's see
4has
After the second use of the Column editor, adding the string “A” :
1AThis is
2Asmall example
3ALet's see
4Ahas
Contents of file B :
a
of text
how this text
been modified
After the first use of the Column editor, adding numbers :
1a
2of text
3how this text
4been modified
After the second use of the Column editor, adding the string “B” :
1Ba
2Bof text
3Bhow this text
4Bbeen modified
So, contents of file C, by adding, first, file A, then, file B :
1AThis is
2Asmall example
3ALet's see
4Ahas
1Ba
2Bof text
3Bhow this text
4Bbeen modified
After the ascending sort :
1AThis is
1Ba
2Asmall example
2Bof text
3ALet's see
3Bhow this text
4Ahas
4Bbeen modified
After the final Search/Replacement :
This is	a
small example	of text
Let's see	how this text
has	been modified
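The logic behind the procedure (number each line, tag it A or B, sort, then strip the tags) can be sketched in Python. This is only an illustration of the idea, not what Notepad++ does internally. The function name is hypothetical. Python's `re` module has neither `\R` nor the `?1` conditional replacement, so the sketch assumes `\n` line endings and uses a replacement function instead.

```python
import re


def merge_by_tag_and_sort(lines_a, lines_b):
    """Mimic the Column Editor + sort + regex procedure on two lists of lines."""
    if len(lines_a) != len(lines_b):
        raise ValueError("both inputs must have the same number of lines")

    # Zero-pad the numbers (the "Leading zeros" option) so that a text sort
    # gives the same order as a numeric sort.
    width = len(str(len(lines_a)))
    tagged = [f"{i:0{width}d}A{text}" for i, text in enumerate(lines_a, start=1)]
    tagged += [f"{i:0{width}d}B{text}" for i, text in enumerate(lines_b, start=1)]

    # Lexicographic ascending sort: 01A..., 01B..., 02A..., 02B..., ...
    tagged.sort()
    combined = "\n".join(tagged)

    # Delete each "<number>A" prefix; turn each "<line break><number>B" into a tab.
    pattern = re.compile(r"^\d+A|(\n\d+B)", re.MULTILINE)
    merged = pattern.sub(lambda m: "\t" if m.group(1) else "", combined)
    return merged.split("\n")


file_a = ["This is", "small example", "Let's see", "has"]
file_b = ["a", "of text", "how this text", "been modified"]
for line in merge_by_tag_and_sort(file_a, file_b):
    print(line)
```

Without the zero-padding, a line tagged 10A would sort before 1A as soon as the files have ten or more lines, which is why the Leading zeros option matters.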
Cheers,
guy038
-
Hello Guy!
Thank you for your help! I’ve just tried to follow your instructions but am stuck at the 13th step, that is, pasting “the clipboard in that new file C”. It seems there is something wrong with my clipboard. I just experienced it again: I tried to copy and paste “the clipboard in that new file c” from your answer but it didn’t work, so I had to type it! Strange! It didn’t work with Notepad either. I tried it both with keyboard shortcuts and with the mouse. Okay, Notepad was open; I closed it and now it works: “Paste the clipboard, in that new file C ( Ctrl + V )”. Yes, it now works! So, somehow Notepad seems to lock my clipboard! Strange!
Is there anything that can be done about this Notepad-Clipboard relationship?
I am sure your solution will work, and it seems a clever one too, but I am stuck with the copy-paste operation! I’m using Notepad++ 7.2.2 (32-bit version) on Windows 10.
Again, thank you for your help!
-
Hi, glossar,
Quite weird ! Just a trivial question : did you shut down completely and restart your machine ? Most problems disappear after a cold reboot ;-))
Cheers,
guy038
-
Hi guy!
I’ve just started again from where I was. :) I have restarted the computer, but no luck! The chunk that I created for testing contains 1 million lines and is 218 MB. I think Notepad++ (32-bit version) cannot send that much data to the clipboard, even though it could open the file. I tried it with EditPad Lite without any problem, so I reinstalled the 64-bit version of Notepad++, giving up the “Sort output only UNIQUE (at column) lines” function of the TextFX plugin in exchange, and the 64-bit version can handle it.
Luckily, I have found a regex for removing duplicate lines (“^(.*?)$\s+?^(?=.*^\1$)”), so I don’t have to be cheeky and ask for it. :)
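Duplicate lines can also be removed outside the editor. The Python sketch below uses a different technique from the regex (a set of lines already seen). It keeps the last copy of each line and preserves the order of the surviving lines, on the assumption that this is an acceptable result. The function name is hypothetical.

```python
def drop_earlier_duplicates(lines):
    """Keep only the last occurrence of each line, preserving relative order."""
    seen = set()
    kept_reversed = []
    # Walk backwards so the first copy met is the last one in the file.
    for line in reversed(lines):
        if line not in seen:
            seen.add(line)
            kept_reversed.append(line)
    # Restore the original top-to-bottom order.
    return kept_reversed[::-1]


print(drop_earlier_duplicates(["a", "b", "a", "c", "b"]))
```

Unlike a sort-based unique function, this does not reorder the lines, but it holds every distinct line in memory.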
I have finally managed to go through the above steps, and it works as expected.
Again, thank you for your help! I do appreciate it!
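For files too large for the clipboard, the same alignment can also be done outside the editor. The Python sketch below is an alternative to the Notepad++ procedure above, not a description of it. It reads both files one line at a time and writes line i of file A, a tab, and line i of file B to a third file. The function name, the file names and the UTF-8 encoding are assumptions for illustration.

```python
import tempfile
from itertools import zip_longest
from pathlib import Path

_MISSING = object()  # sentinel marking that one file ended before the other


def paste_files(path_a, path_b, path_out, encoding="utf-8"):
    """Join the two files line by line with a tab; return the number of lines written."""
    count = 0
    with open(path_a, encoding=encoding) as fa, \
         open(path_b, encoding=encoding) as fb, \
         open(path_out, "w", encoding=encoding, newline="\n") as out:
        for line_a, line_b in zip_longest(fa, fb, fillvalue=_MISSING):
            if line_a is _MISSING or line_b is _MISSING:
                raise ValueError("the two files do not have the same number of lines")
            out.write(line_a.rstrip("\r\n") + "\t" + line_b.rstrip("\r\n") + "\n")
            count += 1
    return count


# Hypothetical demo using the sample lines from the question.
with tempfile.TemporaryDirectory() as tmp:
    file_a = Path(tmp) / "file_a.txt"
    file_b = Path(tmp) / "file_b.txt"
    file_c = Path(tmp) / "file_c.txt"
    file_a.write_text("This is the first line\n", encoding="utf-8")
    file_b.write_text("Das ist die erste Zeile\n", encoding="utf-8")
    written = paste_files(file_a, file_b, file_c)
    aligned = file_c.read_text(encoding="utf-8")
    print(written, repr(aligned))
```

Because only one line from each file is held in memory at a time, the size of the files should not matter much; a mismatch in line counts raises an error instead of silently misaligning the output.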