Easy way to align lines from different files by line numbers and tabs



  • Hello
    Is there an easy way to align lines from different files by matching line numbers, joining each pair with a tab? Normally I would do that via Excel: copy the lines from file 1 into column A and the lines from file 2 into column B, select and copy the two columns, and paste them back into Notepad. But Excel is limited to about one million rows, and I have two sets of files, each set having 22 million lines in total. I want to align them all, preferably in one go or at most in a couple of steps.

    What I want to accomplish:

    In file 1: Line 1: This is the first line
    In file 2: Line 1: Das ist die erste Zeile
    In the aligned file: Line 1: This is the first line TAB (\t) Das ist die erste Zeile

    Thanks in advance!



  • Hello Glossar,

    I found a solution, not too difficult, which uses :

    • The Column editor ( Alt + C ), run FOUR times

    • A classical ascending sort, run ONCE only

    • One regex search/replace ( Ctrl + H ), run ONCE only


    Well, let’s go :

    Just one assumption : your two files must have the SAME number of lines.

    • File A refers to the file, whose contents will begin each line

    • File B refers to the file, whose contents will be added, in each line, after the tabulation character


    • Move the caret at the very beginning of file A ( CTRL + Home )

    • Open the Column Editor ( Alt + C )

    • Select the Number to Insert option

    • Type 1 as Initial number

    • Type 1 as Increase by

    • Check the Leading zeros option

    • Click on the OK button

    • Move, again, the caret, at the first line of file A, JUST BETWEEN the initial number 0..01 and the text of line 1

    • Re-Open the Column Editor ( Alt + C )

    • Select the Text to Insert option

    • Type the single upper-case letter A as text

    • Click on the OK button

    • Copy all the contents of file A to the clipboard ( Ctrl + A, then Ctrl + C )

    • Open a new tab ( Ctrl + N ), which will stand for the resulting file C

    • Paste the clipboard, in that new file C ( Ctrl + V )

    • REPEAT the first 13 operations, above, for file B, typing the upper-case letter B instead of A as text

    • Move to the very end of file C and press the ENTER key

    • Paste the clipboard, in that new file C ( Ctrl + V )

    • Run the menu option Edit > Line Operations > Sort Lines Lexicographically Ascending

    • Move back to the very beginning of file C ( CTRL + Home )

    • In file C, open the Replace dialog ( Ctrl + H )

    • Type ^\d+A|(\R\d+B) , in the Find what: zone

    • Type ?1\t , in the Replace with: zone

    • UNCHECK the Wrap around option

    • Select the Regular expression search mode

    • Click on the Replace All button, ONCE only !

    • Save your changed file C ( Ctrl + S )

    Et voilà :-))
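
    For readers who want to see why the numbering, tagging, sort and replace steps fit together, here is a minimal Python sketch that mimics them on in-memory lists. It is only an illustration, not part of the Notepad++ procedure: the function name `merge_by_tag_sort` is made up, Python's `sorted` stands in for the editor's lexicographic sort, and `\n` is used where the editor regex uses `\R`, because Python's `re` module has no `\R`.

```python
import re


def merge_by_tag_sort(lines_a, lines_b):
    """Mimic the editor steps: number, tag with A/B, concatenate, sort, replace."""
    if len(lines_a) != len(lines_b):
        raise ValueError("both inputs must have the same number of lines")

    # Leading zeros give every number the same width, so a plain text sort
    # orders the lines numerically (otherwise "10A" would sort before "2A").
    width = len(str(len(lines_a)))
    tagged_a = [f"{i:0{width}d}A{text}" for i, text in enumerate(lines_a, start=1)]
    tagged_b = [f"{i:0{width}d}B{text}" for i, text in enumerate(lines_b, start=1)]

    # After sorting, each "nA..." line is immediately followed by its "nB..." line.
    combined = "\n".join(sorted(tagged_a + tagged_b))

    # Same idea as the Find/Replace pair: delete "<number>A" at a line start,
    # and turn "<line break><number>B" into a single tab.
    return re.sub(
        r"^\d+A|(\n\d+B)",
        lambda m: "\t" if m.group(1) else "",
        combined,
        flags=re.MULTILINE,
    )


if __name__ == "__main__":
    print(merge_by_tag_sort(["This is", "small example"], ["a", "of text"]))
```

    The zero padding plays the role of the Leading zeros option: with ten or more lines, unpadded numbers would no longer sort in line order.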


    One example :

    Contents of file A :

    This is
    small example
    Let's see
    has
    

    After the first use of the Column editor, adding numbers :

    1This is
    2small example
    3Let's see
    4has
    

    After the second use of the Column editor, adding the string “A” :

    1AThis is
    2Asmall example
    3ALet's see
    4Ahas
    

    Contents of file B :

    a
    of text
    how this text
    been modified
    

    After the first use of the Column editor, adding numbers :

    1a
    2of text
    3how this text
    4been modified
    

    After the second use of the Column editor, adding string “B” :

    1Ba
    2Bof text
    3Bhow this text
    4Bbeen modified
    

    So, contents of file C, by adding, first, file A, then, file B :

    1AThis is
    2Asmall example
    3ALet's see
    4Ahas
    1Ba
    2Bof text
    3Bhow this text
    4Bbeen modified
    

    After the ascending sort :

    1AThis is
    1Ba
    2Asmall example
    2Bof text
    3ALet's see
    3Bhow this text
    4Ahas
    4Bbeen modified
    

    After the final Search/Replacement :

    This is	a
    small example	of text
    Let's see	how this text
    has	been modified
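
    The same final result can also be produced outside the editor. The sketch below is a hedged alternative, not the method described in this post: it reads both files line by line, so the whole 22-million-line set never has to sit in memory or on the clipboard. The function name `paste_files`, its parameters and the UTF-8 default are assumptions made for the illustration.

```python
from itertools import zip_longest


def paste_files(path_a, path_b, path_out, encoding="utf-8"):
    """Write line i of path_a, a tab, then line i of path_b; return the line count."""
    missing = object()  # sentinel marking the shorter file running out
    count = 0
    with open(path_a, encoding=encoding, newline="") as fa, \
         open(path_b, encoding=encoding, newline="") as fb, \
         open(path_out, "w", encoding=encoding, newline="") as out:
        for a, b in zip_longest(fa, fb, fillvalue=missing):
            if a is missing or b is missing:
                raise ValueError(f"line counts differ after line {count}")
            # Strip the original line endings and join the pair with a tab.
            out.write(a.rstrip("\r\n") + "\t" + b.rstrip("\r\n") + "\n")
            count += 1
    return count
```

    A call such as `paste_files("fileA.txt", "fileB.txt", "fileC.txt")` (hypothetical file names) would write the aligned file and raise an error if the two inputs do not have the same number of lines.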
    

    Cheers,

    guy038



  • Hello Guy!

    Thank you for your help! I’ve just tried to follow your instructions but got stuck at the 13th step, that is, pasting “the clipboard in that new file C”. It seems there is something wrong with my clipboard. I just experienced it again: I tried to copy and paste “the clipboard in that new file c” from your answer, but it didn’t work, so I had to type it! Strange! It didn’t work with Notepad either. I tried it both with keyboard shortcuts and with the mouse. Okay, Notepad was open; I closed it and now it works: “Paste the clipboard, in that new file C ( Ctrl + V )”. Yes, it now works! So, somehow Notepad seems to lock my clipboard! Strange!

    Is there anything that can be done about this Notepad-Clipboard relationship?

    I am sure your solution will work, and it seems a clever one too, but I am stuck on the copy-paste operation! I’m using Notepad++ 7.2.2, 32-bit version, on Windows 10.

    Again, thank you for your help!



  • Hi, glossar,

    Quite weird ! Just a trivial question : did you completely shut down and restart your machine ? Most problems disappear after a cold reboot ;-))

    Cheers,

    guy038



  • Hi guy!

    I’ve just started again from where I was. :) I restarted the computer, but no luck! The test chunk that I created contains 1 million lines and is 218 MB. I think Notepad++ (32-bit version) cannot send that much data to the clipboard, even though it could open the file. I tried it with EditPad Lite without a problem, so I installed the 64-bit version of Notepad++, giving up the “Sort output only UNIQUE (at column) lines” function of the TextFX plugin in exchange, and the 64-bit version can handle it.

    Luckily, I have found a regex for removing duplicate lines ("^(.*?)$\s+?^(?=.*^\1$)"), so I don’t have to be cheeky and ask for it. :)

    I have finally managed to go through the above steps, and it works as expected.
    Again, thank you for your help! I do appreciate it!
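
    For the duplicate-removal step mentioned above, a script can do the same job on files too large for comfortable editing. This is a set-based sketch, not a translation of the regex: the name `unique_lines` is made up, it keeps the first occurrence of each line in the original order, and it assumes the distinct lines fit in memory.

```python
def unique_lines(lines):
    """Yield each distinct line once, keeping first occurrences in original order."""
    seen = set()
    for line in lines:
        key = line.rstrip("\r\n")  # compare lines without their line endings
        if key not in seen:
            seen.add(key)
            yield line


if __name__ == "__main__":
    sample = ["a\n", "b\n", "a\n", "c\n", "b\n"]
    print("".join(unique_lines(sample)), end="")
```

    Because it is a generator, it can be fed an open file object directly and its output written line by line to a new file.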

