    Easy way to align lines from different files by line numbers and tabs

      glossar

      Hello
      Is there an easy way to align lines from different files by matching line numbers, joining them with tabs? Normally I would do this in Excel: copy the lines from file 1 into column A and the lines from file 2 into column B, then select both columns, copy them and paste them back into Notepad. But Excel has a limit of about one million rows, and I have two sets of files, each set with 22 million lines in total. I want to align them all, preferably in one go or at most in a couple of steps.

      What I want to accomplish:

      In file 1: Line 1: This is the first line
      In file 2: Line 1: Das ist die erste Zeile
      In the aligned file: Line 1: This is the first line TAB (\t) Das ist die erste Zeile

      Thanks in advance!

        guy038

        Hello Glossar,

        I found a solution, not too difficult, which needs the use of :

        • The Column editor ( Alt + C ), run FOUR times

        • A classical ascending sort, run ONCE only

        • One regex S/R, ( Ctrl + H ), run ONCE only


        Well, let’s go :

        Just one assumption : your two files must have the SAME number of lines.

        • File A refers to the file, whose contents will begin each line

        • File B refers to the file, whose contents will be added, in each line, after the tabulation character


        • Move the caret at the very beginning of file A ( CTRL + Home )

        • Open the Column Editor ( Alt + C )

        • Select the Number to Insert option

        • Type 1 as Initial number

        • Type 1 as Increase by

        • Check the Leading zeros option

        • Click on the OK button

        • Move the caret, again, to the first line of file A, JUST BETWEEN the initial number 0..01 and the text of line 1

        • Re-Open the Column Editor ( Alt + C )

        • Select the Text to Insert option

        • Type the single upper-case letter A as text

        • Click on the OK button

        • Copy all the contents of file A to the clipboard ( Ctrl + A, then Ctrl + C )

        • Open a new tab ( Ctrl + N ), which will stand for the resulting file C

        • Paste the clipboard, in that new file C ( Ctrl + V )

        • REPLAY the first 13 operations, above, for file B

        • Move to the very end of file C and press the ENTER key

        • Paste the clipboard, in that new file C ( Ctrl + V )

        • Run the menu option Edit > Line Operations > Sort Lines Lexicographically Ascending

        • Move back to the very beginning of file C ( CTRL + Home )

        • In file C, open the Replace dialog ( Ctrl + H )

        • Type ^\d+A|(\R\d+B) , in the Find what: zone

        • Type ?1\t , in the Replace with: zone

        • UNCHECK the Wrap around option

        • Select the Regular expression search mode

        • Click on the Replace All button, ONCE only !

        • Save your changed file C ( Ctrl + S )

        Et voilà :-))
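For readers who want to check the logic outside the editor, here is a minimal Python sketch of the same number, tag, sort and replace idea. It is an illustration only, not part of the Notepad++ procedure: the function names `tag` and `align_by_sort` are hypothetical, and because Python's `re` module has no `\R`, a plain `\n` stands in for it.

```python
import re

def tag(lines, letter):
    """Prefix each line with a zero-padded line number and a tag letter."""
    width = len(str(len(lines)))  # same effect as the "Leading zeros" option
    return [str(i).zfill(width) + letter + text
            for i, text in enumerate(lines, start=1)]

def align_by_sort(lines_a, lines_b):
    """Interleave two equally long lists of lines and join each pair with a tab."""
    if len(lines_a) != len(lines_b):
        raise ValueError("both files must have the same number of lines")
    # Sorting puts line n of A directly before line n of B ("A" sorts before "B").
    merged = "\n".join(sorted(tag(lines_a, "A") + tag(lines_b, "B")))
    # Same alternation as the Find what pattern, with \n standing in for \R.
    pattern = re.compile(r"^\d+A|(\n\d+B)", re.MULTILINE)
    # Group 1 matched: replace with a tab. Otherwise: delete the "nA" prefix.
    return pattern.sub(lambda m: "\t" if m.group(1) else "", merged)

file_a = ["This is", "small example", "Let's see", "has"]
file_b = ["a", "of text", "how this text", "been modified"]
print(align_by_sort(file_a, file_b))
```

The replacement function plays the role of the conditional `?1\t`: when group 1 took part in the match, a tab is written, otherwise the match is replaced with nothing.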


        One example :

        Contents of file A :

        This is
        small example
        Let's see
        has
        

        After the first use of the Column editor, adding numbers :

        1This is
        2small example
        3Let's see
        4has
        

        After the second use of the Column editor, adding the string “A” :

        1AThis is
        2Asmall example
        3ALet's see
        4Ahas
        

        Contents of file B :

        a
        of text
        how this text
        been modified
        

        After the first use of the Column editor, adding numbers :

        1a
        2of text
        3how this text
        4been modified
        

        After the second use of the Column editor, adding string “B” :

        1Ba
        2Bof text
        3Bhow this text
        4Bbeen modified
        

        So, contents of file C, by adding, first, file A, then, file B :

        1AThis is
        2Asmall example
        3ALet's see
        4Ahas
        1Ba
        2Bof text
        3Bhow this text
        4Bbeen modified
        

        After the ascending sort :

        1AThis is
        1Ba
        2Asmall example
        2Bof text
        3ALet's see
        3Bhow this text
        4Ahas
        4Bbeen modified
        

        After the final Search/Replacement :

        This is	a
        small example	of text
        Let's see	how this text
        has	been modified
        
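The four-line example sorts correctly even without padded numbers, but with ten or more lines the Leading zeros option becomes essential, because the sort compares characters, not numeric values. The short Python sketch below illustrates this under the assumption that the sort is a plain character-by-character comparison; `number_lines` is a hypothetical helper, not a Notepad++ feature.

```python
def number_lines(lines, letter, pad):
    """Prefix lines with their number and a tag letter, optionally zero-padded."""
    width = len(str(len(lines))) if pad else 0
    return [str(i).zfill(width) + letter + text
            for i, text in enumerate(lines, start=1)]

lines = [f"line {n}" for n in range(1, 11)]  # ten lines

unpadded = sorted(number_lines(lines, "A", pad=False))
padded = sorted(number_lines(lines, "A", pad=True))

# Without padding, "10A..." sorts before "1A..." because "0" < "A".
print(unpadded[0])
# With padding, the sorted order is the original line order.
print(padded[0], "...", padded[-1])
```

Without padding, line 10 jumps to the top of the sorted file and the A/B pairs no longer line up as intended.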

        Cheers,

        guy038

          glossar

          Hello Guy!

          Thank you for your help! I’ve just tried to follow your instructions, but I am stuck at the 13th step, that is, pasting “the clipboard in that new file C”. It seems there is something wrong with my clipboard; I just experienced it again. I tried to copy and paste “the clipboard in that new file c” from your answer, but it didn’t work, so I had to type it! Strange! It didn’t work with Notepad either. I tried it both with keyboard shortcuts and with the mouse. Okay, Notepad was open; I have closed it and now it works: “Paste the clipboard, in that new file C ( Ctrl + V )”. Yes, it now works! So somehow Notepad seems to lock my clipboard! Strange!

          Is there anything that can be done about this Notepad-Clipboard relationship?

          I am sure your solution will work, and it also seems a clever one, but I am stuck at the copy-paste operation! I’m using the 32-bit version of Notepad++ 7.2.2 on Windows 10.

          Again, thank you for your help!

            guy038

            Hi, glossar,

            Quite weird ! Just a trivial question : did you completely shut down and restart your machine ? Most problems disappear after a cold reboot ;-))

            Cheers,

            guy038

              glossar

              Hi guy!

              I’ve just started again from where I was. :) I restarted the computer, but no luck! The chunk that I created for testing contains 1 million lines and is 218 MB. I think the 32-bit version of Notepad++ cannot send that much data to the clipboard, even though it could open the file. I tried it with EditPad Lite without a problem, so I installed the 64-bit version of Notepad++ instead, giving up the “Sort output only UNIQUE (at column) lines” function of the TextFX plugin in exchange, and the 64-bit version can handle it.
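For files this large, a small script that streams both files avoids the clipboard entirely. The following Python sketch is an alternative to the editor-based steps, not a reproduction of them; the function name `merge_files`, the file paths and the UTF-8 default are all assumptions made for illustration.

```python
from itertools import zip_longest

def merge_files(path_a, path_b, path_out, encoding="utf-8"):
    """Write 'line n of A <TAB> line n of B' for every n; return the line count."""
    missing = object()  # sentinel marking the end of the shorter file
    count = 0
    with open(path_a, encoding=encoding) as fa, \
         open(path_b, encoding=encoding) as fb, \
         open(path_out, "w", encoding=encoding, newline="\n") as out:
        # Files are read lazily, one line at a time, so memory use stays small.
        for a, b in zip_longest(fa, fb, fillvalue=missing):
            if a is missing or b is missing:
                raise ValueError(f"files differ in length after {count} lines")
            out.write(a.rstrip("\r\n") + "\t" + b.rstrip("\r\n") + "\n")
            count += 1
    return count
```

A call such as `merge_files("file1.txt", "file2.txt", "aligned.txt")` (hypothetical file names) would produce the tab-joined file in one pass. Like the editor procedure, it requires both files to have the same number of lines and raises an error otherwise.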

              Luckily, I have found a regex for removing duplicate lines (“^(.*?)$\s+?^(?=.*^\1$)”), so I don’t have to be cheeky and ask for one. :)
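Outside the editor, duplicate lines can also be dropped with a few lines of Python. This is a sketch of a different technique from the regex, and it rests on one assumption: that keeping the first occurrence of each line, in the original order, is the desired behaviour. The name `unique_lines` is hypothetical.

```python
def unique_lines(lines):
    """Yield each distinct line once, keeping the first occurrence and the order."""
    seen = set()
    for line in lines:
        if line not in seen:
            seen.add(line)
            yield line

print(list(unique_lines(["a", "b", "a", "c", "b"])))
```

Because it is a generator, it can be fed an open file object directly, although the set of seen lines still has to fit in memory.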

              I have finally managed to go through the above steps, and it works as expected.
              Again, thank you for your help! I do appreciate it!

              The Community of users of the Notepad++ text editor.