Easy way to align lines from different files by line numbers and tabs

  • G
    glossar
    last edited by Dec 11, 2016, 7:15 PM

    Hello
    Is there an easy way to align lines from different files by matching line numbers, joined by tabs? Normally I would do that in Excel: copy the lines from file 1 into column A and the lines from file 2 into column B, select and copy the two columns, and paste them back into Notepad. But Excel has a limit of about one million rows, and I have two sets of files, each set having 22 million lines in total. I want to align them all, preferably in one go or at most in a couple of steps.

    What I want to accomplish:

    In file 1: Line 1: This is the first line
    In file 2: Line 1: Das ist die erste Zeile
    In the aligned file: Line 1: This is the first line TAB (\t) Das ist die erste Zeile

    Thanks in advance!

    • G
      guy038
      last edited by guy038 Dec 14, 2016, 8:19 PM Dec 11, 2016, 11:10 PM

      Hello Glossar,

      I found a fairly simple solution, which uses :

      • The Column editor ( Alt + C ), run FOUR times

      • A classical ascending sort, run ONCE only

      • One regex S/R, ( Ctrl + H ), run ONCE only


      Well, let’s go :

      Just one assumption : your two files must have the SAME number of lines.

      • File A refers to the file, whose contents will begin each line

      • File B refers to the file, whose contents will be added, in each line, after the tabulation character


      • Move the caret to the very beginning of file A ( CTRL + Home )

      • Open the Column Editor ( Alt + C )

      • Select the Number to Insert option

      • Type 1 as Initial number

      • Type 1 as Increase by

      • Check the Leading zeros option

      • Click on the OK button

      • Move the caret, again, to the first line of file A, JUST BETWEEN the initial number 0..01 and the text of line 1

      • Re-Open the Column Editor ( Alt + C )

      • Select the Text to Insert option

      • Type the single upper-case letter A as text

      • Click on the OK button

      • Copy all the contents of file A to the clipboard ( Ctrl + A , then Ctrl + C )

      • Open a new tab ( Ctrl + N ), which will stand for the resulting file C

      • Paste the clipboard, in that new file C ( Ctrl + V )

      • REPLAY the first 13 operations, above, for file B, typing the upper-case letter B, instead of A, as the text to insert

      • Move to the very end of file C and press the ENTER key

      • Paste the clipboard, in that new file C ( Ctrl + V )

      • Run the menu option Edit > Line Operations > Sort Lines Lexicographically Ascending

      • Move back to the very beginning of file C ( CTRL + Home )

      • In file C, open the Replace dialog ( Ctrl + H )

      • Type ^\d+A|(\R\d+B) , in the Find what: zone

      • Type ?1\t , in the Replace with: zone

      • UNCHECK the Wrap around option

      • Select the Regular expression search mode

      • Click on the Replace All button, ONCE only !

      • Save your changed file C ( Ctrl + S )

      Et voilà :-))


      One example :

      Contents of file A :

      This is
      small example
      Let's see
      has
      

      After the first use of the Column editor, adding numbers :

      1This is
      2small example
      3Let's see
      4has
      

      After the second use of the Column editor, adding the string “A” :

      1AThis is
      2Asmall example
      3ALet's see
      4Ahas
      

      Contents of file B :

      a
      of text
      how this text
      been modified
      

      After the first use of the Column editor, adding numbers :

      1a
      2of text
      3how this text
      4been modified
      

      After the second use of the Column editor, adding string “B” :

      1Ba
      2Bof text
      3Bhow this text
      4Bbeen modified
      

      So, contents of file C, by adding, first, file A, then, file B :

      1AThis is
      2Asmall example
      3ALet's see
      4Ahas
      1Ba
      2Bof text
      3Bhow this text
      4Bbeen modified
      

      After the ascending sort :

      1AThis is
      1Ba
      2Asmall example
      2Bof text
      3ALet's see
      3Bhow this text
      4Ahas
      4Bbeen modified
      

      After the final Search/Replacement :

      This is	a
      small example	of text
      Let's see	how this text
      has	been modified
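
For readers who would like to see why the number, tag, sort and replace steps line up, here is a minimal Python sketch that replays the example above. It is only an illustration, not part of the Notepad++ procedure: the function name `align_by_sort` is made up for this sketch, Python's `\n` stands in for `\R`, and a small replacement function plays the role of the conditional `?1\t` replacement.

```python
import re

def align_by_sort(lines_a, lines_b):
    """Replay the editor steps: number, tag with A/B, sort, one regex replace."""
    if len(lines_a) != len(lines_b):
        raise ValueError("both files must have the same number of lines")
    # Equivalent of the "Leading zeros" option: equal-width numbers make a
    # plain text sort agree with numeric order.
    width = len(str(len(lines_a)))
    tagged = [f"{i:0{width}d}A{text}" for i, text in enumerate(lines_a, 1)]
    tagged += [f"{i:0{width}d}B{text}" for i, text in enumerate(lines_b, 1)]
    tagged.sort()  # 01A..., 01B..., 02A..., 02B..., ...
    merged = "\n".join(tagged)
    # Delete each "<number>A" prefix; replace each "<newline><number>B" by a tab.
    merged = re.sub(r"(?m)^\d+A|(\n\d+B)",
                    lambda m: "\t" if m.group(1) else "",
                    merged)
    return merged.split("\n")

file_a = ["This is", "small example", "Let's see", "has"]
file_b = ["a", "of text", "how this text", "been modified"]
for line in align_by_sort(file_a, file_b):
    print(line)
```

Without the zero padding, a text sort would place line 10 before line 2, which is why the Leading zeros option matters for files of ten lines or more.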
      

      Cheers,

      guy038

      • G
        glossar
        last edited by glossar Dec 12, 2016, 6:19 PM Dec 12, 2016, 6:16 PM

        Hello Guy!

        Thank you for your help! I’ve just tried to follow your instructions but am stuck at the 13th step, that is, pasting “the clipboard in that new file C”. It seems there is something wrong with my clipboard. I have just experienced it again: I tried to copy and paste “the clipboard in that new file c” from your answer, but it didn’t work, so I had to type it! Strange! It didn’t work with Notepad either. I tried it both with keyboard shortcuts and with the mouse. Okay, Notepad was open; I have closed it and now it works: “Paste the clipboard, in that new file C ( Ctrl + V )” Yes, it now works! So, somehow Notepad seems to lock my clipboard! Strange!

        Is there anything that can be done about this Notepad-Clipboard relationship?

        I am sure your solution will work, and it seems a clever one too, but I am stuck at the copy-paste operation! I’m using Notepad++ 7.2.2, 32-bit version, on Windows 10.

        Again, thank you for your help!

        • G
          guy038
          last edited by Dec 12, 2016, 7:36 PM

          Hi, glossar,

          Quite weird ! Just a trivial question : did you completely shut down and restart your machine ? Most problems disappear after a cold reboot ;-))

          Cheers,

          guy038

          • G
            glossar
            last edited by Dec 14, 2016, 6:51 PM

            Hi guy!

            I’ve just started again from where I was. :) I have restarted the computer, but no luck! The test chunk that I created contains 1 million lines and is 218 MB. I think Notepad++ (32-bit version) cannot send that much data to the clipboard, even though it could open the file. I tried it with EditPad Lite without a problem, so I installed the 64-bit version of Notepad++ instead, giving up the “Sort output only UNIQUE (at column) lines” function of the TextFX plugin in exchange, and the 64-bit version can handle it.
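
For files too large for the clipboard, the same alignment can also be done without an editor by reading both files as streams. The following is a minimal Python sketch under stated assumptions: the function name `merge_files` and the file names in the usage line are hypothetical, UTF-8 encoding is assumed, and, as in the procedure above, both files must have the same number of lines.

```python
from itertools import zip_longest

def merge_files(path_a, path_b, path_out, encoding="utf-8"):
    """Write line N of file A, a tab, then line N of file B; return the line count."""
    missing = object()  # sentinel marking the end of the shorter file
    count = 0
    with open(path_a, encoding=encoding) as fa, \
         open(path_b, encoding=encoding) as fb, \
         open(path_out, "w", encoding=encoding) as out:
        # Only one line of each file is held in memory at a time.
        for a, b in zip_longest(fa, fb, fillvalue=missing):
            if a is missing or b is missing:
                raise ValueError(f"files differ in length after line {count}")
            out.write(a.rstrip("\n") + "\t" + b.rstrip("\n") + "\n")
            count += 1
    return count

# Hypothetical usage:
# merge_files("file_a.txt", "file_b.txt", "file_c.txt")
```

Because the files are read line by line, the 22 million lines never have to fit in memory or in the clipboard at once.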

            Luckily, I have found a regex for removing duplicate lines (“^(.*?)$\s+?^(?=.*^\1$)”), so I don’t have to be cheeky and ask for it. :)
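
Duplicate removal can also be sketched outside the editor. The Python snippet below is an illustration only, with a made-up function name `unique_lines`; it keeps the first copy of each line, which may not be the copy that a regex-based approach keeps, and it holds every distinct line in memory.

```python
def unique_lines(lines):
    """Yield each distinct line once, in order of first appearance."""
    seen = set()
    for line in lines:
        if line not in seen:
            seen.add(line)
            yield line

sample = ["alpha", "beta", "alpha", "gamma", "beta"]
print(list(unique_lines(sample)))
```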

            I have finally managed to go through the above steps, and it works as expected.
            Again, thank you for your help! I do appreciate it!

            The Community of users of the Notepad++ text editor.