
    Concatenate corresponding lines from two files

    • Jeff Caldwell

      Let's say I have two txt files. I want to concatenate the corresponding lines from one onto the other. How can this be automated in Notepad++?

      Example

      File1:

      word1
      word2
      word3
      etc…

      File2:
      2243
      2314
      4231
      3241

      And end up with either a new file or the first file containing

      word12243
      word22314
      word34231
      word43241

      I’ve tried using column editing, but not all the lines are the same length, so it doesn’t seem to work for me.

      Thanks.

      • Jeff Caldwell

        I found a solution online, but I can’t delete my post, so for anyone who is looking: https://www.gillmeister-software.com/online-tools/text/merge-lists-line-by-line.aspx
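For anyone who prefers an offline route, the same line-by-line merge can be sketched in a few lines of Python. This is only a minimal sketch and has nothing to do with how the linked tool works; the function name `concat_lines` and the `sep` parameter are illustrative assumptions, and a shorter list is simply padded with empty strings.

```python
from itertools import zip_longest


def concat_lines(first: str, second: str, sep: str = "") -> str:
    """Join line i of `first` with line i of `second` (hypothetical helper)."""
    left = first.splitlines()
    right = second.splitlines()
    # zip_longest pads the shorter list with "" so no line is dropped
    pairs = zip_longest(left, right, fillvalue="")
    return "\n".join(a + sep + b for a, b in pairs)


file1 = "word1\nword2\nword3"
file2 = "2243\n2314\n4231"
print(concat_lines(file1, file2))
```

Passing `sep=" "` would put a space between the two halves of each line.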

        • guy038

          Hello, @jeff-caldwell and All,

          Regex updated and section remark added, on 08/22/2017 - 11.07am ( French TZ )

          Of course, Jeff, your online tool for merging two lists with the same number of elements works fine!

          But, as I’m “mad” about regular expressions, I tried, as a holiday exercise (indeed, I’m on holiday!), to find a way to merge two lists of the same length with a regex Search/Replacement.

          We’ll just need two extra symbols that are not found in either of your two original lists. I chose the # and @ symbols.


          So, for instance, copy/paste the list below of the 20 most common family names in the United Kingdom into a new tab:

          Smith
          Jones
          Taylor
          Brown
          Williams
          Wilson
          Johnson
          Davies
          Robinson
          Wright
          Thompson
          Evans
          Walker
          White
          Roberts
          Green
          Hall
          Wood
          Jackson
          Clarke
          

          Then copy/paste the list of the 20 most common given names (10 male / 10 female) in the United Kingdom after the first list, optionally separated from it by blank lines:

          Oliver
          Amelia
          Jack
          Olivia
          Harry
          Emily
          George
          Isla
          Charlie
          Ava
          Jacob
          Jessica
          Thomas
          Ella
          Noah
          Isabella
          William
          Poppy
          Oscar
          Mia
          

          First, we add:

          • A # symbol in front of every line containing a family name

          • A @ symbol in front of every line containing a given name

          NOTE: This may be done using the N++ column mode feature OR with the simple regex S/R: SEARCH ^ and REPLACE # (or @), applied to each list separately
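Outside Notepad++, the marking step can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the helper name `add_marker` is hypothetical, the text is assumed to use `\n` line endings, and the lookahead `(?=.)` is an addition of mine (not part of the original `^` search) so that blank lines are left unmarked.

```python
import re


def add_marker(block: str, marker: str) -> str:
    """Prefix every non-empty line of `block` with `marker` (hypothetical helper)."""
    # With MULTILINE, ^ matches at the start of each line;
    # (?=.) requires a character to follow, so empty lines are skipped.
    return re.sub(r"^(?=.)", marker, block, flags=re.MULTILINE)


family = "Smith\nJones\nTaylor\n\n"
given = "Oliver\nAmelia\nJack\n"
print(add_marker(family, "#") + add_marker(given, "@"), end="")
```

Each list is marked on its own and the results are concatenated, which mirrors running the replacement once per list.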

          Thus, we obtain the complete list below:

          #Smith
          #Jones
          #Taylor
          #Brown
          #Williams
          #Wilson
          #Johnson
          #Davies
          #Robinson
          #Wright
          #Thompson
          #Evans
          #Walker
          #White
          #Roberts
          #Green
          #Hall
          #Wood
          #Jackson
          #Clarke
          
          
          @Oliver
          @Amelia
          @Jack
          @Olivia
          @Harry
          @Emily
          @George
          @Isla
          @Charlie
          @Ava
          @Jacob
          @Jessica
          @Thomas
          @Ella
          @Noah
          @Isabella
          @William
          @Poppy
          @Oscar
          @Mia
          

          Now:

          • Move back to the beginning of the first list, or to a blank line above it

          • Open the Replace dialog ( CTRL + H )

          • Type the regex (?-s)^#(.+)\R((?s).*?)@(.+\R?) in the Find what: zone

          • Type the regex \1 \3\2 in the Replace with: zone, with a space character after \1

          • Select the Regular expression search mode

          • Press the ALT + A shortcut (same as the Replace All button) repeatedly, until no further occurrence is found

          Et voilà !

          After 20 Replace All actions (each one merges a single pair of lines), you should get the expected list:

          Smith Oliver
          Jones Amelia
          Taylor Jack
          Brown Olivia
          Williams Harry
          Wilson Emily
          Johnson George
          Davies Isla
          Robinson Charlie
          Wright Ava
          Thompson Jacob
          Evans Jessica
          Walker Thomas
          White Ella
          Roberts Noah
          Green Isabella
          Hall William
          Wood Poppy
          Jackson Oscar
          Clarke Mia
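The repeated Replace All can be imitated in Python to see why one pass is needed per pair of lines. This is a sketch, not the Notepad++ engine: Python's `re` module supports neither `\R` nor a `(?s)` switch in the middle of a pattern, so the sketch assumes `\n` line endings and uses the scoped form `(?s:...)` for group 2. The leading `(?-s)` is dropped because `.` already excludes newlines by default in Python. The function name `merge_marked_lists` is hypothetical.

```python
import re

# Translation of (?-s)^#(.+)\R((?s).*?)@(.+\R?) under the assumptions above
PATTERN = re.compile(r"^#(.+)\n((?s:.*?))@(.+\n?)", re.MULTILINE)


def merge_marked_lists(text: str) -> tuple[str, int]:
    """Repeat the replacement until nothing matches; return (text, passes)."""
    passes = 0
    while True:
        # Each pass joins the first remaining # line with the first remaining @ line
        text, hits = PATTERN.subn(r"\1 \3\2", text)
        if hits == 0:
            return text, passes
        passes += 1


sample = "#Smith\n#Jones\n#Taylor\n\n@Oliver\n@Amelia\n@Jack\n"
merged, passes = merge_marked_lists(sample)
print(merged, end="")
print(passes)
```

With three names per list the loop runs three productive passes, in the same way that the 20-name lists need 20 presses of Replace All.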
          

          Remark:

          I previously built the search regex as (?-s)^#(.+)\R(?s)(.*?)(?-s)@(.+\R?)

          Then I realized that the (?s) modifier in the middle of the regex could be embedded inside the second group, as ((?s).*?), in order to limit its effect to group 2 only

          That way, we don’t have to repeat the (?-s) modifier before matching the next item of the second list, and we get the final regex (?-s)^#(.+)\R((?s).*?)@(.+\R?)

          Best Regards,

          guy038

          P.S.: The lists of family and given names above are taken from the two pages below:

          https://en.wikipedia.org/wiki/Lists_of_most_common_surnames

          https://en.wikipedia.org/wiki/List_of_most_popular_given_names

          • guy038

            Hi, All,

            Regex updated and section Notes, added, on 08/22/2017 - 10.45am ( French TZ )

            I just realized that my previous regex can be extended to merge more than two lists!

            For instance, let’s suppose I duplicate the complete list of family and given names from my previous post, using the new symbols = and _ for the two copies, in order to get the text below:

            #Smith
            #Jones
            #Taylor
            #Brown
            #Williams
            #Wilson
            #Johnson
            #Davies
            #Robinson
            #Wright
            #Thompson
            #Evans
            #Walker
            #White
            #Roberts
            #Green
            #Hall
            #Wood
            #Jackson
            #Clarke
            
            
            @Oliver
            @Amelia
            @Jack
            @Olivia
            @Harry
            @Emily
            @George
            @Isla
            @Charlie
            @Ava
            @Jacob
            @Jessica
            @Thomas
            @Ella
            @Noah
            @Isabella
            @William
            @Poppy
            @Oscar
            @Mia
            
            =Smith
            =Jones
            =Taylor
            =Brown
            =Williams
            =Wilson
            =Johnson
            =Davies
            =Robinson
            =Wright
            =Thompson
            =Evans
            =Walker
            =White
            =Roberts
            =Green
            =Hall
            =Wood
            =Jackson
            =Clarke
            
            
            _Oliver
            _Amelia
            _Jack
            _Olivia
            _Harry
            _Emily
            _George
            _Isla
            _Charlie
            _Ava
            _Jacob
            _Jessica
            _Thomas
            _Ella
            _Noah
            _Isabella
            _William
            _Poppy
            _Oscar
            _Mia
            

            Then, the regex S/R:

            SEARCH (?-s)^#(.+)\R((?s).*?)@(.+)\R((?s).*?)=(.+)\R((?s).*?)_(.+\R?)

            REPLACE \1 \3 \5 \7\2\4\6 (with a space character after \1, \3 and \5)

            would return, after 20 presses of the ALT + A shortcut ( Replace All ), the single merged list:

            Smith Oliver Smith Oliver
            Jones Amelia Jones Amelia
            Taylor Jack Taylor Jack
            Brown Olivia Brown Olivia
            Williams Harry Williams Harry
            Wilson Emily Wilson Emily
            Johnson George Johnson George
            Davies Isla Davies Isla
            Robinson Charlie Robinson Charlie
            Wright Ava Wright Ava
            Thompson Jacob Thompson Jacob
            Evans Jessica Evans Jessica
            Walker Thomas Walker Thomas
            White Ella White Ella
            Roberts Noah Roberts Noah
            Green Isabella Green Isabella
            Hall William Hall William
            Wood Poppy Wood Poppy
            Jackson Oscar Jackson Oscar
            Clarke Mia Clarke Mia
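The same idea extends to any number of marked lists, so the search and replacement strings can be generated rather than typed. The sketch below does this in Python under the same assumptions as before (`\n` line endings, scoped `(?s:...)` in place of the embedded `(?s)`); the names `build_search`, `build_replacement` and `merge_lists` are hypothetical, and at least two markers are expected.

```python
import re


def build_search(markers: list[str]) -> re.Pattern:
    """One item group per marker, one gap group between consecutive markers."""
    head, *middle, tail = markers
    gap = r"((?s:.*?))"  # smallest multi-line range up to the next marker
    pattern = "^" + re.escape(head) + r"(.+)\n"
    for marker in middle:
        pattern += gap + re.escape(marker) + r"(.+)\n"
    pattern += gap + re.escape(tail) + r"(.+\n?)"
    return re.compile(pattern, re.MULTILINE)


def build_replacement(count: int) -> str:
    """Odd groups (items) joined by spaces, then even groups (gaps) unchanged."""
    items = " ".join("\\" + str(2 * i + 1) for i in range(count))
    gaps = "".join("\\" + str(2 * i) for i in range(1, count))
    return items + gaps


def merge_lists(text: str, markers: list[str]) -> str:
    search = build_search(markers)
    replacement = build_replacement(len(markers))
    while True:  # one merged row per pass, like repeated Replace All
        text, hits = search.subn(replacement, text)
        if hits == 0:
            return text


sample = "#A\n#B\n\n@a\n@b\n=C\n=D\n_c\n_d\n"
print(merge_lists(sample, ["#", "@", "=", "_"]), end="")
```

For four markers, `build_replacement(4)` yields `\1 \3 \5 \7\2\4\6`, the replacement string used above.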
            

            Notes:

            • The first part (?-s) means that, by default, the regex engine treats the dot meta-character as matching any single standard character only, that is, never an End of Line character

            • Then the part ^#(.+)\R represents the first complete line beginning with the # symbol, followed by its End of Line character(s), with the part after the # symbol stored as group 1

            • Any part of the form ((?s).*?) is the smallest multi-line range of characters ( standard or EOL ones ) up to the next User-symbol ( @, = or _ ), stored as group 2, 4 or 6

            • The parts @(.+)\R and =(.+)\R represent the first complete line beginning with the @ or = symbol, followed by its End of Line character(s), with the part after the symbol stored as group 3 or 5

            • The last part _(.+\R?) stands for the first complete line beginning with the _ symbol; the part after the _ symbol, including the optional End of Line character(s), is stored as group 7

            • In the replacement, the first part, \1 \3 \5 \7, rewrites the four matched lines, without their initial User-symbols and separated by space characters, as a single line ended by End of Line character(s)

            • Then, the remainder of the four lists, \2\4\6, is simply rewritten without any change!

            • The table below marks the beginning of each of the seven defined groups:

            ----------------1-----2---------3-----4---------5-----6---------7------
            SEARCH   (?-s)^#(.+)\R((?s).*?)@(.+)\R((?s).*?)=(.+)\R((?s).*?)_(.+\R?)
            

            Cheers,

            guy038
