    Creating two columns from two files via copy-paste?

      YutMarma

      Hello!

      I have two files:

      File 1:
      A
      BBB
      CC
      DDDDDDD

      File 2:
      Adam
      Brad
      Charles
      Dylan

      I would like to somehow copy-paste file #2 into file #1 (or file #1 into file #2, it doesn’t matter) so that I get:

      File 3:
      A Adam
      BBB Brad
      CC Charles
      DDDDDDD Dylan

      Is this something that’s possible to do in Notepad++, or am I going to have to do horrible obscene things with spreadsheets? :)

        Mark Olson @YutMarma

        @YutMarma
        There are three ways that I know of to do this in Notepad++:

        1. a way that works in one step but requires the PythonScript plugin. Fastest, but uses a programming language
        2. a way that works in four steps (add line numbers to each file, paste second file at end of first, do a regex-replace). Fine for small files, but slow when combining large ones (EDIT: I’m not even sure if this way exists)
        3. a way that works in seven steps (do a regex-replace in each file, add line numbers to each file, paste second file at end of first, sort lines, do a regex-replace). Fast

        EDIT: IMPORTANT NOTE: This post presents a method that only works properly if the two files have the same number of lines. I’m writing a PythonScript script that addresses this potential issue.

        If you want, I can show you the first or second (if it exists) ways, but I like the third way, so I’ll describe it here.

        High-level overview

        1. Label every line of both files with the same number; if you want line 1 of file A to come before line 1 of file B, you should give file A the lower number
        2. Add line numbers to each file; this ensures that the lines can be put in order
        3. Paste file B at the end of file A, making sure that there’s a single empty line at the end of file A before you do this. Now you have a document that looks like this:
          01 1 line 1 of file A
          02 1 line 2 of file A
          03 1 line 3 of file A
          01 2 line 1 of file B
          02 2 line 2 of file B
          03 2 line 3 of file B
          
        4. Sort the lines of this combined file. Now the document will look like this:
          01 1 line 1 of file A
          01 2 line 1 of file B
          02 1 line 2 of file A
          02 2 line 2 of file B
          03 1 line 3 of file A
          03 2 line 3 of file B
          
        5. Do a regex-replacement that leaves you with your desired result:
          line 1 of file A line 1 of file B
          line 2 of file A line 2 of file B
          line 3 of file A line 3 of file B
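
The overview above can be sketched in Python for anyone who wants to see the label, number, sort, and merge idea outside the editor. This is only a minimal sketch: `merge_by_sorting` is a hypothetical helper name, not part of Notepad++ or PythonScript, and it assumes both files have the same number of lines.

```python
def merge_by_sorting(lines_a, lines_b, sep=" "):
    """Label, number, concatenate, sort, then pair up lines (hypothetical helper)."""
    if len(lines_a) != len(lines_b):
        raise ValueError("this method needs both files to have the same number of lines")
    # Zero-pad line numbers to a common width so a plain text sort orders them correctly.
    width = len(str(max(len(lines_a), 1)))
    decorated = []
    for label, lines in (("1", lines_a), ("2", lines_b)):
        for number, text in enumerate(lines, start=1):
            # Produces e.g. "01 1 line 1 of file A"
            decorated.append("{0:0{1}d} {2} {3}".format(number, width, label, text))
    # Lexicographic sort: line n of file A now sits directly above line n of file B.
    decorated.sort()
    merged = []
    for first, second in zip(decorated[0::2], decorated[1::2]):
        # Drop the "NN L " prefix from each line and join the pair.
        merged.append(first.split(" ", 2)[2] + sep + second.split(" ", 2)[2])
    return merged


print("\n".join(merge_by_sorting(["A", "BBB", "CC", "DDDDDDD"],
                                 ["Adam", "Brad", "Charles", "Dylan"])))
```

The zero-padding matters for the same reason the Column Editor step uses leading zeros: without it, a text sort would put line 10 before line 2.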
          

        Individual steps

        1. Copy the first file into a new 1 scratch buffer (you don’t want to corrupt the original data).
        2. Use Search->Replace... to open the find/replace form and do the following replacement to add 1 at the start of each line of new 1:
          • Find what: ^
          • Replace with: \x201\x20
          • Search Mode: Regular expression (this is true for all find/replacements, so I won’t mention it again going forward)
        3. Move your caret to the beginning of new 1 and use Edit->Column Editor to create line numbers at the start of each line with the following settings:
          • Number to Insert checked
          • Initial number: 1
          • Increase by: 1
          • Repeat: 1
          • Leading: Zeros
        4. Copy your second file into a new 2 scratch buffer (to avoid data corruption)
        5. Use Search->Replace... to open the find/replace form and do the following replacement to add 2 at the start of each line of new 2:
          • Find what: ^
          • Replace with: \x202\x20
        6. Repeat step 3 above on new 2.
        7. Add a single blank line at the end of new 1
        8. Paste new 2 at the end of new 1. (We are now at the end of step 3 of the high-level overview)
        9. Use Edit->Line Operations->Sort Lines Lexicographically Ascending to sort the lines. (We are now at the end of step 4 of the high-level overview)
        10. Use Search->Replace... to open the find/replace form and do the following replacement to remove the file labels and line numbers:
          • Find what: (?-s)^\d+\x201\x20(.*)\r\n\d+\x202\x20(.*)
          • Replace with: ${1}\x20${2}
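
For comparison, here is a rough Python translation of the final replacement in step 10. It is a sketch, not a verbatim copy: Python's `re` module has no global `(?-s)` switch and does not use `${1}` in replacements, so the pattern below uses `[^\r\n]*` in place of `(?-s)(.*)` and builds the replacement in a function. The name `strip_labels` is made up for illustration.

```python
import re

# "<digits> 1 <text A>" on one line, followed by "<digits> 2 <text B>" on the next.
# [^\r\n]* stands in for (?-s)(.*) so a capture never swallows a line ending.
PAIR = re.compile(r"^\d+ 1 ([^\r\n]*)\r?\n\d+ 2 ([^\r\n]*)", re.MULTILINE)


def strip_labels(sorted_text, sep=" "):
    """Collapse each labelled pair of lines into 'text A<sep>text B'."""
    return PAIR.sub(lambda m: m.group(1) + sep + m.group(2), sorted_text)


sample = "01 1 A\r\n01 2 Adam\r\n02 1 BBB\r\n02 2 Brad\r\n"
print(repr(strip_labels(sample)))
```

Unlike the `\r\n` in the Notepad++ pattern, the `\r?\n` here also accepts Unix line endings.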

        And that’s it! I haven’t explained the regular expressions I used, but the user manual has an excellent guide.

        1 Reply Last reply Reply Quote 1
        • CoisesC
          Coises @YutMarma
          last edited by

          @YutMarma said in Creating two columns from two files via copy-paste?:

          File 1:
          A
          BBB
          CC
          DDDDDDD

          File 2:
          Adam
          Brad
          Charles
          Dylan

          I would like to somehow copy-paste file #2 into file #1 (or file #1 into file #2, it doesn’t matter) so that I get:

          File 3:
          A Adam
          BBB Brad
          CC Charles
          DDDDDDD Dylan

          Open File 1 and File 2 in two separate tabs.

          Add a blank at the beginning of every line in File 2. To do that, click at the beginning of the first line of the file, then Alt+Shift+click at the beginning of the last line of the file. (If you have to scroll, do it using the scroll wheel or the scroll bar, not the keyboard.) Now, type a space; you’ll see that a space appears at the beginning of each line.

          Make a rectangular selection encompassing all of File 1. To do that, first click at the beginning of the first line of the file. Now, Alt+Shift+click far enough into the empty space to the right of the last line in the file so that the selection rectangle encloses all the data on all the lines. Scroll as needed to make sure you have everything; if you don’t, just Alt+Shift+click again, further to the right on the last line, until you have it all. It doesn’t matter if you have extra empty space.

          Copy to the clipboard (Ctrl+C).

          Now, go back to File 2. Click at the beginning of the first line (to the left of the blank).

          Paste (Ctrl+V).
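
What the rectangular paste achieves can be written as a short Python sketch. `paste_columns` is a hypothetical name, and the sketch assumes both texts have the same number of lines (extra lines in the longer one are simply dropped here).

```python
def paste_columns(text_a, text_b, sep=" "):
    """Join line n of text_a and line n of text_b with sep (hypothetical helper)."""
    pairs = zip(text_a.splitlines(), text_b.splitlines())
    return "\n".join(left + sep + right for left, right in pairs)


file_1 = "A\nBBB\nCC\nDDDDDDD"
file_2 = "Adam\nBrad\nCharles\nDylan"
print(paste_columns(file_1, file_2))
```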

            Coises @YutMarma

            You say “two columns” in your title. It is possible that the forum mangled your example, and you meant:

            A       Adam
            BBB     Brad
            CC      Charles
            DDDDDDD Dylan
            

            and not what it looks like in your post. (To avoid that in the future, please select example text and use the </> icon to make it a “code” block.)

            If that’s what you wanted, the procedure is similar to what I described in my last post, but not quite the same:

            Start with File 2. Enclose it in a rectangular selection. (See my previous post if you don’t know how to do that.) Copy to the clipboard.

            Switch to File 1. Place the cursor at the end of the first line and insert enough blanks that you are at least one blank past the longest line in the file — the position where you will want the second column to start.

            Now, paste.
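
The aligned variant can be sketched the same way in Python: pad every line of the first file to one column past its longest line, then append the matching line of the second file. `paste_aligned` and its `gap` parameter are illustrative names only.

```python
def paste_aligned(left_lines, right_lines, gap=1):
    """Pad the left column so the right column starts at a fixed position."""
    # The second column starts `gap` characters past the longest left-hand line.
    width = max((len(line) for line in left_lines), default=0) + gap
    return [left.ljust(width) + right for left, right in zip(left_lines, right_lines)]


for row in paste_aligned(["A", "BBB", "CC", "DDDDDDD"],
                         ["Adam", "Brad", "Charles", "Dylan"]):
    print(row)
```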

              mkupper @YutMarma

              @YutMarma, you can do the project entirely in Notepad++, but you would end up spending considerable time fiddling with the merge, and for large files you would never be sure it was done correctly. Keep in mind that Notepad++ is a text editor, not a data organization tool.

              As spreadsheets are data organization tools, using them is often much easier, better, and more reliable.

              Copy/paste the first file into column 1 of a spreadsheet.
              Copy/paste the second file into column 2 of a spreadsheet.

              At this point your data is very nicely organized. You can sort, do indexing, and all sorts of wonderful things such as data validation.

              You can then choose to copy/paste the data into a plain text file. By default, most spreadsheet software will add a tab as a delimiter between the columns when using copy/paste. Nearly all spreadsheet software packages also offer ways to export or save-as the data in various formats, including plain text, CSV, etc.

              If you have many sets of files to merge, I would use scripting, which gives you far more control over exceptions in the data and over the format or layout of the resulting merged files.
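
As one possible shape for such a script, here is a minimal Python sketch that writes two columns as tab-delimited text, the same layout a spreadsheet copy/paste typically produces. The function name `to_tsv` is an assumption for illustration; the `csv` module handles quoting if a value happens to contain a tab or a quote.

```python
import csv
import io


def to_tsv(column_1, column_2):
    """Return the two columns as tab-delimited text, one row per line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerows(zip(column_1, column_2))
    return buffer.getvalue()


print(to_tsv(["A", "BBB", "CC", "DDDDDDD"], ["Adam", "Brad", "Charles", "Dylan"]))
```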

                Mark Olson

                I think the suggestions by @mkupper and @Coises above are both very reasonable, and there’s nothing wrong with using their ideas and disregarding this post.

                That said, I’ve come up with a PythonScript script that allows you to paste any number of files together line-by-line.

                It has some basic documentation in the initial docstring, so it hopefully should be self-explanatory.

                '''
                Uses PythonScript v3.0.16 or higher: https://github.com/bruderstein/PythonScript/releases
                First referenced in this Notepad++ community discussion: https://community.notepad-plus-plus.org/topic/25950/creating-two-columns-from-two-files-via-copy-paste
                
                == SUMMARY ==
                This script can paste together any number of files line-by-line.
                == EXAMPLE ==
                For example, if you have the following files:
                ----------------
                ----------------
                file A
                line 1 of file A
                line 2 of file A
                line 3 of file A
                ----------------
                file B
                line 1 of file B
                line 2 of file B
                ----------------
                file C
                line 1 of file C
                line 2 of file C
                line 3 of file C
                ----------------
                ----------------
                and you entered this into the first dialog:
                [ 3] file A
                [2 ] file B
                [1 ] file C
                and you clicked Yes for the second prompt
                and you entered FOOBAR into the third dialog
                and you clicked Yes for the fourth prompt
                you would get a new buffer opened with the following text:
                line 1 of file CFOOBARline 1 of file BFOOBARline 1 of file A
                line 2 of file CFOOBARline 2 of file BFOOBARline 2 of file A
                == NOTES ==
                This script does basic data validation. Specifically, if the files you wanted to paste together have different numbers of lines, the final file will have as many lines as the file with the fewest lines, assuming you click Yes on the dialog that notifies you of this.
                I intentionally did not provide an option for the final file to include lines from files that are longer than the shortest file, because the result could be confusing or ambiguous.
                '''
                from Npp import editor, notepad, MESSAGEBOXFLAGS
                import re
                
                def pl1of2al1of1():
                    all_filenames = [x[0] for x in notepad.getFiles()]
                    if len(all_filenames) < 2:
                        notepad.messageBox("only one file is open, so this script can't work")
                        return
                    # find the user's preferences
                    print(all_filenames)
                    selected_files = []
                    while True:
                        txt = notepad.prompt(("Put a whole number in the box for each file you want to combine, or hit Cancel to exit.\r\n"
                                              "The numbers determine the order in which the lines of each file are pasted."),
                            'Select files to combine', '\r\n'.join(f'[ ] {fname}' for fname in all_filenames))
                        if txt is None:
                            return
                        selected_file_text = re.findall(r'^ *\[ *([+-]?\d+) *\] *(\S[^\r\n]*?) *\r?$', txt, re.MULTILINE)
                        print(selected_file_text)
                        if len(selected_file_text) < 2:
                            notepad.messageBox('You must put whole numbers in the boxes for at least two files')
                            continue
                        selected_nums = set(int(x[0]) for x in selected_file_text)
                        if len(selected_nums) < len(selected_file_text):
                            notepad.messageBox('Each chosen file must have a distinct number')
                            continue
                        selected_files = [x[1].strip() for x in sorted(selected_file_text, key=lambda x: int(x[0]))]
                        if not all(file in all_filenames for file in selected_files):
                            notepad.messageBox('You accidentally changed the name of a file you selected. Make sure not to enter any text, other than putting numbers in the boxes.')
                            continue
                        if notepad.messageBox((f'The n^th line of the combined file will have the n^th lines of each of the chosen files in the order shown:\r\n' + 
                                               '\r\n'.join(selected_files) + 
                                               '\r\n---------------\r\n' + 
                                               'Is that what you want?'),
                                               'confirm files and order', MESSAGEBOXFLAGS.YESNO
                            ) == MESSAGEBOXFLAGS.RESULTYES:
                            break
                    sep_str = notepad.prompt('Enter some buffer text to put between the n^th line of file N and the n^th line of file N+1', 'Enter buffer text', '')
                    if sep_str is None:
                        return
                    # get the text
                    lines_by_file = []
                    eol = '\r\n'
                    previously_opened_file = notepad.getCurrentFilename()
                    for file in selected_files:
                        notepad.open(file)
                        eol = ['\r\n', '\r', '\n'][editor.getEOLMode()]
                        lines_by_file.append(editor.getText().splitlines())
                    notepad.open(previously_opened_file)
                    # combine the text
                    all_nlines = [len(x) for x in lines_by_file]
                    min_nlines = min(all_nlines)
                    if len(set(all_nlines)) != 1:
                        if notepad.messageBox((f'Not all files have the same number of lines.\r\n'
                                                 f'As a result, all lines past line {min_nlines} will be cut off.\r\n'
                                                 'Yes: Trim off all the extra lines.\r\n'
                                                 'No: Cancel the operation.'),
                                'Choose what to do with extra lines', MESSAGEBOXFLAGS.YESNO
                            ) == MESSAGEBOXFLAGS.RESULTNO:
                            return
                    combined_text = eol.join(
                        sep_str.join(lines[ii] for lines in lines_by_file)
                        for ii in range(min_nlines)
                    )
                    notepad.new()
                    editor.setText(combined_text)
                
                
                if __name__ == '__main__':
                    pl1of2al1of1()
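
To try the core logic without Notepad++, the two key pieces of the script (parsing the numbered boxes and joining the lines) can be pulled out into plain functions. This is a sketch under assumptions: `parse_selection` and `combine` are hypothetical names, the regex is copied from the script above, and no `Npp` calls are involved.

```python
import re

# Same pattern the script uses to read "[ 3] file A" style lines from the dialog.
BOX_PATTERN = re.compile(r'^ *\[ *([+-]?\d+) *\] *(\S[^\r\n]*?) *\r?$', re.MULTILINE)


def parse_selection(dialog_text):
    """Return the chosen file names, ordered by the number typed into each box."""
    picked = BOX_PATTERN.findall(dialog_text)
    return [name for _, name in sorted(picked, key=lambda pair: int(pair[0]))]


def combine(lines_by_file, sep, eol="\r\n"):
    """Join the n-th lines of every file with sep, stopping at the shortest file."""
    shortest = min(len(lines) for lines in lines_by_file)
    return eol.join(sep.join(lines[i] for lines in lines_by_file)
                    for i in range(shortest))


order = parse_selection("[ 3] file A\r\n[2 ] file B\r\n[1 ] file C\r\n[ ] file D")
print(order)
```

Files whose box is left empty (like `file D` here) are skipped, which mirrors how the script ignores unselected files.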
                
                  YutMarma

                  I wanted to thank everyone for the incredible suggestions, especially @Mark-Olson - you went not just above, but above and beyond.

                  I don’t have much processing to do with the text - the gist of it is that it’s stuff that had to be OCR’d in two separate batches due to formatting decisions that read perfectly fine on paper but completely bork the brain of a sensible-minded piece of character recognition software.

                  I’ll give the PythonScript a whirl, and I’ll report back if (when…?) I have any issues, if that’s okay!

                    YutMarma @Mark Olson

                    @Mark-Olson I’m getting an error in line 54:

                    File "C:\Program Files\Notepad++\plugins\PythonScript\scripts\columns.py", line 54
                      'Select files to combine', '\r\n'.join(f'[ ] {fname}' for fname in all_filenames))
                                                                          ^
                    SyntaxError: invalid syntax
                    
                      Mark Olson @YutMarma

                      @YutMarma
                      I bet that syntax error comes from you using PythonScript 2, which is a very out-of-date version that’s still on the plugin list. I recommend upgrading to PythonScript 3, but in case you don’t have time for that, it’s pretty easy to make this script compatible with Python 2:

                      '''
                      Works with PythonScript 2 (Python 2) as well as PythonScript v3.0.16 or higher: https://github.com/bruderstein/PythonScript/releases
                      First referenced in this Notepad++ community discussion: https://community.notepad-plus-plus.org/topic/25950/creating-two-columns-from-two-files-via-copy-paste
                      
                      == SUMMARY ==
                      This script can paste together any number of files line-by-line.
                      == EXAMPLE ==
                      For example, if you have the following files:
                      ----------------
                      ----------------
                      file A
                      line 1 of file A
                      line 2 of file A
                      line 3 of file A
                      ----------------
                      file B
                      line 1 of file B
                      line 2 of file B
                      ----------------
                      file C
                      line 1 of file C
                      line 2 of file C
                      line 3 of file C
                      ----------------
                      ----------------
                      and you entered this into the first dialog:
                      [ 3] file A
                      [2 ] file B
                      [1 ] file C
                      and you clicked Yes for the second prompt
                      and you entered FOOBAR into the third dialog
                      and you clicked Yes for the fourth prompt
                      you would get a new buffer opened with the following text:
                      line 1 of file CFOOBARline 1 of file BFOOBARline 1 of file A
                      line 2 of file CFOOBARline 2 of file BFOOBARline 2 of file A
                      == NOTES ==
                      This script does basic data validation. Specifically, if the files you wanted to paste together have different numbers of lines, the final file will have as many lines as the file with the fewest lines, assuming you click Yes on the dialog that notifies you of this.
                      I intentionally did not provide an option for the final file to include lines from files that are longer than the shortest file, because the result could be confusing or ambiguous.
                      '''
                      from Npp import editor, notepad, MESSAGEBOXFLAGS
                      import re
                      
                      def pl1of2al1of1():
                          all_filenames = [x[0] for x in notepad.getFiles()]
                          if len(all_filenames) < 2:
                              notepad.messageBox("only one file is open, so this script can't work")
                              return
                          # find the user's preferences
                          print(all_filenames)
                          selected_files = []
                          while True:
                              txt = notepad.prompt(("Put a whole number in the box for each file you want to combine, or hit Cancel to exit.\r\n"
                                                    "The numbers determine the order in which the lines of each file are pasted."),
                                  'Select files to combine', '\r\n'.join('[ ] ' + fname for fname in all_filenames))
                              if txt is None:
                                  return
                              selected_file_text = re.findall(r'^ *\[ *([+-]?\d+) *\] *(\S[^\r\n]*?) *\r?$', txt, re.MULTILINE)
                              print(selected_file_text)
                              if len(selected_file_text) < 2:
                                  notepad.messageBox('You must put whole numbers in the boxes for at least two files')
                                  continue
                              selected_nums = set(int(x[0]) for x in selected_file_text)
                              if len(selected_nums) < len(selected_file_text):
                                  notepad.messageBox('Each chosen file must have a distinct number')
                                  continue
                              selected_files = [x[1].strip() for x in sorted(selected_file_text, key=lambda x: int(x[0]))]
                              if not all(file in all_filenames for file in selected_files):
                                  notepad.messageBox('You accidentally changed the name of a file you selected. Make sure not to enter any text, other than putting numbers in the boxes.')
                                  continue
                              if notepad.messageBox(('The n^th line of the combined file will have the n^th lines of each of the chosen files in the order shown:\r\n' + 
                                                     '\r\n'.join(selected_files) + 
                                                     '\r\n---------------\r\n' + 
                                                     'Is that what you want?'),
                                                     'confirm files and order', MESSAGEBOXFLAGS.YESNO
                                  ) == MESSAGEBOXFLAGS.RESULTYES:
                                  break
                          sep_str = notepad.prompt('Enter some buffer text to put between the n^th line of file N and the n^th line of file N+1', 'Enter buffer text', '')
                          if sep_str is None:
                              return
                          # get the text
                          lines_by_file = []
                          eol = '\r\n'
                          previously_opened_file = notepad.getCurrentFilename()
                          for file in selected_files:
                              notepad.open(file)
                              eol = ['\r\n', '\r', '\n'][editor.getEOLMode()]
                              lines_by_file.append(editor.getText().splitlines())
                          notepad.open(previously_opened_file)
                          # combine the text
                          all_nlines = [len(x) for x in lines_by_file]
                          min_nlines = min(all_nlines)
                          if len(set(all_nlines)) != 1:
                              if notepad.messageBox(('Not all files have the same number of lines.\r\n' +
                                                     'As a result, all lines past line {0} will be cut off.\r\n' + 
                                                     'Yes: Trim off all the extra lines.\r\n' +
                                                     'No: Cancel the operation.').format(min_nlines),
                                      'Choose what to do with extra lines', MESSAGEBOXFLAGS.YESNO
                                  ) == MESSAGEBOXFLAGS.RESULTNO:
                                  return
                          combined_text = eol.join(
                              sep_str.join(lines[ii] for lines in lines_by_file)
                              for ii in range(min_nlines)
                          )
                          notepad.new()
                          editor.setText(combined_text)
                      
                      
                      if __name__ == '__main__':
                          pl1of2al1of1()
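
If it is unclear which of the two versions a given install needs, a quick check of the interpreter version can help. The sketch below is an assumption-labelled helper (`supports_fstrings` is a made-up name) that relies only on the fact that f-strings were added in Python 3.6; it contains no f-strings itself, so it parses on Python 2 as well.

```python
import sys


def supports_fstrings(version_info=None):
    """True if this interpreter can parse f-strings (added in Python 3.6)."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) >= (3, 6)


print("f-strings supported: {0}".format(supports_fstrings()))
```

When it reports that f-strings are not supported, the `'[ ] ' + fname` and `.format(...)` forms used in the second version of the script are the ones to use.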
                      
                      