Creating two columns from two files via copy-paste?
-
Hello!
I have two files:
File 1:
A
BBB
CC
DDDDDDD
File 2:
Adam
Brad
Charles
Dylan
I would like to somehow copy-paste file #2 into file #1 (or file #1 into file #2, it doesn’t matter) so that I get:
File 3:
A Adam
BBB Brad
CC Charles
DDDDDDD Dylan
Is this something that’s possible to do in Notepad++, or am I going to have to do horrible obscene things with spreadsheets? :)
-
@YutMarma
There are three ways that I know of to do this in Notepad++:
- a way that works in one step but requires the PythonScript plugin. Fastest, but uses a programming language
- a way that works in four steps (add line numbers to each file, paste second file at end of first, do a regex-replace). Fast, but slow when combining large files (EDIT: I’m not even sure if this way exists)
- a way that works in seven steps (do a regex-replace in each file, add line numbers to each file, paste second file at end of first, sort lines, do a regex-replace). Fast
EDIT: IMPORTANT NOTE: This post presents a method that only works properly if the two files have the same number of lines. I’m writing a PythonScript script that addresses this potential issue.
If you want, I can show you the first or second (if it exists) ways, but I like the third way, so I’ll describe it here.
High-level overview
1. Label every line of each file with a number that identifies the file; if you want line `1` of file `A` to come before line `1` of file `B`, you should give file `A` the lower number.
2. Add line numbers to each file; this ensures that the lines can be put in order.
3. Paste file `B` at the end of file `A`, making sure that there’s a single empty line at the end of file `A` before you do this. Now you have a document that looks like this:

       01 1 line 1 of file A
       02 1 line 2 of file A
       03 1 line 3 of file A
       01 2 line 1 of file B
       02 2 line 2 of file B
       03 2 line 3 of file B

4. Sort the lines of this combined file. Now the document will look like this:

       01 1 line 1 of file A
       01 2 line 1 of file B
       02 1 line 2 of file A
       02 2 line 2 of file B
       03 1 line 3 of file A
       03 2 line 3 of file B

5. Do a regex-replacement that leaves you with your desired result:

       line 1 of file A line 1 of file B
       line 2 of file A line 2 of file B
       line 3 of file A line 3 of file B
Individual steps
1. Copy the first file into a `new 1` scratch buffer (you don’t want to corrupt the original data).
2. Use `Search->Replace...` to open the find/replace form and do the following replacement to add `1` at the start of each line of `new 1`:
   - Find what: `^`
   - Replace with: `\x201\x20`
   - Search Mode: `Regular expression` (this is true for all find/replacements, so I won’t mention it again going forward)
3. Move your caret to the beginning of `new 1` and use `Edit->Column Editor` to create line numbers at the start of each line with the following settings:
   - Number to Insert checked
   - Initial number: 1
   - Increase by: 1
   - Repeat: 1
   - Leading: Zeros
4. Copy your second file into a `new 2` scratch buffer (to avoid data corruption).
5. Use `Search->Replace...` to open the find/replace form and do the following replacement to add `2` at the start of each line of `new 2`:
   - Find what: `^`
   - Replace with: `\x202\x20`
6. Repeat step 3 above on `new 2`.
7. Add a single blank line at the end of `new 1`.
8. Paste `new 2` at the end of `new 1`. (We are now at the end of step 3 of the high-level overview)
9. Use `Edit->Line Operations->Sort Lines Lexicographically Ascending` to sort the lines. (We are now at the end of step 4 of the high-level overview)
10. Use `Search->Replace...` to open the find/replace form and do the following replacement to remove the file labels and line numbers:
   - Find what: `(?-s)^\d+\x201\x20(.*)\r\n\d+\x202\x20(.*)`
   - Replace with: `${1}\x20${2}`
And that’s it! I haven’t explained the regular expressions I used, but the user manual has an excellent guide.
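For anyone who wants to sanity-check the idea outside the editor, the label, number, sort, and regex-replace steps can be mimicked in a few lines of Python. This is only a sketch of the same idea, not part of the Notepad++ procedure: the function name `merge_by_sorting` is made up for illustration, lines are assumed to be joined with `\n` rather than `\r\n`, and both inputs are assumed to have the same number of lines.

```python
import re


def merge_by_sorting(lines_a, lines_b):
    """Interleave two equally long lists of lines by labelling, sorting and regex-replacing."""
    if len(lines_a) != len(lines_b):
        raise ValueError("this method only works when both files have the same number of lines")

    # Zero-pad the line numbers so that lexicographic order equals numeric order.
    width = len(str(len(lines_a)))

    # Prefix each line with "<line number> <file label> ".
    labelled = [f"{i:0{width}d} 1 {line}" for i, line in enumerate(lines_a, 1)]
    labelled += [f"{i:0{width}d} 2 {line}" for i, line in enumerate(lines_b, 1)]

    # After sorting, line n of file A sits directly above line n of file B.
    labelled.sort()
    text = "\n".join(labelled)

    # Drop the prefixes and join each pair of lines with a single space.
    return re.sub(r"^\d+ 1 (.*)\n\d+ 2 (.*)$", r"\1 \2", text, flags=re.MULTILINE)


merged = merge_by_sorting(["A", "BBB", "CC", "DDDDDDD"], ["Adam", "Brad", "Charles", "Dylan"])
print(merged)
```

The zero-padding plays the same role as the `Leading: Zeros` setting in the Column Editor: without it, line `10` would sort before line `2`.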
-
@YutMarma said in Creating two columns from two files via copy-paste?:
File 1:
A
BBB
CC
DDDDDDD
File 2:
Adam
Brad
Charles
Dylan
I would like to somehow copy-paste file #2 into file #1 (or file #1 into file #2, it doesn’t matter) so that I get:
File 3:
A Adam
BBB Brad
CC Charles
DDDDDDD Dylan
Open File 1 and File 2 in two separate tabs.
Add a blank at the beginning of every line in File 2. To do that, click at the beginning of the first line of the file, then Alt+Shift+click at the beginning of the last line of the file. (If you have to scroll, do it using the scroll wheel or the scroll bar, not the keyboard.) Now, type a space; you’ll see that a space appears at the beginning of each line.
Make a rectangular selection encompassing all of File 1. To do that, first click at the beginning of the first line of the file. Now, Alt+Shift+click far enough into the empty space to the right of the last line in the file so that the selection rectangle encloses all the data on all the lines. Scroll as needed to make sure you have everything; if you don’t, just Alt+Shift+click again, further to the right on the last line, until you have it all. It doesn’t matter if you have extra empty space.
Copy to the clipboard (Ctrl+C).
Now, go back to File 2. Click at the beginning of the first line (to the left of the blank).
Paste (Ctrl+V).
-
@YutMarma said in Creating two columns from two files via copy-paste?:
You say “two columns” in your title. It is possible that the forum mangled your example, and you meant:
    A       Adam
    BBB     Brad
    CC      Charles
    DDDDDDD Dylan
and not what it looks like in your post. (To avoid that in the future, please select example text and use the </> icon to make it a “code” block.)
If that’s what you wanted, the procedure is similar to what I described in my last post, but not quite the same:
Start with File 2. Enclose it in a rectangular selection. (See my previous post if you don’t know how to do that.) Copy to the clipboard.
Switch to File 1. Place the cursor at the end of the first line and insert enough blanks that you are at least one blank past the longest line in the file — the position where you will want the second column to start.
Now, paste.
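If you ever want to check the result of the column paste, the same aligned layout can be sketched in Python. This is an illustration only, not something Notepad++ runs: the function name `two_columns` and its `gap` parameter are made up, and `zip` stops at the shorter input, which differs from what a rectangular paste does when the files have different lengths.

```python
def two_columns(left, right, gap=1):
    """Pad every left-hand line to a common width, then append the right-hand line."""
    # The second column starts `gap` blanks past the longest line of the first column.
    width = max((len(line) for line in left), default=0) + gap
    return [l.ljust(width) + r for l, r in zip(left, right)]


for row in two_columns(["A", "BBB", "CC", "DDDDDDD"], ["Adam", "Brad", "Charles", "Dylan"]):
    print(row)
```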
-
@YutMarma, you can do the project entirely in Notepad++, but you would end up spending considerable time fiddling with the merge, and for large files you would never be sure it was done correctly. Keep in mind that Notepad++ is a text editor, not a data organization tool.
As spreadsheets are data organization tools, using them is often much easier, better, and more reliable.
Copy/paste the first file into column 1 of a spreadsheet.
Copy/paste the second file into column 2 of a spreadsheet.
At this point your data is very nicely organized. You can sort, do indexing, and all sorts of wonderful things such as data validation.
You can then choose to copy/paste the data into a plain text file. By default, most spreadsheet software will add a tab as a delimiter between the columns when using copy/paste. Nearly all spreadsheet software packages also offer ways to export or save-as the data in various formats, including plain text, CSV, etc.
If you have many sets of files that you need or desire to merge then I would use scripting which will give you far more control over exceptions to the data and to the format or layout of the resulting merged files.
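As a rough idea of what such a script might look like, here is a minimal Python sketch that merges two text files into one tab-delimited file. Everything in it is an assumption made for illustration: the function name `merge_files`, the UTF-8 encoding, the tab delimiter, and the choice to pad the shorter file with empty cells instead of stopping early.

```python
from itertools import zip_longest


def merge_files(path_a, path_b, out_path, delimiter="\t"):
    """Write line n of file A, a delimiter, and line n of file B to out_path."""
    with open(path_a, encoding="utf-8") as fa, open(path_b, encoding="utf-8") as fb:
        lines_a = fa.read().splitlines()
        lines_b = fb.read().splitlines()

    # zip_longest continues until the longer file is exhausted;
    # cells missing from the shorter file become empty strings.
    rows = zip_longest(lines_a, lines_b, fillvalue="")

    with open(out_path, "w", encoding="utf-8", newline="\n") as out:
        for left, right in rows:
            out.write(left + delimiter + right + "\n")

    # Return the number of rows written so the caller can check it.
    return max(len(lines_a), len(lines_b))
```

A call such as `merge_files("file1.txt", "file2.txt", "file3.txt")` (hypothetical file names) would produce output that pastes cleanly into two spreadsheet columns.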
-
I think the suggestions by @mkupper and @Coises above are both very reasonable, and there’s nothing wrong with using their ideas and disregarding this post.
That said, I’ve come up with a PythonScript script that allows you to paste any number of files together line-by-line.
It has some basic documentation in the initial docstring, so it hopefully should be self-explanatory.
    '''
    Uses PythonScript v3.0.16 or higher: https://github.com/bruderstein/PythonScript/releases
    First referenced in this Notepad++ community discussion: https://community.notepad-plus-plus.org/topic/25950/creating-two-columns-from-two-files-via-copy-paste
    == SUMMARY ==
    This script can paste together any number of files line-by-line.
    == EXAMPLE ==
    For example, if you have the following files:
    ----------------
    ----------------
    file A
    line 1 of file A
    line 2 of file A
    line 3 of file A
    ----------------
    file B
    line 1 of file B
    line 2 of file B
    ----------------
    file C
    line 1 of file C
    line 2 of file C
    line 3 of file C
    ----------------
    ----------------
    and you entered this into the first dialog:
    [ 3] file A
    [2 ] file B
    [1 ] file C
    and you clicked Yes for the second prompt
    and you entered FOOBAR into the third dialog
    and you clicked Yes for the fourth prompt
    you would get a new buffer opened with the following text:
    line 1 of file CFOOBARline 1 of file BFOOBARline 1 of file A
    line 2 of file CFOOBARline 2 of file BFOOBARline 2 of file A
    == NOTES ==
    This script does basic data validation.
    Specifically, if the files you wanted to paste together have different numbers of lines,
    the final file will have as many lines as the file with the fewest lines,
    assuming you click Yes on the dialog that notifies you of this.
    I intentionally did not provide an option for the final file to include lines from files that are longer than the shortest file,
    because the result could be confusing or ambiguous.
    '''
    # MESSAGEBOXFLAGS is imported explicitly so the script does not rely on startup imports
    from Npp import editor, notepad, MESSAGEBOXFLAGS
    import re

    def pl1of2al1of1():
        all_filenames = [x[0] for x in notepad.getFiles()]
        if len(all_filenames) < 2:
            notepad.messageBox("only one file is open, so this script can't work")
            return
        # find the user's preferences
        print(all_filenames)
        selected_files = []
        while True:
            txt = notepad.prompt(("Put a whole number in the box for each file you want to combine, or hit Cancel to exit.\r\n"
                        "The numbers determine the order in which the lines of each file are pasted."),
                    'Select files to combine', '\r\n'.join(f'[ ] {fname}' for fname in all_filenames))
            if txt is None:
                return
            selected_file_text = re.findall(r'^ *\[ *([+-]?\d+) *\] *(\S[^\r\n]*?) *\r?$', txt, re.MULTILINE)
            print(selected_file_text)
            if len(selected_file_text) < 2:
                notepad.messageBox('You must put whole numbers in the boxes for at least two files')
                continue
            selected_nums = set(int(x[0]) for x in selected_file_text)
            if len(selected_nums) < len(selected_file_text):
                notepad.messageBox('Each chosen file must have a distinct number')
                continue
            selected_files = [x[1].strip() for x in sorted(selected_file_text, key=lambda x: int(x[0]))]
            if not all(file in all_filenames for file in selected_files):
                notepad.messageBox('You accidentally changed the name of a file you selected. Make sure not to enter any text, other than putting numbers in the boxes.')
                continue
            if notepad.messageBox((f'The n^th line of the combined file will have the n^th lines of each of the chosen files in the order shown:\r\n' +
                        '\r\n'.join(selected_files) + '\r\n---------------\r\n' +
                        'Is that what you want?'),
                    'confirm files and order',
                    MESSAGEBOXFLAGS.YESNO
            ) == MESSAGEBOXFLAGS.RESULTYES:
                break
        sep_str = notepad.prompt('Enter some buffer text to put between the n^th line of file N and the n^th line of file N+1',
                'Enter buffer text', '')
        if sep_str is None:
            return
        # get the text
        lines_by_file = []
        eol = '\r\n'
        previously_opened_file = notepad.getCurrentFilename()
        for file in selected_files:
            notepad.open(file)
            eol = ['\r\n', '\r', '\n'][editor.getEOLMode()]
            lines_by_file.append(editor.getText().splitlines())
        notepad.open(previously_opened_file)
        # combine the text
        all_nlines = [len(x) for x in lines_by_file]
        min_nlines = min(all_nlines)
        if len(set(all_nlines)) != 1:
            if notepad.messageBox((f'Not all files have the same number of lines.\r\n'
                        f'As a result, all lines from line {min_nlines + 1} onward will be cut off.\r\n'
                        'Yes: Trim off all the extra lines.\r\n'
                        'No: Cancel the operation.'),
                    'Choose what to do with extra lines',
                    MESSAGEBOXFLAGS.YESNO
            ) == MESSAGEBOXFLAGS.RESULTNO:
                return
        combined_text = eol.join(
            sep_str.join(lines[ii] for lines in lines_by_file)
            for ii in range(min_nlines)
        )
        notepad.new()
        editor.setText(combined_text)

    if __name__ == '__main__':
        pl1of2al1of1()
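In case it helps to see the two core steps without the Notepad++ dialogs, here is a small standalone sketch of how the script reads the numbered boxes and how it pastes the lines together. The helper names `parse_selection` and `combine` are made up for this illustration and do not appear in the script; the regular expression and the join logic are the same as in the script.

```python
import re

# Same pattern the script applies to the text returned by the first dialog.
SELECTION_RE = re.compile(r'^ *\[ *([+-]?\d+) *\] *(\S[^\r\n]*?) *\r?$', re.MULTILINE)


def parse_selection(dialog_text):
    """Return the chosen file names, ordered by the number typed into each box."""
    pairs = SELECTION_RE.findall(dialog_text)
    return [name for _, name in sorted(pairs, key=lambda pair: int(pair[0]))]


def combine(lines_by_file, sep, eol='\r\n'):
    """Join the n-th lines of every file with sep, stopping at the shortest file."""
    n = min(len(lines) for lines in lines_by_file)
    return eol.join(sep.join(lines[i] for lines in lines_by_file) for i in range(n))


# Boxes left empty (file D here) are ignored because the pattern requires a number.
dialog_text = '[ 3] file A\r\n[2 ] file B\r\n[1 ] file C\r\n[ ] file D'
order = parse_selection(dialog_text)

files = {
    'file A': ['line 1 of file A', 'line 2 of file A', 'line 3 of file A'],
    'file B': ['line 1 of file B', 'line 2 of file B'],
    'file C': ['line 1 of file C', 'line 2 of file C', 'line 3 of file C'],
}
print(combine([files[name] for name in order], 'FOOBAR'))
```

This reproduces the example from the docstring: file C comes first because it was given the lowest number, and the third lines are dropped because file B only has two.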
-
I wanted to thank everyone for the incredible suggestions, especially @Mark-Olson - you went not just above, but above and beyond.
I don’t have much processing to do with the text - the gist of it is that it’s stuff that had to be OCR’d in two separate batches due to formatting decisions that read perfectly fine on paper but completely bork the brain of a sensible-minded piece of character recognition software.
I’ll give the PythonScript a whirl, and I’ll report back if (when…?) I have any issues, if that’s okay!
-
@Mark-Olson I’m getting an error in line 54:
    File "C:\Program Files\Notepad++\plugins\PythonScript\scripts\columns.py", line 54
        'Select files to combine', '\r\n'.join(f'[ ] {fname}' for fname in all_filenames))
        ^
    SyntaxError: invalid syntax
-
@YutMarma
I bet that syntax error comes from you using PythonScript 2, which is a very out-of-date version that’s still on the plugin list. I recommend upgrading to PythonScript 3, but in case you don’t have time for that, it’s pretty easy to make this script compatible with Python 2:

    '''
    Uses PythonScript v3.0.16 or higher: https://github.com/bruderstein/PythonScript/releases
    First referenced in this Notepad++ community discussion: https://community.notepad-plus-plus.org/topic/25950/creating-two-columns-from-two-files-via-copy-paste
    == SUMMARY ==
    This script can paste together any number of files line-by-line.
    == EXAMPLE ==
    For example, if you have the following files:
    ----------------
    ----------------
    file A
    line 1 of file A
    line 2 of file A
    line 3 of file A
    ----------------
    file B
    line 1 of file B
    line 2 of file B
    ----------------
    file C
    line 1 of file C
    line 2 of file C
    line 3 of file C
    ----------------
    ----------------
    and you entered this into the first dialog:
    [ 3] file A
    [2 ] file B
    [1 ] file C
    and you clicked Yes for the second prompt
    and you entered FOOBAR into the third dialog
    and you clicked Yes for the fourth prompt
    you would get a new buffer opened with the following text:
    line 1 of file CFOOBARline 1 of file BFOOBARline 1 of file A
    line 2 of file CFOOBARline 2 of file BFOOBARline 2 of file A
    == NOTES ==
    This script does basic data validation.
    Specifically, if the files you wanted to paste together have different numbers of lines,
    the final file will have as many lines as the file with the fewest lines,
    assuming you click Yes on the dialog that notifies you of this.
    I intentionally did not provide an option for the final file to include lines from files that are longer than the shortest file,
    because the result could be confusing or ambiguous.
    '''
    # MESSAGEBOXFLAGS is imported explicitly so the script does not rely on startup imports
    from Npp import editor, notepad, MESSAGEBOXFLAGS
    import re

    def pl1of2al1of1():
        all_filenames = [x[0] for x in notepad.getFiles()]
        if len(all_filenames) < 2:
            notepad.messageBox("only one file is open, so this script can't work")
            return
        # find the user's preferences
        print(all_filenames)
        selected_files = []
        while True:
            txt = notepad.prompt(("Put a whole number in the box for each file you want to combine, or hit Cancel to exit.\r\n"
                        "The numbers determine the order in which the lines of each file are pasted."),
                    'Select files to combine', '\r\n'.join('[ ] ' + fname for fname in all_filenames))
            if txt is None:
                return
            selected_file_text = re.findall(r'^ *\[ *([+-]?\d+) *\] *(\S[^\r\n]*?) *\r?$', txt, re.MULTILINE)
            print(selected_file_text)
            if len(selected_file_text) < 2:
                notepad.messageBox('You must put whole numbers in the boxes for at least two files')
                continue
            selected_nums = set(int(x[0]) for x in selected_file_text)
            if len(selected_nums) < len(selected_file_text):
                notepad.messageBox('Each chosen file must have a distinct number')
                continue
            selected_files = [x[1].strip() for x in sorted(selected_file_text, key=lambda x: int(x[0]))]
            if not all(file in all_filenames for file in selected_files):
                notepad.messageBox('You accidentally changed the name of a file you selected. Make sure not to enter any text, other than putting numbers in the boxes.')
                continue
            if notepad.messageBox(('The n^th line of the combined file will have the n^th lines of each of the chosen files in the order shown:\r\n' +
                        '\r\n'.join(selected_files) + '\r\n---------------\r\n' +
                        'Is that what you want?'),
                    'confirm files and order',
                    MESSAGEBOXFLAGS.YESNO
            ) == MESSAGEBOXFLAGS.RESULTYES:
                break
        sep_str = notepad.prompt('Enter some buffer text to put between the n^th line of file N and the n^th line of file N+1',
                'Enter buffer text', '')
        if sep_str is None:
            return
        # get the text
        lines_by_file = []
        eol = '\r\n'
        previously_opened_file = notepad.getCurrentFilename()
        for file in selected_files:
            notepad.open(file)
            eol = ['\r\n', '\r', '\n'][editor.getEOLMode()]
            lines_by_file.append(editor.getText().splitlines())
        notepad.open(previously_opened_file)
        # combine the text
        all_nlines = [len(x) for x in lines_by_file]
        min_nlines = min(all_nlines)
        if len(set(all_nlines)) != 1:
            if notepad.messageBox(('Not all files have the same number of lines.\r\n' +
                        'As a result, all lines from line {0} onward will be cut off.\r\n' +
                        'Yes: Trim off all the extra lines.\r\n' +
                        'No: Cancel the operation.').format(min_nlines + 1),
                    'Choose what to do with extra lines',
                    MESSAGEBOXFLAGS.YESNO
            ) == MESSAGEBOXFLAGS.RESULTNO:
                return
        combined_text = eol.join(
            sep_str.join(lines[ii] for lines in lines_by_file)
            for ii in range(min_nlines)
        )
        notepad.new()
        editor.setText(combined_text)

    if __name__ == '__main__':
        pl1of2al1of1()
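For reference, the only changes needed for Python 2 are the f-strings, which Python 2 rejects as a syntax error. A minimal sketch of the equivalent spellings is below; the variable names and the sample message are made up for illustration, and the block itself has to be run under Python 3 because it contains an f-string for comparison.

```python
fname = 'new 1'
min_nlines = 2

# Python 3.6+ only: under Python 2 this line is a SyntaxError.
label_py3 = f'[ ] {fname}'

# Works on both Python 2 and Python 3: plain string concatenation...
label_concat = '[ ] ' + fname

# ...or str.format with a positional placeholder.
message = 'the shortest file has {0} lines'.format(min_nlines)

print(label_py3 == label_concat)
print(message)
```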
-