Copy From one text file and Paste it on another Text File using Regex
-
Hi, @ohm-dios, @terry-r and All,
I’ve got a solution which may not work if your files are too big or contain a huge number of lines :-( Just try it out !
Here is the road map :
- First, copy the File_2.txt contents ( with empty paragraphs ) into a new file, named File_3.txt
- At the very end of the File_3.txt file, add a new line ========= ( at least 3 equal signs ! )
- Then, under that line, append all the File_1.txt contents ( with the paragraphs which must be recopied )
- Save the new contents of File_3.txt
- Move back to the very beginning of the File_3.txt file ( Ctrl + Home )
- Open the Replace dialog ( Ctrl + H )
- SEARCH (?s-i)START\h*(\d+).+?END\R(?=.+(START\h*\1.+?END\R))|^===.+
- REPLACE \2
- Select the Regular expression search mode
- Click, once, on the Replace All button ( or several times on the Replace button )
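For readers who would rather script these steps than run them in the Replace dialog, here is a minimal Python sketch of the same road map. It is an adaptation, not part of the original recipe: Python's re module has no \h or \R, so they are approximated as [ \t] and (?:\r\n|\n|\r), and the names build_file_3 and replace_all are made up for illustration.

```python
import re

# Python's re has no \h or \R, so both are spelled out (an approximation).
H = r"[ \t]"
R = r"(?:\r\n|\n|\r)"

# Same shape as the SEARCH expression above. DOTALL stands in for (?s) and
# MULTILINE lets ^ match at the start of the ===== line.
SEARCH = re.compile(
    rf"START{H}*(\d+).+?END{R}(?=.+(START{H}*\1.+?END{R}))|^===.+",
    re.DOTALL | re.MULTILINE,
)


def build_file_3(file_2, file_1):
    # File_2 contents, then the separator line, then File_1 contents.
    return file_2 + "=========\n" + file_1


def replace_all(file_3):
    # Group 2 is unset when the "^===.+" alternative matches, hence `or ""`.
    return SEARCH.sub(lambda m: m.group(2) or "", file_3)


file_2 = "START 1\nEND\nText OUTSIDE\nSTART 2\nEND\n"
file_1 = "START 1\nPara 1 line 1\nEND\nSTART 2\nPara 2 Line 1\nEND\n"
print(replace_all(build_file_3(file_2, file_1)))
```

The two empty paragraphs of the first string are filled from the second one, the Text OUTSIDE line is kept, and the separator plus everything after it is removed.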
Notes :
- The boundaries START # and END must be written in uppercase
- Each match, in File_3.txt, looks for an entire paragraph START #n ..... END ( initially, in File_2.txt ) and replaces it with the contents of the same paragraph START #n ..... END ( initially, in File_1.txt ), located after the line =========
- The last match grabs and deletes all the contents between the line =========, included, and the very end of the file ( the temporary File_1.txt contents )
Unlike what I said in my previous post :
- The initial contents of each paragraph START ..... END of File_2.txt do not matter. They could even be empty !
- The initial contents of each paragraph START ..... END of File_2.txt may have a different number of lines than the same paragraph in File_1.txt
IMPORTANT :
I tested my regex S/R against a 10 Mb file, containing about 52,000 lines :
- Beginning with :
    START 1
    END
    Text OUTSIDE
    Text OUTSIDE
    Text OUTSIDE
    Text OUTSIDE
    Text OUTSIDE
    Text OUTSIDE
    START 2
    END
    Text OUTSIDE
    START 3
    END
    Text OUTSIDE
    Text OUTSIDE
    Text OUTSIDE
    Text OUTSIDE
    Text OUTSIDE
    START 4
    END
- Ending with :
    START 1
    Para 1 line 1
    Para 1 line 2
    Para 1 line 3
    Para 1 line 4
    END
    START 2
    Para 2 Line 1
    END
    START 3
    Para 3 Line 1
    Para 3 Line 2
    Para 3 Line 3
    Para 3 Line 4
    Para 3 Line 5
    Para 3 Line 6
    END
    START 4
    Para 4 Line 1
    Para 4 Line 2
    Para 4 Line 3
    END
- And containing about 52,000 lines of repetitive License.txt contents, in between !
=> The replacement was successful, after some seconds, changing the File_3.txt contents into the expected text :
    START 1
    Para 1 line 1
    Para 1 line 2
    Para 1 line 3
    Para 1 line 4
    END
    Text OUTSIDE
    Text OUTSIDE
    Text OUTSIDE
    Text OUTSIDE
    Text OUTSIDE
    Text OUTSIDE
    START 2
    Para 2 Line 1
    END
    Text OUTSIDE
    START 3
    Para 3 Line 1
    Para 3 Line 2
    Para 3 Line 3
    Para 3 Line 4
    Para 3 Line 5
    Para 3 Line 6
    END
    Text OUTSIDE
    Text OUTSIDE
    Text OUTSIDE
    Text OUTSIDE
    Text OUTSIDE
    START 4
    Para 4 Line 1
    Para 4 Line 2
    Para 4 Line 3
    END
Best Regards,
guy038
-
-
@guy038 Hi sir Thanks , As usual simplified solution for complex issue. Found one Issue when replace, the number sequence looks like this(my file has ex:340 paragraph) 199,299,336,49,59,69,79,89,99,109,119,…199,209 etc Instead of 1,2,3. Please look into that. Thanks.
-
@Terry-R Sir, Thanks. Worked Nicely Only thing its little lengthy process.
Only one small Bug Found that after completion END tag creates another 4 empty Lines and one More END tag adds: END END
Other than this All is fine. Thanks once again.
-
@Terry-R P.S: In step no 5 both START ? tag
-
I have another question: what if I have 20 txt files in one folder, and I want to make the replacement with another 20 txt files in another folder? Each of the files from folder 1 also begins with Start 1 and ends with END, and the same goes for folder 2. Also consider that the files from both folders have the same names:
File-1.txt -> File-1.txt
File-2.txt -> File-2.txt
File-3.txt -> File-3.txt
File-4.txt -> File-4.txt
…
File-20.txt -> File-20.txt
-
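Regarding the two-folder question above, one possible way to script it is sketched below in Python. This is an illustration under stated assumptions, not a tested answer from the thread: folder 1 is taken to hold the filled-in paragraphs and folder 2 the files to update, files are assumed to be UTF-8, the helper names (ensure_eol, read_raw, merge_text, merge_folders) are invented, and the pattern is a Python adaptation of the thread's search expression with \h and \R spelled out.

```python
import re
from pathlib import Path

H = r"[ \t]"             # stands in for \h
R = r"(?:\r\n|\n|\r)"    # stands in for \R

SEARCH = re.compile(
    rf"START{H}*(\d+).+?END{R}(?=.+(START{H}*\1{R}.+?END{R}?))|^===.+",
    re.DOTALL | re.MULTILINE,
)


def ensure_eol(text):
    # Make sure the text ends with a line break, so END is always followed by one.
    return text if text.endswith(("\n", "\r")) else text + "\n"


def read_raw(path):
    # newline="" keeps the file's own line endings untouched.
    with open(path, encoding="utf-8", newline="") as handle:
        return handle.read()


def merge_text(target, source):
    """Fill the START/END paragraphs of `target` from those of `source`."""
    combined = ensure_eol(target) + "=========\n" + ensure_eol(source)
    return SEARCH.sub(lambda m: m.group(2) or "", combined)


def merge_folders(source_dir, target_dir):
    """Update each .txt file in target_dir from its namesake in source_dir."""
    updated = []
    for target in sorted(Path(target_dir).glob("*.txt")):
        source = Path(source_dir) / target.name
        if not source.is_file():
            continue  # no namesake in the source folder, leave the file alone
        merged = merge_text(read_raw(target), read_raw(source))
        with open(target, "w", encoding="utf-8", newline="") as handle:
            handle.write(merged)
        updated.append(target.name)
    return updated
```

Since merge_folders overwrites the files of the target folder in place, it would be wise to try it on a copy of the folder first.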
You said :
Found one Issue when replace, the number sequence looks like this(my file has ex:340 paragraph) 199,299,336,49,59,69,79,89,99,109,119,…199,209 etc Instead of 1,2,3.
I did a quick test, replacing the values 1, 2, 3 and 4 with 199, 299, 336 and 49, without any problem !?
So, as usual, could you provide some text to test against and some information on the issue. How can you expect some help without giving us any data and vision of your workflow ?!
BR
guy038
-
@guy038 Thanks, Again sorry for my bad communication. My text file has 300 paragraph Numbered from 1 to 300 Ascending order. When Replacing this order changes instead of 1,2,3 it paste 199,299,336,49,59…99,109,119 etc.209,219,229 this is order.
    ************File2***********
    START 1
    END
    Between para line
    START 2
    END
    Between para line
    START 3
    END
    Between para line
    START 4
    END
    Between para line
    START 5
    END
    Between para line
    START 6
    END
    Between para line
    START 10
    END
    Between para line
    START 11
    END
    Between para line
    START 12
    END
    Between para line
    START 13
    END
    Between para line
    START 14
    END
    =================
    ************File 1**********
    START 1
    some line
    END
    File 1 para between
    START 2
    some line
    END
    File 1 para between
    START 3
    some line
    END
    File 1 para between
    START 4
    some line
    END
    File 1 para between
    START 5
    some line
    END
    File 1 para between
    START 6
    some line
    END
    File 1 para between
    START 10
    some line
    END
    File 1 para between
    START 11
    some line
    END
    File 1 para between
    START 12
    some line
    END
    File 1 para between
    START 13
    some line
    END
    File 1 para between
    START 14
    some line
    END
Output
    ************File2***********
    START 13
    some line
    END
    Between para line
    START 2
    some line
    END
    Between para line
    START 3
    some line
    END
    Between para line
    START 4
    some line
    END
    Between para line
    START 5
    some line
    END
    Between para line
    START 6
    some line
    END
    Between para line
    START 10
    some line
    END
    Between para line
    START 11
    some line
    END
    Between para line
    START 12
    some line
    END
    Between para line
    START 13
    some line
    END
    Between para line
    START 13
    some line
    END
Hope you will get my point. The sequence or ordering changes instead of 1,2,3. It shows first 13. Thanks.
-
Hello, @ohm-dios, @terry-r and All,
Ah… OK ! I understood the problem :
- First, I suppose that the last line of your file did not end with the two chars CRLF. So the regex just considered the START 13 ..... END paragraph as the last valid one !
- Secondly, I forgot to limit the number to find, \1, with a line-break needed right after. Indeed, when searching in the second part ( File_1 part ) for START 1, we must tell the regex to avoid matches such as START 11 or START 199 and, generally, START 1 followed by any range of digits !
So, the following regex S/R should work correctly, even if the last line of the current file does not end with CRLF :
SEARCH (?s-i)START\h*(\d+).+?END\R(?=.+(START\h*\1\R.+?END\R?))|^===.+
REPLACE \2
You’ll note the new syntax \1\R, to get the exact number in the part under ===========, and the \R? syntax, near the end of the regex, in order to match whatever chars end the current file !
Best Regards
guy038
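The difference between the first and the corrected search expression can be reproduced outside the editor with a small Python sketch. It relies on assumptions: \h and \R are approximated as [ \t] and (?:\r\n|\n|\r), since Python's re module lacks them, and the names FIRST, FIXED and replace_all are illustrative only.

```python
import re

H = r"[ \t]"             # stands in for \h
R = r"(?:\r\n|\n|\r)"    # stands in for \R
FLAGS = re.DOTALL | re.MULTILINE

# First version: nothing is required right after the back-reference \1.
FIRST = re.compile(
    rf"START{H}*(\d+).+?END{R}(?=.+(START{H}*\1.+?END{R}))|^===.+", FLAGS
)
# Corrected version: \1 must be followed by a line break, final break optional.
FIXED = re.compile(
    rf"START{H}*(\d+).+?END{R}(?=.+(START{H}*\1{R}.+?END{R}?))|^===.+", FLAGS
)


def replace_all(pattern, text):
    # Group 2 is unset when the "^===.+" alternative matches, hence `or ""`.
    return pattern.sub(lambda m: m.group(2) or "", text)


file_3 = (
    "START 1\nEND\nSTART 11\nEND\n"
    "=========\n"
    "START 1\none\nEND\nSTART 11\neleven\nEND\n"
)

print(replace_all(FIRST, file_3))  # paragraph 1 is wrongly filled from START 11
print(replace_all(FIXED, file_3))  # each paragraph gets its own contents
```

With the first pattern, the lookahead for START 1 settles on the last paragraph whose header merely begins with START 1, which here is START 11. The corrected pattern also copes with a last line that has no line break.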
-
-
@guy038 I Pray God to Give Unlimited Love To you. Its 100% Fine now and You Really Saved A lot of Time and Effort. Thanks a Lot.
-
@guy038 P.S.: Again, sorry to disturb. It works up to 100; after that, it just replaces the whole content from file_1 (the one which was pasted after ===========). Please look into that.
-
Hi, @Ohm-dios and All,
But, my regex S/R is just built to do so !!
Indeed, the result file File_3.txt contains :
- Firstly, the contents of File_2.txt
- The line ============
- Secondly, the contents of File_1.txt
Then, when running the S/R, it :
- Copies all contents of paragraphs, located after the line ========, into the corresponding paragraphs, located above the line ========
- Finally, deletes the line ========== and everything, till the very end of file
So, after saving the new contents of File_3.txt, this file becomes your new expected file File_2.txt
Or, am I missing something obvious ?
BR
guy038
-
-
@guy038 .Functioning is absolutely Right. Only 100 paragraph copied and corresponding file_1 deleted. But after 100 , the file_2 content totally replaced by file_1. Instead of copied and deleted. May be the number issue because until 100 perfect after that only issue comes.
I understood the function . lines ====== below will be copied to respective above after it gets deleted till the iteration of Paragraph Numbering.
1-100 NO ISSUE. Issue starts from 101 then all the content of file_1 directly replaced till end of file_2. Hope i explained a little you may catch my point.
-
@Ohm-Dios said in Copy From one text file and Paste it on another Text File using Regex:
Only one small Bug Found that after completion END tag creates another 4 empty Lines and one More END tag adds
If you are finding there are unpaired END tags then I would assume they were there from the start. As a suggestion you can count the number of START and END tags in each file to confirm each file has the same number. To do so, use the Find function. Then type in (?-i)^START\s*\d+$ and click on the Count button. Perform the same with (?-i)^END$. After getting that number you could then also perform another count as a secondary verification to count each set of START/END tags by using (?s-i)^START\s*\d+\R.+?^END$. If any of those numbers differ from the others you have an issue with your data.
There is one other count I’d like you to do since I have thought of one possibility where the empty START/END line would appear after the replacement START/END line at step 4. Use (?-si)^START.+?\R[!"#$] on file 1 and confirm the number is 0. If it is NOT 0 then I may need to amend my regexes slightly.
As for my solution being a lengthy process, it is, intentionally. @guy038’s solution is a much neater solution, however as he pointed out there can be issues using it on larger amounts of data. That’s why I refrain from offering it.
You also posted a question about step 5 with 2 START tags. I don’t know if you completely understand each step, although I did provide a description for each step. In step 5 we have 2 START/END lines together. The first should be the “empty” START/END line, the second the replacement START/END line. As the first has the original “line number” attached at the end we need to keep that. Step 5 regex identifies the relevant strings of characters in both lines, keeps the replacement START/END string, but attaches the number from the first line.
Terry
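The first three counts suggested in this post can also be taken in one go with a short Python sketch. It is only an approximation of the editor's Count button: the function name tag_counts is made up, and line endings are normalised to LF first because Python's $ only matches before "\n".

```python
import re


def tag_counts(text):
    """Return (START lines, END lines, START..END blocks) for one file."""
    # Assumption: CRLF is normalised to LF so that `$` behaves as expected.
    text = text.replace("\r\n", "\n")
    starts = len(re.findall(r"^START\s*\d+$", text, re.MULTILINE))
    ends = len(re.findall(r"^END$", text, re.MULTILINE))
    blocks = len(
        re.findall(r"^START\s*\d+\n.+?^END$", text, re.MULTILINE | re.DOTALL)
    )
    return starts, ends, blocks


# Two well-formed paragraphs plus one stray END line.
sample = "START 1\nfoo\nEND\nbetween\nSTART 2\nbar\nEND\nEND\n"
print(tag_counts(sample))  # prints (2, 3, 2)
```

If the three numbers differ for a file, as in this sample, the data probably contains an unpaired tag.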
-
Hello, @ohm-dios,
If your File_1 and File_2 files are neither personal nor confidential, could you send me these files, by e-mail, for further testing ?
My e-mail address, temporarily displayed :
BR
guy038
-
@guy038 Dear sir, Files are not personal. But i found the issue as i told after 100, then i checked START 101 it has START 101 . (FULL STOP) That only caused the issue. After clear the character it worked without any issue. It’s god’s grace that’s why i have got help from like you masters to save my efforts and time . My Love and Hug to You. Thanks You Rock always.
-
Hi, @ohm-dios,
Ah…, of course ! In the search regex :
(?s-i)START\h*(\d+).+?END\R(?=.+(START\h*\1\R.+?END\R?))|^===.+
You’ll notice the part START\h*\1\R, which defines the beginning of the paragraph that needs to be copied, located under the ======== line
The back-reference \1 to the group 1 ( which is the number after the START string and space char(s) ) must be immediately followed by the EOL chars ( \R )
Thus, if anything is located between the number and the end of line, it cannot be equal to the corresponding number, located above the ========= line => NO match of this specific paragraph :-((
BR
guy038
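To spot such header lines before running the S/R, a small Python check along these lines might help. It is a hypothetical helper, not something proposed in the thread: the name suspect_headers and the pattern are assumptions, and [ \t] stands in for \h.

```python
import re

# Header lines where something other than a line break follows "START <number>",
# for instance a full stop or a trailing space.
SUSPECT = re.compile(r"^START[ \t]*\d+[^\d\r\n][^\r\n]*", re.MULTILINE)


def suspect_headers(text):
    """Return the START lines that the \\1\\R part of the regex cannot match."""
    return [m.group(0) for m in SUSPECT.finditer(text)]


sample = "START 100\nx\nEND\nSTART 101 .\ny\nEND\nSTART 102 \nz\nEND\n"
print(suspect_headers(sample))  # prints ['START 101 .', 'START 102 ']
```

An empty list means every START line ends right after its number, which is what the search expression requires.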