    Copy From one text file and Paste it on another Text File using Regex

    • Terry RT
      Terry R @Ohm Dios
      last edited by Terry R

      @Ohm-Dios said in Copy From one text file and Paste it on another Text File using Regex:

      So now need to copy that paragraph content from Text-file-1 and keep the other text of text-file-2 as it is.

      This is turning out to be quite a complicated (though not difficult) process. I have managed to get it down to 8 major steps. Each step builds on the previous one and changes the text further, ready for the next step. For every step that uses a regex, the search mode MUST be “Regular expression”. The cursor should be at the very start of whichever file is being processed before each step is run. The regexes search for “START” and “END”, not “Start”, “start”, “End”, “end” or any other combination of case. If your files use a different case, the regexes will need to be altered.

      So the steps are:

      1. Combine sets of lines in both files so that each set becomes 1 line. Each START/END sequence is combined into 1 line. All the “in-between” lines are also combined into 1 line, together with any additional empty/blank lines before and after.
        Replace Function (perform this on file 1 and file 2):
        Find What:(?-i)(END\R)|(\R(?!START))
        Replace With:(?1\1)(?2%%)
        The %% is used as a flag for where carriage return/line feeds need to be recreated in the last step. If %% is likely to occur within the text, it can be changed to any other string, such as #@ or @&. The same change would then need to be made in step 8 as well as in this step.

      2. Remove unwanted lines in file 1.
        Mark Function:
        Find What:(?-is)^START.+
        Have “Bookmark line” ticked
        After “Mark All” has been clicked the lines to keep will be “marked” with the blue circle (default icon). To remove the unwanted lines use the “remove unmarked lines” option in “Search”, “Bookmark” menu.

      3. Number the lines in file 2. Then move number to end of line.
        Use the “Column Editor” function, first insert text, use the @ character. Use the Column editor again, this time with “number to insert”, initial number is 1, increase by 1 and tick “leading zeros.”
        Next use Replace function:
        Find What:(?-s)^(\d+)@(.+)
        Replace With:\2@\1

      4. Copy file 1 lines to file 2 (insert anywhere, but possibly after the last line in file 2) and “sort lines lexicographically ascending”. This is under “Line Operations” in the Edit menu.

      5. Replace empty start/end set with the new ones keeping the original number at end of line.
        Replace function:
        Find What:(?-is)^START.+@(\d+)\R(START.+)$
        Replace With:\2@\1

      6. Move number to start of line.
        Replace function:
        Find What:(?-s)^(.+)@(\d+)$
        Replace With:\2@\1

      7. Sort lines as integer ascending. This is under “Line Operations” in the Edit menu.

      8. Remove number and recreate the CR/LFs.
        Replace function:
        Find What:(?-s)^\d+@|(%%)
        Replace With:(?1\r\n)

      Hopefully at this point you have the result you expected. I tested on a small scale and it worked as I expected it to.
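
The eight steps above can also be read as a small program. The following is a minimal Python sketch of that pipeline, not something posted in the thread: `flatten` and `merge` are hypothetical names, `\R` is narrowed to `\n`, Python's `sorted` stands in for the two Notepad++ sort commands, and both inputs are assumed to use LF line endings and to end with a line break after the last END.

```python
import re

MARK = "%%"  # placeholder for removed line breaks, as in step 1


def flatten(text):
    # Step 1: keep the break after END and the break before START,
    # turn every other line break into the marker.
    return re.sub(r"(END\n)|(\n(?!START))",
                  lambda m: m.group(1) or MARK, text)


def merge(file1, file2):
    # Step 2: keep only the flattened START...END lines of file 1.
    new_blocks = [line for line in flatten(file1).splitlines()
                  if line.startswith("START")]
    # Step 3: number the lines of file 2, with the number at the end.
    numbered = [f"{line}@{i:04d}"
                for i, line in enumerate(flatten(file2).splitlines(), 1)]
    # Step 4: combine both sets and sort lexicographically.
    combined = "\n".join(sorted(numbered + new_blocks))
    # Step 5: replace each empty block with the new one, keeping the number.
    combined = re.sub(r"^START.+@(\d+)\n(START.+)$", r"\2@\1",
                      combined, flags=re.M)
    # Step 6: move the number to the start of the line.
    combined = re.sub(r"^(.+)@(\d+)$", r"\2@\1", combined, flags=re.M)
    # Step 7: sort by the leading integer.
    ordered = sorted(combined.splitlines(),
                     key=lambda line: int(line.split("@", 1)[0]))
    # Step 8: drop the number and recreate the line breaks.
    body = "\n".join(line.split("@", 1)[1] for line in ordered)
    return body.replace(MARK, "\n") + "\n"


file1 = "START 1\np1\nEND\njunk\nSTART 2\np2\nEND\n"
file2 = "START 1\n \nEND\nout a\nout b\nSTART 2\n \nEND\n"
print(merge(file1, file2))
```

As in the manual procedure, step 5 only works when the empty block from file 2 sorts directly before its replacement from file 1; the sketch does not check for that.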

      Terry

      • guy038G
        guy038
        last edited by guy038

        Hi, @ohm-dios, @terry-r and All,

        I’ve got a solution which may not work if your files are too big or contain a huge number of lines :-( Just try it out !

        Here is the road map :

        • First copy the File_2.txt contents ( with empty paragraphs ) in a new file, named File_3.txt

        • At the very end of the File_3.txt file, add a new line ========= ( at least, 3 equal signs ! )

        • Then, under that line, append all File_1.txt contents ( with the paragraphs which must be recopied )

        • Save the new contents of File_3.txt

        • Move back to the very beginning of File_3.txt file ( Ctrl + Home )

        • Open the Replace dialog ( Ctrl + H )

          • SEARCH (?s-i)START\h*(\d+).+?END\R(?=.+(START\h*\1.+?END\R))|^===.+

          • REPLACE \2

          • Select the Regular expression search mode

          • Click, once, on the Replace All button ( or several times on the Replace button )


        Notes :

        • The boundaries START # and END must be written in uppercase

        • Each match, in File_3.txt, looks for an entire paragraph START #n ..... END ( initially, in File_2.txt ) and replaces it with the corresponding contents of the same paragraph START #n ..... END ( initially, in File_1.txt ), located after the line =========

        • The last match grabs and deletes all the contents between the line =========, included, and the very end of file ( the temporary File_1.txt contents )


        Contrary to what I said in my previous post :

        • The initial contents of each paragraph START ..... END of File_2.txt do not matter. They could even be empty !

        • The initial contents of each paragraph START ..... END of File_2.txt may have different number of lines than the same paragraph in File_1.txt
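
The road map above can be tried outside Notepad++ as well. Below is a minimal Python sketch, assuming LF line endings and a File_2 text that ends with a line break; `fill_paragraphs` is a hypothetical helper name. Python's `re` module has no `\h` or `\R`, so they are written as `[ \t]` and `\r?\n`, and `re.M` is added so that `^` matches at line starts as it does in Notepad++.

```python
import re

# Port of the SEARCH pattern: (?s) becomes re.S, \h -> [ \t], \R -> \r?\n.
PATTERN = re.compile(
    r"START[ \t]*(\d+).+?END\r?\n(?=.+(START[ \t]*\1.+?END\r?\n))|^===.+",
    re.S | re.M,
)


def fill_paragraphs(file2_text, file1_text):
    # File_3 = File_2 contents, a separator line, then File_1 contents.
    combined = file2_text + "=========\n" + file1_text
    # REPLACE \2: group 2 is empty for the final "^===.+" match,
    # which deletes the separator and everything after it.
    return PATTERN.sub(lambda m: m.group(2) or "", combined)


file2 = "START 1\n \nEND\nText OUTSIDE\nSTART 2\n \n \nEND\n"
file1 = ("START 1\nPara 1 line 1\nPara 1 line 2\nEND\n"
         "START 2\nPara 2 Line 1\nEND\n")
print(fill_paragraphs(file2, file1))
```

The sample uses single-digit paragraph numbers only, which this first version of the pattern handles.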

        IMPORTANT :

        I tested my regex S/R against a 10 MB file, containing about 52,000 lines :

        • Beginning with :
        START 1
                     
                     
        END
        Text OUTSIDE
        Text OUTSIDE
        Text OUTSIDE
        Text OUTSIDE
        Text OUTSIDE
        Text OUTSIDE
        START 2
        
        
        
        
        
        
        
                     
        END
        Text OUTSIDE
        START 3
                     
                     
        END
        Text OUTSIDE
        Text OUTSIDE
        Text OUTSIDE
        Text OUTSIDE
        Text OUTSIDE
        START 4
                     
                     
                     
        END
        

        Ending with :

        START 1
        Para 1 line 1
        Para 1 line 2
        Para 1 line 3
        Para 1 line 4
        END
        START 2
        Para 2 Line 1
        END
        START 3
        Para 3 Line 1
        Para 3 Line 2
        Para 3 Line 3
        Para 3 Line 4
        Para 3 Line 5
        Para 3 Line 6
        END
        START 4
        Para 4 Line 1
        Para 4 Line 2
        Para 4 Line 3
        END
        
        • And containing about 52,000 lines of repetitive License.txt contents, in between !

        => The replacement was successful, after a few seconds, changing the File_3.txt contents into the expected text :

        START 1
        Para 1 line 1
        Para 1 line 2
        Para 1 line 3
        Para 1 line 4
        END
        Text OUTSIDE
        Text OUTSIDE
        Text OUTSIDE
        Text OUTSIDE
        Text OUTSIDE
        Text OUTSIDE
        START 2
        Para 2 Line 1
        END
        Text OUTSIDE
        START 3
        Para 3 Line 1
        Para 3 Line 2
        Para 3 Line 3
        Para 3 Line 4
        Para 3 Line 5
        Para 3 Line 6
        END
        Text OUTSIDE
        Text OUTSIDE
        Text OUTSIDE
        Text OUTSIDE
        Text OUTSIDE
        START 4
        Para 4 Line 1
        Para 4 Line 2
        Para 4 Line 3
        END
        

        Best Regards,

        guy038

        • Ohm DiosO
          Ohm Dios @guy038
          last edited by

          @guy038 Hi sir, thanks. As usual, a simplified solution for a complex issue. I found one issue when replacing: the number sequence looks like this (my file has, for example, 340 paragraphs): 199,299,336,49,59,69,79,89,99,109,119,…199,209 etc. instead of 1,2,3. Please look into that. Thanks.

          • Ohm DiosO
            Ohm Dios @Terry R
            last edited by

            @Terry-R Sir, thanks. It worked nicely; the only thing is that it is a rather lengthy process.
            Only one small bug found: after completion, the END tag is followed by another 4 empty lines and one more END tag is added:

            END
            
            
            
            END
            

            Other than this, all is fine. Thanks once again.

            • Ohm DiosO
              Ohm Dios @Terry R
              last edited by

              @Terry-R P.S.: In step 5, why are both tags START?

              • Robin CruiseR
                Robin Cruise
                last edited by Robin Cruise

                I have another question: what if I have 20 txt files in one folder and I want to do the replacement with another 20 txt files in another folder, where each file in folder 1 also begins with Start 1 and ends with END, and the same in folder 2?

                And consider that the files in both folders have the same names:

                File-1.txt -> File-1.txt
                File-2.txt -> File-2.txt
                File-3.txt -> File-3.txt
                File-4.txt -> File-4.txt
                …
                File-20.txt -> File-20.txt

                • guy038G
                  guy038
                  last edited by guy038

                  @ohm-dios,

                  You said :

                  I found one issue when replacing: the number sequence looks like this (my file has, for example, 340 paragraphs): 199,299,336,49,59,69,79,89,99,109,119,…199,209 etc. instead of 1,2,3.

                  I did a quick test, replacing the values 1, 2, 3 and 4 with 199, 299, 336 and 49, without any problem !?


                  So, as usual, could you provide some text to test against and some information on the issue. How can you expect some help without giving us any data and vision of your workflow ?!

                  BR

                  guy038

                  • Ohm DiosO
                    Ohm Dios @guy038
                    last edited by

                    @guy038 Thanks, and again sorry for my bad communication. My text file has 300 paragraphs, numbered from 1 to 300 in ascending order. When replacing, this order changes: instead of 1,2,3 it pastes 199,299,336,49,59…99,109,119 etc., 209,219,229, in this order.

                    ************File2***********
                    START 1
                     
                    END
                    Between para line
                    START 2
                     
                    END
                    Between para line
                    START 3
                     
                    END
                    Between para line
                    START 4
                     
                    END
                    Between para line
                    START 5
                     
                    END
                    Between para line
                    START 6
                     
                    END
                    Between para line
                    START 10
                     
                    END
                    Between para line
                    START 11
                     
                    END
                    Between para line
                    START 12
                     
                    END
                    Between para line
                    START 13
                     
                    END
                    Between para line
                    START 14
                     
                    END
                    =================
                    ************File 1**********
                    START 1
                    some line
                    END
                    File 1 para between
                    START 2
                    some line
                    END
                    File 1 para between
                    START 3
                    some line
                    END
                    File 1 para between
                    START 4
                    some line
                    END
                    File 1 para between
                    START 5
                    some line
                    END
                    File 1 para between
                    START 6
                    some line
                    END
                    File 1 para between
                    START 10
                    some line
                    END
                    File 1 para between
                    START 11
                    some line
                    END
                    File 1 para between
                    START 12
                    some line
                    END
                    File 1 para between
                    START 13
                    some line
                    END
                    File 1 para between
                    START 14
                    some line
                    END
                    

                    Output:

                    ************File2***********
                    START 13
                    some line
                    END
                    Between para line
                    START 2
                    some line
                    END
                    Between para line
                    START 3
                    some line
                    END
                    Between para line
                    START 4
                    some line
                    END
                    Between para line
                    START 5
                    some line
                    END
                    Between para line
                    START 6
                    some line
                    END
                    Between para line
                    START 10
                    some line
                    END
                    Between para line
                    START 11
                    some line
                    END
                    Between para line
                    START 12
                    some line
                    END
                    Between para line
                    START 13
                    some line
                    END
                    Between para line
                    START 13
                    some line
                    END
                    

                    Hope you get my point. The sequence or ordering changes: instead of 1,2,3, it shows 13 first. Thanks.

                    • guy038G
                      guy038
                      last edited by guy038

                      Hello, @ohm-dios, @terry-r and All,

                      Ah… OK ! I understood the problem :

                      • First, I suppose that the last line of your file did not end with the two chars CRLF. So the regex just considered the START 13 ..... END paragraph as the last valid one !

                      • Secondly, I forgot to require a line-break right after the back-reference \1, so that only the exact same number is found. Indeed, when searching in the second part ( File_1 part ) for START 1, we must tell the regex to avoid matches such as START 11 or START 199 and, generally, START 1 followed by any further digits !


                      So, the following regex S/R should work correctly, even if the last line of current file does not end with CRLF :

                      SEARCH (?s-i)START\h*(\d+).+?END\R(?=.+(START\h*\1\R.+?END\R?))|^===.+

                      REPLACE \2

                      You’ll note the new syntax \1\R, to get the exact number in the part under ===========, and the \R? syntax, near the end of the regex, in order to match whatever chars end the current file !
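
To see the difference between the two versions of the search, here is a small Python comparison. It is only a sketch under assumptions: `OLD`, `NEW` and `run` are hypothetical names, `\h` and `\R` are ported as `[ \t]` and `\r?\n` because Python's `re` lacks them, and the sample text is made up.

```python
import re

FLAGS = re.S | re.M

# First version: the back-reference \1 may be followed by more digits.
OLD = re.compile(
    r"START[ \t]*(\d+).+?END\r?\n(?=.+(START[ \t]*\1.+?END\r?\n))|^===.+",
    FLAGS,
)
# Corrected version: \1 must be followed by a line break, final break optional.
NEW = re.compile(
    r"START[ \t]*(\d+).+?END\r?\n"
    r"(?=.+(START[ \t]*\1\r?\n.+?END(?:\r?\n)?))|^===.+",
    FLAGS,
)


def run(pattern, combined):
    # REPLACE \2 (empty when only the "^===.+" branch matched).
    return pattern.sub(lambda m: m.group(2) or "", combined)


combined = (
    "START 1\n\nEND\nmid\nSTART 12\n\nEND\n"
    "=====\n"
    "START 1\none\nEND\nSTART 12\ntwelve\nEND\n"
)
print(run(OLD, combined))  # paragraph 1 is filled with the START 12 text
print(run(NEW, combined))  # each paragraph gets its own text
```

With the first version, the greedy `.+` in the look-ahead lands on the last paragraph whose number merely begins with `1`, which is why START 1 picks up the contents of START 12 in this sample.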

                      Best Regards

                      guy038

                      • Ohm DiosO
                        Ohm Dios @guy038
                        last edited by

                        @guy038 I Pray God to Give Unlimited Love To you. Its 100% Fine now and You Really Saved A lot of Time and Effort. Thanks a Lot.

                        1 Reply Last reply Reply Quote 0
                        • Ohm DiosO
                          Ohm Dios @guy038
                          last edited by

                          @guy038 P.S.: Again, sorry to disturb you. It works up to 100; after that it just replaces the whole content with file_1 (the part pasted after ===========). Please look into that.

                          • guy038G
                            guy038
                            last edited by guy038

                            Hi, @Ohm-dios and All,

                            But my regex S/R is built to do just that !!

                            Indeed, the result file File_3.txt contains :

                            • Firstly, the contents of File_2.txt

                            • The line ============

                            • Secondly, the contents of File_1.txt

                            • Then, when running the S/R, it :

                              • Copies all contents of paragraphs, located after the line ========, into the corresponding paragraphs, located above the line ========

                              • Finally, deletes the line ========== and everything, till the very end of file

                            • So, after saving the new contents of File_3.txt, this file becomes your new expected file File_2.txt

                            Or, am I missing something obvious ?

                            BR

                            guy038

                            • Ohm DiosO
                              Ohm Dios @guy038
                              last edited by

                              @guy038 The functioning is absolutely right. The first 100 paragraphs were copied and the corresponding file_1 part deleted. But after 100, the file_2 content is totally replaced by file_1, instead of being copied and deleted. It may be a numbering issue, because up to 100 it is perfect and the issue only comes after that.
                              I understood the function: the paragraphs below the ====== line are copied to the respective paragraphs above it, and then the lower part is deleted once all paragraph numbers have been processed.
                              1-100: NO ISSUE. The issue starts from 101; from there, all the content of file_1 directly replaces file_2 until the end. I hope I have explained enough for you to catch my point.

                              • Terry RT
                                Terry R @Ohm Dios
                                last edited by

                                @Ohm-Dios said in Copy From one text file and Paste it on another Text File using Regex:

                                Only one small bug found: after completion, the END tag is followed by another 4 empty lines and one more END tag is added

                                If you are finding there are unpaired END tags then I would assume they were there from the start. As a suggestion, you can count the number of START and END tags in each file to confirm each file has the same number. To do so, use the Find function. Type in (?-i)^START\s*\d+$ and click on the Count button. Perform the same with (?-i)^END$. After getting those numbers you could then also perform another count, as a secondary verification, to count each set of START/END tags by using (?s-i)^START\s*\d+\R.+?^END$. If any of those numbers differs from the others, you have an issue with your data.
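
The three counts can also be taken in one go with a few lines of Python. This is a minimal sketch, assuming LF line endings (Python's `$` does not match before a `\r`); `tag_counts` is a hypothetical name and the sample text is made up.

```python
import re


def tag_counts(text):
    # Same three searches as above; re.M makes ^ and $ work per line,
    # re.S lets "." cross line breaks for the paired count.
    starts = len(re.findall(r"^START\s*\d+$", text, re.M))
    ends = len(re.findall(r"^END$", text, re.M))
    pairs = len(re.findall(r"^START\s*\d+\r?\n.+?^END$", text, re.M | re.S))
    return starts, ends, pairs


# Made-up sample with one stray END tag at the bottom.
sample = "START 1\na\nEND\nx\nSTART 2\nb\nEND\nEND\n"
print(tag_counts(sample))
```

If the three numbers returned for a file are not all equal, that file has an unpaired tag somewhere.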

                                There is one other count I’d like you to do, since I have thought of one possibility where the empty START/END line would appear after the replacement START/END line at step 4. Use (?-si)^START.+?\R[!"#$] on file 1 and confirm the number is 0. If it is NOT 0 then I may need to amend my regexes slightly.

                                As for my solution being a lengthy process, it is, intentionally. @guy038’s solution is much neater; however, as he pointed out, there can be issues using it on larger amounts of data. That’s why I refrained from offering it.
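
Why those four characters matter can be shown with a plain code-point sort. The lines below are hypothetical flattened lines in the shape steps 1 to 3 produce, and Python's `sorted` is assumed to order them the way the lexicographic sort in step 4 does; `risky_paragraphs` is a hypothetical helper that performs the count on file 1.

```python
import re


def risky_paragraphs(file1_text):
    # Count START lines whose next line begins with ! " # or $,
    # i.e. with a character that sorts before the % of the %% flag.
    return len(re.findall(r'^START.+?\r?\n[!"#$]', file1_text, re.M))


empty_line = "START 1%%%%END@0001"            # numbered empty block (file 2)
normal_new = "START 1%%Para 1 line 1%%END"    # replacement block (file 1)
quoted_new = 'START 1%%"Para" 1 line 1%%END'  # replacement starting with "

# Normal case: the empty block sorts first, as step 5 expects.
print(sorted([normal_new, empty_line]))
# Problem case: the replacement sorts first, so step 5 would not match.
print(sorted([quoted_new, empty_line]))
print(risky_paragraphs('START 1\n"Para" 1 line 1\nEND\n'))
```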

                                You also posted a question about step 5 with 2 START tags. I don’t know if you completely understand each step, although I did provide a description for each step. In step 5 we have 2 START/END lines together. The first should be the “empty” START/END line, the second the replacement START/END line. As the first has the original “line number” attached at the end we need to keep that. Step 5 regex identifies the relevant strings of characters in both lines, keeps the replacement START/END string, but attaches the number from the first line.

                                Terry

                                • guy038G
                                  guy038
                                  last edited by guy038

                                  Hello, @ohm-dios,

                                  If your File_1 and File_2 files are neither personal nor confidential, could you send me these files, by e-mail, for further testing ?

                                  My e-mail address, temporarily displayed :

                                  BR

                                  guy038

                                  • Ohm DiosO
                                    Ohm Dios @guy038
                                    last edited by

                                    @guy038 Dear sir, the files are not personal. But I found the issue. As I said, it appeared after 100, so I checked START 101: it was written as START 101 . (with a FULL STOP). That alone caused the issue. After removing the character, it worked without any problem. It’s God’s grace that I have got help from masters like you to save my effort and time. My love and a hug to you. Thanks, you always rock.

                                    • guy038G
                                      guy038
                                      last edited by

                                      Hi, @ohm-dios,

                                      Ah…, of course ! In the search regex :

                                      (?s-i)START\h*(\d+).+?END\R(?=.+(START\h*\1\R.+?END\R?))|^===.+

                                      You’ll notice the part START\h*\1\R, which defines the beginning of the paragraph that needs to be copied, located under the ======== line

                                      The back-reference \1 to the group 1 ( which is the number after the START string and space char(s) ) must be immediately followed by the EOL chars ( \R )

                                      Thus, if anything is located between the number and the end of line, that line cannot match the corresponding number, located above the ========= line => NO match for this specific paragraph :-((
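
The effect of a stray character after the number can be reproduced with a Python port of the search. This is a sketch under assumptions: `PATTERN` and `run` are hypothetical names, `\h` and `\R` are written as `[ \t]` and `\r?\n`, and the sample deliberately uses a single-digit number (in this port, a multi-digit number can let `\d+` backtrack to a shorter number, so a different paragraph may be pulled in instead).

```python
import re

PATTERN = re.compile(
    r"START[ \t]*(\d+).+?END\r?\n"
    r"(?=.+(START[ \t]*\1\r?\n.+?END(?:\r?\n)?))|^===.+",
    re.S | re.M,
)


def run(combined):
    # REPLACE \2 (empty when only the "^===.+" branch matched).
    return PATTERN.sub(lambda m: m.group(2) or "", combined)


top = "START 7\n\nEND\n"
stray = top + "=====\n" + "START 7 .\nseven\nEND\n"  # full stop after the number
clean = top + "=====\n" + "START 7\nseven\nEND\n"

print(repr(run(stray)))  # paragraph 7 stays empty, lower part is still deleted
print(repr(run(clean)))  # paragraph 7 is filled in
```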

                                      BR

                                      guy038
