    Copy, search and replace between 2 HTML files

    • guy038G
      guy038
      last edited by guy038

      Hi, @hientwi and All,

      Ah…, of course, it cannot work because there is a random number of lines between each KOSMOS line ! So, here is another method which should work fine, although it contains numerous steps ;-))

      To begin with, from your pictures, I noticed that your file A contains 223,145 lines and I assume that your file B contains only 895 lines.

      OK, let’s go !


      • Open your two files A and B in Notepad++

      Let’s suppose that file A contains only 5 KOSMOS lines among its 223,145 lines, giving the following input text :

      Line 1
      Line 2
      Line 3
      KOSMOS
      Line 5
      KOSMOS
      Line 7
      KOSMOS
      Line 9
      .....
      .....
      .....
      .....
      Line 223,139
      KOSMOS
      Line 223,141
      Line 223,142
      KOSMOS
      Line 223,144
      Line 223,145
      
      • Open the Column Editor ( Edit > Column Editor... ), with the caret at the very beginning of file A

        • Select Number to Insert

        • Type in 1 in the following three zones

        • Tick the Leading zeros option

        • Verify the Dec format

        • Click on the OK button

      You should get :

      000001Line 1
      000002Line 2
      000003Line 3
      000004KOSMOS
      000005Line 5
      000006KOSMOS
      000007Line 7
      000008KOSMOS
      000009Line 9
      xxxxxx.....
      xxxxxx.....
      xxxxxx.....
      xxxxxx.....
      223139Line 223,139
      223140KOSMOS
      223141Line 223,141
      223142Line 223,142
      223143KOSMOS
      223144Line 223,144
      223145Line 223,145
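
For readers who prefer a script, the numbering step can be sketched in Python. This is only an illustration of what the Column Editor produces, not part of the Notepad++ procedure; the helper name `number_lines` and the default width of 6 digits are assumptions chosen to match the example above.

```python
def number_lines(lines, width=6):
    """Prefix each line with its 1-based, zero-padded line number."""
    return [f"{i:0{width}d}{line}" for i, line in enumerate(lines, start=1)]


# Tiny stand-in for file A (hypothetical sample data)
sample = ["Line 1", "Line 2", "Line 3", "KOSMOS", "Line 5"]
numbered = number_lines(sample)
print("\n".join(numbered))
```

The fixed width matters: it is what later lets a plain text sort put the lines back in their original order.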
      
      • Now open the Mark dialog ( Search > Mark... option )

        • SEARCH (?-i)KOSMOS

        • Option Bookmark line ticked

        • Option Purge for each search ticked, preferably

        • Option Wrap around ticked

        • Mode Regular expression selected

        • Click on the Mark All button

      => The 895 lines KOSMOS should be bookmarked

      • Then, run the option Search > Bookmark > Copy bookmarked Lines
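
As a rough Python equivalent of the Mark All + Copy bookmarked Lines steps, the sketch below keeps every numbered line that contains KOSMOS, case-sensitively. The function name `bookmarked` and the sample list are assumptions made for the illustration.

```python
import re

MARK = re.compile(r"KOSMOS")   # case-sensitive search, like (?-i)KOSMOS


def bookmarked(lines):
    """Return the lines that contain the marker, in their original order."""
    return [line for line in lines if MARK.search(line)]


# Hypothetical numbered lines; the lowercase variant must not be selected
numbered = ["000003Line 3", "000004KOSMOS", "000005kosmos", "000006KOSMOS"]
copied = bookmarked(numbered)
print(copied)
```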

      • Now, select your File B tab, containing also 5 lines, which will replace each KOSMOS line of file A

      -- The Line 1 contents ( File B ) --
      -- The Line 2 contents ( File B ) --
      -- The Line 3 contents ( File B ) --
      -- The Line 4 contents ( File B ) --
      -- The Line 5 contents ( File B ) --
      
      • After the 895 lines of file B, add a separation line with, at least, 3 consecutive equal signs, so the string === with a line-break

      • Then paste the contents of the clipboard, with Ctrl + V ( so the 895 lines KOSMOS of file A )

      Thus, file B should contain 895 lines before the === line and 895 lines after it ( 5 and 5, in our example )

      -- The Line 1 contents ( File B ) --
      -- The Line 2 contents ( File B ) --
      -- The Line 3 contents ( File B ) --
      -- The Line 4 contents ( File B ) --
      -- The Line 5 contents ( File B ) --
      ===
      000004KOSMOS
      000006KOSMOS
      000008KOSMOS
      223140KOSMOS
      223143KOSMOS
      
      • Perform the following regex S/R, in the Replace dialog ( Ctrl + H )

        • SEARCH (?-si).+(?=\R(?s:.+?\R){5}(.+))|(?s)===.+ ( Of course, use the quantifier {895}, instead of {5}, with your present file B )

        • REPLACE ?1\1$0

        • Option Wrap around ticked and Regular expression selected

        • Click on the Replace All button

      After 896 replacements ( 6, in our example : one per line of file B, plus one for the === block ), we get, at once, the following text :

      000004KOSMOS-- The Line 1 contents ( File B ) --
      000006KOSMOS-- The Line 2 contents ( File B ) --
      000008KOSMOS-- The Line 3 contents ( File B ) --
      223140KOSMOS-- The Line 4 contents ( File B ) --
      223143KOSMOS-- The Line 5 contents ( File B ) --
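
The net effect of this S/R is to glue the k-th line of file B onto the k-th numbered KOSMOS line. A minimal Python sketch of the same pairing follows. It does not reproduce the regex itself, and it assumes the buffer has exactly the layout shown above (replacement lines, one separator line equal to `===`, then the numbered KOSMOS lines). The name `pair_lines` is hypothetical.

```python
def pair_lines(buffer_lines):
    """Join the k-th numbered KOSMOS line with the k-th replacement line."""
    sep = buffer_lines.index("===")        # position of the separator line
    replacements = buffer_lines[:sep]      # lines that came from file B
    markers = buffer_lines[sep + 1:]       # numbered KOSMOS lines from file A
    if len(replacements) != len(markers):
        raise ValueError("both halves must contain the same number of lines")
    return [marker + text for marker, text in zip(markers, replacements)]


# Hypothetical three-line version of the buffer
buffer_lines = [
    "-- B1 --", "-- B2 --", "-- B3 --",
    "===",
    "000004KOSMOS", "000006KOSMOS", "000008KOSMOS",
]
paired = pair_lines(buffer_lines)
print("\n".join(paired))
```

The length check plays the same role as the {895} quantifier: if the two halves do not have the same number of lines, the pairing would be shifted.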
      
      • Then select all the contents of file B, with Ctrl + A

      • Copy it into the clipboard, with Ctrl + C

      • Select the file A tab

      • Paste the clipboard contents, after the last line of file A, with Ctrl + V

      => So, the file A contents are as below :

      000001Line 1
      000002Line 2
      000003Line 3
      000004KOSMOS
      000005Line 5
      000006KOSMOS
      000007Line 7
      000008KOSMOS
      000009Line 9
      xxxxxx.....
      xxxxxx.....
      xxxxxx.....
      xxxxxx.....
      223139Line 223,139
      223140KOSMOS
      223141Line 223,141
      223142Line 223,142
      223143KOSMOS
      223144Line 223,144
      223145Line 223,145
      000004KOSMOS-- The Line 1 contents ( File B ) --
      000006KOSMOS-- The Line 2 contents ( File B ) --
      000008KOSMOS-- The Line 3 contents ( File B ) --
      223140KOSMOS-- The Line 4 contents ( File B ) --
      223143KOSMOS-- The Line 5 contents ( File B ) --
      
      • Now, sort the lines of file A, with the option Edit > Line Operations > Sort Lines Lexicographically Ascending

      We get the following output :

      000001Line 1
      000002Line 2
      000003Line 3
      000004KOSMOS
      000004KOSMOS-- The Line 1 contents ( File B ) --
      000005Line 5
      000006KOSMOS
      000006KOSMOS-- The Line 2 contents ( File B ) --
      000007Line 7
      000008KOSMOS
      000008KOSMOS-- The Line 3 contents ( File B ) --
      000009Line 9
      xxxxxx.....
      xxxxxx.....
      xxxxxx.....
      xxxxxx.....
      223139Line 223,139
      223140KOSMOS
      223140KOSMOS-- The Line 4 contents ( File B ) --
      223141Line 223,141
      223142Line 223,142
      223143KOSMOS
      223143KOSMOS-- The Line 5 contents ( File B ) --
      223144Line 223,144
      223145Line 223,145
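
Why does the sort put each pasted line right after its KOSMOS line ? Both lines start with the same 12 characters, and a string that is a strict prefix of another one sorts first. A small Python check of that idea, assuming that the editor's lexicographic sort compares this ASCII data character by character, as Python's `sorted` does:

```python
# Hypothetical four-line extract: the pasted line is appended at the end
lines = [
    "000003Line 3",
    "000004KOSMOS",
    "000005Line 5",
    "000004KOSMOS-- The Line 1 contents ( File B ) --",
]
ordered = sorted(lines)   # plain character-by-character comparison
print("\n".join(ordered))
```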
      

      Finally, run this last regex S/R :

      • SEARCH (?-is)^\d{6}|\h*KOSMOS\h*\R?

      • REPLACE Leave EMPTY

      Here we are ! We have the expected output, below :

      Line 1
      Line 2
      Line 3
      -- The Line 1 contents ( File B ) --
      Line 5
      -- The Line 2 contents ( File B ) --
      Line 7
      -- The Line 3 contents ( File B ) --
      Line 9
      .....
      .....
      .....
      .....
      Line 223,139
      -- The Line 4 contents ( File B ) --
      Line 223,141
      Line 223,142
      -- The Line 5 contents ( File B ) --
      Line 223,144
      Line 223,145
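
For comparison, here is how the last S/R could look with Python's `re` module. It is a sketch under assumptions: Python has no `\h` or `\R`, so `[ \t]` and `\r?\n` are used as approximations, and the names `CLEANUP` and `strip_helpers` are invented for the example.

```python
import re

# ^\d{6}      : the 6-digit prefix at the start of each line
# KOSMOS ...  : the marker, surrounding blanks and an optional line-break
CLEANUP = re.compile(r"^\d{6}|[ \t]*KOSMOS[ \t]*(?:\r?\n)?", re.MULTILINE)


def strip_helpers(text):
    """Remove the numeric prefixes and every KOSMOS marker."""
    return CLEANUP.sub("", text)


# Hypothetical extract of the sorted file A
sorted_text = (
    "000003Line 3\n"
    "000004KOSMOS\n"
    "000004KOSMOS-- B1 --\n"
    "000005Line 5\n"
)
cleaned = strip_helpers(sorted_text)
print(cleaned, end="")
```

A bare KOSMOS line disappears together with its line-break, whereas in a merged line only the prefix and the word KOSMOS are removed, so the text from file B takes its place.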
      

      If OK, I’ll explain the regex syntax, next time !

      See you later,

      Best Regards,

      guy038

      • HienTwiH
        HienTwi
        last edited by

        Hi @guy038 and all,

        Definitely, it works perfectly with @guy038’s smart solution. Many, many thanks for your solution, which saves me a lot of time. It would be really nice if you could explain the regex syntax when you have free time!

        In addition, I want to split file A into 895 files based on “KOSMOS”. Could you please do me a further favor? For instance,

        file 1: From the very beginning of file A to the first KOSMOS, not including it.
        file 2: From the 1st KOSMOS to the 2nd KOSMOS (not including the 2nd)
        file 3, … file 895 are similar to file 2. The last KOSMOS (895th) will be excluded.

        Bests,
        Kosmos

        • HienTwiH
          HienTwi @astrosofista
          last edited by

          @astrosofista many thanks for your comments. The problem is solved with @guy038’s solution.

          • astrosofistaA
            astrosofista @HienTwi
            last edited by

            @HienTwi

            Good to know. Thank you for getting back to me.

            Best Regards.

            • guy038G
              guy038
              last edited by

              Hello, @hientwi, @astrosofista and All,

              I’m quite confused, because I don’t see, exactly, the connection between your previous goal and your new one ?

              Indeed, once your file A has been modified with our previous process, it no longer contains any KOSMOS line, as they have all been replaced with a specific line from file B. So, it would be more difficult to determine each section which would have to be saved in the 895 files !

              On the other hand, if you decide to split the initial contents of file A into 895 files first, then you’ll have to replace the first KOSMOS line of each file with the appropriate line of file B, which seems to be more difficult than with my previous method !

              Please, could you enlighten us ?

              Best Regards,

              guy038

              • HienTwiH
                HienTwi @guy038
                last edited by

                Hi @guy038 and all,

                Sorry that I made you and others confused. I have another purpose which is totally different from my previous question. It means that I have two copies of file A. The one I wanted to split into multiple files based on “KOSMOS”. The other is used for my previous question. They are totally different questions.

                Best regards,
                Kosmos

                • guy038G
                  guy038
                  last edited by guy038

                  Hello, @hientwi, @astrosofista and All,

                  Sorry to be late ! So OK : these are two absolutely different tasks !

                  Well, as you would like to create files, regexes are not the right tool for such a task. Personally, I would use the Gawk application. So, if you do not have this program, yet :

                  • Create a new folder

                  • Download the gawk-5.0.1-w32-bin.zip archive from    https://sourceforge.net/projects/ezwinports/files/

                  • Double-click on the gawk-5.0.1-w32-bin.zip archive

                  • Double-click on the bin folder

                  • Extract only the 5 files gawk.exe, libgmp-10.dll, libmpfr-4.dll, libncurses5.dll and libreadline6.dll in the new folder

                  • Copy your file A into that folder and rename it File_A.txt

                  • With N++, just add a KOSMOS line at the very beginning of File_A.txt

                  • Open a DOS cmd window

                  • Type in and run the following command :

                    • gawk "BEGIN {n=0} $0!=\"KOSMOS\" {print > \"File_\"n\".txt\"} $0==\"KOSMOS\" {n++}" File_A.txt
                  • Wait a few moments … …

                  Et voilà ! You should see, in this new folder, 895 files from File_1.txt to File_895.txt ;-))
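
If gawk is not an option, the same split can be sketched in Python. This is a hedged equivalent of the one-liner above, not a drop-in replacement: the function name `split_on_marker` is invented, UTF-8 encoding is assumed, and the demo works on a small temporary file instead of the real File_A.txt.

```python
import tempfile
from pathlib import Path


def split_on_marker(source, out_dir, marker="KOSMOS"):
    """Copy non-marker lines to File_<n>.txt, bumping n at every marker line."""
    out_dir = Path(out_dir)
    n = 0
    out = None                                   # currently open output file
    with open(source, encoding="utf-8") as src:
        for line in src:
            if line.rstrip("\r\n") == marker:
                n += 1                           # same role as {n++} in gawk
                if out is not None:
                    out.close()
                    out = None
                continue
            if out is None:                      # first line of a new section
                out = open(out_dir / f"File_{n}.txt", "w", encoding="utf-8")
            out.write(line)
    if out is not None:
        out.close()
    return n


# Demo on a throw-away folder (hypothetical sample content)
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "File_A.txt"
    src.write_text("KOSMOS\nLine 1\nLine 2\nKOSMOS\nLine 4\n", encoding="utf-8")
    count = split_on_marker(src, tmp)
    created = sorted(p.name for p in Path(tmp).glob("File_[0-9]*.txt"))
    first_section = (Path(tmp) / "File_1.txt").read_text(encoding="utf-8")
    print(count, created)
```

As with the gawk command, the KOSMOS lines themselves are not written to any output file, and closing each file before opening the next keeps the number of open files at one.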


                  Another possibility would be :

                  • With N++, just add a KOSMOS line at the very beginning of File_A.txt

                  • Change, in your File_A.txt, each KOSMOS line into a pure empty line, with the regex :

                    • SEARCH (?-i)^KOSMOS(?=\R)

                    • REPLACE Leave EMPTY

                  • Then, in your DOS window, you would run the following command :

                    • gawk "BEGIN {n=0} NF {print > \"File_\"n\".txt\"} !NF {n++}" File_A.txt

                  That’s all ! Powerful, isn’t it ?

                  Remark : I suppose that your file did not contain, initially, any true empty line !! ( may be searched with the regex ^\R )
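
The remark above matters because, in awk, NF is the number of fields of the current line, and it is 0 for an empty line as well as for a line made of blanks only. The Python sketch below mimics the NF / !NF logic in memory, to show that any such line starts a new section. The name `split_on_blank` and the sample data are assumptions.

```python
def split_on_blank(lines):
    """Group lines by section; a line without any field starts a new section."""
    sections = {}
    n = 0
    for line in lines:
        if line.split():                  # at least one field, like NF in awk
            sections.setdefault(n, []).append(line)
        else:                             # empty or blank-only line, like !NF
            n += 1
    return sections


# Hypothetical data: the blank-only line also triggers a split
demo = ["", "Line 1", "Line 2", "", "Line 4", "   ", "Line 6"]
sections = split_on_blank(demo)
print(sections)
```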


                  For more information, you can download the latest PDF manual ( gawk v5.0 ) from    https://www.gnu.org/software/gawk/manual/

                  Best Regards

                  guy038

                  P.S. :

                  In order to select each zone, beginning with a KOSMOS line, till the next KOSMOS line, excluded, of your File_A.txt, simply use the regex :

                  SEARCH (?-i)(KOSMOS)?(?s).+?(?=^KOSMOS\R|\z)
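
To see which zones this regex selects, here is an approximate translation to Python's `re` module. The differences are assumptions of the sketch: `\R` becomes `\r?\n`, `\z` becomes `\Z`, and the MULTILINE flag is needed so that `^` matches at each line start, as it does in Notepad++.

```python
import re

# Optional leading KOSMOS, then as little text as possible up to the
# next KOSMOS line or up to the very end of the text
ZONE = re.compile(r"(?:KOSMOS)?.+?(?=^KOSMOS\r?\n|\Z)", re.DOTALL | re.MULTILINE)

# Hypothetical miniature File_A.txt
text = "Line 1\nKOSMOS\nLine 3\nLine 4\nKOSMOS\nLine 6\n"
zones = [m.group(0) for m in ZONE.finditer(text)]
for zone in zones:
    print(repr(zone))
```

Each match is one candidate file: the text before the first KOSMOS line, then every block which begins with a KOSMOS line.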

                  • HienTwiH
                    HienTwi @guy038
                    last edited by

                    Dear @guy038 and all,

                    I am so sorry that I responded so late. It seems that everything can be solved with your help. Many thanks in advance and I will let you know later on.

                    Stay healthy and best regards,
                    Kosmos

                    • Kosmos HuynhK
                      Kosmos Huynh @guy038
                      last edited by

                      Dear @guy038, dear all

                      Today, I have tried your first solution (File_B.txt which contains KOSMOS) and I got the following error:
                      [image: screenshot of the error]

                      It is the same with your second solution (File_A.txt with blank lines) as well.
                      [image: screenshot of the error]

                      Could you please do me a favor?

                      Many thanks in advance!
                      Bests,
                      Kosmos

                      • Kosmos HuynhK
                        Kosmos Huynh @guy038
                        last edited by

                        Dear @guy038 ,

                        I found the solution by correcting the quotation marks, as follows:

                        gawk 'BEGIN {n=0} NF {print > "File_"n".txt"} !NF {n++}' File_A.txt

                        Best regards,
                        Kosmos.
