Copy, search and replace between 2 HTML files
-
Hi @guy038 and all,
Definitely, it works perfectly with @guy038's smart solution. Many, many thanks for your solution, which saves me a lot of time. It would be really nice if you could explain the regex syntax when you have free time!
In addition, I want to split file A into 895 files based on “KOSMOS”. Could you please do me a further favor? For instance:
file 1: from the very beginning of file A to the first KOSMOS, not including it.
file 2: from the 1st KOSMOS to the 2nd KOSMOS (not including the 2nd).
file 3, …, file 895 are similar to file 2. The last KOSMOS (the 895th) will be excluded.
Bests,
Kosmos
-
@astrosofista, many thanks for your comments. The problem is solved with @guy038's solution.
-
-
Hello, @hientwi, @astrosofista and All,
I’m quite confused, because I don’t see, exactly, the connection between your previous goal and your new one.
Indeed, once your file A has been modified with our previous process, it no longer contains any KOSMOS line: they have all been replaced with a specific line from file B. So, it would be more difficult to determine each section which would have to be saved in the 895 files!
On the other hand, if you decide to split the initial contents of file A into 895 files first, then you’ll have to replace the first KOSMOS line of each file with the appropriate line of file B, which seems more difficult than with my previous method!
Please, could you enlighten us?
Best Regards,
guy038
-
Hi @guy038 and all,
Sorry that I confused you and the others. I have another purpose, which is totally different from my previous question. I have two copies of file A: one I want to split into multiple files based on “KOSMOS”, and the other is used for my previous question. They are totally different questions.
Best regards,
Kosmos
-
Hello, @hientwi, @astrosofista and All,
Sorry for being late! So, OK: these are two absolutely different tasks!
Well, as you would like to manage file creation, regexes are not the right tool for such a task. Personally, I would use the Gawk application. So, if you do not have this program yet:
- Create a new folder
- Download the gawk-5.0.1-w32-bin-zip archive from https://sourceforge.net/projects/ezwinports/files/
- Double-click on the gawk-5.0.1-w32-bin-zip archive
- Double-click on the bin folder
- Extract only the 5 files gawk.exe, libgmp-10.dll, libmpfr-4.dll, libncurses5.dll and libreadline6.dll into the new folder
- Copy your file A into that folder and rename it File_A.txt
- With N++, just add a line KOSMOS at the very beginning of File_A.txt
- Open a DOS cmd window
- Type in and run the following command:
gawk "BEGIN {n=0} $0!=\"KOSMOS\" {print > \"File_\"n\".txt\"} $0==\"KOSMOS\" {n++}" File_A.txt
- Wait a few moments …
Et voilà! You should see, in this new folder, 895 files from File_1.txt to File_895.txt ;-))
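For readers without gawk, the logic of the command above can be sketched in Python. This is only an illustration under assumptions, not part of the original answer: the function name split_on_marker, the out_dir parameter and the UTF-8 encoding are hypothetical choices. Like the gawk command, it drops every KOSMOS line and increments the file counter each time one is seen, so any text before the first marker would land in File_0.txt.

```python
from pathlib import Path


def split_on_marker(source, out_dir, marker="KOSMOS"):
    """Copy the lines between marker lines to File_1.txt, File_2.txt, ...

    Hypothetical helper mirroring the gawk one-liner: marker lines are
    not copied, and each one increments the output file number.
    """
    out_dir = Path(out_dir)
    written = []   # paths of the files created, in order
    n = 0          # same role as the gawk variable n
    out = None     # currently open output file, if any
    # newline="" keeps the original line endings untouched (assumption: UTF-8 text)
    with open(source, encoding="utf-8", newline="") as src:
        for line in src:
            if line.rstrip("\r\n") == marker:
                n += 1
                if out is not None:
                    out.close()
                    out = None
                continue
            if out is None:
                path = out_dir / f"File_{n}.txt"
                out = open(path, "w", encoding="utf-8", newline="")
                written.append(path)
            out.write(line)
    if out is not None:
        out.close()
    return written
```

Calling split_on_marker("File_A.txt", ".") from the new folder should give the same File_1.txt to File_895.txt layout, provided the extra KOSMOS line was added at the top of File_A.txt as described in the steps.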
Another possibility would be:
- With N++, just add a line KOSMOS at the very beginning of File_A.txt
- Change, in your File_A.txt, each KOSMOS line into a pure empty line, with the regex:
  - SEARCH (?-i)^KOSMOS(?=\R)
  - REPLACE Leave EMPTY
- Then, in your DOS window, you would run the following command:
gawk "BEGIN {n=0} NF {print > \"File_\"n\".txt\"} !NF {n++}" File_A.txt
That’s all! Powerful, isn’t it?
Remark: I suppose that your file did not contain, initially, any true empty line!! (They may be searched for with the regex ^\R)
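In awk, NF is the number of fields on the current line, so !NF is true for a truly empty line and also for a line holding only spaces or tabs. The Python sketch below mirrors that test and adds a small pre-check for the remark above. The names count_blank_lines and split_on_blank_lines are hypothetical, and str.split() is only a rough stand-in for awk's default field splitting.

```python
def count_blank_lines(text):
    """Count lines with no fields (empty or whitespace-only), like awk's !NF."""
    return sum(1 for line in text.splitlines() if not line.split())


def split_on_blank_lines(text):
    """Group non-blank lines into numbered chunks, one chunk per blank-line gap.

    Returns a dict {n: chunk_text}, where n plays the role of the gawk
    counter used to build the File_n.txt names.
    """
    chunks = {}
    n = 0
    for line in text.splitlines(keepends=True):
        if line.split():                      # at least one field: keep the line
            chunks.setdefault(n, []).append(line)
        else:                                 # blank line: move to the next file number
            n += 1
    return {number: "".join(lines) for number, lines in chunks.items()}
```

Running count_blank_lines on the original file contents before the KOSMOS replacement is one way to confirm the assumption that no blank line was there to begin with; a non-zero result would mean extra, unintended splits.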
For more information, you can download the latest PDF manual (gawk v5.0) from https://www.gnu.org/software/gawk/manual/
Best Regards
guy038
P.S.:
In order to select each zone of your File_A.txt, beginning with a KOSMOS line and running up to, but excluding, the next KOSMOS line, simply use the regex:
SEARCH (?-i)(KOSMOS)?(?s).+?(?=^KOSMOS\R|\z)
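To see what this regex selects, here is a rough Python translation. It is an approximation, not the Notepad++ regex itself: Python's re module has no \R or \z, and recent versions reject inline flags in the middle of a pattern, so the sketch assumes \r?\n for the line break, \Z for the very end of the text, and passes DOTALL and MULTILINE as flags. The names ZONE and zones are hypothetical.

```python
import re

# Optional leading KOSMOS, then the shortest run of any characters
# (newlines included) that ends right before a KOSMOS line or at the
# very end of the text.
ZONE = re.compile(r"(?:KOSMOS)?.+?(?=^KOSMOS\r?\n|\Z)", re.DOTALL | re.MULTILINE)


def zones(text):
    """Return every zone as a list of strings, in document order."""
    return ZONE.findall(text)


sample = "intro\nKOSMOS\nfirst\nKOSMOS\nsecond\n"
for zone in zones(sample):
    print(repr(zone))
```

Each zone after the first starts with its own KOSMOS line, which matches the wording of the original request (file 2 runs from the 1st KOSMOS up to, but not including, the 2nd).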
-
-
Dear @guy038 and all,
I am so sorry that I responded so late. It seems that everything can be solved with your help. Many thanks in advance, and I will let you know later on.
Stay healthy and best regards,
Kosmos
-
Dear @guy038, dear all
Today, I tried your first solution (File_B.txt, which contains KOSMOS) and I got the following error:
It is the same with your second solution (File_A.txt with blank lines) as well.
Could you please kindly do me a favor?
Many thanks in advance!
Bests,
Kosmos
-
Dear @guy038 ,
I found the solution by correcting the quotation marks, as follows:
gawk 'BEGIN {n=0} NF {print > "File_"n".txt"} !NF {n++}' File_A.txt
Best regards,
Kosmos.
-
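The two command lines in this thread differ only in quoting, which depends on the shell that parses them. The thread does not say which shell produced the error, so the following is just a sketch of the two conventions, using Python's standard library to render the same argument list both ways. shlex.join applies POSIX shell quoting; subprocess.list2cmdline is a lightly documented helper that applies the Microsoft C runtime rules, which is an assumption about how gawk.exe receives its arguments and does not cover every cmd.exe special character.

```python
import shlex
import subprocess

# The awk program exactly as gawk must receive it, with no shell quoting.
AWK_PROGRAM = 'BEGIN {n=0} NF {print > "File_"n".txt"} !NF {n++}'
ARGV = ["gawk", AWK_PROGRAM, "File_A.txt"]

# POSIX-style shells: wrap the program in single quotes.
posix_line = shlex.join(ARGV)

# Windows-style: wrap in double quotes and backslash-escape the inner ones.
windows_line = subprocess.list2cmdline(ARGV)

print(posix_line)
print(windows_line)
```

The first printed line has the single-quoted form posted just above, and the second has the double-quoted form with \" escapes given earlier for the DOS cmd window.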