How to copy html text content from a section of several pages to the section of other several different pages (with the tags of the text too)



  • Suppose I have this HTML code in FILE-1.html:
    
    <!DOCTYPE html>
    <html xmlns="http://www.w3.org/1999/xhtml" dir="ltr" lang="ro">
    <title>YES, I love her</title>
    <head>
    </head>
    <body>
    ...
    <div id="coloana_centru">
      <div class="container mM" id="incadrare_text_mijloc_2" itemscope itemtype="http://schema.org/Product">
        <div align="justify">
    
            <!-- * * * * * START HERE * * * * * -->
    
            <p class="TATA"><em>At the mobile site I put as the header location in case the device isn't mobile? And then it executes the php code you gave stack..</em></p>
            <p class="MAMA">Simply check if the referrer is coming from within your site. If they are, they have already seen a page and have chosen where they want to go next</p>
    
            <!-- * * * * * END HERE * * * * * -->
    </div>
    </div>
    </body>
    </html>
    

    I want to move the entire content between <!-- * * * * * START HERE * * * * * --> and <!-- * * * * * END HERE * * * * * --> into another HTML file, in a different folder:

    <!DOCTYPE html>
    <html xmlns="http://www.w3.org/1999/xhtml" dir="ltr" lang="ro">
    <title>The cars are ready</title>
    <head>
    </head>
    <body>
    ...
    <div id="bingo">
      <div class="container good-job" id="love" itemscope itemtype="http://schema.org/Product">
        <div align="center">
    
            <!-- * * * * * START HERE * * * * * -->
    
            <p class="TATA"><em> Other text 1 </em></p>
            <p class="MAMA"> Other text 2</p>
    
            <!-- * * * * * END HERE * * * * * -->
    </div>
    </div>
    </body>
    </html>
    

    So, basically, I want to move only the article text from one file to another. That text (along with its tags and classes) sits between the two comments. The problem is that I must change 3,000 files.

    My first solution, using the search and replace of any text editor, is to replace everything from

    <body to <!-- * * * * * START HERE * * * * * -->

    and all from

    <!-- * * * * * END HERE * * * * * --> to </body>

    in each file with the corresponding part of the other file, so that the original text content stays in that file but everything before and after it changes.

    I was wondering if there is another solution, using regex, PHP, or some other program. Is a solution different from mine possible?



  • @Robin-Cruise said in How to copy html text content from a section of several pages to the section of other several different pages (with the tags of the text too):

    I was wondering if there is another solution, using regex, PHP, or some other program. Is a solution different from mine possible?

    Yes.

    Well, not regex, because regex has no concept of “one file to another”.

    Any programming language worth its salt could implement such a solution. And some are so well liked among the Notepad++ community that people have written plugins or libraries to interface their favorite language with the contents of the Notepad++ editor tabs. So in theory, you could use PythonScript or LuaScript or the affectionately nicknamed “Perl Script” (though that one isn’t a plugin yet). My guess is that if it had just been a couple of files you wanted to manipulate, someone here might have created a PythonScript for you. But when you’re talking about 3000 files, going through the Notepad++ interface is highly inefficient, when Python or Lua or Perl or <insert favorite programming language here> could do it directly, from the command line, without using Notepad++ as a middle-man, with no problem.

    This is a Notepad++ forum, but the main part of your problem has nothing to do with Notepad++. You should definitely use Notepad++ to write your code. But the implementation is really up to you … and I have a feeling, since you didn’t immediately consider writing it yourself in your favorite programming language rather than asking here, that you don’t actually have a favorite language to write in… so you’re going to have to learn a programming language first. And this forum isn’t appropriate to teach you a programming language.
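
A minimal sketch of the "do it directly, from the command line" route, in Python. It rests on assumptions not stated at this point in the thread: each old-design file has a same-named counterpart in the new-design folder, every file contains exactly one START/END pair, and the files are UTF-8. The names `transplant` and `transplant_folder` are illustrative only.

```python
from pathlib import Path

START = "<!-- * * * * * START HERE * * * * * -->"
END = "<!-- * * * * * END HERE * * * * * -->"


def split_at_markers(html: str) -> tuple[str, str, str]:
    """Return (before, section, after); section runs from START through END."""
    start = html.index(START)                # raises ValueError if START is missing
    end = html.index(END, start) + len(END)  # raises ValueError if END is missing
    return html[:start], html[start:end], html[end:]


def transplant(donor_html: str, template_html: str) -> str:
    """Put the donor's marked section into the template's marked slot."""
    _, section, _ = split_at_markers(donor_html)
    before, _, after = split_at_markers(template_html)
    return before + section + after


def transplant_folder(old_dir: Path, new_dir: Path) -> list[str]:
    """Update every file in new_dir that has a same-named file in old_dir.

    Assumption: files are paired purely by identical file name.
    """
    updated = []
    for donor in sorted(old_dir.glob("*.html")):
        target = new_dir / donor.name
        if not target.is_file():
            continue  # no pair in the new design: leave it for manual review
        merged = transplant(
            donor.read_text(encoding="utf-8"),
            target.read_text(encoding="utf-8"),
        )
        target.write_text(merged, encoding="utf-8")
        updated.append(donor.name)
    return updated
```

It would be run on copies of both folders, for example `transplant_folder(Path("old_copy"), Path("new_copy"))` (hypothetical folder names). A file that lacks its markers raises `ValueError` instead of being silently skipped, so a bad page stops the run early.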



  • Hello, @robin-cruise, @peterjones and all,

    We can still simplify your problem!

    If I understand you properly, each of the 3,000 files, or so, located in a specific folder, contains one or several sections :

            <!-- * * * * * START HERE * * * * * -->
    ....
    ....
    ....
            <!-- * * * * * END HERE * * * * * -->
    

    which are useless and which should be replaced with this exact comment section, below :

            <!-- * * * * * START HERE * * * * * -->
    
            <p class="TATA"><em>At the mobile site I put as the header location in case the device isn't mobile? And then it executes the php code you gave stack..</em></p>
            <p class="MAMA">Simply check if the referrer is coming from within your site. If they are, they have already seen a page and have chosen where they want to go next</p>
    
            <!-- * * * * * END HERE * * * * * -->
    

    If this assumption is correct, here is a method that would still need the Python or Lua scripting plugin but is doable from within Notepad++!

    • First, do a backup of your 3,000 files ( Important )

    • Execute a Replace in Files operation, with the following regex S/R :

      • SEARCH (?s)^\h*\Q<!-- * * * * * START HERE * * * * * -->\E.+?\Q<!-- * * * * * END HERE * * * * * -->\E

      • REPLACE <!-- Replacement Point -->

    => This S/R will replace every multi-line <!-- START HERE ... END HERE --> section, in your 3,000 files, with a single line <!-- Replacement Point -->

    • Copy the section below, which must be inserted into all your files, to the clipboard with a simple Ctrl + C action
            <!-- * * * * * START HERE * * * * * -->
    
            <p class="TATA"><em>At the mobile site I put as the header location in case the device isn't mobile? And then it executes the php code you gave stack..</em></p>
            <p class="MAMA">Simply check if the referrer is coming from within your site. If they are, they have already seen a page and have chosen where they want to go next</p>
    
            <!-- * * * * * END HERE * * * * * -->
    
    • Now, with the help of a scripting language, the further steps are :

      • For each scanned file :

        • Bookmark all the lines <!-- Replacement Point -->

        • Perform a Search > Bookmark > Paste to (Replace) Bookmarked Lines action

    The last command should replace any bookmarked line, of each file, with the contents of the clipboard :-))

    Best Regards,

    guy038
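
For readers working outside Notepad++, here is a rough Python translation of the search pattern above. It is a sketch under assumptions: Python's `re` module has no `\Q...\E` or `\h`, so `re.escape()` and `[ \t]*` stand in for them, and `replace_sections` is a made-up helper name.

```python
import re

START = "<!-- * * * * * START HERE * * * * * -->"
END = "<!-- * * * * * END HERE * * * * * -->"

# (?s) becomes re.DOTALL, ^ at line starts needs re.MULTILINE,
# \Q...\E becomes re.escape(), and \h* is approximated with [ \t]*.
SECTION = re.compile(
    r"^[ \t]*" + re.escape(START) + r".+?" + re.escape(END),
    re.DOTALL | re.MULTILINE,
)


def replace_sections(html: str, replacement: str) -> str:
    """Replace every START...END section (and its leading indent) in html."""
    # Using a function as the replacement keeps any backslashes in
    # `replacement` literal instead of treating them as group references.
    return SECTION.sub(lambda _match: replacement, html)
```

Because the pattern also consumes the indentation in front of the START comment, the replacement text has to carry its own indentation if that matters.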



  • Good day @guy038. First, the text content is not the same; it is different in every HTML page. That was just an example. Yes, both have the same <!-- * * * * * START HERE * * * * * --> and <!-- * * * * * END HERE * * * * * -->

    I understand the regex, but where must I use it? There are 2 different folders: one with the old web design and one with the new design (into which I have to add the text content). I must copy from one folder to the other, from one set of HTML files to another.



  • In fact, I have to change the design of a site, but I have to keep thousands of articles. I can’t copy every article into the new template, page by page. I must use something quicker.



  • @Robin-Cruise said in How to copy html text content from a section of several pages to the section of other several different pages (with the tags of the text too):

    I can’t copy every article into the new template, page by page. I must use something quicker.

    This problem is so complex I think it would need multiple steps, each building on the previous one and possibly using a different process. Of course as @PeterJones states using a programming language would work, but you’d need to learn that and I suppose time is of the essence.

    My idea would likely build on some abilities you already know (or at least know of) and should be able to accomplish with little effort.

    The steps would be:

    1. Copy both folders elsewhere as it would be very important to do all this in offline copies and then test the results and proof read some of the files to confirm the results.
    2. For the first file, which provides the text to be inserted into the second file, use a regex to remove ALL but the lines that will be copied. This could be accomplished with a regex using the Find in Files function.
    3. Add the remaining content of the first file to the end of the second file. It may require an additional line to delimit the addition, say a line of hashes (######…) in between the current file content and the additional new lines from file 1; this might help in the next step.
    4. Again using the Find in Files function, swap out the old data for the additional lines at the bottom of each file from step #3.
    5. Proof the new files to confirm data changed as required.

    Notice there is a big gap in all this: how do you determine the file name of the donor file and its replacement file in the new structure? You haven’t mentioned that at any point and I think that is going to be the biggest hurdle, unless some naming convention was used to make it easier to pair the files, such as “design1file0001.html” and its pair in the new structure, “file0001structure2.html”. If something like this was used, then again a sort and regex process within Notepad++ might get the 2 files paired relatively easily and then enable you to create a “bat” (MS-DOS batch) file which would do step #3.

    Terry
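
The proofing in step 5 could be partly automated. The sketch below is only an illustration and assumes the simplest pairing rule (identical file names in both copied folders) plus UTF-8 files. `marked_section` and `find_mismatches` are hypothetical helper names.

```python
from pathlib import Path
from typing import Optional

START = "<!-- * * * * * START HERE * * * * * -->"
END = "<!-- * * * * * END HERE * * * * * -->"


def marked_section(html: str) -> Optional[str]:
    """Return the text from START through END, or None if a marker is missing."""
    start = html.find(START)
    if start == -1:
        return None
    end = html.find(END, start)
    if end == -1:
        return None
    return html[start:end + len(END)]


def find_mismatches(old_dir, new_dir) -> list[str]:
    """List paired files whose marked section differs between the two folders."""
    problems = []
    for donor in sorted(Path(old_dir).glob("*.html")):
        target = Path(new_dir) / donor.name
        if not target.is_file():
            continue  # unpaired files are outside the scope of this check
        want = marked_section(donor.read_text(encoding="utf-8"))
        got = marked_section(target.read_text(encoding="utf-8"))
        if want is None or want != got:
            problems.append(donor.name)
    return problems
```

An empty list means every paired page carries the same article block as its donor. It says nothing about the rest of the page, so reading a sample of files by eye is still worthwhile.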



  • @Terry-R yes, this was my solution from the beginning: delete everything between <body> and <!-- * * * * * START HERE * * * * * -->, and delete everything between <!-- * * * * * END HERE * * * * * --> and </body>, so as to keep the text content (and the meta tags at the beginning of the HTML).

    And replace those two (deleted) sections with the HTML code of the new web template. And I can use TextCrawler software to do the Search and Replace on larger blocks of code.

    Yes, I wish there was a safer way than that. Regex would have been much better, if it could be used, even in more steps.



  • @Robin-Cruise said in How to copy html text content from a section of several pages to the section of other several different pages (with the tags of the text too):

    Regex would have been much better, if it could be used, even in more steps.

    I think you are still missing the point: how do you pair the files? Regex has NO concept of files. It is tasked with editing or finding characters within other characters. In this case it is either presented with a tab (within NPP) or the text from a file if using the Find in Files function. It doesn’t know where the text came from, nor where it goes after the regex is finished. It is just a step in the process, which NPP handles from start to finish.

    Your biggest job is, as I say, pairing the files together; the other steps are fairly simple to create.

    Terry



  • Actually, @guy038 has a great idea with Bookmark, except Notepad++ does not yet have multiple bookmarks, nor the possibility of inserting them into the files of a specific folder. The names of the HTML files are identical in both folders; only the HTML body is different, and the text must go where I indicated.

    I could copy the bookmarked lines from the files in one folder and paste them into the files in the other folder.



  • @Robin-Cruise said in How to copy html text content from a section of several pages to the section of other several different pages (with the tags of the text too):

    I could copy the bookmarked lines from the files in one folder and paste them into the files in the other folder.

    I think at this point your “original idea” backed up by some input from this forum (my idea which seems to correlate to yours and upvotes of it) should tell you that it is the RIGHT solution. Don’t go looking for things you know don’t exist and hoping.

    The idea presented is right, easy to understand and should be “safe” as you put it. As you have identical filenames in both structures my main concern has now evaporated (how to pair the files).

    So for getting a “bat” file created you need the filenames in 2 lists. You would use a “DIR” command at the command prompt with some parameters which leave the list in a bare state (hint /B) and possibly sorted (hint again /ON).

    Terry
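
As a possible alternative to building the two lists with DIR, the same pairing can be sketched in Python. This is only an illustration: it assumes the pages end in `.html` and sit directly in each folder, and `bare_listing` and `pair_report` are made-up names.

```python
from pathlib import Path


def bare_listing(folder) -> list[str]:
    """File names only, sorted by name (roughly what DIR /B /ON prints).

    Note: Python's sort is case-sensitive, which may order names
    differently from the command prompt.
    """
    return sorted(p.name for p in Path(folder).glob("*.html"))


def pair_report(old_dir, new_dir) -> dict[str, list[str]]:
    """Split the file names into paired ones and leftovers on either side."""
    old_names = set(bare_listing(old_dir))
    new_names = set(bare_listing(new_dir))
    return {
        "paired": sorted(old_names & new_names),
        "only_old": sorted(old_names - new_names),
        "only_new": sorted(new_names - old_names),
    }
```

Checking that `only_old` and `only_new` are empty before touching any file confirms that every page really has a partner.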



  • Maybe a future Multiple Bookmark feature should memorize the file names and the selected content in a temporary txt file before making a replacement in other files. Then it could replace the content in order, by file name from A to Z. It could also skip any file that does not have a same-named pair. Something like that.



  • @Robin-Cruise said in How to copy html text content from a section of several pages to the section of other several different pages (with the tags of the text too):

    Maybe a future Multiple Bookmark feature should memorize the file names and

    I will say this once only. Forget what doesn’t exist. If you are serious about fixing your immediate problem, continuing to hope is pointless. You have the answer from the forum. We can help with portions such as helping create the regex to remove text not needed, and insert other text. We can also help with creating the BAT file using regex.

    But if you continue down this road of “hoping” you will get nowhere and others here will also likely dismiss your requests as you don’t seem to be overly concerned about solving it either.

    Terry



  • Laying out a website in pure HTML with over 3000 pages is complete nonsense!
    Use a CMS for the site)))



  • THIS IS THE ANSWER !

    A GREAT ANSWER for this problem, but using PowerShell in Windows. Very simple !!

    https://superuser.com/questions/1620195/parsing-how-to-copy-html-text-content-from-a-section-of-several-pages-to-the-se

