How to replace a particular "url" with "url" of each webpage in multiple files using Notepad++?



  • @PeterJones Yes, you got it right. I want Notepad++ to find and add the file name of each file to the end of the url mentioned in the Meta canonical tag as above

    Change <link rel"canonical" href="https://cure4incurables.in" /> to <link rel"canonical" href="https://cure4incurables.in/201_Alopecia.html" /> in that file
    Change <link rel"canonical" href="https://cure4incurables.in" /> to <link rel"canonical" href="https://cure4incurables.in/Anal-fissure.html" /> in that file
    Change <link rel"canonical" href="https://cure4incurables.in" /> to <link rel"canonical" href="https://cure4incurables.in/adhd.htm" /> in that file, and so on
    


  • @PeterJones 201_Alopecia.html, Anal-fissure.html, adhd.htm, asthma.htm etc. are the file names



  • @PeterJones If I have to open each file, I might as well add the file name to that meta tag in each file manually - tell me something easier. Thanks for making time to answer



  • @Ramanand-Jhingade said in How to replace a particular "url" with "url" of each webpage in multiple files using Notepad++?:

    If I have to open each file, I might as well add the file name to that meta tag in each file manually - tell me something easier.

    If you read my post in that link, then you would know that in my solution each file DID need to be opened in Notepad++. However, that is not the only method of adding the filename to a text (or html) file. You can use a DOS command along these lines:
    for %1 in (*.txt) do echo %1 >> %1
    Run it with the current directory set to the folder containing the html files, and change the (*.txt) filter to (*.html) so that it covers all the html files you want to edit. If some of those files are not to be included, you would need to narrow that filter; for example, (t*.html) would only work on html files whose names start with t.

    It adds the filename directly behind the last character in the file, so if the file does not end with a line break, the filename is glued onto the end of the last line of text and might be difficult to find with a regex at a later stage.

    No doubt you can see it isn’t as easy as you think. Whatever method you use, you will have some steps involved to achieve it. If you really only have a small number of files to edit, then you might be better off doing it manually.

    Generally the use of macros and regex are for when a task is performed over and over again and/or for a large number of files. It does sound like your task is neither.

    Terry
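
    For readers who would rather script this step, here is a minimal Python sketch of the same idea: append each file's own name to the end of that file. The function name append_filename and the default *.txt pattern are illustrative assumptions, and the exact whitespace that echo writes around the name is not reproduced.

```python
from pathlib import Path


def append_filename(folder: Path, pattern: str = "*.txt") -> None:
    """Append each matching file's own name to the end of that file.

    Rough analogue of `for %1 in (*.txt) do echo %1 >> %1`; the exact
    whitespace that echo emits is not reproduced here.
    """
    for path in sorted(folder.glob(pattern)):
        with path.open("a", encoding="utf-8") as handle:
            # Nothing is written before the name, so it lands directly
            # after whatever the file's last character happens to be.
            handle.write(path.name + "\n")
```

    With this sketch, a file a.txt whose only content is the word line, with no final line break, ends up containing linea.txt on a single line. That is the glued-on effect described above.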



  • @Terry-R I have about 300 files, so I would certainly want to try what you are saying. If I open Command Prompt from the Windows “Start” menu and select/change to the directory containing these html files (using the command “cd folder name”), will the command “for %1 in (*.txt) do echo %1 >> %1” get the file name of each file and add it to the “last character in the file” as you say?



  • @Ramanand-Jhingade said in How to replace a particular "url" with "url" of each webpage in multiple files using Notepad++?:

    will the command “for %1 in (*.txt) do echo %1 >> %1” get the file name of each file and add it to the “last character in the file” as you say?

    Well, in a test it did it for me. As I said, if the last line of your file contains text, then doing this will add the filename directly behind the text. Similar to:

    https://community.notepad-plus-plus.org/topic/10470
    https://community.notepad-plus-plus.org/topic/10471
    https://community.notepad-plus-plus.org/topic/104721.txt
    

    Note the last line has the number followed by the filename, which is 1.txt. So a possibility is to run the echo command twice. The first time, use it to add 2 special characters such as @@; then run it again with the %1 so the filename is added. In this situation you will get the @@ directly against the last characters within the file, then a space and/or line feed (I think), then the filename.

    Once that’s completed then a regex would need to be constructed to find the line to add the filename, then look ahead, grab the filename and copy it to the current position. Then the regex would (as in my solution) perform a last step of erasing the additions (@@ and filename). That regex you want is NOT the one in my solution, but something similar.

    So, in your example above, is the line that needs changing (the one starting with <link rel"canonical") the only one like that? I ask because the regex needs to be able to identify it correctly. If there is more than one similar line, how do you identify the exact one amongst the other similar lines?

    Terry



  • @Ramanand-Jhingade said in How to replace a particular "url" with "url" of each webpage in multiple files using Notepad++?:

    I have about 300 files, so I would certainly want to try what you are saying.

    Quite honestly, if you’ve got that many files, and want to just add the filename into each of the files based on the same literal text, I don’t know why you didn’t use the powershell solution you already found in the https://www.codeproject.com/Questions/1258369/Replace-string-value-in-a-file-with-filename link you already shared. This isn’t a powershell forum, so you can find somewhere else if you want more help with doing it that way.

    Based on what you’ve said, use the cmd script that Terry handed you (even though this isn’t a cmd forum) – including the @@ echo before the filename echo (as long as @@ isn’t anywhere in your files).

    for %1 in (*.txt) do echo @@ >> %1
    for %1 in (*.txt) do echo %1 >> %1
    

    Then use Notepad++'s Find in Files to search for (?s)(<link rel"canonical" href=".*?)(".*)@@(.*) and replace with $1/$3$2. This works by putting the <link rel"canonical" href=" part and the existing URL in group1, the bulk of the file in group2, and the filename in group3. However, this won’t work if your file is so big that group2 takes up too much memory. In my tests a 1MB and a 16MB file worked, and even 100MB worked, though it took a long time to complete.
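
    The same capture-and-reorder idea can be tried outside Notepad++. The Python sketch below is an approximation, not the exact regex above: it additionally trims the whitespace that echo leaves around @@ and the filename, and it drops the appended marker and name in the same pass. The names MARKER_RE and move_name_into_href are made up for illustration.

```python
import re

# Group 1: the canonical tag up to the end of the existing URL.
# Group 2: the closing quote plus the rest of the file up to the marker.
# Group 3: the filename that was appended after the @@ marker.
# Trimming whitespace around @@ and the name is an assumption of this sketch.
MARKER_RE = re.compile(
    r'(<link rel"canonical" href=".*?)(".*?)\s*@@\s*(\S.*?)\s*\Z',
    re.DOTALL,
)


def move_name_into_href(text: str) -> str:
    """Move the appended filename into the canonical href and drop the marker."""
    return MARKER_RE.sub(r"\1/\3\2", text, count=1)


if __name__ == "__main__":
    sample = (
        '<link rel"canonical" href="https://cure4incurables.in" />\n'
        "<p>page body</p>@@ \n"
        "adhd.htm \n"
    )
    print(move_name_into_href(sample))
```

    Text that contains no @@ marker is returned unchanged, so running the function on an unprepared file does no harm.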



  • Hello, @ramanand-jhingade, @alan-kilborn, @terry-r, @peterjones and All,

    @ramanand-jhingade, I’ve found a solution to achieve what you want! My method uses the Windows version of the Unix gawk utility program.

    It allows you to add the name of the current file before the string " /> ending any line which begins with the string <link rel"canonical" href=" in the current file.

    Here is the road map :

    • Open a DOS command prompt

    • cd /d <Absolute_Path to Folder containing ALL your *.HTM? files>

    • md resul ( create a sub-folder resul which will contain the same files as original ones, after modifications )

    • Click the link below to download the gawk-4.1.0-bin.zip archive

    https://storage.googleapis.com/google-code-archive-downloads/v2/code.google.com/gnu-on-windows/gawk-4.1.0-bin.zip

    • Move this archive to the folder containing all your *.HTM? files

    • Double-click on the gawk-4.1.0-bin.zip archive

    • Extract the only program gawk.exe from the archive, into the folder containing all your *.HTM? files

    • Paste the long line below into your DOS command prompt and confirm with the Enter key :

    for %F in (*.htm?) do @del "resul\%F" 2>nul & @gawk.exe -F"\x22 />" " /^<link rel\"canonical\" href=\x22/ {$1 = $1 \"/\" FILENAME FS} ; {print} " "%F" >> "resul\%F"

    • After complete execution, double-click on the resul folder

    => Within the resul folder, you should get all the original *.htm? files, with the current filename added to any line beginning with <link rel"canonical" href=", right before the string " />

    For instance, if current filename is Test.htm, any line, beginning with the string <link rel"canonical" href=", in current file, will be changed into <link rel"canonical" href="•••••/Test.htm" />, where the part ••••• represents the initial link of that line


    Notes :

    • Your original files are not changed at all !

    • You may re-run the line for %F in •••••••••• "resul\%F", without any problem, as the process deletes any current file, in resul folder, before rewriting it !

    Best Regards,

    guy038
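
    For comparison, roughly the same transformation can be sketched in Python, without gawk. This is an equivalent written under assumptions, not a translation of the one-liner: the function names, the UTF-8 encoding, and the *.htm* glob pattern are choices of this sketch. Like the road map above, it writes modified copies into a resul sub-folder and leaves the originals alone.

```python
from pathlib import Path

PREFIX = '<link rel"canonical" href="'  # the line must start exactly like this
SUFFIX = '" />'                         # the filename goes right before this


def add_name_to_line(line: str, filename: str) -> str:
    """Insert /filename before the closing quote of a canonical line."""
    if line.startswith(PREFIX) and SUFFIX in line:
        head, tail = line.split(SUFFIX, 1)
        return f"{head}/{filename}{SUFFIX}{tail}"
    return line


def write_copies(folder: Path, out_name: str = "resul") -> None:
    """Write a modified copy of every .htm/.html file into folder/out_name."""
    out_dir = folder / out_name
    out_dir.mkdir(exist_ok=True)
    # "*.htm*" is used because "?" in Python's glob needs exactly one character.
    for path in sorted(folder.glob("*.htm*")):
        if not path.is_file():
            continue
        # newline="" keeps the original line endings untouched.
        with path.open(encoding="utf-8", newline="") as src:
            lines = src.read().splitlines(keepends=True)
        fixed = [add_name_to_line(line, path.name) for line in lines]
        with (out_dir / path.name).open("w", encoding="utf-8", newline="") as dst:
            dst.writelines(fixed)
```

    Calling write_copies(Path(".")) from the folder that holds the html files would be the counterpart of pasting the long line. Re-running it simply overwrites the copies in resul.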



  • @guy038 I did everything you wrote above, but it copied all the files to the resul folder without adding the names of the files. I used Solution 2 mentioned at www.codeproject.com/Answers/5301640/How-to-find-and-add-the-file-name-of-each-file-in#answer2 and it added the file names, but without the “.htm” or “.html” after the names of the files. I have read what you have posted in other threads, so I think you can help me add the filenames with the “.htm” or “.html” after the names of the files. You can give me a method which uses Command Prompt or PowerShell. Thanks for your time and help!



  • @Ramanand-Jhingade OK, in the Solution2 mentioned in the link above, I changed BaseName to FullName. I then used Notepad++ to remove the full path (using “find” and “replace”) but keep the filenames and their extensions. Thanks for all the help guys!



  • Hello, @ramanand-jhingade, @alan-kilborn, @terry-r, @peterjones and All,

    @ramanand-jhingade, sorry that my method did not work :-( I don’t understand !

    It’s important to note that my method would work ONLY IF the beginning of the “canonical” line is exactly :

    <link rel"canonical" href="
    ^
    |
    Beginning of line
    

    If this line could be, for instance, any of these ones, below :

    <link rel "canonical" href="
    <link rel"canonical"href="
    <link rel "canonical"href="
    
      <link rel "canonical" href="
      <link rel"canonical"href="
      <link rel "canonical"href="
    

    My method would not find the canonical line ! So, just tell me about it !
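
    One way to tolerate such variants is a looser pattern. The Python sketch below is only an illustration: TOLERANT_RE and add_name_tolerant are hypothetical names, and accepting an optional = after rel is an extra assumption that goes beyond the variants listed above.

```python
import re

# Accepts optional leading spaces or tabs, optional spacing around
# "canonical", and (as an assumption of this sketch) an optional = after rel.
TOLERANT_RE = re.compile(
    r'^([ \t]*<link[ \t]+rel[ \t]*=?[ \t]*"canonical"[ \t]*href="[^"\r\n]*)(?=")',
    re.MULTILINE,
)


def add_name_tolerant(text: str, filename: str) -> str:
    """Append /filename to the URL of every canonical line found in text."""
    # A function is used as the replacement so that backslashes or other
    # special characters in the filename are inserted literally.
    return TOLERANT_RE.sub(lambda m: f"{m.group(1)}/{filename}", text)


if __name__ == "__main__":
    line = '  <link rel "canonical"href="https://cure4incurables.in" />'
    print(add_name_tolerant(line, "adhd.htm"))
```

    Other href attributes, such as those in ordinary anchor tags, are left alone because the pattern is tied to the start of a line and to the word canonical.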


    I even ran tests with files containing space characters in their names, like This is a test.html, and it did create the correct line :

    <link rel"canonical" href="https://cure4incurables.in/This is a test.html" />

    Of course, in this specific case, in order to get functional links, we must change every space character to the %20 syntax, thanks to the S/R below :

    SEARCH (?-i)(?:href="|(?!\A)\G)(?:(?!").)*?\K\x20

    REPLACE %20

    Using the Regular expression search mode and ticking the Wrap around option would result, after replacement, in the functional line :

    <link rel"canonical" href="https://cure4incurables.in/This%20is%20a%20test.html" />

    Best Regards,

    guy038
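
    The space-to-%20 step can also be sketched in Python. This is a simplified stand-in for the search/replace above, not the same regex: it rewrites spaces inside every href="..." value that sits on a single line, and the names HREF_RE and encode_spaces_in_hrefs are invented for the example.

```python
import re

# Matches href="...": opening part, value, closing quote (value on one line).
HREF_RE = re.compile(r'(href=")([^"\r\n]*)(")')


def encode_spaces_in_hrefs(text: str) -> str:
    """Replace each space inside an href value with %20."""
    return HREF_RE.sub(
        lambda m: m.group(1) + m.group(2).replace(" ", "%20") + m.group(3),
        text,
    )


if __name__ == "__main__":
    line = '<link rel"canonical" href="https://cure4incurables.in/This is a test.html" />'
    print(encode_spaces_in_hrefs(line))
```

    Spaces outside href values, for example in the page text, are not touched.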

