    How to replace a particular "url" with "url" of each webpage in multiple files using Notepad++?

    • guy038

      Hello, @ramanand-jhingade, @alan-kilborn, @terry-r, @peterjones and All,

      @ramanand-jhingade, I’ve found a solution to achieve what you want ! My method uses the Windows version of the Unix gawk utility program.

      It inserts the name of the current file before the string " /> that ends any line beginning with the string <link rel"canonical" href=" in that file

      Here is the road map :

      • Open a DOS command prompt

      • cd /d <Absolute_Path to Folder containing ALL your *.HTM? files>

      • md resul ( create a sub-folder resul which will contain the same files as original ones, after modifications )

      • Click on the link below, to download the gawk-4.1.0-bin.zip archive

      https://storage.googleapis.com/google-code-archive-downloads/v2/code.google.com/gnu-on-windows/gawk-4.1.0-bin.zip

      • Move this archive to the folder containing all your *.HTM? files

      • Double-click on the gawk-4.1.0-bin.zip archive

      • Extract the only program gawk.exe from the archive, into the folder containing all your *.HTM? files

      • Paste the long line, below, into your DOS command prompt and confirm with the Enter key :

      for %F in (*.htm?) do @del "resul\%F" 2>nul & @gawk.exe -F"\x22 />" " /^<link rel\"canonical\" href=\x22/ {$1 = $1 \"/\" FILENAME FS} ; {print} " "%F" >> "resul\%F"

      • After complete execution, double-click on the resul folder

      => Within the resul folder, you should get copies of all the original *.htm? files, with the current filename added at the end of any line beginning with <link rel"canonical" href=", right before the string " />

      For instance, if the current filename is Test.htm, any line beginning with the string <link rel"canonical" href=", in that file, will be changed into <link rel"canonical" href="•••••/Test.htm" />, where the part ••••• represents the initial link of that line
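      For readers without gawk, the transformation described above can be sketched in Python. This is only an illustration, not the gawk command itself: the names add_filename and process_folder are made up for this example, the files are assumed to be UTF-8, and only the .htm and .html extensions are handled (the *.htm? wildcard is slightly broader). It also assumes the canonical line starts exactly with <link rel"canonical" href=" and contains " />, as stated above.

```python
from pathlib import Path

PREFIX = '<link rel"canonical" href="'  # line start, exactly as written in the post
SUFFIX = '" />'                          # same string as the gawk field separator


def add_filename(line: str, filename: str) -> str:
    """Insert "/<filename>" before the first '" />' of a canonical line."""
    if not line.startswith(PREFIX):
        return line
    head, sep, tail = line.partition(SUFFIX)
    if not sep:
        # No '" />' on this line: leave it untouched (a choice made for this sketch).
        return line
    return f"{head}/{filename}{sep}{tail}"


def process_folder(folder: Path, out_name: str = "resul") -> None:
    """Write modified copies of the HTML files into <folder>/<out_name>."""
    out_dir = folder / out_name
    out_dir.mkdir(exist_ok=True)
    for src in sorted(folder.iterdir()):
        # Assumption: only .htm and .html files, matched case-insensitively.
        if not src.is_file() or src.suffix.lower() not in {".htm", ".html"}:
            continue
        # newline="" keeps the original line endings; UTF-8 is an assumption.
        with src.open(encoding="utf-8", newline="") as f:
            lines = f.readlines()
        # Opening with "w" overwrites any earlier result, so re-running is safe.
        with (out_dir / src.name).open("w", encoding="utf-8", newline="") as f:
            f.writelines(add_filename(line, src.name) for line in lines)
```

      Calling process_folder(Path(".")) from the folder holding the files would mirror the road map: the originals stay untouched and the copies land in the resul sub-folder.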


      Notes :

      • Your original files are not changed at all !

      • You may re-run the line for %F in •••••••••• "resul\%F", without any problem, as the process deletes any current file, in resul folder, before rewriting it !

      Best Regards,

      guy038

      • Ramanand Jhingade @guy038

        @guy038 I did everything you wrote above, but it copied all the files to the resul folder without adding the file names. I used Solution 2 mentioned at www.codeproject.com/Answers/5301640/How-to-find-and-add-the-file-name-of-each-file-in#answer2 and it added the file names, but without the “.htm” or “.html” extension. I have read what you have posted in other threads, so I think you can help me add the filenames with the “.htm” or “.html” extension. You can give me a method which uses Command Prompt or PowerShell. Thanks for your time and help!

        • Ramanand Jhingade @Ramanand Jhingade

          @Ramanand-Jhingade OK, in Solution 2 mentioned in the link above, I changed BaseName to FullName. I then used Notepad++ to remove the full path (using “find” and “replace”) while keeping the filenames and their extensions. Thanks for all the help, guys!
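          For context, PowerShell file objects expose BaseName (name without extension), Name (name with extension) and FullName (full path), so Name would presumably have given the filename plus extension without the extra find-and-replace step. Solution 2 is not reproduced in this thread, so the following is only a Python sketch of the three notions using pathlib; the path is a made-up example.

```python
from pathlib import PureWindowsPath

# Hypothetical path, used only to illustrate the three notions.
p = PureWindowsPath(r"C:\site\Test.html")

base_name = p.stem   # like PowerShell's BaseName: name without extension
name = p.name        # like PowerShell's Name: name with extension
full_name = str(p)   # like PowerShell's FullName: complete path

print(base_name, name, full_name, sep="\n")
```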

          • guy038

            Hello, @ramanand-jhingade, @alan-kilborn, @terry-r, @peterjones and All,

            @ramanand-jhingade, sorry that my method did not work :-( I don’t understand !

            It’s important to note that my method would work ONLY IF the beginning of the “canonical” line is exactly :

            <link rel"canonical" href="
            ^
            |
            Beginning of line
            

            If this line is instead, for instance, any of the ones below, with or without leading spaces :

            <link rel "canonical" href="
            <link rel"canonical"href="
            <link rel "canonical"href="
            
              <link rel "canonical" href="
              <link rel"canonical"href="
              <link rel "canonical"href="
            

            My method would not find the canonical line ! So, just tell me about it !
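            One way to make the match less brittle is a regular expression that tolerates leading indentation, optional spaces and an optional = sign after rel. A minimal Python sketch follows; the names CANONICAL and add_filename_tolerant are hypothetical, and accepting the = sign is an assumption of this example (standard HTML writes rel="canonical"), not something confirmed about the actual files.

```python
import re

# Group 1: everything up to the end of the URL; group 2: the closing '" />'.
CANONICAL = re.compile(
    r'^(\s*<link\s+rel\s*=?\s*"canonical"\s*href\s*=\s*"[^"]*)("\s*/>)'
)


def add_filename_tolerant(line: str, filename: str) -> str:
    """Insert "/<filename>" before the closing quote of a canonical link."""
    # A function is used as the replacement so that backslashes or other
    # special characters in the filename are inserted literally.
    return CANONICAL.sub(
        lambda m: f"{m.group(1)}/{filename}{m.group(2)}", line, count=1
    )


variants = [
    '<link rel"canonical" href="https://example.com" />',
    '<link rel "canonical"href="https://example.com" />',
    '  <link rel="canonical" href="https://example.com" />',
]
for v in variants:
    print(add_filename_tolerant(v, "Test.htm"))
```

            Lines that do not match the pattern are returned unchanged.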


            I even ran tests with files containing space characters in their names, like This is a test.html, and it did create the correct line :

            <link rel"canonical" href="https://cure4incurables.in/This is a test.html" />

            Of course, in this specific case, in order to get functional links, we must replace each space character with the %20 syntax, thanks to the search/replace below :

            SEARCH (?-i)(?:href="|(?!\A)\G)(?:(?!").)*?\K\x20

            REPLACE %20

            Using the Regular expression search mode and ticking the Wrap around option would result, after replacement, in the functional line :

            <link rel"canonical" href="https://cure4incurables.in/This%20is%20a%20test.html" />
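            Outside Notepad++, the same replacement can be sketched in Python. Python's re module supports neither \G nor \K, so this illustration takes a different route: it captures each href value and replaces the spaces inside it. The names HREF and encode_spaces_in_href are made up for this example.

```python
import re

# Capture the text between href=" and the next double quote.
HREF = re.compile(r'href="([^"]*)"')


def encode_spaces_in_href(line: str) -> str:
    """Replace spaces with %20 inside href values only."""
    return HREF.sub(
        lambda m: 'href="' + m.group(1).replace(" ", "%20") + '"', line
    )


line = '<link rel"canonical" href="https://cure4incurables.in/This is a test.html" />'
print(encode_spaces_in_href(line))
```

            Spaces outside the href value, such as the one before />, are left as they are.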

            Best Regards,

            guy038
