
search / replace /delete parts of URL on several hundred pages

  • HaPe Krummen
    Sep 8, 2024, 7:16 PM (last edited 7:17 PM)

    Fellow Notepad++ Users,

    Could you please help me with the following search-and-replace problem I am having?

    I have hundreds of text pages, each containing about a dozen URLs that I need to change.

    Here is the data I currently have (“before” data):

    "https://url_old/NAME/url_options#JumpTo"
    

    Here is how I would like that data to look (“after” data):

    "https://url_new/NAME#JumpTo"
    

    It is easy to open a page and search and replace within that document; however, I have not found a way to do it for several hundred documents at once.

    So, in document 1, all links should show
    /url_new/NAME1#JumpTo, page two should show
    /url_new/NAME2#JumpTo, and so on. NAME always stays the same within a document. Only the part before NAME should be changed, and part of what comes after it should be deleted.

    NAME can be from 3 to 9 characters long.

    As I understand it, this could be done with regular expressions. I experimented a bit but have not yet understood the logic behind them. I'm really new to regex …

    For two weeks I tried using a macro recorder to open a single page and replace the URL with information from an Excel spreadsheet, but it takes ages, and the macro utility is not bulletproof and tends to mess up the document.

    Then I found Notepad++, which was very helpful for searching and replacing wrong translations, but now I am stuck on the URL links. I tried with dots and could find all instances, but found no way to keep the NAME on every page.

    Is there any hint or help you can provide?

    Thanks for your help.

    Regards, from Switzerland

    • Coises @HaPe Krummen
      Sep 8, 2024, 8:32 PM (last edited 8:34 PM)

      @HaPe-Krummen said in search / replace /delete parts of URL on several hundred pages:

      I have hundreds of text pages, each containing about a dozen URLs that I need to change.

      Here is the data I currently have (“before” data):

      "https://url_old/NAME/url_options#JumpTo"
      

      Here is how I would like that data to look (“after” data):

      "https://url_new/NAME#JumpTo"
      

      You can do this with the “Find in Files” function (which also does replace).

      First thing: Be aware that you can’t undo a replace done on the Find in Files tab. You’ll want to be certain that you work on a copy of your files, not the originals, in case something goes wrong… and with regular expressions, there is always something that can go wrong.

      Collect the copies of all your files in a folder (it’s OK if there are sub-folders), then in Notepad++, choose Search | Find in Files… from the main menu. In the dialog, enter:

      Find what: https://url_old/([\w\-+]*)/[\w\-+?]*#
      Replace with: https://url_new/$1#
      Filters: *.* (if you only want to scan some files, use a pattern, like *.htm)
      Directory (use … to select your folder)
      Search Mode: Regular expression

      Then click Replace in Files.
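For readers who want to see what the Find what / Replace with pair does outside Notepad++, here is a minimal Python sketch of the same folder-wide replacement. It is an approximation, not how Notepad++ works internally: Python's `re` module differs from Notepad++'s Boost-based engine in places, and the back-reference is written `\1` instead of `$1`. `url_old` and `url_new` are the post's placeholders, `rewrite_text` and `replace_in_folder` are hypothetical helper names, and the files are assumed to be UTF-8.

```python
import re
from pathlib import Path

# Same pattern as above; "url_old" and "url_new" are the post's placeholders.
# Notepad++ writes the back-reference as $1; Python's re.sub uses \1.
PATTERN = re.compile(r"https://url_old/([\w\-+]*)/[\w\-+?]*#")
REPLACEMENT = r"https://url_new/\1#"


def rewrite_text(text):
    """Return (new_text, number_of_replacements) for one document."""
    return PATTERN.subn(REPLACEMENT, text)


def replace_in_folder(folder, glob="*"):
    """Rewrite every file under `folder` (sub-folders included) matching `glob`.

    Like Replace in Files, this overwrites the files with no undo,
    so point it at a copy of your pages.
    """
    total = 0
    for path in Path(folder).rglob(glob):
        if not path.is_file():
            continue
        # Bytes in, bytes out, so existing line endings are left untouched.
        # Assumption: the pages are UTF-8 encoded.
        text = path.read_bytes().decode("utf-8")
        new_text, count = rewrite_text(text)
        if count:
            path.write_bytes(new_text.encode("utf-8"))
            total += count
    return total


if __name__ == "__main__":
    sample = '"https://url_old/NAME/url_options#JumpTo"'
    print(rewrite_text(sample))  # ('"https://url_new/NAME#JumpTo"', 1)
```

Calling `replace_in_folder(folder, "*.htm")` on a copied folder would roughly mirror the `*.htm` filter mentioned above; the default `"*"` matches every file.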

      There are various reasons why the above expressions might not be exactly what you need. I'm assuming the NAME part includes only letters, numbers, underscores, hyphens and plus signs, and the url_options includes only those characters and possibly question marks. If there are other possible characters, you'll need to include them in the brackets.

      If url_old or url_new includes characters that have special meanings in regular expressions (offhand, the only likely one I can think of is the plus sign), you'll need to escape them.
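In a script (a Python sketch, not Notepad++ itself), the safe way to splice a literal prefix into a pattern is `re.escape`, which backslash-escapes every character that is special to the regex engine; in the Notepad++ dialog you would type the backslashes by hand, e.g. `c\+\+`. The prefixes `https://old.example/c++/` and `https://new.example/` below are made up purely to show a `+`. A `.` is special too, but an unescaped dot still matches a literal dot, so it can only cause extra matches, never missed ones.

```python
import re

# Hypothetical old prefix containing '+', which is a quantifier in regex.
OLD_PREFIX = "https://old.example/c++/"

# re.escape turns '+' into '\+' and '.' into '\.', so they match literally.
PATTERN = re.compile(re.escape(OLD_PREFIX) + r"([\w\-+]*)/[\w\-+?]*#")


def rewrite(text):
    # Hypothetical new prefix; \1 is the NAME captured by ([\w\-+]*).
    return PATTERN.sub(r"https://new.example/\1#", text)


print(rewrite("https://old.example/c++/NAME/opts#top"))  # https://new.example/NAME#top

# An unescaped '.' is a wildcard: it also matches 'X'; escaping prevents that.
print(bool(re.fullmatch("old.example", "oldXexample")))             # True
print(bool(re.fullmatch(re.escape("old.example"), "oldXexample")))  # False
```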

      If #JumpTo can be omitted, the above doesn’t account for that.
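One possible way to cover a missing #JumpTo, sketched in Python (this variant pattern is an assumption, not something from the thread): capture the `#` itself as a second group that may be empty, and put it back with `\2` (`$2` in Notepad++). Because the `#` no longer marks where the options end, a deeper path such as `/opt/expo` would only be partly consumed unless `/` is added to the character class, so check the results on a copy first.

```python
import re

# Variant of the pattern above: the '#' is optional and captured in group 2.
# (#?) always participates, matching either "#" or the empty string.
PATTERN = re.compile(r"https://url_old/([\w\-+]*)/[\w\-+?]*(#?)")
REPLACEMENT = r"https://url_new/\1\2"  # Notepad++ equivalent: https://url_new/$1$2

for before in ('"https://url_old/NAME/url_options#JumpTo"',
               '"https://url_old/NAME/url_options"'):
    print(PATTERN.sub(REPLACEMENT, before))
# "https://url_new/NAME#JumpTo"
# "https://url_new/NAME"
```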

      • HaPe Krummen @Coises
        Sep 9, 2024, 4:06 AM (last edited 4:07 AM)

        @Coises said in search / replace /delete parts of URL on several hundred pages:

        Thank you @Coises for the fast answer!

        With this example I am beginning to understand how it works. Everything in () in the search can be referenced as $1 in the replacement. Between [] I can define which characters are possible and allowed at that position.
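To make that concrete, here is a small Python sketch showing what the pieces of the earlier pattern capture, using the placeholder URL from the first post. It uses Python's `re` rather than Notepad++, but the two engines agree on these basics.

```python
import re

pattern = re.compile(r"https://url_old/([\w\-+]*)/[\w\-+?]*#")
m = pattern.search('"https://url_old/NAME/url_options#JumpTo"')

# group(0) is the whole match; group(1) is the text inside ( ),
# which Notepad++ inserts wherever $1 appears in the replacement.
print(m.group(0))  # https://url_old/NAME/url_options#
print(m.group(1))  # NAME

# [\w\-+] is a character class: it matches ONE letter, digit, underscore, '-' or '+'.
# The * after it repeats the class zero or more times.
print(re.fullmatch(r"[\w\-+]", "NAME") is not None)   # False
print(re.fullmatch(r"[\w\-+]*", "NAME") is not None)  # True
```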

        So I tried this regex, but unfortunately the search was not successful: I get 0 hits. Here is my link again, more precisely; maybe you can see where the issue is:

        Here is the data I currently have (“before” data):

        https://oldsite.ch/en/one/unilfp1/opt/expo?print#note
        

        Here is how I would like that data to look (“after” data):

        https://newsite.ch/de/eins/unilfp1#note
        

        I also tried the following regex (to avoid ://), but got the same result:

        oldsite.ch/en/one/([\w\-+]*)/[\w\-+?]*#
        

        I also tried to find 'easier' text, but obviously I am doing something wrong.

        Can you see where I went wrong? Syntax? Logic?

        • Coises @HaPe Krummen
          Sep 9, 2024, 4:35 AM

          @HaPe-Krummen said in search / replace /delete parts of URL on several hundred pages:

          Here is the data I currently have (“before” data):

          https://oldsite.ch/en/one/unilfp1/opt/expo?print#note
          

          Here is how I would like that data to look (“after” data):

          https://newsite.ch/de/eins/unilfp1#note
          

          […]

          Can you see where I went wrong? Syntax? Logic?

          The expression I wrote assumed there would be just one directory level in the part you called url_options; so the expression for that part, [\w\-+?]*, doesn’t allow for a forward slash character. Change that part to [\w\-+?/]* and I think it will work.
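The difference can be checked with a short Python sketch using the URLs from the reply above. Python's `re` stands in for Notepad++ here, and the back-reference is `\1` rather than `$1`. The dot in `oldsite\.ch` is escaped so it matches only a literal dot; that is optional, just slightly stricter than the unescaped form used in the thread.

```python
import re

before = "https://oldsite.ch/en/one/unilfp1/opt/expo?print#note"

# Original class: no '/', so "opt/expo?print" cannot be consumed and nothing matches.
old = re.compile(r"https://oldsite\.ch/en/one/([\w\-+]*)/[\w\-+?]*#")
# Corrected class: '/' added, so several directory levels are allowed.
new = re.compile(r"https://oldsite\.ch/en/one/([\w\-+]*)/[\w\-+?/]*#")

print(old.search(before))  # None
print(new.sub(r"https://newsite.ch/de/eins/\1#", before))
# https://newsite.ch/de/eins/unilfp1#note
```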

            • HaPe Krummen @Coises
              Sep 9, 2024, 5:55 AM (last edited 6:10 AM)

            @Coises

            Thanks, that was it!

            So, / must be listed explicitly inside the [] for the search to work.

            What about other special characters? I tried another case: I'd like to delete the following iframe:

            <iframe src="./Subdir_files_ä/unilfp1.html" 
            

            To find it, I used the following regexes in Search:

            <iframe src="./[\w-+/].html"
            <iframe src="./[\w-+/_].html"

            Could there be a problem with an umlaut in the subdirectory name?

            Found it: I need the *
            <iframe src="./[\w-+/_]*.html"

            :-)
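The effect of the `*` (and of the umlaut) can be checked with a short Python sketch. It is an approximation: in Python 3, `\w` in a str pattern is Unicode-aware, so `ä` counts as a word character; the result in the thread suggests Notepad++ treated it the same way here. The sketch also escapes the dots (`\.`), since an unescaped `.` matches any character, and drops the `_` from the class because `\w` already includes it.

```python
import re

line = '<iframe src="./Subdir_files_ä/unilfp1.html" '

# Without *, the class matches exactly ONE character, so the path can't fit.
without_star = re.compile(r'<iframe src="\./[\w\-+/]\.html"')
# With *, the class may repeat, so "Subdir_files_ä/unilfp1" is matched.
with_star = re.compile(r'<iframe src="\./[\w\-+/]*\.html"')

print(without_star.search(line))       # None
print(with_star.search(line).group())  # <iframe src="./Subdir_files_ä/unilfp1.html"
```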

                • HaPe Krummen @Coises
                Sep 9, 2024, 6:17 AM (last edited by PeterJones 1:35 PM)

                @Coises

                Looks like I got the idea behind it …

                Another question:
                when the search contains ordinary parentheses ( ), I need to use ^( , right?

                <iframe src=".[\w\-+/_]*.html" width="640" height="640" frameborder="0" title="Mittelwerte (seit 2020)"></iframe>
                

                My regex for that doesn't work:

                <iframe src="./[\w\-+/_]*.html" width="640" height="640" frameborder="0" title="Mittelwerte ^(seit 2020^)"></iframe>
                

                With a shorter search I do find the passage, so the problem must be the ( ):

                <iframe src="./[\w\-+/_]*.html" width="640" height="640" frameborder="0" title="Mittelwerte
                

                How can I handle ( and ) in a search?

                edit: reading manuals helps :-)

                <iframe src="./[\w\-+/_]*.html" width="640" height="640" frameborder="0" title="Mittelwerte \(seit 2020\)"></iframe>
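A Python sketch of why the escaped version works (Python's `re` follows the same rule here: `\` is the escape character). Unescaped, `(seit 2020)` is a capturing group that matches the bare text `seit 2020`, so the literal parentheses in the title make the match fail; `^` is an anchor (start of line in Notepad++, start of string in this sketch), not an escape. Replacing the match with an empty string deletes the whole iframe. The `line` value is assembled from the iframe snippets shown earlier in the thread and is an assumption about the actual page.

```python
import re

line = ('<iframe src="./Subdir_files_ä/unilfp1.html" width="640" height="640" '
        'frameborder="0" title="Mittelwerte (seit 2020)"></iframe>')

head = r'<iframe src="\./[\w\-+/]*\.html" width="640" height="640" frameborder="0" '

unescaped = re.compile(head + r'title="Mittelwerte (seit 2020)"></iframe>')
caret = re.compile(head + r'title="Mittelwerte ^(seit 2020^)"></iframe>')
escaped = re.compile(head + r'title="Mittelwerte \(seit 2020\)"></iframe>')

print(unescaped.search(line))       # None: ( ) form a group, not literal text
print(caret.search(line))           # None: ^ is an anchor, not an escape
print(repr(escaped.sub("", line)))  # '' : the whole iframe is deleted
```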
                

                —

                moderator added code markdown around more of the text; thanks for putting it around some of your blocks, but please remember to use the </> button to mark all example text as "code", so that characters don't get changed by the forum

                • PeterJones @HaPe Krummen
                  Sep 9, 2024, 12:48 PM (last edited 1:36 PM)

                  @HaPe-Krummen said in search / replace /delete parts of URL on several hundred pages:

                  when the search contains ordinary parentheses ( ), I need to use ^( , right?

                  No. The escape character in Notepad++'s regular expressions is \, not ^, so a literal opening parenthesis is matched by \( and a closing one by \) .
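If you build patterns in a script rather than typing them, Python's `re.escape` can add those backslashes for you. This is a Python sketch, not a Notepad++ feature; in the Find dialog you type `\(` and `\)` yourself.

```python
import re

title = "Mittelwerte (seit 2020)"

# re.escape puts a backslash in front of ( ) and any other special character.
pattern = re.compile(re.escape(title))

print(pattern.search('title="Mittelwerte (seit 2020)"') is not None)  # True
print(pattern.search('title="Mittelwerte seit 2020"') is not None)    # False
```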

                  update: Sorry, I didn’t see your correctly-escaped parentheses in the final “edit: reading manuals helps”, because you didn’t use markdown except on the first block in that post. I used moderator-power to fix your post so it’s readable for others.

                  ----

                  Useful References

                  • Notepad++ Online User Manual: Searching/Regex
                  • FAQ: Where to find other regular expressions (regex) documentation
                  • HaPe Krummen @PeterJones
                    Sep 11, 2024, 8:34 AM

                    Just wanted to say THANK YOU for your help!

                    Being able to use regular expressions changes a lot and makes searching, replacing and deleting text so much easier.

                    Have a good day!

                    The Community of users of the Notepad++ text editor.