search / replace /delete parts of URL on several hundred pages
-
Fellow Notepad++ Users,
Could you please help me with the following search-and-replace problem I am having?
I have hundreds of text pages, each containing about a dozen URLs that I need to change.
Here is the data I currently have (“before” data):
"https://url_old/NAME/url_options#JumpTo"
Here is how I would like that data to look (“after” data):
"https://url_new/NAME#JumpTo"
It is easy to open a page and search and replace within that one document; however, I did not find a way to do it for several hundred documents at once.
So, in document 1 all links should show /url_new/NAME1#JumpTo, in document 2 they should show /url_new/NAME2#JumpTo, and so on. NAME should always stay the same within a document. Only what comes before it should be changed, and some of what comes after it should be deleted. NAME can be from 3 to 9 characters long.
As I understand it, this could be done with regular expressions. I experimented a bit but have not understood the logic behind them so far. I’m really new to regex …
For two weeks I tried using a macro recorder to open a single page and replace the URL with information from an Excel spreadsheet, but it takes ages, and the macro utility is not bulletproof and tends to mess up the document.
Then I found Notepad++, which was very helpful for searching and replacing wrong translations, and now I am stuck on the URL links. I tried with dots and could find all instances, but found no way to keep the NAME on every page.
Is there any hint or help you can provide?
Thanks for your help.
Regards, from Switzerland
-
@HaPe-Krummen said in search / replace /delete parts of URL on several hundred pages:
I have hundreds of text pages, each containing about a dozen URLs that I need to change.
Here is the data I currently have (“before” data):
"https://url_old/NAME/url_options#JumpTo"
Here is how I would like that data to look (“after” data):
"https://url_new/NAME#JumpTo"
You can do this with the “Find in Files” function (which also does replace).
First thing: Be aware that you can’t undo a replace done on the Find in Files tab. You’ll want to be certain that you work on a copy of your files, not the originals, in case something goes wrong… and with regular expressions, there is always something that can go wrong.
Collect the copies of all your files in a folder (it’s OK if there are sub-folders), then in Notepad++, choose Search | Find in Files… from the main menu. In the dialog, enter:
Find what: https://url_old/([\w\-+]*)/[\w\-+?]*#
Replace with: https://url_new/$1#
Filters: *.* (if you only want to scan some files, use a pattern, like *.htm)
Directory: (use … to select your folder)
Search Mode: Regular expression
Then click Replace in Files.
There could be various reasons the above expressions might not be exactly what you need. I’m assuming the NAME part includes only letters, numbers, underscores, hyphens and plus signs, and the url_options includes only those characters and possibly question marks. If there are other possible characters, you’ll need to include them in the brackets.
If url_old or url_new includes characters that have special meanings in regular expressions (offhand, the only one likely I can think of is the plus sign), you’ll need to escape them.
If #JumpTo can be omitted, the above doesn’t account for that.
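For anyone who wants to try the expression on a sample string before running it over a whole folder, here is a minimal Python sketch. It is only an illustration and not part of Notepad++. The function name rewrite_links is made up for this example, and Python's re module writes the captured group as \1 where Notepad++ uses $1.

```python
import re

# Pattern and replacement from the post above. Python's re module refers to
# the first captured group as \1 where Notepad++ uses $1.
PATTERN = re.compile(r"https://url_old/([\w\-+]*)/[\w\-+?]*#")
REPLACEMENT = r"https://url_new/\1#"


def rewrite_links(text: str) -> str:
    """Return text with every matching old URL rewritten to the new form."""
    return PATTERN.sub(REPLACEMENT, text)


before = '"https://url_old/NAME1/url_options#JumpTo"'
after = rewrite_links(before)
print(after)  # "https://url_new/NAME1#JumpTo"
```

Whatever sits in the NAME position is carried over unchanged, so the same expression works for every document without knowing the names in advance.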
-
@Coises said in search / replace /delete parts of URL on several hundred pages:
Thank you @Coises for the fast answer!
With this example I am beginning to understand how it should work. Everything in () in the search can be referenced as $1 in the replacement. Between [] I can define which characters are allowed there.
So I tried this regex, but unfortunately the search was not successful: I get 0 hits. So let me specify my link again; maybe you can see where the issue is:
Here is the data I currently have (“before” data):
https://oldsite.ch/en/one/unilfp1/opt/expo?print#note
Here is how I would like that data to look (“after” data):
https://newsite.ch/de/eins/unilfp1#note
I also tried the following regex (to avoid ://), with the same result:
oldsite.ch/en/one/([\w\-+]*)/[\w\-+?]*#
I also tried to find ‘easier’ text, but obviously I am doing something wrong.
Can you see, where I went wrong? Syntax? Logic?
-
@HaPe-Krummen said in search / replace /delete parts of URL on several hundred pages:
Here is the data I currently have (“before” data):
https://oldsite.ch/en/one/unilfp1/opt/expo?print#note
Here is how I would like that data to look (“after” data):
https://newsite.ch/de/eins/unilfp1#note
[…]
Can you see, where I went wrong? Syntax? Logic?
The expression I wrote assumed there would be just one directory level in the part you called url_options, so the expression for that part, [\w\-+?]*, doesn’t allow for a forward slash character. Change that part to [\w\-+?/]* and I think it will work.
-
Thanks, that was it!
So, / must be explicit in the [] to be able to search correctly.
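A small Python sketch of that point, using the sample link from this thread. It is only an illustration: Python's re module is not the engine Notepad++ uses, the escaped dot in oldsite\.ch is an addition for this example, and \1 is Python's spelling of $1.

```python
import re

url = "https://oldsite.ch/en/one/unilfp1/opt/expo?print#note"

# Options part without "/" in the brackets: it cannot get past "opt/".
narrow = re.compile(r"https://oldsite\.ch/en/one/([\w\-+]*)/[\w\-+?]*#")
# Same expression with "/" added to the brackets.
wide = re.compile(r"https://oldsite\.ch/en/one/([\w\-+]*)/[\w\-+?/]*#")

narrow_hit = narrow.search(url)  # None, which corresponds to the 0 hits
wide_hit = wide.search(url)      # a match object

new_url = wide.sub(r"https://newsite.ch/de/eins/\1#", url)
print(new_url)  # https://newsite.ch/de/eins/unilfp1#note
```

A bracket expression only accepts the characters listed inside it, so every character that can occur in that part of the URL has to be listed there.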
What about other special characters? I tried another one and would like to delete the following iframe:
<iframe src="./Subdir_files_ä/unilfp1.html"
To find it, I used the following RegExes in Search:
<iframe src="./[\w-+/].html"
<iframe src="./[\w-+/_].html"
Could there be a problem with an Umlaut in the subdir name?
Found it, I need the *
<iframe src="./[\w-+/_]*.html"
:-)
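The effect of the * can be reproduced with a short Python sketch. This is an illustration under assumptions: it uses Python's re module, not Notepad++ itself, and the dots are escaped here so that they only match a literal dot. In Python 3, \w also matches ä in ordinary text strings, so the umlaut is not what makes the first pattern fail.

```python
import re

tag = '<iframe src="./Subdir_files_ä/unilfp1.html"'

# A bracket expression on its own stands for exactly one character.
one_char = re.compile(r'<iframe src="\./[\w\-+/_]\.html"')
# Followed by *, it stands for any run of the listed characters.
any_run = re.compile(r'<iframe src="\./[\w\-+/_]*\.html"')

one_char_hit = one_char.search(tag)  # None: only one character is allowed
any_run_hit = any_run.search(tag)    # matches the whole sample text

print(one_char_hit is None, any_run_hit is not None)  # True True
```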
-
Looks like I got the idea behind it …
Another question:
when the search contains regular ( ), I need to use ^( right?
<iframe src=".[\w\-+/_]*.html" width="640" height="640" frameborder="0" title="Mittelwerte (seit 2020)"></iframe>
my RegEx for that doesn’t work:
<iframe src="./[\w\-+/_]*.html" width="640" height="640" frameborder="0" title="Mittelwerte ^(seit 2020^)"></iframe>
With a shorter search I find the passage, so, there is an issue with the ()
<iframe src="./[\w\-+/_]*.html" width="640" height="640" frameborder="0" title="Mittelwerte
How can I handle ( and ) in a search?
edit: reading manuals helps :-)
<iframe src="./[\w\-+/_]*.html" width="640" height="640" frameborder="0" title="Mittelwerte \(seit 2020\)"></iframe>
—
moderator added more code markdown around text; thanks for putting it around some of your blocks, but please don’t forget to use the </> button to mark example text as “code” around all example text, so that characters don’t get changed by the forum
-
@HaPe-Krummen said in search / replace /delete parts of URL on several hundred pages:
when the search contains regular ( ), I need to use ^( right?
No. The escape character in Notepad++'s regular expressions is \ not ^, so a literal parenthesis is matched by \(.
update: Sorry, I didn’t see your correctly-escaped parentheses in the final “edit: reading manuals helps”, because you didn’t use markdown except on the first block in that post. I used moderator-power to fix your post so it’s readable for others.
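The same behaviour can be seen in a short Python sketch. It is an illustration only, since Python's re module is a different engine from the one in Notepad++, but both use the backslash as the escape character. The re.escape helper shown at the end is a Python function with no counterpart in the Notepad++ dialog.

```python
import re

title = 'title="Mittelwerte (seit 2020)"'

# Unescaped parentheses form a group, so this expression looks for
# "Mittelwerte seit 2020" without any literal parentheses.
grouped = re.compile(r'title="Mittelwerte (seit 2020)"')
# Backslash-escaped parentheses match the literal characters.
literal = re.compile(r'title="Mittelwerte \(seit 2020\)"')
# re.escape builds an escaped expression from literal text.
auto = re.compile(re.escape(title))

grouped_hit = grouped.search(title)  # None
literal_hit = literal.search(title)  # match
auto_hit = auto.search(title)        # match

print(grouped_hit is None, literal_hit is not None, auto_hit is not None)
# True True True
```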
-
Just wanted to say THANK YOU for your help!
Being able to use regular expressions changes a lot and makes searching, replacing and deleting text so much easier.
Have a good day!