Regex pattern needed
-
Hello,
I am looking for a way to replace all lines of the form:
https://www.server.example/path/something%20with%20spaces
with:
<a href="https://www.server.example/path/something%20with%20spaces">something with spaces</a>
Using a single search/replace operation. Currently I do it with one operation that transforms the link to the <a href...> format, and another one that replaces a %20 with a space after the closing angle bracket. That second one has to be repeated several times until all the %20 instances are replaced. Please assist.
-
I would approach it as a two-step process.
1. Convert
https://www.server.example/path/something%20with%20spaces
to
<a href="https://www.server.example/path/something%20with%20spaces">something%20with%20spaces</a>
(because that’s a pretty easy regex).
2. Convert the
>something%20with%20spaces</a>
to
>something with spaces</a>
I would do this because I assume that some of your URLs might have one %20, some might have two %20, and some might have more (or none). And coding a regex for all those edge cases is fragile. OTOH, if I can just search for a URL and break it into two pieces, that’s easy.
Test data:
https://www.server.example/path/something%20with%20spaces
https://www.different.example/path/one%20space
https://www.third.example/path/spaceless
- FIND = (?-s)^(https?://\S*/)([^"\s/]*)$
- REPLACE = <a href="$1$2">$2</a>
- MODE = Regular expression
Result:
<a href="https://www.server.example/path/something%20with%20spaces">something%20with%20spaces</a>
<a href="https://www.different.example/path/one%20space">one%20space</a>
<a href="https://www.third.example/path/spaceless">spaceless</a>
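To try the step 1 pattern outside Notepad++, here is a minimal Python sketch. It assumes Python's re module as a stand-in for Notepad++'s regex engine, which behaves the same for this pattern. Two adjustments are needed: the replacement refers to groups as \1 and \2 instead of $1 and $2, and the leading (?-s) is dropped, because Python only accepts negative inline flags in the scoped (?-s:...) form. Dropping it changes nothing here, since the pattern contains no dot.

```python
import re

# Test data, one URL per line (the FIND regex is anchored with ^ and $).
data = (
    "https://www.server.example/path/something%20with%20spaces\n"
    "https://www.different.example/path/one%20space\n"
    "https://www.third.example/path/spaceless\n"
)

# Same FIND as above without the leading (?-s).
# Group 1: everything up to and including the last "/"; group 2: the last path segment.
FIND = r'^(https?://\S*/)([^"\s/]*)$'
# Python writes group references as \1 and \2 where Notepad++ uses $1 and $2.
REPLACE = r'<a href="\1\2">\2</a>'

# re.M makes ^ and $ match at every line boundary, as they do in Notepad++.
result = re.sub(FIND, REPLACE, data, flags=re.M)
print(result)
```

The URL does not need to be split at each %20. The greedy \S*/ backtracks to the last slash, so any number of %20 sequences (or none) ends up in group 2 unchanged.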
2. For this one, I would use @guy038’s generic “change data, but only between start and end markers” regex from this post:
* Generic = (?-i:BSR|(?!\A)\G)(?s:(?!ESR).)*?\K(?-i:FR)
* BSR = > (for the end of the <a href="...">)
* ESR = </a>
* FR = %20
* RR = \x20 (or a literal space)
* => FIND = (?-i:>|(?!\A)\G)(?s:(?!</a>).)*?\K(?-i:%20)
  REPLACE = \x20 (or a literal space)

Unfortunately, when I did that, my test data became
<a href="https://www.server.example/path/something%20with%20spaces">something with spaces</a>
<a href="https://www.different.example/path/one space">one space</a>
<a href="https://www.third.example/path/spaceless">spaceless</a>
… and you can see that it replaced a %20 that was inside the href portion of the second link. I think that is because I used such a small BSR expression: a bare > also matches the > that closes the previous </a>, so the search can start there and run on into the next link's href. Unfortunately, my attempt to make the BSR more specific, BSR = <a[^\s>]*>, said it couldn’t find anything at all. And I have to focus on my day job a bit more today, so I cannot continue debugging, but this is the path I’d follow. Maybe @guy038 will have time to tell us what I did wrong, or come up with a better BSR that keeps the find region out of the href value. Or maybe I will find some time this evening.
-
@PeterJones said in Regex pattern needed:
Or maybe I will find some time this evening.
Well, it was the next day, but…
My mistake in yesterday’s modified BSR = <a[^\s>]*> was including \s in the negated character class. That meant the tag had to be <a...> without any spaces, so it could never match <a href="...">. Once I realized that, it was easy to fix.
- BSR = <a[^>]*>
- ESR = </a>
- FR = %20
- RR = \x20 (or a literal space)
- FIND = (?-i:<a[^>]*>|(?!\A)\G)(?s:(?!</a>).)*?\K(?-i:%20)
- REPLACE = \x20
- Final transformation of my previous data:
<a href="https://www.server.example/path/something%20with%20spaces">something with spaces</a>
<a href="https://www.different.example/path/one%20space">one space</a>
<a href="https://www.third.example/path/spaceless">spaceless</a>