What regex to create a search limit boundary?
-
I need some kind of ‘stop search here / start again’ expression, as in the problem below.
The examples are from metadata downloaded as HTML from the naturalis.nl biodiversity database. Species names are split up by markup code, which I need to strip out.
Here’s a simple example:
<span class="scientific">Piper</span> L.
So, stripping the markup out is a simple search and replace, resulting in the scientific genus Piper L.
Find: <span class="scientific">(.*)</span>
Replace: $1
Result: Piper L.
But that strategy does not work on this more common binomial name example:
Desired result: Ficus septica Burm. fil.
Original html: <span class="scientific">Ficus</span> <span class="scientific">septica</span> Burm. fil.
Find: <span class="scientific">(.*)</span>
Replace: $1
Result: Ficus</span> <span class="scientific">septica Burm. fil.
I seem to need a way to stop the regex after the first <span class=...></span> pair, and start afresh for the next one. Instead, it is skipping to the last </span> occurrence.
How can I improve my regex code?
-
@Aporosa said in What regex to create a search limit boundary?:
How can I improve my regex code?
My small test shows this seems to work:
Find What: <([^>]+)?>
Replace With: (nothing in this field)
Terry
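For anyone who wants to try the tag-stripping pattern outside the editor, here is a minimal sketch in Python. It assumes Python's re module treats this particular pattern the same way the editor's regex engine does, which should hold for a simple character class like this one; the variable names are only for illustration.

```python
import re

# Pattern from the post: "<", optionally one or more non-">" characters, then ">".
TAG = re.compile(r"<([^>]+)?>")

html = '<span class="scientific">Ficus</span> <span class="scientific">septica</span> Burm. fil.'

# Replacing every match with the empty string strips all the markup.
stripped = TAG.sub("", html)
print(stripped)

# Because the inner group is optional, an empty "<>" is matched and removed too.
print(TAG.sub("", "a<>b"))
```

Note that this removes every tag in the text, not only the scientific spans, so it suits the case where all markup should go.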
PS this will also find any sequences of <>, and since they make no sense being there, this removes them as well.
-
@Aporosa said in What regex to create a search limit boundary?:
How can I improve my regex code?
I looked more into your regex, and the reason it didn’t work as planned is that (.*) is greedy. If you change your regex to <span class="scientific">(.*?)</span>, the added ? changes the quantifier from greedy to lazy (non-greedy).
The difference is that yours finds the longest sequence the regex can match from where the match starts, whereas the adjusted regex finds the shortest one.
Terry
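The greedy versus lazy difference can be reproduced with a short Python sketch. It assumes the HTML uses plain straight quotes, and it uses Python's \1 for the captured group where the editor's Replace field uses $1.

```python
import re

html = '<span class="scientific">Ficus</span> <span class="scientific">septica</span> Burm. fil.'

# Greedy: (.*) runs on to the LAST </span>, so the two spans are treated as one match.
greedy = re.sub(r'<span class="scientific">(.*)</span>', r"\1", html)

# Lazy: (.*?) stops at the FIRST </span>, so each span is matched separately.
lazy = re.sub(r'<span class="scientific">(.*?)</span>', r"\1", html)

print(greedy)
print(lazy)
```

The greedy version leaves the inner markup in place, as in the original question, while the lazy version gives Ficus septica Burm. fil.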
-
@Terry-R said in What regex to create a search limit boundary?:
note the inclusion of a ? changes the regex from being greedy to lazy (not-greedy)
Thank you Terry R, not only was this a very clearly explained and effective solution, but now I understand the idea of a ‘greedy’ regex expression, something I didn’t get from textbook examples that I’d read.