What regex to create a search limit boundary?
-
I need some kind of ‘stop search here / start again’ expression, as in the problem below.
The examples are from downloaded metadata in HTML, from the naturalis.nl biodiversity database. Species names are split up by markup code, which I need to strip out.
Here’s a simple example:
<span class="scientific">Piper</span> L.
So, stripping the markup out is a simple search and replace, resulting in the scientific genus Piper L.
Find: <span class="scientific">(.*)</span>
Replace: $1
Result: Piper L.
But that strategy does not work on this more common binomial name example:
Desired result: Ficus septica Burm. fil.
Original html: <span class="scientific">Ficus</span> <span class="scientific">septica</span> Burm. fil.
Find: <span class="scientific">(.*)</span>
Replace: $1
Result: Ficus</span> <span class="scientific">septica Burm. fil.
I seem to need a way to stop the regex after the first <span class=...></span> pair and start afresh for the next one. Instead, it is skipping to the last </span> occurrence.
How can I improve my regex code?
-
@Aporosa said in What regex to create a search limit boundary?:
How can I improve my regex code?
My small test shows this seems to work:
Find What: <([^>]+)?>
Replace With: (nothing in this field)
Terry
PS: this will also find any empty <> sequences, and since they make no sense being there, this removes them as well.
-
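For anyone who wants to try the tag-stripping pattern from the previous post outside the editor, here is a minimal sketch in Python. It assumes Python's re module behaves like the editor's regex engine for this simple pattern, and that the attribute quotes in the real data are plain straight quotes; the variable names are only for illustration.

```python
import re

# Pattern from the post above: "<", optionally some non-">" characters, then ">".
TAG = re.compile(r"<([^>]+)?>")

# Sample line from the question (straight quotes assumed).
html = '<span class="scientific">Ficus</span> <span class="scientific">septica</span> Burm. fil.'

# Replacing each match with an empty string mirrors leaving "Replace With" empty.
plain = TAG.sub("", html)
print(plain)  # Ficus septica Burm. fil.
```

Note that this removes every tag on the line, not only the "scientific" spans, so it fits best when no markup needs to be kept.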
@Aporosa said in What regex to create a search limit boundary?:
How can I improve my regex code?
I looked more into your regex, and the reason it didn’t work as planned is that the (.*) is greedy. If you were to change your regex to <span class="scientific">(.*?)</span>, note that the inclusion of a ? changes the regex from being greedy to lazy (non-greedy).
The difference is that yours will find the longest sequence that the regex can be true with, while the adjusted regex finds the shortest length the regex is true with.
Terry
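The greedy versus lazy difference can also be checked with a short Python sketch. This is only an illustration under some assumptions: Python's re module stands in for the editor's regex engine, the replacement is written \1 where the editor uses $1, and the attribute quotes are taken to be plain straight quotes.

```python
import re

# Sample line from the question (straight quotes assumed).
html = '<span class="scientific">Ficus</span> <span class="scientific">septica</span> Burm. fil.'

# Greedy: (.*) runs on to the LAST </span> on the line.
greedy = re.sub(r'<span class="scientific">(.*)</span>', r"\1", html)

# Lazy: (.*?) stops at the FIRST </span>, so each span is matched separately.
lazy = re.sub(r'<span class="scientific">(.*?)</span>', r"\1", html)

print(greedy)  # Ficus</span> <span class="scientific">septica Burm. fil.
print(lazy)    # Ficus septica Burm. fil.
```

The greedy output reproduces the unwanted result from the question, while the lazy one gives the desired binomial name.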
-
@Terry-R said in What regex to create a search limit boundary?:
note the inclusion of a ? changes the regex from being greedy to lazy (not-greedy)
Thank you Terry R, not only was this a very clearly explained and effective solution, but now I understand the idea of a ‘greedy’ regex expression, something I didn’t get from textbook examples that I’d read.