What regex to create a search limit boundary?



  • I need some kind of ‘stop search here / start again’ expression, as in the problem here:

    The examples here are from downloaded metadata in html, from the naturalis.nl biodiversity database. Species names are split up by markup code, which I need to strip out.

    Here’s a simple example:
    <span class="scientific">Piper</span> L.

    So, stripping the markup out is a simple search and replace, resulting in the scientific genus Piper L.
    Find: <span class="scientific">(.*)</span>
    Replace: $1
    Result: Piper L.

    But that strategy does not work on this more common binomial name example:
    Desired result: Ficus septica Burm. fil.
    Original html: <span class="scientific">Ficus</span> <span class="scientific">septica</span> Burm. fil.
    Find: <span class="scientific">(.*)</span>
    Replace: $1
    Result: Ficus</span> <span class="scientific">septica Burm. fil.

    I seem to need a way to stop the regex after the first <span class=...></span> pair and start afresh for the next one. Instead, the match runs on to the last </span> occurrence.

    How can I improve my regex code?



  • @Aporosa said in What regex to create a search limit boundary?:

    How can I improve my regex code?

    My small test shows this seems to work:
    Find What: <([^>]+)?>
    Replace With: (leave this field empty)

    Terry

    PS: This matches every tag, not only the span tags. It will also find any empty <> sequences, and since they make no sense being there, it removes them as well.
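
A minimal sketch of the same tag-stripping idea outside Notepad++, using Python's `re` module. The sample string assumes the real HTML uses straight double quotes, and `strip_tags` is a hypothetical helper name, not something from the thread. It removes every `<...>` sequence, so it is only suitable when no tag should survive.

```python
import re

# Terry's pattern: "<", optionally one or more non-">" characters, then ">".
TAG = re.compile(r"<([^>]+)?>")


def strip_tags(html: str) -> str:
    """Remove every <...> sequence, including an empty <>."""
    return TAG.sub("", html)


# Assumed sample, taken from the question with straight quotes.
sample = (
    '<span class="scientific">Ficus</span> '
    '<span class="scientific">septica</span> Burm. fil.'
)
print(strip_tags(sample))  # Ficus septica Burm. fil.
```

This is a plain text substitution, not an HTML parser, so a literal `>` inside an attribute value would end the match early.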



  • @Aporosa said in What regex to create a search limit boundary?:

    How can I improve my regex code?

    I looked more into your regex, and the reason it didn't work as planned is that (.*) is greedy. Change your regex to <span class="scientific">(.*?)</span>. Note that adding the ? after the * changes that quantifier from greedy to lazy (non-greedy).

    The difference is that yours finds the longest sequence the regex can match, so it runs on to the last </span>, while the adjusted regex finds the shortest sequence it can match and stops at the first </span>.

    Terry
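
The greedy and lazy behaviour described above can be reproduced with a short sketch in Python's `re` module. It assumes the HTML uses straight double quotes, and note that Python writes the replacement backreference as `\1` where the Find/Replace fields in the question use `$1`.

```python
import re

# Assumed sample, taken from the question with straight quotes.
html = (
    '<span class="scientific">Ficus</span> '
    '<span class="scientific">septica</span> Burm. fil.'
)

# Greedy: (.*) grabs as much as possible, up to the LAST </span>.
greedy = re.compile(r'<span class="scientific">(.*)</span>')
# Lazy: (.*?) grabs as little as possible, up to the FIRST </span>.
lazy = re.compile(r'<span class="scientific">(.*?)</span>')

greedy_result = greedy.sub(r"\1", html)
lazy_result = lazy.sub(r"\1", html)

print(greedy_result)  # Ficus</span> <span class="scientific">septica Burm. fil.
print(lazy_result)    # Ficus septica Burm. fil.
```

The greedy pattern matches once and leaves the inner markup in place, while the lazy pattern matches each span pair separately.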



  • @Terry-R said in What regex to create a search limit boundary?:

    note the inclusion of a ? changes the regex from being greedy to lazy (not-greedy)

    Thank you, Terry R. Not only was this a very clearly explained and effective solution, but I now understand the idea of a 'greedy' regex, something I didn't get from the textbook examples I'd read.

