What regex to create a search limit boundary?

regex
4 Posts 2 Posters 264 Views
    Aporosa
    last edited by Oct 4, 2021, 1:05 AM

    I need some kind of ‘stop search here / start again’ expression for the problem described below.

    The examples are from metadata downloaded as HTML from the naturalis.nl biodiversity database. Species names are split up by markup, which I need to strip out.

    Here’s a simple example:
    <span class="scientific">Piper</span> L.

    So, stripping the markup out is a simple search and replace, resulting in the scientific genus Piper L.
    Find: <span class="scientific">(.*)</span>
    Replace: $1
    Result: Piper L.

    But that strategy does not work on this more common binomial name example:
    Desired result: Ficus septica Burm. fil.
    Original HTML: <span class="scientific">Ficus</span> <span class="scientific">septica</span> Burm. fil.
    Find: <span class="scientific">(.*)</span>
    Replace: $1
    Result: Ficus</span> <span class="scientific">septica Burm. fil.

    I seem to need a way to stop the regex after the first <span class="scientific">...</span> pair and start afresh for the next one. Instead, the match runs on to the last </span> occurrence.

    How can I improve my regex code?

      Terry R
      last edited by Terry R Oct 4, 2021, 1:18 AM Oct 4, 2021, 1:16 AM

      @Aporosa said in What regex to create a search limit boundary?:

      How can I improve my regex code?

      My small test shows this seems to work:
      Find What: <([^>]+)?>
      Replace With: (leave this field empty)

      Terry

      PS: Because the group is optional, this will also find any empty <> sequences, and since they make no sense being there, this removes them as well.
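
For readers who want to try the tag-stripping pattern outside Notepad++, here is a minimal sketch in Python. It assumes Python's built-in `re` module behaves the same way as Notepad++'s engine for this simple pattern; the helper name `strip_tags` is made up for illustration and is not from the thread. Note that the pattern removes every `<...>` sequence, not only the `span` tags.

```python
import re

# Pattern from the post above: "<", an optional run of non-">" characters, ">".
TAG = re.compile(r"<([^>]+)?>")


def strip_tags(text: str) -> str:
    """Remove every <...> sequence from text (hypothetical helper name)."""
    return TAG.sub("", text)


html = '<span class="scientific">Ficus</span> <span class="scientific">septica</span> Burm. fil.'

print(strip_tags(html))    # Ficus septica Burm. fil.
print(strip_tags("a<>b"))  # ab  (the empty <> is removed too, as the PS says)
```

Because `[^>]` can never run past a closing `>`, each match is confined to a single tag, which is why this version does not overshoot the way `(.*)` does.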

        Terry R
        last edited by Terry R Oct 4, 2021, 1:30 AM Oct 4, 2021, 1:27 AM

        @Aporosa said in What regex to create a search limit boundary?:

        How can I improve my regex code?

        I looked more into your regex, and the reason it didn’t work as planned is that the (.*) is greedy. If you change your regex to <span class="scientific">(.*?)</span>, it works as intended. Note that the added ? changes the quantifier from greedy to lazy (non-greedy).

        The difference is that yours finds the longest sequence the regex can match from a given starting point, whereas the adjusted regex finds the shortest one.

        Terry
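
The greedy/lazy difference described above can be reproduced outside Notepad++. The following is a small sketch using Python's built-in `re` module, on the assumption that it treats `.*` and `.*?` the same way as Notepad++ does for this input. One difference to be aware of: Python writes the replacement group as `\1` where Notepad++ uses `$1`. The variable names are chosen here for illustration only.

```python
import re

html = '<span class="scientific">Ficus</span> <span class="scientific">septica</span> Burm. fil.'

# Original pattern: (.*) grabs as much as possible, up to the LAST </span>.
GREEDY = r'<span class="scientific">(.*)</span>'
# Adjusted pattern: (.*?) grabs as little as possible, up to the NEXT </span>.
LAZY = r'<span class="scientific">(.*?)</span>'

greedy_result = re.sub(GREEDY, r"\1", html)
lazy_result = re.sub(LAZY, r"\1", html)

print(greedy_result)            # Ficus</span> <span class="scientific">septica Burm. fil.
print(lazy_result)              # Ficus septica Burm. fil.
print(re.findall(LAZY, html))   # ['Ficus', 'septica']
```

The greedy version finds a single match spanning both tags, while the lazy version finds two separate matches, one per name part.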

          Aporosa
          last edited by Oct 4, 2021, 5:59 AM

          @Terry-R said in What regex to create a search limit boundary?:

          note the inclusion of a ? changes the regex from being greedy to lazy (not-greedy)

          Thank you, Terry R. Not only was this a very clearly explained and effective solution, but I now understand the idea of a ‘greedy’ regex, something I didn’t get from the textbook examples I’d read.

          The Community of users of the Notepad++ text editor.