    What regex to create a search limit boundary?

    Help wanted
    regex
    4 Posts 2 Posters 248 Views
    • Aporosa

      I need some kind of ‘stop search here / start again’ expression, as the problem below shows.

      The examples are from metadata downloaded as HTML from the naturalis.nl biodiversity database. Species names are split up by markup, which I need to strip out.

      Here’s a simple example:
      <span class="scientific">Piper</span> L.

      So, stripping the markup out is a simple search and replace, resulting in the scientific genus Piper L.
      Find: <span class="scientific">(.*)</span>
      Replace: $1
      Result: Piper L.

      But that strategy does not work on this more common binomial name example:
      Desired result: Ficus septica Burm. fil.
      Original HTML: <span class="scientific">Ficus</span> <span class="scientific">septica</span> Burm. fil.
      Find: <span class="scientific">(.*)</span>
      Replace: $1
      Result: Ficus</span> <span class="scientific">septica Burm. fil.

      I seem to need a way to stop the regex after the first <span class="scientific">…</span> pair and start afresh for the next one. Instead, the match runs on to the last </span> occurrence.

      How can I improve my regex code?

      • Terry R

        @Aporosa said in What regex to create a search limit boundary?:

        How can I improve my regex code?

        My small test shows this seems to work:
        Find What: <([^>]+)?>
        Replace With: (leave this field empty)

        Terry

        PS: this will also find any empty <> sequences, and since they make no sense being there, it removes them as well.
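For anyone who wants to try this outside Notepad++, here is a minimal sketch of the same pattern in Python's `re` module. The variable names (`html`, `tag_pattern`, `stripped`) are made up for illustration, and the sample string is the one from the question with plain straight quotes. Bear in mind that this pattern removes every tag on the line, not only the `scientific` spans, so it only suits input where all markup should go.

```python
import re

# Sample line from the question (straight quotes assumed in the real HTML).
html = '<span class="scientific">Ficus</span> <span class="scientific">septica</span> Burm. fil.'

# Same pattern as in the post: "<", optionally one or more non-">" characters, then ">".
tag_pattern = re.compile(r"<([^>]+)?>")

# Replacing every match with the empty string strips all tags.
stripped = tag_pattern.sub("", html)
print(stripped)  # Ficus septica Burm. fil.

# Because the group is optional, a bare "<>" is matched and removed too.
print(tag_pattern.sub("", "Piper<> L."))  # Piper L.
```

The negated class `[^>]` is what acts as the boundary here: the match cannot run past the first `>`, so each tag is matched on its own.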

        • Terry R

          @Aporosa said in What regex to create a search limit boundary?:

          How can I improve my regex code?

          I looked more into your regex, and the reason it didn’t work as planned is that (.*) is greedy. If you were to change your regex to <span class="scientific">(.*?)</span>, each match would stop at the first </span> it reaches; note the inclusion of a ? changes the regex from being greedy to lazy (not-greedy).

          The difference is that yours finds the longest sequence the regex can match, whereas the adjusted regex finds the shortest sequence it can match.
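The greedy/lazy difference can be checked side by side with a short sketch in Python. This is only an illustration under a few assumptions: the names `greedy`, `lazy`, `greedy_result` and `lazy_result` are hypothetical, the sample uses straight quotes, and Python writes the first capture group as `\1` where Notepad++ uses `$1`.

```python
import re

# Binomial example from the question (straight quotes assumed in the real HTML).
html = '<span class="scientific">Ficus</span> <span class="scientific">septica</span> Burm. fil.'

greedy = r'<span class="scientific">(.*)</span>'   # original pattern
lazy = r'<span class="scientific">(.*?)</span>'    # adjusted pattern with the extra ?

# Keep only the captured text of each match (\1 here, $1 in Notepad++).
greedy_result = re.sub(greedy, r"\1", html)
lazy_result = re.sub(lazy, r"\1", html)

print(greedy_result)  # Ficus</span> <span class="scientific">septica Burm. fil.
print(lazy_result)    # Ficus septica Burm. fil.
```

With the greedy pattern there is a single match that stretches from the first opening tag to the last `</span>`; with the lazy one there are two separate matches, one per span.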

          Terry

          • Aporosa

            @Terry-R said in What regex to create a search limit boundary?:

            note the inclusion of a ? changes the regex from being greedy to lazy (not-greedy)

            Thank you, Terry R. Not only was this a very clearly explained and effective solution, but I now understand the idea of a ‘greedy’ regex, something I didn’t get from the textbook examples I’d read.

            The Community of users of the Notepad++ text editor.