    Regex pattern needed

    • alexolog

      Hello,

      I am looking for a way to replace all lines of the form:
      https://www.server.example/path/something%20with%20spaces
      with:
      <a href="https://www.server.example/path/something%20with%20spaces">something with spaces</a>
      Using a single search/replace operation.

      Currently I do it with one operation that transforms the link to the <a href...> format, and another that replaces a %20 with a space after the closing angle bracket; the second one has to be repeated several times until all the %20 instances are replaced.

      Please assist.

      • PeterJones @alexolog

        @alexolog ,

        I would approach it as a two-step process.

        1. convert https://www.server.example/path/something%20with%20spaces to <a href="https://www.server.example/path/something%20with%20spaces">something%20with%20spaces</a> – because that’s a pretty easy regex
        2. convert the >something%20with%20spaces</a> to >something with spaces</a>

        I would do this because I assume that some of your URLs might have one %20, some might have two %20, and some might have more (or none). And coding a regex for all those edge cases is fragile. OTOH, if I can just search for a URL and break it into two pieces, that’s easy.

        https://www.server.example/path/something%20with%20spaces
        https://www.different.example/path/one%20space
        https://www.third.example/path/spaceless
        
        1. FIND = (?-s)^(https?://\S*/)([^"\s/]*)$
          REPLACE = <a href="$1$2">$2</a>
          MODE = Regular expression
        <a href="https://www.server.example/path/something%20with%20spaces">something%20with%20spaces</a>
        <a href="https://www.different.example/path/one%20space">one%20space</a>
        <a href="https://www.third.example/path/spaceless">spaceless</a>
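For anyone who wants to try step 1 outside Notepad++, here is a minimal Python sketch of the same idea. It is an adaptation, not the Notepad++ search itself: Python's `re` does not accept a leading `(?-s)`, so it is dropped (the pattern uses no `.`), `re.MULTILINE` is added so `^` and `$` work per line, and `$1$2` becomes `\1\2`. The names `STEP1` and `wrap_links` are made up for illustration.

```python
import re

# Same pattern as the FIND above, minus (?-s), which Python's re rejects.
# re.MULTILINE lets ^ and $ anchor at every line, as Notepad++ does.
STEP1 = re.compile(r'^(https?://\S*/)([^"\s/]*)$', re.MULTILINE)


def wrap_links(text: str) -> str:
    """Wrap each bare URL line in an <a> tag, using the last path segment as link text."""
    return STEP1.sub(r'<a href="\1\2">\2</a>', text)


sample = (
    "https://www.server.example/path/something%20with%20spaces\n"
    "https://www.different.example/path/one%20space\n"
    "https://www.third.example/path/spaceless"
)

print(wrap_links(sample))
```

Group 1 is greedy up to the last `/`, so group 2 ends up holding only the final path segment, still with its %20 sequences.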
        

        2. For this one, I would use @guy038’s generic “change data, but only between start and end markers” regex from this post
        * Generic = (?-i:BSR|(?!\A)\G)(?s:(?!ESR).)*?\K(?-i:FR)
        * BSR = > (for the end of the <a href="...">)
        * ESR = </a>
        * FR = %20
        * RR = \x20 (or a literal space)
        * => FIND = (?-i:>|(?!\A)\G)(?s:(?!</a>).)*?\K(?-i:%20)
        REPLACE = \x20 (or a literal space)

        Unfortunately, when I did that, my test data became

        <a href="https://www.server.example/path/something%20with%20spaces">something with spaces</a>
        <a href="https://www.different.example/path/one space">one space</a>
        <a href="https://www.third.example/path/spaceless">spaceless</a>
        

        … and you can see that it replaced a %20 that was inside the href portion (second line), I think because I used such a small BSR expression. My attempt at fixing it with a more specific BSR = <a[^\s>]*> reported that it couldn’t find any match at all. Unfortunately, I have to focus on my day job a bit more today, so I cannot continue debugging. But this is the path I’d follow.

        Maybe @guy038 will have time to tell us what I did wrong, or come up with a better BSR to keep the find-region out of the href value. Or maybe I will find some time this evening.

        • PeterJones @PeterJones

          @PeterJones said in Regex pattern needed:

          Or maybe I will find some time this evening.

          Well, it was the next day, but…

          My mistake in yesterday’s modified BSR = <a[^\s>]*> was including \s in the complement character class, which meant it had to be <a...> without any spaces, which obviously cannot match <a href="...">. Once I realized that, it was easy to fix.
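The difference between the two character classes is easy to check in isolation. The snippet below is just a quick sketch using Python's `re` as a stand-in (Notepad++ uses a different regex engine, but these two small patterns are plain enough to behave the same way); the variable names are made up for illustration.

```python
import re

tag = '<a href="https://www.third.example/path/spaceless">'

# [^\s>] excludes whitespace, so the match cannot get past the space after "<a".
too_strict = re.compile(r'<a[^\s>]*>')
# [^>] allows anything up to the closing bracket, including spaces.
relaxed = re.compile(r'<a[^>]*>')

print(too_strict.search(tag))  # no match object
print(relaxed.search(tag).group(0))
```

The first pattern would only ever match a tag such as `<a>` or `<abc>`, with no whitespace between the `<a` and the `>`.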

          • BSR = <a[^>]*>
          • ESR = </a>
          • FR = %20
          • RR = \x20 (or literal space)
          • FIND = (?-i:<a[^>]*>|(?!\A)\G)(?s:(?!</a>).)*?\K(?-i:%20)
          • REPLACE = \x20
          • Final Transformation of my previous data:
            <a href="https://www.server.example/path/something%20with%20spaces">something with spaces</a>
            <a href="https://www.different.example/path/one%20space">one space</a>
            <a href="https://www.third.example/path/spaceless">spaceless</a>
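If scripting is an option, the same "only change %20 between the start and end markers" effect can be sketched in Python. Note this is a swapped-in technique, not the `\G`/`\K` regex above: Python's standard `re` module supports neither, so this version captures the opening tag, the link text, and the closing tag, and uses a replacement function to decode only the link text. The names `ANCHOR` and `decode_spaces_in_link_text` are hypothetical.

```python
import re

# Three groups: opening tag (same shape as the BSR), link text, closing tag (ESR).
ANCHOR = re.compile(r'(<a[^>]*>)(.*?)(</a>)', re.DOTALL)


def decode_spaces_in_link_text(html: str) -> str:
    """Replace %20 with a space in the link text only, leaving the href untouched."""
    def fix(match: re.Match) -> str:
        opening, text, closing = match.groups()
        return opening + text.replace('%20', ' ') + closing

    return ANCHOR.sub(fix, html)


step1_output = (
    '<a href="https://www.server.example/path/something%20with%20spaces">something%20with%20spaces</a>\n'
    '<a href="https://www.different.example/path/one%20space">one%20space</a>\n'
    '<a href="https://www.third.example/path/spaceless">spaceless</a>'
)

print(decode_spaces_in_link_text(step1_output))
```

Because the replacement function handles every %20 in a link's text at once, a single pass is enough no matter how many %20 sequences a URL contains.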
            
          The Community of users of the Notepad++ text editor.