Community

    Regex pattern needed

    • alexolog

      Hello,

      I am looking for a way to replace all lines of the form:
      https://www.server.example/path/something%20with%20spaces
      with:
      <a href="https://www.server.example/path/something%20with%20spaces">something with spaces</a>
      Using a single search/replace operation.

      Currently I do it in two operations: one transforms the link to the <a href...> format, and another replaces a %20 with a space after the closing angle bracket. The second one has to be repeated several times until all the %20 instances are replaced.

      Please assist.

      • PeterJones @alexolog

        @alexolog ,

        I would approach it as a two-step process.

        1. convert https://www.server.example/path/something%20with%20spaces to <a href="https://www.server.example/path/something%20with%20spaces">something%20with%20spaces</a> – because that’s a pretty easy regex
        2. convert the >something%20with%20spaces</a> to >something with spaces</a>

        I would do this because I assume that some of your URLs might have one %20, some might have two %20, and some might have more (or none). And coding a regex for all those edge cases is fragile. OTOH, if I can just search for a URL and break it into two pieces, that’s easy.

        https://www.server.example/path/something%20with%20spaces
        https://www.different.example/path/one%20space
        https://www.third.example/path/spaceless
        
        1. FIND = (?-s)^(https?://\S*/)([^"\s/]*)$
          REPLACE = <a href="$1$2">$2</a>
          MODE = Regular expression
        <a href="https://www.server.example/path/something%20with%20spaces">something%20with%20spaces</a>
        <a href="https://www.different.example/path/one%20space">one%20space</a>
        <a href="https://www.third.example/path/spaceless">spaceless</a>
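
For anyone who wants to sanity-check step 1 outside Notepad++, here is a minimal Python sketch of the same find/replace. It is an illustration only: Python's `re` module is not the Boost engine that Notepad++ uses, the names `STEP1`, `wrap_links` and `step1_output` are made up for this example, and the leading `(?-s)` is dropped because Python rejects a global flag-off (the pattern contains no dot, so it makes no difference here).

```python
import re

# Sample lines taken from the post above.
urls = """https://www.server.example/path/something%20with%20spaces
https://www.different.example/path/one%20space
https://www.third.example/path/spaceless"""

# Group 1 = everything up to and including the last slash,
# group 2 = the final path segment (no quotes, whitespace, or slashes).
STEP1 = re.compile(r'^(https?://\S*/)([^"\s/]*)$', re.MULTILINE)


def wrap_links(text: str) -> str:
    """Wrap each bare URL line in an <a> tag, reusing the last segment as the label."""
    return STEP1.sub(r'<a href="\1\2">\2</a>', text)


step1_output = wrap_links(urls)
print(step1_output)
```

The label still contains `%20` at this point, which is exactly what step 2 is meant to clean up.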
        

        2. For this one, I would use @guy038’s generic “change data, but only between start and end markers” regex from this post
        * Generic = (?-i:BSR|(?!\A)\G)(?s:(?!ESR).)*?\K(?-i:FR)
        * BSR = > (for the end of the <a href="...">)
        * ESR = </a>
        * FR = %20
        * RR = \x20 (or a literal space)
        * => FIND = (?-i:>|(?!\A)\G)(?s:(?!</a>).)*?\K(?-i:%20)
          REPLACE = \x20 (or a literal space)

        Unfortunately, when I did that, my test data became

        <a href="https://www.server.example/path/something%20with%20spaces">something with spaces</a>
        <a href="https://www.different.example/path/one space">one space</a>
        <a href="https://www.third.example/path/spaceless">spaceless</a>
        

        … and you can see that it also replaced a %20 inside the href of the second link, I think because I used such a small BSR expression. My attempt at fixing it with a more specific BSR = <a[^\s>]*> found no matches at all. I have to focus on my day job a bit more today, so I cannot continue debugging, but this is the path I’d follow.
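
One plausible way to see where the stray replacement comes from is to look only at the `>` branch of the FIND expression. The Python sketch below is an illustration under stated assumptions, not a reproduction of Notepad++: Python's `re` has no `\G` or `\K`, so both are left out, and `BSR_ONLY` and `matches` are names invented for this example. It suggests that the `>` that closes `</a>` also qualifies as a start marker, so the search region runs on into the next line's href.

```python
import re

# First two lines of the step 1 output from the post above.
step1_output = (
    '<a href="https://www.server.example/path/something%20with%20spaces">'
    'something%20with%20spaces</a>\n'
    '<a href="https://www.different.example/path/one%20space">one%20space</a>\n'
)

# ">" as the start marker, then as few characters as possible that do not
# begin "</a>", then the %20 to be replaced.
BSR_ONLY = re.compile(r'>(?s:(?!</a>).)*?%20')

matches = [m.group(0) for m in BSR_ONLY.finditer(step1_output)]
for m in matches:
    print(repr(m))
```

The second match starts at the `>` of the first line's `</a>` and ends at the `%20` inside the second line's href, which is the replacement that should not have happened.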

        Maybe @guy038 will have time to tell us what I did wrong, or come up with a better BSR to keep the find-region out of the href value. Or maybe I will find some time this evening.

        • PeterJones @PeterJones

          @PeterJones said in Regex pattern needed:

          Or maybe I will find some time this evening.

          Well, it was the next day, but…

          My mistake in yesterday’s modified BSR = <a[^\s>]*> was including \s in the complement character class, which meant it had to be <a...> without any spaces, which obviously cannot match <a href="...">. Once I realized that, it was easy to fix.
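
The difference between the two character classes can be checked quickly in Python. This is only a sketch: the variable names are invented for the example, and Python's `re` stands in for the Notepad++ engine, although both treat these two simple patterns the same way.

```python
import re

tag = '<a href="https://www.third.example/path/spaceless">'

# [^\s>] forbids whitespace as well as ">", so the space after "<a" blocks the match.
too_strict = re.compile(r'<a[^\s>]*>')
# [^>] forbids only ">", so the whole opening tag can be consumed.
fixed = re.compile(r'<a[^>]*>')

strict_match = too_strict.search(tag)
fixed_match = fixed.search(tag)

print(strict_match)
print(fixed_match.group(0))
```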

          • BSR = <a[^>]*>
          • ESR = </a>
          • FR = %20
          • RR = \x20 (or literal space)
          • FIND = (?-i:<a[^>]*>|(?!\A)\G)(?s:(?!</a>).)*?\K(?-i:%20)
          • REPLACE = \x20
          • Final Transformation of my previous data:
            <a href="https://www.server.example/path/something%20with%20spaces">something with spaces</a>
            <a href="https://www.different.example/path/one%20space">one space</a>
            <a href="https://www.third.example/path/spaceless">spaceless</a>
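
If the same job ever has to be done in a script rather than in Notepad++, both steps can be folded into one pass by computing the replacement in a callback. The sketch below is an assumption-laden illustration, not part of the Notepad++ solution above: it uses Python's `re`, and the names `LINK`, `to_anchor` and `result` are hypothetical.

```python
import re

# Sample lines taken from the first reply.
urls = """https://www.server.example/path/something%20with%20spaces
https://www.different.example/path/one%20space
https://www.third.example/path/spaceless"""

# Same capture groups as the step 1 FIND expression.
LINK = re.compile(r'^(https?://\S*/)([^"\s/]*)$', re.MULTILINE)


def to_anchor(m: re.Match) -> str:
    """Build the <a> tag: the href keeps its %20, only the label is decoded."""
    href = m.group(0)
    label = m.group(2).replace('%20', ' ')
    return f'<a href="{href}">{label}</a>'


result = LINK.sub(to_anchor, urls)
print(result)
```

Because the label is decoded separately from the href, there is no search region to delimit, so the BSR/ESR markers are not needed in this variant.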
            
          The Community of users of the Notepad++ text editor.