
    Regex - Find URLs with Embedded Spaces

    regex
    • Dick Adams 0

      I need to check HTML pages for URLs with embedded spaces. The regex I’ve come up with so far:

      href=".*? +?.*?">
      

      It finds this URL as expected:

      href="url with spaces">
      

      but then it gives a false positive on this subsequent URL:

      href="../../../../../bio/d/y/k/e/dykes_jb.htm">John B. Dykes</a><span class="verbose">
      

      Any suggestions would be greatly appreciated!
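To see why the second line is reported, here is a minimal sketch using Python's `re` module. It assumes Python behaves like Notepad++ for this pattern (Notepad++ uses the Boost regex engine, but these particular constructs work the same way in both). The helper name `find_matches` is hypothetical.

```python
import re

# The pattern from the question: lazy ".*?" on both sides of " +?".
original = re.compile(r'href=".*? +?.*?">')

# The two sample lines from the question.
good = 'href="url with spaces">'
bad = 'href="../../../../../bio/d/y/k/e/dykes_jb.htm">John B. Dykes</a><span class="verbose">'

def find_matches(pattern, text):
    """Hypothetical helper: return every substring of text that pattern matches."""
    return [m.group(0) for m in pattern.finditer(text)]

print(find_matches(original, good))  # ['href="url with spaces">']

# The first lazy ".*?" runs past the URL's closing quote and ">" up to the
# space after "John"; the second ".*?" then runs on to the final "> of
# class="verbose">, so the whole line is reported as a match.
print(find_matches(original, bad) == [bad])  # True
```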

      • Terry R @Dick Adams 0

        @Dick-Adams-0
        What you are trying to do is difficult, but the forum’s resident regex guru (@guy038) has already made some posts here.

        I’m not going to try to give you the exact regex, but it will be worth your while to read those posts, in particular his second post in that thread, which looks to be exactly what you are seeking (with obvious character replacements).

        Terry

        • PeterJones @Dick Adams 0

          @Dick-Adams-0 ,

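          Your regex probably needs to be more restrictive. I know you tried to restrict it by making it non-greedy with ?, but that’s not always enough: a lazy .*? only prefers the shortest match, and the engine will still extend it past the closing quote and "> of your URL if that is the only way for the rest of the pattern to succeed.

          Try [^"]*? instead of .*? for at least the first (and maybe both) of the two .*? instances.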

          The first might even want to be [^"\s]*? so that it allows neither quotes nor whitespace characters.
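A minimal sketch of that tightened pattern, again using Python's `re` as a stand-in for Notepad++'s Boost engine (an assumption, though these constructs behave the same in both). It uses [^"\s]*? before the space and [^"]*? after it, and runs over both sample lines as they might appear in one page.

```python
import re

# Before the space: neither quotes nor whitespace. After it: no quotes.
# Neither part can cross a quote, so a match cannot run past the closing
# quote of the href value into the following markup.
tightened = re.compile(r'href="[^"\s]*? +?[^"]*?">')

# Both sample lines from the question, as they might appear in one page.
page = (
    'href="url with spaces">\n'
    'href="../../../../../bio/d/y/k/e/dykes_jb.htm">John B. Dykes</a><span class="verbose">\n'
)

matches = [m.group(0) for m in tightened.finditer(page)]
print(matches)  # ['href="url with spaces">']
```

Like the original, this still requires "> right after the URL, so an href followed by another attribute (for example href="a b" title="x">) would not be reported; ending the pattern at the closing " instead would cover that case.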

          ----

          Useful References

          • Notepad++ Online User Manual: Searching/Regex
          • FAQ: Where to find other regular expressions (regex) documentation

          ----

          Please note: This Community Forum is not a data transformation service; you should not expect to be able to always say “I have data like X and want it to look like Y” and have us do all the work for you. If you are new to the Forum, and new to regular expressions, we will often give help on the first one or two data-transformation questions, especially if they are well-asked and you show a willingness to learn; and we will point you to the documentation where you can learn how to do the data transformations for yourself in the future. But if you repeatedly ask us to do your work for you, you will find that the patience of usually-helpful Community members wears thin. The best way to learn regular expressions is by experimenting with them yourself, and getting a feel for how they work; having us spoon-feed you the answers without you putting in the effort doesn’t help you in the long term and is uninteresting and annoying for us.

          • PeterJones @Terry R

            @Terry-R said in Regex - Find URLs with Embedded Spaces:

            has already made some posts here.

            And as slightly more of a hint: the search zone would begin with href=" and end with the closing ". (If you tried simply beginning the zone with ", you would find that it sometimes matches between the end of one URL and the beginning of the next, so it wouldn’t work, which might frustrate you; zone matching works best when the start and end markers can be distinguished.)
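The zone idea can be approximated outside Notepad++ in two passes. Python's built-in `re` module lacks Boost features such as \G and \K, so this is a swapped-in technique, not the single-regex approach from the linked posts: first find each zone that begins with href=" and ends at the next ", then look for spaces only inside it. The function name `spaces_in_href_zones` is hypothetical.

```python
import re

# Zone = everything between the start marker href=" and the next quote.
# The start marker differs from the end marker, so a zone can never
# begin at the closing quote of a previous URL.
HREF_ZONE = re.compile(r'href="([^"]*)"')

def spaces_in_href_zones(html):
    """Hypothetical helper: return (url, [space offsets]) for each href value
    that contains a space. Offsets are 0-based positions within the URL text."""
    results = []
    for zone in HREF_ZONE.finditer(html):
        url = zone.group(1)
        offsets = [m.start() for m in re.finditer(r" ", url)]
        if offsets:
            results.append((url, offsets))
    return results

page = (
    '<a href="url with spaces">x</a>\n'
    '<a href="../../../../../bio/d/y/k/e/dykes_jb.htm">John B. Dykes</a><span class="verbose">\n'
)
print(spaces_in_href_zones(page))  # [('url with spaces', [3, 8])]
```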

            The Community of users of the Notepad++ text editor.