
    HELP: Finding lines with a regex where the word "html" does not appear before a given pattern

      2dmnGood4u

      Fellow Notepad++ Users,

      Could you please help me with the following search problem I am having?

      Here is the data I currently have (“before” data):

       herf="https://website.com/index.php/forum/journals/1649-me?start=3350"
       herf="https://website.com/index.php/forum/journals/1649-me.html?start=3350" 
      

      Here is how I would like that data to look (“search result” data):

      ="https://website.com/index.php/forum/journals/1649-me?start=3350
      

      To accomplish this, I have tried using the following Find/Replace expression and settings:

      • Find What = ^(?!.*html)(.*[?]start=)
      • Search Mode = REGULAR EXPRESSION
      • Dot Matches Newline = NOT CHECKED

      As you can see, my sample has two lines. I want Find What to find the lines whose links do not contain html before ?start=, and I’m searching through a bulk set of HTML files backed up from a website (a mirror of it). I also want the match to be less greedy: on matching lines it should highlight only from the preceding = symbol onward, without selecting anything before it.

      Unfortunately, this did not produce the output I desired, and I’m not sure why. Could you please help me understand what went wrong and help me find the solution?

        PeterJones @2dmnGood4u

        @2dmnGood4u,

        I am not 100% sure I understand what you really want, but =(?!.*html)([^=]*[?]start=) will match from the = up to and including start= (but nothing after start=).
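        As a rough way to check this outside Notepad++, here is a minimal Python sketch that runs the same pattern over the two sample lines from the question. It assumes that Python's re module handles the lookahead, the [^=] class and the greedy quantifiers the same way as Notepad++'s Boost-based engine does for this simple pattern; the variable names are only for illustration.

```python
import re

# The two sample lines from the question ("herf" is spelled as in the post).
lines = [
    ' herf="https://website.com/index.php/forum/journals/1649-me?start=3350"',
    ' herf="https://website.com/index.php/forum/journals/1649-me.html?start=3350"',
]

# Start at an "=", reject that position if "html" appears anywhere later on
# the line, then take non-"=" characters up to and including "?start=".
pattern = re.compile(r'=(?!.*html)([^=]*[?]start=)')

for line in lines:
    # search() scans left to right and returns the first position that matches.
    match = pattern.search(line)
    print(match.group(0) if match else None)
```

        Running it prints the text from the = through start= for the first line, and None for the .html line.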

        If you also want the rest of the line after the start=, then use =(?!.*html)([^=]*[?]start=.*) instead.
        (My second regex includes the closing " at the end, which your example “search result” data doesn’t include. I’m hoping that’s just a typo on your part: either you really wanted the ", as in my second regex, or you didn’t want the 3350" at all, as in my first.)
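        A minimal Python check of the longer pattern on the two sample lines, assuming Python's re module treats these constructs the same way as Notepad++'s Boost-based engine (the names here are illustrative only):

```python
import re

# The two sample lines from the question ("herf" is spelled as in the post).
lines = [
    ' herf="https://website.com/index.php/forum/journals/1649-me?start=3350"',
    ' herf="https://website.com/index.php/forum/journals/1649-me.html?start=3350"',
]

# Same condition as before, but the trailing ".*" also takes everything after
# "start=" to the end of the line, including the closing double quote.
pattern = re.compile(r'=(?!.*html)([^=]*[?]start=.*)')

for line in lines:
    match = pattern.search(line)
    print(match.group(0) if match else None)
```

        For the first line the match now ends with 3350", and the .html line still does not match.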

        Your original regex has two problems: (1) the ^ requires the match to start at the beginning of the line, but you actually wanted it to start at the =; and (2) .*[?]start= will match any number of any characters up to start=, whereas (if I understand correctly) you don’t want any = between the initial = and the start=.

        So I think this is probably what you want (or at least a step in that direction).
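        Both problems can be seen in a small Python sketch, assuming Python's re module behaves like Notepad++'s engine for these constructs. The second input line, with an extra class="x" attribute, is hypothetical and not from the question; it is there only to show .* running across an =. The middle pattern is likewise a hypothetical intermediate step, not something proposed in this thread.

```python
import re

# First sample line from the question ("herf" is spelled as in the post).
sample = ' herf="https://website.com/index.php/forum/journals/1649-me?start=3350"'
# Hypothetical line with an extra attribute before the link (not from the question).
two_attrs = ' class="x" herf="https://website.com/index.php/forum/journals/1649-me?start=3350"'

original = re.compile(r'^(?!.*html)(.*[?]start=)')      # anchored at line start
unanchored = re.compile(r'=(?!.*html)(.*[?]start=)')    # starts at "=", but .* may cross "="
suggested = re.compile(r'=(?!.*html)([^=]*[?]start=)')  # no "=" allowed before "?start="

# Problem (1): "^" makes the match begin at column 0, so " herf" is selected too.
print(original.search(sample).group(0))
# Problem (2): ".*" runs across later "=" signs, so the match begins at class=.
print(unanchored.search(two_attrs).group(0))
# The suggested pattern begins at the "=" directly in front of the link.
print(suggested.search(two_attrs).group(0))
```

        With the original pattern the match starts with " herf", with the middle pattern it starts at the = after class, and only the suggested pattern begins at the = in front of the URL.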

        ----

        Useful References

        • Notepad++ Online User Manual: Searching/Regex
        • FAQ: Where to find other regular expressions (regex) documentation
          2dmnGood4u

          Perfect solution, thanks a lot, you’re the man.

          The Community of users of the Notepad++ text editor.