HELP: Finding lines with a regex that do not include the word "html" before a given regex condition
-
Fellow Notepad++ Users,
Could you please help me with the following search problem I am having?
Here is the data I currently have (“before” data):
herf="https://website.com/index.php/forum/journals/1649-me?start=3350"
herf="https://website.com/index.php/forum/journals/1649-me.html?start=3350"
Here is how I would like that data to look (“search result” data):
="https://website.com/index.php/forum/journals/1649-me?start=3350
To accomplish this, I have tried using the following Find/Replace expressions and settings:
- Find What =
^(?!.*html)(.*[?]start=)
- Search Mode = REGULAR EXPRESSION
- Dot Matches Newline = NOT CHECKED
As you can see, there are two lines in my sample. I want to use Find What to find only the lines whose links do not contain html before ?start=, and I am searching in bulk across HTML files that were backed up (mirrored) from a website. I also want the expression to be less greedy: it should highlight only from the preceding = symbol up to ?start= and not select anything before that = symbol.
Unfortunately, this did not produce the output I desired, and I’m not sure why. Could you please help me understand what went wrong and help me find the solution?
-
I am not 100% sure I understand what you really want, but
=(?!.*html)([^=]*[?]start=)
will match from the = to the start= (but does not match anything after the start=).
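To sanity-check that expression outside Notepad++, here is a minimal sketch using Python's re module as a stand-in. That substitution is an assumption: Notepad++ uses a different regex engine, although the lookahead and character-class syntax used here should behave the same way in both. The two sample lines come from the question; the helper name first_match is only for illustration.

```python
import re

# Sample lines from the question (the "herf" spelling is kept exactly as posted).
lines = [
    'herf="https://website.com/index.php/forum/journals/1649-me?start=3350"',
    'herf="https://website.com/index.php/forum/journals/1649-me.html?start=3350"',
]

# The first suggested expression: start at an "=", refuse if "html" appears
# later on the same line, then allow no further "=" before "?start=".
pattern = re.compile(r'=(?!.*html)([^=]*[?]start=)')


def first_match(line):
    """Return the matched text for one line, or None if nothing matches."""
    m = pattern.search(line)
    return m.group(0) if m else None


results = [first_match(line) for line in lines]
for line, hit in zip(lines, results):
    print(repr(hit), "<-", line)
```

With these two lines, only the first one (the link without .html) should produce a match, and that match stops right after start=.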
If you also want the rest of the line after the start=, then it would be
=(?!.*html)([^=]*[?]start=.*)
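As a rough check of the longer expression, the sketch below runs it over both sample lines joined into one text, similar to searching a whole file. It uses Python's re module as an assumed stand-in for the Notepad++ engine, with the default setting where the dot does not match a newline (the equivalent of leaving "Dot Matches Newline" unchecked).

```python
import re

# Both sample lines from the question, joined as they would appear in a file.
text = (
    'herf="https://website.com/index.php/forum/journals/1649-me?start=3350"\n'
    'herf="https://website.com/index.php/forum/journals/1649-me.html?start=3350"'
)

# Same as the shorter expression, plus ".*" to take the rest of the line.
pattern_rest = re.compile(r'=(?!.*html)([^=]*[?]start=.*)')

# Collect the full text of every match in the whole input.
hits = [m.group(0) for m in pattern_rest.finditer(text)]
for hit in hits:
    print(hit)
```

Here the single match runs from the = through the closing " of the first link, and the .html link on the second line is skipped.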
(My regex includes the " at the end, which your example “search result data” doesn’t include, but I’m hoping that’s just a typo on your part, and you really wanted the " like I showed in my second regex, or you didn’t want the 3350" at all, like I showed in my first.)

The problem with your original regex is (1) that it required the match to start at the beginning of the line, but you actually wanted it to start at the =, and (2) .*[?]start= will match any number of any character until the start=, whereas (if I understand correctly) you want it to not have any = between the initial = and the start=.
…So I think this is probably what you want (or at least moving in that direction)
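The two problems described above can be reproduced with a short sketch. It uses Python's re module as an assumed stand-in for the Notepad++ engine. The line in two_attrs is a hypothetical example (it is not from the thread) that puts an extra attribute in front of the link, and the name greedy refers to an illustrative variant that drops the ^ but keeps the .* from the original expression.

```python
import re

# First sample line from the question.
sample = 'herf="https://website.com/index.php/forum/journals/1649-me?start=3350"'
# Hypothetical line with an extra attribute before the link (not from the thread).
two_attrs = 'id="a" herf="https://website.com/page?start=1"'

original = re.compile(r'^(?!.*html)(.*[?]start=)')   # expression from the question
greedy = re.compile(r'=(?!.*html)(.*[?]start=)')     # "^" dropped, ".*" kept
fixed = re.compile(r'=(?!.*html)([^=]*[?]start=)')   # expression from the reply

# Problem (1): "^" anchors the match at the start of the line, so "herf" is included.
original_hit = original.search(sample).group(0)

# Problem (2): ".*" may run across an earlier "=", while "[^=]*" may not.
greedy_hit = greedy.search(two_attrs).group(0)
fixed_hit = fixed.search(two_attrs).group(0)

print(original_hit)
print(greedy_hit)
print(fixed_hit)
```

On the hypothetical line, the .* variant starts at the = after id, while the [^=]* version starts at the = directly in front of the link.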