How to find webpages where a particular link was not added
-
I have a website of 400 pages. I use Notepad++ to edit all of them together, and I do so often. I once added a link to all 400 pages, but I see that the link has not appeared on some of them. The link that was added to most, but not all, of the webpages is
<li><a href="otitis.html">Otitis</a></li>
How can I find the webpages to which this particular link was not added?
-
I think this does what you want:
FIND WHAT =
(?s)\A((?!\Q<li><a href="otitis.html">Otitis</a></li>\E).)*\Z
SEARCH MODE = Regular Expression
Brief explanation: The (?s) makes sure that . will match newline characters. The \A and \Z anchor it to the beginning and end of your document. The \Q...\E say “don’t treat any characters between this as special regex characters, they are just literals”. The (?!...). says “look for any character that isn’t the start of the phrase that you want to be missing”. The (...)* says that there can be zero or more instances of characters that aren’t the start of the phrase. Since the full regex has only “characters that aren’t the start of the phrase” between the start and end of file, it will only find files that never contain the phrase.
I tried with two files, one which had that phrase and one which didn’t, and it only matched the one that didn’t.
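For anyone who wants to check the same idea outside Notepad++, here is a minimal Python sketch. It is an approximation, not what Notepad++ runs internally: Python's re module has no \Q...\E, so re.escape() stands in for it, and re.DOTALL stands in for (?s). The sample strings and the function name lacks_link are made up for illustration.

```python
import re

LINK = '<li><a href="otitis.html">Otitis</a></li>'

# re.escape() replaces \Q...\E; re.DOTALL replaces (?s).
# \A and \Z anchor the match to the whole string, as in the forum regex.
MISSING = re.compile(r'\A(?:(?!' + re.escape(LINK) + r').)*\Z', re.DOTALL)


def lacks_link(text: str) -> bool:
    """Return True when the whole text can be consumed without meeting LINK."""
    return MISSING.match(text) is not None


# Two hypothetical page bodies: one with the link, one without.
with_link = '<ul>\n<li><a href="otalgia.html">Otalgia</a></li>\n' + LINK + '\n</ul>\n'
without_link = '<ul>\n<li><a href="otalgia.html">Otalgia</a></li>\n</ul>\n'

print('with_link lacks the link:', lacks_link(with_link))
print('without_link lacks the link:', lacks_link(without_link))
```

Only the text without the link is matched, which mirrors the two-file test described above.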
-
@PeterJones I purposely avoided adding that link to the pages I found without it, waiting for an answer here. I tried what you suggested above but your RegEx did not find those. It found only <blah blah blah blah>, where blah blah blah blah means strings other than those with <li............> in them.
I have this line just above what I want to be skipped during the search (what I have typed in my original question above): <li><a href="otalgia.html">Otalgia</a><\li>.
So, is it possible to find the webpages in which <li><a href="otalgia.html">Otalgia</a><\li> is not followed by <li><a href="otitis.html">Otitis</a><\li>?
For your information, the <li><a href="otitis.html">Otitis</a><\li> is either on the same line as, or sometimes on the line after, this line: <li><a href="otalgia.html">Otalgia</a><\li>
-
@dr-ramaanand said in How to find webpages where a particular link was not added:
I tried what you suggested above but your RegEx did not find those.
I beg to differ. My regex found every file that did not contain
<li><a href="otitis.html">Otitis</a></li>
It found only <blah blah blah blah>, where blah means strings other than those with <li............> in them.
What do you expect the Find Results to show when you are looking for text that is not in the file? In this case, it showed the first line of the file, because the “match” of “an entire file that does not contain the excluded phrase” starts with that line (and it’s not going to show the whole file).
I have this line just above what I want to be skipped during the search (what I have typed in my original question above): <li><a href="otalgia.html">Otalgia</a><\li>.
So, is it possible to find the webpages in which <li><a href="otalgia.html">Otalgia</a><\li> is not followed by <li><a href="otitis.html">Otitis</a><\li>?…
FIND = \Q<li><a href="otalgia.html">Otalgia</a><\li>\E(?!\R\Q<li><a href="otitis.html">Otitis</a><\li>\E)
This finds any file that has the literal <li><a href="otalgia.html">Otalgia</a><\li> where it is not immediately followed by a newline and then the literal <li><a href="otitis.html">Otitis</a><\li>
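If it helps to see the lookahead behave outside Notepad++, this is a rough Python sketch of the same pattern. Two things are assumptions on my part: Python's re has no \R, so (?:\r\n|\n|\r) stands in for it (it covers fewer line-break characters than \R does), and re.escape() stands in for \Q...\E. The helper name first_not_followed_by and the sample strings are hypothetical. The literals are copied as typed in the question, <\li> included.

```python
import re


def first_not_followed_by(first: str, second: str):
    """Compile: `first`, not immediately followed by a line break plus `second`."""
    newline = r'(?:\r\n|\n|\r)'  # stand-in for \R, which Python's re lacks
    return re.compile(re.escape(first) + r'(?!' + newline + re.escape(second) + r')')


# Literals exactly as they appear in the regex above (raw strings keep the backslash).
OTALGIA = r'<li><a href="otalgia.html">Otalgia</a><\li>'
OTITIS = r'<li><a href="otitis.html">Otitis</a><\li>'
OUTBACK = r'<li><a href="outback.html">Outback</a><\li>'

pattern = first_not_followed_by(OTALGIA, OTITIS)

has_pair = OTALGIA + '\r\n' + OTITIS + '\r\n'    # Otitis on the next line
lacks_pair = OTALGIA + '\r\n' + OUTBACK + '\r\n'  # something else on the next line
same_line = OTALGIA + OTITIS + '\r\n'             # Otitis on the same line

for name, text in [('has_pair', has_pair), ('lacks_pair', lacks_pair), ('same_line', same_line)]:
    print(name, 'is reported:', pattern.search(text) is not None)
```

One thing the sketch makes visible: because the lookahead requires a line break before the Otitis item, a page with both items on the same line is reported as well, even though the link is there.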
I tried with two files,
match.txt
<li><a href="otalgia.html">Otalgia</a><\li>
<li><a href="otitis.html">Otitis</a><\li>
missing.txt
<li><a href="otalgia.html">Otalgia</a><\li>
<li><a href="outback.html">Outback</a><\li>
with the results: it only “found” the file “missing.txt”, which had the Otalgia line without the Otitis line.
Please note that I intentionally included your typo in my regex and the examples. I believe you will actually want to correct it to search for </li>, not the <\li> that you put in your most recent reply, but I decided to give you what you asked for.
-
Yes, that did it @PeterJones
Thank you very much! -
@PeterJones I found some 200 hits in 200 files with your RegEx. Now, is it possible to reproduce what is searched for, that is <li><a href="otalgia.html">Otalgia</a></li> without the <li><a href="otitis.html">Otitis</a></li>, and then add <li><a href="otitis.html">Otitis</a></li> on the next line?
Possibly I can use the RegEx you gave above in the “Find what” field and ${0}<li><a href="otitis.html">Otitis</a></li> in the “Replace with” field.
-
@PeterJones If I put ${0}<li><a href="otitis.html">Otitis</a></li> in the “Replace with” field, I can reproduce what was searched for and add <li><a href="otitis.html">Otitis</a></li> on the same line, but how do I make the <li><a href="otitis.html">Otitis</a></li> appear on the next line?
-
@PeterJones What I finally did was use “Replace in Files” in Regular expression mode, with \Q<li><a href="otalgia.html">Otalgia</a><\li>\E(?!\R\Q<li><a href="otitis.html">Otitis</a><\li>\E) in the “Find what” field and ${0}<li><a href="otitis.html">Otitis</a></li> in the “Replace with” field. Then I put <li><a href="otalgia.html">Otalgia</a></li><li><a href="otitis.html">Otitis</a></li> in the “Find what” field and <li><a href="otalgia.html">Otalgia</a></li>\n<li><a href="otitis.html">Otitis</a></li> in the “Replace with” field, used the “Extended” search mode, and it was done.
However, if the above can be done in one step instead of two, please let me know.
-
@dr-ramaanand said in How to find webpages where a particular link was not added:
However if the above can be done in one step instead of two, please let me know.
My philosophy: if it works in two steps and I understand what it’s doing, great! I don’t go looking for super-complicated regex (which I probably wouldn’t understand if I wanted to be able to use it again) to shorten a procedure from two regex to one. If two works, great!
Now, sometimes I do go exploring the super-complicated when I have the spare time, when it’s a mental challenge that is interesting to me, in order to better learn the tools that regex provide. But having someone hand me that complicated combined regex would eliminate the learning portion, and make it pointless to me.
----
Please note: This Community Forum is not a data transformation service; you should not expect to be able to always say “I have data like X and want it to look like Y” and have us do all the work for you. If you are new to the Forum, and new to regular expressions, we will often give help on the first one or two data-transformation questions, especially if they are well-asked and you show a willingness to learn; and we will point you to the documentation where you can learn how to do the data transformations for yourself in the future. But if you repeatedly ask us to do your work for you, you will find that the patience of usually-helpful Community members wears thin. The best way to learn regular expressions is by experimenting with them yourself, and getting a feel for how they work; having us spoon-feed you the answers without you putting in the effort doesn’t help you in the long term and is uninteresting and annoying for us.
----
-
Okay, thank you @PeterJones
-
@PeterJones I could do it in one step by putting ${0}\r\n<li><a href="otitis.html">Otitis</a></li> in the “Replace with” field and using “Replace All” in Regular expression mode.
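For reference, the one-step replacement can be sketched in Python roughly like this. It is only an illustration on in-memory strings, under a few assumptions: \g<0> plays the role of ${0}, (?:\r\n|\n|\r) stands in for \R, re.escape() stands in for \Q...\E, and the list items use the corrected </li> closing tag. The name add_missing_link and the sample text are hypothetical.

```python
import re

OTALGIA = '<li><a href="otalgia.html">Otalgia</a></li>'
OTITIS = '<li><a href="otitis.html">Otitis</a></li>'
NEWLINE = r'(?:\r\n|\n|\r)'  # stand-in for \R, which Python's re lacks

# Find: the Otalgia item not immediately followed by a line break and the Otitis item.
FIND = re.compile(re.escape(OTALGIA) + r'(?!' + NEWLINE + re.escape(OTITIS) + r')')

# Replace: the whole match (\g<0>, like ${0}), a CRLF line break, then the Otitis item.
REPLACEMENT = r'\g<0>\r\n' + OTITIS


def add_missing_link(text: str) -> str:
    """Insert the Otitis item on the line after Otalgia wherever it is missing."""
    return FIND.sub(REPLACEMENT, text)


before = '<ul>\r\n' + OTALGIA + '\r\n</ul>\r\n'
after = add_missing_link(before)
print(after)
```

Running the function a second time on its own output changes nothing, because the lookahead no longer succeeds once the Otitis item sits on the next line.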