<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[Regex: How can I find those html files with links that are not identical in different places?]]></title><description><![CDATA[<p dir="auto">I have this link at the beginning of an HTML page:</p>
<p dir="auto"><code>&lt;link rel="canonical" href="https://xxx.com/en/page-AAA.html" /&gt;</code></p>
<p dir="auto">I also have another link in the middle of the file:</p>
<p dir="auto"><code> &lt;a href="https://xxx.com/en/page-AAA.html"&gt;&lt;img src="index_files/flag_lang_en.jpg" title="en" alt="en" width="28" height="19" /&gt;&lt;/a&gt;</code></p>
<p dir="auto">As you can see, both links point to the same address, but they appear in different contexts and places.</p>
<p dir="auto">But how can I find those html files with links that are not identical in those different places?</p>
<p dir="auto">Suppose the first link were <code>&lt;link rel="canonical" href="https://xxx.com/en/page-CCC.html" /&gt;</code> instead. The two links would then no longer be identical, so the regex should find that file, because it contains different links.</p>
<p dir="auto">How can I do this with Regex?</p>
]]></description><link>https://community.notepad-plus-plus.org/topic/21610/regex-how-can-i-find-those-html-files-with-links-that-are-not-identical-in-different-places</link><generator>RSS for Node</generator><lastBuildDate>Mon, 13 Apr 2026 18:32:32 GMT</lastBuildDate><atom:link href="https://community.notepad-plus-plus.org/topic/21610.rss" rel="self" type="application/rss+xml"/><pubDate>Thu, 05 Aug 2021 15:27:42 GMT</pubDate><ttl>60</ttl><item><title><![CDATA[Reply to Regex: How can I find those html files with links that are not identical in different places? on Tue, 10 Aug 2021 09:14:10 GMT]]></title><description><![CDATA[<p dir="auto">Hello, <a class="plugin-mentions-user plugin-mentions-a" href="https://community.notepad-plus-plus.org/uid/7753">@robin-cruise</a> and <strong>All</strong>,</p>
<p dir="auto">In the search regex <strong><code>(?s)(https://xxx.com/en/)([^"]+)".+?\1(?!\2").+?"</code></strong> :</p>
<ul>
<li>
<p dir="auto">The regex part <strong><code>https://xxx.com/en/</code></strong> looks for the <strong>literal</strong> string <strong><a href="https://xxx.com/en/" rel="nofollow ugc">https://xxx.com/en/</a></strong>, stored as <strong>group <code>1</code></strong></p>
</li>
<li>
<p dir="auto">The regex part <strong><code>([^"]+)"</code></strong> represents the <strong>remainder</strong> of the internet address ( for instance the string <strong>page-AAA.html</strong> ), followed by a <strong>double-quote</strong>. Here, <strong><code>[^"]+</code></strong> is a <strong>non-null</strong> range of <strong>consecutive</strong> chars, <strong>all different</strong> from <strong><code>"</code></strong>, stored as <strong>group <code>2</code></strong></p>
</li>
<li>
<p dir="auto">Now, the part <strong><code>.+?</code></strong> stands for the <strong>shortest</strong> range of <strong>any</strong> char till…</p>
<ul>
<li>
<p dir="auto">The group <strong><code>1</code></strong> ( <strong><code>\1</code></strong> ), that is, <strong>another</strong> occurrence of the string <strong><a href="https://xxx.com/en/" rel="nofollow ugc">https://xxx.com/en/</a></strong></p>
</li>
<li>
<p dir="auto">Which must be followed by <strong><code>.+?"</code></strong>, which represents the <strong>shortest non-null</strong> range of <strong>any</strong> char before a <strong>double-quote</strong>…</p>
</li>
<li>
<p dir="auto">But <em>ONLY IF</em> this range is <strong>different</strong> from <strong><code>\2</code></strong> ( i.e. <strong>different</strong>, for instance, from the string <strong>page-AAA.html</strong> and a <strong><code>"</code></strong> char )</p>
</li>
</ul>
</li>
</ul>
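<p dir="auto">As a rough illustration of the breakdown above, here is a minimal Python sketch that runs the same regex and prints what each group captures. It uses Python's <code>re</code> module as a stand-in for the Notepad++ regex engine, which is an assumption on my part, and the sample text is hypothetical, reduced to two <code>href</code> attributes:</p>

```python
import re

# The regex from this post, unchanged. (?s) lets "." also match line breaks.
pattern = re.compile(r'(?s)(https://xxx.com/en/)([^"]+)".+?\1(?!\2").+?"')

# Hypothetical sample: two links with different remainders.
text = ('href="https://xxx.com/en/page-CCC.html" middle of the file '
        'href="https://xxx.com/en/page-AAA.html"')

m = pattern.search(text)

print(m.group(1))  # group 1: the literal prefix https://xxx.com/en/
print(m.group(2))  # group 2: the remainder of the first link, page-CCC.html
print(m.group(0))  # whole match: from the first link up to the closing quote of the second
```

<p dir="auto">If both remainders were <strong>page-AAA.html</strong>, the look-ahead <strong><code>(?!\2")</code></strong> would reject the second link and <code>search</code> would return <code>None</code>.</p>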
<p dir="auto">Note also that the <strong><code>[^"]+"</code></strong> syntax, without the parentheses, is more <strong>restrictive</strong> than <strong><code>.+?"</code></strong> and must be preferred because of the <strong>negative</strong> look-ahead <strong><code>(?!\2")</code></strong></p>
<p dir="auto">Best Regards,</p>
<p dir="auto">guy038</p>
]]></description><link>https://community.notepad-plus-plus.org/post/68745</link><guid isPermaLink="true">https://community.notepad-plus-plus.org/post/68745</guid><dc:creator><![CDATA[guy038]]></dc:creator><pubDate>Tue, 10 Aug 2021 09:14:10 GMT</pubDate></item><item><title><![CDATA[Reply to Regex: How can I find those html files with links that are not identical in different places? on Sat, 07 Aug 2021 20:15:18 GMT]]></title><description><![CDATA[<p dir="auto">By the way, <a class="plugin-mentions-user plugin-mentions-a" href="https://community.notepad-plus-plus.org/uid/195">@guy038</a>, can you explain what this part of your regex does?</p>
<p dir="auto"><code>\1(?!\2")</code></p>
]]></description><link>https://community.notepad-plus-plus.org/post/68690</link><guid isPermaLink="true">https://community.notepad-plus-plus.org/post/68690</guid><dc:creator><![CDATA[Robin Cruise]]></dc:creator><pubDate>Sat, 07 Aug 2021 20:15:18 GMT</pubDate></item><item><title><![CDATA[Reply to Regex: How can I find those html files with links that are not identical in different places? on Sat, 07 Aug 2021 16:18:27 GMT]]></title><description><![CDATA[<p dir="auto"><a class="plugin-mentions-user plugin-mentions-a" href="https://community.notepad-plus-plus.org/uid/195">@guy038</a> thanks a lot. You are the best !</p>
]]></description><link>https://community.notepad-plus-plus.org/post/68686</link><guid isPermaLink="true">https://community.notepad-plus-plus.org/post/68686</guid><dc:creator><![CDATA[Robin Cruise]]></dc:creator><pubDate>Sat, 07 Aug 2021 16:18:27 GMT</pubDate></item><item><title><![CDATA[Reply to Regex: How can I find those html files with links that are not identical in different places? on Fri, 06 Aug 2021 14:38:12 GMT]]></title><description><![CDATA[<p dir="auto">Hi, <a class="plugin-mentions-user plugin-mentions-a" href="https://community.notepad-plus-plus.org/uid/7753">@robin-Cruise</a> and <strong>All</strong>,</p>
<p dir="auto">Let’s suppose you have, at <strong>least</strong>, <strong>two</strong> links of the form <strong><code>https://xxx.com/en/••••••••••.••••</code></strong>, where the part <strong><code>••••••••••.••••</code></strong> is <strong>different</strong>.</p>
<p dir="auto">Then, the regex <strong><code>(?s)(https://xxx.com/en/)([^"]+)".+?\1(?!\2").+?"</code></strong> will match the whole range from the first of these <strong>two</strong> links to the second, both links included !</p>
<p dir="auto">Thus, the regex does <strong>not</strong> match anything if <strong>all</strong> the <strong><code>https://xxx.com/en/••••••••••.••••</code></strong> links, in the <strong>current</strong> file, have the same <strong><code>••••••••••.••••</code></strong> part.</p>
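<p dir="auto">Outside Notepad++, the same match / no-match test could be sketched in Python as below. This is only an illustration under assumptions: it relies on Python's <code>re</code> module rather than the Notepad++ engine, and the helper name <code>files_with_mismatched_links</code>, the file names and the page contents are all hypothetical:</p>

```python
import re

# Same regex as above; (?s) lets "." span several lines of the file.
PATTERN = re.compile(r'(?s)(https://xxx.com/en/)([^"]+)".+?\1(?!\2").+?"')


def files_with_mismatched_links(pages):
    """Return the names of the pages where the regex finds two different links.

    pages maps a (hypothetical) file name to the text of that file.
    """
    return [name for name, text in pages.items() if PATTERN.search(text)]


# Hypothetical contents, reduced to the two href attributes of each page.
pages = {
    "same.html": 'href="https://xxx.com/en/page-AAA.html"\nother markup\n'
                 'href="https://xxx.com/en/page-AAA.html"',
    "different.html": 'href="https://xxx.com/en/page-CCC.html"\nother markup\n'
                      'href="https://xxx.com/en/page-AAA.html"',
}

print(files_with_mismatched_links(pages))  # ['different.html']
```

<p dir="auto">To check real files, the dictionary could be filled by reading each file from disk. In Notepad++ itself, running the regex in <strong>Find in Files</strong> plays the same role, as only files with a match are listed.</p>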
<p dir="auto">Best Regards,</p>
<p dir="auto">guy038</p>
]]></description><link>https://community.notepad-plus-plus.org/post/68656</link><guid isPermaLink="true">https://community.notepad-plus-plus.org/post/68656</guid><dc:creator><![CDATA[guy038]]></dc:creator><pubDate>Fri, 06 Aug 2021 14:38:12 GMT</pubDate></item></channel></rss>