<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[Regex: Find those files that doesn&#x27;t contain the same link in 2 different html tags]]></title><description><![CDATA[<p dir="auto">Good day. Is it possible to use a regex to find those files that don’t contain the same link in 2 different html tags? I have more than 100 html files.</p>
<p dir="auto">I have 2 links:</p>
<p dir="auto"><code>https://mywebsite.com/en/truth.html</code>   and   <code>https://mywebsite.com/en/love.html</code> in two different html tags.</p>
<pre><code>&lt;meta property="og:url" content="https://mywebsite.com/en/truth.html"/&gt;

text text
    
text

&lt;img src="index_files/flag_lang_de.jpg" width="28" height="19" title="de" alt="de" /&gt;&lt;/a&gt;&amp;nbsp; &lt;a href="https://mywebsite.com/en/love.html"&gt;&lt;img src="index_files/flag_lang_ru.jpg" width="28" height="19" title="ru" alt="ru" /&gt;&lt;/a&gt;
</code></pre>
<p dir="auto">I use this formula, but it isn’t working. It finds both links, even if they are the same. It should find only the files that don’t contain the same link in the upper and the lower tag.</p>
<p dir="auto">FIND (with <strong>. matches newline</strong> checked):</p>
<p dir="auto"><code>(&lt;link rel="canonical" href="(.*?)" \/&gt;.*?)(?!(alt="de" /&gt;&lt;/a&gt;&amp;nbsp; &lt;a href=")).*?("&gt;&lt;img src)</code></p>
<p dir="auto">To explain my formula:</p>
<p dir="auto">This can find the first link from the meta tag: <code>(&lt;link rel="canonical" href="(.*?)" \/&gt;.*?)</code></p>
<p dir="auto">This can find the second link from &lt;img tag: <code>(alt="de" /&gt;&lt;/a&gt;&amp;nbsp; &lt;a href=").*?("&gt;&lt;img src)</code></p>
<p dir="auto">and I use <code>?!</code> to exclude the second link.</p>
]]></description><link>https://community.notepad-plus-plus.org/topic/21084/regex-find-those-files-that-doesn-t-contain-the-same-link-in-2-different-html-tags</link><generator>RSS for Node</generator><lastBuildDate>Sat, 09 May 2026 20:23:45 GMT</lastBuildDate><atom:link href="https://community.notepad-plus-plus.org/topic/21084.rss" rel="self" type="application/rss+xml"/><pubDate>Tue, 27 Apr 2021 19:41:30 GMT</pubDate><ttl>60</ttl><item><title><![CDATA[Reply to Regex: Find those files that doesn&#x27;t contain the same link in 2 different html tags on Sun, 02 May 2021 05:48:10 GMT]]></title><description><![CDATA[<p dir="auto">happy Easter, friends.</p>
<p dir="auto">Another solution could be the following:</p>
<ol>
<li>Select the link you want from canonical line: <code>(&lt;link rel="canonical" href=")(.*?)(" \/&gt;)</code></li>
<li>Select the second link from ru section: <code>(alt="de" \/&gt;&lt;/a&gt;&amp;nbsp; &lt;a href=")(.*?)(&gt;&lt;img src="index_files\/flag_lang_ru)</code></li>
<li>Combine these 2 regexes, joined by <code>(.*?)</code>, and put <code>(\2)</code> in the second link, after <code>(.*?)</code> (<code>\2</code> refers back to the second group, that is, the link in the canonical line)</li>
</ol>
<p dir="auto">So the regex becomes: <code>(&lt;link rel="canonical" href=")(.*?)(" \/&gt;)(.*?)(alt="de" \/&gt;&lt;/a&gt;&amp;nbsp; &lt;a href=")(.*?)(\2)(&gt;&lt;img src="index_files\/flag_lang_ru)</code></p>
<p dir="auto">Alternatively, we can try <code>(?!\2)</code> instead of <code>(\2)</code> and run a <strong>FIND</strong> with <strong>. matches newline</strong> checked.</p>
<p dir="auto">So the regex becomes: <code>(&lt;link rel="canonical" href=")(.*?)(" \/&gt;)(.*?)(alt="de" \/&gt;&lt;/a&gt;&amp;nbsp; &lt;a href=")(.*?)(?!\2)(&gt;&lt;img src="index_files\/flag_lang_ru)</code></p>
<p dir="auto">I don’t know why it is not working. I believe my thinking was correct. :)</p>
<pre><code>&lt;link rel="canonical" href="https://mywebsite.com/en/truth.html" /&gt;

&lt;meta name="copyright" content="me, https://mywebsite.com/"/&gt;
&lt;link rel="sitemap" type="application/rss+xml" href="rss.xml" /&gt; 
&lt;link rel="image_src" type="image/jpeg" href="https://mywebsite.com/icon-facebook.jpg" style="display:none"/&gt;    
&lt;meta itemprop="image" content="https://mywebsite.com/icon-facebook.jpg"/&gt;
&lt;meta property="og:image" content="https://mywebsite.com/icon-facebook.jpg"/&gt;
&lt;meta property="og:type"  content="article" /&gt;
&lt;meta property="fb:app_id" content="2156440"/&gt;
&lt;meta property="fb:admins" content="16454242"/&gt;
&lt;meta property="og:url" content="https://mywebsite.com/en/other-car.html"/&gt;

&lt;body&gt;

TEXT TEXT

&lt;div class="search"&gt;
                &lt;div align="left"&gt;

                  &lt;a href="https://mywebsite.com/hope.html"&gt;&lt;img src="index_files/flag_lang_ro.jpg" title="ro" alt="ro" width="28" height="19" /&gt;&lt;/a&gt;&amp;nbsp; &lt;a href="https://mywebsite.com/fr/book.html"&gt;&lt;img src="index_files/flag_lang_fr.jpg" title="fr" alt="fr" width="28" height="19" /&gt;&lt;/a&gt;&amp;nbsp; &lt;a href="https://mywebsite.com/en/truth.html"&gt;&lt;img src="index_files/flag_lang_en.jpg" title="en" alt="en" width="28" height="19" /&gt;&lt;/a&gt;&amp;nbsp; &lt;a href="https://mywebsite.com/es/green.html"&gt;&lt;img src="index_files/flag_lang_es.jpg" title="es" alt="es" width="28" height="19" /&gt;&lt;/a&gt;&amp;nbsp; &lt;a href="https://mywebsite.com/pt/yellow.html"&gt;&lt;img src="index_files/flag_lang_pt.jpg" title="pt" alt="pt" width="28" height="19" /&gt;&lt;/a&gt;&amp;nbsp; &lt;a href="https://mywebsite.com/ar/truth.html"&gt;&lt;img src="index_files/flag_lang_ae.jpg" width="28" height="19" title="ar" alt="ar" /&gt;&lt;/a&gt;&amp;nbsp; &lt;a href="https://mywebsite.com/zh/truth.html"&gt;&lt;img src="index_files/flag_lang_zh.jpg" width="28" height="19" title="zh" alt="zh" /&gt;&lt;/a&gt;&amp;nbsp; &lt;a href="https://mywebsite.com/hi/truth.html"&gt;&lt;img src="index_files/flag_lang_hi.jpg" width="28" height="19" title="hi" alt="hi" /&gt;&lt;/a&gt;&amp;nbsp; &lt;a href="https://mywebsite.com/de/truth.html"&gt;&lt;img src="index_files/flag_lang_de.jpg" width="28" height="19" title="de" alt="de" /&gt;&lt;/a&gt;&amp;nbsp; &lt;a href="https://mywebsite.com/ru/truth.html"&gt;&lt;img src="index_files/flag_lang_ru.jpg" width="28" height="19" title="ru" alt="ru" /&gt;&lt;/a&gt;

TEXT TEXT


&lt;div id="pixxell"&gt; &lt;a href="https://mywebsite.com/en/book-miracle.html"&gt;I find a miracle &lt;/div&gt;

TEXT TEXT
</code></pre>
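<p dir="auto">A possible explanation, sketched with Python's <code>re</code> module rather than Notepad++'s own engine (an assumption; the sample strings and the names <code>sample</code>, <code>POSTED</code> and <code>FIXED</code> are hypothetical). In <code>(.*?)(?!\2)(&gt;&lt;img</code> the lookahead is tested just before <code>&gt;&lt;img</code>, after the lazy group has already consumed the second link, so it never sees the canonical URL and always succeeds. Testing <code>(?!\2")</code> directly after <code>href="</code>, and capturing the canonical link with <code>[^"]+</code> so that it cannot stretch over later tags, appears to give the intended behaviour:</p>

```python
import re

def sample(canon, second):
    """Build a shortened, hypothetical page: canonical link + link after the "de" flag."""
    return (
        f'<link rel="canonical" href="{canon}" />\n'
        '<link rel="sitemap" type="application/rss+xml" href="rss.xml" />\n'
        'TEXT TEXT\n'
        'title="de" alt="de" /></a>&nbsp; '
        f'<a href="{second}"><img src="index_files/flag_lang_ru.jpg" />'
    )

SAME = sample("https://mywebsite.com/en/truth.html", "https://mywebsite.com/en/truth.html")
DIFF = sample("https://mywebsite.com/en/truth.html", "https://mywebsite.com/ru/truth.html")

# Pattern as posted: the lookahead sits after the lazy group, right before '><img'.
POSTED = re.compile(
    r'(<link rel="canonical" href=")(.*?)(" />)(.*?)'
    r'(alt="de" /></a>&nbsp; <a href=")(.*?)(?!\2)(><img src="index_files/flag_lang_ru)',
    re.DOTALL,
)

# Sketch of a fix: lookahead right after href=", canonical link limited to [^"]+
# so that backtracking cannot extend group 2 up to a later '" />'.
FIXED = re.compile(
    r'(<link rel="canonical" href=")([^"]+)(" />)(.*?)'
    r'(alt="de" /></a>&nbsp; <a href=")(?!\2")(.*?)(><img src="index_files/flag_lang_ru)',
    re.DOTALL,
)

for name, text in (("same link", SAME), ("different link", DIFF)):
    print(name, "| posted:", bool(POSTED.search(text)), "| fixed:", bool(FIXED.search(text)))
```

<p dir="auto">With these samples the posted pattern matches in both cases, while the adjusted one matches only when the two links differ.</p>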
]]></description><link>https://community.notepad-plus-plus.org/post/65516</link><guid isPermaLink="true">https://community.notepad-plus-plus.org/post/65516</guid><dc:creator><![CDATA[Vasile Caraus]]></dc:creator><pubDate>Sun, 02 May 2021 05:48:10 GMT</pubDate></item><item><title><![CDATA[Reply to Regex: Find those files that doesn&#x27;t contain the same link in 2 different html tags on Thu, 29 Apr 2021 14:01:08 GMT]]></title><description><![CDATA[<p dir="auto"><a class="plugin-mentions-user plugin-mentions-a" href="/user/hellena-crainicu" aria-label="Profile: Hellena-Crainicu">@<bdi>Hellena-Crainicu</bdi></a> said in <a href="/post/65443">Regex: Find those files that doesn't contain the same link in 2 different html tags</a>:</p>
<blockquote>
<p dir="auto">The problem with this forum is that I cannot edit a post again after a couple of minutes, and I forgot to change it.</p>
</blockquote>
<p dir="auto">It’s good to have the posting history, exactly as it is.<br />
That way, later posts will make sense.<br />
If earlier posts could change, I think we’d totally have a mess in some of the threads here (and probably this thread is a great example of that).<br />
If you have new/corrected information, just add an additional post.</p>
<p dir="auto">BUT…be aware that those helping you are putting a lot of time/effort into it.<br />
So you really should think hard about what you are posting and try to get it right the first time, to avoid others wasting their time.<br />
Sure, errors happen, but there’s a difference between an honest mistake and someone that just hasn’t bothered to think things through enough.</p>
]]></description><link>https://community.notepad-plus-plus.org/post/65461</link><guid isPermaLink="true">https://community.notepad-plus-plus.org/post/65461</guid><dc:creator><![CDATA[Alan Kilborn]]></dc:creator><pubDate>Thu, 29 Apr 2021 14:01:08 GMT</pubDate></item><item><title><![CDATA[Reply to Regex: Find those files that doesn&#x27;t contain the same link in 2 different html tags on Thu, 29 Apr 2021 06:47:15 GMT]]></title><description><![CDATA[<p dir="auto">hello <a class="plugin-mentions-user plugin-mentions-a" href="/user/guy038" aria-label="Profile: guy038">@<bdi>guy038</bdi></a>  Was my mistake, should be: <code>&lt;a href="https://mywebsite.com/ru/truth.html</code></p>
<pre><code>&lt;a href="https://mywebsite.com/hope.html"&gt;&lt;img src="index_files/flag_lang_ro.jpg" title="ro" alt="ro" width="28" height="19" /&gt;&lt;/a&gt;&amp;nbsp; &lt;a href="https://mywebsite.com/fr/book.html"&gt;&lt;img src="index_files/flag_lang_fr.jpg" title="fr" alt="fr" width="28" height="19" /&gt;&lt;/a&gt;&amp;nbsp; &lt;a href="https://mywebsite.com/en/truth.html"&gt;&lt;img src="index_files/flag_lang_en.jpg" title="en" alt="en" width="28" height="19" /&gt;&lt;/a&gt;&amp;nbsp; &lt;a href="https://mywebsite.com/es/green.html"&gt;&lt;img src="index_files/flag_lang_es.jpg" title="es" alt="es" width="28" height="19" /&gt;&lt;/a&gt;&amp;nbsp; &lt;a href="https://mywebsite.com/pt/yellow.html"&gt;&lt;img src="index_files/flag_lang_pt.jpg" title="pt" alt="pt" width="28" height="19" /&gt;&lt;/a&gt;&amp;nbsp; &lt;a href="https://mywebsite.com/ar/truth.html"&gt;&lt;img src="index_files/flag_lang_ae.jpg" width="28" height="19" title="ar" alt="ar" /&gt;&lt;/a&gt;&amp;nbsp; &lt;a href="https://mywebsite.com/zh/truth.html"&gt;&lt;img src="index_files/flag_lang_zh.jpg" width="28" height="19" title="zh" alt="zh" /&gt;&lt;/a&gt;&amp;nbsp; &lt;a href="https://mywebsite.com/hi/truth.html"&gt;&lt;img src="index_files/flag_lang_hi.jpg" width="28" height="19" title="hi" alt="hi" /&gt;&lt;/a&gt;&amp;nbsp; &lt;a href="https://mywebsite.com/de/truth.html"&gt;&lt;img src="index_files/flag_lang_de.jpg" width="28" height="19" title="de" alt="de" /&gt;&lt;/a&gt;&amp;nbsp; &lt;a href="https://mywebsite.com/ru/truth.html"&gt;&lt;img src="index_files/flag_lang_ru.jpg" width="28" height="19" title="ru" alt="ru" /&gt;&lt;/a&gt;
</code></pre>
<p dir="auto">So my last regex should be: <code>alt="de" /&gt;&lt;/a&gt;&amp;nbsp; &lt;a href="https://mywebsite.com/(?!ru)</code></p>
<p dir="auto">The problem with this forum is that I cannot edit a post again after a couple of minutes, and I forgot to change it. Of course, I didn’t think anyone would be interested anymore.</p>
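<p dir="auto">A small sketch of what that negative lookahead does, written for Python's <code>re</code> module as an assumption (Notepad++ uses its own regex engine, but <code>(?!…)</code> behaves the same way here). The two sample strings and the name <code>NOT_RU</code> are hypothetical:</p>

```python
import re

# Regex from the post (dot escaped): the link after the "de" flag
# must NOT continue with "ru" right after the domain.
# Note: (?!ru) would also reject a page such as "rules.html"; (?!ru/) is stricter.
NOT_RU = re.compile(r'alt="de" /></a>&nbsp; <a href="https://mywebsite\.com/(?!ru)')

translated = 'alt="de" /></a>&nbsp; <a href="https://mywebsite.com/ru/truth.html">'
forgotten = 'alt="de" /></a>&nbsp; <a href="https://mywebsite.com/en/truth.html">'

print(bool(NOT_RU.search(translated)))  # False: the link already points to /ru/
print(bool(NOT_RU.search(forgotten)))   # True: the link still points elsewhere
```

<p dir="auto">So a "Find in Files" hit with this regex would flag only the files whose link after the German flag does not start with <code>ru</code>.</p>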
]]></description><link>https://community.notepad-plus-plus.org/post/65443</link><guid isPermaLink="true">https://community.notepad-plus-plus.org/post/65443</guid><dc:creator><![CDATA[Hellena Crainicu]]></dc:creator><pubDate>Thu, 29 Apr 2021 06:47:15 GMT</pubDate></item><item><title><![CDATA[Reply to Regex: Find those files that doesn&#x27;t contain the same link in 2 different html tags on Wed, 28 Apr 2021 21:39:34 GMT]]></title><description><![CDATA[<p dir="auto"><a class="plugin-mentions-user plugin-mentions-a" href="/user/hellena-crainicu" aria-label="Profile: Hellena-Crainicu">@<bdi>Hellena-Crainicu</bdi></a> said in <a href="/post/65404">Regex: Find those files that doesn't contain the same link in 2 different html tags</a>:</p>
<blockquote>
<p dir="auto">I want to check whether there is any link that I omitted (in the German section, “de”); the link for the German section is found only in the line with &lt;img src =</p>
</blockquote>
<p dir="auto">At this point (thanks <a class="plugin-mentions-user plugin-mentions-a" href="/user/alan-kilborn" aria-label="Profile: Alan-Kilborn">@<bdi>Alan-Kilborn</bdi></a>)  I’m about to throw in the towel (that means give up). I don’t think even with your latest post I’m entirely clear on what you need to do.</p>
<p dir="auto">I understand you have the first line in the set which has the <code>&lt;link rel="canonical" href="https://mywebsite.com/en/truth.html" /&gt;</code>. First question: which part of the link are you testing against? Is it <code>mywebsite.com/en/truth.html</code> or just <code>truth.html</code>?</p>
<p dir="auto">I get that further into the set you have duplicate information, each with a language mentioned: fr, en, es, pt, ar, zh, hi, de, ru. Second question: is it that you only want to test the https reference with the <code>de</code> portion?</p>
<p dir="auto">A third question: in each html file, is there just 1 set of the "link rel=… to …? I ask because even the last example suggested that it was not complete. There are the opening tags <code>&lt;div class="search"&gt;</code> and <code>&lt;div align="left"&gt;</code>, yet no closing tags (<code>&lt;/div&gt;</code>) appear. We need to be clear on what content exists if the example isn’t complete. What other data have you excluded? You have already excluded data which you thought irrelevant, yet later you realised it was important. Anything which appears between the start point and the end point of the test (so between the 2 https references) is relevant and must be included to give the best chance of supplying a workable solution.</p>
<p dir="auto">Please answer all 3 questions precisely. If unable to, then the towel gets chucked and I’m out. And as <a class="plugin-mentions-user plugin-mentions-a" href="/user/guy038" aria-label="Profile: guy038">@<bdi>guy038</bdi></a> asks, your original regex where you look for the <code>alt="de"</code> string followed by https reference is clearly wrong as the <code>alt='de'</code> appears after the assigned https reference. It’s good that someone else spotted that as I was reading the example multiple times wondering how it should have worked with your regexes.</p>
<p dir="auto">One parting suggestion. It’s almost at the point where making copies of all the files and editing each to remove unneeded portions would greatly simplify the resulting regex to do the actual test. You would leave enough unique information so that the relevant section (if more than 1 in each file)  would show you where to look in the appropriate original file so you could perform the necessary edit to fix the link.</p>
<p dir="auto">Terry</p>
]]></description><link>https://community.notepad-plus-plus.org/post/65437</link><guid isPermaLink="true">https://community.notepad-plus-plus.org/post/65437</guid><dc:creator><![CDATA[Terry R]]></dc:creator><pubDate>Wed, 28 Apr 2021 21:39:34 GMT</pubDate></item><item><title><![CDATA[Reply to Regex: Find those files that doesn&#x27;t contain the same link in 2 different html tags on Wed, 28 Apr 2021 21:09:32 GMT]]></title><description><![CDATA[<p dir="auto">Hi, <a class="plugin-mentions-user plugin-mentions-a" href="/user/hellena-crainicu" aria-label="Profile: hellena-crainicu">@<bdi>hellena-crainicu</bdi></a>, <a class="plugin-mentions-user plugin-mentions-a" href="/user/terry-r" aria-label="Profile: terry-r">@<bdi>terry-r</bdi></a>, <a class="plugin-mentions-user plugin-mentions-a" href="/user/alan-kilborn" aria-label="Profile: alan-kilborn">@<bdi>alan-kilborn</bdi></a> and <strong>All</strong>,</p>
<p dir="auto"><a class="plugin-mentions-user plugin-mentions-a" href="/user/hellena-crainicu" aria-label="Profile: hellena-crainicu">@<bdi>hellena-crainicu</bdi></a>, in <a href="https://community.notepad-plus-plus.org/post/65394">this</a> post, you provided an <strong><code>HTML</code></strong> text which contained this <strong>very long</strong> line :</p>
<pre><code class="language-html">&lt;a href="https://mywebsite.com/hope.html"&gt;&lt;img src="index_files/flag_lang_ro.jpg" title="ro" alt="ro" width="28" height="19" /&gt;&lt;/a&gt;&amp;nbsp; &lt;a href="https://mywebsite.com/fr/book.html"&gt;&lt;img src="index_files/flag_lang_fr.jpg" title="fr" alt="fr" width="28" height="19" /&gt;&lt;/a&gt;&amp;nbsp; &lt;a href="https://mywebsite.com/en/truth.html"&gt;&lt;img src="index_files/flag_lang_en.jpg" title="en" alt="en" width="28" height="19" /&gt;&lt;/a&gt;&amp;nbsp; &lt;a href="https://mywebsite.com/es/green.html"&gt;&lt;img src="index_files/flag_lang_es.jpg" title="es" alt="es" width="28" height="19" /&gt;&lt;/a&gt;&amp;nbsp; &lt;a href="https://mywebsite.com/pt/yellow.html"&gt;&lt;img src="index_files/flag_lang_pt.jpg" title="pt" alt="pt" width="28" height="19" /&gt;&lt;/a&gt;&amp;nbsp; &lt;a href="https://mywebsite.com/ar/truth.html"&gt;&lt;img src="index_files/flag_lang_ae.jpg" width="28" height="19" title="ar" alt="ar" /&gt;&lt;/a&gt;&amp;nbsp; &lt;a href="https://mywebsite.com/zh/truth.html"&gt;&lt;img src="index_files/flag_lang_zh.jpg" width="28" height="19" title="zh" alt="zh" /&gt;&lt;/a&gt;&amp;nbsp; &lt;a href="https://mywebsite.com/hi/truth.html"&gt;&lt;img src="index_files/flag_lang_hi.jpg" width="28" height="19" title="hi" alt="hi" /&gt;&lt;/a&gt;&amp;nbsp; &lt;a href="https://mywebsite.com/de/truth.html"&gt;&lt;img src="index_files/flag_lang_de.jpg" width="28" height="19" title="de" alt="de" /&gt;&lt;/a&gt;&amp;nbsp; &lt;a href="https://mywebsite.com/ru/truth.html"&gt;&lt;img src="index_files/flag_lang_ru.jpg" width="28" height="19" title="ru" alt="ru" /&gt;&lt;/a&gt;
</code></pre>
<p dir="auto">In order to <strong>better</strong> see all the <strong>contents</strong> of this <strong>loooooong</strong> line, I <strong>split</strong> it into <strong><code>10</code></strong> lines, corresponding to your <strong><code>10</code></strong> languages !</p>
<pre><code class="language-html">&lt;a href="https://mywebsite.com/hope.html"&gt;&lt;img src="index_files/flag_lang_ro.jpg" title="ro" alt="ro" width="28" height="19" /&gt;&lt;/a&gt;&amp;nbsp; 
&lt;a href="https://mywebsite.com/fr/book.html"&gt;&lt;img src="index_files/flag_lang_fr.jpg" title="fr" alt="fr" width="28" height="19" /&gt;&lt;/a&gt;&amp;nbsp; 
&lt;a href="https://mywebsite.com/en/truth.html"&gt;&lt;img src="index_files/flag_lang_en.jpg" title="en" alt="en" width="28" height="19" /&gt;&lt;/a&gt;&amp;nbsp; 
&lt;a href="https://mywebsite.com/es/green.html"&gt;&lt;img src="index_files/flag_lang_es.jpg" title="es" alt="es" width="28" height="19" /&gt;&lt;/a&gt;&amp;nbsp; 
&lt;a href="https://mywebsite.com/pt/yellow.html"&gt;&lt;img src="index_files/flag_lang_pt.jpg" title="pt" alt="pt" width="28" height="19" /&gt;&lt;/a&gt;&amp;nbsp; 
&lt;a href="https://mywebsite.com/ar/truth.html"&gt;&lt;img src="index_files/flag_lang_ae.jpg" width="28" height="19" title="ar" alt="ar" /&gt;&lt;/a&gt;&amp;nbsp; 
&lt;a href="https://mywebsite.com/zh/truth.html"&gt;&lt;img src="index_files/flag_lang_zh.jpg" width="28" height="19" title="zh" alt="zh" /&gt;&lt;/a&gt;&amp;nbsp; 
&lt;a href="https://mywebsite.com/hi/truth.html"&gt;&lt;img src="index_files/flag_lang_hi.jpg" width="28" height="19" title="hi" alt="hi" /&gt;&lt;/a&gt;&amp;nbsp; 
&lt;a href="https://mywebsite.com/de/truth.html"&gt;&lt;img src="index_files/flag_lang_de.jpg" width="28" height="19" title="de" alt="de" /&gt;&lt;/a&gt;&amp;nbsp; 
&lt;a href="https://mywebsite.com/ru/truth.html"&gt;&lt;img src="index_files/flag_lang_ru.jpg" width="28" height="19" title="ru" alt="ru" /&gt;&lt;/a&gt;
</code></pre>
<p dir="auto">Now, if we <strong>isolate</strong> the line, relative to <strong>German</strong>, we get :</p>
<pre><code class="language-html">&lt;a href="https://mywebsite.com/de/truth.html"&gt;&lt;img src="index_files/flag_lang_de.jpg" width="28" height="19" title="de" alt="de" /&gt;&lt;/a&gt;&amp;nbsp; 
</code></pre>
<p dir="auto">Apparently, in that line, the link <strong><code>&lt;a href="https://mywebsite.com/de/truth.html"&gt;</code></strong> seems to occur <strong>first</strong> and the part <strong><code>alt="de"</code></strong> seems to occur <strong>later</strong> !</p>
<p dir="auto">Now the regex of your <strong>last</strong> post seems to search, <strong>first</strong>, for the <strong><code>alt="de" /&gt;&lt;/a&gt;&amp;nbsp;</code></strong> string, followed by a <strong>space</strong> char and then for the <strong>beginning</strong> of the <strong><code>a</code></strong> tag : <strong><code>&lt;a href="https://mywebsite.com/(?!de)</code></strong></p>
<p dir="auto">So, exactly the <strong>opposite</strong> of your <strong>previous</strong> example !? Again, I’m <strong>totally</strong> confused ! Could you provide us with an <strong>exact</strong> and <strong>real</strong> example ?</p>
<p dir="auto">Best Regards,</p>
<p dir="auto">guy038</p>
]]></description><link>https://community.notepad-plus-plus.org/post/65436</link><guid isPermaLink="true">https://community.notepad-plus-plus.org/post/65436</guid><dc:creator><![CDATA[guy038]]></dc:creator><pubDate>Wed, 28 Apr 2021 21:09:32 GMT</pubDate></item><item><title><![CDATA[Reply to Regex: Find those files that doesn&#x27;t contain the same link in 2 different html tags on Wed, 28 Apr 2021 20:23:55 GMT]]></title><description><![CDATA[<p dir="auto">I believe it would be much simpler to use the <code>?!</code> operator.</p>
<p dir="auto">Find with Regex:</p>
<p dir="auto"><code>alt="de" /&gt;&lt;/a&gt;&amp;nbsp; &lt;a href="https://mywebsite.com/(?!de)</code></p>
<p dir="auto">because the important thing was not to have the same link as the English part. Once it does not have EN, it means that it is not the same. ;)</p>
]]></description><link>https://community.notepad-plus-plus.org/post/65435</link><guid isPermaLink="true">https://community.notepad-plus-plus.org/post/65435</guid><dc:creator><![CDATA[Hellena Crainicu]]></dc:creator><pubDate>Wed, 28 Apr 2021 20:23:55 GMT</pubDate></item><item><title><![CDATA[Reply to Regex: Find those files that doesn&#x27;t contain the same link in 2 different html tags on Wed, 28 Apr 2021 12:23:18 GMT]]></title><description><![CDATA[<p dir="auto"><a class="plugin-mentions-user plugin-mentions-a" href="/user/guy038" aria-label="Profile: guy038">@<bdi>guy038</bdi></a> said in <a href="/post/65397">Regex: Find those files that doesn't contain the same link in 2 different html tags</a>:</p>
<blockquote>
<p dir="auto">I’m really sorry but I still don’t understand what is your goal !</p>
</blockquote>
<p dir="auto">I could see early on that this was where this thread was going to go; sometimes you can just “spot them”. :-)<br />
Kudos to <a class="plugin-mentions-user plugin-mentions-a" href="/user/guy038" aria-label="Profile: guy038">@<bdi>guy038</bdi></a> and <a class="plugin-mentions-user plugin-mentions-a" href="/user/terry-r" aria-label="Profile: Terry-R">@<bdi>Terry-R</bdi></a> for carrying things on…</p>
]]></description><link>https://community.notepad-plus-plus.org/post/65407</link><guid isPermaLink="true">https://community.notepad-plus-plus.org/post/65407</guid><dc:creator><![CDATA[Alan Kilborn]]></dc:creator><pubDate>Wed, 28 Apr 2021 12:23:18 GMT</pubDate></item><item><title><![CDATA[Reply to Regex: Find those files that doesn&#x27;t contain the same link in 2 different html tags on Wed, 28 Apr 2021 11:07:21 GMT]]></title><description><![CDATA[<p dir="auto"><a class="plugin-mentions-user plugin-mentions-a" href="/user/guy038" aria-label="Profile: guy038">@<bdi>guy038</bdi></a> and <a class="plugin-mentions-user plugin-mentions-a" href="/user/terry-r" aria-label="Profile: Terry-R">@<bdi>Terry-R</bdi></a></p>
<p dir="auto">There are 2 particular lines, which are not repeated:</p>
<p dir="auto"><code>&lt;link rel="canonical" href="https://mywebsite.com/en/truth.html" /&gt;</code></p>
<p dir="auto">and</p>
<p dir="auto"><code>&lt;img src="index_files/flag_lang_de.jpg" width="28" height="19" title="de" alt="de" /&gt;&lt;/a&gt;&amp;nbsp; &lt;a href="https://mywebsite.com/en/love.html"&gt;&lt;img src="index_files/flag_lang_ru.jpg" width="28" height="19" title="ru" alt="ru" /&gt;&lt;/a&gt;</code></p>
<p dir="auto">So, to be better understood: I translated the site into ten languages.</p>
<p dir="auto">For the French section I have <code>href="https://mywebsite.com/fr/truth.html"&gt;</code><br />
For the Russian section I have <code>href="https://mywebsite.com/ru/truth.html"&gt;</code></p>
<p dir="auto">and so on. See those little De / FR / Ru / Hi / Ar …</p>
<p dir="auto">I want to check whether there is any link that I omitted (in the German section, “de”); the link for the German section is found only in the line with <code>&lt;img src =</code></p>
<p dir="auto">Note that the links in English and German are the same; only the content of the html files is different.</p>
<p dir="auto">The line with <code>&lt;canonical</code> represents the page in English.  For example: <code>https://mywebsite.com/en/truth.html</code></p>
<p dir="auto">The <code>&lt;img src=.. tag</code> also has a link, but that link must be <code>https://mywebsite.com/de/truth.html</code>, not the same as the canonical <code>https://mywebsite.com/en/truth.html</code></p>
<p dir="auto">That is why I must find those files that don’t contain the same link in the canonical tag and in the &lt;img …de&gt; tag. If they are identical, it means that I forgot to translate the German section ( /de/ ), because I copied the file from the English one and translated only the text.</p>
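<p dir="auto">As an alternative to a single regex, the check described above could be sketched as a small Python script. This is only a sketch under assumptions: it supposes, as in the longer sample posted in this thread, that each flag image sits inside its own <code>&lt;a href="…"&gt;</code> tag, and the function names <code>flag_link</code>, <code>untranslated</code> and <code>scan</code> are hypothetical.</p>

```python
import re
from pathlib import Path

# Link inside the <link rel="canonical" ...> tag.
CANONICAL = re.compile(r'<link\s+rel="canonical"\s+href="([^"]+)"')

def flag_link(html, lang):
    """Return the href of the <a> that directly wraps flag_lang_<lang>.jpg, or None."""
    pattern = (r'<a href="([^"]+)"><img src="index_files/flag_lang_'
               + re.escape(lang) + r'\.jpg"')
    m = re.search(pattern, html)
    return m.group(1) if m else None

def untranslated(html, lang="de"):
    """True when the flag link for `lang` is identical to the canonical link."""
    canon = CANONICAL.search(html)
    link = flag_link(html, lang)
    return bool(canon and link and canon.group(1) == link)

def scan(folder, lang="de"):
    """Names of the *.html files in `folder` whose flag link repeats the canonical link."""
    return sorted(
        p.name for p in Path(folder).glob("*.html")
        if untranslated(p.read_text(encoding="utf-8", errors="replace"), lang)
    )
```

<p dir="auto">Calling <code>scan("path/to/site")</code> (a placeholder path) would then list the files where the German link was apparently left identical to the English canonical link.</p>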
]]></description><link>https://community.notepad-plus-plus.org/post/65404</link><guid isPermaLink="true">https://community.notepad-plus-plus.org/post/65404</guid><dc:creator><![CDATA[Hellena Crainicu]]></dc:creator><pubDate>Wed, 28 Apr 2021 11:07:21 GMT</pubDate></item><item><title><![CDATA[Reply to Regex: Find those files that doesn&#x27;t contain the same link in 2 different html tags on Wed, 28 Apr 2021 10:19:55 GMT]]></title><description><![CDATA[<p dir="auto"><a class="plugin-mentions-user plugin-mentions-a" href="/user/hellena-crainicu" aria-label="Profile: Hellena-Crainicu">@<bdi>Hellena-Crainicu</bdi></a> said in <a href="/post/65395">Regex: Find those files that doesn't contain the same link in 2 different html tags</a>:</p>
<blockquote>
<p dir="auto">so, the second tag &lt;img src=…&gt; is extracted from the &lt;div class=“search”&gt; section</p>
</blockquote>
<p dir="auto">So hopefully you now understand providing us the most accurate information possible is key to getting a workable solution.</p>
<p dir="auto">With my regex I think all you need to add is <code>&lt;/a&gt;&amp;nbsp; &lt;a href="</code> directly in front of the second <code>https</code> string. I’m not on a PC currently, instead typing on a smartphone. But if you can try adding these characters I think my regex will work. It’s all a matter of getting the regex to consume characters up until the https tag we wish to check against. This adjustment should get you there. That is unless there are other <code>nbsp</code> in between. But then your own regex was using the <code>nbsp</code> as well so I feel confident.</p>
<p dir="auto">Terry</p>
]]></description><link>https://community.notepad-plus-plus.org/post/65401</link><guid isPermaLink="true">https://community.notepad-plus-plus.org/post/65401</guid><dc:creator><![CDATA[Terry R]]></dc:creator><pubDate>Wed, 28 Apr 2021 10:19:55 GMT</pubDate></item><item><title><![CDATA[Reply to Regex: Find those files that doesn&#x27;t contain the same link in 2 different html tags on Wed, 28 Apr 2021 09:20:01 GMT]]></title><description><![CDATA[<p dir="auto">Hi, <a class="plugin-mentions-user plugin-mentions-a" href="/user/hellena-crainicu" aria-label="Profile: hellena-crainicu">@<bdi>hellena-crainicu</bdi></a>, <a class="plugin-mentions-user plugin-mentions-a" href="/user/terry-r" aria-label="Profile: terry-r">@<bdi>terry-r</bdi></a> and <strong>All</strong>,</p>
<p dir="auto">I’m really <strong>sorry</strong> but I <strong>still</strong> don’t understand what is your <strong>goal</strong> !</p>
<ul>
<li>First, my <strong>last</strong> regex <strong><code>(?s)&lt;link\h+rel="canonical"\h*\Khref="([^"]+)"((?!&lt;link).)+?&lt;a href="(?!\1).+?"</code></strong>, unlike you said, does <strong>not</strong> match anything against your <strong>last</strong> text, even if I <strong>remove</strong> the parts <strong><code>TEXT TEXT</code></strong> !? Anyway, I don’t care about it as the <strong>next</strong> regex version will surely be very <strong>different</strong> !</li>
</ul>
<hr />
<p dir="auto">Now, the <strong>important</strong> points are :</p>
<ul>
<li><strong>Firstly</strong> : Does the <strong><code>HTML</code></strong> text that you provided, and which is repeated below, represent a <strong>real</strong> part of your <strong><code>HTML</code></strong> files ?</li>
</ul>
<pre><code class="language-html">&lt;link rel="canonical" href="https://mywebsite.com/en/truth.html" /&gt;

&lt;meta name="copyright" content="me, https://mywebsite.com/"/&gt;
&lt;link rel="sitemap" type="application/rss+xml" href="rss.xml" /&gt; 
&lt;link rel="image_src" type="image/jpeg" href="https://mywebsite.com/icon-facebook.jpg" style="display:none"/&gt;    
&lt;meta itemprop="image" content="https://mywebsite.com/icon-facebook.jpg"/&gt;
&lt;meta property="og:image" content="https://mywebsite.com/icon-facebook.jpg"/&gt;
&lt;meta property="og:type"  content="article" /&gt;
&lt;meta property="fb:app_id" content="2156440"/&gt;
&lt;meta property="fb:admins" content="16454242"/&gt;
&lt;meta property="og:url" content="https://mywebsite.com/en/other-car.html"/&gt;

&lt;body&gt;

TEXT TEXT

&lt;div class="search"&gt;
                &lt;div align="left"&gt;

                  &lt;a href="https://mywebsite.com/hope.html"&gt;&lt;img src="index_files/flag_lang_ro.jpg" title="ro" alt="ro" width="28" height="19" /&gt;&lt;/a&gt;&amp;nbsp; &lt;a href="https://mywebsite.com/fr/book.html"&gt;&lt;img src="index_files/flag_lang_fr.jpg" title="fr" alt="fr" width="28" height="19" /&gt;&lt;/a&gt;&amp;nbsp; &lt;a href="https://mywebsite.com/en/truth.html"&gt;&lt;img src="index_files/flag_lang_en.jpg" title="en" alt="en" width="28" height="19" /&gt;&lt;/a&gt;&amp;nbsp; &lt;a href="https://mywebsite.com/es/green.html"&gt;&lt;img src="index_files/flag_lang_es.jpg" title="es" alt="es" width="28" height="19" /&gt;&lt;/a&gt;&amp;nbsp; &lt;a href="https://mywebsite.com/pt/yellow.html"&gt;&lt;img src="index_files/flag_lang_pt.jpg" title="pt" alt="pt" width="28" height="19" /&gt;&lt;/a&gt;&amp;nbsp; &lt;a href="https://mywebsite.com/ar/truth.html"&gt;&lt;img src="index_files/flag_lang_ae.jpg" width="28" height="19" title="ar" alt="ar" /&gt;&lt;/a&gt;&amp;nbsp; &lt;a href="https://mywebsite.com/zh/truth.html"&gt;&lt;img src="index_files/flag_lang_zh.jpg" width="28" height="19" title="zh" alt="zh" /&gt;&lt;/a&gt;&amp;nbsp; &lt;a href="https://mywebsite.com/hi/truth.html"&gt;&lt;img src="index_files/flag_lang_hi.jpg" width="28" height="19" title="hi" alt="hi" /&gt;&lt;/a&gt;&amp;nbsp; &lt;a href="https://mywebsite.com/de/truth.html"&gt;&lt;img src="index_files/flag_lang_de.jpg" width="28" height="19" title="de" alt="de" /&gt;&lt;/a&gt;&amp;nbsp; &lt;a href="https://mywebsite.com/ru/truth.html"&gt;&lt;img src="index_files/flag_lang_ru.jpg" width="28" height="19" title="ru" alt="ru" /&gt;&lt;/a&gt;

TEXT TEXT


&lt;div id="pixxell"&gt; &lt;a href="https://mywebsite.com/en/book-miracle.html"&gt;I find a miracle &lt;/div&gt;

TEXT TEXT
</code></pre>
<ul>
<li>
<p dir="auto"><strong>Secondly</strong> : If so, I suppose that the <strong>first</strong> line <strong><code>&lt;link rel="canonical" href="https://mywebsite.com/en/truth.html" /&gt;</code></strong> with the link <strong><code>https://mywebsite.com/en/truth.html</code></strong> is the <strong>first</strong> of the <strong>two</strong> links to consider in the <strong>future</strong> ( <em>correct</em> ! ) regex</p>
</li>
<li>
<p dir="auto"><strong>Thirdly</strong> : I also suppose that <strong>any</strong> of the links, below, after <strong><code>&lt;div class="search"&gt;</code></strong>, and which are <strong>followed</strong> with an <strong><code>&lt;img  src=•••••&gt;</code></strong> tag are taken as a <strong>second</strong> link to be considered in the <strong>future</strong> regex</p>
</li>
</ul>
<pre><code class="language-html">href="https://mywebsite.com/hope.html"&gt;
href="https://mywebsite.com/fr/book.html"&gt;
href="https://mywebsite.com/en/truth.html"&gt;
href="https://mywebsite.com/es/green.html"&gt;
href="https://mywebsite.com/pt/yellow.html"&gt;
href="https://mywebsite.com/ar/truth.html"&gt;
href="https://mywebsite.com/zh/truth.html"&gt;
href="https://mywebsite.com/hi/truth.html"&gt;
href="https://mywebsite.com/de/truth.html"&gt;
href="https://mywebsite.com/ru/truth.html"&gt;
</code></pre>
<p dir="auto"><strong>Fourthly</strong> : As the tag <strong><code>&lt;link rel="canonical"••••" /&gt;</code></strong> contains the link <strong><code>href="https://mywebsite.com/en/truth.html</code></strong>, I suppose that, considering the <strong>list</strong> of links, above, you would like that the <strong>regex</strong> matches :</p>
<ul>
<li>
<p dir="auto"><em>FROM</em> the expression <strong><code>&lt;link rel="canonical" href="https://mywebsite.com/en/truth.html" /&gt;</code></strong> or, at least, <strong>its</strong> link <strong><code>href="https://mywebsite.com/en/truth.html"</code></strong></p>
</li>
<li>
<p dir="auto"><em>TO</em> the first link <strong>different</strong> from <strong><code>href="https://mywebsite.com/en/truth.html"</code></strong>, so, in <em>this</em> example, the <strong>first</strong> link of the <strong>list</strong> <strong><code>href="https://mywebsite.com/hope.html"&gt;</code></strong></p>
</li>
</ul>
<hr />
<p dir="auto">As you can see, it is generally <strong>much more</strong> difficult to <strong>fully</strong> understand what the <strong>OP</strong>’s needs are than to find <strong>any</strong> kind of regex ;-))</p>
<p dir="auto">BR</p>
<p dir="auto">guy038</p>
]]></description><link>https://community.notepad-plus-plus.org/post/65397</link><guid isPermaLink="true">https://community.notepad-plus-plus.org/post/65397</guid><dc:creator><![CDATA[guy038]]></dc:creator><pubDate>Wed, 28 Apr 2021 09:20:01 GMT</pubDate></item><item><title><![CDATA[Reply to Regex: Find those files that doesn&#x27;t contain the same link in 2 different html tags on Wed, 28 Apr 2021 05:47:21 GMT]]></title><description><![CDATA[<p dir="auto">so, the second tag <code>&lt;img src=...&gt;</code> is extracted from the <code>&lt;div class="search"&gt;</code> section. Must be taken into account this part.</p>
]]></description><link>https://community.notepad-plus-plus.org/post/65395</link><guid isPermaLink="true">https://community.notepad-plus-plus.org/post/65395</guid><dc:creator><![CDATA[Hellena Crainicu]]></dc:creator><pubDate>Wed, 28 Apr 2021 05:47:21 GMT</pubDate></item><item><title><![CDATA[Reply to Regex: Find those files that doesn&#x27;t contain the same link in 2 different html tags on Wed, 28 Apr 2021 05:44:02 GMT]]></title><description><![CDATA[<p dir="auto">now I see, there is a small problem.</p>
<p dir="auto"><a class="plugin-mentions-user plugin-mentions-a" href="/user/terry-r" aria-label="Profile: Terry-R">@<bdi>Terry-R</bdi></a>  Your regex seems to work on my example: <code>(?s)^&lt;link rel.+?https://([^"]+).+?https://(*SKIP)(?!\1)</code>  finds only the files whose links are different.</p>
<p dir="auto"><a class="plugin-mentions-user plugin-mentions-a" href="/user/guy038" aria-label="Profile: guy038">@<bdi>guy038</bdi></a> Your regex also works on my example: <code>(?s)&lt;link\h+rel="canonical"\h*\Khref="([^"]+)"((?!&lt;link).)+?&lt;a href="(?!\1).+?"</code></p>
<p dir="auto">But I put that TEXT TEXT between those 2 tags, which means that links can be found more than twice in a file. For this reason, we need to specify exactly which lines are to be considered. Please see the entire code I have:</p>
<pre><code>&lt;link rel="canonical" href="https://mywebsite.com/en/truth.html" /&gt;

&lt;meta name="copyright" content="me, https://mywebsite.com/"/&gt;
&lt;link rel="sitemap" type="application/rss+xml" href="rss.xml" /&gt; 
&lt;link rel="image_src" type="image/jpeg" href="https://mywebsite.com/icon-facebook.jpg" style="display:none"/&gt;    
&lt;meta itemprop="image" content="https://mywebsite.com/icon-facebook.jpg"/&gt;
&lt;meta property="og:image" content="https://mywebsite.com/icon-facebook.jpg"/&gt;
&lt;meta property="og:type"  content="article" /&gt;
&lt;meta property="fb:app_id" content="2156440"/&gt;
&lt;meta property="fb:admins" content="16454242"/&gt;
&lt;meta property="og:url" content="https://mywebsite.com/en/other-car.html"/&gt;

&lt;body&gt;

TEXT TEXT

&lt;div class="search"&gt;
                &lt;div align="left"&gt;

                  &lt;a href="https://mywebsite.com/hope.html"&gt;&lt;img src="index_files/flag_lang_ro.jpg" title="ro" alt="ro" width="28" height="19" /&gt;&lt;/a&gt;&amp;nbsp; &lt;a href="https://mywebsite.com/fr/book.html"&gt;&lt;img src="index_files/flag_lang_fr.jpg" title="fr" alt="fr" width="28" height="19" /&gt;&lt;/a&gt;&amp;nbsp; &lt;a href="https://mywebsite.com/en/truth.html"&gt;&lt;img src="index_files/flag_lang_en.jpg" title="en" alt="en" width="28" height="19" /&gt;&lt;/a&gt;&amp;nbsp; &lt;a href="https://mywebsite.com/es/green.html"&gt;&lt;img src="index_files/flag_lang_es.jpg" title="es" alt="es" width="28" height="19" /&gt;&lt;/a&gt;&amp;nbsp; &lt;a href="https://mywebsite.com/pt/yellow.html"&gt;&lt;img src="index_files/flag_lang_pt.jpg" title="pt" alt="pt" width="28" height="19" /&gt;&lt;/a&gt;&amp;nbsp; &lt;a href="https://mywebsite.com/ar/truth.html"&gt;&lt;img src="index_files/flag_lang_ae.jpg" width="28" height="19" title="ar" alt="ar" /&gt;&lt;/a&gt;&amp;nbsp; &lt;a href="https://mywebsite.com/zh/truth.html"&gt;&lt;img src="index_files/flag_lang_zh.jpg" width="28" height="19" title="zh" alt="zh" /&gt;&lt;/a&gt;&amp;nbsp; &lt;a href="https://mywebsite.com/hi/truth.html"&gt;&lt;img src="index_files/flag_lang_hi.jpg" width="28" height="19" title="hi" alt="hi" /&gt;&lt;/a&gt;&amp;nbsp; &lt;a href="https://mywebsite.com/de/truth.html"&gt;&lt;img src="index_files/flag_lang_de.jpg" width="28" height="19" title="de" alt="de" /&gt;&lt;/a&gt;&amp;nbsp; &lt;a href="https://mywebsite.com/ru/truth.html"&gt;&lt;img src="index_files/flag_lang_ru.jpg" width="28" height="19" title="ru" alt="ru" /&gt;&lt;/a&gt;

TEXT TEXT


&lt;div id="pixxell"&gt; &lt;a href="https://mywebsite.com/en/book-miracle.html"&gt;I find a miracle &lt;/div&gt;

TEXT TEXT
</code></pre>
]]></description><link>https://community.notepad-plus-plus.org/post/65394</link><guid isPermaLink="true">https://community.notepad-plus-plus.org/post/65394</guid><dc:creator><![CDATA[Hellena Crainicu]]></dc:creator><pubDate>Wed, 28 Apr 2021 05:44:02 GMT</pubDate></item><item><title><![CDATA[Reply to Regex: Find those files that doesn&#x27;t contain the same link in 2 different html tags on Wed, 28 Apr 2021 02:00:12 GMT]]></title><description><![CDATA[<p dir="auto"><a class="plugin-mentions-user plugin-mentions-a" href="/user/terry-r" aria-label="Profile: Terry-R">@<bdi>Terry-R</bdi></a> said in <a href="/post/65389">Regex: Find those files that doesn't contain the same link in 2 different html tags</a>:</p>
<blockquote>
<p dir="auto">I did not know for sure there would ONLY be 2 https references in each file, the OP wasn’t specific enough.</p>
</blockquote>
<p dir="auto">I note that I did show the OP a previous test I did, which had 3 sets I tested against (image). The OP did not mention at that time that he only had 1 set in each file, so I guess we need the OP to verify whether there is ONLY 1 set in each html file or MANY!</p>
<p dir="auto">So <a class="plugin-mentions-user plugin-mentions-a" href="/user/hellena-crainicu" aria-label="Profile: Hellena-Crainicu">@<bdi>Hellena-Crainicu</bdi></a>, does each html file contain only 1 set of https references (so 2 https references in each file), or many sets that the test must be carried out on?</p>
<p dir="auto">Terry</p>
]]></description><link>https://community.notepad-plus-plus.org/post/65391</link><guid isPermaLink="true">https://community.notepad-plus-plus.org/post/65391</guid><dc:creator><![CDATA[Terry R]]></dc:creator><pubDate>Wed, 28 Apr 2021 02:00:12 GMT</pubDate></item><item><title><![CDATA[Reply to Regex: Find those files that doesn&#x27;t contain the same link in 2 different html tags on Wed, 28 Apr 2021 02:10:38 GMT]]></title><description><![CDATA[<p dir="auto">Hi, <a class="plugin-mentions-user plugin-mentions-a" href="/user/hellena-crainicu" aria-label="Profile: hellena-crainicu">@<bdi>hellena-crainicu</bdi></a>, <a class="plugin-mentions-user plugin-mentions-a" href="/user/terry-r" aria-label="Profile: terry-r">@<bdi>terry-r</bdi></a> and <strong>All</strong>,</p>
<p dir="auto">Reading <strong>Terry</strong>’s post made me think that I had <strong>not</strong> considered the possibility of <strong>successive</strong> couples <strong><code>&lt;link rel="canonical" href="</code></strong> – <strong><code>&lt;a href="</code></strong> in the same <strong><code>HTML</code></strong> file !</p>
<p dir="auto">For instance, against the text :</p>
<pre><code class="language-html">&lt;link rel="canonical" href="https://mywebsite.com/en/truth.html"

text text

&lt;a href="https://mywebsite.com/en/truth.html"

text


text


&lt;link rel="canonical" href="https://mywebsite.com/en/love.html"

text text

&lt;a href="https://mywebsite.com/en/love.html"
</code></pre>
<p dir="auto">My <strong>previous</strong> regex would <strong>wrongly</strong> match all text after <strong><code>&lt;link rel="canonical"</code></strong>. Indeed, as the two links of each <strong>couple</strong> are <strong>identical</strong> ( 2 × <em>truth</em> and 2 × <em>love</em> ), I suppose, <a class="plugin-mentions-user plugin-mentions-a" href="/user/hellena-crainicu" aria-label="Profile: Hellena-crainicu">@<bdi>Hellena-crainicu</bdi></a>, that you do <strong>not</strong> want a match in that <strong>specific</strong> case either !</p>
<hr />
<p dir="auto">So, <a class="plugin-mentions-user plugin-mentions-a" href="/user/hellena-crainicu" aria-label="Profile: hellena-crainicu">@<bdi>hellena-crainicu</bdi></a>, prefer this <strong>second</strong> version, more <strong>robust</strong> !</p>
<p dir="auto">SEARCH / MARK <strong><code>(?s)&lt;link\h+rel="canonical"\h*\Khref="([^"]+)"((?!&lt;link).)+?&lt;a href="(?!\1).+?"</code></strong></p>
<p dir="auto">As you can see, the <strong>changed</strong> part is <strong><code>((?!&lt;link).)+?</code></strong> which represents the <strong>shortest</strong> range of characters, <strong>not</strong> containing the string <strong><code>&lt;link</code></strong>, at <strong>any</strong> position, globally, between the <strong>first</strong> and <strong>last</strong> <strong><code>href</code></strong> attribute !</p>
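<p dir="auto">For readers who want to try the difference outside Notepad++, here is a rough Python sketch. It is an adaptation, not the exact regex above: Python's <code>re</code> module has no <code>\K</code> and no <code>\h</code>, so <code>\K</code> is simply dropped and <code>\h</code> is written as the class <code>[\t\x20\xA0]</code>. The names <code>SAME</code>, <code>DIFF</code>, <code>FIRST</code> and <code>SECOND</code> are made up for this illustration.</p>

```python
import re

# Two successive couples whose links are identical: nothing should be reported.
SAME = '''<link rel="canonical" href="https://mywebsite.com/en/truth.html"
text text
<a href="https://mywebsite.com/en/truth.html"
text
<link rel="canonical" href="https://mywebsite.com/en/love.html"
text text
<a href="https://mywebsite.com/en/love.html"
'''

# Same layout, but the second couple really differs (love vs hope).
DIFF = SAME.replace('<a href="https://mywebsite.com/en/love.html"',
                    '<a href="https://mywebsite.com/en/hope.html"')

# Common start of both versions; \h is spelled out and \K is omitted (assumption).
HEAD = r'(?s)<link[\t\x20\xA0]+rel="canonical"[\t\x20\xA0]*href="([^"]+)"'

# First version: a greedy .+ may run across a later <link ...> tag.
FIRST = re.compile(HEAD + r'.+<a href="(?!\1).+?"')

# Second version: each consumed char must not start the string "<link".
SECOND = re.compile(HEAD + r'((?!<link).)+?<a href="(?!\1).+?"')

print(bool(FIRST.search(SAME)))      # True  (false positive across couples)
print(bool(SECOND.search(SAME)))     # False (each couple is checked on its own)
print(SECOND.search(DIFF).group(1))  # https://mywebsite.com/en/love.html
```

<p dir="auto">The first version pairs the <em>truth</em> canonical link with the <em>love</em> anchor of the next couple, which is exactly the false positive described above; the second version cannot step over a <code>&lt;link</code> string, so it only reports the couple that really differs.</p>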
<p dir="auto">BR</p>
<p dir="auto">guy038</p>
]]></description><link>https://community.notepad-plus-plus.org/post/65390</link><guid isPermaLink="true">https://community.notepad-plus-plus.org/post/65390</guid><dc:creator><![CDATA[guy038]]></dc:creator><pubDate>Wed, 28 Apr 2021 02:10:38 GMT</pubDate></item><item><title><![CDATA[Reply to Regex: Find those files that doesn&#x27;t contain the same link in 2 different html tags on Wed, 28 Apr 2021 01:54:35 GMT]]></title><description><![CDATA[<p dir="auto"><a class="plugin-mentions-user plugin-mentions-a" href="/user/guy038" aria-label="Profile: guy038">@<bdi>guy038</bdi></a> said in <a href="/post/65388">Regex: Find those files that doesn't contain the same link in 2 different html tags</a>:</p>
<blockquote>
<p dir="auto">SEARCH (?s)&lt;link\h+rel=“canonical”\h*\Khref=“([^”]+)“.+&lt;a href=”(?!\1).+?"</p>
</blockquote>
<p dir="auto">I used the below example set for my test and got the 3 mismatch hits I created. When I ran your regex on my example set I only got 1 hit. I think I see where your interpretation differed from mine. I did not know for sure there would ONLY be 2 https references in each file, the OP wasn’t specific enough. Now that I see your interpretation I can see that the OP may have suggested that. So certainly if that’s the case I have definitely overworked my regex.</p>
<p dir="auto">Cheers<br />
Terry</p>
<pre><code>&lt;link rel="canonical" href="https://mywebsite.com/en/truth.html"/&gt;

text text
    
text

&lt;img src="index_files/flag_lang_de.jpg" width="28" height="19" title="de" alt="de" /&gt;&lt;/a&gt;&amp;nbsp; &lt;a href="https://mywebsite.com/en/love.html"&gt;&lt;img src="index_files/flag_lang_ru.jpg" width="28" height="19" title="ru" alt="ru" /&gt;&lt;/a&gt;
&lt;link rel="canonical" href="https://mywebsite.com/en/ttt.html"/&gt;

text text
    
text

&lt;img src="index_files/flag_lang_de.jpg" width="28" height="19" title="de" alt="de" /&gt;&lt;/a&gt;&amp;nbsp; &lt;a href="https://mywebsite.com/en/ttt.html"&gt;&lt;img src="index_files/flag_lang_ru.jpg" width="28" height="19" title="ru" alt="ru" /&gt;&lt;/a&gt;
&lt;link rel="canonical" href="https://mywebsite.com/en/truth.html"/&gt;

text text
    
text

&lt;img src="index_files/flag_lang_de.jpg" width="28" height="19" title="de" alt="de" /&gt;&lt;/a&gt;&amp;nbsp; &lt;a href="https://mywebsite.com/en/sloven.html"&gt;&lt;img src="index_files/flag_lang_ru.jpg" width="28" height="19" title="ru" alt="ru" /&gt;&lt;/a&gt;

&lt;link rel="canonical" href="https://mywebsite.com/en/lovel.html"/&gt;

text text
    
text

&lt;img src="index_files/flag_lang_de.jpg" width="28" height="19" title="de" alt="de" /&gt;&lt;/a&gt;&amp;nbsp; &lt;a href="https://mywebsite.com/en/lovely.html"&gt;&lt;img src="index_files/flag_lang_ru.jpg" width="28" height="19" title="ru" alt="ru" /&gt;&lt;/a&gt;

&lt;link rel="canonical" href="https://mywebsite.com/en/lov.html"/&gt;

text text
    
text

&lt;img src="index_files/flag_lang_de.jpg" width="28" height="19" title="de" alt="de" /&gt;&lt;/a&gt;&amp;nbsp; &lt;a href="https://mywebsite.com/en/lov.html"&gt;&lt;img src="index_files/flag_lang_ru.jpg" width="28" height="19" title="ru" alt="ru" /&gt;&lt;/a&gt;
</code></pre>
]]></description><link>https://community.notepad-plus-plus.org/post/65389</link><guid isPermaLink="true">https://community.notepad-plus-plus.org/post/65389</guid><dc:creator><![CDATA[Terry R]]></dc:creator><pubDate>Wed, 28 Apr 2021 01:54:35 GMT</pubDate></item><item><title><![CDATA[Reply to Regex: Find those files that doesn&#x27;t contain the same link in 2 different html tags on Wed, 28 Apr 2021 02:22:23 GMT]]></title><description><![CDATA[<p dir="auto">Hello, <a class="plugin-mentions-user plugin-mentions-a" href="/user/hellena-crainicu" aria-label="Profile: hellena-crainicu">@<bdi>hellena-crainicu</bdi></a>, <a class="plugin-mentions-user plugin-mentions-a" href="/user/terry-r" aria-label="Profile: terry-r">@<bdi>terry-r</bdi></a> and <strong>All</strong>,</p>
<p dir="auto">I don’t think that working on <strong>copies</strong> is necessary ;-)) So, <strong>Hellena</strong>, simply use this <strong>regex</strong> :</p>
<p dir="auto">SEARCH <strong><code>(?s)&lt;link\h+rel="canonical"\h*\Khref="([^"]+)".+&lt;a href="(?!\1).+?"</code></strong></p>
<hr />
<p dir="auto"><strong>Notes</strong> :</p>
<ul>
<li>
<p dir="auto">The <strong><code>\h</code></strong> syntax is equivalent to the <strong><code>[\t\x20\xA0]</code></strong> syntax</p>
</li>
<li>
<p dir="auto">The <strong>group <code>1</code></strong> is the regex <strong><code>[^"]+</code></strong> and represents the link <strong><code>•••••</code></strong> in the expression <strong><code>&lt;link rel="canonical" href="•••••" /&gt;</code></strong></p>
</li>
<li>
<p dir="auto">Due to the <strong><code>&lt;a href="(?!\1).+?"</code></strong>, this link must <strong>not</strong> be present in the <strong><code>href</code></strong> <strong>atttribute</strong> of the <strong><code>&lt;a&gt;</code></strong> tag</p>
</li>
<li>
<p dir="auto">The <strong><code>\K</code></strong> feature <strong>cancels</strong> the match attempt so far ( <strong><code>&lt;link\h+rel="canonical"\h*</code></strong> ) and <strong>resets</strong> the working position of the regex engine at the word <strong><code>href</code></strong>. So, the <strong>overall</strong> regex will catch the range of chars between the <strong>first</strong> <strong><code>href="•••••"</code></strong> expression and the <strong>last</strong> <strong><code>href="•••••"</code></strong>, only !</p>
</li>
</ul>
<hr />
<p dir="auto">Finally, the <strong>main</strong> problem was to be <strong>sure</strong> that the range of chars, between the <strong>two</strong> double-quotes of the <strong>first</strong> link, does end at the <strong>closing</strong> double-quote and not <strong>later</strong>, because of the internal <strong>backtracking</strong> process of the regex engine !</p>
<p dir="auto">For instance, let’s suppose this text, with the <strong>same</strong> link <strong><code>test.com</code></strong></p>
<pre><code class="language-diff">href="test.com" test="value" href="test.com"
</code></pre>
<p dir="auto">Oddly, the regex <strong><code>(?-si)href="(.+?)".+href="(?!\1).+?"</code></strong> <strong>matches</strong> this text. Common sense says that it shouldn’t, as we have the <strong>negative</strong> look-ahead <strong><code>(?!\1)</code></strong> structure !?</p>
<p dir="auto">So <strong>why</strong> ? Let’s try to follow the regex engine process !</p>
<ul>
<li>
<p dir="auto">First the regex engine matches the <strong><code>href="</code></strong> string and catches the <strong>shortest</strong> range of chars till a <strong>double-quote</strong> so the value <strong><code>test.com</code></strong> is stored in group <strong><code>1</code></strong></p>
</li>
<li>
<p dir="auto">Then, it matches the part <strong><code>.+href="</code></strong>. But, as the <strong>second</strong> link is the <strong>same</strong> as the <strong>first</strong> one, the <strong>negative</strong> look-ahead, which follows, prevents the regex from matching the <strong>remaining</strong> range of chars</p>
</li>
<li>
<p dir="auto">Now, that’s the <strong>important</strong> point : the regex engine backtracks and tries, <em>by all means</em>, to get a <strong>positive</strong> match attempt !</p>
<ul>
<li>
<p dir="auto">The regex engine moves back to the location right after the <strong>first</strong> <strong><code>href="</code></strong> string and extends the lazy group to the <strong>next</strong> shortest range of chars followed by a <strong>double-quote</strong>. Thus, this time, the value <strong><code>test.com" test=</code></strong> is stored as <strong>group <code>1</code></strong> ! Indeed, that text also sits <strong>between</strong> two <strong><code>"</code></strong> chars !</p>
</li>
<li>
<p dir="auto">Then, again, it matches the part <strong><code>.+href="</code></strong>. And, now, as the second link <strong><code>test.com</code></strong> is obviously <strong>different</strong> from the contents of <strong>group <code>1</code></strong> ( <strong><code>test.com" test=</code></strong> ), the <strong>negative</strong> look-ahead returns <em>TRUE</em> and the overall regex <strong>wrongly</strong> matches the complete text <strong><code>href="test.com" test="value" href="test.com"</code></strong></p>
</li>
</ul>
</li>
<li>
<p dir="auto">We now understand how to get the <strong>right</strong> regex. We just need to <strong>prevent</strong> any char of the range between the <strong>double-quotes</strong> from being, <strong>itself</strong>, a <strong><code>"</code></strong> char !</p>
</li>
<li>
<p dir="auto">So, the <strong>second</strong> regex version <strong><code>(?-si)href="([^"]+)".+href="(?!\1).+?"</code></strong>, as <strong>expected</strong>, does <strong>not</strong> find the text</p>
</li>
</ul>
<pre><code class="language-diff">href="test.com" test="value" href="test.com"
</code></pre>
<p dir="auto">And would get this one !</p>
<pre><code class="language-diff">href="test.com" test="value" href="tests.com"
</code></pre>
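<p dir="auto">The <strong>backtracking</strong> behaviour described above can also be reproduced with a short Python sketch. The leading <code>(?-si)</code> is left out because Python's <code>re</code> does not accept that global form; its defaults ( case-sensitive, dot not matching newline ) behave the same way here. The names <code>LAZY</code>, <code>STRICT</code>, <code>SAME</code> and <code>DIFF</code> are made up for the illustration.</p>

```python
import re

SAME = 'href="test.com" test="value" href="test.com"'
DIFF = 'href="test.com" test="value" href="tests.com"'

# Lazy group: backtracking may stretch group 1 across a double-quote.
LAZY = re.compile(r'href="(.+?)".+href="(?!\1).+?"')

# Negated class: group 1 can never contain a double-quote.
STRICT = re.compile(r'href="([^"]+)".+href="(?!\1).+?"')

print(bool(LAZY.search(SAME)))    # True  (wrong: both links are the same)
print(bool(STRICT.search(SAME)))  # False (as expected)
print(bool(STRICT.search(DIFF)))  # True  (the links really differ)
```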
<p dir="auto">Best Regards,</p>
<p dir="auto">guy038</p>
]]></description><link>https://community.notepad-plus-plus.org/post/65388</link><guid isPermaLink="true">https://community.notepad-plus-plus.org/post/65388</guid><dc:creator><![CDATA[guy038]]></dc:creator><pubDate>Wed, 28 Apr 2021 02:22:23 GMT</pubDate></item><item><title><![CDATA[Reply to Regex: Find those files that doesn&#x27;t contain the same link in 2 different html tags on Wed, 28 Apr 2021 00:37:09 GMT]]></title><description><![CDATA[<p dir="auto"><a class="plugin-mentions-user plugin-mentions-a" href="/user/hellena-crainicu" aria-label="Profile: Hellena-Crainicu">@<bdi>Hellena-Crainicu</bdi></a> said in <a href="/post/65371">Regex: Find those files that doesn't contain the same link in 2 different html tags</a>:</p>
<blockquote>
<p dir="auto">I should find the files that doesn’t contain the same link up and down.</p>
</blockquote>
<p dir="auto">I may have found a way to search the html files without having to edit them. Even better is the “Find in Files” search result window will give precise locations as to filename and line number where you need to fix the problems of mismatched https references.</p>
<p dir="auto">It uses some advanced regex functions only available to Notepad++ since version 7.7. Read the excellent postings by <a class="plugin-mentions-user plugin-mentions-a" href="/user/guy038" aria-label="Profile: guy038">@<bdi>guy038</bdi></a>  <a href="https://community.notepad-plus-plus.org/topic/19632/new-backtracking-control-verbs-feature-available-since-notepad-v7-7">here</a>.</p>
<p dir="auto">In this solution I make use of the <code>(*SKIP)</code> function. The regex is:<br />
<code>(?s)^&lt;link rel.+?https://([^"]+).+?https://(*SKIP)(?!\1)</code><br />
So for a description we have the regex:</p>
<ol>
<li>finding a line starting with <code>&lt;link rel</code></li>
<li>advancing character by character (lazy) until the first <code>https://</code> sequence is found, then matching it</li>
<li>continuing to capture characters so long as they are NOT the <code>"</code>. This is captured as group 1.</li>
<li>advancing character by character (lazy) until the next (second) <code>https://</code> sequence is found, then matching it</li>
<li>this is where the fun begins. At this point it can advance to the next sub expression within the regex, the <code>(*SKIP)</code> is passed over going forward with no reaction. The next step is a test that the next few characters do NOT match the first https reference (group 1). Normally if this test fails, the regex will backtrack and attempt to consume more characters until this sub-expression is TRUE. That would normally mean progressing into the next group of <code>link rel...</code> With the <code>(*SKIP)</code> if the regex attempts to backtrack this prevents the backtrack from occurring and thus the regex fails overall.</li>
</ol>
<p dir="auto">So the use of <code>(*SKIP)</code> only allows the 2nd https reference to be tested against the first. A mismatch means success, a match would mean the regex fails and it restarts at the next group of <code>link rel...</code>.</p>
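<p dir="auto">For anyone wanting to cross-check the result outside Notepad++, the intent can be sketched in plain Python. This is a different technique, not the regex above: Python's <code>re</code> module has no <code>(*SKIP)</code>, so the sketch splits the text into groups at each line starting with <code>&lt;link rel</code> and compares the first two https references of every group. The function name <code>mismatched_groups</code> and the sample text are hypothetical.</p>

```python
import re

GROUP_START = re.compile(r'(?m)^<link rel')  # a group starts at such a line
HTTPS_REF = re.compile(r'https://([^"]+)')   # reference up to the closing quote

def mismatched_groups(text):
    """Return (first, second) for each group whose first two references differ."""
    starts = [m.start() for m in GROUP_START.finditer(text)]
    ends = starts[1:] + [len(text)]
    hits = []
    for begin, end in zip(starts, ends):
        refs = HTTPS_REF.findall(text[begin:end])
        # Only the first and second reference of the group are compared.
        if len(refs) >= 2 and refs[0] != refs[1]:
            hits.append((refs[0], refs[1]))
    return hits

SAMPLE = '''<link rel="canonical" href="https://mywebsite.com/en/truth.html"/>
text text
<a href="https://mywebsite.com/en/love.html"><img src="index_files/flag_lang_ru.jpg" /></a>
<link rel="canonical" href="https://mywebsite.com/en/ttt.html"/>
text text
<a href="https://mywebsite.com/en/ttt.html"><img src="index_files/flag_lang_ru.jpg" /></a>
<link rel="canonical" href="https://mywebsite.com/en/lovel.html"/>
text text
<a href="https://mywebsite.com/en/lovely.html"><img src="index_files/flag_lang_ru.jpg" /></a>
'''

for first, second in mismatched_groups(SAMPLE):
    print(first, '!=', second)
```

<p dir="auto">On this sample, the first and third groups are reported and the <em>ttt</em> group is not. Like the regex, the sketch assumes that the two references to compare are the first two https references of each group.</p>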
<p dir="auto">Hopefully I have described this correctly, and hopefully it will also satisfy your initial request without the need to edit copies.</p>
<p dir="auto">It did work on an open file I had with 5 groups, 3 of which had the mismatch and they were all correctly identified.</p>
<p dir="auto">So the major assumption is that your group (starting with <code>&lt;link rel</code> and ending just before the next <code>&lt;link rel</code> starts) has at least 2 https references and where the test is between the first and second https reference ONLY!</p>
<p dir="auto">Best of luck, I’d be very interested in knowing if this works for you with real world data so please do post back, good or bad. This is my first attempt at using <code>(*SKIP)</code>.</p>
<p dir="auto">Terry</p>
]]></description><link>https://community.notepad-plus-plus.org/post/65387</link><guid isPermaLink="true">https://community.notepad-plus-plus.org/post/65387</guid><dc:creator><![CDATA[Terry R]]></dc:creator><pubDate>Wed, 28 Apr 2021 00:37:09 GMT</pubDate></item><item><title><![CDATA[Reply to Regex: Find those files that doesn&#x27;t contain the same link in 2 different html tags on Tue, 27 Apr 2021 21:49:59 GMT]]></title><description><![CDATA[<p dir="auto"><a class="plugin-mentions-user plugin-mentions-a" href="/user/hellena-crainicu" aria-label="Profile: Hellena-Crainicu">@<bdi>Hellena-Crainicu</bdi></a> said in <a href="/post/65382">Regex: Find those files that doesn't contain the same link in 2 different html tags</a>:</p>
<blockquote>
<p dir="auto">By the way. you have 2 meta property. i don’t have any in my example :)</p>
</blockquote>
<p dir="auto">By the way. you have 2 meta property. i don’t have any in my example :)</p>
<pre><code>&lt;link rel="canonical" href="https://mywebsite.com/en/truth.html" /&gt;

text text
    
text

&lt;img src="index_files/flag_lang_de.jpg" width="28" height="19" title="de" alt="de" /&gt;&lt;/a&gt;&amp;nbsp; &lt;a href="https://mywebsite.com/en/trsuth.html"&gt;&lt;img src="index_files/flag_lang_ru.jpg" width="28" height="19" title="ru" alt="ru" /&gt;&lt;/a&gt;
</code></pre>
]]></description><link>https://community.notepad-plus-plus.org/post/65384</link><guid isPermaLink="true">https://community.notepad-plus-plus.org/post/65384</guid><dc:creator><![CDATA[Hellena Crainicu]]></dc:creator><pubDate>Tue, 27 Apr 2021 21:49:59 GMT</pubDate></item><item><title><![CDATA[Reply to Regex: Find those files that doesn&#x27;t contain the same link in 2 different html tags on Tue, 27 Apr 2021 21:50:43 GMT]]></title><description><![CDATA[<p dir="auto"><a class="plugin-mentions-user plugin-mentions-a" href="/user/hellena-crainicu" aria-label="Profile: Hellena-Crainicu">@<bdi>Hellena-Crainicu</bdi></a> said in <a href="/post/65382">Regex: Find those files that doesn't contain the same link in 2 different html tags</a>:</p>
<blockquote>
<p dir="auto">It will ruin my html code.</p>
</blockquote>
<p dir="auto">It seems you missed my statement that as we will be editing the files, you need to work on “copies” of the files:<br />
<em><strong>So my solution means we will be editing the files somewhat so it should be done on a copy of the html files.</strong></em></p>
<p dir="auto">Terry</p>
<p dir="auto">PS I realise my image showing it working has meta property, but the regexes don’t use that now as I stated replace first regex with your amended detail “link…”<br />
The second regex doesn’t use that either so it isn’t affected by your change in example data.</p>
]]></description><link>https://community.notepad-plus-plus.org/post/65383</link><guid isPermaLink="true">https://community.notepad-plus-plus.org/post/65383</guid><dc:creator><![CDATA[Terry R]]></dc:creator><pubDate>Tue, 27 Apr 2021 21:50:43 GMT</pubDate></item><item><title><![CDATA[Reply to Regex: Find those files that doesn&#x27;t contain the same link in 2 different html tags on Tue, 27 Apr 2021 21:48:49 GMT]]></title><description><![CDATA[<p dir="auto"><a class="plugin-mentions-user plugin-mentions-a" href="/user/terry-r" aria-label="Profile: Terry-R">@<bdi>Terry-R</bdi></a> said in <a href="/post/65375">Regex: Find those files that doesn't contain the same link in 2 different html tags</a>:</p>
<blockquote>
<p dir="auto">(?-s)(.+?https://([^"]+))(?=.+?https)(?!.+?\2)</p>
</blockquote>
<p dir="auto">yes, Terry. Your regex seems to be good, but only if I make the replacement. Now, if I have several html files, I cannot make that replacement just to find something. It will ruin my html code.</p>
<p dir="auto">And, by the way. you have 2 meta property. i don’t have any in my example :)</p>
]]></description><link>https://community.notepad-plus-plus.org/post/65382</link><guid isPermaLink="true">https://community.notepad-plus-plus.org/post/65382</guid><dc:creator><![CDATA[Hellena Crainicu]]></dc:creator><pubDate>Tue, 27 Apr 2021 21:48:49 GMT</pubDate></item><item><title><![CDATA[Reply to Regex: Find those files that doesn&#x27;t contain the same link in 2 different html tags on Tue, 27 Apr 2021 21:44:13 GMT]]></title><description><![CDATA[<p dir="auto"><a class="plugin-mentions-user plugin-mentions-a" href="/user/hellena-crainicu" aria-label="Profile: Hellena-Crainicu">@<bdi>Hellena-Crainicu</bdi></a> said in <a href="/post/65380">Regex: Find those files that doesn't contain the same link in 2 different html tags</a>:</p>
<blockquote>
<p dir="auto"><a class="plugin-mentions-user plugin-mentions-a" href="/user/terry-r" aria-label="Profile: Terry-R">@<bdi>Terry-R</bdi></a>  your regex is not working</p>
</blockquote>
<p dir="auto">It worked for me in a small test, see:<br />
<img src="/assets/uploads/files/1619559622139-664c23c8-7a1a-45a9-8d5d-d7f22901a011-image.png" alt="664c23c8-7a1a-45a9-8d5d-d7f22901a011-image.png" class=" img-fluid img-markdown" /></p>
<p dir="auto">Note I had 2 “groups”, one with matching https references, the other mismatching. It finds the mismatch and then states the end of the document has been reached signifying no further occurrences. If there were further occurrences it doesn’t show this statement.</p>
<p dir="auto">So I’m not sure why your attempt failed. Does your real data not match the example, or do you have more than 2 https references for each group? Was it my first regex that failed, or the second? Try the solution on just 1 file inside Notepad++ so you can monitor its progress by using “Find” rather than “Replace” for some steps.</p>
<p dir="auto">Terry</p>
]]></description><link>https://community.notepad-plus-plus.org/post/65381</link><guid isPermaLink="true">https://community.notepad-plus-plus.org/post/65381</guid><dc:creator><![CDATA[Terry R]]></dc:creator><pubDate>Tue, 27 Apr 2021 21:44:13 GMT</pubDate></item></channel></rss>