Regex: How can I find HTML files whose links are not identical in different places?
- 
 I have this link at the beginning of an HTML page:
 <link rel="canonical" href="https://xxx.com/en/page-AAA.html" />
 I also have another link in the middle of the file:
 <a href="https://xxx.com/en/page-AAA.html"><img src="index_files/flag_lang_en.jpg" title="en" alt="en" width="28" height="19" /></a>
 As you can see, both links are the same, but they appear in different contexts and places. How can I find the HTML files in which the links in those two places are not identical? Suppose the first link is instead:
 <link rel="canonical" href="https://xxx.com/en/page-CCC.html" />
 In this case the two links are not identical, so the regex should find the file that contains the different links. How can I do this with regex?
- 
 Hi, @robin-cruise and All,
 Let's suppose you have at least two links of the form https://xxx.com/en/••••••••••.•••• , where the ••••••••••.•••• part is different.
 Then the regex
 (?s)(https://xxx.com/en/)([^"]+)".+?\1(?!\2").+?"
 will match the range between these two links, both included!
 Thus, the regex does not match anything if all the https://xxx.com/en/••••••••••.•••• links of the current file have the same ••••••••••.•••• part.
 Best Regards, guy038
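 To check this behaviour outside the editor, here is a minimal Python sketch. It assumes Python's re module treats this particular pattern the same way as the editor's regex engine (the syntax used here is supported by both). The names has_mismatched_links and files_with_mismatched_links, the sample strings and the folder layout are made up for illustration and are not part of the original answer.

```python
import re
from pathlib import Path

# The regex from the post, copied verbatim (the dots in "xxx.com" are left
# unescaped, as in the original, so each one matches any character).
MISMATCH = re.compile(r'(?s)(https://xxx.com/en/)([^"]+)".+?\1(?!\2").+?"')


def has_mismatched_links(html: str) -> bool:
    """True if a https://xxx.com/en/ link is later followed by one with a different page part."""
    return MISMATCH.search(html) is not None


def files_with_mismatched_links(folder: str) -> list[str]:
    """Hypothetical helper: list the *.html files under folder that the regex matches."""
    return sorted(
        str(path)
        for path in Path(folder).rglob("*.html")
        if has_mismatched_links(path.read_text(encoding="utf-8", errors="replace"))
    )


# Sample pages modelled on the question: identical links, then a different canonical link.
same = (
    '<link rel="canonical" href="https://xxx.com/en/page-AAA.html" />\n'
    '<a href="https://xxx.com/en/page-AAA.html"><img src="index_files/flag_lang_en.jpg" /></a>\n'
)
different = same.replace("page-AAA.html", "page-CCC.html", 1)

print(has_mismatched_links(same))       # False
print(has_mismatched_links(different))  # True
```

 Only the second sample is reported, because its canonical link (page-CCC.html) differs from the link inside the <a> tag (page-AAA.html).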
- 
 @guy038 Thanks a lot. You are the best!
- 
 By the way, @guy038, can you explain what this part of your regex does? \1(?!\2")
- 
 Hello, @robin-cruise and All,
 In the search regex (?s)(https://xxx.com/en/)([^"]+)".+?\1(?!\2").+?" :
- The leading (?s) modifier lets . also match line-break chars, so that .+? can span several lines
- The part (https://xxx.com/en/) looks for the literal string https://xxx.com/en/ , stored as group 1
- The part ([^"]+)" represents the remainder of the internet address (for instance, the string page-AAA.html), followed by a double quote, because [^"]+ is a non-null range of consecutive chars, all different from " , stored as group 2
- Now, the part .+? stands for the shortest range of any char till…
  - Group 1 ( \1 ), so another string https://xxx.com/en/
  - Which must be followed by .+?" , which represents the shortest non-null range of any char before a double quote…
  - But ONLY IF this range is different from \2" (i.e. different, for instance, from the string page-AAA.html followed by a " char)
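 The role of the " char inside the look-ahead can be seen in a small Python sketch. The sample HTML (page-A and page-AB) and the WITHOUT_QUOTE variant are made up for comparison only and do not come from the post; the sketch also assumes that Python's re module handles this syntax like the editor's regex engine.

```python
import re

# The regex from the post, copied verbatim.
WITH_QUOTE = re.compile(r'(?s)(https://xxx.com/en/)([^"]+)".+?\1(?!\2").+?"')
# Hypothetical variant: same regex, but with the quote dropped from the look-ahead.
WITHOUT_QUOTE = re.compile(r'(?s)(https://xxx.com/en/)([^"]+)".+?\1(?!\2).+?"')

# Made-up sample where the first page name is a prefix of the second one.
html = (
    '<link rel="canonical" href="https://xxx.com/en/page-A" />\n'
    '<a href="https://xxx.com/en/page-AB">en</a>\n'
)

m = WITH_QUOTE.search(html)
print(m.group(1))  # https://xxx.com/en/
print(m.group(2))  # page-A
print(m.group(0))  # the whole range, from the first link to the closing quote of the second

# Without the quote, (?!\2) rejects page-AB because it merely starts with page-A,
# so the two different links go undetected.
print(WITHOUT_QUOTE.search(html))  # None
```

 With the quote included, the look-ahead rejects only a link whose remainder is exactly group 2, so page-AB is correctly treated as different from page-A.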
 
 Note also that the [^"]+" syntax, without the parentheses, is more restrictive than .+?" and must be preferred, because of the negative look-ahead (?!\2")
 Best Regards, guy038
- 
