Find duplicate variable declaration Regex
-
I’m looking to build a regex that will find duplicate strings/variables. An example of the input is as follows:

<SymbolVar name="DATABASE_NAME" value="DB1"/>
<SymbolVar name="DATABASE_NAME" value="DB2"/>
<SymbolVar name="PORT" value="12345"/>
<SymbolVar name="SERVER_NAME" value="Server1"/>
<SymbolVar name="SERVER_NAME" value="Server2"/>

The returned find should match “DATABASE_NAME” and “SERVER_NAME”, which are each present twice.
My first attempt, shown below, doesn’t quite cut it:
(?m)<SymbolVar name="(\S+).\R(?s).?\K\1
-
@Stuart-Dyer
(?s-i)<SymbolVar\s+name\s*=\s*"([^"]+)".+<SymbolVar\s+name\s*=\s*\K"\1"
should do it. It will highlight every duplicate instance (that is, it won’t highlight the first occurrence of any name, only later ones).
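As a cross-check outside Notepad++, the same duplicates can be listed with a short script. This is only a sketch, assuming Python and its standard `re` module (which has no `\K`); it reuses the whitespace-tolerant part of the pattern above and simply counts the captured names. The function name `duplicate_names` and the inline sample are illustrative, not part of the original question.

```python
import re
from collections import Counter

SAMPLE = '''<SymbolVar name="DATABASE_NAME" value="DB1"/>
<SymbolVar name="DATABASE_NAME" value="DB2"/>
<SymbolVar name="PORT" value="12345"/>
<SymbolVar name="SERVER_NAME" value="Server1"/>
<SymbolVar name="SERVER_NAME" value="Server2"/>
'''

# Same whitespace-tolerant prefix as the regex above; group 1 is the name.
NAME_RE = re.compile(r'<SymbolVar\s+name\s*=\s*"([^"]+)"')

def duplicate_names(text):
    """Return the names declared more than once, in order of first appearance."""
    counts = Counter(NAME_RE.findall(text))
    return [name for name, n in counts.items() if n > 1]

print(duplicate_names(SAMPLE))
```

Counting names this way reports each duplicated name once, whereas the Notepad++ regex highlights occurrences in place.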

I should note that I use a bunch of \s+ where you could probably just use a simple space, and a bunch of \s* that can likely be omitted. That’s mostly just because I like to play it safe with my regexes and not assume that insignificant whitespace will always be the same.
-
@Stuart-Dyer ,
@Mark-Olson has come up with a solution, but I find your description of the issue vague and poorly defined. If @Mark-Olson's answer is not what you wanted, then I suggest you give a better description of your desired before and after results, so that someone can adequately come up with an answer.
-
Hello, @stuart-dyer, @mark-olson, @lycan-thrope and All,
Oh… @stuart-dyer, I think I’ve found a very easy way to visualize all the duplicated values of any attribute of your HTML text, in one go !
So, let’s start with this INPUT text :
<SymbolVar name="DATABASE_NAME" value="DB1"/>
<SymbolVar name="DATABASE_NAME" value="DB2"/>
<SymbolVar name="ABCD" value="123"/>
<SymbolVar name="PORT" value="34"/>
<SymbolVar name="SERVER_NAME" value="Server1"/>
<SymbolVar name="EFGH" value="123"/>
<SymbolVar name="IJKL" value="456"/>
<SymbolVar name="SERVER_NAME" value="Server2"/>
<SymbolVar name="SERVER_NAME" value="Server3"/>
<SymbolVar name="MNOP" value="456"/>
<SymbolVar name="DATABASE_NAME" value="DB3"/>

- Duplicate your entire text, right AFTER a separator line of at least 3 sharp (#) characters. If your text already contains some # characters, just use another separator !
So, from our simple example, we get this text :
<SymbolVar name="DATABASE_NAME" value="DB1"/>
<SymbolVar name="DATABASE_NAME" value="DB2"/>
<SymbolVar name="ABCD" value="123"/>
<SymbolVar name="PORT" value="34"/>
<SymbolVar name="SERVER_NAME" value="Server1"/>
<SymbolVar name="EFGH" value="123"/>
<SymbolVar name="IJKL" value="456"/>
<SymbolVar name="SERVER_NAME" value="Server2"/>
<SymbolVar name="SERVER_NAME" value="Server3"/>
<SymbolVar name="MNOP" value="456"/>
<SymbolVar name="DATABASE_NAME" value="DB3"/>
################
<SymbolVar name="DATABASE_NAME" value="DB1"/>
<SymbolVar name="DATABASE_NAME" value="DB2"/>
<SymbolVar name="ABCD" value="123"/>
<SymbolVar name="PORT" value="34"/>
<SymbolVar name="SERVER_NAME" value="Server1"/>
<SymbolVar name="EFGH" value="123"/>
<SymbolVar name="IJKL" value="456"/>
<SymbolVar name="SERVER_NAME" value="Server2"/>
<SymbolVar name="SERVER_NAME" value="Server3"/>
<SymbolVar name="MNOP" value="456"/>
<SymbolVar name="DATABASE_NAME" value="DB3"/>

- Open the Mark dialog (Ctrl + M)
- Untick all box options
- Check the Purge for each search option
- Possibly, check the Bookmark line and Wrap around options
- Type in, in the Find what zone, this simple regex to mark all the duplicated values of any attribute :
  - MARK (?s)="\K(.+?)(?=".*###)(?=".+?\1.+?\1)
- Click on the Mark All button
Et voilà !
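The steps above can also be tried outside Notepad++. The following is a rough sketch, assuming Python's standard `re` module: it has no `\K`, so a look-behind `(?<==")` stands in for `="\K`, and `[^"]+` replaces `.+?` to keep each match inside one pair of quotes. The helper name `mark_duplicates` and the shortened sample are illustrative only.

```python
import re

SAMPLE = '''<SymbolVar name="DATABASE_NAME" value="DB1"/>
<SymbolVar name="DATABASE_NAME" value="DB2"/>
<SymbolVar name="ABCD" value="123"/>
<SymbolVar name="PORT" value="34"/>
<SymbolVar name="EFGH" value="123"/>
'''

# Adapted from the MARK regex: look-behind instead of \K, [^"]+ instead of .+?
MARK_RE = re.compile(r'(?s)(?<==")([^"]+)(?=".*###)(?=".+?\1.+?\1)')

def mark_duplicates(text):
    """Double the text after a separator line, then return every marked value."""
    doubled = text + "#" * 16 + "\n" + text
    return MARK_RE.findall(doubled)

print(mark_duplicates(SAMPLE))
```

Only values in the first copy can match, because the `###` separator must still lie ahead of them.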
Note : You can easily adapt this regex to visualize the duplicate values of a specific attribute ! For instance :
- The regex (?s)name="\K(.+?)(?=".*###)(?=".+?\1.+?\1) would mark all the duplicate values of the name attribute, only
- The regex (?s)value="\K(.+?)(?=".*###)(?=".+?\1.+?\1) would mark all the duplicate values of the value attribute, only
When you have finished working on your HTML file, simply delete the last part, along with the separator line of # chars !
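For experimenting with the attribute-specific variants and the final clean-up outside Notepad++, here is a hedged sketch in Python. It assumes the standard `re` module, which lacks `\K`, so a fixed-width look-behind on the attribute name is used instead, and `[^"]+` replaces `.+?`. The names `double_text`, `mark_attribute`, `strip_copy` and `SEPARATOR` are made up for this illustration.

```python
import re

SEPARATOR = "#" * 16  # any run of at least 3 '#' characters would do

SAMPLE = '''<SymbolVar name="DATABASE_NAME" value="DB1"/>
<SymbolVar name="DATABASE_NAME" value="DB2"/>
<SymbolVar name="ABCD" value="123"/>
<SymbolVar name="EFGH" value="123"/>
'''

def double_text(text):
    """Append the separator line and a second copy of the text."""
    return text + SEPARATOR + "\n" + text

def mark_attribute(doubled, attr):
    """Return the duplicated values of one attribute, found in the first copy."""
    pattern = re.compile(
        r'(?s)(?<=' + re.escape(attr) + r'=")([^"]+)(?=".*###)(?=".+?\1.+?\1)'
    )
    return pattern.findall(doubled)

def strip_copy(doubled):
    """Drop the separator line and everything after it."""
    return doubled.partition(SEPARATOR)[0]

doubled = double_text(SAMPLE)
print(mark_attribute(doubled, "name"))
print(mark_attribute(doubled, "value"))
print(strip_copy(doubled) == SAMPLE)
```

Like the original regexes, the look-behind matches any attribute whose name merely ends with the given text, so `name` would also catch a hypothetical `surname` attribute.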
How does this regex work ?
- As our search is a multi-line one, we use the (?s) modifier, so that the dot also matches line-break characters and the text is treated as a single line
- As we want to target the values of the attributes, we first search for the string ="
- Then, due to the \K syntax, the text matched so far is discarded and the regex engine matches any text up to, but excluding, the nearest " character, but ONLY IF two conditions are respected :
  - The line of # characters must always be located AFTER the current match, due to the (?=".*###) look-ahead
  - At least two other occurrences of the matched value \1 must be present AFTER the current location, due to the (?=".+?\1.+?\1) look-ahead. Indeed, as we have duplicated the contents of our file, every unique attribute value appears exactly twice in this modified file and so will not be marked !
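The counting argument behind the second look-ahead can be checked by hand. Below is a minimal sketch, assuming Python; `copies_after` and `is_marked` are hypothetical helpers written for this illustration, and the three-line sample is invented. It looks at the last occurrence of a value in the first copy, which is the hardest case: a value is marked only when at least two further copies follow it.

```python
def copies_after(text, value, index):
    """Count copies of `value` that start after the copy beginning at `index`."""
    return text.count(value, index + len(value))

original = 'name="PORT"\nname="SERVER_NAME"\nname="SERVER_NAME"\n'
doubled = original + "###\n" + original  # first copy, separator, second copy

def is_marked(value):
    """Apply the 'two more copies follow' rule to the last occurrence in the first copy."""
    last = original.rindex(value)  # same position inside `doubled`
    return copies_after(doubled, value, last) >= 2

for value in ("PORT", "SERVER_NAME"):
    print(value, is_marked(value))
```

A unique value such as PORT is followed by a single copy (the one after the separator), while the last SERVER_NAME of the first copy is still followed by the two copies of the second half.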
Best Regards,
guy038