Find duplicate variable declaration Regex
-
I’m looking to build a regex that will find duplicate strings/variables. An example of the input is as follows:
<SymbolVar name="DATABASE_NAME" value="DB1"/>
<SymbolVar name="DATABASE_NAME" value="DB2"/>
<SymbolVar name="PORT" value="12345"/>
<SymbolVar name="SERVER_NAME" value="Server1"/>
<SymbolVar name="SERVER_NAME" value="Server2"/>
The search should match “DATABASE_NAME” and “SERVER_NAME”, which are both present twice.
My first attempt, as follows, doesn’t quite cut it:
(?m)<SymbolVar name="(\S+).\R(?s).?\K\1
-
@Stuart-Dyer
(?s-i)<SymbolVar\s+name\s*=\s*"([^"]+)".+<SymbolVar\s+name\s*=\s*\K"\1"
should do it. It will highlight every duplicate instance (that is, it won’t highlight the first occurrence of any name, only the later one).
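For anyone who wants to experiment with this outside Notepad++, here is a rough Python approximation of the regex above, not Mark's exact search. Python's `re` module has no `\K`, so the part `\K` would keep goes into a second capture group instead; `(?s)` becomes `re.DOTALL`, and `-i` needs nothing since `re` is case-sensitive by default. The function name `find_later_duplicates` is a hypothetical choice for illustration.

```python
import re

# Translation of the Notepad++ regex: group 2 stands in for the \K"\1" part,
# i.e. the later occurrence of the same name that Mark All would highlight.
DUP_NAME = re.compile(
    r'<SymbolVar\s+name\s*=\s*"([^"]+)".+<SymbolVar\s+name\s*=\s*("\1")',
    re.DOTALL,  # same effect as (?s): '.' also matches line breaks
)


def find_later_duplicates(text):
    """Return (name, line_number) for each highlighted later occurrence."""
    hits = []
    for m in DUP_NAME.finditer(text):
        # 1-based line number of the highlighted (later) occurrence
        line_no = text.count("\n", 0, m.start(2)) + 1
        hits.append((m.group(1), line_no))
    return hits


sample = "\n".join([
    '<SymbolVar name="DATABASE_NAME" value="DB1"/>',
    '<SymbolVar name="DATABASE_NAME" value="DB2"/>',
    '<SymbolVar name="PORT" value="12345"/>',
    '<SymbolVar name="SERVER_NAME" value="Server1"/>',
    '<SymbolVar name="SERVER_NAME" value="Server2"/>',
])

print(find_later_duplicates(sample))
# [('DATABASE_NAME', 2), ('SERVER_NAME', 5)]
```

In this sketch, because `.+` is greedy, a name that occurs three or more times is reported only at its last occurrence.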
I should note that I use a bunch of `\s+` where you could probably just use a simple space, and a bunch of `\s*` that can likely be omitted. That’s mostly just because I like to play it safe with my regexes and not assume that insignificant whitespace will always be the same.
-
@Stuart-Dyer,
@Mark-Olson has come up with a solution, but I find your description of the issue to be vague and poorly defined. If @Mark-Olson’s answer is not what you wanted, then I suggest you give a better description of your desired before and after results, so that someone can adequately come up with an answer.
-
Hello, @stuart-dyer, @mark-olson, @lycan-thrope and All,
Oh… @stuart-dyer, I think I’ve found a very easy way to visualize, in one go, all the duplicated values of any attribute in your HTML text!
So, let’s start with this INPUT text:
<SymbolVar name="DATABASE_NAME" value="DB1"/>
<SymbolVar name="DATABASE_NAME" value="DB2"/>
<SymbolVar name="ABCD" value="123"/>
<SymbolVar name="PORT" value="34"/>
<SymbolVar name="SERVER_NAME" value="Server1"/>
<SymbolVar name="EFGH" value="123"/>
<SymbolVar name="IJKL" value="456"/>
<SymbolVar name="SERVER_NAME" value="Server2"/>
<SymbolVar name="SERVER_NAME" value="Server3"/>
<SymbolVar name="MNOP" value="456"/>
<SymbolVar name="DATABASE_NAME" value="DB3"/>
- Duplicate your entire text right AFTER a separator line of at least 3 sharp (`#`) characters. If your text already contains some `#` characters, just use another separator (and adapt the `###` part of the regex accordingly)!
So, from our simple example, we get this text:
<SymbolVar name="DATABASE_NAME" value="DB1"/>
<SymbolVar name="DATABASE_NAME" value="DB2"/>
<SymbolVar name="ABCD" value="123"/>
<SymbolVar name="PORT" value="34"/>
<SymbolVar name="SERVER_NAME" value="Server1"/>
<SymbolVar name="EFGH" value="123"/>
<SymbolVar name="IJKL" value="456"/>
<SymbolVar name="SERVER_NAME" value="Server2"/>
<SymbolVar name="SERVER_NAME" value="Server3"/>
<SymbolVar name="MNOP" value="456"/>
<SymbolVar name="DATABASE_NAME" value="DB3"/>
################
<SymbolVar name="DATABASE_NAME" value="DB1"/>
<SymbolVar name="DATABASE_NAME" value="DB2"/>
<SymbolVar name="ABCD" value="123"/>
<SymbolVar name="PORT" value="34"/>
<SymbolVar name="SERVER_NAME" value="Server1"/>
<SymbolVar name="EFGH" value="123"/>
<SymbolVar name="IJKL" value="456"/>
<SymbolVar name="SERVER_NAME" value="Server2"/>
<SymbolVar name="SERVER_NAME" value="Server3"/>
<SymbolVar name="MNOP" value="456"/>
<SymbolVar name="DATABASE_NAME" value="DB3"/>
- Open the Mark dialog (`Ctrl + M`)
- Untick all the box options
- Check the `Purge for each search` option
- Optionally, check the `Bookmark line` and `Wrap around` options
- In the `Find what` zone, type in this simple regex to mark all the duplicated values of any attribute:
  - MARK `(?s)="\K(.+?)(?=".*###)(?=".+?\1.+?\1)`
- Click on the `Mark All` button

Et voilà!
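The steps above can be emulated in Python to check which values get marked. This is a sketch under a few assumptions: the helper names `double_text` and `mark_duplicate_values` are hypothetical, the separator is the 16-character `#` line from the example, and since Python's `re` has no `\K`, each highlighted value is read from capture group 1 (the match itself also consumes the leading `="`, which `\K` would drop from the highlight).

```python
import re

# The regex typed into the Mark dialog, with \K replaced by capture group 1.
ANY_ATTR_DUP = re.compile(r'="(.+?)(?=".*###)(?=".+?\1.+?\1)', re.DOTALL)
SEPARATOR = "################"


def double_text(text):
    """Step 1: append a separator line, then a full copy of the text."""
    if "###" in text:
        # An existing ### run would fool the (?=".*###) look-ahead.
        raise ValueError("text already contains '###'; use another separator")
    return f"{text}\n{SEPARATOR}\n{text}"


def mark_duplicate_values(text):
    """Values that Mark All would highlight, in document order."""
    doubled = double_text(text)
    return [m.group(1) for m in ANY_ATTR_DUP.finditer(doubled)]


ROWS = [
    ("DATABASE_NAME", "DB1"), ("DATABASE_NAME", "DB2"), ("ABCD", "123"),
    ("PORT", "34"), ("SERVER_NAME", "Server1"), ("EFGH", "123"),
    ("IJKL", "456"), ("SERVER_NAME", "Server2"), ("SERVER_NAME", "Server3"),
    ("MNOP", "456"), ("DATABASE_NAME", "DB3"),
]
sample = "\n".join(f'<SymbolVar name="{n}" value="{v}"/>' for n, v in ROWS)

print(sorted(set(mark_duplicate_values(sample))))
# ['123', '456', 'DATABASE_NAME', 'SERVER_NAME']
```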
Note: you can easily adapt this regex to visualize the duplicate values of a specific attribute only! For instance:
- The regex `(?s)name="\K(.+?)(?=".*###)(?=".+?\1.+?\1)` would mark all the duplicate values of the `name` attribute only
- The regex `(?s)value="\K(.+?)(?=".*###)(?=".+?\1.+?\1)` would mark all the duplicate values of the `value` attribute only
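The same emulation generalizes to a single attribute by putting its name in front of `="`, as in the two regexes above. The helper `mark_attribute_duplicates` is hypothetical; `re.escape` guards against attribute names containing regex metacharacters, and, as `re` lacks `\K`, group 1 again carries the highlighted value.

```python
import re


def mark_attribute_duplicates(text, attr):
    """Duplicate values of one attribute, as Mark All would highlight them."""
    if "###" in text:
        raise ValueError("text already contains '###'; use another separator")
    doubled = f"{text}\n################\n{text}"  # separator line + full copy
    # attr="\K(.+?)(?=".*###)(?=".+?\1.+?\1) with \K replaced by group 1
    pattern = re.compile(
        re.escape(attr) + r'="(.+?)(?=".*###)(?=".+?\1.+?\1)', re.DOTALL
    )
    return [m.group(1) for m in pattern.finditer(doubled)]


ROWS = [
    ("DATABASE_NAME", "DB1"), ("DATABASE_NAME", "DB2"), ("ABCD", "123"),
    ("PORT", "34"), ("SERVER_NAME", "Server1"), ("EFGH", "123"),
    ("IJKL", "456"), ("SERVER_NAME", "Server2"), ("SERVER_NAME", "Server3"),
    ("MNOP", "456"), ("DATABASE_NAME", "DB3"),
]
sample = "\n".join(f'<SymbolVar name="{n}" value="{v}"/>' for n, v in ROWS)

print(mark_attribute_duplicates(sample, "value"))
# ['123', '123', '456', '456']
```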
When you have finished working on your HTML file, simply delete the last part, including the separator line of `#` characters!
How does this regex work?
- As our search spans several lines, we use the `(?s)` modifier, so that `.` also matches line breaks and the whole text is treated as one single line
- As we want to target the values of the attributes, we first search for the string `="`
- Then, due to the `\K` syntax, the text matched so far is discarded, and the regex engine searches for any text up to the nearest `"` character (excluded), but ONLY IF two conditions are met:
  - The line of `#` characters must always be located AFTER the current match, due to the `(?=".*###)` look-ahead, so that nothing in the copy itself gets marked
  - At least two other occurrences of the searched value `\1` must be present AFTER the current location, due to the `(?=".+?\1.+?\1)` look-ahead. Indeed, as we have duplicated our file, every unique attribute value appears exactly TWICE in the modified file: from its position in the first part, only ONE occurrence (its copy) lies ahead, so it will not be marked! Conversely, a value used at least twice in the original always has at least two occurrences ahead, in the copy, so every one of its occurrences in the first part gets marked.
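The counting argument in the last bullet can be checked without the look-ahead. The sketch below is a swapped-in technique, not guy038's method: it extracts whole attribute values with a simple `="([^"]*)"` pattern (assuming values never contain `"`), duplicates the list with a `None` entry standing in for the separator line, and counts, for each first-part occurrence, how many equal values lie ahead. Unlike `\1`, it compares whole values, so a value that is merely a substring of another would not be counted here.

```python
import re
from collections import Counter

ROWS = [
    ("DATABASE_NAME", "DB1"), ("DATABASE_NAME", "DB2"), ("ABCD", "123"),
    ("PORT", "34"), ("SERVER_NAME", "Server1"), ("EFGH", "123"),
    ("IJKL", "456"), ("SERVER_NAME", "Server2"), ("SERVER_NAME", "Server3"),
    ("MNOP", "456"), ("DATABASE_NAME", "DB3"),
]
sample = "\n".join(f'<SymbolVar name="{n}" value="{v}"/>' for n, v in ROWS)

values = re.findall(r'="([^"]*)"', sample)  # every attribute value, in order
doubled = values + [None] + values          # None plays the ### separator line

marked = []
for i, v in enumerate(values):              # first part only (before the separator)
    ahead = doubled[i + 1:].count(v)        # equal values located after this one
    if ahead >= 2:                          # the (?=".+?\1.+?\1) condition
        marked.append(v)

counts = Counter(values)
# Marked exactly when the value occurs at least twice in the original text.
assert marked == [v for v in values if counts[v] >= 2]
print(sorted(set(marked)))  # ['123', '456', 'DATABASE_NAME', 'SERVER_NAME']
```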
-
Best Regards,
guy038