
    Find duplicate variable declaration Regex

    5 Posts 4 Posters 705 Views
    • Stuart Dyer

      I’m looking to build a regex that will find duplicate strings/variables. Here is an example of the input:

           <SymbolVar name="DATABASE_NAME" value="DB1"/>
           <SymbolVar name="DATABASE_NAME" value="DB2"/>
           <SymbolVar name="PORT" value="12345"/>
           <SymbolVar name="SERVER_NAME" value="Server1"/>
           <SymbolVar name="SERVER_NAME" value="Server2"/>
      

      The search should match “DATABASE_NAME” and “SERVER_NAME”, which are both present twice.

      My first attempt, below, doesn’t quite cut it:

      (?m)<SymbolVar name="(\S+).\R(?s).?\K\1
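For comparison, the goal itself (list every name declared more than once) can be sketched outside Notepad++ in Python by counting the names instead of matching the repeats. This is only an illustration under assumptions: it expects every declaration to start with `<SymbolVar name="...">` as in the sample, and the names TEXT, NAME_RE and duplicate_names are made up for the example.

```python
import re
from collections import Counter

# The sample input from the question.
TEXT = """\
<SymbolVar name="DATABASE_NAME" value="DB1"/>
<SymbolVar name="DATABASE_NAME" value="DB2"/>
<SymbolVar name="PORT" value="12345"/>
<SymbolVar name="SERVER_NAME" value="Server1"/>
<SymbolVar name="SERVER_NAME" value="Server2"/>
"""

# Capture the value of the name="..." attribute of each <SymbolVar> tag.
NAME_RE = re.compile(r'<SymbolVar\s+name\s*=\s*"([^"]+)"')


def duplicate_names(text: str) -> list[str]:
    """Return the names declared more than once, in first-seen order."""
    counts = Counter(NAME_RE.findall(text))  # Counter keeps insertion order
    return [name for name, n in counts.items() if n > 1]


if __name__ == "__main__":
    print(duplicate_names(TEXT))
```

On the sample above it returns ['DATABASE_NAME', 'SERVER_NAME'].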

      • Mark Olson @Stuart Dyer

        @Stuart-Dyer
        (?s-i)<SymbolVar\s+name\s*=\s*"([^"]+)".+<SymbolVar\s+name\s*=\s*\K"\1" should do it.

        It will highlight every duplicate instance (that is, it won’t highlight the first occurrence of any name, only the later ones).


        I should note that I use a bunch of \s+ where you could probably just use a simple space and a bunch of \s* that can likely be omitted. That’s mostly just because I like to play it safe with my regexes and not assume that insignificant whitespace will always be the same.
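Python's re module has no \K, so here is a rough emulation of Mark's regex in which the part after \K is wrapped in a second capturing group; the span of that group stands in for what Notepad++ would highlight. re.DOTALL plays the role of (?s), and re is case-sensitive by default, like (?-i). The helper name highlighted is hypothetical.

```python
import re

# The question's sample input.
TEXT = """\
<SymbolVar name="DATABASE_NAME" value="DB1"/>
<SymbolVar name="DATABASE_NAME" value="DB2"/>
<SymbolVar name="PORT" value="12345"/>
<SymbolVar name="SERVER_NAME" value="Server1"/>
<SymbolVar name="SERVER_NAME" value="Server2"/>
"""

# Mark's regex, with the text after \K captured as group 2 instead.
OLSON_RE = re.compile(
    r'<SymbolVar\s+name\s*=\s*"([^"]+)".+<SymbolVar\s+name\s*=\s*("\1")',
    re.DOTALL,
)


def highlighted(text):
    """Return (line_number, text) for each group-2 span, i.e. the highlight."""
    spans = []
    for m in OLSON_RE.finditer(text):
        line_number = text.count("\n", 0, m.start(2)) + 1
        spans.append((line_number, m.group(2)))
    return spans


if __name__ == "__main__":
    print(highlighted(TEXT))
```

On the question's five-line sample it reports line 2 ("DATABASE_NAME") and line 5 ("SERVER_NAME"), the second occurrence of each duplicated name.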

        • Lycan Thrope @Stuart Dyer

          @Stuart-Dyer ,
          @Mark-Olson has come up with a solution, but I find your description of the issue vague and ill-defined. If @Mark-Olson’s answer is not what you wanted, I suggest you give a better description of your desired before and after results, so that someone can adequately come up with an answer.

          • guy038

            Hello, @stuart-dyer, @mark-olson, @lycan-thrope and All,

            Oh… @stuart-dyer, I think I’ve found a very easy way to visualize, in one go, all the duplicated values of any attribute in your HTML text!


            So, let’s start with this INPUT text:

                 <SymbolVar name="DATABASE_NAME" value="DB1"/>
                 <SymbolVar name="DATABASE_NAME" value="DB2"/>
                 <SymbolVar name="ABCD" value="123"/>
                 <SymbolVar name="PORT" value="34"/>
                 <SymbolVar name="SERVER_NAME" value="Server1"/>
                 <SymbolVar name="EFGH" value="123"/>
                 <SymbolVar name="IJKL" value="456"/>
                 <SymbolVar name="SERVER_NAME" value="Server2"/>
                 <SymbolVar name="SERVER_NAME" value="Server3"/>
                 <SymbolVar name="MNOP" value="456"/>
                 <SymbolVar name="DATABASE_NAME" value="DB3"/>
            
            • Duplicate your entire text right AFTER a separator line of at least 3 sharp (#) characters. If your text already contains some # characters, just use another separator (and change ### in the regex to match it)!

            So, from our simple example, we get this text:

                 <SymbolVar name="DATABASE_NAME" value="DB1"/>
                 <SymbolVar name="DATABASE_NAME" value="DB2"/>
                 <SymbolVar name="ABCD" value="123"/>
                 <SymbolVar name="PORT" value="34"/>
                 <SymbolVar name="SERVER_NAME" value="Server1"/>
                 <SymbolVar name="EFGH" value="123"/>
                 <SymbolVar name="IJKL" value="456"/>
                 <SymbolVar name="SERVER_NAME" value="Server2"/>
                 <SymbolVar name="SERVER_NAME" value="Server3"/>
                 <SymbolVar name="MNOP" value="456"/>
                 <SymbolVar name="DATABASE_NAME" value="DB3"/>
            ################
                 <SymbolVar name="DATABASE_NAME" value="DB1"/>
                 <SymbolVar name="DATABASE_NAME" value="DB2"/>
                 <SymbolVar name="ABCD" value="123"/>
                 <SymbolVar name="PORT" value="34"/>
                 <SymbolVar name="SERVER_NAME" value="Server1"/>
                 <SymbolVar name="EFGH" value="123"/>
                 <SymbolVar name="IJKL" value="456"/>
                 <SymbolVar name="SERVER_NAME" value="Server2"/>
                 <SymbolVar name="SERVER_NAME" value="Server3"/>
                 <SymbolVar name="MNOP" value="456"/>
                 <SymbolVar name="DATABASE_NAME" value="DB3"/>
            
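If you prefer to prepare the doubled text with a script rather than by copy and paste, a minimal Python sketch could look like this. It works on a list of lines; the names add_copy, remove_copy and SEPARATOR are hypothetical, and remove_copy covers the final clean-up step of deleting the copy again.

```python
# The MARK regex only looks for three '#' in a row, so 16 is plenty.
SEPARATOR = "#" * 16


def add_copy(lines, separator=SEPARATOR):
    """Return the lines, then a separator line, then a full copy of the lines."""
    return lines + [separator] + lines


def remove_copy(lines, separator=SEPARATOR):
    """Drop the first separator line and everything after it.

    Raises ValueError if there is no separator line; assumes the original
    text has no line equal to the separator.
    """
    return lines[: lines.index(separator)]


if __name__ == "__main__":
    original = [
        '<SymbolVar name="DATABASE_NAME" value="DB1"/>',
        '<SymbolVar name="DATABASE_NAME" value="DB2"/>',
    ]
    doubled = add_copy(original)
    print("\n".join(doubled))
    assert remove_copy(doubled) == original
```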
            • Open the Mark dialog (Ctrl + M)

            • Untick all the box options

            • Check the Purge for each search option

            • Optionally, check the Bookmark line and Wrap around options

            • Select the Regular expression search mode

            • Type, in the Find what zone, this simple regex to mark all the duplicated values of any attribute:

              • MARK (?s)="\K(.+?)(?=".*###)(?=".+?\1.+?\1)
            • Click on the Mark All button

            Et voilà !
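To check which values the MARK regex picks, here is a rough Python emulation. Python's re module does not support \K, so the leading =" becomes the look-behind (?<==") instead; each match still starts right after =", which should give the same marks on this text. The sample lines are the INPUT text above, and the function name marked_values is hypothetical.

```python
import re

# guy038's INPUT text, one string per line.
LINES = [
    '<SymbolVar name="DATABASE_NAME" value="DB1"/>',
    '<SymbolVar name="DATABASE_NAME" value="DB2"/>',
    '<SymbolVar name="ABCD" value="123"/>',
    '<SymbolVar name="PORT" value="34"/>',
    '<SymbolVar name="SERVER_NAME" value="Server1"/>',
    '<SymbolVar name="EFGH" value="123"/>',
    '<SymbolVar name="IJKL" value="456"/>',
    '<SymbolVar name="SERVER_NAME" value="Server2"/>',
    '<SymbolVar name="SERVER_NAME" value="Server3"/>',
    '<SymbolVar name="MNOP" value="456"/>',
    '<SymbolVar name="DATABASE_NAME" value="DB3"/>',
]

# (?s)="\K(.+?)(?=".*###)(?=".+?\1.+?\1) adapted for Python's re:
# \K is unsupported, so =" is turned into the look-behind (?<==").
MARK_RE = re.compile(r'(?s)(?<==")(.+?)(?=".*###)(?=".+?\1.+?\1)')


def marked_values(lines):
    """Return (line_number, value) for each value the regex would mark."""
    # Original text, a separator line of '#', then a full copy.
    doubled = "\n".join(lines + ["#" * 16] + lines)
    found = []
    for m in MARK_RE.finditer(doubled):
        line_number = doubled.count("\n", 0, m.start()) + 1
        found.append((line_number, m.group(1)))
    return found


if __name__ == "__main__":
    for line_number, value in marked_values(LINES):
        print(line_number, value)
```

It reports DATABASE_NAME (lines 1, 2 and 11), SERVER_NAME (lines 5, 8 and 9), 123 (lines 3 and 6) and 456 (lines 7 and 10), and nothing below the ### line.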


            Note: You can easily adapt this regex to visualize the duplicate values of one specific attribute only. For instance:

            • The regex (?s)name="\K(.+?)(?=".*###)(?=".+?\1.+?\1) would mark only the duplicate values of the name attribute

            • The regex (?s)value="\K(.+?)(?=".*###)(?=".+?\1.+?\1) would mark only the duplicate values of the value attribute
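In a Python emulation, the per-attribute variants amount to putting the attribute name inside the look-behind, since Python's re has no \K. In this sketch the helper duplicate_values and the four sample lines are hypothetical; re.escape only guards against attribute names containing regex metacharacters.

```python
import re

# Hypothetical sample: one duplicated name and one duplicated value.
LINES = [
    '<SymbolVar name="DATABASE_NAME" value="DB1"/>',
    '<SymbolVar name="ABCD" value="123"/>',
    '<SymbolVar name="EFGH" value="123"/>',
    '<SymbolVar name="DATABASE_NAME" value="DB2"/>',
]


def duplicate_values(lines, attr):
    """Return the values of `attr` that the per-attribute MARK regex marks."""
    doubled = "\n".join(lines + ["#" * 16] + lines)
    # attr=" becomes a look-behind, standing in for attr="\K.
    pattern = (
        r'(?s)(?<=' + re.escape(attr) + r'=")'
        r'(.+?)(?=".*###)(?=".+?\1.+?\1)'
    )
    return re.findall(pattern, doubled)


if __name__ == "__main__":
    print(duplicate_values(LINES, "name"))
    print(duplicate_values(LINES, "value"))
```

With attr="name" it returns the two DATABASE_NAME values; with attr="value" it returns the two 123 values.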

            When you have finished working on your HTML file, simply delete the last part, including the separator line of # chars!


            How does this regex work?

            • As our search spans multiple lines, we use the (?s) modifier, so that the dot also matches line-break characters and the whole text behaves as a single line

            • As we want to target the values of the attributes, we first search for the string ="

            • Then, due to the \K syntax, everything matched so far is discarded from the final match, and the regex engine searches for the text up to the nearest " character (excluded), but ONLY IF two conditions are met:

              • The line of # characters must always be located AFTER the current match, due to the (?=".*###) look-ahead, so only values in the original (first) part of the text can be marked

              • At least two other occurrences of the searched value \1 must be present AFTER the current location, due to the (?=".+?\1.+?\1) look-ahead. Indeed, since we duplicated our file, every unique value now appears exactly twice, so only ONE other occurrence (its copy below the separator) follows it, and it will not be marked!
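The counting behind the second look-ahead can be checked with a little arithmetic. Assuming a value never occurs inside any other text, a value present n times in the original is present 2n times after the duplication, so its k-th occurrence in the first part is followed by 2n - k further occurrences. That is 1 for a unique value (not marked) and at least n >= 2 otherwise, so every occurrence of a duplicated value gets marked. The function names below are hypothetical.

```python
def occurrences_after(n: int, k: int) -> int:
    """Copies of a value located after its k-th occurrence (1 <= k <= n).

    Assumes the value occurs n times in the original text and never inside
    other text, so the doubled text holds exactly 2 * n copies of it.
    """
    return 2 * n - k


def is_marked(n: int, k: int) -> bool:
    # The second look-ahead needs at least two more copies of the value.
    return occurrences_after(n, k) >= 2


if __name__ == "__main__":
    for n in range(1, 4):
        print(n, [is_marked(n, k) for k in range(1, n + 1)])
```

This prints "1 [False]", then "2 [True, True]", then "3 [True, True, True]".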

            Best Regards,

            guy038

            • Stuart Dyer

              Thanks @Mark-Olson / @guy038

              Both solutions work a treat!

              Appreciate the fast turnaround.

              The Community of users of the Notepad++ text editor.
              Powered by NodeBB | Contributors