    Find duplicate variable declaration Regex

    • Stuart Dyer

      I’m looking to build a regex that will find duplicate strings/variables. An example of the input is as follows:

           <SymbolVar name="DATABASE_NAME" value="DB1"/>
           <SymbolVar name="DATABASE_NAME" value="DB2"/>
           <SymbolVar name="PORT" value="12345"/>
           <SymbolVar name="SERVER_NAME" value="Server1"/>
           <SymbolVar name="SERVER_NAME" value="Server2"/>
      

      The search should match “DATABASE_NAME” and “SERVER_NAME”, each of which is present twice.

      My first attempt, shown below, doesn’t quite cut it:

      (?m)<SymbolVar name="(\S+).\R(?s).?\K\1

      • Mark Olson @Stuart Dyer

        @Stuart-Dyer
        (?s-i)<SymbolVar\s+name\s*=\s*"([^"]+)".+<SymbolVar\s+name\s*=\s*\K"\1" should do it.

        It will highlight every duplicate instance (that is, it won’t highlight the first occurrence of any name, only the later ones).


        I should note that I use a bunch of \s+ where you could probably just use a simple space and a bunch of \s* that can likely be omitted. That’s mostly just because I like to play it safe with my regexes and not assume that insignificant whitespace will always be the same.
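
For anyone who wants to try the same idea outside Notepad++, here is a minimal Python sketch of the regex above. It rests on a few assumptions: Python's standard `re` module has no `\K`, so a second capture group stands in for it, and `re.DOTALL` replaces `(?s)`. The names `SAMPLE`, `PATTERN` and `later_duplicates` are made up for this illustration.

```python
import re

SAMPLE = """\
<SymbolVar name="DATABASE_NAME" value="DB1"/>
<SymbolVar name="DATABASE_NAME" value="DB2"/>
<SymbolVar name="PORT" value="12345"/>
<SymbolVar name="SERVER_NAME" value="Server1"/>
<SymbolVar name="SERVER_NAME" value="Server2"/>
"""

# Group 1 is the name found first; group 2 is the part that \K would have
# left as the visible match in Notepad++ (the later, quoted duplicate).
PATTERN = re.compile(
    r'<SymbolVar\s+name\s*=\s*"([^"]+)".+<SymbolVar\s+name\s*=\s*("\1")',
    re.DOTALL,  # same role as (?s): the dot also matches line breaks
)


def later_duplicates(text):
    """Return the names for which a later duplicate declaration was found."""
    return [m.group(1) for m in PATTERN.finditer(text)]


print(later_duplicates(SAMPLE))  # ['DATABASE_NAME', 'SERVER_NAME']
```

One thing this port makes visible: because `.+` is greedy, each match jumps to the last later occurrence of a name, so in this Python version a name declared three times is reported once, at its final occurrence.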

        • Lycan Thrope @Stuart Dyer

          @Stuart-Dyer,
          @Mark-Olson has come up with a solution, but I find your description of the issue vague and poorly defined. If @Mark-Olson's answer is not what you wanted, then I suggest you give a better description of your desired before and after results, so that someone can come up with an adequate answer.

          • guy038

            Hello, @stuart-dyer, @mark-olson, @lycan-thrope and All,

            Oh… @stuart-dyer, I think I’ve found a very easy way to visualize all the duplicated values of any attribute of your HTML text, in one go!


            So, let’s start with this INPUT text :

                 <SymbolVar name="DATABASE_NAME" value="DB1"/>
                 <SymbolVar name="DATABASE_NAME" value="DB2"/>
                 <SymbolVar name="ABCD" value="123"/>
                 <SymbolVar name="PORT" value="34"/>
                 <SymbolVar name="SERVER_NAME" value="Server1"/>
                 <SymbolVar name="EFGH" value="123"/>
                 <SymbolVar name="IJKL" value="456"/>
                 <SymbolVar name="SERVER_NAME" value="Server2"/>
                 <SymbolVar name="SERVER_NAME" value="Server3"/>
                 <SymbolVar name="MNOP" value="456"/>
                 <SymbolVar name="DATABASE_NAME" value="DB3"/>
            
            • Duplicate your entire text right AFTER a separator line of at least 3 # (hash) characters. If your text already contains some # characters, just use another separator (and replace ### in the regex below accordingly)!

            So, from our simple example, we get this text :

                 <SymbolVar name="DATABASE_NAME" value="DB1"/>
                 <SymbolVar name="DATABASE_NAME" value="DB2"/>
                 <SymbolVar name="ABCD" value="123"/>
                 <SymbolVar name="PORT" value="34"/>
                 <SymbolVar name="SERVER_NAME" value="Server1"/>
                 <SymbolVar name="EFGH" value="123"/>
                 <SymbolVar name="IJKL" value="456"/>
                 <SymbolVar name="SERVER_NAME" value="Server2"/>
                 <SymbolVar name="SERVER_NAME" value="Server3"/>
                 <SymbolVar name="MNOP" value="456"/>
                 <SymbolVar name="DATABASE_NAME" value="DB3"/>
            ################
                 <SymbolVar name="DATABASE_NAME" value="DB1"/>
                 <SymbolVar name="DATABASE_NAME" value="DB2"/>
                 <SymbolVar name="ABCD" value="123"/>
                 <SymbolVar name="PORT" value="34"/>
                 <SymbolVar name="SERVER_NAME" value="Server1"/>
                 <SymbolVar name="EFGH" value="123"/>
                 <SymbolVar name="IJKL" value="456"/>
                 <SymbolVar name="SERVER_NAME" value="Server2"/>
                 <SymbolVar name="SERVER_NAME" value="Server3"/>
                 <SymbolVar name="MNOP" value="456"/>
                 <SymbolVar name="DATABASE_NAME" value="DB3"/>
            
            • Open the Mark dialog ( Ctrl + M )

            • Untick all the checkbox options

            • Check the Purge for each search option

            • Optionally, check the Bookmark line and Wrap around options

            • In the Find what field, type this simple regex to mark all the duplicated values of any attribute:

              • MARK (?s)="\K(.+?)(?=".*###)(?=".+?\1.+?\1)
            • Click on the Mark All button

            Et voilà !


            Note : You can easily adapt this regex to visualize the duplicate values of a specific attribute ! For instance :

            • The regex (?s)name="\K(.+?)(?=".*###)(?=".+?\1.+?\1) would mark all the duplicate values of the name attribute, only

            • The regex (?s)value="\K(.+?)(?=".*###)(?=".+?\1.+?\1) would mark all the duplicate values of the value attribute, only
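
To experiment with the attribute-specific variants outside Notepad++, the doubled-text trick can be sketched in Python. This is only an illustration, not part of the Notepad++ procedure: Python's `re` has no `\K`, so the value is read from a capture group instead, and `SAMPLE`, `SEPARATOR` and `mark_duplicates` are hypothetical names chosen for this example.

```python
import re

SAMPLE = """\
<SymbolVar name="DATABASE_NAME" value="DB1"/>
<SymbolVar name="DATABASE_NAME" value="DB2"/>
<SymbolVar name="PORT" value="12345"/>
<SymbolVar name="SERVER_NAME" value="Server1"/>
<SymbolVar name="SERVER_NAME" value="Server2"/>
"""

SEPARATOR = "################"  # assumed not to occur in the original text


def mark_duplicates(text, attr):
    """Return the values of `attr` that the doubled-text regex would mark."""
    # Step 1: append the separator line and a full copy of the text.
    doubled = text + SEPARATOR + "\n" + text
    # Step 2: same two look-aheads as the Notepad++ regex; group 1 plays
    # the role of the text that \K leaves as the match.
    pattern = re.compile(
        re.escape(attr) + r'="(.+?)(?=".*###)(?=".+?\1.+?\1)',
        re.DOTALL,
    )
    return [m.group(1) for m in pattern.finditer(doubled)]


print(mark_duplicates(SAMPLE, "name"))
# ['DATABASE_NAME', 'DATABASE_NAME', 'SERVER_NAME', 'SERVER_NAME']
```

Unlike the first regex in this thread, every occurrence of a duplicated name in the original part is reported, including the first one.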

            When you have finished working on your HTML file, simply delete the last part, starting from the separator line of # chars!


            How does this regex work?

            • As our search spans multiple lines, we use the (?s) modifier, so that the dot also matches line breaks and the text is treated as a single line

            • As we want to target the values of the attributes, we first search for the string ="

            • Then, due to the \K syntax, the text matched so far is dropped from the match, and the regex engine searches for any text up to, but excluding, the nearest " character, but ONLY IF two conditions are met:

              • The line of # characters must always be located AFTER the current match, due to the (?=".*###) look-ahead

              • At least two other occurrences of the searched value \1 must be present AFTER the current location, due to the (?=".+?\1.+?\1) look-ahead. Indeed, as we have duplicated the whole file, every unique attribute value now appears exactly twice in the modified file, so only one occurrence follows its first copy and it will not be marked!
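
The counting argument behind the second look-ahead can be checked without any regex. The sketch below is a plain-Python illustration (the function name `later_occurrences` and the list-based model of the file are assumptions made for this example): for each entry in the original part, it counts how many equal entries follow it once the list has been doubled.

```python
def later_occurrences(names, separator="###"):
    """For each name in the original part, count equal names that follow it
    in the doubled list (original + separator + copy)."""
    doubled = names + [separator] + names
    return [(name, doubled[i + 1:].count(name)) for i, name in enumerate(names)]


names = ["DATABASE_NAME", "DATABASE_NAME", "PORT", "SERVER_NAME", "SERVER_NAME"]
for name, count in later_occurrences(names):
    # The regex marks an entry only when at least two occurrences follow it.
    print(name, count, "marked" if count >= 2 else "not marked")
```

A unique name such as PORT is followed only by its single copy, so its count is 1 and it is not marked, whereas every occurrence of a duplicated name is followed by at least the two copies in the second part.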

            Best Regards,

            guy038

            • Stuart Dyer

              Thanks @Mark-Olson / @guy038

              Both solutions work a treat !

              Appreciate the fast turn around

              The Community of users of the Notepad++ text editor.