Find duplicate variable declaration Regex

    Stuart Dyer
    last edited by Jun 5, 2023, 5:07 AM

    I’m looking to build a regex that will find duplicate strings/variables. An example of the input follows:

         <SymbolVar name="DATABASE_NAME" value="DB1"/>
         <SymbolVar name="DATABASE_NAME" value="DB2"/>
         <SymbolVar name="PORT" value="12345"/>
         <SymbolVar name="SERVER_NAME" value="Server1"/>
         <SymbolVar name="SERVER_NAME" value="Server2"/>
    

    The search should match “DATABASE_NAME” and “SERVER_NAME”, each of which appears twice.

    My first attempt, shown below, doesn’t quite cut it:

    (?m)<SymbolVar name="(\S+).\R(?s).?\K\1

      Mark Olson @Stuart Dyer
      last edited by Mark Olson Jun 5, 2023, 6:15 AM Jun 5, 2023, 6:13 AM

      @Stuart-Dyer
      (?s-i)<SymbolVar\s+name\s*=\s*"([^"]+)".+<SymbolVar\s+name\s*=\s*\K"\1" should do it.

      It will highlight every duplicate instance (that is, it won’t highlight the first occurrence of any name, only the later ones).


      I should note that I use a bunch of \s+ where you could probably just use a simple space and a bunch of \s* that can likely be omitted. That’s mostly just because I like to play it safe with my regexes and not assume that insignificant whitespace will always be the same.
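As a rough cross-check outside Notepad++, the whitespace-tolerant part of the pattern above can be tried in Python. This is only a sketch under assumptions: the function name `duplicate_names` and the irregularly spaced sample are made up for illustration, and because Python's `re` module has no `\K`, a plain capture group plus a counter is used here instead of the full regex.

```python
import re
from collections import Counter

# Same tolerant prefix as the regex above: \s+ after the tag name,
# optional whitespace around "=", and the name captured in group 1.
NAME_PATTERN = re.compile(r'<SymbolVar\s+name\s*=\s*"([^"]+)"')

# Illustrative sample with deliberately irregular spacing (an assumption,
# not taken from a real file).
SAMPLE = '''<SymbolVar name="DATABASE_NAME" value="DB1"/>
<SymbolVar  name = "DATABASE_NAME" value="DB2"/>
<SymbolVar name="PORT" value="12345"/>
<SymbolVar name="SERVER_NAME" value="Server1"/>
<SymbolVar   name="SERVER_NAME" value="Server2"/>
'''


def duplicate_names(text):
    """Return the names that occur more than once, in first-seen order."""
    counts = Counter(NAME_PATTERN.findall(text))
    return [name for name, seen in counts.items() if seen > 1]


if __name__ == "__main__":
    print(duplicate_names(SAMPLE))  # ['DATABASE_NAME', 'SERVER_NAME']
```

A pattern with a single literal space after `SymbolVar` would miss the second and fifth lines of this sample, which is the case the extra `\s+` and `\s*` guard against.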

        Lycan Thrope @Stuart Dyer
        last edited by Lycan Thrope Jun 5, 2023, 8:57 PM Jun 5, 2023, 8:56 PM

        @Stuart-Dyer ,
        @Mark-Olson has come up with a solution, but I find your description of the issue vague and poorly defined. If @Mark-Olson’s answer is not what you wanted, I suggest you give a better description of your desired before and after results so that someone can come up with an adequate answer.

          guy038
          last edited by guy038 Jun 6, 2023, 4:05 AM Jun 6, 2023, 3:57 AM

          Hello, @stuart-dyer, @mark-olson, @lycan-thrope and All,

          Oh… @stuart-dyer, I think I’ve found a very easy way to visualize all the duplicated values of any attribute of your HTML text, in one go!


          So, let’s start with this INPUT text :

               <SymbolVar name="DATABASE_NAME" value="DB1"/>
               <SymbolVar name="DATABASE_NAME" value="DB2"/>
               <SymbolVar name="ABCD" value="123"/>
               <SymbolVar name="PORT" value="34"/>
               <SymbolVar name="SERVER_NAME" value="Server1"/>
               <SymbolVar name="EFGH" value="123"/>
               <SymbolVar name="IJKL" value="456"/>
               <SymbolVar name="SERVER_NAME" value="Server2"/>
               <SymbolVar name="SERVER_NAME" value="Server3"/>
               <SymbolVar name="MNOP" value="456"/>
               <SymbolVar name="DATABASE_NAME" value="DB2"/>
          
          • Duplicate your entire text, right AFTER a separation line of at least 3 sharp (#) characters. If your text already contains some # characters, just use another separator!

          So, from our simple example, we get this text :

               <SymbolVar name="DATABASE_NAME" value="DB1"/>
               <SymbolVar name="DATABASE_NAME" value="DB2"/>
               <SymbolVar name="ABCD" value="123"/>
               <SymbolVar name="PORT" value="34"/>
               <SymbolVar name="SERVER_NAME" value="Server1"/>
               <SymbolVar name="EFGH" value="123"/>
               <SymbolVar name="IJKL" value="456"/>
               <SymbolVar name="SERVER_NAME" value="Server2"/>
               <SymbolVar name="SERVER_NAME" value="Server3"/>
               <SymbolVar name="MNOP" value="456"/>
               <SymbolVar name="DATABASE_NAME" value="DB2"/>
          ################
               <SymbolVar name="DATABASE_NAME" value="DB1"/>
               <SymbolVar name="DATABASE_NAME" value="DB2"/>
               <SymbolVar name="ABCD" value="123"/>
               <SymbolVar name="PORT" value="34"/>
               <SymbolVar name="SERVER_NAME" value="Server1"/>
               <SymbolVar name="EFGH" value="123"/>
               <SymbolVar name="IJKL" value="456"/>
               <SymbolVar name="SERVER_NAME" value="Server2"/>
               <SymbolVar name="SERVER_NAME" value="Server3"/>
               <SymbolVar name="MNOP" value="456"/>
               <SymbolVar name="DATABASE_NAME" value="DB2"/>
          
          • Open the Mark dialog ( Ctrl + M )

          • Untick all box options

          • Check the Purge for each search option

          • Possibly, check the Bookmark line and Wrap around options

          • Type in, in the Find what zone, this simple regex to mark all the duplicated values of any attribute :

            • MARK (?s)="\K(.+?)(?=".*###)(?=".+?\1.+?\1)
          • Click on the Mark All button

          Et voilà !


          Note : You can easily adapt this regex to visualize the duplicate values of a specific attribute ! For instance :

          • The regex (?s)name="\K(.+?)(?=".*###)(?=".+?\1.+?\1) would mark all the duplicate values of the name attribute, only

          • The regex (?s)value="\K(.+?)(?=".*###)(?=".+?\1.+?\1) would mark all the duplicate values of the value attribute, only

          When you finish working on your HTML file, simply delete the last part, along with the separator line of # chars!


          How does this regex work?

          • As our search is a multi-line one, we use the (?s) modifier so that the dot also matches line-break characters and the whole text is treated as a single line

          • As we want to target the values of the attributes, we first search for the string ="

          • Then, due to the \K syntax, the text matched so far is discarded and the regex engine searches for any text up to, but excluding, the nearest " character, but ONLY IF two conditions are met :

            • The line of # characters must always be located AFTER the current match, due to the (?=".*###) look-ahead

            • At least two other occurrences of the searched value \1 must be present AFTER the current location, due to the (?=".+?\1.+?\1) look-ahead. Indeed, as we have now duplicated our current file, all unique values of attributes, in this modified file, appear two times ONLY and so will not be marked !
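The duplicate-then-mark trick described above can be sketched in Python. Several things here are assumptions rather than part of the original answer: Python's `re` has no `\K`, so a fixed-width look-behind stands in for it; `mark_duplicates` and its `attribute` parameter are hypothetical names; and the sample is the five-line input from the first post.

```python
import re

SEPARATOR = "################"  # any line containing at least three '#'

SAMPLE = '''<SymbolVar name="DATABASE_NAME" value="DB1"/>
<SymbolVar name="DATABASE_NAME" value="DB2"/>
<SymbolVar name="PORT" value="12345"/>
<SymbolVar name="SERVER_NAME" value="Server1"/>
<SymbolVar name="SERVER_NAME" value="Server2"/>'''


def mark_duplicates(text, attribute="name"):
    """Return every value of `attribute` that would be marked as a duplicate."""
    # Step 1: duplicate the whole text after a separator line.
    doubled = text + "\n" + SEPARATOR + "\n" + text
    # Step 2: same two look-aheads as the regex above; the look-behind
    # (?<=name=") replaces the \K that Python's re does not support.
    pattern = re.compile(
        r'(?s)(?<=' + re.escape(attribute) + r'=")'
        r'(.+?)(?=".*###)(?=".+?\1.+?\1)'
    )
    # Only values in the first copy can match, because the separator
    # must still lie ahead of the match.
    return [match.group(1) for match in pattern.finditer(doubled)]


if __name__ == "__main__":
    print(mark_duplicates(SAMPLE))           # duplicated name values
    print(mark_duplicates(SAMPLE, "value"))  # all values are unique here
```

A unique value such as PORT appears exactly twice in the doubled text, so the second look-ahead, which needs two further occurrences, fails for it. A value that was already duplicated appears at least four times and is matched at each of its positions in the first copy.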

          Best Regards,

          guy038

            Stuart Dyer
            last edited by Jun 7, 2023, 4:21 AM

            Thanks @Mark-Olson / @guy038

            Both solutions work a treat !

            Appreciate the fast turnaround

            The Community of users of the Notepad++ text editor.