Community
    • Login

    [Notepad++] Remove everything except a specific string using regex

    Scheduled Pinned Locked Moved Help wanted · · · – – – · · ·
    regexregex group replacesql
    5 Posts 4 Posters 844 Views
    Loading More Posts
    • Oldest to Newest
    • Newest to Oldest
    • Most Votes
    Reply
    • Reply as topic
    Log in to reply
    This topic has been deleted. Only users with topic management privileges can see it.
    • Francesco VespariF
      Francesco Vespari
      last edited by

      Hello everyone,
      I need support with regular expressions.

      I have a very large query from which I need to extract only the names of the tables used in the query. The tables have different names, but fortunately, they all follow the same pattern: database.schema.table_name.

      The table_name is always composed of 6 characters in total: 3 letters followed by 3 numbers. So the complete pattern is:

      database.schema.[A-Za-z]{3}\d{3}.
      

      I’d like to remove everything that is between one pattern and the next, replacing it with nothing, effectively removing joins and various conditions that are in between.

      For example, if I have:

      SELECT *
      FROM database.schema.abc123
      INNER JOIN database.schema.def456 on id1 = id2
      LEFT JOIN database.schema.ghi789 on id2 = id3
      WHERE 1=1
      

      I would like to have as a result:

      database.schema.abc123
      database.schema.def456
      database.schema.ghi789
      

      I made a first attempt using the regular expression:

      database.schema.[A-Za-z]{3}\d{3}(.|\s)*?database.schema.[A-Za-z]{3}\d{3}
      

      but it rightly includes in the selection also the string that I need to keep.

      How can I set the regular expression to select only what is between two strings (different from each other but with the same pattern) excluding the strings of interest from the selection?

      Thanks everyone for the support. 😊

      mkupperM PeterJonesP 2 Replies Last reply Reply Quote 1
      • mkupperM
        mkupper @Francesco Vespari
        last edited by

        @Francesco-Vespari I would do it as two separate search/replace passes.

        The first removes all lines except those that start with FROM.
        Search: (?-i)^((?!FROM ).*\R)+
        Replace: (blank or nothing)

        With that I’m doing (?!FROM ).*\R which removes one line at a time but wrapped that in an extra ()+ as Notepad++ tends to take its time doing a delete regardless of how large it is. The ()+ deletes multiple consecutive lines that don’t start with FROM

        Once you have that then your original regular expression should work well as there is no extraneous noise in the file.

        It can be done in one search/replace but it’s much harder to understand and maintain than the two-pass method that first removes the noise.

        1 Reply Last reply Reply Quote 1
        • PeterJonesP
          PeterJones @Francesco Vespari
          last edited by

          @Francesco-Vespari said in [Notepad++] Remove everything except a specific string using regex:

          How can I set the regular expression to select only what is between two strings (different from each other but with the same pattern) excluding the strings of interest from the selection?

          I’ll focus on the general idea, giving hints as to how to do it for your specific case. But there are a plethora of ways. For example, if in A1BA2 you wanted to keep the “good” A1 and A2 and delete just the “bad” section B. There would be two common ways I’d do such a task:

          • you could capture the A1 and A2 into groups, and use those groups (rather than empty string) in the replacement: FIND = (A1)B(A2) , REPLACE = $1$2
          • you could use a combination of \K control flow escape (which says the stuff to the left of this symbol must match, but I don’t want it replaced by the replacement) and a lookahead assertion: FIND = A1\KB(?=A2) , REPLACE = empty

          (A1, B, and A2 are meant to be placeholders, which would need to be real regex

          For your specific case, the B would be your (.|\s)*? and the A1 and A2 would be the surrounding stuff.

          A few other specific notes on your attempted regex:

          database.schema.[A-Za-z]{3}\d{3}(.|\s)*?database.schema.[A-Za-z]{3}\d{3}

          • the . between the tokens will match anything, not just a literal dot character. so database-schema- would also match; use \. to match literal dot.
          • (.|\s)*? could be written as just .*? – . already matches a normal space. If you want that to also match across lines, you can use the . matches newline checkmarked, or prefix your whole regex with (?s)
          • since the A1 and A2 portions of your regex are the same regex code, you can make use of numbered recursion – so my example regex could be (A1)\KB(?=(?1)) which puts the regex A1 into group#1, then the (?1) inside the lookahead says “use the same regex as for group#1, but use it here”
          1 Reply Last reply Reply Quote 1
          • guy038G
            guy038
            last edited by guy038

            Hello, @francesco-vespari, @mkupper, @peterjones and all,

            Here is my alternate solution :

            SEARCH (?-is)^(?:(?!data).)+\R|(?:(?!data).)+|(database\.schema\.[A-Za-z]{3}[0-9]{3})

            REPLACE ?1$1

            This solution keeps any empty line which already exists in the present text :

            So, for example, given this INPUT text :

            SELECT *
            FROM database.schema.ABC123
            INNER JOIN database.schema.def456 on id1 = id2
            LEFT JOIN database.schema.ghi789 on id2 = id3
            
            WHERE 1=1
            
            SELECT *
            FROM database.schema.ABC123
            INNER JOIN database.schema.def456 on id1 = id2
            LEFT JOIN database.schema.ghi789 on id2 = id3
            WHERE 1=1
            
            
            
            
            
            SELECT *
            FROM database.schema.ABC123
            INNER JOIN database.schema.def456 on id1 = id2
            LEFT JOIN database.schema.ghi789 on id2 = id3
            WHERE 1=1
            

            we would get this OUTPUT one :

            database.schema.ABC123
            database.schema.def456
            database.schema.ghi789
            
            
            database.schema.ABC123
            database.schema.def456
            database.schema.ghi789
            
            
            
            
            
            database.schema.ABC123
            database.schema.def456
            database.schema.ghi789
            

            Notes :

            • The first part of the search regex, after the modifiers (?-is) is ^(?:(?!data).)+\R which matches any complete line, with its line-break, ONLY IF it does NOT contain the string data, with this case, at any position of current line

            • The second alternative (?:(?!data).)+ matches any part of line, which does NOT contain the string data, with this case

            • Until now, just note that no group has been defined because of the two non-capturing syntaxes (?:...

            • Finally, the third alternative is the string that we want to keep, i.e. the string (database\.schema\.[A-Za-z]{3}[0-9]{3}), with this exact case, which is stored as group 1

            • As we use a conditional replacement ?1$1, which means : ONLY rewrite the group 1 if this group is defined, we are just left with the expected text !

            Best Regards,

            guy038

            Francesco VespariF 1 Reply Last reply Reply Quote 3
            • Francesco VespariF
              Francesco Vespari @guy038
              last edited by

              This post is deleted!
              1 Reply Last reply Reply Quote 0
              • First post
                Last post
              The Community of users of the Notepad++ text editor.
              Powered by NodeBB | Contributors