[Notepad++] Remove everything except a specific string using regex
-
Hello everyone,
I need support with regular expressions.I have a very large query from which I need to extract only the names of the tables used in the query. The tables have different names, but fortunately, they all follow the same pattern: database.schema.table_name.
The table_name is always composed of 6 characters in total: 3 letters followed by 3 numbers. So the complete pattern is:
database.schema.[A-Za-z]{3}\d{3}.I’d like to remove everything that is between one pattern and the next, replacing it with nothing, effectively removing joins and various conditions that are in between.
For example, if I have:
SELECT * FROM database.schema.abc123 INNER JOIN database.schema.def456 on id1 = id2 LEFT JOIN database.schema.ghi789 on id2 = id3 WHERE 1=1I would like to have as a result:
database.schema.abc123 database.schema.def456 database.schema.ghi789I made a first attempt using the regular expression:
database.schema.[A-Za-z]{3}\d{3}(.|\s)*?database.schema.[A-Za-z]{3}\d{3}but it rightly includes in the selection also the string that I need to keep.
How can I set the regular expression to select only what is between two strings (different from each other but with the same pattern) excluding the strings of interest from the selection?
Thanks everyone for the support. 😊
-
@Francesco-Vespari I would do it as two separate search/replace passes.
The first removes all lines except those that start with
FROM.
Search:(?-i)^((?!FROM ).*\R)+
Replace:(blank or nothing)With that I’m doing
(?!FROM ).*\Rwhich removes one line at a time but wrapped that in an extra()+as Notepad++ tends to take its time doing a delete regardless of how large it is. The()+deletes multiple consecutive lines that don’t start withFROMOnce you have that then your original regular expression should work well as there is no extraneous noise in the file.
It can be done in one search/replace but it’s much harder to understand and maintain than the two-pass method that first removes the noise.
-
@Francesco-Vespari said in [Notepad++] Remove everything except a specific string using regex:
How can I set the regular expression to select only what is between two strings (different from each other but with the same pattern) excluding the strings of interest from the selection?
I’ll focus on the general idea, giving hints as to how to do it for your specific case. But there are a plethora of ways. For example, if in
A1BA2you wanted to keep the “good”A1andA2and delete just the “bad” sectionB. There would be two common ways I’d do such a task:- you could capture the A1 and A2 into groups, and use those groups (rather than empty string) in the replacement: FIND =
(A1)B(A2), REPLACE =$1$2 - you could use a combination of
\Kcontrol flow escape (which says the stuff to the left of this symbol must match, but I don’t want it replaced by the replacement) and a lookahead assertion: FIND =A1\KB(?=A2), REPLACE = empty
(
A1,B, andA2are meant to be placeholders, which would need to be real regexFor your specific case, the
Bwould be your(.|\s)*?and theA1andA2would be the surrounding stuff.A few other specific notes on your attempted regex:
database.schema.[A-Za-z]{3}\d{3}(.|\s)*?database.schema.[A-Za-z]{3}\d{3}- the
.between the tokens will match anything, not just a literal dot character. sodatabase-schema-would also match; use\.to match literal dot. (.|\s)*?could be written as just.*?–.already matches a normal space. If you want that to also match across lines, you can use the. matches newlinecheckmarked, or prefix your whole regex with(?s)- since the
A1andA2portions of your regex are the same regex code, you can make use of numbered recursion – so my example regex could be(A1)\KB(?=(?1))which puts the regexA1into group#1, then the(?1)inside the lookahead says “use the same regex as for group#1, but use it here”
- you could capture the A1 and A2 into groups, and use those groups (rather than empty string) in the replacement: FIND =
-
Hello, @francesco-vespari, @mkupper, @peterjones and all,
Here is my alternate solution :
SEARCH
(?-is)^(?:(?!data).)+\R|(?:(?!data).)+|(database\.schema\.[A-Za-z]{3}[0-9]{3})REPLACE
?1$1This solution keeps any empty line which already exists in the present text :
So, for example, given this INPUT text :
SELECT * FROM database.schema.ABC123 INNER JOIN database.schema.def456 on id1 = id2 LEFT JOIN database.schema.ghi789 on id2 = id3 WHERE 1=1 SELECT * FROM database.schema.ABC123 INNER JOIN database.schema.def456 on id1 = id2 LEFT JOIN database.schema.ghi789 on id2 = id3 WHERE 1=1 SELECT * FROM database.schema.ABC123 INNER JOIN database.schema.def456 on id1 = id2 LEFT JOIN database.schema.ghi789 on id2 = id3 WHERE 1=1we would get this OUTPUT one :
database.schema.ABC123 database.schema.def456 database.schema.ghi789 database.schema.ABC123 database.schema.def456 database.schema.ghi789 database.schema.ABC123 database.schema.def456 database.schema.ghi789
Notes :
-
The first part of the search regex, after the modifiers
(?-is)is^(?:(?!data).)+\Rwhich matches any complete line, with its line-break, ONLY IF it does NOT contain the stringdata, with this case, at any position of current line -
The second alternative
(?:(?!data).)+matches any part of line, which does NOT contain the stringdata, with this case -
Until now, just note that no group has been defined because of the two non-capturing syntaxes
(?:... -
Finally, the third alternative is the string that we want to keep, i.e. the string
(database\.schema\.[A-Za-z]{3}[0-9]{3}), with this exact case, which is stored as group1 -
As we use a conditional replacement
?1$1, which means : ONLY rewrite the group1if this group is defined, we are just left with the expected text !
Best Regards,
guy038
-
-
This post is deleted!