Regex search/replace wildcard
-
I need to do a search and replace: I need to replace something like “roundingWidth=**” with “roundingWidth=20”, where the asterisks could be a 1- or 2-digit number.
I have struggled and failed trying to come up with a regular expression to do this.
Regex is extremely powerful, but damned if I can wrap my head around it! I think some examples like this will help get me started.
Thanks,
-
Assuming that the double quotes are part of the text, a regex could look like
"roundingWidth=\d{1,2}"
\d matches any digit
{1,2} means either one digit or two digits.
Cheers
Claudia
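For anyone who wants to try this outside the editor, here is a minimal sketch in Python. It assumes Python's built-in re module, which is not the Boost engine Notepad++ uses but treats \d and {1,2} the same way. The sample text and the function name are made up for illustration, and the surrounding double quotes are left out.

```python
import re

# Hypothetical sample text; only the attribute name comes from the question.
sample = 'roundingWidth=5 roundingWidth=12 cornerWidth=7'


def set_rounding_width(text: str) -> str:
    """Replace a 1- or 2-digit roundingWidth value with 20."""
    # \d{1,2} matches one or two digits after "roundingWidth=".
    return re.sub(r'roundingWidth=\d{1,2}', 'roundingWidth=20', text)


result = set_rounding_width(sample)
print(result)
```

Note that a three-digit value such as roundingWidth=123 would have only its first two digits replaced, which is fine as long as the values really are 1 or 2 digits.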
-
Thank you, that did the trick! I have another one for you please:
I need to replace the following line with the one after that. The asterisk could be text of arbitrary length but will always be between quote marks.
alignment=“*”
alignment=“middleLeft”
Thanks!
-
I had to smile because you, maybe, nearly posted the solution.
In regex, a . (dot) matches any single character, and together with the * (asterisk) it can match a run of characters of any length, including none.
So you search for
alignment=".*"
and replace with
alignment="middleLeft"
but this assumes that alignment= is the only text in the line.
If this isn’t the case and there is additional text with quotes, you might replace the search with
alignment=".*?"
The difference is that the first is greedy and tries to match as much as possible, whereas the latter is non-greedy and matches as little as possible.
Cheers
Claudia
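The greedy versus non-greedy difference is easy to see on a line that holds more than one quoted value. The sketch below uses Python's re module as a stand-in for the Notepad++ engine; the sample line with its color attribute is invented for the demonstration.

```python
import re

# Hypothetical line with a second quoted attribute after alignment.
line = 'alignment="topRight" color="red"'

# Greedy: .* runs to the LAST quote on the line, swallowing color="red".
greedy = re.sub(r'alignment=".*"', 'alignment="middleLeft"', line)

# Non-greedy: .*? stops at the FIRST closing quote.
lazy = re.sub(r'alignment=".*?"', 'alignment="middleLeft"', line)

print(greedy)
print(lazy)
```

With the greedy version the color attribute disappears from the line; with the non-greedy version it survives.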
Rather than using
alignment=".*?"
I normally use the more restrictive
alignment="[^"]*"
so it only matches non-double-quotes. For most of the places I use this sort of search in a replace-all, I go a step further and use
alignment="[^"\r\n]*"
to restrict it to matching strings that do not include a line break. The simple use of .*? would probably be OK, but the time to type a few extra characters is negligible compared to my being confident that a replace-all will only change the places I want.
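To illustrate why the extra \r\n can matter in a replace-all, here is a small sketch in Python's re module (an assumption; Notepad++ itself uses Boost). Both sample strings are invented: one is well formed, the other has a closing quote missing at the end of its first line.

```python
import re

REPLACEMENT = 'alignment="middleLeft"'

lazy_dot = r'alignment=".*?"'
no_quotes = r'alignment="[^"]*"'
no_quotes_or_breaks = r'alignment="[^"\r\n]*"'

# Hypothetical inputs: the second one lacks the closing quote on line 1.
well_formed = 'alignment="topRight" color="red"\n'
broken = 'alignment="topRight\ncolor="red"\n'

# On well-formed text all three searches give the same replacement.
well_formed_results = {
    re.sub(p, REPLACEMENT, well_formed)
    for p in (lazy_dot, no_quotes, no_quotes_or_breaks)
}

# On broken text, [^"]* runs across the line break to the next quote...
loose = re.sub(no_quotes, REPLACEMENT, broken)
# ...while [^"\r\n]* finds no match and leaves the text alone.
strict = re.sub(no_quotes_or_breaks, REPLACEMENT, broken)

print(well_formed_results)
print(repr(loose))
print(repr(strict))
```

The stricter class only pays off on malformed input, but that is exactly the case where a replace-all can do damage unnoticed.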
-
This is great, thank you! Four variations of a similar task - this will help a lot in my regex education
-
Hello, Ryan, Claudia, AdrianHHH and All,
Ryan, see the main differences between the four simple regexes below (I assume a case-sensitive search):
a.*z matches a lowercase letter a, followed by the LONGEST range of characters, even EMPTY, till a lowercase letter z
a.*?z matches a lowercase letter a, followed by the SHORTEST range of characters, even EMPTY, till a lowercase letter z
a.+z matches a lowercase letter a, followed by the LONGEST range of characters, NON empty, till a lowercase letter z
a.+?z matches a lowercase letter a, followed by the SHORTEST range of characters, NON empty, till a lowercase letter z
Just try these four regexes, with the subject text az abcxyz az abz abxz abcxyz az ab bcxz abcx, in a new tab. The differences are quite obvious!
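The same experiment can be run as a script. This is only a sketch using Python's re module, which is an assumption on my part (it is not the Boost engine of Notepad++), although these four quantifiers behave the same way in both.

```python
import re

subject = 'az abcxyz az abz abxz abcxyz az ab bcxz abcx'
patterns = ['a.*z', 'a.*?z', 'a.+z', 'a.+?z']

# Collect every match of each regex, scanning left to right.
matches = {p: re.findall(p, subject) for p in patterns}

for p in patterns:
    print(f'{p:6} -> {len(matches[p])} match(es): {matches[p]}')
```

Here a.*z and a.+z each find one long match, a.*?z finds eight short ones, and a.+?z finds five, because the mandatory character after the a forces it past the z of every az.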
AdrianHHH, you shouldn’t be annoyed, about choosing between the two syntaxes, below, as they are strictly identical !
- (?-s)alignment=".*?"
- alignment="[^"\r\n]*"
Similarly, the two syntaxes below are strictly identical, too:
- (?s)alignment=".*?"
- alignment="[^"]*"
The reason is that, in each case, the match ends at a final UNIQUE character (a quote mark, ").
Now, I’m speaking to everybody ! For instance, do NOT confuse these two regexes :
- The regex 123.+?5 means: a string 123, followed by the SHORTEST, NON-empty range of characters, till a digit 5
- The almost identical regex 123.+?56:
  - does NOT mean: a string 123, followed by the shortest, NON-empty range of characters, till a digit 5, then the digit 6
  - but means: a string 123, followed by the SHORTEST, NON-empty range of characters, till the string 56
So, against the subject text 012345789 0123456789 012345789 0123456789, the first regex 123.+?5 finds four occurrences, whereas the second regex 123.+?56 finds only two!
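A quick check of those counts, sketched with Python's re module (an assumption, since Notepad++ uses Boost, but lazy quantifiers work the same way here):

```python
import re

subject = '012345789 0123456789 012345789 0123456789'

# Shortest non-empty run after 123, up to a digit 5.
up_to_5 = re.findall(r'123.+?5', subject)

# Shortest non-empty run after 123, up to the STRING 56.
up_to_56 = re.findall(r'123.+?56', subject)

print(len(up_to_5), up_to_5)
print(len(up_to_56), up_to_56)
```

The second regex cannot stop inside a number that lacks the digit 6, so each of its matches runs on into the following number until it meets 56.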
Here is a summary example :
Let’s imagine the text, below, where the string abcdlmpqrst is repeated, 10 times, with, sometimes, the lack of the letters p and/or q :
q missing    q missing    pq missing                p missing    p missing    pq missing                q missing    p missing
abcdlmprst   abcdlmprst   abcdlmrst   abcdlmpqrst   abcdlmqrst   abcdlmqrst   abcdlmrst   abcdlmpqrst   abcdlmprst   abcdlmqrst
Against this text, let’s try, successively, the 20 regexes below, where the last fourteen contain the [^...] structure:
Regex A : (?-s)ab.+p
Regex B : (?-s)ab.+q
Regex C : (?-s)ab.+pq
Regex D : (?-s)ab.+?p
Regex E : (?-s)ab.+?q
Regex F : (?-s)ab.+?pq
Regex G : ab[^p\r\n]+p
Regex H : ab[^q\r\n]+q
Regex I : ab[^p\r\n]+?p
Regex J : ab[^q\r\n]+?q
Regex K : ab[^q\r\n]+p
Regex L : ab[^p\r\n]+q
Regex M : ab[^q\r\n]+?p
Regex N : ab[^p\r\n]+?q
Regex O : ab[^p\r\n]+pq
Regex P : ab[^q\r\n]+pq
Regex Q : ab[^pq\r\n]+pq
Regex R : ab[^p\r\n]+?pq
Regex S : ab[^q\r\n]+?pq
Regex T : ab[^pq\r\n]+?pq
Here are the results, where each match is indicated by a range of dashes
            q missing    q missing    pq missing                p missing    p missing    pq missing                q missing    p missing
            abcdlmprst   abcdlmprst   abcdlmrst   abcdlmpqrst   abcdlmqrst   abcdlmqrst   abcdlmrst   abcdlmpqrst   abcdlmprst   abcdlmqrst
A           ---------------------------------------------------------------------------------------------------------------
B           ----------------------------------------------------------------------------------------------------------------------------
C           --------------------------------------------------------------------------------------------------
D , G , I   -------      -------      -------------------       ---------------------------------------------       -------
E , H , J   ----------------------------------------------      -------      -------      --------------------      --------------------
F           ----------------------------------------------      ----------------------------------------------
K           ---------------------------------------------                                 -------------------       -------
L                                                               --------------------                                             -------
M           -------      -------      -------------------                                 -------------------       -------
N                                                               -------      -------                                             -------
O , R                                 --------------------      ----------------------------------------------
P , S       ----------------------------------------------                                --------------------
Q , T                                 --------------------                                --------------------
Just notice that, as I said above:
- The regex D, (?-s)ab.+?p, DOES have an equivalent regex G, ab[^p\r\n]+p, with the [^.....] structure
- The regex E, (?-s)ab.+?q, DOES have an equivalent regex H, ab[^q\r\n]+q, with the [^.....] structure
but:
- The regex F, (?-s)ab.+?pq, does NOT have an equivalent regex containing the [^.....] structure
Note, also, that:
- The regexes O, ab[^p\r\n]+pq, and R, ab[^p\r\n]+?pq, are equivalent
- The regexes P, ab[^q\r\n]+pq, and S, ab[^q\r\n]+?pq, are equivalent
- The regexes Q, ab[^pq\r\n]+pq, and T, ab[^pq\r\n]+?pq, are equivalent
Why? Just because the range of characters after the string ab must NOT contain a part or the totality of the string pq. In other words, these six regexes, from O to T, always look for the shortest range of characters between the string ab and the string pq!
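The groupings above can be checked mechanically. The sketch below is an approximation in Python's re module, not the Boost engine of Notepad++: Python does not accept a leading (?-s), so it is dropped (a dot does not match a newline by default in Python anyway), and the ten strings are joined by single spaces, which does not change which regexes agree with each other.

```python
import re

text = ' '.join([
    'abcdlmprst', 'abcdlmprst', 'abcdlmrst', 'abcdlmpqrst', 'abcdlmqrst',
    'abcdlmqrst', 'abcdlmrst', 'abcdlmpqrst', 'abcdlmprst', 'abcdlmqrst',
])

# Regexes A to T from the post, with the (?-s) prefix removed for Python.
patterns = {
    'A': r'ab.+p', 'B': r'ab.+q', 'C': r'ab.+pq',
    'D': r'ab.+?p', 'E': r'ab.+?q', 'F': r'ab.+?pq',
    'G': r'ab[^p\r\n]+p', 'H': r'ab[^q\r\n]+q',
    'I': r'ab[^p\r\n]+?p', 'J': r'ab[^q\r\n]+?q',
    'K': r'ab[^q\r\n]+p', 'L': r'ab[^p\r\n]+q',
    'M': r'ab[^q\r\n]+?p', 'N': r'ab[^p\r\n]+?q',
    'O': r'ab[^p\r\n]+pq', 'P': r'ab[^q\r\n]+pq', 'Q': r'ab[^pq\r\n]+pq',
    'R': r'ab[^p\r\n]+?pq', 'S': r'ab[^q\r\n]+?pq', 'T': r'ab[^pq\r\n]+?pq',
}

# For each regex, record the (start, end) span of every match.
results = {
    name: [m.span() for m in re.finditer(rx, text)]
    for name, rx in patterns.items()
}

# Group the regexes that produce exactly the same matches.
groups = {}
for name, spans in results.items():
    groups.setdefault(tuple(spans), []).append(name)

for spans, names in groups.items():
    print(f"{' , '.join(names):10} {len(spans)} match(es)")
```

On this text the script groups D with G and I, E with H and J, O with R, P with S, and Q with T, while F stays on its own among the twenty regexes listed.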
Best Regards,
guy038
P.S. : Ryan, for your regex “education”, just begin with that article, in N++ Wiki :
http://docs.notepad-plus-plus.org/index.php/Regular_Expressions
In addition, you’ll find good documentation about the new Boost C++ Regex library, v1.55.0 (similar to the PERL Regular Common Expressions, v1.48.0), used by Notepad++ since its 6.0 version, at the TWO addresses below:
http://www.boost.org/doc/libs/1_48_0/libs/regex/doc/html/boost_regex/syntax/perl_syntax.html
http://www.boost.org/doc/libs/1_48_0/libs/regex/doc/html/boost_regex/format/boost_format_syntax.html
-
The FIRST link explains the syntax, of regular expressions, in the SEARCH part
-
The SECOND link explains the syntax, of regular expressions, in the REPLACEMENT part
You may also look for valuable information on the sites below:
-