Regex search/replace wildcard

Ryan Birtles

I need to do a search and replace: I need to replace something like “roundingWidth=**” with “roundingWidth=20”, where the asterisks could be a 1- or 2-digit number.

I have struggled and failed trying to come up with a regular expression to do this.

Regex is extremely powerful, but damned if I can wrap my head around it! I think some examples like this will help get me started.

Thanks,

Claudia Frank

@Ryan-Birtles

Assuming that the double quotes are part of the text, a regex could look like

"roundingWidth=\d{1,2}"

\d is for any digit
{1,2} means either one digit or two digits.
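
As a rough way to try this outside Notepad++, here is a minimal sketch using Python's re module (not the Boost engine that Notepad++ uses, although this pattern behaves the same in both). The sample lines are made up for illustration.

```python
import re

# Hypothetical sample lines; the double quotes are part of the text.
lines = [
    '"roundingWidth=5"',
    '"roundingWidth=12"',
    '"roundingWidth=123"',  # three digits: should stay untouched
]

# \d{1,2} accepts one or two digits; the closing quote rules out a third digit.
pattern = re.compile(r'"roundingWidth=\d{1,2}"')
replaced = [pattern.sub('"roundingWidth=20"', line) for line in lines]

for line in replaced:
    print(line)
```

The first two lines become "roundingWidth=20", while the three-digit value is left alone because the closing quote must follow at most two digits.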

Cheers
Claudia

Jim Dailey

@Ryan-Birtles
This should help you: http://www.boost.org/doc/libs/1_48_0/libs/regex/doc/html/boost_regex/syntax/perl_syntax.html.

Ryan Birtles

Thank you, that did the trick! I have another one for you please:

I need to replace the following line with the one after that. The asterisk could be text of arbitrary length but will always be between quote marks.

alignment="*"
alignment="middleLeft"

Thanks!

Claudia Frank

@Ryan-Birtles

I had to smile because you nearly posted the solution yourself.
In regex, a . (dot) represents a single character and, together with the * (asterisk),
it can be used to match a run of characters of variable length.

So you search for

alignment=".*"

and replace with

alignment="middleLeft"

but this assumes that alignment= is the only text in the line.
If this isn’t the case and there is additional text with quotes you might replace the search with

alignment=".*?"

The difference is that the first is greedy and tries to match as much as possible, whereas the latter is non-greedy
and matches as little as possible.
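
The greedy versus non-greedy difference can be seen in a small sketch with Python's re module (an assumption for illustration; Notepad++ itself uses Boost, but both engines treat these two patterns alike). The sample line with a second quoted attribute is invented.

```python
import re

# Made-up line that contains a second quoted value after alignment="...".
line = 'alignment="topRight" color="red"'
replacement = 'alignment="middleLeft"'

# Greedy: .* runs to the LAST double quote on the line.
greedy = re.sub(r'alignment=".*"', replacement, line)

# Non-greedy: .*? stops at the FIRST double quote it can.
lazy = re.sub(r'alignment=".*?"', replacement, line)

print(greedy)  # alignment="middleLeft"
print(lazy)    # alignment="middleLeft" color="red"
```

The greedy form swallows the color attribute as well, which is exactly the situation the non-greedy form is meant to avoid.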

Cheers
Claudia

AdrianHHH

Rather than using

alignment=".*?"

I normally use the more restrictive

alignment="[^"]*"

so it only matches non-double-quotes. For most of the places I use this sort of search in a replace-all I go the step further and use

alignment="[^"\r\n]*"

to restrict it to matching strings that do not include a line break. The simple use of .*? would probably be OK, but the time to type a few extra characters is negligible compared to my being confident that a replace-all will only change the places I want.
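
To illustrate the effect of adding \r\n to the negated class, here is a small sketch in Python's re module (an assumption; it is not the Boost engine, but character classes behave the same way here). The text, with one value broken across two lines, is made up.

```python
import re

# Made-up text: the first quoted value is split by a line break.
text = 'alignment="top\nRight" alignment="center"'
replacement = 'alignment="middleLeft"'

# [^"]* accepts anything except a double quote, including line breaks.
across_lines = re.sub(r'alignment="[^"]*"', replacement, text)

# [^"\r\n]* also refuses line breaks, so the broken value is skipped.
same_line_only = re.sub(r'alignment="[^"\r\n]*"', replacement, text)

print(repr(across_lines))
print(repr(same_line_only))
```

With the first pattern both occurrences are replaced; with the second, only the occurrence that sits entirely on one line is changed.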

Ryan Birtles

This is great, thank you! Four variations of a similar task - this will help a lot in my regex education.

guy038

Hello, Ryan, Claudia, AdrianHHH and All,

Ryan, see the main differences between the four simple regexes below ( I assume a case-sensitive search ) :

a.*z   matches a lowercase letter a, followed by the LONGEST  range of characters, even EMPTY, till a lowercase letter z
a.*?z  matches a lowercase letter a, followed by the SHORTEST range of characters, even EMPTY, till a lowercase letter z
a.+z   matches a lowercase letter a, followed by the LONGEST  range of characters, NON empty,  till a lowercase letter z
a.+?z  matches a lowercase letter a, followed by the SHORTEST range of characters, NON empty,  till a lowercase letter z

Just try these four regexes, with the subject text : az abcxyz az abz abxz abcxyz az ab bcxz abcx, in a new tab. The differences are quite obvious !
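
For readers who want to check the four regexes without opening a new tab, here is a minimal sketch using Python's re module (an assumption; it is case-sensitive by default and handles these four patterns like Notepad++'s engine does on a single line).

```python
import re

subject = "az abcxyz az abz abxz abcxyz az ab bcxz abcx"
patterns = [r"a.*z", r"a.*?z", r"a.+z", r"a.+?z"]

# Collect every non-overlapping match of each pattern, scanning left to right.
found = {p: re.findall(p, subject) for p in patterns}

for p in patterns:
    print(f"{p:6} {len(found[p])} match(es): {found[p]}")
```

The two greedy forms each produce a single long match ending at the last z, a.*?z yields eight short matches, and a.+?z yields five, because the z right after an a is consumed as the mandatory character.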

AdrianHHH, you needn’t worry about choosing between the two syntaxes, below, as they match exactly the same text !

(?-s)alignment=".*?"
alignment="[^"\r\n]*"

Similarly, the two syntaxes, below, are strictly identical, too :

(?s)alignment=".*?"
alignment="[^"]*"

The reason is that you reach a final UNIQUE character ( a quote mark ) "
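
A rough check of these two equivalences, sketched with Python's re module rather than Notepad++. Two assumptions to note: Python has no global (?-s) switch, so the default mode (dot does not match \n) stands in for (?-s) and the re.DOTALL flag stands in for (?s); and Python's dot still matches \r, so the made-up sample text uses \n line breaks only.

```python
import re

# Made-up text: one value on a single line, one split by \n, one empty.
text = 'alignment="left" x="1"\nalignment="top\nRight" alignment="" end'

# Dot does NOT match \n (stand-in for (?-s)) versus the class excluding line breaks.
lazy_no_newline = re.findall(r'alignment=".*?"', text)
class_no_newline = re.findall(r'alignment="[^"\r\n]*"', text)

# Dot matches everything (stand-in for (?s)) versus the class excluding only the quote.
lazy_dotall = re.findall(r'alignment=".*?"', text, flags=re.DOTALL)
class_any = re.findall(r'alignment="[^"]*"', text)

print(lazy_no_newline == class_no_newline)
print(lazy_dotall == class_any)
```

Both comparisons come out equal on this sample: the lazy quantifier stops at the first quote, and the negated class cannot run past a quote, so the two forms end in the same place.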

Now, I’m speaking to everybody ! For instance, do NOT confuse these two regexes :

The regex 123.+?5, that means : A string 123 followed by the SHORTEST, NON-empty, range of characters, till a digit 5
And the regex, almost identical, 123.+?56, which
- Does NOT mean : A string 123 followed by the shortest, NON-empty, range of characters, till a digit 5, then the 6 digit
- But means : A string 123 followed by the SHORTEST, NON-empty, range of characters, till the string 56

So, against the subject text 012345789 0123456789 012345789 0123456789, the first regex 123.+?5 finds four occurrences, whereas the second regex 123.+?56 would, only, find two occurrences !
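
The two counts can be reproduced with a short sketch in Python's re module (an assumption for illustration; the behaviour of these two patterns is the same as in Notepad++ on a single line).

```python
import re

subject = "012345789 0123456789 012345789 0123456789"

# Shortest non-empty run after 123, up to a single digit 5.
first = re.findall(r"123.+?5", subject)

# Shortest non-empty run after 123, up to the two-character string 56.
second = re.findall(r"123.+?56", subject)

print(len(first), first)
print(len(second), second)
```

The first regex stops inside each number, while the second has to run on from a number that lacks the 6 into the next one before it can find 56, so each of its matches spans two numbers.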

Here is a summary example :

Let’s imagine the text, below, where the string abcdlmpqrst is repeated, 10 times, with, sometimes, the lack of the letters p and/or q :

q missing    q missing    pq missing                p missing    p missing    pq missing                q missing    p missing

abcdlmprst   abcdlmprst   abcdlmrst   abcdlmpqrst   abcdlmqrst   abcdlmqrst   abcdlmrst   abcdlmpqrst   abcdlmprst   abcdlmqrst

Against this text, let’s try, successively, the 20 regexes, below, where the last fourteen contain the [^...] structure :

Regex A : (?-s)ab.+p
Regex B : (?-s)ab.+q
Regex C : (?-s)ab.+pq

Regex D : (?-s)ab.+?p
Regex E : (?-s)ab.+?q
Regex F : (?-s)ab.+?pq

Regex G : ab[^p\r\n]+p
Regex H : ab[^q\r\n]+q

Regex I : ab[^p\r\n]+?p
Regex J : ab[^q\r\n]+?q

Regex K : ab[^q\r\n]+p
Regex L : ab[^p\r\n]+q

Regex M : ab[^q\r\n]+?p
Regex N : ab[^p\r\n]+?q

Regex O : ab[^p\r\n]+pq
Regex P : ab[^q\r\n]+pq
Regex Q : ab[^pq\r\n]+pq

Regex R : ab[^p\r\n]+?pq
Regex S : ab[^q\r\n]+?pq
Regex T : ab[^pq\r\n]+?pq

Here are the results, where each match is indicated by a range of dashes

             q missing    q missing    pq missing                p missing    p missing    pq missing                q missing    p missing

             abcdlmprst   abcdlmprst   abcdlmrst   abcdlmpqrst   abcdlmqrst   abcdlmqrst   abcdlmrst   abcdlmpqrst   abcdlmprst   abcdlmqrst

A            ---------------------------------------------------------------------------------------------------------------
B            ----------------------------------------------------------------------------------------------------------------------------
C            --------------------------------------------------------------------------------------------------
D , G , I    -------      -------      -------------------       ---------------------------------------------       -------
E , H , J    ----------------------------------------------      -------      -------      --------------------      --------------------
F            ----------------------------------------------      ----------------------------------------------
K            ---------------------------------------------                                 -------------------       -------
L                                                                --------------------                                             -------
M            -------      -------      -------------------                                 -------------------       -------
N                                                                -------      -------                                             -------
O , R                                  --------------------      ----------------------------------------------
P , S        ----------------------------------------------                                --------------------
Q , T                                  --------------------                                --------------------

Just notice that, as I said, above :

The regex D, (?-s)ab.+?p, DOES have an equivalent regex G, ab[^p\r\n]+p, with the [^.....] structure
The regex E, (?-s)ab.+?q, DOES have an equivalent regex H, ab[^q\r\n]+q, with the [^.....] structure

but :

The regex F, (?-s)ab.+?pq, does NOT have an equivalent regex, containing the [^.....] structure

Note, also, that :

The regexes O, ab[^p\r\n]+pq, and R, ab[^p\r\n]+?pq are equivalent
The regexes P, ab[^q\r\n]+pq, and S, ab[^q\r\n]+?pq are equivalent
The regexes Q, ab[^pq\r\n]+pq, and T, ab[^pq\r\n]+?pq are equivalent

Why ? Just because the range of characters, after the string ab, must NOT contain a part or the totality of the string pq. In other words, these six regexes, from O to T, always look for the shortest range of characters, between the string ab and the string pq !
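
The groupings in the table can be re-derived with a short sketch in Python's re module. This is an illustration under assumptions, not the Notepad++ engine: the (?-s) prefix is dropped because Python's dot already skips \n by default, and the ten strings are joined with three spaces as in the text above.

```python
import re

words = ["abcdlmprst", "abcdlmprst", "abcdlmrst", "abcdlmpqrst", "abcdlmqrst",
         "abcdlmqrst", "abcdlmrst", "abcdlmpqrst", "abcdlmprst", "abcdlmqrst"]
text = "   ".join(words)

regexes = {
    "A": r"ab.+p", "B": r"ab.+q", "C": r"ab.+pq",
    "D": r"ab.+?p", "E": r"ab.+?q", "F": r"ab.+?pq",
    "G": r"ab[^p\r\n]+p", "H": r"ab[^q\r\n]+q",
    "I": r"ab[^p\r\n]+?p", "J": r"ab[^q\r\n]+?q",
    "K": r"ab[^q\r\n]+p", "L": r"ab[^p\r\n]+q",
    "M": r"ab[^q\r\n]+?p", "N": r"ab[^p\r\n]+?q",
    "O": r"ab[^p\r\n]+pq", "P": r"ab[^q\r\n]+pq", "Q": r"ab[^pq\r\n]+pq",
    "R": r"ab[^p\r\n]+?pq", "S": r"ab[^q\r\n]+?pq", "T": r"ab[^pq\r\n]+?pq",
}

# (start, end) offsets of every match, end exclusive, for each regex.
results = {name: [m.span() for m in re.finditer(rx, text)]
           for name, rx in regexes.items()}

# Group the regexes that produce exactly the same list of matches.
groups = {}
for name, spans in results.items():
    groups.setdefault(tuple(spans), []).append(name)

for spans, names in groups.items():
    print(" , ".join(names), "->", list(spans))
```

Regexes that share a row in the table end up in the same group, and F stays on its own, which matches the remark that it has no [^...] equivalent among the regexes tried.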

Best Regards,

guy038

P.S. : Ryan, for your regex “education”, just begin with that article, in N++ Wiki :

http://docs.notepad-plus-plus.org/index.php/Regular_Expressions

In addition, you’ll find good documentation, about the new Boost C++ Regex library, v1.55.0 ( similar to the PERL Regular Common Expressions, v1.48.0 ), used by Notepad++, since its 6.0 version, at the TWO addresses below :

http://www.boost.org/doc/libs/1_48_0/libs/regex/doc/html/boost_regex/syntax/perl_syntax.html

http://www.boost.org/doc/libs/1_48_0/libs/regex/doc/html/boost_regex/format/boost_format_syntax.html

The FIRST link explains the syntax, of regular expressions, in the SEARCH part
The SECOND link explains the syntax, of regular expressions, in the REPLACEMENT part

You may, also, look for valuable information, on the sites, below :

http://www.regular-expressions.info

http://www.rexegg.com

http://perldoc.perl.org/perlre.html