Regex search/replace wildcard



  • I need to do a search and replace: I need to replace something like “roundingWidth=**” with “roundingWidth=20”, where the asterisks could be a 1- or 2-digit number.

    I have struggled and failed trying to come up with a regular expression to do this.

    Regex is extremely powerful, but damned if I can wrap my head around it! I think some examples like this will help get me started.

    Thanks,



  • @Ryan-Birtles

    Assuming that the double quotes are part of the text, a regex could look like

    "roundingWidth=\d{1,2}"
    

    \d is for any digit
    {1,2} means either one digit or two digits.
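
    A minimal sketch of the same replacement outside Notepad++, using Python's re module. This is an assumption for illustration only: Notepad++ uses the Boost regex engine, although \d and {1,2} behave the same way in both. The sample text is made up.

```python
import re

# Hypothetical sample text; the quotes are part of the text, as assumed above.
text = '"roundingWidth=5" and "roundingWidth=12"'

# \d{1,2} matches one or two digits between the literal quotes.
pattern = r'"roundingWidth=\d{1,2}"'
result = re.sub(pattern, '"roundingWidth=20"', text)

print(result)
```

    In Notepad++ itself, the pattern goes in "Find what", the new text in "Replace with", and the search mode must be set to "Regular expression".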

    Cheers
    Claudia





  • Thank you, that did the trick! I have another one for you please:

    I need to replace the following line with the one after that. The asterisk could be text of arbitrary length but will always be between quote marks.

    alignment="*"
    alignment="middleLeft"

    Thanks!



  • @Ryan-Birtles

    I had to smile because you very nearly posted the solution yourself.
    In regex, a . (dot) represents any single character, and together with the * (asterisk)
    it matches a run of characters of variable length.

    So you search for

    alignment=".*"
    

    and replace with

    alignment="middleLeft"
    

    but this assumes that the alignment attribute is the only quoted text on the line.
    If this isn’t the case and there is additional text with quotes after it, you might replace the search with

    alignment=".*?"
    

    The difference is that the first is greedy and tries to match as much as possible, whereas the latter is non-greedy
    and matches as little as possible.
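
    To make the greedy versus non-greedy difference concrete, here is a small sketch in Python's re module (an assumption for illustration; Notepad++ uses Boost, but the two quantifiers behave alike here). The sample line with a second quoted attribute is hypothetical.

```python
import re

# Hypothetical line with a second quoted attribute after alignment.
line = 'alignment="topRight" color="red"'
replacement = 'alignment="middleLeft"'

# Greedy: .* runs to the LAST quote on the line, so color="red" is swallowed.
greedy = re.sub(r'alignment=".*"', replacement, line)

# Non-greedy: .*? stops at the FIRST closing quote.
lazy = re.sub(r'alignment=".*?"', replacement, line)

print(greedy)
print(lazy)
```

    Only the non-greedy version leaves the color attribute untouched.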

    Cheers
    Claudia



  • Rather than using

    alignment=".*?"
    

    I normally use the more restrictive

    alignment="[^"]*"
    

    so it only matches non-double-quotes. For most of the places I use this sort of search in a replace-all I go the step further and use

    alignment="[^"\r\n]*"
    

    to restrict it to matching strings that do not include a line break. The simple use of .*? would probably be OK, but the time to type a few extra characters is negligible compared to my being confident that a replace-all will only change the places I want.
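
    A short sketch of why the extra \r\n matters, written with Python's re module (an assumption for illustration; Notepad++ uses Boost). The sample text is hypothetical and deliberately malformed: the first line is missing its closing quote.

```python
import re

# Hypothetical text: the first alignment value has no closing quote.
text = 'alignment="top\nother="x"\nalignment="left"'

# [^"]* may cross the line break and stops at the quote on the next line.
any_char = re.findall(r'alignment="[^"]*"', text)

# [^"\r\n]* cannot cross a line break, so the malformed line is skipped.
same_line = re.findall(r'alignment="[^"\r\n]*"', text)

print(any_char)
print(same_line)
```

    With the first pattern a replace-all would silently eat part of the following line; the second only touches well-formed, single-line values.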



  • This is great, thank you! Four variations of a similar task - this will help a lot in my regex education.



  • Hello, Ryan, Claudia, AdrianHHH and All,

    Ryan, see the main differences between the four simple regexes below ( I assume a case-sensitive search ) :

    a.*z   matches a lowercase letter a, followed by the LONGEST  range of characters, even EMPTY, till a lowercase letter z
    a.*?z  matches a lowercase letter a, followed by the SHORTEST range of characters, even EMPTY, till a lowercase letter z
    a.+z   matches a lowercase letter a, followed by the LONGEST  range of characters, NON empty,  till a lowercase letter z
    a.+?z  matches a lowercase letter a, followed by the SHORTEST range of characters, NON empty,  till a lowercase letter z
    

    Just try these four regexes, with the subject text : az abcxyz az abz abxz abcxyz az ab bcxz abcx, in a new tab. The differences are quite obvious !
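
    For readers without Notepad++ at hand, the same experiment can be sketched in Python's re module (an assumption for illustration; the engine differs from Boost, but these four quantifiers behave the same on this single-line text):

```python
import re

subject = "az abcxyz az abz abxz abcxyz az ab bcxz abcx"

# Map each of the four regexes to the list of its matches, left to right.
results = {
    pattern: re.findall(pattern, subject)
    for pattern in (r"a.*z", r"a.*?z", r"a.+z", r"a.+?z")
}

for pattern, matches in results.items():
    print(f"{pattern:6} {len(matches)} match(es): {matches}")
```

    The two greedy forms each produce one long match ending at the last z, the lazy a.*?z produces eight short matches, and a.+?z produces five, because it cannot accept the bare az on its own.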


    AdrianHHH, you needn’t worry about choosing between the two syntaxes below, as they are equivalent !

    • (?-s)alignment=".*?"

    • alignment="[^"\r\n]*"

    Similarly, the two syntaxes, below, are strictly identical, too :

    • (?s)alignment=".*?"

    • alignment="[^"]*"

    The reason is that, in both cases, the match ends at a single UNIQUE character ( a double quote mark " )
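
    The two equivalences can be checked with a sketch in Python's re module. Two assumptions apply: Python does not accept a bare (?-s), so the default mode stands in for it and re.DOTALL stands in for (?s); and Python's default dot excludes only \n, so the comparison holds for text with \n line endings such as this hypothetical sample.

```python
import re

# Hypothetical text with \n line endings; the first line lacks its closing quote.
text = 'alignment="top\ncolor="red"\nalignment="left"\n'

def found(pattern, flags=0):
    """Return every match of pattern in the sample text."""
    return re.findall(pattern, text, flags)

# Dot stops at a line break (Python's default, standing in for (?-s)).
single_line_dot = found(r'alignment=".*?"')
single_line_class = found(r'alignment="[^"\r\n]*"')

# Dot also matches line breaks (re.DOTALL, standing in for (?s)).
multi_line_dot = found(r'alignment=".*?"', re.DOTALL)
multi_line_class = found(r'alignment="[^"]*"')

print(single_line_dot == single_line_class)
print(multi_line_dot == multi_line_class)
```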


    Now, I’m speaking to everybody ! For instance, do NOT confuse these two regexes :

    • The regex 123.+?5, that means : A string 123 followed by the SHORTEST, NON-empty, range of characters, till a digit 5

    • And the regex, almost identical, 123.+?56, which

      • Does NOT mean : A string 123 followed by the shortest, NON-empty, range of characters, till a digit 5, then the 6 digit

      • But means : A string 123 followed by the SHORTEST, NON-empty, range of characters, till the string 56

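    So, against the subject text 012345789 0123456789 012345789 0123456789, the first regex 123.+?5 finds four occurrences, whereas the second regex 123.+?56 finds only two occurrences ! Starting from the first 123, the lazy .+? has to keep extending, across the space, until it meets the string 56 in the next number.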
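
    A quick way to confirm these counts is a sketch in Python's re module (an assumption for illustration; Notepad++ uses Boost, but lazy quantifiers behave the same on this single-line text):

```python
import re

subject = "012345789 0123456789 012345789 0123456789"

# Shortest non-empty range after 123, up to the first digit 5.
up_to_5 = re.findall(r"123.+?5", subject)

# Shortest non-empty range after 123, up to the first STRING 56.
up_to_56 = re.findall(r"123.+?56", subject)

print(len(up_to_5), up_to_5)
print(len(up_to_56), up_to_56)
```

    Each match of the second regex starts in a number without 56 and ends in the following number, which is why only two matches remain.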


    Here is a summary example :

    Let’s imagine the text, below, where the string abcdlmpqrst is repeated, 10 times, with, sometimes, the lack of the letters p and/or q :

    q missing    q missing    pq missing                p missing    p missing    pq missing                q missing    p missing
    
    abcdlmprst   abcdlmprst   abcdlmrst   abcdlmpqrst   abcdlmqrst   abcdlmqrst   abcdlmrst   abcdlmpqrst   abcdlmprst   abcdlmqrst
    

    Against this text, let’s try, successively, the 20 regexes, below, where the last fourteen contain the [^...] structure :

    Regex A : (?-s)ab.+p
    Regex B : (?-s)ab.+q
    Regex C : (?-s)ab.+pq

    Regex D : (?-s)ab.+?p
    Regex E : (?-s)ab.+?q
    Regex F : (?-s)ab.+?pq

    Regex G : ab[^p\r\n]+p
    Regex H : ab[^q\r\n]+q

    Regex I : ab[^p\r\n]+?p
    Regex J : ab[^q\r\n]+?q

    Regex K : ab[^q\r\n]+p
    Regex L : ab[^p\r\n]+q

    Regex M : ab[^q\r\n]+?p
    Regex N : ab[^p\r\n]+?q

    Regex O : ab[^p\r\n]+pq
    Regex P : ab[^q\r\n]+pq
    Regex Q : ab[^pq\r\n]+pq

    Regex R : ab[^p\r\n]+?pq
    Regex S : ab[^q\r\n]+?pq
    Regex T : ab[^pq\r\n]+?pq

    Here are the results, where each match is indicated by a range of dashes

                 q missing    q missing    pq missing                p missing    p missing    pq missing                q missing    p missing
    
                 abcdlmprst   abcdlmprst   abcdlmrst   abcdlmpqrst   abcdlmqrst   abcdlmqrst   abcdlmrst   abcdlmpqrst   abcdlmprst   abcdlmqrst
    
    A            ---------------------------------------------------------------------------------------------------------------
    B            ----------------------------------------------------------------------------------------------------------------------------
    C            --------------------------------------------------------------------------------------------------
    D , G , I    -------      -------      -------------------       ---------------------------------------------       -------
    E , H , J    ----------------------------------------------      -------      -------      --------------------      --------------------
    F            ----------------------------------------------      ----------------------------------------------
    K            ---------------------------------------------                                 -------------------       -------
    L                                                                --------------------                                             -------
    M            -------      -------      -------------------                                 -------------------       -------
    N                                                                -------      -------                                             -------
    O , R                                  --------------------      ----------------------------------------------
    P , S        ----------------------------------------------                                --------------------
    Q , T                                  --------------------                                --------------------
    

    Just notice that, as I said, above :

    • The regex D, (?-s)ab.+?p, DOES have an equivalent regex G, ab[^p\r\n]+p, with the [^.....] structure

    • The regex E, (?-s)ab.+?q, DOES have an equivalent regex H, ab[^q\r\n]+q, with the [^.....] structure

    but :

    • The regex F, (?-s)ab.+?pq, does NOT have an equivalent regex, containing the [^.....] structure

    Note, also, that :

    • The regexes O, ab[^p\r\n]+pq, and R, ab[^p\r\n]+?pq are equivalent

    • The regexes P, ab[^q\r\n]+pq, and S, ab[^q\r\n]+?pq are equivalent

    • The regexes Q, ab[^pq\r\n]+pq, and T, ab[^pq\r\n]+?pq are equivalent

    Why ? Just because the range of characters, after the string ab, must NOT contain a part or the totality of the string pq. In other words, these six regexes, from O to T, always look for the shortest range of characters, between the string ab and the string pq !
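
    The groupings in the results table can be reproduced with a sketch in Python's re module. Assumptions: Python does not accept a bare (?-s), so regexes A to F are written without it (Python's dot skips \n by default), and the ten words are joined by three spaces on a single line, as in the text above.

```python
import re

# The ten words of the summary example, on one line.
words = ["abcdlmprst", "abcdlmprst", "abcdlmrst", "abcdlmpqrst", "abcdlmqrst",
         "abcdlmqrst", "abcdlmrst", "abcdlmpqrst", "abcdlmprst", "abcdlmqrst"]
text = "   ".join(words)

regexes = {
    "A": r"ab.+p",  "B": r"ab.+q",  "C": r"ab.+pq",
    "D": r"ab.+?p", "E": r"ab.+?q", "F": r"ab.+?pq",
    "G": r"ab[^p\r\n]+p",    "H": r"ab[^q\r\n]+q",
    "I": r"ab[^p\r\n]+?p",   "J": r"ab[^q\r\n]+?q",
    "K": r"ab[^q\r\n]+p",    "L": r"ab[^p\r\n]+q",
    "M": r"ab[^q\r\n]+?p",   "N": r"ab[^p\r\n]+?q",
    "O": r"ab[^p\r\n]+pq",   "P": r"ab[^q\r\n]+pq",  "Q": r"ab[^pq\r\n]+pq",
    "R": r"ab[^p\r\n]+?pq",  "S": r"ab[^q\r\n]+?pq", "T": r"ab[^pq\r\n]+?pq",
}

# For each regex, record the (start, end) span of every match.
results = {name: [m.span() for m in re.finditer(rx, text)]
           for name, rx in regexes.items()}

# Group the regex names that produce exactly the same spans.
groups = {}
for name, spans in results.items():
    groups.setdefault(tuple(spans), []).append(name)

for spans, names in groups.items():
    print(" , ".join(names), "->", len(spans), "match(es)")
```

    Regexes that share a line in the output select exactly the same text, which matches the D , G , I and E , H , J rows of the table, while F stays on a line of its own.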

    Best Regards,

    guy038

    P.S. : Ryan, for your regex “education”, just begin with that article, in N++ Wiki :

    http://docs.notepad-plus-plus.org/index.php/Regular_Expressions

    In addition, you’ll find good documentation, about the new Boost C++ Regex library, v1.55.0 ( similar to the PERL Regular Common Expressions, v1.48.0 ), used by Notepad++, since its 6.0 version, at the TWO addresses below :

    http://www.boost.org/doc/libs/1_48_0/libs/regex/doc/html/boost_regex/syntax/perl_syntax.html

    http://www.boost.org/doc/libs/1_48_0/libs/regex/doc/html/boost_regex/format/boost_format_syntax.html

    • The FIRST link explains the syntax, of regular expressions, in the SEARCH part

    • The SECOND link explains the syntax, of regular expressions, in the REPLACEMENT part


    You may, also, look for valuable information, on the sites, below :

    http://www.regular-expressions.info

    http://www.rexegg.com

    http://perldoc.perl.org/perlre.html

