    Regex search/replace wildcard

    • Ryan Birtles

      I need to do a search and replace: I need to replace something like “roundingWidth=**” with “roundingWidth=20”, where the asterisks could be a 1- or 2-digit number.

      I have struggled and failed trying to come up with a regular expression to do this.

      Regex is extremely powerful, but damned if I can wrap my head around it! I think some examples like this will help get me started.

      Thanks,

      • Claudia Frank

        @Ryan-Birtles

        Assuming that the double quotes are part of the text, a regex could look like:

        "roundingWidth=\d{1,2}"
        

        \d is for any digit
        {1,2} means either one digit or two digits.
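The same search and replace can be tried outside the editor. The following is a minimal sketch using Python's `re` module, which is an assumption here (Notepad++ itself uses the Boost regex engine); the sample text is hypothetical and, as above, assumes the double quotes are part of the text.

```python
import re

# Hypothetical sample text; the double quotes are assumed to be part of it.
sample = 'a "roundingWidth=5" b "roundingWidth=12" c "roundingWidth=123"'

# \d{1,2} matches one or two digits; the closing quote keeps it from
# matching only the first two digits of a longer number.
pattern = r'"roundingWidth=\d{1,2}"'

result = re.sub(pattern, '"roundingWidth=20"', sample)
print(result)
```

The three-digit value is left untouched because the closing quote must directly follow the one or two digits.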

        Cheers
        Claudia

        • Jim Dailey

          @Ryan-Birtles
          This should help you: http://www.boost.org/doc/libs/1_48_0/libs/regex/doc/html/boost_regex/syntax/perl_syntax.html.

          • Ryan Birtles

            Thank you, that did the trick! I have another one for you please:

            I need to replace the following line with the one after that. The asterisk could be text of arbitrary length but will always be between quote marks.

            alignment=“*”
            alignment=“middleLeft”

            Thanks!

            • Claudia Frank

              @Ryan-Birtles

              I had to smile because you very nearly posted the solution yourself.
              In regex, a . (dot) matches any single character, and combined with the * (asterisk)
              quantifier it matches a run of characters of any length, including none.

              So you search for

              alignment=".*"
              

              and replace with

              alignment="middleLeft"
              

              but this assumes that alignment="..." is the only quoted text on the line.
              If that isn’t the case and more quoted text follows on the same line, you might replace the search with

              alignment=".*?"
              

              The difference is that the first is greedy and tries to match as much as possible, whereas the latter is non-greedy
              and matches as little as possible.
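A small sketch of the greedy versus non-greedy difference, using Python's `re` module as a stand-in for the editor's engine (an assumption; the sample line and its `color` attribute are hypothetical):

```python
import re

# Hypothetical line with a second quoted value after the alignment attribute.
sample = 'alignment="topRight" color="red"'
replacement = 'alignment="middleLeft"'

# Greedy: .* runs to the LAST double quote on the line.
greedy = re.sub(r'alignment=".*"', replacement, sample)

# Non-greedy: .*? stops at the FIRST double quote it can.
lazy = re.sub(r'alignment=".*?"', replacement, sample)

print(greedy)  # the color attribute is swallowed
print(lazy)    # the color attribute survives
```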

              Cheers
              Claudia

              • AdrianHHH

                Rather than using

                alignment=".*?"
                

                I normally use the more restrictive

                alignment="[^"]*"
                

                so it only matches non-double-quotes. For most of the places I use this sort of search in a replace-all I go the step further and use

                alignment="[^"\r\n]*"
                

                to restrict it to matching strings that do not include a line break. The simple .*? would probably be OK, but the time needed to type a few extra characters is negligible compared to being confident that a replace-all will only change the places I want.
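To illustrate why the extra `\r\n` can matter in a replace-all, here is a minimal sketch in Python's `re` module (an assumption, not the editor's own engine). The sample is hypothetical: a value whose closing quote is missing, followed by another line.

```python
import re

# Hypothetical damaged input: the first value has no closing quote.
sample = 'alignment="top\ncolor="red"'
replacement = 'alignment="middleLeft"'

# [^"]* may cross the line break and stop at a quote on the next line.
loose = re.sub(r'alignment="[^"]*"', replacement, sample)

# [^"\r\n]* refuses to cross the line break, so nothing matches.
strict = re.sub(r'alignment="[^"\r\n]*"', replacement, sample)

print(repr(loose))
print(repr(strict))
```

With the stricter class the damaged line is simply left alone, rather than being merged with the following line.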

                • Ryan Birtles

                  This is great, thank you! Four variations of a similar task - this will help a lot in my regex education.

                  • guy038

                    Hello, Ryan, Claudia, AdrianHHH and All,

                    Ryan, here are the main differences between the four simple regexes below (I assume a case-sensitive search):

                    a.*z   matches a lowercase letter a, followed by the LONGEST  range of characters, even EMPTY, till a lowercase letter z
                    a.*?z  matches a lowercase letter a, followed by the SHORTEST range of characters, even EMPTY, till a lowercase letter z
                    a.+z   matches a lowercase letter a, followed by the LONGEST  range of characters, NON empty,  till a lowercase letter z
                    a.+?z  matches a lowercase letter a, followed by the SHORTEST range of characters, NON empty,  till a lowercase letter z
                    

                    Just try these four regexes against the subject text az abcxyz az abz abxz abcxyz az ab bcxz abcx, in a new tab. The differences are quite obvious!
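For readers who prefer a script to a new editor tab, the four regexes can also be run against that subject text with Python's `re` module. This is only a sketch under the assumption that Python's engine behaves like the editor's for these simple patterns.

```python
import re

subject = "az abcxyz az abz abxz abcxyz az ab bcxz abcx"

# Greedy and lazy forms of "zero or more" (*) and "one or more" (+).
for pattern in (r"a.*z", r"a.*?z", r"a.+z", r"a.+?z"):
    matches = re.findall(pattern, subject)
    print(f"{pattern:6} {len(matches)} match(es): {matches}")
```

The two greedy forms each produce a single long match running to the last z, the lazy `a.*?z` produces eight short matches, and the lazy `a.+?z` produces five, because the bare string az cannot match on its own when at least one character is required between a and z.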


                    AdrianHHH, you needn’t worry about choosing between the two syntaxes below, as they are equivalent:

                    • (?-s)alignment=".*?"

                    • alignment="[^"\r\n]*"

                    Similarly, the two syntaxes below are equivalent, too:

                    • (?s)alignment=".*?"

                    • alignment="[^"]*"

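                    The reason is that the range of characters ends at a SINGLE delimiter character (a quote mark "), so "the shortest range up to the next quote" and "a run of non-quote characters followed by a quote" describe the same text.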
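The two equivalences can be checked with a short sketch in Python's `re` module. Two assumptions apply: Python has no global `(?-s)` switch, so the default (dot does not match a newline) stands in for `(?-s)` and the `re.DOTALL` flag stands in for `(?s)`; and the hypothetical sample uses `\n` line breaks only, because Python's dot still matches `\r`.

```python
import re

# Hypothetical sample: the second value contains a line break.
sample = 'alignment="a" x alignment="b\nc" alignment=""'

# Dot does not cross line breaks  <->  class excluding quote and line breaks.
lazy_line = re.findall(r'alignment=".*?"', sample)
class_line = re.findall(r'alignment="[^"\r\n]*"', sample)

# Dot matches everything (DOTALL)  <->  class excluding only the quote.
lazy_any = re.findall(r'alignment=".*?"', sample, flags=re.DOTALL)
class_any = re.findall(r'alignment="[^"]*"', sample)

print(lazy_line == class_line, lazy_any == class_any)
```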


                    Now, I’m speaking to everybody ! For instance, do NOT confuse these two regexes :

                    • The regex 123.+?5, that means : A string 123 followed by the SHORTEST, NON-empty, range of characters, till a digit 5

                    • And the regex, almost identical, 123.+?56, which

                      • Does NOT mean : A string 123 followed by the shortest, NON-empty, range of characters, till a digit 5, then the 6 digit

                      • But means : A string 123 followed by the SHORTEST, NON-empty, range of characters, till the string 56

                    So, against the subject text 012345789 0123456789 012345789 0123456789, the first regex 123.+?5 finds four occurrences, whereas the second regex 123.+?56 finds only two!
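The counts for these two regexes can be reproduced with a brief sketch in Python's `re` module (assumed here to behave like the editor's engine for these patterns):

```python
import re

subject = "012345789 0123456789 012345789 0123456789"

# Shortest non-empty range up to a digit 5.
to_5 = re.findall(r"123.+?5", subject)

# Shortest non-empty range up to the STRING 56; a lone 5 does not stop it.
to_56 = re.findall(r"123.+?56", subject)

print(len(to_5), to_5)
print(len(to_56), to_56)
```

Each match of the second regex starts in a number that lacks the 6 and runs across the space into the next number, which is why only two occurrences are found.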


                    Here is a summary example :

                    Let’s imagine the text below, where the string abcdlmpqrst is repeated 10 times, sometimes without the letter p, the letter q, or both:

                    q missing    q missing    pq missing                p missing    p missing    pq missing                q missing    p missing
                    
                    abcdlmprst   abcdlmprst   abcdlmrst   abcdlmpqrst   abcdlmqrst   abcdlmqrst   abcdlmrst   abcdlmpqrst   abcdlmprst   abcdlmqrst
                    

                    Against this text, let’s try, in turn, the 20 regexes below, of which the last fourteen contain the [^...] structure:

                    Regex A : (?-s)ab.+p
                    Regex B : (?-s)ab.+q
                    Regex C : (?-s)ab.+pq

                    Regex D : (?-s)ab.+?p
                    Regex E : (?-s)ab.+?q
                    Regex F : (?-s)ab.+?pq

                    Regex G : ab[^p\r\n]+p
                    Regex H : ab[^q\r\n]+q

                    Regex I : ab[^p\r\n]+?p
                    Regex J : ab[^q\r\n]+?q

                    Regex K : ab[^q\r\n]+p
                    Regex L : ab[^p\r\n]+q

                    Regex M : ab[^q\r\n]+?p
                    Regex N : ab[^p\r\n]+?q

                    Regex O : ab[^p\r\n]+pq
                    Regex P : ab[^q\r\n]+pq
                    Regex Q : ab[^pq\r\n]+pq

                    Regex R : ab[^p\r\n]+?pq
                    Regex S : ab[^q\r\n]+?pq
                    Regex T : ab[^pq\r\n]+?pq

                    Here are the results, where each match is indicated by a range of dashes

                                 q missing    q missing    pq missing                p missing    p missing    pq missing                q missing    p missing
                    
                                 abcdlmprst   abcdlmprst   abcdlmrst   abcdlmpqrst   abcdlmqrst   abcdlmqrst   abcdlmrst   abcdlmpqrst   abcdlmprst   abcdlmqrst
                    
                    A            ---------------------------------------------------------------------------------------------------------------
                    B            ----------------------------------------------------------------------------------------------------------------------------
                    C            --------------------------------------------------------------------------------------------------
                    D , G , I    -------      -------      -------------------       ---------------------------------------------       -------
                    E , H , J    ----------------------------------------------      -------      -------      --------------------      --------------------
                    F            ----------------------------------------------      ----------------------------------------------
                    K            ---------------------------------------------                                 -------------------       -------
                    L                                                                --------------------                                             -------
                    M            -------      -------      -------------------                                 -------------------       -------
                    N                                                                -------      -------                                             -------
                    O , R                                  --------------------      ----------------------------------------------
                    P , S        ----------------------------------------------                                --------------------
                    Q , T                                  --------------------                                --------------------
                    

                    Just notice that, as I said, above :

                    • The regex D, (?-s)ab.+?p, DOES have an equivalent regex G, ab[^p\r\n]+p, with the [^.....] structure

                    • The regex E, (?-s)ab.+?q, DOES have an equivalent regex H, ab[^q\r\n]+q, with the [^.....] structure

                    but :

                    • The regex F, (?-s)ab.+?pq, does NOT have an equivalent regex, containing the [^.....] structure

                    Note, also, that :

                    • The regexes O, ab[^p\r\n]+pq, and R, ab[^p\r\n]+?pq are equivalent

                    • The regexes P, ab[^q\r\n]+pq, and S, ab[^q\r\n]+?pq are equivalent

                    • The regexes Q, ab[^pq\r\n]+pq, and T, ab[^pq\r\n]+?pq are equivalent

                    Why? Because the range of characters after the string ab must NOT contain part or all of the string pq. In other words, these six regexes, from O to T, always look for the shortest range of characters between the string ab and the string pq!
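The whole summary example can be replayed with a script. This is a sketch in Python's `re` module, not the editor's Boost engine, under two assumptions: the ten strings are joined by three spaces, as in the text above, and the `(?-s)` prefix is dropped because Python does not accept it as a global switch and its dot already stops at a newline by default. The helper names `spans` and `dashes` are made up for this illustration.

```python
import re

words = ["abcdlmprst", "abcdlmprst", "abcdlmrst", "abcdlmpqrst", "abcdlmqrst",
         "abcdlmqrst", "abcdlmrst", "abcdlmpqrst", "abcdlmprst", "abcdlmqrst"]
text = "   ".join(words)

regexes = {
    "A": r"ab.+p", "B": r"ab.+q", "C": r"ab.+pq",
    "D": r"ab.+?p", "E": r"ab.+?q", "F": r"ab.+?pq",
    "G": r"ab[^p\r\n]+p", "H": r"ab[^q\r\n]+q",
    "I": r"ab[^p\r\n]+?p", "J": r"ab[^q\r\n]+?q",
    "K": r"ab[^q\r\n]+p", "L": r"ab[^p\r\n]+q",
    "M": r"ab[^q\r\n]+?p", "N": r"ab[^p\r\n]+?q",
    "O": r"ab[^p\r\n]+pq", "P": r"ab[^q\r\n]+pq", "Q": r"ab[^pq\r\n]+pq",
    "R": r"ab[^p\r\n]+?pq", "S": r"ab[^q\r\n]+?pq", "T": r"ab[^pq\r\n]+?pq",
}

def spans(pattern):
    """Return the (start, end) offsets of every match in the text."""
    return [m.span() for m in re.finditer(pattern, text)]

def dashes(pattern):
    """Draw each match as a run of dashes aligned under the text."""
    line = [" "] * len(text)
    for start, end in spans(pattern):
        line[start:end] = "-" * (end - start)
    return "".join(line).rstrip()

print("   " + text)
for name, pattern in regexes.items():
    print(f"{name}  {dashes(pattern)}")
```

Comparing `spans(...)` for two letters is a quick way to confirm which regexes are equivalent on this text, for instance D with G and I, or O with R.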

                    Best Regards,

                    guy038

                    P.S. : Ryan, for your regex “education”, just begin with that article, in N++ Wiki :

                    http://docs.notepad-plus-plus.org/index.php/Regular_Expressions

                    In addition, you’ll find good documentation, about the new Boost C++ Regex library, v1.55.0 ( similar to the PERL Regular Common Expressions, v1.48.0 ), used by Notepad++, since its 6.0 version, at the TWO addresses below :

                    http://www.boost.org/doc/libs/1_48_0/libs/regex/doc/html/boost_regex/syntax/perl_syntax.html

                    http://www.boost.org/doc/libs/1_48_0/libs/regex/doc/html/boost_regex/format/boost_format_syntax.html

                    • The FIRST link explains the syntax, of regular expressions, in the SEARCH part

                    • The SECOND link explains the syntax, of regular expressions, in the REPLACEMENT part


                    You may also find valuable information on the sites below:

                    http://www.regular-expressions.info

                    http://www.rexegg.com

                    http://perldoc.perl.org/perlre.html

                    The Community of users of the Notepad++ text editor.