Community

    Regex search/replace wildcard

    Help wanted
    • Ryan Birtles

      I need to do a search and replace: I need to replace something like “roundingWidth=**” with “roundingWidth=20”, where the asterisks could be a 1- or 2-digit number.

      I have struggled and failed trying to come up with a regular expression to do this.

      Regex is extremely powerful, but damned if I can wrap my head around it! I think some examples like this will help get me started.

      Thanks,

      • Claudia Frank @Ryan Birtles

        @Ryan-Birtles

        Assuming that the double quotes are part of the text, a regex could look like

        "roundingWidth=\d{1,2}"
        

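        \d matches any digit
        {1,2} means the preceding item (here \d) must occur once or twice, so the regex matches a one- or two-digit number.
        In the replace field, just use "roundingWidth=20".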

        Cheers
        Claudia
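To try the same search/replace outside Notepad++, here is a minimal Python sketch. It assumes Python's re module, whose \d and {1,2} syntax is the same as in the Boost/Perl syntax Notepad++ uses; the sample lines are made up for illustration.

```python
import re

# Made-up sample lines; the double quotes are part of the text,
# as assumed in the answer above.
text = '"roundingWidth=5"\n"roundingWidth=15"\n"roundingWidth=150"'

# \d{1,2} matches one or two digits. The closing quote in the pattern
# prevents a match on just the first two digits of 150.
result = re.sub(r'"roundingWidth=\d{1,2}"', '"roundingWidth=20"', text)
print(result)
```

The first two lines become "roundingWidth=20", while the three-digit value 150 is left untouched because the closing quote must follow at most two digits.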

        • Jim Dailey

          @Ryan-Birtles
          This should help you: http://www.boost.org/doc/libs/1_48_0/libs/regex/doc/html/boost_regex/syntax/perl_syntax.html.

          • Ryan Birtles

            Thank you, that did the trick! I have another one for you please:

            I need to replace the following line with the one after that. The asterisk could be text of arbitrary length but will always be between quote marks.

            alignment="*"
            alignment="middleLeft"

            Thanks!

            • Claudia Frank @Ryan Birtles

              @Ryan-Birtles

              I had to smile because you nearly posted the solution yourself.
              In regex, a . (dot) matches any single character and the * (asterisk) means "zero or more of the preceding item",
              so together .* matches a run of characters of any length, even an empty one.

              So you search for

              alignment=".*"
              

              and replace with

              alignment="middleLeft"
              

              but this assumes that no other quoted text follows alignment="..." on the same line.
              If this isn’t the case, you might change the search to

              alignment=".*?"
              

              The difference is that the first is greedy and tries to match as much as possible, whereas the latter is lazy (non-greedy)
              and matches as little as possible.

              Cheers
              Claudia
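The greedy/lazy difference can be seen in a minimal Python sketch, assuming Python's re module treats .* and .*? the same way as Notepad++ does for a single line. The sample line, with a second quoted attribute after alignment, is made up for illustration.

```python
import re

# Made-up line with a second quoted attribute after alignment.
line = 'alignment="topRight" width="10"'

# Greedy: .* runs on to the LAST quote on the line.
greedy = re.sub(r'alignment=".*"', 'alignment="middleLeft"', line)
# Lazy: .*? stops at the FIRST closing quote.
lazy = re.sub(r'alignment=".*?"', 'alignment="middleLeft"', line)

print(greedy)  # alignment="middleLeft"
print(lazy)    # alignment="middleLeft" width="10"
```

With the greedy version, width="10" is swallowed by the match and lost in the replacement.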

              • AdrianHHH

                Rather than using

                alignment=".*?"
                

                I normally use the more restrictive

                alignment="[^"]*"
                

                so that it only matches characters other than double quotes. For most of the places where I use this sort of search in a Replace All, I go a step further and use

                alignment="[^"\r\n]*"
                

                to restrict it to matching strings that do not include a line break. The simple use of .*? would probably be OK, but the time to type a few extra characters is negligible compared to my being confident that a Replace All will only change the places I want.
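A minimal Python sketch of why excluding \r and \n matters, assuming Python's re module handles negated character classes like Notepad++ does. The text, with one value accidentally broken across two lines, is made up for illustration.

```python
import re

# Made-up text: the first value is broken across two lines.
text = 'alignment="top\nLeft" x\nalignment="bottom"'

# [^"]* may run across the line break until it finds the next quote...
any_line = re.findall(r'alignment="[^"]*"', text)
# ...while [^"\r\n]* refuses to leave the current line.
one_line = re.findall(r'alignment="[^"\r\n]*"', text)

print(any_line)
print(one_line)
```

Only the second pattern limits a Replace All to values that open and close on the same line.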

                • Ryan Birtles

                  This is great, thank you! Four variations of a similar task: this will help a lot in my regex education.

                  • guy038

                    Hello, Ryan, Claudia, AdrianHHH and All,

                    Ryan, here are the main differences between the four simple regexes below (assuming a case-sensitive search):

                    a.*z   matches a lowercase letter a, followed by the LONGEST  range of characters, even EMPTY, till a lowercase letter z
                    a.*?z  matches a lowercase letter a, followed by the SHORTEST range of characters, even EMPTY, till a lowercase letter z
                    a.+z   matches a lowercase letter a, followed by the LONGEST  range of characters, NON empty,  till a lowercase letter z
                    a.+?z  matches a lowercase letter a, followed by the SHORTEST range of characters, NON empty,  till a lowercase letter z
                    

                    Just try these four regexes against the subject text az abcxyz az abz abxz abcxyz az ab bcxz abcx, in a new tab. The differences are quite obvious!
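The same experiment can be sketched in Python, assuming its re module behaves like Notepad++ for these four simple patterns on a single line (Python is case-sensitive by default, as assumed above).

```python
import re

subject = "az abcxyz az abz abxz abcxyz az ab bcxz abcx"

# Collect every non-overlapping match of each regex, left to right.
results = {p: re.findall(p, subject) for p in (r"a.*z", r"a.*?z", r"a.+z", r"a.+?z")}

for pattern, matches in results.items():
    print(f"{pattern:6} {len(matches)} match(es): {matches}")
```

Here a.*z and a.+z each find a single long match, a.*?z finds eight and a.+?z finds five: because .+? needs at least one character, a.+?z cannot match a bare az and runs on to the next z instead.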


                    AdrianHHH, you needn’t worry about choosing between the two syntaxes below, as they are strictly equivalent!

                    • (?-s)alignment=".*?"

                    • alignment="[^"\r\n]*"

                    Similarly, the two syntaxes below are strictly equivalent, too:

                    • (?s)alignment=".*?"

                    • alignment="[^"]*"

                    The reason is that the match ends at a final, UNIQUE character (the quote mark "), so the lazy .*? stops at the first quote it meets, exactly like [^"]* does.
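A minimal Python check of these two equivalences, under stated assumptions: Python's default "." does not match \n (like (?-s)), and a leading (?s) lets it match \n. Unlike Notepad++, Python's "." still matches \r, so the made-up sample text uses \n line endings only.

```python
import re

# Made-up text with \n line endings; the second value spans two lines.
text = 'x alignment="top" y="1"\nalignment="bottom\nLeft" z\nalignment="middle"'

pairs = [
    # Python's default "." already excludes "\n", like (?-s) in Notepad++.
    (r'alignment=".*?"', r'alignment="[^"\r\n]*"'),
    # A leading (?s) lets "." match "\n" too.
    (r'(?s)alignment=".*?"', r'alignment="[^"]*"'),
]

for lazy, negated in pairs:
    lazy_matches = re.findall(lazy, text)
    negated_matches = re.findall(negated, text)
    print(lazy_matches == negated_matches, lazy_matches)
```

Both lines print True: the first pair skips the broken value, the second pair matches it across the line break.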


                    Now, I’m speaking to everybody! For instance, do NOT confuse these two regexes:

                    • The regex 123.+?5, which means: a string 123, followed by the SHORTEST NON-empty range of characters, till a digit 5

                    • And the regex, almost identical, 123.+?56, which

                      • Does NOT mean: a string 123, followed by the shortest NON-empty range of characters, till a digit 5, then the digit 6

                      • But means: a string 123, followed by the SHORTEST NON-empty range of characters, till the string 56

                    So, against the subject text 012345789 0123456789 012345789 0123456789, the first regex 123.+?5 finds four occurrences, whereas the second regex 123.+?56 finds only two, each one starting in a number and ending at the 56 of the next number!
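This count can be confirmed with a minimal Python sketch, assuming Python's lazy quantifier +? behaves like the Boost one used by Notepad++.

```python
import re

subject = "012345789 0123456789 012345789 0123456789"

# Lazy range up to the first digit 5.
to_five = re.findall(r"123.+?5", subject)
# Lazy range up to the first STRING 56, which may lie in the next number.
to_fifty_six = re.findall(r"123.+?56", subject)

print(len(to_five), to_five)
print(len(to_fifty_six), to_fifty_six)
```

The second regex starts at the 123 of 012345789, which has no 56, and only stops at the 56 of the following number; the same happens with the third and fourth numbers.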


                    Here is a summary example:

                    Let’s imagine the text below, where the string abcdlmpqrst is repeated 10 times, sometimes with the letter p and/or q missing:

                    q missing    q missing    pq missing                p missing    p missing    pq missing                q missing    p missing
                    
                    abcdlmprst   abcdlmprst   abcdlmrst   abcdlmpqrst   abcdlmqrst   abcdlmqrst   abcdlmrst   abcdlmpqrst   abcdlmprst   abcdlmqrst
                    

                    Against this text, let’s try, successively, the 20 regexes below, where the last fourteen contain the [^...] structure:

                    Regex A : (?-s)ab.+p
                    Regex B : (?-s)ab.+q
                    Regex C : (?-s)ab.+pq

                    Regex D : (?-s)ab.+?p
                    Regex E : (?-s)ab.+?q
                    Regex F : (?-s)ab.+?pq

                    Regex G : ab[^p\r\n]+p
                    Regex H : ab[^q\r\n]+q

                    Regex I : ab[^p\r\n]+?p
                    Regex J : ab[^q\r\n]+?q

                    Regex K : ab[^q\r\n]+p
                    Regex L : ab[^p\r\n]+q

                    Regex M : ab[^q\r\n]+?p
                    Regex N : ab[^p\r\n]+?q

                    Regex O : ab[^p\r\n]+pq
                    Regex P : ab[^q\r\n]+pq
                    Regex Q : ab[^pq\r\n]+pq

                    Regex R : ab[^p\r\n]+?pq
                    Regex S : ab[^q\r\n]+?pq
                    Regex T : ab[^pq\r\n]+?pq

                    Here are the results, where each match is indicated by a range of dashes:

                                 q missing    q missing    pq missing                p missing    p missing    pq missing                q missing    p missing
                    
                                 abcdlmprst   abcdlmprst   abcdlmrst   abcdlmpqrst   abcdlmqrst   abcdlmqrst   abcdlmrst   abcdlmpqrst   abcdlmprst   abcdlmqrst
                    
                    A            ---------------------------------------------------------------------------------------------------------------
                    B            ----------------------------------------------------------------------------------------------------------------------------
                    C            --------------------------------------------------------------------------------------------------
                    D , G , I    -------      -------      -------------------       ---------------------------------------------       -------
                    E , H , J    ----------------------------------------------      -------      -------      --------------------      --------------------
                    F            ----------------------------------------------      ----------------------------------------------
                    K            ---------------------------------------------                                 -------------------       -------
                    L                                                                --------------------                                             -------
                    M            -------      -------      -------------------                                 -------------------       -------
                    N                                                                -------      -------                                             -------
                    O , R                                  --------------------      ----------------------------------------------
                    P , S        ----------------------------------------------                                --------------------
                    Q , T                                  --------------------                                --------------------
                    

                    Just notice that, as I said above:

                    • The regex D, (?-s)ab.+?p, DOES have an equivalent regex G, ab[^p\r\n]+p, with the [^.....] structure

                    • The regex E, (?-s)ab.+?q, DOES have an equivalent regex H, ab[^q\r\n]+q, with the [^.....] structure

                    but:

                    • The regex F, (?-s)ab.+?pq, does NOT have an equivalent regex, containing the [^.....] structure

                    Note also that:

                    • The regexes O, ab[^p\r\n]+pq, and R, ab[^p\r\n]+?pq are equivalent

                    • The regexes P, ab[^q\r\n]+pq, and S, ab[^q\r\n]+?pq are equivalent

                    • The regexes Q, ab[^pq\r\n]+pq, and T, ab[^pq\r\n]+?pq are equivalent

                    Why? Just because the range of characters after the string ab cannot contain p (O, R), q (P, S) or either letter (Q, T), so it can never run past the first possible pq, and greedy or lazy makes no difference. In other words, these six regexes, from O to T, always look for the shortest range of characters between the string ab and the string pq!
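The whole table can be regenerated with a small Python sketch. Assumptions: Python's re module stands in for Notepad++'s Boost engine, and since Python's "." never matches \n without re.DOTALL, the (?-s) prefix of regexes A to F is simply dropped. The dash_line helper is hypothetical, written only for this illustration.

```python
import re

# The ten variants of abcdlmpqrst, separated by three spaces as above.
words = ["abcdlmprst", "abcdlmprst", "abcdlmrst", "abcdlmpqrst", "abcdlmqrst",
         "abcdlmqrst", "abcdlmrst", "abcdlmpqrst", "abcdlmprst", "abcdlmqrst"]
text = "   ".join(words)

# Regexes A to T; (?-s) is dropped because it is Python's default behaviour.
regexes = {
    "A": r"ab.+p", "B": r"ab.+q", "C": r"ab.+pq",
    "D": r"ab.+?p", "E": r"ab.+?q", "F": r"ab.+?pq",
    "G": r"ab[^p\r\n]+p", "H": r"ab[^q\r\n]+q",
    "I": r"ab[^p\r\n]+?p", "J": r"ab[^q\r\n]+?q",
    "K": r"ab[^q\r\n]+p", "L": r"ab[^p\r\n]+q",
    "M": r"ab[^q\r\n]+?p", "N": r"ab[^p\r\n]+?q",
    "O": r"ab[^p\r\n]+pq", "P": r"ab[^q\r\n]+pq", "Q": r"ab[^pq\r\n]+pq",
    "R": r"ab[^p\r\n]+?pq", "S": r"ab[^q\r\n]+?pq", "T": r"ab[^pq\r\n]+?pq",
}


def dash_line(pattern: str) -> str:
    """Hypothetical helper: put a '-' under every character of text matched by pattern."""
    marks = [" "] * len(text)
    for m in re.finditer(pattern, text):
        marks[m.start():m.end()] = "-" * (m.end() - m.start())
    return "".join(marks).rstrip()


# Header line, then one dash line per regex, aligned under the text.
print("   " + text)
for name, pattern in regexes.items():
    print(f"{name}  {dash_line(pattern)}")
```

One limitation of this layout: two matches that touch each other would merge into a single run of dashes, which does not happen with this particular text.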

                    Best Regards,

                    guy038

                    P.S.: Ryan, for your regex “education”, just begin with this article in the N++ Wiki:

                    http://docs.notepad-plus-plus.org/index.php/Regular_Expressions

                    In addition, you’ll find good documentation about the new Boost C++ Regex library, v1.55.0 (similar to the PERL Regular Common Expressions, v1.48.0), used by Notepad++ since version 6.0, at the TWO addresses below:

                    http://www.boost.org/doc/libs/1_48_0/libs/regex/doc/html/boost_regex/syntax/perl_syntax.html

                    http://www.boost.org/doc/libs/1_48_0/libs/regex/doc/html/boost_regex/format/boost_format_syntax.html

                    • The FIRST link explains the regular expression syntax for the SEARCH part

                    • The SECOND link explains the format syntax for the REPLACEMENT part


                    You may also find valuable information on the sites below:

                    http://www.regular-expressions.info

                    http://www.rexegg.com

                    http://perldoc.perl.org/perlre.html

                    The Community of users of the Notepad++ text editor.