
    Search and replace using named capturing group in regular expression

    7 Posts 6 Posters 21.2k Views
    • akbarmunir

      I want to search and replace text in a file using the named capturing group feature of regular expressions. For example, in the string “this is a test”, the search string “(this is)(?<name1> a )” successfully matches "this is a ". But I am not sure how to refer to the named capturing group “name1” in the replace text box. I have tried “\k<name1>”, “${name1}” and several other combinations, but all have failed.

      Can someone please help me identify the correct syntax for referring to the named capturing group in the replace text box?

      Thanks,
      Akbar.

      • dail

        Try using $+{name1} in the replace field. I think that \g and \k are only used within the find field, to back-reference a named group.

        • Владислав Ласский

          Hello,
          I tried your solution, but it failed :( Maybe I don’t understand your post, or I’m using the wrong syntax. Where can I read the current regular expression documentation?
          Thanks.

          • gerdb42

            Documentation about the RegEx engine used in NPP can be found here (the Search part):

            http://www.boost.org/doc/libs/1_55_0/libs/regex/doc/html/boost_regex/syntax/perl_syntax.html

            and here (the Replace part):

            http://www.boost.org/doc/libs/1_55_0/libs/regex/doc/html/boost_regex/format/boost_format_syntax.html

            • guy038

              Hello, @akbarmunir, @Владислав-Ласский, and All,

              I’ll ONLY cover the syntaxes for named capturing groups, in both search and replacement, that are supported by the Boost regex engine of N++ !


              A) Named capturing groups :

              • Each of the two syntaxes (?<Name>.....) and (?'Name'.....) defines a named capturing group

              • The name must consist of word characters only ( \w ) and must not exceed 32 characters

              • The name of a capturing group is case-sensitive ! For instance, (?<Digits>\d\d) and (?<digitS>\d\d\d) are two different groups

              • If a regex contains two or more named capturing groups with the same name, only the first one is taken into account, and all the subsequent ones are ignored
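
The case-sensitivity rule can be checked outside Notepad++ with a minimal Python sketch. This is only an analogy, under stated assumptions: Python's re module is a different engine from Boost, it spells a named group (?P<Name>...) instead of (?<Name>...), and it rejects duplicate group names with an error rather than ignoring the later ones. The sample text "12345" is made up for the illustration.

```python
import re

# Two groups whose names differ only by letter case (Python spelling: (?P<Name>...)).
pattern = re.compile(r"(?P<Digits>\d\d)(?P<digitS>\d\d\d)")

match = pattern.fullmatch("12345")  # hypothetical sample text

# groupdict() maps each group name to the text it captured.
groups = match.groupdict()
print(groups["Digits"])  # first two digits
print(groups["digitS"])  # last three digits
```

The two names end up as two separate keys, which is the same behaviour described above for (?<Digits>\d\d) and (?<digitS>\d\d\d).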


              B) Back-references to previous named capturing groups :

              • Each of the six syntaxes \g{Name}, \g<Name>, \g'Name', \k{Name}, \k<Name>, \k'Name' is a back-reference to the named capturing group Name, which must be located BEFORE it in the regex

              So, as there are two forms of named capturing groups and six forms of back-references, each of the 12 possible syntaxes below, using the named capturing group Test, finds, for instance, the string ABC surrounded by the SAME non-empty run of digits !

              (?<Test>\d+)ABC\g{Test} , (?<Test>\d+)ABC\g<Test> , (?<Test>\d+)ABC\g'Test'

              (?<Test>\d+)ABC\k{Test} , (?<Test>\d+)ABC\k<Test> , (?<Test>\d+)ABC\k'Test'

              (?'Test'\d+)ABC\g{Test} , (?'Test'\d+)ABC\g<Test> , (?'Test'\d+)ABC\g'Test'

              (?'Test'\d+)ABC\k{Test} , (?'Test'\d+)ABC\k<Test> , (?'Test'\d+)ABC\k'Test'

              So ANY of these 12 syntaxes matches the four lines below :

              1ABC1
              12345ABC12345
              456ABC456
              789ABC789
              

              C) Subroutine calls to a named capturing group :

              • Each of the two syntaxes (?&Name) and (?P>Name) is a subroutine call to the regex pattern of the named capturing group Name, which may be located BEFORE or AFTER it in the regex

              So, as there are two forms of named capturing groups and two forms of subroutine calls, each of the 4 possible syntaxes below, using the named capturing group Test, finds, for instance, the string ABC surrounded by non-empty runs of digits !

              (?<Test>\d+)ABC(?&Test) , (?<Test>\d+)ABC(?P>Test)
              (?'Test'\d+)ABC(?&Test) , (?'Test'\d+)ABC(?P>Test)

              So ANY of these 4 syntaxes matches the nine lines below :

              1ABC1
              12345ABC12345
              456ABC456
              789ABC789
              
              456ABC789
              789ABC456
              0ABC123456789
              0123456789ABC1
              111ABC999
              

              And, as the subroutine call can be located BEFORE its associated named capturing group, the 4 syntaxes below are also valid and match the nine lines above, too !

              (?&Test)ABC(?<Test>\d+) , (?P>Test)ABC(?<Test>\d+)
              (?&Test)ABC(?'Test'\d+) , (?P>Test)ABC(?'Test'\d+)


              D) Reference to named capturing groups, in replacement :

              In replacement, any named group (?<Name>.....) or (?'Name'.....) of the search part can be re-used with ONE named syntax only :

              $+{Name}
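
To see the same idea run end to end, here is a minimal Python sketch using the search string from the original question. It is an analogy, not Notepad++ itself: Python's re module writes the named group as (?P<name1>...) and refers to it in the replacement as \g<name1>, where the Boost engine of N++ uses $+{name1}. The square brackets in the replacement are an assumption added only to make the captured spaces visible.

```python
import re

# The question's search string, with the named group in Python spelling.
pattern = re.compile(r"(this is)(?P<name1> a )")

# \g<name1> is Python's replacement syntax for the named group;
# in the Notepad++ replace field the corresponding text would be [$+{name1}].
result = pattern.sub(r"[\g<name1>]", "this is a test")

print(result)  # [ a ]test
```

The whole match "this is a " is replaced by the content of the group name1 alone, wrapped in brackets.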


              It’s important to fully understand the fundamental difference between a back-reference and a subroutine call to a group, named or not :

              • A back-reference to a group stands for the text currently matched by that group

              • A subroutine call to a group stands for the regex pattern of that group

              For instance, the 15 regexes below :

              • (?-i)(?<Test>\d+)ABC\g{Test}

              • (?-i)(?<Test>\d+)ABC\g<Test>

              • (?-i)(?<Test>\d+)ABC\g'Test'

              • (?-i)(?<Test>\d+)ABC\k{Test}

              • (?-i)(?<Test>\d+)ABC\k<Test>

              • (?-i)(?<Test>\d+)ABC\k'Test'

              • (?-i)(?'Test'\d+)ABC\g{Test}

              • (?-i)(?'Test'\d+)ABC\g<Test>

              • (?-i)(?'Test'\d+)ABC\g'Test'

              • (?-i)(?'Test'\d+)ABC\k{Test}

              • (?-i)(?'Test'\d+)ABC\k<Test>

              • (?-i)(?'Test'\d+)ABC\k'Test'

              • (?-i)(?<Test>\d+)ABC\1

              • (?-i)(?'Test'\d+)ABC\1

              • (?-i)(\d+)ABC\1

              Would match only the first four lines of the above example. In other words, the numbers surrounding the string ABC have to be identical. Indeed, a back-reference simply refers to the actual number preceding the string ABC.

              Whereas the 7 regexes below :

              • (?-i)(?<Test>\d+)ABC(?&Test)

              • (?-i)(?'Test'\d+)ABC(?&Test)

              • (?-i)(?<Test>\d+)ABC(?P>Test)

              • (?-i)(?'Test'\d+)ABC(?P>Test)

              • (?<Test>\d+)ABC(?1)

              • (?'Test'\d+)ABC(?1)

              • (\d+)ABC(?1)

              Would match all 9 lines of the above example. In other words, the numbers surrounding the string ABC may be different. Indeed, the subroutine calls (?&Test), (?P>Test) and (?1) are strictly equivalent to \d+, the pattern of the group Test !
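
The difference can be reproduced with a minimal Python sketch on the nine sample lines. Two assumptions to keep in mind: Python's re module spells the back-reference (?P=Test), and it has no subroutine-call syntax at all, so the call (?&Test) is imitated here by pasting the pattern text \d+ a second time, which is exactly what a subroutine call stands for. The variable names are made up for the illustration.

```python
import re

DIGITS = r"\d+"  # the regex pattern of the group Test

# Back-reference: the digits after ABC must repeat the text captured by Test.
backref = re.compile(rf"(?P<Test>{DIGITS})ABC(?P=Test)")

# Stand-in for a subroutine call: the pattern of Test is applied again,
# so any run of digits is accepted after ABC.
subcall = re.compile(rf"(?P<Test>{DIGITS})ABC(?:{DIGITS})")

lines = [
    "1ABC1", "12345ABC12345", "456ABC456", "789ABC789",
    "456ABC789", "789ABC456", "0ABC123456789", "0123456789ABC1", "111ABC999",
]

same_number = [line for line in lines if backref.search(line)]
any_number = [line for line in lines if subcall.search(line)]

print(len(same_number))  # lines where both numbers are identical
print(len(any_number))   # lines where ABC is simply surrounded by digits
```

Only the first four lines pass the back-reference version, while all nine pass the version that re-applies the pattern.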

              Best Regards,

              guy038

              P.S. :

              When a subroutine call is located inside the parentheses of the group to which it refers, it operates as a recursive pattern. But this is another story … :-)

              • Gunar Mayer

                @gerdb42 said:

                http://www.boost.org/doc/libs/1_55_0/libs/regex/doc/html/boost_regex/format/boost_format_syntax.html

                Hello,
                How can I change many lines at once, and do so in many files?
                For example, 10000 files in a folder.
                For example, I search for 3 lines and need to put 10 different lines in their place.
                This used to be possible with the program Homesite (including regex, wildcards and so on), but it has been discontinued and it corrupts UTF-8 documents.

                Can I match MANY lines with a regex like the ones above?
                The bigger problem seems to be inserting many lines, because the “Search / Replace in files” function allows only 1 line.

                Please help
                Mayer

                • gerdb42 @Gunar Mayer

                  “Search / Replaces in files”-function allows only 1 line.

                  Who said so? If you know where the line breaks occur, use the \R pattern as a placeholder for each of them. Or check the option “. finds \r and \n”.

                  In replacement, insert \r\n at the places where you want line breaks.

                  When using “. finds \r and \n”, pay special attention to greedy versus non-greedy repeats, because a greedy .* can then run across line breaks.
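
The greedy versus non-greedy point can be tried out with a minimal Python sketch. It is an analogy under stated assumptions: Python's re.DOTALL flag plays the role of the “. finds \r and \n” option, and the <p> and <div> blocks are made-up sample data, not anything taken from the question.

```python
import re

# Hypothetical file content: two multi-line blocks with a line to keep between them.
text = "<p>\r\nold 1\r\n</p>\r\nkeep\r\n<p>\r\nold 2\r\n</p>\r\n"

# With DOTALL the dot also matches \r and \n, so .* can cross line breaks.
greedy = re.compile(r"<p>.*</p>", re.DOTALL)   # runs to the LAST </p>
lazy = re.compile(r"<p>.*?</p>", re.DOTALL)    # stops at the FIRST </p>

# Multi-line replacement: \r\n marks each place where a line break is wanted.
replacement = "<div>\r\nnew line A\r\nnew line B\r\n</div>"

greedy_result, greedy_count = greedy.subn(replacement, text)
lazy_result, lazy_count = lazy.subn(replacement, text)

print(greedy_count, "keep" in greedy_result)
print(lazy_count, "keep" in lazy_result)
```

The greedy version makes a single replacement that swallows everything between the first <p> and the last </p>, including the line to keep, while the non-greedy version replaces each block separately.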
