Search and replace using named capturing group in regular expression

akbarmunir

I want to search and replace text in a file using the named capturing group functionality of regular expressions. As an example, in the string “this is a test”, the search string “(this is)(?<name1> a )” successfully matches “this is a ”. But I am not sure how to refer to the named capturing group “name1” in the replace text box. I have tried “\k<name1>”, “${name1}” and multiple other combinations, but all have failed.

Can someone please help me identify the correct syntax for referring to the named capturing group in the replace text box?

Thanks,
Akbar.

dail

Try using $+{name1}. I think that \g and \k are just used within the find field to back reference a named group.

Владислав Ласский

Hello,
I tried your solution, but it failed :( Maybe I don’t understand your post, or I’m using the wrong syntax. Where can I read the current regular expression documentation?
Thanks.

gerdb42

Documentation about the RegEx engine used in NPP can be found here (the Search part):

http://www.boost.org/doc/libs/1_55_0/libs/regex/doc/html/boost_regex/syntax/perl_syntax.html

and here (the Replace part):

http://www.boost.org/doc/libs/1_55_0/libs/regex/doc/html/boost_regex/format/boost_format_syntax.html

guy038

Hello, @akbarmunir, @Владислав-Ласский, and All,

I’ll refer ONLY to the syntaxes related to named capturing groups, in search and replacement, that are supported by the Boost regex engine of N++!

A) Named capturing groups :

Each of the two syntaxes (?<Name>.....) and (?'Name'.....) represents a named capturing group
The name must be made up of word characters only ( \w ) and must not exceed 32 characters
The name of a capturing group is case-sensitive! For instance, the capturing groups (?<Digits>\d\d) and (?<digitS>\d\d\d) represent two different groups
If a regex contains two or more named capturing groups with the same name, only the first one is taken into account, and all the subsequent groups are ignored
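
To illustrate the case-sensitivity rule outside Notepad++, here is a minimal sketch using Python's standard `re` module. This is an assumption-laden analogue, not the Boost engine: Python spells a named group `(?P<Name>...)`, and it rejects duplicate group names outright, so the "only the first one counts" rule is not reproduced here.

```python
import re

# Assumption: Python's (?P<Name>...) stands in for Boost's (?<Name>...).
# The two names differ only by letter case, yet they are distinct groups.
pattern = re.compile(r"(?P<Digits>\d\d)(?P<digitS>\d\d\d)")

groups = pattern.fullmatch("12345").groupdict()
print(groups)
```

The dictionary holds two separate entries, one per spelling of the name.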

B) Back-references to previous named capturing groups :

Each of the six syntaxes \g{Name}, \g<Name>, \g'Name', \k{Name}, \k<Name>, \k'Name' represents a back-reference to the named capturing group called “Name”, which must be located BEFORE it in the regex

So, as there are two forms of named capturing groups and six forms of back-references, the 12 possible syntaxes below, using the named capturing group Test, would find, for instance, the string ABC surrounded by the SAME non-empty run of digits!

(?<Test>\d+)ABC\g{Test} , (?<Test>\d+)ABC\g<Test> , (?<Test>\d+)ABC\g'Test'

(?<Test>\d+)ABC\k{Test} , (?<Test>\d+)ABC\k<Test> , (?<Test>\d+)ABC\k'Test'

(?'Test'\d+)ABC\g{Test} , (?'Test'\d+)ABC\g<Test> , (?'Test'\d+)ABC\g'Test'

(?'Test'\d+)ABC\k{Test} , (?'Test'\d+)ABC\k<Test> , (?'Test'\d+)ABC\k'Test'

So ANY of these 12 syntaxes matches the four lines below:

1ABC1
12345ABC12345
456ABC456
789ABC789
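
For readers who want to check the back-reference behaviour in a script, here is a small sketch in Python. It is only an analogue: Python's `re` module writes the group as `(?P<Test>...)` and the back-reference as `(?P=Test)`, which are assumed here to play the role of the Boost forms listed above. The helper name `fully_matches` is made up for this illustration, and it tests whole lines rather than searching inside them.

```python
import re

# Assumption: (?P<Test>...) and (?P=Test) are Python's spellings of a named
# group and a back-reference to it; they are not the Notepad++ syntaxes.
BACKREF = re.compile(r"(?P<Test>\d+)ABC(?P=Test)")

def fully_matches(line: str) -> bool:
    """Return True when the whole line is <digits>ABC<same digits>."""
    return BACKREF.fullmatch(line) is not None

same_digits = ["1ABC1", "12345ABC12345", "456ABC456", "789ABC789"]
results = [fully_matches(s) for s in same_digits]
print(results)

# A line whose two numbers differ is rejected by the back-reference.
different = fully_matches("456ABC789")
print(different)
```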

C) Subroutine calls to a named capturing group :

Each of the two syntaxes (?&Name) and (?P>Name) represents a subroutine call to the regex pattern of the named capturing group called “Name”, which may be located BEFORE or AFTER it in the regex

So, as there are two forms of named capturing groups and two forms of subroutine calls, the 4 possible syntaxes below, using the named capturing group Test, would find, for instance, the string ABC surrounded by non-empty runs of digits!

(?<Test>\d+)ABC(?&Test) , (?<Test>\d+)ABC(?P>Test)
(?'Test'\d+)ABC(?&Test) , (?'Test'\d+)ABC(?P>Test)

So ANY of the 4 syntaxes matches the nine lines below:

1ABC1
12345ABC12345
456ABC456
789ABC789

456ABC789
789ABC456
0ABC123456789
0123456789ABC1
111ABC999

And, as the subroutine call can be located BEFORE its associated named capturing group, the 4 syntaxes below are also valid and would find the nine lines above, too!

(?&Test)ABC(?<Test>\d+) , (?P>Test)ABC(?<Test>\d+)
(?&Test)ABC(?'Test'\d+) , (?P>Test)ABC(?'Test'\d+)

D) Reference to named capturing groups, in replacement :

In replacement, any named group (?<Name>.....) or (?'Name'.....) of the search part can be reused with a single named syntax:

$+{Name}
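
As a rough cross-check outside Notepad++, the same idea can be sketched in Python. Note the assumption: Python's `re.sub` does not understand `$+{Name}`; its own replacement syntax `\g<Name>` is used below as a stand-in. The search regex and sample text are taken from the original question, and the square brackets in the replacement are only there to make the reused group visible.

```python
import re

# Assumption: \g<name1> (Python) plays the role of $+{name1} (Notepad++).
pattern = re.compile(r"(this is)(?P<name1> a )")

# Replace the whole match with the text of the named group, in brackets.
result = pattern.sub(r"[\g<name1>]", "this is a test")
print(repr(result))
```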

It’s important to fully understand the fundamental difference between a back-reference and a subroutine call to a group, named or not :

A back-reference to a group represents the current match of this group
A subroutine call to a group represents the regex pattern of this group

For instance, the 15 regexes, below :

(?-i)(?<Test>\d+)ABC\g{Test}
(?-i)(?<Test>\d+)ABC\g<Test>
(?-i)(?<Test>\d+)ABC\g'Test'
(?-i)(?<Test>\d+)ABC\k{Test}
(?-i)(?<Test>\d+)ABC\k<Test>
(?-i)(?<Test>\d+)ABC\k'Test'
(?-i)(?'Test'\d+)ABC\g{Test}
(?-i)(?'Test'\d+)ABC\g<Test>
(?-i)(?'Test'\d+)ABC\g'Test'
(?-i)(?'Test'\d+)ABC\k{Test}
(?-i)(?'Test'\d+)ABC\k<Test>
(?-i)(?'Test'\d+)ABC\k'Test'
(?-i)(?<Test>\d+)ABC\1
(?-i)(?'Test'\d+)ABC\1
(?-i)(\d+)ABC\1

would match the first four lines of the above example (in other words, the numbers surrounding the string ABC have to be identical. Indeed, the back-references simply refer to the actual number preceding the string ABC)

Whereas the 7 regexes, below :

(?-i)(?<Test>\d+)ABC(?&Test)
(?-i)(?'Test'\d+)ABC(?&Test)
(?-i)(?<Test>\d+)ABC(?P>Test)
(?-i)(?'Test'\d+)ABC(?P>Test)
(?<Test>\d+)ABC(?1)
(?'Test'\d+)ABC(?1)
(\d+)ABC(?1)

would match all 9 lines of the above example (in other words, the numbers surrounding the string ABC may be different. Indeed, the subroutine calls (?&Test) and (?P>Test) are strictly equivalent to \d+, the pattern of the group Test!)
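
The contrast between the two families can be counted in a short Python sketch. Two assumptions apply: Python's standard `re` module has no subroutine calls, so the call is emulated by repeating the pattern text of the group, and `(?P=Test)` is Python's spelling of the back-reference. The variable names are invented for this illustration.

```python
import re

DIGITS = r"\d+"  # the pattern of the group Test

# Back-reference: the second number must equal the TEXT matched by Test.
backref = re.compile(rf"(?P<Test>{DIGITS})ABC(?P=Test)")

# Emulated subroutine call: the second number only has to match the
# PATTERN of Test, so the pattern text is simply repeated.
subcall = re.compile(rf"(?P<Test>{DIGITS})ABC(?:{DIGITS})")

lines = [
    "1ABC1", "12345ABC12345", "456ABC456", "789ABC789",
    "456ABC789", "789ABC456", "0ABC123456789", "0123456789ABC1", "111ABC999",
]

backref_hits = [s for s in lines if backref.fullmatch(s)]
subcall_hits = [s for s in lines if subcall.fullmatch(s)]
print(len(backref_hits), len(subcall_hits))
```

Only the lines with identical numbers survive the back-reference, while the emulated subroutine call accepts every line.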

Best Regards,

guy038

P.S. :

When a subroutine call is located inside the parentheses of the group to which it refers, it operates as a recursive pattern. But this is another story … :-)

Gunar Mayer

@gerdb42 said:

http://www.boost.org/doc/libs/1_55_0/libs/regex/doc/html/boost_regex/format/boost_format_syntax.html

Hello
How can I change many lines at once, BUT in many files?
For example, 10000 files in a folder.
For example, I search for 3 lines and, in place of these lines, I need to insert 10 different lines.
In the past this was possible with the program Homesite (including regex, wildcards and so on), but its development stopped and it destroys UTF-8 documents.

Can I find MANY lines with a REGEX like the ones above?
The bigger problem seems to be inserting many lines, because the “Search / Replaces in files” function allows only 1 line.

Please help
Mayer

gerdb42

“Search / Replaces in files”-function allows only 1 line.

Who said so? If you know where the line breaks will occur, try the \R pattern as a placeholder. Or check the option “. finds \r and \n”.

In replacement, insert \r\n at places where you want line breaks.

When using “. finds \r and \n”, pay special attention to greedy/non-greedy repeats.
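
To show why the greedy/non-greedy warning matters, here is a hedged Python sketch. Assumptions: Python's `re.DOTALL` flag is used as a stand-in for the dot-matches-line-breaks option, and the `<a>` / `<b>` blocks are made-up sample text, not anything from the poster's files.

```python
import re

# Hypothetical sample: two three-line blocks separated by Windows line breaks.
text = "<a>\r\none\r\n</a>\r\n<a>\r\ntwo\r\n</a>\r\n"

# With the dot also matching line breaks, a greedy .* runs to the LAST </a>,
# while the non-greedy .*? stops at the FIRST one.
greedy = re.compile(r"<a>.*</a>", re.DOTALL)
lazy = re.compile(r"<a>.*?</a>", re.DOTALL)

# Multi-line replacement: line breaks are inserted as \r\n.
replacement = "<b>\r\nnew\r\n</b>"

greedy_out = greedy.sub(replacement, text)
lazy_out = lazy.sub(replacement, text)
print(repr(greedy_out))
print(repr(lazy_out))
```

The greedy version swallows both blocks in one match, so only one replacement is made; the non-greedy version replaces each block separately.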