Search and replace using named capturing group in regular expression
-
I want to search and replace text in a file using the named capturing group functionality of regular expressions. As an example, in the string “this is a test”, the search string “(this is)(?<name1> a )” successfully matches "this is a ". But I am not sure how to refer to this named capturing group “name1” in the replace text box. I have tried “\k<name1>”, “${name1}”, “${name1}” and multiple other combinations, but all have failed.
Can someone please help me identify the correct syntax for referring to the named capturing group in the replace text box?
Thanks,
Akbar. -
Try using $+{name1}. I think that \g and \k are just used within the find field to back-reference a named group. -
Hello,
I tried your solution, but it failed :( Maybe I don’t understand your post, or I am using the wrong syntax. Where can I read the current regular expression documentation?
Thanks. -
Documentation about the RegEx engine used in NPP can be found here (the Search part):
http://www.boost.org/doc/libs/1_55_0/libs/regex/doc/html/boost_regex/syntax/perl_syntax.html
and here (the Replace part):
http://www.boost.org/doc/libs/1_55_0/libs/regex/doc/html/boost_regex/format/boost_format_syntax.html
-
Hello, @akbarmunir, @Владислав-Ласский, and All,
I’ll ONLY refer to the syntaxes relating to named capturing groups, in search and replacement, used by the Boost regex engine of N++!
A) Named capturing groups :
- Each of the two syntaxes (?<Name>.....) or (?'Name'.....) represents a named capturing group
- The name must be made up of word characters only (\w) and must not exceed 32 characters
- The name of a capturing group is case-sensitive! For instance, the capturing groups (?<Digits>\d\d) and (?<digitS>\d\d\d) represent two different groups
- If a regex contains two or more named capturing groups with the same name, only the first one is taken into account, and all the subsequent groups are ignored
B) Back-references to previous named capturing groups :
- Each of the six syntaxes \g{Name}, \g<Name>, \g'Name', \k{Name}, \k<Name>, \k'Name' represents a back-reference to the named capturing group called “Name”, which must be located BEFORE the back-reference in the regex
So, as there are two forms of named capturing groups and six forms of back-references, the 12 possible syntaxes below, using the named capturing group Test, would find, for instance, the string ABC surrounded by the SAME, non-empty run of digits!
(?<Test>\d+)ABC\g{Test} , (?<Test>\d+)ABC\g<Test> , (?<Test>\d+)ABC\g'Test'
(?<Test>\d+)ABC\k{Test} , (?<Test>\d+)ABC\k<Test> , (?<Test>\d+)ABC\k'Test'
(?'Test'\d+)ABC\g{Test} , (?'Test'\d+)ABC\g<Test> , (?'Test'\d+)ABC\g'Test'
(?'Test'\d+)ABC\k{Test} , (?'Test'\d+)ABC\k<Test> , (?'Test'\d+)ABC\k'Test'
So, ANY of these 12 syntaxes matches the four lines below:
1ABC1
12345ABC12345
456ABC456
789ABC789
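The back-reference behaviour described above can be tried outside N++ as well. Here is a minimal sketch in Python, used purely as a stand-in: Python's re module is NOT the Boost engine, and it spells the named group (?P<Test>...) and the back-reference (?P=Test) instead of the \g and \k forms listed above. The fifth sample string is an extra counter-example added for illustration.

```python
import re

# Python spelling of (?<Test>\d+)ABC\k<Test>: a named group followed by
# a back-reference to whatever text that group has just matched.
pattern = re.compile(r"(?P<Test>\d+)ABC(?P=Test)")

samples = [
    "1ABC1",
    "12345ABC12345",
    "456ABC456",
    "789ABC789",
    "456ABC789",  # extra counter-example: the two numbers differ
]

# Keep only the strings that match from start to end.
matches = [s for s in samples if pattern.fullmatch(s)]
print(matches)
```

Only the four strings with identical numbers on both sides of ABC are kept; the counter-example is rejected because the back-reference demands the same digits again.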
C) Subroutine calls to a named capturing group :
- Each of the two syntaxes (?&Name) or (?P>Name) represents a subroutine call to the regex pattern of the named capturing group called “Name”, which may be located BEFORE or AFTER the call in the regex
So, as there are two forms of named capturing groups and two forms of subroutine calls, the 4 possible syntaxes below, using the named capturing group Test, would find, for instance, the string ABC surrounded by non-empty runs of digits!
(?<Test>\d+)ABC(?&Test) , (?<Test>\d+)ABC(?P>Test)
(?'Test'\d+)ABC(?&Test) , (?'Test'\d+)ABC(?P>Test)
So, ANY of the 4 syntaxes matches the nine lines below:
1ABC1
12345ABC12345
456ABC456
789ABC789
456ABC789
789ABC456
0ABC123456789
0123456789ABC1
111ABC999
And, as the subroutine call can be located BEFORE its associated named capturing group, the 4 syntaxes below are also valid and match the nine lines above, too!
(?&Test)ABC(?<Test>\d+) , (?P>Test)ABC(?<Test>\d+)
(?&Test)ABC(?'Test'\d+) , (?P>Test)ABC(?'Test'\d+)
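To see why all nine lines match, here is a rough sketch in Python. Note the assumption: Python's built-in re module has no subroutine-call syntax such as (?&Test), so the call is emulated by writing the pattern of the group a second time, which is what a subroutine call amounts to. The variable names are illustrative only.

```python
import re

DIGITS = r"\d+"  # the pattern of the group Test

# Emulation of (?<Test>\d+)ABC(?&Test): the group's PATTERN is applied
# again after ABC, so the second number may differ from the first one.
subroutine_like = re.compile(rf"(?P<Test>{DIGITS})ABC{DIGITS}")

lines = [
    "1ABC1",
    "12345ABC12345",
    "456ABC456",
    "789ABC789",
    "456ABC789",
    "789ABC456",
    "0ABC123456789",
    "0123456789ABC1",
    "111ABC999",
]

# Keep only the strings that match from start to end.
matched = [s for s in lines if subroutine_like.fullmatch(s)]
print(len(matched), "of", len(lines), "lines match")
```

Because only the pattern is re-used, not the captured text, every one of the nine lines is accepted.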
D) Reference to named capturing groups, in replacement :
In replacement, any named group (?<Name>.....) or (?'Name'.....) of the search part can be re-used with the UNIQUE named syntax: $+{Name}
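For comparison, the same idea can be sketched in Python, applied to the example from the original question. This is only an analogy, under the assumption that a different engine is acceptable for illustration: Python's re module writes the named group as (?P<name1>...) and refers to it in the replacement as \g<name1>, where the Boost engine of N++ uses $+{name1}.

```python
import re

text = "this is a test"

# Search part: the two groups from the question, in Python spelling.
pattern = re.compile(r"(this is)(?P<name1> a )")

# Replacement part: \g<name1> plays the role of Boost's $+{name1}.
# The brackets only make the re-used group visible in the output.
result = pattern.sub(r"[\g<name1>]", text)
print(result)  # prints "[ a ]test"
```

The whole match "this is a " is replaced by the content of the named group alone, wrapped in brackets.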
It’s important to fully understand the fundamental difference between a back-reference and a subroutine call to a group, named or not :
- A back-reference to a group represents the present match of this group
- A subroutine call to a group represents the regex pattern of this group
For instance, the 15 regexes below:
- (?-i)(?<Test>\d+)ABC\g{Test}
- (?-i)(?<Test>\d+)ABC\g<Test>
- (?-i)(?<Test>\d+)ABC\g'Test'
- (?-i)(?<Test>\d+)ABC\k{Test}
- (?-i)(?<Test>\d+)ABC\k<Test>
- (?-i)(?<Test>\d+)ABC\k'Test'
- (?-i)(?'Test'\d+)ABC\g{Test}
- (?-i)(?'Test'\d+)ABC\g<Test>
- (?-i)(?'Test'\d+)ABC\g'Test'
- (?-i)(?'Test'\d+)ABC\k{Test}
- (?-i)(?'Test'\d+)ABC\k<Test>
- (?-i)(?'Test'\d+)ABC\k'Test'
- (?-i)(?<Test>\d+)ABC\1
- (?-i)(?'Test'\d+)ABC\1
- (?-i)(\d+)ABC\1
would match the first four lines of the above example (in other words, the numbers surrounding the string ABC have to be identical. Indeed, a back-reference is simply a reference to the present number preceding the string ABC)
Whereas the 7 regexes below:
- (?-i)(?<Test>\d+)ABC(?&Test)
- (?-i)(?'Test'\d+)ABC(?&Test)
- (?-i)(?<Test>\d+)ABC(?P>Test)
- (?-i)(?'Test'\d+)ABC(?P>Test)
- (?<Test>\d+)ABC(?1)
- (?'Test'\d+)ABC(?1)
- (\d+)ABC(?1)
would match the 9 lines of the above example (in other words, the numbers surrounding the string ABC may be different. Indeed, the subroutine calls (?&Test) and (?P>Test) are strictly equivalent to \d+, the pattern of the group Test!)
Best Regards,
guy038
P.S. :
When a subroutine call is located inside the parentheses of the group to which it refers, it operates as a recursive pattern. But this is another story … :-)
-
-
@gerdb42 said:
http://www.boost.org/doc/libs/1_55_0/libs/regex/doc/html/boost_regex/format/boost_format_syntax.html
Hello
How can I change many lines at once, BUT in many files?
For example, 10000 files in a folder.
For example, I search for 3 lines and need to put 10 different lines in their place.
In the past this was possible with the program Homesite (including regex, wildcards and so on), but it was discontinued and it corrupts UTF-8 documents.
Can I find MANY lines with a REGEX like the ones above?
The bigger problem seems to be inserting many lines, because the “Search / Replaces in files”-function allows only 1 line.
Please help
Mayer -
“Search / Replaces in files”-function allows only 1 line.
Who said so? If you know where line breaks will occur, try the \R pattern as a placeholder. Or check the option “. finds \r and \n”.
In the replacement, insert \r\n at places where you want line breaks.
When using “. finds \r and \n”, pay special attention to greedy/non-greedy repeats.
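If a script is an option, the same multi-line replacement over a whole folder can be sketched in Python. This is a rough illustration under several assumptions, not a description of how N++ works: the function name replace_in_files, the *.txt filter and the sample content are all hypothetical, and since Python's re module has no \R, line breaks are written as \r?\n in the search pattern.

```python
import re
import tempfile
from pathlib import Path


def replace_in_files(folder, pattern, replacement, glob="*.txt"):
    """Apply one regex replacement to every matching file; return changed paths."""
    regex = re.compile(pattern)
    changed = []
    for path in sorted(Path(folder).glob(glob)):
        # newline="" keeps the \r\n line endings untouched when reading
        with open(path, encoding="utf-8", newline="") as f:
            old = f.read()
        new = regex.sub(replacement, old)
        if new != old:
            # newline="" again, so the text is written back byte for byte
            with open(path, "w", encoding="utf-8", newline="") as f:
                f.write(new)
            changed.append(path)
    return changed


# Demo on a throw-away folder: three lines are replaced by four lines.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "a.txt"
    sample.write_bytes(b"head\r\none\r\ntwo\r\nthree\r\ntail\r\n")
    changed = replace_in_files(
        tmp,
        r"one\r?\ntwo\r?\nthree",          # search: three consecutive lines
        "alpha\r\nbeta\r\ngamma\r\ndelta",  # replacement: four lines
    )
    demo_result = sample.read_bytes()

print(demo_result)
```

Files are read and written as UTF-8 explicitly, and only files whose content really changed are rewritten. Trying such a script on a copy of the folder first is a sensible precaution.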