what are list Regex available "named capture group" for find and replace in current version
-
-
Hello, @jergen-ross-estaco, @peterjones and All,
First, I advise you to read these two posts:
https://community.notepad-plus-plus.org/post/15930
https://community.notepad-plus-plus.org/post/52715
As you can see, in the replacement, the following regexes:
- \g{size} \g{size_type} | \g{file}\g{type}\r\n\t\g{path}\r\n
- \k{size} \k{size_type} | \k{file}\k{type}\r\n\t\k{path}\r\n
- \g<size> \g<size_type> | \g<file>\g<type>\r\n\t\g<path>\r\n
- \k<size> \k<size_type> | \k<file>\k<type>\r\n\t\k<path>\r\n
are totally invalid!
The only valid Boost syntax for named capturing groups, in the replacement regex, is:
$+{size} $+{size_type} | $+{file}$+{type}\r\n\t$+{path}\r\n
Here, below, is your search regex, slightly modified and written in the free-spacing mode (?x), so that it can easily be modified if necessary.
In some parts, I used non-capturing groups, so that you may use numbered groups instead of the named groups in the replacement regex.
SEARCH
(?x-s)                  # FREE-SPACING mode + DOT matches STANDARD chars ONLY, not EOL
^ [\t\x20]*             # OPTIONAL range of TAB/SPACE chars
(?<size>                # BEGINNING of the first NAMED group "size" [ or (?'size' ]
  (?:                   # NON-CAPTURING group
    [+-]?               # OPTIONAL sign + or -
    (?:                 # NON-CAPTURING group
      [0-9]{1,3}        # From 1 to 3 DIGITS
      (?:               # NON-CAPTURING group
        ,[0-9]{3}       # COMMA followed with THREE digits
      )*                # END, REPEATED from 0 to MORE
      |                 # OR
      \d+               # ONE or MORE digits
    )                   # END
    (?:                 # NON-CAPTURING group
      \. [0-9]+         # a DOT followed with some DIGITS
    )?                  # END of the OPTIONAL part
  )                     # END
  |                     # OR
  \d* \. \d+            # OPTIONAL INTEGER part followed with a DOT and some DIGITS
)                       # END of the first NAMED group
[\t\x20]*               # OPTIONAL range of TAB/SPACE chars
(?<size_type>           # BEGINNING of the second NAMED group "size_type" [ or (?'size_type' ]
  (?i) gb|mb|m|g        # String 'gb' or 'mb' or 'm' or 'g', INSENSITIVE
)                       # END of the second NAMED group
[\t\x20]*               # OPTIONAL TAB or SPACE characters
(?<path>                # BEGINNING of the third NAMED group "path" [ or (?'path' ]
  (?i:C|D): .+ \\       # EXACT string 'C' or 'D' followed by the GREATEST range of STANDARD chars till the LAST ANTI-SLASH of CURRENT line
)                       # END of the third NAMED group
(?<file>                # BEGINNING of the fourth NAMED group "file" [ or (?'file' ]
  .+                    # The GREATEST range of STANDARD chars till...
)                       # END of the fourth NAMED group
(?<type>                # BEGINNING of the fifth NAMED group "type" [ or (?'type' ]
  \. .+                 # The LAST '.' char of CURRENT line, followed with the GREATEST range of STANDARD chars, till the END of LINE
)                       # END of the fifth NAMED group
REPLACE
$+{size} $+{size_type} | $+{file}$+{type}\r\n\t$+{path}\r\n
or
REPLACE
\1 \2 | \4\5\r\n\t\3\r\n
Note that, without the free-spacing mode (?x), the search regex becomes:
(?-s)^[\t\x20]*(?<size>(?:[+-]?(?:[0-9]{1,3}(?:,[0-9]{3})*|\d+)(?:\.[0-9]+)?)|\d*\.\d+)[\t\x20]*(?<size_type>(?i)gb|mb|m|g)[\t\x20]*(?<path>(?i:C|D):.+\\)(?<file>.+)(?<type>\..+)
So, from this INPUT text :
5.7 GB D:\Movies by Jen\Memory.2022.1080p.WEBRip.DD5.1.x264-NOGRP\Memory.2022.1080p.WEBRip.DD5.1.x264-NOGRP.mkv
whichever of the search and replacement regexes above you use, you should always get this OUTPUT text :
5.7 GB | Memory.2022.1080p.WEBRip.DD5.1.x264-NOGRP.mkv
	D:\Movies by Jen\Memory.2022.1080p.WEBRip.DD5.1.x264-NOGRP\
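As a rough cross-check outside Notepad++, the single-line search regex and the named-group replacement can be approximated with Python's re module. This is only a sketch under assumptions: Python is not the Boost engine, so the named groups are spelled (?P<name>...), the leading (?-s) is dropped, and the mid-pattern (?i) becomes a scoped (?i:...). The name reformat_line is hypothetical and the replacement is built in a function rather than with $+{name}.

```python
import re

# Python translation of the single-line search regex (an approximation, not Boost):
#   (?<name>...)   ->  (?P<name>...)
#   (?-s)          ->  dropped: "." does not match line breaks by default in Python
#   (?i)gb|mb|m|g  ->  (?i:gb|mb|m|g): recent Python versions reject a global
#                      flag in the middle of a pattern
SEARCH = re.compile(
    r"^[\t\x20]*"
    r"(?P<size>(?:[+-]?(?:[0-9]{1,3}(?:,[0-9]{3})*|\d+)(?:\.[0-9]+)?)|\d*\.\d+)"
    r"[\t\x20]*(?P<size_type>(?i:gb|mb|m|g))"
    r"[\t\x20]*(?P<path>(?i:C|D):.+\\)"
    r"(?P<file>.+)(?P<type>\..+)",
    re.MULTILINE,
)


def reformat_line(text: str) -> str:
    # Mimics the replacement "size size_type | file+type", then a line break,
    # a TAB, the path and a final line break.
    return SEARCH.sub(
        lambda m: f"{m['size']} {m['size_type']} | {m['file']}{m['type']}\r\n\t{m['path']}\r\n",
        text,
    )


line = r"5.7 GB D:\Movies by Jen\Memory.2022.1080p.WEBRip.DD5.1.x264-NOGRP\Memory.2022.1080p.WEBRip.DD5.1.x264-NOGRP.mkv"
print(reformat_line(line))
```

Because every [\t\x20]* is optional, the same pattern should also accept the line when the blanks around the unit are missing.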
If you decide to use the regex with the free-spacing mode, simply select all the text between (?x-s) and # END of the fifth NAMED group ( 2,045 bytes ). Note that the maximum size of the search field is limited to 2,046 bytes !
In summary, the 6 syntaxes \g{name}, \g<name>, \g'name', \k{name}, \k<name> and \k'name' are not deprecated, but they can only be used in the search part ( not in the replacement ) !
Best Regards
guy038
-
-
@guy038 said in what are list Regex available "named capture group" for find and replace in current version:
As you can see, in replacement, the following regexes … are totally invalid
I should have remembered that. We even added a very clear note in the replacement section of the user manual:
Please note: the \g… and \k… backreference syntaxes only work in the search expression, and are not designed or intended to work in the substitution/replacement expression.
There is a similar note in the searching section’s “numbered backreference”:
Numbered Backreference: These syntaxes match the ℕth capture group earlier in the same expression. (Backreferences are used to refer to the capture group contents only in the search/match expression; see the Substitution Escape Sequences for how to refer to capture groups in substitutions/replacements.)
… but apparently not in the “named backreference” section… I guess it needs to be spelled out there, too, even though it’s just a few lines down in the docs.
-
@guy038 and @PeterJones.
First, I advise you to read these two posts:
I had already read them, since I searched before making this post.
I didn’t see anything about non-capturing groups there; did I miss it?
Also, I didn’t know about \x20 instead of \s. Is \x20 (the space character) necessary? Why not \s?
You wrote the “x” in (?x-s):
(?x-s) # FREE-SPACING mode + DOT matches STANDARD chars ONLY, not EOL
but you didn’t write the “x” in (?-s):
(?-s)^[\t\x20]*(?<size>(?:[+-]?(?:[0-9]{1,3}(?:,[0-9]{3})*|\d+)(?:\.[0-9]+)?)|\d*\.\d+)[\t\x20]*(?<size_type>(?i)gb|mb|m|g)[\t\x20]*(?<path>(?i:C|D):.+\\)(?<file>.+)(?<type>\..+)
I like that the named group starts with a group around the subexpression.
For example, my previous regex looked like:
(?<name>subexp)
and your regex looks like:
(?<name>(subexp))
By the way, I can’t figure out why the search fails to match some of the lines, such as:
4.0 GB C:\pagefile.sys
1.5 GB C:\hiberfil.sys
It also skips lines where the file has no extension, so the file type can’t match, like:
100.0 MB C:\Users\Username\AppData\Local\Google\Chrome Beta\User Data\Default\Cache\Cache_Data\data_3
Can you help me?
If you can’t, please disregard this; that’s fine.
If you decide to use the regex with the free-spacing mode, simply select all the text between (?x-s) and # END of the fifth NAMED group ( 2,045 bytes ). Note that the maximum of bytes, for the search field, is limited to 2,046 bytes !
How do I check the number of bytes in the search field?
Anyway, thank you for your help and answers.
-
Hi, @jergen-ross-estaco,
Before answering your questions, in a next post, could you tell me which of the four cases, below, will never happen ?
A 5.7 GB D:\Movies by Jen\Memory.2022.1080p.WEBRip.DD5.1.x264-NOGRP\Memory.2022.1080p.WEBRip.DD5.1.x264-NOGRP.mkv
B 5.7GB D:\Movies by Jen\Memory.2022.1080p.WEBRip.DD5.1.x264-NOGRP\Memory.2022.1080p.WEBRip.DD5.1.x264-NOGRP.mkv
C 5.7 GBD:\Movies by Jen\Memory.2022.1080p.WEBRip.DD5.1.x264-NOGRP\Memory.2022.1080p.WEBRip.DD5.1.x264-NOGRP.mkv
D 5.7GBD:\Movies by Jen\Memory.2022.1080p.WEBRip.DD5.1.x264-NOGRP\Memory.2022.1080p.WEBRip.DD5.1.x264-NOGRP.mkv
I also revisited my post and changed the part of the regex regarding the size, as below :
(?<size>[+-]?(?:(?:[0-9]{1,3}(?:,[0-9]{3})*|[0-9]+)(?:\.[0-9]+)?|\.[0-9]+))
For instance, this regex matches all the following cases, from A to Z and @
A  1                E  +1                I  -1
B  12345            F  +12345            J  -12345
C  1234.0123        G  +1234.0123        K  -1234.0123
D  123456.789       H  +123456.789       L  -123456.789
M  1,234            P  +1,234            S  -1,234
N  12,345.01        Q  +12,345.01        T  -12,345.01
O  1,234,567.0123   R  +1,234,567.0123   U  -1,234,567.0123
V  .0               X  +.0               Z  -.0
W  .01234           Y  +.01234           @  -.01234
In the same way, could you tell me which cases are sure to not happen ?
I’ll provide you a correct and final regex solution, very soon !
See you later,
Best Regards,
guy038
-
@guy038, @PeterJones and all.
I will try to explain what I mean. As you know, I’m deaf, so I learn slowly and make mistakes in grammar and sentences, but I hope you understand me.
Before answering your questions, in a next post, could you tell me which of the four cases, below, will never happen ?
I tested the regex on the four cases and they all work fine, because ‘[\t\x20]’ is optional: if there is no space, the match does not fail and simply goes on to the next step. It looks similar to ‘\s’, though.
I also revisited my post and changed the part of the regex regarding the size, as below :
(?<size>[+-]?(?:(?:[0-9]{1,3}(?:,[0-9]{3})*|[0-9]+)(?:\.[0-9]+)?|\.[0-9]+))
For instance, this regex matches all the following cases, from A to Z and @
…
In the same way, could you tell me which cases are sure to not happen ?
Not all of them are correct. ‘A, E, I, M-Z and @’ are correct, but ‘B-D, F-H and J-L’ are each matched as two pieces: “123” and then the remaining digits. If you find and replace with “foo”, the result is “foofoo”; for example, “1234.0123” is matched as “123” and “4.0123”, which gives “foo” “foo”.
Input and output:
A  1                E  +1                I  -1
B  12345            F  +12345            J  -12345
C  1234.0123        G  +1234.0123        K  -1234.0123
D  123456.789       H  +123456.789       L  -123456.789
M  1,234            P  +1,234            S  -1,234
N  12,345.01        Q  +12,345.01        T  -12,345.01
O  1,234,567.0123   R  +1,234,567.0123   U  -1,234,567.0123
V  .0               X  +.0               Z  -.0
W  .01234           Y  +.01234           @  -.01234

A  foo      E  foo      I  foo
B  foofoo   F  foofoo   J  foofoo
C  foofoo   G  foofoo   K  foofoo
D  foofoo   H  foofoo   L  foofoo
M  foo      P  foo      S  foo
N  foo      Q  foo      T  foo
O  foo      R  foo      U  foo
V  foo      X  foo      Z  foo
W  foo      Y  foo      @  foo
I fixed it by replacing ‘*’ with ‘+’ at the end of the comma-and-three-digits group, ‘(?:,[0-9]{3})+’, so that “1234.0123” is matched as a whole:
(?<size>[+-]?(?:(?:[0-9]{1,3}(?:,[0-9]{3})+|[0-9]+)(?:\.[0-9]+)?|\.[0-9]+))
‘*’ means zero or more, so the group succeeds even when there is no comma at all.
‘+’ means one or more, so at least one comma group is required; if there is no comma, the regex goes on to the alternative ‘|\d+’, which matches any number (with optional decimals) that has no comma separators.
Input and output: everything is matched without a doubled “foo”, and replacing each number with “foo” is now correct.
A  1                E  +1                I  -1
B  12345            F  +12345            J  -12345
C  1234.0123        G  +1234.0123        K  -1234.0123
D  123456.789       H  +123456.789       L  -123456.789
M  1,234            P  +1,234            S  -1,234
N  12,345.01        Q  +12,345.01        T  -12,345.01
O  1,234,567.0123   R  +1,234,567.0123   U  -1,234,567.0123
V  .0               X  +.0               Z  -.0
W  .01234           Y  +.01234           @  -.01234

A  foo   E  foo   I  foo
B  foo   F  foo   J  foo
C  foo   G  foo   K  foo
D  foo   H  foo   L  foo
M  foo   P  foo   S  foo
N  foo   Q  foo   T  foo
O  foo   R  foo   U  foo
V  foo   X  foo   Z  foo
W  foo   Y  foo   @  foo
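The difference between the ‘*’ and ‘+’ versions of the size group can be reproduced with a small Python sketch. It assumes Python's re module instead of Boost, so the named group is spelled (?P<size>...); the constant names SIZE_STAR and SIZE_PLUS are only illustrative.

```python
import re

# Size sub-pattern with "*" (comma groups optional) and with "+" (at least one
# comma group required). (?P<size>...) is Python's spelling of (?<size>...).
SIZE_STAR = re.compile(
    r"(?P<size>[+-]?(?:(?:[0-9]{1,3}(?:,[0-9]{3})*|[0-9]+)(?:\.[0-9]+)?|\.[0-9]+))"
)
SIZE_PLUS = re.compile(
    r"(?P<size>[+-]?(?:(?:[0-9]{1,3}(?:,[0-9]{3})+|[0-9]+)(?:\.[0-9]+)?|\.[0-9]+))"
)

for sample in ["12345", "1234.0123", "1,234,567.0123", "-.01234"]:
    star = SIZE_STAR.sub("foo", sample)  # "*" lets [0-9]{1,3} stop after 3 digits
    plus = SIZE_PLUS.sub("foo", sample)  # "+" forces the plain [0-9]+ alternative
    print(f"{sample:>16}   with *: {star:<8} with +: {plus}")
```

With ‘*’, the first alternative already succeeds after three digits, so the rest of a long number becomes a second match; with ‘+’, numbers without commas fall through to the plain digit alternative and are matched in one piece.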
By the way, I put that changed size part into my regex.
I realized I had mentioned this:
By the way, I can’t figure out why the search fails to match some of the lines, such as:
4.0 GB C:\pagefile.sys
1.5 GB C:\hiberfil.sys
I mean that there is no folder, only the drive letter, so I needed to figure out how to match ‘C:\’ with ‘(?<path>)’.
I have now solved this myself by removing the ‘:’ before the dot character in ‘(?i:C|D):.+\\)’. Because ‘+’ means one or more, ‘.+’ must match at least one character. With the ‘:’ in place, ‘.+’ consumes the ‘\’ of ‘C:\’, and then ‘\\’ searches for another ‘\’ after that position, finds none, and the match fails. With the ‘:’ removed, ‘.+’ matches the ‘:’ instead, and ‘\\’ then finds the ‘\’ of ‘C:\’. It now works with zero or more folder levels (c:\ or c:\folder1\subfolder\subfolder2\...).
Regex:
(?<path>(?i:C|D).+\\)
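The effect described above can be checked with a short Python sketch. It assumes Python's re module rather than Boost, tests only the path part, and also includes a third, illustrative variant that keeps the ‘:’ and uses ‘.*’ instead of ‘.+’. The names path_of, PATH_ORIGINAL, PATH_NO_COLON and PATH_STAR are hypothetical.

```python
import re

ROOT_FILE = r"C:\pagefile.sys"          # file directly under the root
DEEP_FILE = r"C:\Games\MBAACC\0002.p"   # file inside folders

PATH_ORIGINAL = re.compile(r"(?i:C|D):.+\\")  # ".+" needs a character before the last "\"
PATH_NO_COLON = re.compile(r"(?i:C|D).+\\")   # ".+" can consume the ":" instead
PATH_STAR = re.compile(r"(?i:C|D):.*\\")      # keeps ":" and allows zero characters


def path_of(pattern, line):
    """Return the matched path part, or None when the pattern does not match."""
    m = pattern.match(line)
    return m.group(0) if m else None


for name, pattern in [("original", PATH_ORIGINAL),
                      ("no colon", PATH_NO_COLON),
                      ("star", PATH_STAR)]:
    print(name, path_of(pattern, ROOT_FILE), path_of(pattern, DEEP_FILE))
```

Only the original pattern should fail on the root-level file; all three should agree on the file that sits inside folders.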
And again, I had mentioned this:
It also skips lines where the file has no extension, so the file type can’t match, like:
100.0 MB C:\Users\Username\AppData\Local\Google\Chrome Beta\User Data\Default\Cache\Cache_Data\data_3
I added a lookahead and an alternative, ‘(?<file>.+(?=\.)|.+)(?<type>(?:\..+)?$)’, which matches the file name whether or not it has an extension.
(?<file>...): ‘.+(?=\.)’ is one or more of any character, followed by a positive lookahead that matches a ‘.’ (dot) character. It matches the file name up to the dot that starts the file extension; if there is no dot, the regex goes on to the alternative ‘|.+’, which matches a file name without an extension.
(?<type>...): ‘(?:\..+)?$’ is an optional dot character followed by one or more of any character, i.e. an optional file extension.
Finally, I made my regex :
(?-xs)^[\t\x20]*(?<size>[+-]?(?:(?:\d{1,3}(?:,\d{3})+|\d+)(?:\.\d+)?|\.\d+))[\t\x20]*(?<size_type>(?i)gb|mb|m|g)[\t\x20]*(?<path>(?i:C|D).+\\)(?<file>.+(?=\.)|.+)(?<type>(?:\..+)?$)
or
(?-xs)^[\t\x20]*(?<size>[+-]?(?:(?:[0-9]{1,3}(?:,[0-9]{3})+|[0-9]+)(?:\.[0-9]+)?|\.[0-9]+))[\t\x20]*(?<size_type>(?i)gb|mb|m|g)[\t\x20]*(?<path>(?i:C|D).+\\)(?<file>.+(?=\.)|.+)(?<type>(?:\..+)?$)
It found and matched all the lines (100/100) in my input text.
4.0 GB C:\pagefile.sys
1.8 GB C:\Users\Jhecrose\Downloads\uTorrent\Paws.of.Fury.The.Legend.of.Hank.2022.1080p.WEBRip.x264-RARBG\Paws.of.Fury.The.Legend.of.Hank.2022.1080p.WEBRip.x264-RARBG.mp4
1.5 GB C:\hiberfil.sys
900.1 MB C:\Games\MBAACC\0002.p
851.8 MB C:\$MFT
569.6 MB C:\Program Files (x86)\Steam\steamapps\common\Left 4 Dead 2\left4dead2\addons\workshop\1504837401.vpk
After replacing, the resulting output is:
4.0 GB | pagefile.sys
	C:\
1.8 GB | Paws.of.Fury.The.Legend.of.Hank.2022.1080p.WEBRip.x264-RARBG.mp4
	C:\Users\Username\Downloads\uTorrent\Paws.of.Fury.The.Legend.of.Hank.2022.1080p.WEBRip.x264-RARBG\
1.5 GB | hiberfil.sys
	C:\
900.1 MB | 0002.p
	C:\Games\MBAACC\
851.8 MB | $MFT
	C:\
569.6 MB | 1504837401.vpk
	C:\Program Files (x86)\Steam\steamapps\common\Left 4 Dead 2\left4dead2\addons\workshop\
I’m sorry if this is confusing to read.
Thank you so much! This case is closed now, but I’m still waiting for an answer to my previous question.
-
Hello, @jergen-ross-estaco, @peterjones and All,
I ended up with this search regex :
SEARCH
(?-s)^[\t\x20]*(?<size>[+-]?(?:(?:[0-9]{1,3}(?:,[0-9]{3})+|[0-9]+)(?:\.[0-9]+)?|\.[0-9]+))[\t\x20]*(?<size_type>(?i)gb|mb|m|g)[\t\x20]*(?<path>(?i:C|D):.*\\)(?<file>.+)(?<type>\..+)?
This regex is slightly shorter than yours :
- I removed the useless x modifier at the beginning of the regex, as the regex is written on a single line !
- I kept your regexes regarding the <size> and <size_type> named groups
- I modified the <path>, <file> and <type> named groups as below :
  - I used .* instead of .+ after the part (?i:C|D): , which is needed when files are located right under the root
  - I added a ? right after the named group (?<type>\..+) , for the case of files without extension
Like yours, my regex version matches all the cases of your INPUT file :
4.0 GB C:\pagefile.sys
1.8 GB C:\Users\Jhecrose\Downloads\uTorrent\Paws.of.Fury.The.Legend.of.Hank.2022.1080p.WEBRip.x264-RARBG\Paws.of.Fury.The.Legend.of.Hank.2022.1080p.WEBRip.x264-RARBG.mp4
1.5 GB C:\hiberfil.sys
900.1 MB C:\Games\MBAACC\0002.p
851.8 MB C:\$MFT
569.6 MB C:\Program Files (x86)\Steam\steamapps\common\Left 4 Dead 2\left4dead2\addons\workshop\1504837401.vpk
Now, regarding your question :
How do I check the number of bytes in the search field?
It’s quite easy : select a range of characters and simply look right after the indication Sel : in the status bar, at the bottom of the Notepad++ window !
Best Regards
guy038
-
-
Hi @guy038,
(?<type>\..+)?
I tried the same thing, but I ran into a problem: (?<type>) matches nothing, and both the file name and the extension end up in (?<file>). This is because the single .+ runs from the current position to the end of the line, so (?<type>), which then starts searching at the end of the line, finds nothing; since it is optional, the regex does not step back through the characters.
Try this test to see what happens:
Replace: File:\x20\4\r\ntype:\x20\5\r\n
Expected behavior:
Using my regex, Find:
(?-s)^[\t\x20]*(?<size>[+-]?(?:(?:\d{1,3}(?:,\d{3})+|\d+)(?:\.\d+)?|\.\d+))[\t\x20]*(?<size_type>(?i)gb|mb|m|g)[\t\x20]*(?<path>(?i:C|D).+\\)(?<file>.+(?=\.)|.+)(?<type>(?:\..+)?)$
File: The.Sea.Beast.2022.1080p.WEBRip.x264-RARBG
type: .mp4
File: The.Unbearable.Weight.of.Massive.Talent.2022.1080p.WEBRip.x264-RARBG
type: .mp4
File: Paws.of.Fury.The.Legend.of.Hank.2022.1080p.WEBRip.x264-RARBG
type: .mp4
File: hiberfil
type: .sys
Actual behavior:
Using your regex:
File: The.Sea.Beast.2022.1080p.WEBRip.x264-RARBG.mp4
type:
File: The.Unbearable.Weight.of.Massive.Talent.2022.1080p.WEBRip.x264-RARBG.mp4
type:
File: Paws.of.Fury.The.Legend.of.Hank.2022.1080p.WEBRip.x264-RARBG.mp4
type:
File: hiberfil.sys
type:
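The expected and actual behavior above can be reproduced with a small Python sketch that looks only at the file name part. It assumes Python's re module instead of Boost, so the groups are spelled (?P<name>...); the names split_name, TAIL_OPTIONAL and TAIL_LOOKAHEAD are hypothetical.

```python
import re

# Greedy file group followed by an OPTIONAL extension group: ".+" swallows
# the whole name and the optional group is never forced to take part.
TAIL_OPTIONAL = re.compile(r"(?P<file>.+)(?P<type>\..+)?")

# Lookahead plus alternative, anchored at the end of the line: the file group
# must stop right before a dot whenever one exists.
TAIL_LOOKAHEAD = re.compile(r"(?P<file>.+(?=\.)|.+)(?P<type>(?:\..+)?)$")


def split_name(pattern, name):
    m = pattern.match(name)
    # A group that did not take part in the match is None; show it as "".
    return (m["file"], m["type"] or "")


for name in ["hiberfil.sys", "The.Sea.Beast.2022.1080p.WEBRip.x264-RARBG.mp4", "data_3"]:
    print(split_name(TAIL_OPTIONAL, name), split_name(TAIL_LOOKAHEAD, name))
```

The optional-group version leaves the extension inside the file group, while the lookahead version splits at the last dot and still accepts names without any dot.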
Have you tried a regex debugger? It is a better way to find out where a regex goes wrong. By the way, I’m using two regex editors:
- regex101.com, for quick highlighting of matches and groups; its debugger, which shows the steps one by one, is very useful
- Notepad++, to test the output before use, since it depends on its Boost engine.
It’s quite easy : Select a range of characters and, simply, look right after the indication Sel :, in the status bar, at bottom of the Notepad++ window !
Uh, I just looked: Sel : shows only a number?! Is that bytes? I didn’t see “bytes”.
-
Hi, @jergen-ross-estaco, @peterjones and All,
You are perfectly right about my regex : it did not respect the named groups :-((
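I finally succeeded in building a correct search regex ( 2 characters longer than yours, if I substitute the \d with the [0-9] syntax ! )
It uses a particular feature that is not very well known : the Branch Reset mechanism, with the (?|pattern_1|pattern_2|....|pattern_N) syntax
Refer to this link for further explanations :
https://www.boost.org/doc/libs/1_78_0/libs/regex/doc/html/boost_regex/syntax/perl_syntax.html#boost_regex.syntax.perl_syntax.branch_reset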
So, given this INPUT text :
4.0 GB C:\pagefile.sys
1.8 GB C:\Users\Jhecrose\Downloads\uTorrent\Paws.of.Fury.The.Legend.of.Hank.2022.1080p.WEBRip.x264-RARBG\Paws.of.Fury.The.Legend.of.Hank.2022.1080p.WEBRip.x264-RARBG.mp4
1.5 GB C:\hiberfil.sys
900.1 MB C:\Games\MBAACC\0002.p
851.8 MB C:\$MFT
569.6 MB C:\Program Files (x86)\Steam\steamapps\common\Left 4 Dead 2\left4dead2\addons\workshop\1504837401.vpk
1.8 MB C:\pqr.tuv.xyz.123
1.8 MB C:\xyz.123
1.8 MB C:\xyz
1.8 MB C:\abc\def\ghi\pqr.tuv.xyz.123
1.8 MB C:\abc\def\ghi\xyz.123
1.8 MB C:\abc\def\ghi\xyz
The following regex S/R :
SEARCH
(?-s)^[\t\x20]*(?<size>[+-]?(?:(?:[0-9]{1,3}(?:,[0-9]{3})+|[0-9]+)(?:\.[0-9]+)?|\.[0-9]+))[\t\x20]*(?<size_type>(?i)gb|mb|m|g)[\t\x20]*(?<path>(?i:C|D):.*\\)(?|(?<file>.+)(?<type>\..+)|([^.\r\n]+))
REPLACE
Path : \3\r\nFile : \4\r\nExtension : \5\r\n
will give this OUTPUT text :
Path : C:\
File : pagefile
Extension : .sys
Path : C:\Users\Jhecrose\Downloads\uTorrent\Paws.of.Fury.The.Legend.of.Hank.2022.1080p.WEBRip.x264-RARBG\
File : Paws.of.Fury.The.Legend.of.Hank.2022.1080p.WEBRip.x264-RARBG
Extension : .mp4
Path : C:\
File : hiberfil
Extension : .sys
Path : C:\Games\MBAACC\
File : 0002
Extension : .p
Path : C:\
File : $MFT
Extension :
Path : C:\Program Files (x86)\Steam\steamapps\common\Left 4 Dead 2\left4dead2\addons\workshop\
File : 1504837401
Extension : .vpk
Path : C:\
File : pqr.tuv.xyz
Extension : .123
Path : C:\
File : xyz
Extension : .123
Path : C:\
File : xyz
Extension :
Path : C:\abc\def\ghi\
File : pqr.tuv.xyz
Extension : .123
Path : C:\abc\def\ghi\
File : xyz
Extension : .123
Path : C:\abc\def\ghi\
File : xyz
Extension :
If you prefer to use the named groups in replacement, the following regex S/R :
SEARCH
(?-s)^[\t\x20]*(?<size>[+-]?(?:(?:[0-9]{1,3}(?:,[0-9]{3})+|[0-9]+)(?:\.[0-9]+)?|\.[0-9]+))[\t\x20]*(?<size_type>(?i)gb|mb|m|g)[\t\x20]*(?<path>(?i:C|D):.*\\)(?|(?<file>.+)(?<type>\..+)|([^.\r\n]+))
REPLACE
Path : $+{path}\r\nFile : $+{file}\r\nExtension : $+{type}\r\n
will return the same OUTPUT text :
Path : C:\
File : pagefile
Extension : .sys
Path : C:\Users\Jhecrose\Downloads\uTorrent\Paws.of.Fury.The.Legend.of.Hank.2022.1080p.WEBRip.x264-RARBG\
File : Paws.of.Fury.The.Legend.of.Hank.2022.1080p.WEBRip.x264-RARBG
Extension : .mp4
Path : C:\
File : hiberfil
Extension : .sys
Path : C:\Games\MBAACC\
File : 0002
Extension : .p
Path : C:\
File : $MFT
Extension :
Path : C:\Program Files (x86)\Steam\steamapps\common\Left 4 Dead 2\left4dead2\addons\workshop\
File : 1504837401
Extension : .vpk
Path : C:\
File : pqr.tuv.xyz
Extension : .123
Path : C:\
File : xyz
Extension : .123
Path : C:\
File : xyz
Extension :
Path : C:\abc\def\ghi\
File : pqr.tuv.xyz
Extension : .123
Path : C:\abc\def\ghi\
File : xyz
Extension : .123
Path : C:\abc\def\ghi\
File : xyz
Extension :
Finally, in the status bar, the Sel : field gives you the number of characters ( not bytes ) of the current selection !
BR
guy038
P.S. :
So, in the part (?|(?<file>.+)(?<type>\..+)|([^.\r\n]+)) :
- (?<file>.+) represents the named group file ( group 4 )
- (?<type>\..+) represents the named group type ( group 5 )
- ([^.\r\n]+) represents the group 4, too, due to the branch reset syntax
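For readers who want to experiment outside Notepad++, the group numbering above can be imitated in Python, with one important caveat: Python's built-in re module has no branch reset (?|...) syntax. The sketch below is therefore an emulation, not the Boost feature itself: the second branch gets its own group, here given the hypothetical name file_noext, and the two "group 4" candidates are merged after matching. The function name split_path is also hypothetical.

```python
import re

# Emulation of (?|(?<file>.+)(?<type>\..+)|([^.\r\n]+)) without branch reset:
# the second branch uses a separate group that is merged into "file" afterwards.
PATH_FILE_TYPE = re.compile(
    r"(?P<path>(?i:C|D):.*\\)"
    r"(?:(?P<file>.+)(?P<type>\..+)|(?P<file_noext>[^.\r\n]+))"
)


def split_path(full_name):
    """Return (path, file, extension); extension is "" when there is none."""
    m = PATH_FILE_TYPE.match(full_name)
    if m is None:
        return None
    # Branch reset would put both alternatives in group 4; do that by hand.
    file_part = m["file"] if m["file"] is not None else m["file_noext"]
    return (m["path"], file_part, m["type"] or "")


for name in [r"C:\pagefile.sys", r"C:\$MFT", r"C:\abc\def\ghi\pqr.tuv.xyz.123"]:
    print(split_path(name))
```

With a real branch reset, both alternatives write into the same group number, so the replacement can always use \4 for the file name; the manual merge in split_path plays that role here.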
-
-
@guy038 said in what are list Regex available "named capture group" for find and replace in current version:
Refer to this link for further explanations :
https://www.boost.org/doc/libs/1_78_0/libs/regex/doc/html/boost_regex/syntax/perl_syntax.html#boost_regex.syntax.perl_syntax.branch_reset
Why not this LINK as well?: