@PeterJones thanks a lot for the nuances. Indeed, I first wondered how this differs from the group indexing, which starts at 1, and then how it differs from the quantifier ({n} where n is an integer >= 1, https://www.regular-expressions.info/refquick.html).
Thanks for mentioning the $0 group placeholder. I wondered about that too, and now I understand what it captures.
I understand the regex as this:
Find:
Put everything that precedes the occurrence of interest into a group. This is the 1st group, referenced by the placeholder $1 because group numbering starts at 1 (there is also a placeholder $0, which references the whole set/string instead of any subgroup of it).
Exclude the occurrence of interest from that group, but state it as the search delimiter for the regex just outside the group.
Replace with:
Capture the group with its placeholder (make a copy of it and store it: $1 = foo / ^((?:.?foo){0}.?) for the 1st occurrence (N+1) with index 0).
Use the 2nd/next occurrence as the external delimiter at which the regex search stops: ^((?:.?foo){0}.?)foo
Then append the new value (XOO) to the copied unchanged group.
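To check my understanding of the Find/Replace steps above, here is a minimal Python sketch. It assumes the wildcard parts of the pattern are the lazy form .*? and that everything sits on one line; replace_nth and the sample line are names I made up for the test, and Python's re module stands in for the Notepad++ engine, so details may differ.

```python
import re

def replace_nth(text, target, replacement, n):
    """Replace the occurrence of target whose zero-based index is n (the (n+1)-th one)."""
    escaped = re.escape(target)
    # Group 1 collects everything before the occurrence of interest:
    # n earlier occurrences of target, then as little extra text as possible.
    # The occurrence of interest itself sits just outside the group.
    pattern = r"^((?:.*?%s){%d}.*?)%s" % (escaped, n, escaped)
    # m.group(1) plays the role of the $1 placeholder in the replacement.
    return re.sub(pattern, lambda m: m.group(1) + replacement, text, count=1)

line = "foo bar foo baz foo"
print(replace_nth(line, "foo", "XOO", 0))  # XOO bar foo baz foo
print(replace_nth(line, "foo", "XOO", 1))  # foo bar XOO baz foo
print(replace_nth(line, "foo", "XOO", 2))  # foo bar foo baz XOO
```

If there are fewer than n+1 occurrences, the pattern simply does not match and the text comes back unchanged.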
I think I see what you mean: there must always be a 2nd/next occurrence for the regex to work, so the count can't start at zero? Meanwhile, in the background, the engine uses zero-based indexing for the 1st element of the series of occurrences.
0 is the 1st element in the index series, 1 is the 2nd, and so on.
For the group placeholders, on the other hand, 0 isn't an ordinal reference; it's an arbitrary reference to the whole set. The ordinal references start at 1 in this case.
I need to check the docs and practice more to get over the confusing parts!
The quantifier also starts at 1, though 0 is still valid but returns no value (or rather the whole set of positions, but with empty values)?
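A small Python sketch of the difference between placeholder 0 and placeholder 1, again assuming the lazy .*? form of the pattern and a sample string of my own. In Python the placeholders are spelled m.group(0) and m.group(1) rather than $0 and $1.

```python
import re

# N = 1 here, so the occurrence of interest is the 2nd "foo".
m = re.search(r"^((?:.*?foo){1}.*?)foo", "foo bar foo baz")

whole = m.group(0)   # the whole match, like $0
first = m.group(1)   # the 1st capturing group, like $1
count = m.re.groups  # number of capturing groups; (?:...) is not counted

print(repr(whole))  # 'foo bar foo'
print(repr(first))  # 'foo bar '
print(count)        # 1
```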
For example:
19 empty string matches:
[A-Z]{0}
goo A greAS gir PE
https://regex101.com/r/dYnJmE/1
/[A-Z]{0}/gm
Match a single character present in the list below [A-Z]
{0} matches the previous token exactly zero times (causes token to be ignored)
A-Z matches a single character in the range between A (index 65) and Z (index 90) (case sensitive)
Global pattern flags
g modifier: global. All matches (don't return after first match)
m modifier: multi line. Causes ^ and $ to match the begin/end of each line (not only begin/end of string)
0-0 empty string
1-1 empty string
2-2 empty string
3-3 empty string
4-4 empty string
5-5 empty string
6-6 empty string
7-7 empty string
8-8 empty string
9-9 empty string
10-10 empty string
11-11 empty string
12-12 empty string
13-13 empty string
14-14 empty string
15-15 empty string
16-16 empty string
17-17 empty string
18-18 empty string
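The same count can be reproduced outside regex101. This is only a sketch with Python's re module, which may not be the flavor used in the link above, but it gives the same 19 empty matches: one at each position from 0 to 18 of the 18-character subject.

```python
import re

subject = "goo A greAS gir PE"

# {0} means "zero repetitions", so the token is ignored and the regex
# matches the empty string at every position, including the very end.
spans = [m.span() for m in re.finditer(r"[A-Z]{0}", subject)]

print(len(subject))         # 18
print(len(spans))           # 19
print(spans[0], spans[-1])  # (0, 0) (18, 18)
```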
No match/invalid:
[A-Z]{}
goo A greAS gir PE
https://regex101.com/r/CtqQ0D/1
/[A-Z]{}/gm
Match a single character present in the list below [A-Z]
A-Z matches a single character in the range between A (index 65) and Z (index 90) (case sensitive)
{} matches the characters {} literally (case sensitive)
Your regular expression does not match the subject string.
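So the empty braces are not a quantifier at all, just two literal characters. A quick sketch with Python's re module, which as far as I can tell treats {} the same way here (other flavors might reject it); the second test string is my own.

```python
import re

pattern = re.compile(r"[A-Z]{}")

# No uppercase letter in the subject is followed by a literal "{}".
no_match = pattern.search("goo A greAS gir PE")

# With a literal "{}" right after an uppercase letter, the regex matches.
literal = pattern.search("set X{} here")

print(no_match)         # None
print(literal.group())  # X{}
```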
5 matches:
[A-Z]{1}
goo A greAS gir PE
https://regex101.com/r/MImsNL/1
/[A-Z]{1}/gm
Match a single character present in the list below [A-Z]
{1} matches the previous token exactly one time (meaningless quantifier)
A-Z matches a single character in the range between A (index 65) and Z (index 90) (case sensitive)
4-5 A
9-10 A
10-11 S
16-17 P
17-18 E
2 matches:
[A-Z]{2}
goo A greAS gir PE
https://regex101.com/r/p1WOWQ/1
/[A-Z]{2}/gm
Match a single character present in the list below [A-Z]
{2} matches the previous token exactly 2 times
A-Z matches a single character in the range between A (index 65) and Z (index 90) (case sensitive)
9-11 AS
16-18 PE
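The {1} and {2} results can be cross-checked the same way. A minimal sketch with Python's re module (find_spans is just a helper name I chose); the start-end offsets line up with the regex101 output above.

```python
import re

subject = "goo A greAS gir PE"

def find_spans(pattern):
    """Return (start, end, text) for every match of pattern in subject."""
    return [(m.start(), m.end(), m.group()) for m in re.finditer(pattern, subject)]

one = find_spans(r"[A-Z]{1}")
two = find_spans(r"[A-Z]{2}")

print(one)  # [(4, 5, 'A'), (9, 10, 'A'), (10, 11, 'S'), (16, 17, 'P'), (17, 18, 'E')]
print(two)  # [(9, 11, 'AS'), (16, 18, 'PE')]
```

So the number inside the braces is a repetition count for the previous token, not an index into the matches.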