Replace 2nd occurrence in string per line, then nth occurrence Npp v8.8.1

Fra

I’m trying to simplify my regex for this task:

I have this input text:

foo bar bash 1 foo dash mesh 3 foo poly 3 for foo
foo tar hash 1 foo gash gesh 3 foo toly 3 vor foo
foo sar wash 1 foo rash nesh 3 foo koly 3 sor foo

The regex replaces the 2nd occurrence of this string ‘foo’:

foo

with this new string:

XOO

Currently I’m using this regex, which works:
Find what (with Regular Expression radio button on):

^(.*?foo.*?)foo

Replace with:

$1XOO

Producing this output:
https://regex101.com/r/UmF5wv/1

foo bar bash 1 XOO dash mesh 3 foo poly 3 for foo
foo tar hash 1 XOO gash gesh 3 foo toly 3 vor foo
foo sar wash 1 XOO rash nesh 3 foo koly 3 sor foo
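A minimal sketch for checking this replacement outside Notepad++, assuming Python's `re` module as a stand-in for Notepad++'s Boost engine. There are two differences to keep in mind. Python writes the backreference as `\g<1>` instead of `$1`. It also needs `re.MULTILINE` so that `^` anchors at the start of every line, not only at the start of the text.

```python
import re

text = (
    "foo bar bash 1 foo dash mesh 3 foo poly 3 for foo\n"
    "foo tar hash 1 foo gash gesh 3 foo toly 3 vor foo\n"
    "foo sar wash 1 foo rash nesh 3 foo koly 3 sor foo"
)

# Group 1 keeps everything up to (but not including) the 2nd "foo";
# that 2nd "foo" is consumed by the match and replaced by "XOO".
# re.MULTILINE lets ^ anchor at the start of every line.
result = re.sub(r"^(.*?foo.*?)foo", r"\g<1>XOO", text, flags=re.MULTILINE)
print(result)
```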

If I need the 3rd occurrence instead, simply adding the 2nd occurrence into the group used by the placeholder works, like this:

Input:

foo bar bash 1 foo dash mesh 3 foo poly 3 for foo
foo tar hash 1 foo gash gesh 3 foo toly 3 vor foo
foo sar wash 1 foo rash nesh 3 foo koly 3 sor foo

Output:
https://regex101.com/r/hmcZ1l/1

foo bar bash 1 foo dash mesh 3 XOO poly 3 for foo
foo tar hash 1 foo gash gesh 3 XOO toly 3 vor foo
foo sar wash 1 foo rash nesh 3 XOO koly 3 sor foo

Find:

^(.*?foo.*?foo.*?)foo

Replace:

$1XOO

My problem is that it can become long when there are many occurrences.
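A sketch of how the explicit form grows. It uses a hypothetical Python helper, `explicit_pattern`, which is not part of Notepad++. The helper writes the pattern out for the nth occurrence, and for n=2 and n=3 it reproduces the two patterns above.

```python
import re

def explicit_pattern(word: str, n: int) -> str:
    """Write out the long form ^(.*?w.*?w ... .*?)w targeting the nth occurrence.

    Hypothetical helper for illustration only (n is 1-based, n >= 1).
    """
    w = re.escape(word)
    # One ".*?word" for every occurrence that is kept unchanged ...
    kept = "".join(f".*?{w}" for _ in range(n - 1))
    # ... then ".*?" up to the target occurrence, which stays outside group 1.
    return f"^({kept}.*?){w}"

for n in (2, 3, 6):
    print(n, explicit_pattern("foo", n))
```

Each additional occurrence adds another `.*?foo` to the group, so the pattern gets longer with every step.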

Is there a simpler/shorter way to do it, maybe with some sort of indexing if available?

PeterJones

@Fra said in Replace 2nd occurrence in string per line, then nth occurrence Npp v8.8.1:

^(.*?foo.*?foo.*?)foo

That regex behaves the same as ^((?:.*?foo){2}.*?)foo

So if you want to keep the first N=5 occurrences and do the replacement on the (N+1)=6th, it would be ^((?:.*?foo){5}.*?)foo
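A sketch of the quantified form as a reusable function, assuming Python's `re` in place of Notepad++. `replace_nth` is a hypothetical helper. Its `n` is the 1-based occurrence to replace, so the pattern keeps `{n-1}` occurrences. `re.escape` guards against special characters in the search word.

```python
import re

def replace_nth(text: str, word: str, n: int, new: str) -> str:
    """Replace the nth occurrence (1-based) of `word` on every line.

    Hypothetical helper. Uses ^((?:.*?word){n-1}.*?)word: the non-capturing
    group (?:...) is repeated n-1 times, so group 1 holds the n-1 kept
    occurrences plus the text leading up to the nth one.
    """
    if n < 1:
        raise ValueError("n must be >= 1")
    w = re.escape(word)
    pattern = rf"^((?:.*?{w}){{{n - 1}}}.*?){w}"
    # A function replacement inserts `new` literally (no backslash processing).
    return re.sub(pattern, lambda m: m.group(1) + new, text, flags=re.MULTILINE)

line = "foo bar bash 1 foo dash mesh 3 foo poly 3 for foo poly 3 for foo poly 3 for foo"
print(replace_nth(line, "foo", 6, "XOO"))
```

A line with fewer than n occurrences has no match, so it is left unchanged.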

=>

regex101.com/...

Please note that regex101.com doesn’t use the same regex engine as Notepad++'s Boost regex, so there can sometimes be differences in results. (Not in this instance, but in general, one shouldn’t assume that a regex that works at some website will also work in some unrelated app, unless you know that they use the same engine or the regex is simple enough that it only uses syntax common to both engines.)
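Here is one concrete example of such a difference, assuming Python's `re` as the "unrelated app". The search pattern behaves the same. However, Python's `re.sub` does not treat `$1` as a backreference, so the Notepad++ replacement text is inserted literally.

```python
import re

line = "foo bar bash 1 foo dash mesh 3 foo poly 3 for foo"
pattern = r"^((?:.*?foo){1}.*?)foo"

# Notepad++ (Boost) syntax: "$1" is a backreference. In Python it is plain text.
literal = re.sub(pattern, "$1XOO", line)
# Python's own backreference syntax is \1 or \g<1>.
proper = re.sub(pattern, r"\g<1>XOO", line)
print(literal)
print(proper)
```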

----

Useful References

Fra

@PeterJones Nice solution, with minimal editing needed, just what I was looking for!
Thanks for the refs too, I’ll check them out asap!

Nice too the indexing starts from zero:

{0} = 1st occurrence.

^((?:.*?foo){0}.*?)foo

Output:

XOO bar bash 1 foo dash mesh 3 foo poly 3 for foo poly 3 for foo poly 3 for foo

{1} = 2nd occurrence.

^((?:.*?foo){1}.*?)foo

Output:

foo bar bash 1 XOO dash mesh 3 foo poly 3 for foo poly 3 for foo poly 3 for foo

{2} = 3rd occurrence.

^((?:.*?foo){2}.*?)foo

Output:

foo bar bash 1 foo dash mesh 3 XOO poly 3 for foo poly 3 for foo poly 3 for foo

{3} = 4th occurrence.

^((?:.*?foo){3}.*?)foo

Output:

foo bar bash 1 foo dash mesh 3 foo poly 3 for XOO poly 3 for foo poly 3 for foo

{4} = 5th occurrence.

^((?:.*?foo){4}.*?)foo

Output:

foo bar bash 1 foo dash mesh 3 foo poly 3 for foo poly 3 for XOO poly 3 for foo

{5} = 6th occurrence.

^((?:.*?foo){5}.*?)foo

Output:

foo bar bash 1 foo dash mesh 3 foo poly 3 for foo poly 3 for foo poly 3 for XOO

and so on.
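The six cases above can be reproduced with a short loop. This again assumes Python's `re` and its `\g<1>` replacement syntax rather than Notepad++. The loop reads `{k}` as "keep k occurrences in group 1, replace occurrence k+1", which gives the same results as the list.

```python
import re

line = "foo bar bash 1 foo dash mesh 3 foo poly 3 for foo poly 3 for foo poly 3 for foo"

results = []
for k in range(6):
    # {k} keeps k occurrences in group 1, so the (k+1)th "foo" is replaced.
    pattern = rf"^((?:.*?foo){{{k}}}.*?)foo"
    out = re.sub(pattern, r"\g<1>XOO", line)
    results.append(out)
    print(f"{{{k}}} = occurrence {k + 1}: {out}")
```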

PeterJones

@Fra said in Replace 2nd occurrence in string per line, then nth occurrence Npp v8.8.1:

Nice too the indexing starts from zero:

Personally, I would say that’s a dangerous way to think about it, and it will confuse you as you learn more about Boost regex and capture groups.

The {ℕ} “multiplying operator”, as described in the documentation, says there must be ℕ matches of whatever came before the operator. So, in the case of my regex, it is saying “there must be ℕ occurrences of something followed by foo, and all ℕ of those (plus whatever text follows them, up to the next foo) are put into capture group #1; group#1 must be followed by the (ℕ+1)th occurrence of foo”. Only that last foo is replaced, because you included $1 in the replacement, which puts the contents of group#1 back unchanged.
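To see exactly what group #1 holds, here is a sketch in Python's `re`. This is an assumption, but the pattern only uses syntax common to both engines. For ℕ=2, group 1 contains both kept occurrences plus the text leading up to the 3rd foo. The whole match ends right after that 3rd foo.

```python
import re

line = "foo bar bash 1 foo dash mesh 3 foo poly 3 for foo"
m = re.match(r"^((?:.*?foo){2}.*?)foo", line)
assert m is not None  # the line has at least three "foo"s

print(repr(m.group(1)))  # two kept occurrences plus the text before the 3rd foo
print(repr(m.group(0)))  # whole match: group 1 followed by the 3rd "foo"
print(m.end())           # index just past the 3rd "foo"; the rest is untouched
```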

The problem with thinking of it as 0-indexing is that it will confuse you as you learn more about capture groups, because capture groups are numbered starting at 1: (a)(b)(c) will put a into group#1, b into group#2, and c into group#3, so a replacement of $3$1$2 would give cab. Further, “group#0” (referenced as $0 in the replacement) is not one of the captured groups but is, in fact, the entire match, so replacing with $3$1$2//$0 would give cab//abc. Seeing $0, you could then get yourself into trouble thinking that group numbering is 0-based, when it’s really and truly 1-based.
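The same group numbering is shown below using Python's `re`, which is assumed here for illustration. Numbered groups start at 1, and group 0 is the whole match. Python spells Boost's `$3` as `\3` and `$0` as `\g<0>`. A bare `\0` would not work here, because it is read as an octal escape.

```python
import re

m = re.search(r"(a)(b)(c)", "abc")
print(m.group(1), m.group(2), m.group(3))  # numbered groups start at 1
print(m.group(0))                          # group 0 is the entire match

# Boost's "$3$1$2//$0" written in Python's replacement syntax:
out = re.sub(r"(a)(b)(c)", r"\3\1\2//\g<0>", "abc")
print(out)
```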

Fra

@PeterJones thanks a lot for the nuances. Indeed, I first wondered about the difference from the group indexing starting at 1, and then also about the difference from the quantifier ({n} where n is an integer >= 1, https://www.regular-expressions.info/refquick.html).
Thanks for mentioning the $0 group placeholder; I wondered about that too, and now I understand what it captures.

I understand the regex like this:

Find:

Put everything that precedes the occurrence of interest into a group (the 1st group, referenced by the placeholder $1, since group numbering starts at 1; there is also a placeholder 0 ($0), which references the whole set/string instead of any subgroup of it).
Exclude the occurrence of interest from that group, but state it as the search delimiter for the regex, just outside the group.

Replace with:

Capture the group with its placeholder (make a copy of it and store it: $1 = foo / ^((?:.*?foo){0}.*?) for the 1st occurrence (N+1) with index 0).
Use the 2nd/next occurrence as an external delimiter reference to stop the regex search at (^((?:.*?foo){0}.*?)foo).
Then append the new value (XOO) to the copied unchanged group.
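These steps can be written out by hand as a sketch in Python's `re`. This is an assumption and not necessarily what Notepad++ does internally. The code takes group 1 unchanged, appends the new value, and then keeps everything after the matched foo. The result is the same as the `$1XOO` replacement.

```python
import re

line = "foo bar bash 1 foo dash mesh 3 foo poly 3 for foo"
m = re.match(r"^((?:.*?foo){1}.*?)foo", line)
assert m is not None  # the line has at least two "foo"s

kept = m.group(1)               # the copied, unchanged group ($1)
rest = line[m.end():]           # text after the matched "foo" is untouched
new_line = kept + "XOO" + rest  # append the new value after the copied group
print(new_line)
```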

I think I see what you mean: there must always be a 2nd/next occurrence for the regex to work, so it can’t be thought of as starting at zero? Meanwhile, in the background, the engine uses zero-based indexing for the 1st element of the series of occurrences.
0 is the 1st element in the index series, 1 is the 2nd, and so on.
For the group placeholders, on the other hand, 0 isn’t an ordinal reference; it’s an arbitrary reference to the whole set. The ordinal references start at 1 in this case.

I need to check the doc and do more practice to get over the confusing parts!

The quantifier also starts at 1, though {0} is still valid but returns no value (or the whole set, but with empty values)?

For example:

19 empty string matches:

[A-Z]{0}
goo A greAS gir PE

https://regex101.com/r/dYnJmE/1

/[A-Z]{0}/gm
[A-Z] matches a single character present in the list A-Z, i.e. in the range between A (index 65) and Z (index 90) (case sensitive)
{0} matches the previous token exactly zero times (causes token to be ignored)
Global pattern flags:
g modifier: global. All matches (don't return after first match)
m modifier: multi line. Causes ^ and $ to match the begin/end of each line (not only begin/end of string)


0-0	empty string
1-1	empty string
2-2	empty string
3-3	empty string
4-4	empty string
5-5	empty string
6-6	empty string
7-7	empty string
8-8	empty string
9-9	empty string
10-10	empty string
11-11	empty string
12-12	empty string
13-13	empty string
14-14	empty string
15-15	empty string
16-16	empty string
17-17	empty string
18-18	empty string
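The 19 empty matches can be reproduced with Python's `re`, which is assumed here. Note that other engines, including Boost, may report empty matches slightly differently. Because `{0}` repeats `[A-Z]` zero times, the pattern matches the empty string at each of the 18 character positions and once more at the end of the subject.

```python
import re

subject = "goo A greAS gir PE"
matches = list(re.finditer(r"[A-Z]{0}", subject))

# {0} repeats the class zero times, so the pattern is an empty match:
# it succeeds at every position, including the one after the last character.
print(len(matches), len(subject))
print([m.span() for m in matches][:3])
```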

No match/invalid:

[A-Z]{}
goo A greAS gir PE

https://regex101.com/r/CtqQ0D/1

/[A-Z]{}/gm
[A-Z] matches a single character present in the list A-Z, i.e. in the range between A (index 65) and Z (index 90) (case sensitive)
{} matches the characters {} literally (case sensitive)

Your regular expression does not match the subject string.
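Python's `re`, assumed here for comparison, behaves like regex101 on this one. A `{}` that isn't a valid quantifier is read as the literal characters `{` and `}`, so `[A-Z]{}` only matches text such as `A{}`. Notepad++'s Boost engine may treat it differently, since it is a different engine.

```python
import re

subject = "goo A greAS gir PE"
# "{}" is not a valid quantifier, so Python reads it as literal "{" and "}".
no_match = re.findall(r"[A-Z]{}", subject)   # the subject has no "A{}"-style text
literal_match = re.findall(r"[A-Z]{}", "A{} B")
print(no_match)
print(literal_match)
```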

5 matches:

[A-Z]{1}
goo A greAS gir PE

https://regex101.com/r/MImsNL/1

/[A-Z]{1}/gm
[A-Z] matches a single character present in the list A-Z, i.e. in the range between A (index 65) and Z (index 90) (case sensitive)
{1} matches the previous token exactly one time (meaningless quantifier)


4-5	A
9-10	A
10-11	S
16-17	P
17-18	E
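Here is a quick check with Python's `re` (assumed here) that `{1}` really is redundant. It finds the same five matches, at the same positions, as the bare class `[A-Z]`.

```python
import re

subject = "goo A greAS gir PE"
with_q = [(m.start(), m.end(), m.group()) for m in re.finditer(r"[A-Z]{1}", subject)]
without_q = [(m.start(), m.end(), m.group()) for m in re.finditer(r"[A-Z]", subject)]

# {1} means "exactly once", which is what a bare token already means.
print(with_q)
print(with_q == without_q)
```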

2 matches:

[A-Z]{2}
goo A greAS gir PE

https://regex101.com/r/p1WOWQ/1

/[A-Z]{2}/gm
[A-Z] matches a single character present in the list A-Z, i.e. in the range between A (index 65) and Z (index 90) (case sensitive)
{2} matches the previous token exactly 2 times


9-11	AS
16-18	PE
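The same check for `{2}`, again assuming Python's `re`. Each match needs two consecutive capitals, so the lone A at position 4 is skipped. Matches also don't overlap: in ABC only AB is found, because the scan resumes after B.

```python
import re

subject = "goo A greAS gir PE"
pairs = [(m.start(), m.end(), m.group()) for m in re.finditer(r"[A-Z]{2}", subject)]
print(pairs)

# Non-overlapping: after "AB" is consumed, only "C" is left, which is too short.
abc = re.findall(r"[A-Z]{2}", "ABC")
print(abc)
```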