Replace 2nd occurrence in string per line, then nth occurrence Npp v8.8.1
-
I’m trying to simplify my regex for this task:
I have this input text:
foo bar bash 1 foo dash mesh 3 foo poly 3 for foo foo tar hash 1 foo gash gesh 3 foo toly 3 vor foo foo sar wash 1 foo rash nesh 3 foo koly 3 sor foo
The regex replaces the 2nd occurence of this string ‘foo’:
foo
with this new string:
XOO
Currently I’ve used this regex which works:
Find what (with Regular Expression radio button on):^(.*?foo.*?)foo
Replace with:
$1XOO
Producing this output:
https://regex101.com/r/UmF5wv/1foo bar bash 1 XOO dash mesh 3 foo poly 3 for foo foo tar hash 1 XOO gash gesh 3 foo toly 3 vor foo foo sar wash 1 XOO rash nesh 3 foo koly 3 sor foo
If needing the 3rd occurence instead, just adding the 2nd occurence into the group used with the placeholder works this way:
Input:
foo bar bash 1 foo dash mesh 3 foo poly 3 for foo foo tar hash 1 foo gash gesh 3 foo toly 3 vor foo foo sar wash 1 foo rash nesh 3 foo koly 3 sor foo
Output:
https://regex101.com/r/hmcZ1l/1foo bar bash 1 foo dash mesh 3 XOO poly 3 for foo foo tar hash 1 foo gash gesh 3 XOO toly 3 vor foo foo sar wash 1 foo rash nesh 3 XOO koly 3 sor foo
Find:
^(.*?foo.*?foo.*?)foo
Replace:
$1XOO
My problem is it can become long with many occurences.
Is there a simpler/shorter way to do it, maybe with some sort of indexing if available?
-
@Fra said in Replace 2nd occurrence in string per line, then nth occurrence Npp v8.8.1:
^(.*?foo.*?foo.*?)foo
That regex behaves the same as
^((?:.*?foo){2}.*?)foo
so if you want to keep the first 5, and do the replacement on the N+1=6th, it would be
^((?:.*?foo){5}.*?)foo
=>
[URL] regex101.com [.../URL]
Please note that
regex101.com
doesn’t use the same regex engine as Notepad++'s Boost regex, so there can sometimes be differences in results. (Not in this instance, but in general, one shouldn’t assume that a regex will work at some website and in some unrelated app unless you know that they use the same engine, or it’s a simple enough one that it’s only using the syntax common to both engines.)----
Useful References
-
@PeterJones Nice solution, with minimal editing needed, just what I was looking for!
Thanks for the refs too, I’ll check them out asap!Nice too the indexing starts from zero:
{0} = 1st occurrence.
^((?:.*?foo){0}.*?)foo
Output:
XOO bar bash 1 foo dash mesh 3 foo poly 3 for foo poly 3 for foo poly 3 for foo
{1} = 2nd occurrence.
^((?:.*?foo){1}.*?)foo
Output:
foo bar bash 1 XOO dash mesh 3 foo poly 3 for foo poly 3 for foo poly 3 for foo
{2} = 3rd occurrence.
^((?:.*?foo){2}.*?)foo
Output:
foo bar bash 1 foo dash mesh 3 XOO poly 3 for foo poly 3 for foo poly 3 for foo
{3} = 4th occurrence.
^((?:.*?foo){3}.*?)foo
Output:
foo bar bash 1 foo dash mesh 3 foo poly 3 for XOO poly 3 for foo poly 3 for foo
{4} = 5th occurrence.
^((?:.*?foo){4}.*?)foo
Output:
foo bar bash 1 foo dash mesh 3 foo poly 3 for foo poly 3 for XOO poly 3 for foo
{5} = 6th occurrence.
^((?:.*?foo){5}.*?)foo
Output:
foo bar bash 1 foo dash mesh 3 foo poly 3 for foo poly 3 for foo poly 3 for XOO
and so on.
-
@Fra said in Replace 2nd occurrence in string per line, then nth occurrence Npp v8.8.1:
Nice too the indexing starts from zero:
Personally, I would say that’s a dangerous way to think about it, and it will confuse you as you learn more about Boost regex and capture groups.
The
{ℕ}
“multiplying operator”, as described in the documentation, is saying there must be ℕ matches of whatever came before the operator: so, in the case of my regex, it is saying "there must be ℕ occurrences of something followed byfoo
, and all ℕ of those are put into capture group #1; group#1 must be followed by the (ℕ+1)th occurrence offoo
, and it’s only that last that is replaced (because you included$1
in the replacement, which means the contents of group#1.The problem with thinking of 0-indexing in this case is because it will confuse you as you learn more about capture groups. Because capture groups are numbered starting at 1, so
(a)(b)(c)
will puta
into group#1,b
into group#2, andc
in group#3 – a replacement of$3$1$2
would becab
. Further, “group#1” (referenced as$0
in the replacement) is not one of the captured groups, but is, in fact, the entire match, so replacing with$3$1$2//$0
would givecab//abc
. Then you’ll get yourself into trouble thinking that it’s 0-based, because it’s really and truly 1-based. -
@PeterJones thanks a lot for the nuances. Indeed, I first wondered about the difference from the group indexing starting at 1. Then also about the difference from the quantifier ( {n} where n is an integer >= 1 https://www.regular-expressions.info/refquick.html).
Thanks for the $0 group placeholder mention, I wondered about that too, now I understand what it captures.I understand the regex as this:
Find:
- Put everything that preceeds the occurence of interest into a group (1st group referenced by the placeholder with the starting index at 1 ($1) — though there is a placeholder 0 ($0) which references the whole set/string instead of any subgroup of it)).
- Exclude the occurence of interest from the that group, but state is a the search delimiter for the regex just outside the group.
Replace with:
- Capture the group with it’s placeholder (make a copy of it and store it: $1 = foo / ^((?:.?foo){0}.?) for the 1st occurence (N+1) with index 0).
- Use the 2nd/next occurence as external delimiter reference to stop the regex search at (^((?:.?foo){0}.?)foo).
- Then append the new value (XOO) to the copied unchanged group.
I think I see what you mean when considering there must always be a 2nd /next occurence for the regex to work so it can’t be starting at zero? While in the background the engine uses a zero based indexing for the 1st element of the occurences series.
0 is the 1st element in the indexes series, 1 is the 2nd and so on.
While for the groups placeholders, 0 isn’t an ordinal reference, it’s an arbitrary reference to the set. The ordinal reference starting at 1 in this case.I need to check the doc and do more practice to get over the confusing parts!
The quantifier also starting at 1 though index 0 is still valid but return no value (or the whole set but with empty values)?
For example:
19 empty string matches:
[A-Z]{0} goo A greAS gir PE
https://regex101.com/r/dYnJmE/1
/ [A-Z]{0} / gm Match a single character present in the list below [A-Z] {0} matches the previous token exactly zero times (causes token to be ignored) A-Z matches a single character in the range between A (index 65) and Z (index 90) (case sensitive) Global pattern flags g modifier: global. All matches (don't return after first match) m modifier: multi line. Causes ^ and $ to match the begin/end of each line (not only begin/end of string)
0-0 empty string 1-1 empty string 2-2 empty string 3-3 empty string 4-4 empty string 5-5 empty string 6-6 empty string 7-7 empty string 8-8 empty string 9-9 empty string 10-10 empty string 11-11 empty string 12-12 empty string 13-13 empty string 14-14 empty string 15-15 empty string 16-16 empty string 17-17 empty string 18-18 empty string
No match/invalid:
[A-Z]{} goo A greAS gir PE
https://regex101.com/r/CtqQ0D/1
/ [A-Z]{} / gm Match a single character present in the list below [A-Z] A-Z matches a single character in the range between A (index 65) and Z (index 90) (case sensitive) {} matches the characters {} literally (case sensitive) Global pattern flags g modifier: global. All matches (don't return after first match) m modifier: multi line. Causes ^ and $ to match the begin/end of each line (not only begin/end of string)
Your regular expression does not match the subject string.
5 matches:
[A-Z]{1} goo A greAS gir PE
https://regex101.com/r/MImsNL/1
/ [A-Z]{1} / gm Match a single character present in the list below [A-Z] {1} matches the previous token exactly one time (meaningless quantifier) A-Z matches a single character in the range between A (index 65) and Z (index 90) (case sensitive) Global pattern flags g modifier: global. All matches (don't return after first match) m modifier: multi line. Causes ^ and $ to match the begin/end of each line (not only begin/end of string)
4-5 A 9-10 A 10-11 S 16-17 P 17-18 E
2 matches:
[A-Z]{2} goo A greAS gir PE
https://regex101.com/r/p1WOWQ/1
/ [A-Z]{2} / gm Match a single character present in the list below [A-Z] {2} matches the previous token exactly 2 times A-Z matches a single character in the range between A (index 65) and Z (index 90) (case sensitive) Global pattern flags g modifier: global. All matches (don't return after first match) m modifier: multi line. Causes ^ and $ to match the begin/end of each line (not only begin/end of string)
9-11 AS 16-18 PE