Replace 2nd occurrence in string per line, then nth occurrence Npp v8.8.1

    Fra
    last edited by Jun 25, 2025, 7:00 PM

    I’m trying to simplify my regex for this task:

    I have this input text:

    foo bar bash 1 foo dash mesh 3 foo poly 3 for foo
    foo tar hash 1 foo gash gesh 3 foo toly 3 vor foo
    foo sar wash 1 foo rash nesh 3 foo koly 3 sor foo
    

    The regex replaces the 2nd occurrence, on each line, of this string:

    foo
    

    with this new string:

    XOO
    

    Currently I use this regex, which works.
    Find what (with the Regular Expression radio button on):

    ^(.*?foo.*?)foo
    

    Replace with:

    $1XOO
    

    Producing this output:
    https://regex101.com/r/UmF5wv/1

    foo bar bash 1 XOO dash mesh 3 foo poly 3 for foo
    foo tar hash 1 XOO gash gesh 3 foo toly 3 vor foo
    foo sar wash 1 XOO rash nesh 3 foo koly 3 sor foo
    

    If I need the 3rd occurrence instead, adding the 2nd occurrence to the group referenced by the placeholder works:

    Input:

    foo bar bash 1 foo dash mesh 3 foo poly 3 for foo
    foo tar hash 1 foo gash gesh 3 foo toly 3 vor foo
    foo sar wash 1 foo rash nesh 3 foo koly 3 sor foo
    

    Output:
    https://regex101.com/r/hmcZ1l/1

    foo bar bash 1 foo dash mesh 3 XOO poly 3 for foo
    foo tar hash 1 foo gash gesh 3 XOO toly 3 vor foo
    foo sar wash 1 foo rash nesh 3 XOO koly 3 sor foo
    

    Find:

    ^(.*?foo.*?foo.*?)foo
    

    Replace:

    $1XOO
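For anyone who wants to try the two hand-written patterns outside Notepad++, here is a minimal Python sketch. It assumes Python's `re` module, not Notepad++'s Boost engine, so the replacement is written `\g<1>XOO` instead of `$1XOO`, and `re.MULTILINE` is needed for `^` to match at the start of every line. The names `SAMPLE`, `replace_second` and `replace_third` are made up for illustration.

```python
import re

SAMPLE = (
    "foo bar bash 1 foo dash mesh 3 foo poly 3 for foo\n"
    "foo tar hash 1 foo gash gesh 3 foo toly 3 vor foo\n"
    "foo sar wash 1 foo rash nesh 3 foo koly 3 sor foo"
)

# One ".*?foo" inside the group for every occurrence that must be kept.
SECOND = re.compile(r"^(.*?foo.*?)foo", re.MULTILINE)
THIRD = re.compile(r"^(.*?foo.*?foo.*?)foo", re.MULTILINE)


def replace_second(text: str) -> str:
    """Replace the 2nd 'foo' on each line with 'XOO'."""
    return SECOND.sub(r"\g<1>XOO", text)


def replace_third(text: str) -> str:
    """Replace the 3rd 'foo' on each line with 'XOO'."""
    return THIRD.sub(r"\g<1>XOO", text)


if __name__ == "__main__":
    print(replace_second(SAMPLE))
    print(replace_third(SAMPLE))
```

Because `.` does not match a newline here and `^` only matches at a line start, each line gets at most one replacement.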
    

    My problem is that the regex becomes long when there are many occurrences.

    Is there a simpler/shorter way to do it, maybe with some sort of indexing if available?

      PeterJones @Fra
      posted Jun 25, 2025, 7:39 PM, last edited by PeterJones Jun 25, 2025, 7:41 PM

      @Fra said in Replace 2nd occurrence in string per line, then nth occurrence Npp v8.8.1:

      ^(.*?foo.*?foo.*?)foo

      That regex behaves the same as ^((?:.*?foo){2}.*?)foo

      So if you want to keep the first N=5 occurrences and do the replacement on the (N+1)th, i.e. the 6th, it would be ^((?:.*?foo){5}.*?)foo
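As a rough illustration of the same idea in code, here is a small Python sketch that builds the quantified pattern for any occurrence number. It uses Python's `re` module rather than Boost, and the helper name `replace_nth` and its parameters are hypothetical, chosen only for this example.

```python
import re


def replace_nth(text: str, target: str, replacement: str, n: int) -> str:
    """Replace the n-th (1-based) occurrence of `target` on every line.

    The group keeps the first n-1 occurrences; the occurrence that
    follows the group is the one that gets replaced.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    literal = re.escape(target)  # treat the target as plain text
    pattern = re.compile(
        rf"^((?:.*?{literal}){{{n - 1}}}.*?){literal}",
        re.MULTILINE,
    )
    # A function replacement avoids any escaping issues in `replacement`.
    return pattern.sub(lambda m: m.group(1) + replacement, text)


if __name__ == "__main__":
    line = "foo bar bash 1 foo dash mesh 3 foo poly 3 for foo"
    print(replace_nth(line, "foo", "XOO", 3))
```

Lines with fewer than `n` occurrences simply do not match and are left unchanged.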


      [URL] regex101.com [.../URL]

      Please note that regex101.com doesn’t use the same regex engine as Notepad++'s Boost regex, so there can sometimes be differences in results. (Not in this instance, but in general, one shouldn’t assume that a regex will work at some website and in some unrelated app unless you know that they use the same engine, or it’s a simple enough one that it’s only using the syntax common to both engines.)

      ----

      Useful References

      • Notepad++ Online User Manual: Searching/Regex
      • FAQ: Where to find other regular expressions (regex) documentation
        Fra @PeterJones
        last edited by Jun 25, 2025, 8:01 PM

        @PeterJones Nice solution, with minimal editing needed, just what I was looking for!
        Thanks for the refs too, I’ll check them out asap!

        Nice too the indexing starts from zero:

        {0} = 1st occurrence.

        ^((?:.*?foo){0}.*?)foo
        

        Output:

        XOO bar bash 1 foo dash mesh 3 foo poly 3 for foo poly 3 for foo poly 3 for foo
        

        {1} = 2nd occurrence.

        ^((?:.*?foo){1}.*?)foo
        

        Output:

        foo bar bash 1 XOO dash mesh 3 foo poly 3 for foo poly 3 for foo poly 3 for foo
        

        {2} = 3rd occurrence.

        ^((?:.*?foo){2}.*?)foo
        

        Output:

        foo bar bash 1 foo dash mesh 3 XOO poly 3 for foo poly 3 for foo poly 3 for foo
        

        {3} = 4th occurrence.

        ^((?:.*?foo){3}.*?)foo
        

        Output:

        foo bar bash 1 foo dash mesh 3 foo poly 3 for XOO poly 3 for foo poly 3 for foo
        

        {4} = 5th occurrence.

        ^((?:.*?foo){4}.*?)foo
        

        Output:

        foo bar bash 1 foo dash mesh 3 foo poly 3 for foo poly 3 for XOO poly 3 for foo
        

        {5} = 6th occurrence.

        ^((?:.*?foo){5}.*?)foo
        

        Output:

        foo bar bash 1 foo dash mesh 3 foo poly 3 for foo poly 3 for foo poly 3 for XOO
        

        and so on.

          PeterJones @Fra
          last edited by Jun 25, 2025, 8:12 PM

          @Fra said in Replace 2nd occurrence in string per line, then nth occurrence Npp v8.8.1:

          Nice too the indexing starts from zero:

          Personally, I would say that’s a dangerous way to think about it, and it will confuse you as you learn more about Boost regex and capture groups.

          The {ℕ} “multiplying operator”, as described in the documentation, says there must be ℕ matches of whatever came before the operator. So, in the case of my regex, it says: there must be ℕ occurrences of something followed by foo, and all ℕ of those are put into capture group #1; group #1 must be followed by the (ℕ+1)th occurrence of foo, and only that last foo is replaced (because you included $1 in the replacement, which puts back the contents of group #1).
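To make the "count, not index" reading concrete, here is a small Python sketch (Python's `re`, not Boost; the function name `kept_and_target` is invented for this example). For a given count it reports how many `foo` ended up inside group 1 and where the `foo` that follows the group starts.

```python
import re

LINE = "foo bar bash 1 foo dash mesh 3 foo poly 3 for foo"


def kept_and_target(line: str, count: int):
    """Return (number of 'foo' inside group 1, start offset of the
    'foo' that follows the group), or None when the line has too few."""
    match = re.search(rf"^((?:.*?foo){{{count}}}.*?)foo", line)
    if match is None:
        return None
    kept = match.group(1).count("foo")   # occurrences kept unchanged
    target_start = match.end(1)          # the next 'foo' starts right after group 1
    return kept, target_start


if __name__ == "__main__":
    for count in range(5):
        print(count, kept_and_target(LINE, count))
```

With a count of 0 the group holds no `foo` at all, which is why the first occurrence is the one replaced.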

          The problem with thinking of 0-indexing in this case is that it will confuse you as you learn more about capture groups, because capture groups are numbered starting at 1: (a)(b)(c) will put a into group#1, b into group#2, and c into group#3, so a replacement of $3$1$2 would give cab. Further, “group#0” (referenced as $0 in the replacement) is not one of the captured groups, but is, in fact, the entire match, so replacing with $3$1$2//$0 would give cab//abc. You’ll get yourself into trouble thinking that it’s 0-based, because it’s really and truly 1-based.
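The group numbering can be checked with a short Python sketch. Note the assumption: Python's `re` writes group references as `\1` and the whole match as `\g<0>`, where Notepad++ uses `$1` and `$0`; the numbering itself works the same way in this example.

```python
import re

PATTERN = re.compile(r"(a)(b)(c)")

match = PATTERN.search("abc")
whole_match = match.group(0)   # group 0 is the entire match, not a capture
captures = match.groups()      # captures are numbered from 1

# Equivalent of replacing with $3$1$2//$0 in Notepad++.
reordered = PATTERN.sub(r"\3\1\2//\g<0>", "abc")

if __name__ == "__main__":
    print(whole_match)   # abc
    print(captures)      # ('a', 'b', 'c')
    print(reordered)     # cab//abc
```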

            Fra @PeterJones
            last edited by Jun 25, 2025, 10:26 PM

            @PeterJones thanks a lot for the nuances. Indeed, I first wondered about the difference from the group indexing starting at 1. Then also about the difference from the quantifier ( {n} where n is an integer >= 1 https://www.regular-expressions.info/refquick.html ).
            Thanks for the $0 group placeholder mention, I wondered about that too, now I understand what it captures.

            I understand the regex as this:

            Find:

            1. Put everything that precedes the occurrence of interest into a group (the 1st group, referenced by the placeholder $1, since group indexing starts at 1; there is also a placeholder $0, which references the whole match instead of any subgroup of it).
            2. Exclude the occurrence of interest from that group, but state it as the search delimiter for the regex just outside the group.

            Replace with:

            1. Capture the group with its placeholder (make a copy of it and store it: $1 = foo / ^((?:.*?foo){0}.*?) for the 1st occurrence (N+1) with index 0).
            2. Use the 2nd/next occurrence as an external delimiter to stop the regex search at (^((?:.*?foo){0}.*?)foo).
            3. Then append the new value (XOO) to the copied, unchanged group.

            I think I see what you mean: there must always be a 2nd/next occurrence for the regex to work, so it can’t start at zero? Meanwhile, in the background, the engine uses zero-based indexing for the 1st element of the series of occurrences.
            0 is the 1st element in the series of indexes, 1 is the 2nd, and so on.
            For the group placeholders, however, 0 isn’t an ordinal reference; it’s an arbitrary reference to the whole set. The ordinal references start at 1 in this case.

            I need to check the doc and do more practice to get over the confusing parts!

            Does the quantifier also start at 1, with 0 still being valid but returning no value (or the whole set, but with empty values)?

            For example:

            19 empty string matches:


            [A-Z]{0}
            goo A greAS gir PE
            

            https://regex101.com/r/dYnJmE/1

            /[A-Z]{0}/gm
            Match a single character present in the list below [A-Z]
            {0} matches the previous token exactly zero times (causes token to be ignored)
            A-Z matches a single character in the range between A (index 65) and Z (index 90) (case sensitive)
            Global pattern flags 
            g modifier: global. All matches (don't return after first match)
            m modifier: multi line. Causes ^ and $ to match the begin/end of each line (not only begin/end of string)
            
            
            0-0	empty string
            1-1	empty string
            2-2	empty string
            3-3	empty string
            4-4	empty string
            5-5	empty string
            6-6	empty string
            7-7	empty string
            8-8	empty string
            9-9	empty string
            10-10	empty string
            11-11	empty string
            12-12	empty string
            13-13	empty string
            14-14	empty string
            15-15	empty string
            16-16	empty string
            17-17	empty string
            18-18	empty string
            
            
            No match/invalid:


            [A-Z]{}
            goo A greAS gir PE
            

            https://regex101.com/r/CtqQ0D/1

            /[A-Z]{}/gm
            Match a single character present in the list below [A-Z]
            A-Z matches a single character in the range between A (index 65) and Z (index 90) (case sensitive)
            {} matches the characters {} literally (case sensitive)
            Global pattern flags 
            g modifier: global. All matches (don't return after first match)
            m modifier: multi line. Causes ^ and $ to match the begin/end of each line (not only begin/end of string)
            
            Your regular expression does not match the subject string.
            
            5 matches:


            [A-Z]{1}
            goo A greAS gir PE
            

            https://regex101.com/r/MImsNL/1

            /[A-Z]{1}/gm
            Match a single character present in the list below [A-Z]
            {1} matches the previous token exactly one time (meaningless quantifier)
            A-Z matches a single character in the range between A (index 65) and Z (index 90) (case sensitive)
            Global pattern flags 
            g modifier: global. All matches (don't return after first match)
            m modifier: multi line. Causes ^ and $ to match the begin/end of each line (not only begin/end of string)
            
            
            4-5	A
            9-10	A
            10-11	S
            16-17	P
            17-18	E
            
            2 matches:


            [A-Z]{2}
            goo A greAS gir PE
            

            https://regex101.com/r/p1WOWQ/1

            /[A-Z]{2}/gm
            Match a single character present in the list below [A-Z]
            {2} matches the previous token exactly 2 times
            A-Z matches a single character in the range between A (index 65) and Z (index 90) (case sensitive)
            Global pattern flags 
            g modifier: global. All matches (don't return after first match)
            m modifier: multi line. Causes ^ and $ to match the begin/end of each line (not only begin/end of string)
            
            
            9-11	AS
            16-18	PE
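The four experiments above can be repeated with a short Python sketch. This is an assumption-laden comparison: it uses Python's `re` engine, which is neither regex101's default engine nor Notepad++'s Boost, although for these simple patterns the results line up with the listings above. The helper name `spans` is made up for illustration.

```python
import re

SUBJECT = "goo A greAS gir PE"


def spans(pattern: str):
    """Return (start, end, text) for every match of `pattern` in SUBJECT."""
    return [(m.start(), m.end(), m.group()) for m in re.finditer(pattern, SUBJECT)]


if __name__ == "__main__":
    # {0}: the token is required zero times, so only empty matches are found.
    # {} : not a quantifier here; Python's re treats the braces as literal text.
    # {1}: exactly one uppercase letter per match.
    # {2}: exactly two consecutive uppercase letters per match.
    for pattern in (r"[A-Z]{0}", r"[A-Z]{}", r"[A-Z]{1}", r"[A-Z]{2}"):
        found = spans(pattern)
        print(pattern, len(found), found)
```

The `{0}` case shows that a count of zero is accepted by the engine but consumes nothing, which fits the reading of `{n}` as a repetition count.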
            