    Replace 2nd occurrence in string per line, then nth occurrence Npp v8.8.1

    • Fra

      I’m trying to simplify my regex for this task:

      I have this input text:

      foo bar bash 1 foo dash mesh 3 foo poly 3 for foo
      foo tar hash 1 foo gash gesh 3 foo toly 3 vor foo
      foo sar wash 1 foo rash nesh 3 foo koly 3 sor foo
      

      The regex replaces the 2nd occurrence of the string ‘foo’ on each line:

      foo
      

      with this new string:

      XOO
      

      Currently I’ve used this regex which works:
      Find what (with Regular Expression radio button on):

      ^(.*?foo.*?)foo
      

      Replace with:

      $1XOO
      

      Producing this output:
      https://regex101.com/r/UmF5wv/1

      foo bar bash 1 XOO dash mesh 3 foo poly 3 for foo
      foo tar hash 1 XOO gash gesh 3 foo toly 3 vor foo
      foo sar wash 1 XOO rash nesh 3 foo koly 3 sor foo
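
For readers who want to try the same find/replace outside Notepad++, here is a minimal sketch using Python's `re` module. This is an assumption for illustration only: Python's engine is not the Boost engine Notepad++ uses, and the replacement string is written `\1XOO` rather than `$1XOO`.

```python
import re

text = (
    "foo bar bash 1 foo dash mesh 3 foo poly 3 for foo\n"
    "foo tar hash 1 foo gash gesh 3 foo toly 3 vor foo\n"
    "foo sar wash 1 foo rash nesh 3 foo koly 3 sor foo"
)

# re.MULTILINE makes ^ match at the start of every line, not only
# at the start of the whole string.
result = re.sub(r"^(.*?foo.*?)foo", r"\1XOO", text, flags=re.MULTILINE)
print(result)
```

Because the pattern is anchored with `^`, at most one match is made per line, so only the 2nd `foo` of each line is replaced.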
      

      To replace the 3rd occurrence instead, I add the 2nd occurrence to the group that the $1 placeholder refers to:

      Input:

      foo bar bash 1 foo dash mesh 3 foo poly 3 for foo
      foo tar hash 1 foo gash gesh 3 foo toly 3 vor foo
      foo sar wash 1 foo rash nesh 3 foo koly 3 sor foo
      

      Output:
      https://regex101.com/r/hmcZ1l/1

      foo bar bash 1 foo dash mesh 3 XOO poly 3 for foo
      foo tar hash 1 foo gash gesh 3 XOO toly 3 vor foo
      foo sar wash 1 foo rash nesh 3 XOO koly 3 sor foo
      

      Find:

      ^(.*?foo.*?foo.*?)foo
      

      Replace:

      $1XOO
      

      My problem is that the regex becomes long when many occurrences have to be skipped.

      Is there a simpler/shorter way to do it, maybe with some sort of indexing if available?

      • PeterJones @Fra

        @Fra said in Replace 2nd occurrence in string per line, then nth occurrence Npp v8.8.1:

        ^(.*?foo.*?foo.*?)foo

        That regex behaves the same as ^((?:.*?foo){2}.*?)foo

        So if you want to keep the first N=5 occurrences and do the replacement on the (N+1)th, i.e. the 6th, it would be ^((?:.*?foo){5}.*?)foo
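
The quantified pattern can also be built programmatically. The sketch below is an illustration in Python's `re` module, not the Notepad++/Boost engine; the helper name `replace_nth` and its parameters are hypothetical and do not come from the thread.

```python
import re

def replace_nth(text: str, target: str, replacement: str, n: int) -> str:
    """Replace the n-th (1-based) occurrence of target on each line."""
    if n < 1:
        raise ValueError("n must be >= 1")
    literal = re.escape(target)  # treat the target as plain text
    # Skip n-1 occurrences inside group 1, then match the n-th one.
    pattern = rf"^((?:.*?{literal}){{{n - 1}}}.*?){literal}"
    # A function replacement avoids any escaping issues in `replacement`.
    return re.sub(pattern, lambda m: m.group(1) + replacement,
                  text, flags=re.MULTILINE)

line = "foo bar bash 1 foo dash mesh 3 foo poly 3 for foo"
print(replace_nth(line, "foo", "XOO", 3))
# foo bar bash 1 foo dash mesh 3 XOO poly 3 for foo

# The hand-written long form gives the same result:
long_form = re.sub(r"^(.*?foo.*?foo.*?)foo", r"\1XOO", line)
```

A line with fewer than `n` occurrences simply does not match and is left unchanged.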


        Please note that regex101.com doesn’t use the same regex engine as Notepad++'s Boost regex, so there can sometimes be differences in results. (Not in this instance, but in general, one shouldn’t assume that a regex will work at some website and in some unrelated app unless you know that they use the same engine, or it’s a simple enough one that it’s only using the syntax common to both engines.)
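
One concrete example of such a difference, sketched with Python's `re` module (a third engine, chosen here only for illustration): the `$1` replacement syntax that Notepad++ accepts is plain text to Python, which expects `\1` or `\g<1>` instead.

```python
import re

line = "foo bar bash 1 foo dash"
pattern = r"^(.*?foo.*?)foo"

# Notepad++-style replacement string: Python does not interpret "$1",
# so it is inserted literally.
boost_style = re.sub(pattern, "$1XOO", line)
print(boost_style)   # $1XOO dash

# Python-style group reference.
python_style = re.sub(pattern, r"\1XOO", line)
print(python_style)  # foo bar bash 1 XOO dash
```

The search pattern itself is simple enough to behave the same way here; only the replacement syntax differs.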

        ----

        Useful References

        • Notepad++ Online User Manual: Searching/Regex
        • FAQ: Where to find other regular expressions (regex) documentation
        • Fra @PeterJones

          @PeterJones Nice solution, with minimal editing needed, just what I was looking for!
          Thanks for the refs too, I’ll check them out asap!

          Nice too the indexing starts from zero:

          {0} = 1st occurrence.

          ^((?:.*?foo){0}.*?)foo
          

          Output:

          XOO bar bash 1 foo dash mesh 3 foo poly 3 for foo poly 3 for foo poly 3 for foo
          

          {1} = 2nd occurrence.

          ^((?:.*?foo){1}.*?)foo
          

          Output:

          foo bar bash 1 XOO dash mesh 3 foo poly 3 for foo poly 3 for foo poly 3 for foo
          

          {2} = 3rd occurrence.

          ^((?:.*?foo){2}.*?)foo
          

          Output:

          foo bar bash 1 foo dash mesh 3 XOO poly 3 for foo poly 3 for foo poly 3 for foo
          

          {3} = 4th occurrence.

          ^((?:.*?foo){3}.*?)foo
          

          Output:

          foo bar bash 1 foo dash mesh 3 foo poly 3 for XOO poly 3 for foo poly 3 for foo
          

          {4} = 5th occurrence.

          ^((?:.*?foo){4}.*?)foo
          

          Output:

          foo bar bash 1 foo dash mesh 3 foo poly 3 for foo poly 3 for XOO poly 3 for foo
          

          {5} = 6th occurrence.

          ^((?:.*?foo){5}.*?)foo
          

          Output:

          foo bar bash 1 foo dash mesh 3 foo poly 3 for foo poly 3 for foo poly 3 for XOO
          

          and so on.
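
The series above can be reproduced in a loop. This is a sketch in Python's `re` module rather than Notepad++; the variable names are illustrative. The number in braces is the count of occurrences kept inside group 1, so the occurrence that gets replaced is the one after them.

```python
import re

line = ("foo bar bash 1 foo dash mesh 3 foo poly 3 for foo "
        "poly 3 for foo poly 3 for foo")

outputs = []
for kept in range(6):
    # {kept} = number of foo occurrences left untouched inside group 1
    pattern = rf"^((?:.*?foo){{{kept}}}.*?)foo"
    out = re.sub(pattern, r"\1XOO", line)
    outputs.append(out)
    print(f"{{{kept}}} -> occurrence #{kept + 1}: {out}")
```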

          • PeterJones @Fra

            @Fra said in Replace 2nd occurrence in string per line, then nth occurrence Npp v8.8.1:

            Nice too the indexing starts from zero:

            Personally, I would say that’s a dangerous way to think about it, and it will confuse you as you learn more about Boost regex and capture groups.

            The {ℕ} “multiplying operator”, as described in the documentation, says that there must be ℕ matches of whatever came before the operator. In the case of my regex, it says: “there must be ℕ occurrences of something followed by foo, and all ℕ of those are put into capture group #1; group #1 must be followed by the (ℕ+1)th occurrence of foo”. Only that last foo is replaced, because you included $1 in the replacement, which writes the contents of group #1 back unchanged.
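
To see what ends up in group #1 for ℕ = 2, the match can be inspected directly. This is a sketch using Python's `re` module, which is an assumption for illustration and not the Boost engine discussed here.

```python
import re

line = "foo bar bash 1 foo dash mesh 3 foo poly 3 for foo"
m = re.search(r"^((?:.*?foo){2}.*?)foo", line)

# Group 1: the two kept occurrences plus the text up to the third one.
print(repr(m.group(1)))  # 'foo bar bash 1 foo dash mesh 3 '
# The whole match additionally contains the third foo, which is the
# only part not written back by the replacement.
print(repr(m.group(0)))  # 'foo bar bash 1 foo dash mesh 3 foo'
```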

            The problem with thinking of this as 0-indexing is that it will confuse you as you learn more about capture groups. Capture groups are numbered starting at 1, so (a)(b)(c) will put a into group #1, b into group #2, and c into group #3; a replacement of $3$1$2 would give cab. Further, “group #0” (referenced as $0 in the replacement) is not one of the captured groups but is, in fact, the entire match, so replacing with $3$1$2//$0 would give cab//abc. If you think of the quantifier as 0-based, you’ll get yourself into trouble, because group numbering is really and truly 1-based.
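
The same group numbering can be checked in Python's `re` module, used here only as an illustration. Note the different spelling: Python writes group references as `\1`, `\2`, `\3`, and the entire match as `\g<0>` rather than `$0`.

```python
import re

# Groups are numbered from 1 in order of their opening parenthesis.
reordered = re.sub(r"(a)(b)(c)", r"\3\1\2", "abc")
print(reordered)   # cab

# \g<0> is the entire match, not one of the capture groups.
with_whole = re.sub(r"(a)(b)(c)", r"\3\1\2//\g<0>", "abc")
print(with_whole)  # cab//abc
```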

            • Fra @PeterJones

              @PeterJones Thanks a lot for the nuances. Indeed, I first wondered how this differs from group indexing, which starts at 1, and then how it relates to the quantifier definition ({n} where n is an integer >= 1, per https://www.regular-expressions.info/refquick.html).
              Thanks for mentioning the $0 placeholder; I wondered about that too, and now I understand what it refers to.

              I understand the regex as this:

              Find:

              1. Put everything that precedes the occurrence of interest into a group (the 1st group, referenced by the placeholder $1, since group indexes start at 1; there is also a placeholder $0, which references the whole match instead of any subgroup of it).
              2. Exclude the occurrence of interest from that group, but state it as the search delimiter for the regex just outside the group.

              Replace with:

              1. Capture the group with its placeholder (make a copy of it and store it: $1 = foo / ^((?:.*?foo){0}.*?) for the 1st occurrence (N+1) with index 0).
              2. Use the next occurrence as the external delimiter at which the regex search stops (^((?:.*?foo){0}.*?)foo).
              3. Then append the new value (XOO) to the copied, unchanged group.

              I think I see what you mean: there must always be a next occurrence for the regex to work, so the count can’t be an index starting at zero? Meanwhile, in the background, the engine uses zero-based indexing for the 1st element of the series of occurrences.
              0 is the 1st element in the series of indexes, 1 is the 2nd, and so on.
              For the group placeholders, however, 0 isn’t an ordinal reference; it’s a special reference to the whole match. The ordinal references start at 1 in this case.

              I need to check the doc and do more practice to get over the confusing parts!

              Does the quantifier also start at 1, with {0} still being valid but returning no value (or rather, a full set of empty matches)?

              For example:

              19 empty string matches:

              [A-Z]{0}
              goo A greAS gir PE
              

              https://regex101.com/r/dYnJmE/1

              /[A-Z]{0}/gm
              Match a single character present in the list below [A-Z]
              {0} matches the previous token exactly zero times (causes token to be ignored)
              A-Z matches a single character in the range between A (index 65) and Z (index 90) (case sensitive)
              Global pattern flags 
              g modifier: global. All matches (don't return after first match)
              m modifier: multi line. Causes ^ and $ to match the begin/end of each line (not only begin/end of string)
              
              
              0-0	empty string
              1-1	empty string
              2-2	empty string
              3-3	empty string
              4-4	empty string
              5-5	empty string
              6-6	empty string
              7-7	empty string
              8-8	empty string
              9-9	empty string
              10-10	empty string
              11-11	empty string
              12-12	empty string
              13-13	empty string
              14-14	empty string
              15-15	empty string
              16-16	empty string
              17-17	empty string
              18-18	empty string
              
              
              No match/invalid:

              [A-Z]{}
              goo A greAS gir PE
              

              https://regex101.com/r/CtqQ0D/1

              /[A-Z]{}/gm
              Match a single character present in the list below [A-Z]
              A-Z matches a single character in the range between A (index 65) and Z (index 90) (case sensitive)
              {} matches the characters {} literally (case sensitive)
              Global pattern flags 
              g modifier: global. All matches (don't return after first match)
              m modifier: multi line. Causes ^ and $ to match the begin/end of each line (not only begin/end of string)
              
              Your regular expression does not match the subject string.
              
              5 matches:

              [A-Z]{1}
              goo A greAS gir PE
              

              https://regex101.com/r/MImsNL/1

              /[A-Z]{1}/gm
              Match a single character present in the list below [A-Z]
              {1} matches the previous token exactly one time (meaningless quantifier)
              A-Z matches a single character in the range between A (index 65) and Z (index 90) (case sensitive)
              Global pattern flags 
              g modifier: global. All matches (don't return after first match)
              m modifier: multi line. Causes ^ and $ to match the begin/end of each line (not only begin/end of string)
              
              
              4-5	A
              9-10	A
              10-11	S
              16-17	P
              17-18	E
              
              2 matches:

              [A-Z]{2}
              goo A greAS gir PE
              

              https://regex101.com/r/p1WOWQ/1

              /[A-Z]{2}/gm
              Match a single character present in the list below [A-Z]
              {2} matches the previous token exactly 2 times
              A-Z matches a single character in the range between A (index 65) and Z (index 90) (case sensitive)
              Global pattern flags 
              g modifier: global. All matches (don't return after first match)
              m modifier: multi line. Causes ^ and $ to match the begin/end of each line (not only begin/end of string)
              
              
              9-11	AS
              16-18	PE
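
The three match lists above ({0}, {1}, {2}) can be reproduced with a short script. This is a sketch in Python's `re` module, an assumption for illustration; it is neither regex101's PCRE flavor nor Notepad++'s Boost engine, although the three engines agree on these simple patterns.

```python
import re

subject = "goo A greAS gir PE"

results = {}
for n in (0, 1, 2):
    # {n} means "exactly n repetitions"; {0} can only match the empty
    # string, which it does once at every position in the subject.
    pattern = rf"[A-Z]{{{n}}}"
    results[n] = [(m.start(), m.end(), m.group())
                  for m in re.finditer(pattern, subject)]
    print(pattern, len(results[n]), "matches")
```

The 19 empty matches for {0} correspond to the 19 positions in an 18-character string (offsets 0 through 18).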
              