    Replace character in capture group

    28 Posts 7 Posters 4.6k Views
    • guy038G
      guy038
      last edited by guy038

      Hi, @m-andre-z-eckenrode, @ekopalypse, @terry-r and All,

      First I would like to apologize ! Indeed, in the example of your previous post, the different parts to search are consecutive. So the \G assertion, which searches from the location of the end of the previous match, is not needed at all !

      So my previous S/R is :

      SEARCH ^"\d+-|([\u\l]+\d+)((-)|")

      REPLACE ?1\1?3\x20


      Now, in your recent example, the general idea is to match a complete range <span......</span> and to only extract pertinent parts that you want to keep in replacement and re-order them as you like !

      I will use the free-spacing mode ( (?x) ), which generally makes complicated regexes easier to understand. In this mode, the regex can be split over several lines.

      • Any line can be commented after a # symbol. To search for a literal # just escape it \#

      • Any space symbol is irrelevant so use the syntaxes \x20, [ ] or escape it with the \ symbol to search for a space char


      First, just an example to grasp the nuance between greedy and lazy quantifiers :

      Let’s suppose the regex ABC.+XYZ, with the greedy quantifier +, against the string 67890ABC123451234512345XYZ678906789067890XYZ12345 => It catches the string
      ABC123451234512345XYZ678906789067890XYZ, so the greatest non-null range of chars between the first string ABC and the last string XYZ

      Now, if we add a question mark right after the sign +, we get the regex ABC.+?XYZ, with the lazy quantifier +?. Thus, it would only match the string ABC123451234512345XYZ which is the smallest non-null range of chars between the strings ABC and XYZ
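The difference is easy to check outside Notepad++. Here is a minimal sketch in Python, assuming its `re` module behaves like the Boost engine of Notepad++ for these two quantifiers (it does for this simple case, but it is a different engine). The variable names are only illustrative.

```python
import re

subject = "67890ABC123451234512345XYZ678906789067890XYZ12345"

# Greedy: .+ takes as much as possible, then backtracks to the LAST XYZ
greedy = re.search(r"ABC.+XYZ", subject).group(0)

# Lazy: .+? takes as little as possible, so it stops at the FIRST XYZ
lazy = re.search(r"ABC.+?XYZ", subject).group(0)

print(greedy)  # ABC123451234512345XYZ678906789067890XYZ
print(lazy)    # ABC123451234512345XYZ
```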


      OK. So, the search regex can be written according to this form :

      (?x)                              #  FREE-SPACING mode
      (?-s)                             #  A DOT matches a SINGLE STANDARD char ( Not EOL chars )
        <span\x20class="contributors">  #    LITERAL string  <span class="contributors">
      (                                 #  START of CAPTURING group 1 ( the PROFESSION )
        .+?                             #    SMALLEST NON-NULL range of STANDARD characters... till the string \x20–\x20<a
      )                                 #  END of CAPTURING group 1
      \x20–\x20<a                       #  LITERAL string SPACE + EN DASH \x{2013} + SPACE + "<a"
      .+?                               #  SMALLEST NON-NULL range of STANDARD characters... till a DASH punctuation sign
      -                                 #  The LITERAL DASH punctuation sign
      (                                 #  START of CAPTURING group 2 ( the COMPLETE name )
        .+?                             #    SMALLEST NON-NULL range of STANDARD characters... till the string ">
      )                                 #  END of CAPTURING group 2
      ">                                #  LITERAL string  ">
      .+?                               #  SMALLEST NON-NULL range of STANDARD characters... till the string </span>
      </span>                           #  LITERAL string </span>
      

      And written in a single line, it becomes :

      SEARCH (?x-s)<span\x20class="contributors">(.+?)\x20–\x20<a.+?-(.+?)">.+?</span>

      Unfortunately, this free-spacing mode is not available for the replacement regex syntax. So we still need to write :

      REPLACEMENT \2 — \1\r\n    which can be decomposed as :

      \2        = The COMPLETE name ( Group 2 )
       —        = A SPACE char + an EM DASH char \x{2014} + a SPACE
      \1        = The PROFESSION  ( Group 1 )
      \r\n      = A LINE-BREAK
      

      So, from your initial text :

      <span class="contributors">Writer – <a href="/contribs/001-John-Doe">J. Doe</a></span><span class="contributors">Producer – <a href="/contribs/002-Timothy-Smith">T. Smith</a></span><span class="contributors">Director – <a href="/contribs/003-Jane-Johnson">J. Johnson</a></span>
      

      After running the regex S/R, we get :

      John-Doe — Writer
      Timothy-Smith — Producer
      Jane-Johnson — Director
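For readers who want to try the same idea in a script, here is a rough Python equivalent of this first S/R. It is only a sketch: Python's `re.VERBOSE` flag plays the role of `(?x)`, the `(?-s)` modifier is left out because a dot does not match line breaks by default in Python, and the names `html`, `pattern` and `result` are illustrative.

```python
import re

html = (
    '<span class="contributors">Writer – <a href="/contribs/001-John-Doe">J. Doe</a></span>'
    '<span class="contributors">Producer – <a href="/contribs/002-Timothy-Smith">T. Smith</a></span>'
    '<span class="contributors">Director – <a href="/contribs/003-Jane-Johnson">J. Johnson</a></span>'
)

pattern = re.compile(
    r"""
    <span\x20class="contributors">  # literal opening tag (space written as \x20)
    (.+?)                           # group 1: the profession
    \x20–\x20<a                     # space + en dash + space + "<a"
    .+?-                            # shortest run up to the first dash (after the id)
    (.+?)                           # group 2: the complete name
    ">                              # end of the href attribute
    .+?</span>                      # rest of the span
    """,
    re.VERBOSE,
)

# Group 2, space, em dash (U+2014), space, group 1, then a CR LF line break
result = pattern.sub("\\2 \u2014 \\1\r\n", html)
print(result)
```

The output still has dashes inside the names, exactly as in the Notepad++ result above.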
      

      Now, we just have to run this trivial regex S/R, to change any dash, between the forename and the name, with a space character

      SEARCH -

      REPLACE \x20

      Here is your expected text :

      John Doe — Writer
      Timothy Smith — Producer
      Jane Johnson — Director
      

      Now, in order to become fluent in regex matters, I’d like to advise you not to fixate on these ready-made regex examples from this forum and, instead, to start from the basics ( the "b-a-ba" ) with this excellent tutorial on regular expressions ( the reference !)

      https://www.regular-expressions.info/

      You’ll probably need half a month to get acquainted with it and, let’s say, four months to be able to build correct regexes, for a specific need, in a few minutes ! But it’s really worth it ;-))

      Best Regards,

      guy038

      • M Andre Z EckenrodeM
        M Andre Z Eckenrode @guy038
        last edited by

        @Terry-R said in Replace character in capture group:

        Unfortunately some of your guesses aren’t quite right.

        Figured that would turn out to be the case. :-)

        Might I suggest you plug this into the website:
        https://regex101.com/

        I have fairly often used that site — in fact, I brought up the subject of my mixed successes with it in my first post for this topic thread — and concur that it’s often helpful and informative, but sometimes frustrating, at least for an amateur whose ambitions often exceed his understanding and abilities, like me. For the regex operations we’re discussing in this thread, Regex101 seems not very helpful at all with the substitution expressions. If I plug @guy038’s original suggested expressions (in response to my first post) into Regex101:

        FIND: ^"\d+-|\G([\u\l]+\d+)((-)|")

        REPLACE: ?1\1?3\x20

        …I have to change [\u\l] to something else like [[:alpha:]] because PCRE via Regex101 apparently doesn’t recognize the former. And used there, the substitution expression results in:

        ?1?3 ?1word1?3 ?1word2?3 ?1word3?3 
        ?1?3 ?1word4?3 ?1word5?3 
        ?1?3 ?1word6?3 ?1word7?3 ?1word8?3 ?1word9?3 ?1word10?3 
        

        I don’t know if there are other ways of expressing it that are Regex101/PCRE-friendly.
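For what it's worth, the conditional replacement can be emulated in engines that lack the `?1...` syntax by using a replacement function. Below is a sketch in Python. Note the assumptions: the sample line is hypothetical (guessed from the output above, not the original data), `[\u\l]` is approximated by `[A-Za-z]`, and the function reflects my reading of `?1\1?3\x20` as "emit group 1 if it matched, then a space if group 3 matched".

```python
import re

# Hypothetical sample line, guessed from the output above; not the original data
line = '"101-word1-word2-word3"'

pattern = re.compile(r'^"\d+-|([A-Za-z]+\d+)((-)|")', re.MULTILINE)

def conditional(m):
    # Emulates ?1\1?3\x20 : nothing at all for the leading "digits- part
    if m.group(1) is None:
        return ""
    # Group 1 matched: keep it, and add a space only if a dash followed (group 3)
    return m.group(1) + (" " if m.group(3) is not None else "")

result = pattern.sub(conditional, line)
print(result)  # word1 word2 word3
```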

        @guy038 said in Replace character in capture group:

        First I would like to apologize !

        No apologies necessary! You’re way better at this than I am, and I appreciate your help (and everyone else’s)!

        So the \G assertion, which searches from the location of the end of the previous match, is not needed at all !

        Noted, and thanks for all the detailed explanations.

        Now, we just have to run this trivial regex S/R, to change any dash, between the forename and the name, with a space character

        I’m afraid that would be a less-than-ideal solution, but I think it’s my own fault for neglecting to provide adequate examples and explanation. In the fictitious example HTML code I provided, all the contributors had only first and last names, but of course in real life some people get referred to using three or more names — John David Hatch, Mary Anne Perry, etc. I was specifically trying to adapt your regex search/replace methods in ^"\d+-|\G([\u\l]+\d+)((-)|") and ?1\1?3\x20 to use with my made-up HTML, and would want it to also work if any persons had three or more names. Also, I assume that if I ever actually needed to operate on HTML similar to my example code, there might also be other hyphens, outside of the blocks of code I’d be targeting for manipulation, that need to be left alone. Again, I failed to mention these possibilities in my posts, even though I had them in my mind, and I apologize.
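As a point of comparison, scripting languages can restrict a character replacement to one capture group by using a function as the replacement. The sketch below uses Python; the sample data is hypothetical (one three-part name and one hyphen outside the targeted spans, to cover the two cases described above) and the helper name `reorder` is illustrative.

```python
import re

# Hypothetical sample: a three-part name, plus hyphens outside the spans
# that must be left alone
html = (
    '<p>Behind-the-scenes credits</p>'
    '<span class="contributors">Writer – <a href="/contribs/001-John-Doe">J. Doe</a></span>'
    '<span class="contributors">Producer – <a href="/contribs/004-Mary-Anne-Perry">M. A. Perry</a></span>'
)

pattern = re.compile(r'<span class="contributors">(.+?) – <a.+?-(.+?)">.+?</span>')

def reorder(m):
    # Dashes are replaced only inside group 2 (the complete name)
    name = m.group(2).replace("-", " ")
    return f"{name} \u2014 {m.group(1)}\n"

result = pattern.sub(reorder, html)
print(result)
```

Because the replacement only touches the text captured by group 2, the hyphens in "Behind-the-scenes" are untouched and any number of name parts is handled.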

        https://www.regular-expressions.info/

        I have consulted that site on occasion as well.

        Trying a modified tactic now… My data to be manipulated:

        <p class="credits"><span class="contributors">Writer – <a href="/contribs/001-John-Doe">J. Doe</a>, <a href="/contribs/003-Jane-Johnson">J. Johnson</a></span><span class="contributors">Producer – <a href="/contribs/002-Timothy-Smith">T. Smith</a></span><span class="contributors">Director – <a href="/contribs/003-Jane-Johnson">J. Johnson</a></span></p>
        

        The difference between the HTML immediately above and that which I’d posted here is that now there are two names/hyperlinks after “Writer”, so I’m looking to make this step of regex break the credit role/name(s) into one line per set, whether or not there are multiple names/hyperlinks given for a credit role.

        FIND: (?:<p class="credits">(<span class="contributors">)|(<\/span>)\1|\2<\/p>)

        REPLACE: (?1\t\1)(?2\2\r\n\t\1)(?3\2)

        Desired result:

        	<span class="contributors">Writer – <a href="/contribs/001-John-Doe">J. Doe</a>, <a href="/contribs/003-Jane-Johnson">J. Johnson</a></span>
        	<span class="contributors">Producer – <a href="/contribs/002-Timothy-Smith">T. Smith</a></span>
        	<span class="contributors">Director – <a href="/contribs/003-Jane-Johnson">J. Johnson</a></span>
        

        Actual result:

        	<span class="contributors">Writer – <a href="/contribs/001-John-Doe">J. Doe</a>, <a href="/contribs/003-Jane-Johnson">J. Johnson</a></span><span class="contributors">Producer – <a href="/contribs/002-Timothy-Smith">T. Smith</a></span><span class="contributors">Director – <a href="/contribs/003-Jane-Johnson">J. Johnson</a></span></p>
        

        Looks like in both NPP and Regex101, only the first alternation expression <p class="credits">(<span class="contributors">) matches anything. No idea why the other two won’t. I can match any of them separately, but not when they come after the first alternative.

        If I had gotten this to work, my next, separate regex step would be to try to get to this:

        	John Doe, Jane Johnson — writer
        	Timothy Smith — producer
        	Jane Johnson — director
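For reference, that final step is fairly short in a scripting language. This is only a sketch in Python, not a Notepad++ S/R: it assumes every link has the form `/contribs/<digits>-<Name-Parts>` as in the sample, and the names `span_re` and `name_re` are illustrative.

```python
import re

html = (
    '<p class="credits">'
    '<span class="contributors">Writer – <a href="/contribs/001-John-Doe">J. Doe</a>, '
    '<a href="/contribs/003-Jane-Johnson">J. Johnson</a></span>'
    '<span class="contributors">Producer – <a href="/contribs/002-Timothy-Smith">T. Smith</a></span>'
    '<span class="contributors">Director – <a href="/contribs/003-Jane-Johnson">J. Johnson</a></span>'
    '</p>'
)

# One match per span: group 1 is the role, group 2 is everything up to </span>
span_re = re.compile(r'<span class="contributors">(.+?) – (.+?)</span>')
# One match per link: the name parts that follow the numeric id
name_re = re.compile(r'href="/contribs/\d+-([^"]+)"')

lines = []
for role, links in span_re.findall(html):
    names = [n.replace("-", " ") for n in name_re.findall(links)]
    lines.append("\t" + ", ".join(names) + " \u2014 " + role.lower())

print("\n".join(lines))
```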
        
        • M Andre Z EckenrodeM
          M Andre Z Eckenrode @M Andre Z Eckenrode
          last edited by

          Ok, so it looks like I can use:

          (?:<p class="credits">(<span class="contributors">)|(<\/span>)<span class="contributors">|<\/span><\/p>)

          …but not:

          (?:<p class="credits">(<span class="contributors">)|(<\/span>)\1|\2<\/p>)

          …so I think I’ve learned that numbered backreferences used in alternation sequences are unique for each sequence. That wasn’t clear to me from the online docs for NPP and Boost Perl Regular Expression Syntax 1.70.0, but I guess makes sense now that I think about it. :-)
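A likely explanation, offered with some caution: capture groups are reset for every new match attempt, so when the second or third alternative is tried later in the text, groups 1 and 2 hold nothing and a backreference to an unset group cannot match. The sketch below reproduces this in Python, whose `re` module is a different engine from Boost but treats this case the same way; the function name `split_lines` is illustrative.

```python
import re

html = (
    '<p class="credits">'
    '<span class="contributors">Writer – <a href="/contribs/001-John-Doe">J. Doe</a>, '
    '<a href="/contribs/003-Jane-Johnson">J. Johnson</a></span>'
    '<span class="contributors">Producer – <a href="/contribs/002-Timothy-Smith">T. Smith</a></span>'
    '<span class="contributors">Director – <a href="/contribs/003-Jane-Johnson">J. Johnson</a></span>'
    '</p>'
)

SPAN = '<span class="contributors">'

# \1 and \2 refer to groups that did not take part in the current match
with_backrefs = re.compile(rf'(?:<p class="credits">({SPAN})|(</span>)\1|\2</p>)')
# Same alternatives with the text written out literally
spelled_out = re.compile(rf'(?:<p class="credits">({SPAN})|(</span>){SPAN}|</span></p>)')

count_backrefs = len(with_backrefs.findall(html))
count_spelled = len(spelled_out.findall(html))
print(count_backrefs, count_spelled)  # 1 4

def split_lines(m):
    if m.group(1) is not None:      # first alternative: opening <p> + <span>
        return "\t" + m.group(1)
    if m.group(2) is not None:      # second alternative: </span><span>
        return m.group(2) + "\r\n\t" + SPAN
    return "</span>"                # third alternative: </span></p>

result = spelled_out.sub(split_lines, html)
print(result)
```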

          • Alan KilbornA
            Alan Kilborn
            last edited by

            @M-Andre-Z-Eckenrode said in Replace character in capture group:

            …but not:

            Not 100% sure because I haven’t followed the preceding in a super-detailed fashion, but maybe what you’re looking for is called a “subroutine call” and not a “backreference”?

            The syntactical difference is:

            • \1 -> backreference
            • (?1) -> subroutine

            See more in this excellent posting: https://community.notepad-plus-plus.org/post/56447

            If I’m totally off-base, well, at least the “excellent posting” reference contains some otherwise good stuff. :-)

            • M Andre Z EckenrodeM
              M Andre Z Eckenrode @Alan Kilborn
              last edited by

              @Alan-Kilborn said in Replace character in capture group:

              maybe what you’re looking for is called a “subroutine call” and not a “backreference”?
              See more in this excellent posting:

              I don’t THINK I’m confusing the two — I’m actually trying to utilize both — though considering my track record with this particular exercise, it wouldn’t come as a complete shock to learn otherwise. But thanks in any case for the link to that truly informative post. I think I could, however, benefit from many working examples of usage in various situations.

              As far as named capture groups go, I can’t get any of the syntaxes listed in the post and the online NPP doc to actually work in NPP. For example, given text ABCDEFGHIJKLMNOPQRSTUVWXYZ, and search expression ABC(?<Name>.+?)XYZ, I get the following:

              Replacement Expression		Result
              ------------------------------------------
              \g<Name>             	=	g<Name>
              \g'Name'             	=	g'Name'
              \g{Name}             	=	g{Name}
              

              Equivalent results using \k. Do any of these actually work for anybody else?

              • Alan KilbornA
                Alan Kilborn @M Andre Z Eckenrode
                last edited by Alan Kilborn

                @M-Andre-Z-Eckenrode said in Replace character in capture group:

                I can’t get any of the syntaxes

                If I use this as the replace-with expression for your search-for expression and data:

                find: ABC(?<Name>.+?)XYZ
                repl: abc_$+{Name}_xyz
                data to search: ABCDEFGHIJKLMNOPQRSTUVWXYZ

                I obtain:

                abc_DEFGHIJKLMNOPQRSTUVW_xyz

                I tell you that because you were asking about “replacement expression”.

                However, your examples show you were trying to use \g which I believe only works in the find expression. Example:

                find: (?<Name>t...)ING\g<Name>

                which would match:

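                data to search: testINGtest or tripINGtrip but not testINGtrip (the backreference must repeat the same text)

                A similar but distinctly different example:

                find: (?<Name>t...)ING(?&Name)

                which would match:

                data to search: testINGtest or testINGtrip (the group's pattern is run again, so the two words may differ)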
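To make the backreference/subroutine distinction concrete, here is a small Python sketch. Several things differ from Notepad++ and are assumptions of this example: Python spells named groups `(?P<Name>...)` and named backreferences `(?P=Name)`, its standard `re` module has no subroutine call like `(?&Name)` (so the subpattern is simply repeated to imitate one), and, unlike Notepad++, Python does accept `\g<Name>` in the replacement string.

```python
import re

# Backreference: the second part must be the SAME TEXT that group Name captured
backref = re.compile(r"(?P<Name>t...)ING(?P=Name)")

# Imitation of a subroutine call: the same PATTERN is applied again,
# so the second part may be any other "t" + 3 characters
sub = r"t..."
subroutine_like = re.compile(rf"({sub})ING(?:{sub})")

words = ["testINGtest", "tripINGtrip", "testINGtrip"]
backref_hits = [w for w in words if backref.fullmatch(w)]
subroutine_hits = [w for w in words if subroutine_like.fullmatch(w)]
print(backref_hits)     # ['testINGtest', 'tripINGtrip']
print(subroutine_hits)  # ['testINGtest', 'tripINGtrip', 'testINGtrip']

# Named group used in a replacement, Python style
replaced = re.sub(r"ABC(?P<Name>.+?)XYZ", r"abc_\g<Name>_xyz", "ABCDEFGHIJKLMNOPQRSTUVWXYZ")
print(replaced)         # abc_DEFGHIJKLMNOPQRSTUVW_xyz
```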

                • PeterJonesP
                  PeterJones @Alan Kilborn
                  last edited by

                  @M-Andre-Z-Eckenrode ,

                  I can’t get any of the syntaxes listed … Replacement Expression

                  @Alan-Kilborn said in Replace character in capture group:

                  I believe only works in the find expression

                  You are correct.

                  And you weren’t the first person this week to not notice that the \g and \k syntaxes are in the search section, and not in the replacement section (which tried to be explicit that any syntax not mentioned in the replacement section was not valid in the replacement field, but has apparently failed).

                  Could you both look at the proposed capture groups and backreferences phrasing and substitution phrasing, and make sure that the updated sections make the distinction more clear?

                  —
                  Note to future readers: those “phrasing” links are to a temporary branch, and in the future, they will not work. https://npp-user-manual.org/docs/searching/ is the official location of the search documentation, and https://github.com/notepad-plus-plus/npp-usermanual/blob/master/content/docs/searching.md is the master github source for the document.

                  • M Andre Z EckenrodeM
                    M Andre Z Eckenrode @Alan Kilborn
                    last edited by

                    @Alan-Kilborn said in Replace character in capture group:

                    repl: abc_$+{Name}_xyz
                    your examples show you were trying to use \g which I believe only works in the find expression.

                    Aha! Looks like that’s true in NPP — though \g<Name> actually DOES work in PCRE replacement expressions at Regex101.

                    Thanks for the education.

                    • Alan KilbornA
                      Alan Kilborn @M Andre Z Eckenrode
                      last edited by

                      @M-Andre-Z-Eckenrode

                      DO NOT rely on regex101 for the more esoteric aspects of regex. Doing so, and then intending to use the results in Notepad++ will cause frustration. Sure, okay, for simple cases, but the caliber of stuff you have been discussing in this thread is going to be different in N++ and regex101.

                      • M Andre Z EckenrodeM
                        M Andre Z Eckenrode @PeterJones
                        last edited by

                        @PeterJones said in Replace character in capture group:

                        Could you both look at the proposed capture groups and backreferences phrasing and substitution phrasing, and make sure that the updated sections make the distinction more clear?

                        Looks good to me so far, though coming from a fairly green regex user like me, I’d take that with a grain of salt. :-)

                        On a tangent here, I’ve noticed, on occasion when doing find/replace operations, that the In selection checkbox is sometimes ghosted (not available to check or uncheck). I keep meaning to compile a list of the circumstances and ask about it in these forums sometime. I notice that in both official and proposed versions of the doc, there seems to be no mention of any limitations on when the In selection checkbox is available. There seem to be some known limitations (at least one of which is mentioned here). Maybe they should be added to the docs?

                        • M Andre Z EckenrodeM
                          M Andre Z Eckenrode @Alan Kilborn
                          last edited by

                          @Alan-Kilborn said in Replace character in capture group:

                          the caliber of stuff you have been discussing in this thread is going to be different in N++ and regex101.

                          I think I’ve already made it fairly clear, in my previous posts to this thread, that that’s what I’m finding to be the case.

                          • Alan KilbornA
                            Alan Kilborn @M Andre Z Eckenrode
                            last edited by

                            @M-Andre-Z-Eckenrode said in Replace character in capture group:

                            I think I’ve already made it fairly clear, in my previous posts to this thread, that that’s what I’m finding to be the case.

                            Perhaps, but I get the feeling you might be holding on to regex101 a bit much. :-)

                            Plus, I’m kind of a late joiner to this thread; there’s a lot of content.

                            • Alan KilbornA
                              Alan Kilborn @M Andre Z Eckenrode
                              last edited by Alan Kilborn

                              @M-Andre-Z-Eckenrode said in Replace character in capture group:

                              In selection checkbox was sometimes ghosted

                              In selection checkbox enabled condition: A single selection of one or more characters, that is NOT a column block selection.

                              Note that the checkbox’s appearance status can only be relied upon when you actually switch input focus to the find (family) window – upon activation the code runs a check to make sure you have the proper type of selection, and updates the checkbox and its state at that time.

                              • PeterJonesP
                                PeterJones @M Andre Z Eckenrode
                                last edited by

                                @M-Andre-Z-Eckenrode said in Replace character in capture group:

                                Looks good to me so far

                                Thanks. Submitted PR #127. Hopefully, it will make it in before the next release of the npp-user-manual.org website.

                                • Alan KilbornA
                                  Alan Kilborn @PeterJones
                                  last edited by Alan Kilborn

                                  @PeterJones said in Replace character in capture group:

                                  Looks good to me so far

                                  Looked fine to me as well.
                                  Thanks for your fine attention to the manual.
                                  I just need to read it more when I have trouble with things. :-)

                                  • guy038G
                                    guy038
                                    last edited by

                                    Hello, @peterjones,

                                    Sorry, I’ve just seen your post where you asked people to verify the N++ official documentation ! I’ll try to have a look, myself, very soon. It would be better to do it before the next release of the website !

                                    But, as I said to Alan, at the moment, my TO DO list, concerning N++ or else, is getting much longer ;-))

                                    Cheers,

                                    guy038
