Replace character in capture group
- 
 Hi, @m-andre-z-eckenrode, @ekopalypse, @terry-r and All,
 First, I would like to apologize! Indeed, in the example of your previous post, the different parts to search are consecutive. So the \G assertion, which searches from the location of the end of the previous match, is not needed at all! So my previous S/R becomes:
 SEARCH ^"\d+-|([\u\l]+\d+)((-)|")
 REPLACE ?1\1?3\x20
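 For readers who want to experiment outside Notepad++, here is a minimal sketch of what that S/R appears to do, written with Python's re module rather than the Boost engine. Python has no conditional replacement syntax, so a callback stands in for ?1\1?3\x20, and [A-Za-z] stands in for [\u\l]. The sample line is hypothetical, not the original poster's data.

```python
import re

# [A-Za-z] replaces [\u\l], which Python's re does not know (assumption: ASCII letters only).
PATTERN = re.compile(r'^"\d+-|([A-Za-z]+\d+)((-)|")')

def conditional_replace(match):
    # Emulates the conditional replacement: emit group 1 only if it took part
    # in the match, then a space only if group 3 (the dash) took part too.
    if match.group(1) is None:
        return ""                      # the leading  "digits-  part is dropped
    separator = " " if match.group(3) is not None else ""
    return match.group(1) + separator

sample = '"12-word1-word2-word3"'      # hypothetical input line
result = PATTERN.sub(conditional_replace, sample)
print(result)
```

 The callback makes the two conditions explicit, which may help when checking how the Notepad++ replacement behaves on real data.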
 Now, in your recent example, the general idea is to match a complete range <span......</span> and to extract only the pertinent parts that you want to keep in the replacement, re-ordering them as you like! I will use the free-spacing mode, (?x), which generally makes complicated regexes easier to understand. In this mode, the regex can be split over several lines, and:
- Any line can be commented after a # symbol. To search for a literal #, just escape it: \#
- Any space character is ignored, so to search for a space char, use the syntaxes \x20 or [ ], or escape the space with the \ symbol
 
 Before that, just an example to grasp the nuance between greedy and lazy quantifiers. Let’s suppose the regex ABC.+XYZ, with the greedy quantifier +, against the string 67890ABC123451234512345XYZ678906789067890XYZ12345. It catches the string ABC123451234512345XYZ678906789067890XYZ, so the greatest non-null range of chars between the strings ABC and XYZ.
 Now, if we add a question mark right after the + sign, we get the regex ABC.+?XYZ, with the lazy quantifier +?. Thus, it would only match the string ABC123451234512345XYZ, which is the smallest non-null range of chars between the strings ABC and XYZ.
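 The same greedy/lazy comparison can be reproduced with a short Python sketch. It uses Python's re module instead of Notepad++, but the two quantifiers behave the same way on this string.

```python
import re

text = "67890ABC123451234512345XYZ678906789067890XYZ12345"

# Greedy: .+ grabs as much as possible, then backtracks to the LAST "XYZ".
greedy = re.search(r"ABC.+XYZ", text).group()

# Lazy: .+? grabs as little as possible, stopping at the FIRST "XYZ".
lazy = re.search(r"ABC.+?XYZ", text).group()

print(greedy)
print(lazy)
```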
 OK. So, the search regex can be written in this form:
    (?x)                              #  FREE-SPACING mode
    (?-s)                             #  A DOT matches a SINGLE STANDARD char ( not EOL chars )
    <span\x20class="contributors">    #  LITERAL string <span class="contributors">
    (                                 #  START of CAPTURING group 1 ( the PROFESSION )
        .+?                           #      SMALLEST NON-NULL range of STANDARD characters... till the string \x20–\x20<a
    )                                 #  END of CAPTURING group 1
    \x20–\x20<a                       #  LITERAL string SPACE + EN DASH \x{2013} + SPACE + "<a"
    .+?                               #  SMALLEST NON-NULL range of STANDARD characters... till a DASH punctuation sign
    -                                 #  The LITERAL DASH punctuation sign
    (                                 #  START of CAPTURING group 2 ( the COMPLETE name )
        .+?                           #      SMALLEST NON-NULL range of STANDARD characters... till the string ">
    )                                 #  END of CAPTURING group 2
    ">                                #  LITERAL string ">
    .+?                               #  SMALLEST NON-NULL range of STANDARD characters... till the string </span>
    </span>                           #  LITERAL string </span>
 And written in a single line, it becomes:
 SEARCH (?x-s)<span\x20class="contributors">(.+?)\x20–\x20<a.+?-(.+?)">.+?</span>
 Unfortunately, this free-spacing mode is not available for the replacement regex syntax. So we still need to write:
 REPLACEMENT \2 — \1\r\n
 which can be decomposed as:
    \2      =  The COMPLETE name ( Group 2 )
     —      =  A SPACE char + an EM DASH char \x{2014} + a SPACE char
    \1      =  The PROFESSION ( Group 1 )
    \r\n    =  A LINE-BREAK
 So, from your initial text:
 <span class="contributors">Writer – <a href="/contribs/001-John-Doe">J. Doe</a></span><span class="contributors">Producer – <a href="/contribs/002-Timothy-Smith">T. Smith</a></span><span class="contributors">Director – <a href="/contribs/003-Jane-Johnson">J. Johnson</a></span>
 After running the regex S/R, we get:
 John-Doe — Writer
 Timothy-Smith — Producer
 Jane-Johnson — Director
 Now, we just have to run this trivial regex S/R, to replace any dash between the forename and the name with a space character:
 SEARCH -
 REPLACE \x20
 Here is your expected text:
 John Doe — Writer
 Timothy Smith — Producer
 Jane Johnson — Director
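 As a cross-check of the two S/R steps above, here is a rough Python equivalent. It is only a sketch: Python's re.VERBOSE flag plays the role of (?x), the dot already excludes line-breaks by default (so no (?-s) is needed), and the variable names are illustrative.

```python
import re

SPAN = re.compile(
    r"""
    <span\x20class="contributors">   # literal opening tag
    (.+?)                            # group 1: the profession
    \x20\u2013\x20<a                 # space + EN DASH + space + <a
    .+?-                             # shortest run up to the first dash
    (.+?)                            # group 2: the complete name
    ">                               # end of the href attribute
    .+?</span>                       # rest of the span
    """,
    re.VERBOSE,
)

html = (
    '<span class="contributors">Writer \u2013 <a href="/contribs/001-John-Doe">J. Doe</a></span>'
    '<span class="contributors">Producer \u2013 <a href="/contribs/002-Timothy-Smith">T. Smith</a></span>'
    '<span class="contributors">Director \u2013 <a href="/contribs/003-Jane-Johnson">J. Johnson</a></span>'
)

# Step 1: name, EM DASH, profession, line-break (mirrors the REPLACEMENT above).
step1 = SPAN.sub("\\2 \u2014 \\1\r\n", html)

# Step 2: the "trivial" second S/R, replacing every remaining dash with a space.
step2 = step1.replace("-", " ")
print(step2)
```

 Note that step 2 touches every hyphen in the text, which is only safe here because the sample contains no other hyphens.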
 Now, in order to become fluent in regex matters, I’d advise you not to fixate on the ready-made regex examples from this forum and, instead, to start from the "b-a-ba" with this excellent tutorial on regular expressions (the reference!): https://www.regular-expressions.info/
 You’ll probably need half a month to get acquainted with it and, let’s say, four months before you can build correct regexes for a specific need in a few minutes! But it’s really worth it ;-))
 Best Regards,
 guy038
- 
- 
 @Terry-R said in Replace character in capture group: “Unfortunately some of your guesses aren’t quite right.”
 Figured that would turn out to be the case. :-)
 “Might I suggest you plug this into the website: https://regex101.com/”
 I have fairly often used that site (in fact, I brought up the subject of my mixed successes with it in my first post for this topic thread) and concur that it’s often helpful and informative, but sometimes frustrating, at least for an amateur whose ambitions often exceed his understanding and abilities, like me. For the regex operations we’re discussing in this thread, Regex101 seems not very helpful at all with the substitution expressions. If I plug @guy038’s original suggested expressions (in response to my first post) into Regex101:
 FIND: ^"\d+-|\G([\u\l]+\d+)((-)|")
 REPLACE: ?1\1?3\x20
 …I have to change [\u\l] to something else like [[:alpha:]] because PCRE via Regex101 apparently doesn’t recognize the former. And used there, the substitution expression results in:
 ?1?3 ?1word1?3 ?1word2?3 ?1word3?3 ?1?3 ?1word4?3 ?1word5?3 ?1?3 ?1word6?3 ?1word7?3 ?1word8?3 ?1word9?3 ?1word10?3
 I don’t know if there are other ways of expressing it that are Regex101/PCRE-friendly.
 @guy038 said in Replace character in capture group: “First I would like to apologize !”
 No apologies necessary! You’re way better at this than I am, and I appreciate your help (and everyone else’s)!
 “So the \G assertion, which searches from the location of the end of the previous match, is not needed at all !”
 Noted, and thanks for all the detailed explanations.
 “Now, we just have to run this trivial regex S/R, to change any dash, between the forename and the name, with a space character”
 I’m afraid that would be a less-than-ideal solution, but I think it’s my own fault for neglecting to provide adequate examples and explanation. In the fictitious example HTML code I provided, all the contributors had only first and last names, but of course in real life some people get referred to using three or more names: John David Hatch, Mary Anne Perry, etc. I was specifically trying to adapt your regex search/replace methods in ^"\d+-|\G([\u\l]+\d+)((-)|") and ?1\1?3\x20 to use with my made-up HTML, and would want it to also work if any persons had three or more names. Also, I assume that if I ever actually needed to operate on HTML similar to my example code, there might also be other hyphens, outside of the blocks of code I’d be targeting for manipulation, that need to be left alone. Again, I failed to mention these possibilities in my posts, even though I had them in my mind, and I apologize.
 I have consulted that site on occasion as well.
 Trying a modified tactic now… My data to be manipulated:
 <p class="credits"><span class="contributors">Writer – <a href="/contribs/001-John-Doe">J. Doe</a>, <a href="/contribs/003-Jane-Johnson">J. Johnson</a></span><span class="contributors">Producer – <a href="/contribs/002-Timothy-Smith">T. Smith</a></span><span class="contributors">Director – <a href="/contribs/003-Jane-Johnson">J. Johnson</a></span></p>
 The difference between the HTML immediately above and that which I’d posted here is that now there are two names/hyperlinks after “Writer”, so I’m looking to make this step of regex break the credit role/name(s) into one line per set, whether or not there are multiple names/hyperlinks given for a credit role.
 FIND: (?:<p class="credits">(<span class="contributors">)|(<\/span>)\1|\2<\/p>)
 REPLACE: (?1\t\1)(?2\2\r\n\t\1)(?3\2)
 Desired result:
 <span class="contributors">Writer – <a href="/contribs/001-John-Doe">J. Doe</a>, <a href="/contribs/003-Jane-Johnson">J. Johnson</a></span>
 <span class="contributors">Producer – <a href="/contribs/002-Timothy-Smith">T. Smith</a></span>
 <span class="contributors">Director – <a href="/contribs/003-Jane-Johnson">J. Johnson</a></span>
 Actual result:
 <span class="contributors">Writer – <a href="/contribs/001-John-Doe">J. Doe</a>, <a href="/contribs/003-Jane-Johnson">J. Johnson</a></span><span class="contributors">Producer – <a href="/contribs/002-Timothy-Smith">T. Smith</a></span><span class="contributors">Director – <a href="/contribs/003-Jane-Johnson">J. Johnson</a></span></p>
 Looks like in both NPP and Regex101, only the first alternation expression <p class="credits">(<span class="contributors">) matches anything. No idea why the other two won’t. I can match any of them separately, but not as other than a first alternation expression.
 If I had gotten this to work, my next, separate regex step would be to try to get to this:
 John Doe, Jane Johnson — writer
 Timothy Smith — producer
 Jane Johnson — director
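 For illustration only, here is one way the final target text could be produced in a script instead of a Notepad++ S/R. This is a sketch, not a solution proposed in the thread. It assumes every link looks like /contribs/NNN-First-Last as in the sample, so hyphens are replaced only inside the name slug and any number of name parts is handled. The helper names are hypothetical.

```python
import re

html = (
    '<p class="credits"><span class="contributors">Writer \u2013 '
    '<a href="/contribs/001-John-Doe">J. Doe</a>, '
    '<a href="/contribs/003-Jane-Johnson">J. Johnson</a></span>'
    '<span class="contributors">Producer \u2013 '
    '<a href="/contribs/002-Timothy-Smith">T. Smith</a></span>'
    '<span class="contributors">Director \u2013 '
    '<a href="/contribs/003-Jane-Johnson">J. Johnson</a></span></p>'
)

# One credit block: role, EN DASH, then everything up to the closing </span>.
SPAN = re.compile(r'<span class="contributors">(.+?) \u2013 (.+?)</span>')
# The name slug that follows the numeric id inside each href (assumed URL shape).
NAME = re.compile(r'href="/contribs/\d+-([^"]+)"')

def credit_lines(source):
    lines = []
    for role, links in SPAN.findall(source):
        # Hyphens are replaced only inside the slug, never elsewhere in the text.
        names = [slug.replace("-", " ") for slug in NAME.findall(links)]
        lines.append(", ".join(names) + " \u2014 " + role.lower())
    return lines

print("\n".join(credit_lines(html)))
```

 A genuinely hyphenated surname in a slug would lose its hyphen with this approach, so real data may need a different rule.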
- 
 OK, so it looks like I can use:
 (?:<p class="credits">(<span class="contributors">)|(<\/span>)<span class="contributors">|<\/span><\/p>)
 …but not:
 (?:<p class="credits">(<span class="contributors">)|(<\/span>)\1|\2<\/p>)
 …so I think I’ve learned that numbered backreferences used in alternation sequences are unique for each sequence. That wasn’t clear to me from the online docs for NPP and Boost Perl Regular Expression Syntax 1.70.0, but I guess it makes sense now that I think about it. :-)
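 One way to read this result: a backreference holds only what its group captured during the current match attempt, so when a different alternative is tried, a group defined in another branch is unset and the backreference to it cannot match. The sketch below shows this with Python's re module (not the Boost engine used by Notepad++), on a shortened, hypothetical version of the HTML, with a callback standing in for the conditional replacement.

```python
import re

# Shortened stand-in for the credits HTML (span contents are placeholders).
html = (
    '<p class="credits">'
    '<span class="contributors">A</span>'
    '<span class="contributors">B</span>'
    '<span class="contributors">C</span></p>'
)

# Backreferences to groups that live in OTHER alternatives: only branch 1 can match.
WITH_BACKREFS = re.compile(
    r'<p class="credits">(<span class="contributors">)|(</span>)\1|\2</p>'
)
# Same idea with the literal text spelled out in every alternative.
SPELLED_OUT = re.compile(
    r'<p class="credits">(<span class="contributors">)'
    r'|(</span>)<span class="contributors">|</span></p>'
)

def reflow(match):
    if match.group(1):                  # opening <p> plus first <span>
        return "\t" + match.group(1)
    if match.group(2):                  # boundary between two spans
        return match.group(2) + '\r\n\t<span class="contributors">'
    return "</span>"                    # closing </span></p>

n_backrefs = len(WITH_BACKREFS.findall(html))
n_spelled = len(SPELLED_OUT.findall(html))
result = SPELLED_OUT.sub(reflow, html)
print(n_backrefs, n_spelled)
print(result)
```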
- 
 @M-Andre-Z-Eckenrode said in Replace character in capture group: “…but not:”
 Not 100% sure because I haven’t followed the preceding in a super-detailed fashion, but maybe what you’re looking for is called a “subroutine call” and not a “backreference”? The syntactical difference is:
- \1 -> backreference
- (?1) -> subroutine call
 See more in this excellent posting: https://community.notepad-plus-plus.org/post/56447
 If I’m totally off-base, well, at least the “excellent posting” reference contains some otherwise good stuff. :-)
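 To make the distinction concrete, here is a small Python sketch. Python's standard re module has backreferences but no (?1) subroutine call, so the subroutine is imitated by repeating the group's sub-pattern. The sample strings and pattern are illustrative.

```python
import re

SUB = r"t..."   # the sub-pattern placed inside capture group 1

# Backreference: \1 must repeat the exact TEXT that group 1 captured.
backreference = re.compile(rf"({SUB})ING\1")

# Subroutine-like: the group's PATTERN is applied again (what (?1) would do).
subroutine_like = re.compile(rf"({SUB})ING(?:{SUB})")

samples = ("testINGtest", "tripINGtrip", "testINGtrip")
backref_hits = [bool(backreference.fullmatch(s)) for s in samples]
subroutine_hits = [bool(subroutine_like.fullmatch(s)) for s in samples]

for s, b, r in zip(samples, backref_hits, subroutine_hits):
    print(s, "backreference:", b, "subroutine-like:", r)
```

 Only the last sample separates the two: the second half fits the pattern t... but is not the same text as the first half.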
- 
 @Alan-Kilborn said in Replace character in capture group: “maybe what you’re looking for is called a ‘subroutine call’ and not a ‘backreference’? See more in this excellent posting:”
 I don’t THINK I’m confusing the two (I’m actually trying to utilize both), though considering my track record with this particular exercise, it wouldn’t come as a complete shock to learn otherwise. But thanks in any case for the link to that truly informative post. I think I could, however, benefit from many working examples of usage in various situations.
 As far as named capture groups go, I can’t get any of the syntaxes listed in the post and the online NPP doc to actually work in NPP. For example, given text ABCDEFGHIJKLMNOPQRSTUVWXYZ, and search expression ABC(?<Name>.+?)XYZ, I get the following:
    Replacement Expression    Result
    ----------------------    --------
    \g<Name>                  g<Name>
    \g'Name'                  g'Name'
    \g{Name}                  g{Name}
 Equivalent results using \k. Do any of these actually work for anybody else?
- 
 @M-Andre-Z-Eckenrode said in Replace character in capture group: “I can’t get any of the syntaxes”
 If I use this as the replace-with expression for your search-for expression and data:
 find: ABC(?<Name>.+?)XYZ
 repl: abc_$+{Name}_xyz
 data to search: ABCDEFGHIJKLMNOPQRSTUVWXYZ
 I obtain: abc_DEFGHIJKLMNOPQRSTUVW_xyz
 I tell you that because you were asking about “replacement expression”. However, your examples show you were trying to use \g, which I believe only works in the find expression. Example:
 find: (?<Name>t...)ING\g<Name>
 which would match testINGtest or tripINGtrip, but not testINGtrip, because a backreference must repeat the exact text the group captured.
 A similar but distinctly different example:
 find: (?<Name>t...)ING(?&Name)
 which would match testINGtest or testINGtrip, because the subroutine call re-runs the group’s pattern t... instead of repeating its captured text.
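 For comparison with other engines, the named-group replacement looks like this in Python's re module. This is a sketch of Python's syntax only: the group is written (?P<Name>...) and \g<Name> is valid in the replacement there, whereas Notepad++ needs $+{Name} as shown above.

```python
import re

data = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"

# Python spells the named group (?P<Name>...) and accepts \g<Name> in the replacement.
result = re.sub(r"ABC(?P<Name>.+?)XYZ", r"abc_\g<Name>_xyz", data)

# The same substitution with a callback, reading the group by name.
result_callback = re.sub(
    r"ABC(?P<Name>.+?)XYZ",
    lambda m: "abc_" + m.group("Name") + "_xyz",
    data,
)
print(result)
```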
- 
 “I can’t get any of the syntaxes listed … Replacement Expression”
 @Alan-Kilborn said in Replace character in capture group: “I believe only works in the find expression”
 You are correct. And you weren’t the first person this week to not notice that the \g and \k syntaxes are in the search section, and not in the replacement section (which tried to be explicit that any syntax not mentioned in the replacement section was not valid in the replacement field, but has apparently failed).
 Could you both look at the proposed capture groups and backreferences phrasing and substitution phrasing, and make sure that the updated sections make the distinction more clear?
 Note to future readers: those “phrasing” links are to a temporary branch, and in the future, they will not work. https://npp-user-manual.org/docs/searching/ is the official location of the search documentation, and https://github.com/notepad-plus-plus/npp-usermanual/blob/master/content/docs/searching.md is the master github source for the document.
- 
 @Alan-Kilborn said in Replace character in capture group: “repl: abc_$+{Name}_xyz” and “your examples show you were trying to use \g which I believe only works in the find expression.”
 Aha! Looks like that’s true in NPP, though \g<Name> actually DOES work in PCRE replacement expressions at Regex101. Thanks for the education.
- 
 DO NOT rely on regex101 for the more esoteric aspects of regex. Doing so, and then intending to use the results in Notepad++ will cause frustration. Sure, okay, for simple cases, but the caliber of stuff you have been discussing in this thread is going to be different in N++ and regex101. 
- 
 @PeterJones said in Replace character in capture group: “Could you both look at the proposed capture groups and backreferences phrasing and substitution phrasing, and make sure that the updated sections makes the distinction more clear?”
 Looks good to me so far, though coming from a fairly green regex user like me, I’d take that with a grain of salt. :-)
 On a tangent here, I’ve noticed, on occasion when doing find/replace operations, that the In selection checkbox was sometimes ghosted (not available to check or uncheck), which I keep meaning to compile a list of circumstances for presentation and inquiry in these forums sometime. I notice that in both official and proposed versions of the doc, there seems to be no mention of any limitations on when the In selection checkbox is available. There seem to be some known limitations (at least one of which is mentioned here). Maybe they should be added to the docs?
- 
 @Alan-Kilborn said in Replace character in capture group: “the caliber of stuff you have been discussing in this thread is going to be different in N++ and regex101.”
 I think I’ve already made it fairly clear, in my previous posts to this thread, that that’s what I’m finding to be the case.
- 
 @M-Andre-Z-Eckenrode said in Replace character in capture group: “I think I’ve already made it fairly clear, in my previous posts to this thread, that that’s what I’m finding to be the case.”
 Perhaps, but I get the feeling you might be holding on to regex101 a bit much. :-) Plus, I’m kind of a late joiner to this thread; there’s a lot of content.
- 
 @M-Andre-Z-Eckenrode said in Replace character in capture group: “In selection checkbox was sometimes ghosted”
 In selection checkbox enabled condition: a single selection of one or more characters, that is NOT a column block selection.
 Note that the checkbox’s appearance status can only be relied upon when you actually switch input focus to the find (family) window: upon activation, the code runs a check to make sure you have the proper type of selection, and updates the checkbox and its state at that time.
- 
 @M-Andre-Z-Eckenrode said in Replace character in capture group: “Looks good to me so far”
 Thanks. Submitted PR #127. Hopefully, it will make it in before the next release of the npp-user-manual.org website.
- 
 @PeterJones said in Replace character in capture group: “Looks good to me so far”
 Looked fine to me as well.
 Thanks for your fine attention to the manual.
 I just need to read it more when I have trouble with things. :-)
- 
 Hello, @peterjones, Sorry, I’ve just seen your post where you asked people to verify the N++ official documentation ! I’ll try to have a look, myself, very soon. It would be better to do it before the next release of the website ! But, as I said to Alan, at the moment, my TO DO list, concerning N++ or else, is getting much longer ;-)) Cheers, guy038 


