Replace character in capture group
-
@Terry-R said in Replace character in capture group:
Unfortunately some of your guesses aren’t quite right.
Figured that would turn out to be the case. :-)
Might I suggest you plug this into the website:
https://regex101.com/
I have fairly often used that site (in fact, I brought up my mixed successes with it in my first post in this thread) and concur that it’s often helpful and informative, but sometimes frustrating, at least for an amateur like me whose ambitions often exceed his understanding and abilities. For the regex operations we’re discussing in this thread, Regex101 doesn’t seem very helpful at all with the substitution expressions. If I plug @guy038’s original suggested expressions (from his response to my first post) into Regex101:
FIND:
^"\d+-|\G([\u\l]+\d+)((-)|")
REPLACE:
?1\1?3\x20
…I have to change
[\u\l]
to something else like
[[:alpha:]]
because PCRE via Regex101 apparently doesn’t recognize the former. And used there, the substitution expression results in:
?1?3 ?1word1?3 ?1word2?3 ?1word3?3 ?1?3 ?1word4?3 ?1word5?3 ?1?3 ?1word6?3 ?1word7?3 ?1word8?3 ?1word9?3 ?1word10?3
I don’t know if there are other ways of expressing it that are Regex101/PCRE-friendly.
@guy038 said in Replace character in capture group:
First I would like to apologize !
No apologies necessary! You’re way better at this than I am, and I appreciate your help (and everyone else’s)!
So the
\G
assertion, which searches from the location of the end of the previous match, is not needed at all !
Noted, and thanks for all the detailed explanations.
Now, we just have to run this trivial regex S/R, to change any dash, between the forename and the name, with a space character
I’m afraid that would be a less-than-ideal solution, but I think it’s my own fault for neglecting to provide adequate examples and explanation. In the fictitious example HTML code I provided, all the contributors had only first and last names, but of course in real life some people get referred to using three or more names — John David Hatch, Mary Anne Perry, etc. I was specifically trying to adapt your regex search/replace methods in
^"\d+-|\G([\u\l]+\d+)((-)|")
and
?1\1?3\x20
to use with my made-up HTML, and would want it to also work if any persons had three or more names. Also, I assume that if I ever actually needed to operate on HTML similar to my example code, there might also be other hyphens, outside of the blocks of code I’d be targeting for manipulation, that need to be left alone. Again, I failed to mention these possibilities in my posts, even though I had them in mind, and I apologize.
I have consulted that site on occasion as well.
Trying a modified tactic now… My data to be manipulated:
<p class="credits"><span class="contributors">Writer – <a href="/contribs/001-John-Doe">J. Doe</a>, <a href="/contribs/003-Jane-Johnson">J. Johnson</a></span><span class="contributors">Producer – <a href="/contribs/002-Timothy-Smith">T. Smith</a></span><span class="contributors">Director – <a href="/contribs/003-Jane-Johnson">J. Johnson</a></span></p>
The difference between the HTML immediately above and that which I’d posted here is that now there are two names/hyperlinks after “Writer”, so I’m looking to make this step of regex break the credit role/name(s) into one line per set, whether or not there are multiple names/hyperlinks given for a credit role.
FIND:
(?:<p class="credits">(<span class="contributors">)|(<\/span>)\1|\2<\/p>)
REPLACE:
(?1\t\1)(?2\2\r\n\t\1)(?3\2)
Desired result:
<span class="contributors">Writer – <a href="/contribs/001-John-Doe">J. Doe</a>, <a href="/contribs/003-Jane-Johnson">J. Johnson</a></span>
<span class="contributors">Producer – <a href="/contribs/002-Timothy-Smith">T. Smith</a></span>
<span class="contributors">Director – <a href="/contribs/003-Jane-Johnson">J. Johnson</a></span>
Actual result:
<span class="contributors">Writer – <a href="/contribs/001-John-Doe">J. Doe</a>, <a href="/contribs/003-Jane-Johnson">J. Johnson</a></span><span class="contributors">Producer – <a href="/contribs/002-Timothy-Smith">T. Smith</a></span><span class="contributors">Director – <a href="/contribs/003-Jane-Johnson">J. Johnson</a></span></p>
Looks like in both NPP and Regex101, only the first alternation expression
<p class="credits">(<span class="contributors">)
matches anything. No idea why the other two won’t. I can match each of them on its own, but not when it comes after the first alternative.
If I had gotten this to work, my next, separate regex step would be to try to get to this:
John Doe, Jane Johnson — writer
Timothy Smith — producer
Jane Johnson — director
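For comparison, here is a minimal sketch of that end result done outside of NPP, using Python's `re` module rather than the Boost engine NPP uses. The helper name `credit_lines` and both patterns are made up for illustration, and the sketch assumes every href looks like `/contribs/NNN-First-Last` and every span starts with a one-word role followed by a dash.

```python
import re

html = (
    '<p class="credits"><span class="contributors">Writer – '
    '<a href="/contribs/001-John-Doe">J. Doe</a>, '
    '<a href="/contribs/003-Jane-Johnson">J. Johnson</a></span>'
    '<span class="contributors">Producer – '
    '<a href="/contribs/002-Timothy-Smith">T. Smith</a></span>'
    '<span class="contributors">Director – '
    '<a href="/contribs/003-Jane-Johnson">J. Johnson</a></span></p>'
)

# Hypothetical patterns: one role word, a single separator character,
# then everything up to the closing </span>.
SPAN = re.compile(r'<span class="contributors">(\w+)\s+\W\s+(.*?)</span>')
# The name slug is whatever follows the numeric prefix inside the href.
HREF = re.compile(r'href="/contribs/\d+-([^"]+)"')


def credit_lines(markup):
    """Return one 'Names, ... (dash) role' line per contributors span."""
    lines = []
    for role, links in SPAN.findall(markup):
        # Only hyphens inside the slug are touched, so hyphens elsewhere
        # in the document are left alone and three-part names still work.
        names = [slug.replace("-", " ") for slug in HREF.findall(links)]
        lines.append(", ".join(names) + " \u2014 " + role.lower())
    return lines


print("\n".join(credit_lines(html)))
```

One limitation of this sketch: a hyphenated surname in the slug would be split into two words, since the slug alone cannot tell the two kinds of hyphen apart.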
-
Ok, so it looks like I can use:
(?:<p class="credits">(<span class="contributors">)|(<\/span>)<span class="contributors">|<\/span><\/p>)
…but not:
(?:<p class="credits">(<span class="contributors">)|(<\/span>)\1|\2<\/p>)
…so I think I’ve learned that numbered backreferences used in alternation sequences are unique for each sequence. That wasn’t clear to me from the online docs for NPP and Boost Perl Regular Expression Syntax 1.70.0, but I guess makes sense now that I think about it. :-)
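Another way to put it: every match attempt starts with all capture groups unset, so a backreference inside the second or third alternative points at a group that has captured nothing in that attempt, and the alternative fails. A minimal sketch of this in Python's `re` module (an assumption on my part: it is not the Boost engine NPP uses, but it shows the same one-match result on a shortened version of the HTML):

```python
import re

html = (
    '<p class="credits">'
    '<span class="contributors">A</span>'
    '<span class="contributors">B</span></p>'
)

# Alternatives 2 and 3 rely on groups that were only captured by an
# earlier, separate match, so \1 and \2 are unset when they are tried.
with_backrefs = re.compile(
    r'(?:<p class="credits">(<span class="contributors">)|(</span>)\1|\2</p>)'
)

# Same three alternatives with the repeated text written out literally.
spelled_out = re.compile(
    r'(?:<p class="credits">(<span class="contributors">)'
    r'|(</span>)<span class="contributors">|</span></p>)'
)

backref_hits = [m.group(0) for m in with_backrefs.finditer(html)]
literal_hits = [m.group(0) for m in spelled_out.finditer(html)]

print(len(backref_hits))  # 1: only the first alternative ever matches
print(len(literal_hits))  # 3: opening, span boundary, and closing
```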
-
@M-Andre-Z-Eckenrode said in Replace character in capture group:
…but not:
Not 100% sure because I haven’t followed the preceding in a super-detailed fashion, but maybe what you’re looking for is called a “subroutine call” and not a “backreference”?
The syntactical difference is:
\1
-> backreference
(?1)
-> subroutine
See more in this excellent posting: https://community.notepad-plus-plus.org/post/56447
If I’m totally off-base, well, at least the “excellent posting” reference contains some otherwise good stuff. :-)
-
@Alan-Kilborn said in Replace character in capture group:
maybe what you’re looking for is called a “subroutine call” and not a “backreference”?
See more in this excellent posting:
I don’t THINK I’m confusing the two (I’m actually trying to utilize both), though considering my track record with this particular exercise, it wouldn’t come as a complete shock to learn otherwise. But thanks in any case for the link to that truly informative post. I think I could, however, benefit from many working examples of usage in various situations.
As far as named capture groups go, I can’t get any of the syntaxes listed in the post and the online NPP doc to actually work in NPP. For example, given text
ABCDEFGHIJKLMNOPQRSTUVWXYZ
and search expression
ABC(?<Name>.+?)XYZ
I get the following:
Replacement Expression   Result
------------------------------------------
\g<Name>               = g<Name>
\g'Name'               = g'Name'
\g{Name}               = g{Name}
Equivalent results using
\k
instead. Do any of these actually work for anybody else?
-
@M-Andre-Z-Eckenrode said in Replace character in capture group:
I can’t get any of the syntaxes
If I use this as the replace-with expression for your search-for expression and data:
find:
ABC(?<Name>.+?)XYZ
repl:
abc_$+{Name}_xyz
data to search:
ABCDEFGHIJKLMNOPQRSTUVWXYZ
I obtain:
abc_DEFGHIJKLMNOPQRSTUVW_xyz
I tell you that because you were asking about “replacement expression”.
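The replacement field has its own small syntax, separate from the search syntax, and it differs from engine to engine. As a hedged illustration only, here is the same substitution in Python's `re` module, which spells the named group `(?P<Name>...)` in the pattern and `\g<Name>` in the replacement (neither spelling is what N++ expects in its replace field):

```python
import re

text = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"

# Python's spelling of a named group and of a named reference in the
# replacement; N++ uses (?<Name>...) and $+{Name} for the same job.
result = re.sub(r"ABC(?P<Name>.+?)XYZ", r"abc_\g<Name>_xyz", text)

print(result)  # abc_DEFGHIJKLMNOPQRSTUVW_xyz
```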
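However, your examples show you were trying to use
\g
which I believe only works in the find expression. Example:
find:
(?<Name>t...)ING\g<Name>
which would match:
data to search:
testINGtest
or
tripINGtrip
but not
testINGtrip
because the backreference has to repeat the exact text that the group captured.
A similar but distinctly different example:
find:
(?<Name>t...)ING(?&Name)
which would match:
data to search:
testINGtest
or
testINGtrip
because the subroutine call runs the group’s pattern again, so any text fitting t... is accepted.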
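The general distinction between the two constructs is that a backreference only matches a repeat of the exact text the group captured, while a subroutine call re-runs the group's pattern and accepts any text that pattern matches. A rough sketch in Python's `re` module, with two assumptions to note: Python spells the named backreference `(?P=Name)`, and it has no subroutine calls at all, so the second pattern imitates one by pasting the sub-pattern text in a second time.

```python
import re

WORD = r"t..."  # the sub-pattern shared by both expressions

# Backreference: the second part must repeat the captured text exactly.
backref = re.compile(rf"(?P<Name>{WORD})ING(?P=Name)")

# Stand-in for a subroutine call: the sub-pattern itself is applied
# again, so the second part may be different text of the same shape.
rerun = re.compile(rf"(?P<Name>{WORD})ING{WORD}")

for sample in ("testINGtest", "tripINGtrip", "testINGtrip"):
    print(sample,
          bool(backref.fullmatch(sample)),
          bool(rerun.fullmatch(sample)))
```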
-
I can’t get any of the syntaxes listed … Replacement Expression
@Alan-Kilborn said in Replace character in capture group:
I believe only works in the find expression
You are correct.
And you weren’t the first person this week to not notice that the
\g
and
\k
syntaxes are in the search section, and not in the replacement section (which tried to be explicit that any syntax not mentioned in the replacement section was not valid in the replacement field, but has apparently failed).
Could you both look at the proposed capture groups and backreferences phrasing and substitution phrasing, and make sure that the updated sections make the distinction more clear?
—
Note to future readers: those “phrasing” links are to a temporary branch, and in the future, they will not work. https://npp-user-manual.org/docs/searching/ is the official location of the search documentation, and https://github.com/notepad-plus-plus/npp-usermanual/blob/master/content/docs/searching.md is the master github source for the document.
-
@Alan-Kilborn said in Replace character in capture group:
repl:
abc_$+{Name}_xyz
your examples show you were trying to use
\g
which I believe only works in the find expression.
Aha! Looks like that’s true in NPP, though
\g<Name>
actually DOES work in PCRE replacement expressions at Regex101.
Thanks for the education.
-
DO NOT rely on regex101 for the more esoteric aspects of regex. Doing so and then trying to use the results in Notepad++ will cause frustration. Sure, okay, for simple cases, but the caliber of stuff you have been discussing in this thread is going to be different in N++ and regex101.
-
@PeterJones said in Replace character in capture group:
Could you both look at the proposed capture groups and backreferences phrasing and substitution phrasing, and make sure that the updated sections make the distinction more clear?
Looks good to me so far, though coming from a fairly green regex user like me, I’d take that with a grain of salt. :-)
On a tangent here, I’ve noticed, on occasion when doing find/replace operations, that the
In selection
checkbox was sometimes ghosted (not available to check or uncheck), and I keep meaning to compile a list of the circumstances to present and ask about in these forums sometime. I notice that in both the official and proposed versions of the doc, there seems to be no mention of any limitations on when the
In selection
checkbox is available. There seem to be some known limitations (at least one of which is mentioned here). Maybe they should be added to the docs?
-
@Alan-Kilborn said in Replace character in capture group:
the caliber of stuff you have been discussing in this thread is going to be different in N++ and regex101.
I think I’ve already made it fairly clear, in my previous posts to this thread, that that’s what I’m finding to be the case.
-
@M-Andre-Z-Eckenrode said in Replace character in capture group:
I think I’ve already made it fairly clear, in my previous posts to this thread, that that’s what I’m finding to be the case.
Perhaps, but I get the feeling you might be holding on to regex101 a bit much. :-)
Plus, I’m kind of a late joiner to this thread; there’s a lot of content.
-
@M-Andre-Z-Eckenrode said in Replace character in capture group:
In selection checkbox was sometimes ghosted
In selection checkbox enabled condition: A single selection of one or more characters, that is NOT a column block selection.
Note that the checkbox’s appearance status can only be relied upon when you actually switch input focus to the find (family) window – upon activation the code runs a check to make sure you have the proper type of selection, and updates the checkbox and its state at that time.
-
@M-Andre-Z-Eckenrode said in Replace character in capture group:
Looks good to me so far
Thanks. Submitted PR #127. Hopefully, it will make it in before the next release of the npp-user-manual.org website.
-
@PeterJones said in Replace character in capture group:
Looks good to me so far
Looked fine to me as well.
Thanks for your fine attention to the manual.
I just need to read it more when I have trouble with things. :-)
-
Hello, @peterjones,
Sorry, I’ve just seen your post where you asked people to verify the N++ official documentation ! I’ll try to have a look, myself, very soon. It would be better to do it before the next release of the website !
But, as I said to Alan, at the moment, my TO DO list, concerning N++ or else, is getting much longer ;-))
Cheers,
guy038