Is it possible to replace individual characters inside a regex capture group?
-
I’ve searched the forum for other solutions, but they don’t seem to answer the question I have in mind.
I want to add id attributes to all HTML header elements (e.g., h1, h2, h3, etc.) using the content of each element as the basis for the id attribute, by making them lowercase and replacing spaces with hyphens.
For example, I simply want to do a replacement from this:
<h1>Primary Article</h1>
<h2>Subtopic</h2>
<h1>A Different Approach to All This</h1>
to this:
<h1 id="primary-article">Primary Article</h1>
<h2 id="subtopic">Subtopic</h2>
<h1 id="a-different-approach-to-all-this">A Different Approach to All This</h1>
This is the farthest I’ve gotten:
Search: <h(\d)>(.*?)</h\d>
Replace: <h\1 id="\L\2\E">\2</h\1>
It does everything I need, except replace the spaces in the capture group with dashes. Is this even possible?
If not, how would you approach this problem?
-
@pbarney said in Is it possible to replace individual characters inside a regex capture group?:
It does everything I need, except replace the spaces in the capture group with dashes. Is this even possible?
It certainly is possible, just read the FAQ post “Generic Regular Expression (regex) Formulas” and the linked post “Replacing in a specific zone of text”.
It is best to keep these 2 tasks separate, I would not even attempt to combine the 2 edits.
Terry
-
Thank you, @Terry-R. I’ll take a look at them now.
For future searchers, here are links to the two posts Terry-R mentioned:
May I ask why you wouldn’t attempt to combine them?
-
@pbarney
I’m not actually on a PC to do any testing; however, I can’t think of a way to do it. Even if it can be done, the regex would be horribly complicated and practically unsupportable if changes were required later on. Often the easy way is to do the task in several steps.
Terry
PS also consider creating a macro which can run several steps so in the end you are just running a single process. Details are referenced in the online manual and lots of posts on this forum.
-
Hello, @pbarney, @terry-r and All,
I agree with @terry-r’s statement when he said :
Often the easy way is to do the task in several steps
Moreover, there is an objective reason why this cannot be done in one go ! Indeed, you have to add all the id attributes globally before individually replacing any space char with a dash character in the id regions !
So, given your INPUT text :
<h1>Primary Article</h1>
<h2>Subtopic</h2>
<h1>A Different Approach to All This</h1>
With this first regex S/R, we’ll add the id attributes, from the contents of each element, to any h# header :
SEARCH (?x-si) ( ^ \x20* < h[1-6] ) (?= > ( .+? ) < )
OR (?-si)(^\x20*<h[1-6])(?=>(.+?)<)
REPLACE \1 id="\L\2"
And you get this temporary text :
<h1 id="primary article">Primary Article</h1>
<h2 id="subtopic">Subtopic</h2>
<h1 id="a different approach to all this">A Different Approach to All This</h1>
Then, use this second regex S/R to replace any space char, in the id region, with a dash character :
SEARCH (?x-si) (?: ^ \x20* < h[1-6] \x20 | (?! \A ) \G ) (?: (?! > ) . )*? \K \x20
OR (?-si)(?:^\x20*<h[1-6]\x20|(?!\A)\G)(?:(?!>).)*?\K\x20
REPLACE -
And here is your expected OUTPUT text :
<h1 id="primary-article">Primary Article</h1>
<h2 id="subtopic">Subtopic</h2>
<h1 id="a-different-approach-to-all-this">A Different Approach to All This</h1>
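For readers who want to try the same two-step idea outside Notepad++, here is a minimal Python sketch. It is not the Notepad++ method itself: Python’s re module has no \L, \K or \G, so replacement callbacks stand in for them. The function names add_ids and hyphenate_ids are made up for this illustration, and the sketch assumes each heading sits on its own line with no existing attributes.

```python
import re

# Step 1 pattern: same shape as the first SEARCH regex above.
# Group 1 is the opening "<hN", group 2 (inside the lookahead) is the heading text.
STEP1 = re.compile(r"(^\x20*<h[1-6])(?=>(.+?)<)", re.MULTILINE)

# Step 2 pattern: the id value of a heading produced by step 1.
# This replaces the \G / \K construct, which Python's re does not support.
STEP2 = re.compile(r'(<h[1-6] id=")([^"]*)')


def add_ids(text):
    """Add id="<lowercased heading text>" to each heading (hypothetical helper)."""
    return STEP1.sub(lambda m: f'{m.group(1)} id="{m.group(2).lower()}"', text)


def hyphenate_ids(text):
    """Replace spaces with hyphens inside heading id values (hypothetical helper)."""
    return STEP2.sub(lambda m: m.group(1) + m.group(2).replace(" ", "-"), text)


sample = (
    "<h1>Primary Article</h1>\n"
    "<h2>Subtopic</h2>\n"
    "<h1>A Different Approach to All This</h1>"
)
print(hyphenate_ids(add_ids(sample)))
```

Running add_ids alone should give the temporary text shown earlier, and chaining hyphenate_ids after it should give the expected OUTPUT text.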
Best Regards,
guy038
-
@guy038, Thank you for that masterful example. I’ve seen a few of your posts before, and while the regex is mostly beyond me, I appreciate you offering workable examples.
If I owned a software development business, I’d hire you as Chief RegEx Officer.
While my constraints were small, future searchers who end up here might be interested in a regex that could handle <h?> elements that already have existing attributes.
For example:
<h1 class="article">Top Level</h1>
yielding:
<h1 id="top-level" class="article">Top Level</h1>
Would it be difficult to add that to your suggested offerings?
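For future searchers who can run a script instead of a Notepad++ S/R, one possible approach to the existing-attributes case is sketched below in Python. It is an illustration, not a Notepad++ regex: a replacement callback builds the id in a single pass. The names slugify, add_id and add_heading_ids are invented here, and the sketch assumes heading content is plain text (no nested tags) and leaves alone any heading that already carries an id.

```python
import re

# <hN ...attrs...>content</hN>; attrs is optional and kept verbatim.
HEADING = re.compile(r"<(h[1-6])(\s[^>]*)?>(.*?)</\1>", re.IGNORECASE | re.DOTALL)


def slugify(text):
    """Lowercase the text and join whitespace-separated words with hyphens."""
    return "-".join(text.lower().split())


def add_id(match):
    """Callback: insert an id before any existing attributes."""
    tag, attrs, content = match.group(1), match.group(2) or "", match.group(3)
    # Skip headings that already have an id attribute (simple check, assumes
    # no attribute value happens to contain the text " id=").
    if re.search(r"\sid\s*=", attrs, re.IGNORECASE):
        return match.group(0)
    return f'<{tag} id="{slugify(content)}"{attrs}>{content}</{tag}>'


def add_heading_ids(html):
    """Apply add_id to every heading element found in html."""
    return HEADING.sub(add_id, html)


print(add_heading_ids('<h1 class="article">Top Level</h1>'))
```

Because the callback sees the whole match, the lowercase conversion and the space-to-hyphen replacement happen together, which is what the plain S/R dialog cannot do in one go.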
-
@pbarney said in Is it possible to replace individual characters inside a regex capture group?:
If I owned a software development business, I’d hire you as Chief RegEx Officer.
I fully endorse this.