Is it possible to replace individual characters inside a regex capture group?

    pbarney
    last edited by Mar 17, 2023, 6:55 PM

    I’ve searched the forum for other solutions, but they don’t seem to answer the question I have in mind.

    I want to add id attributes to all HTML header elements (e.g., h1, h2, h3, etc) using the content of each element as the basis for the id attribute, by making them lowercase and replacing spaces with hyphens.

    For example, I simply want to do a replacement from this:

    <h1>Primary Article</h1>
    <h2>Subtopic</h2>
    <h1>A Different Approach to All This</h1>
    

    to this:

    <h1 id="primary-article">Primary Article</h1>
    <h2 id="subtopic">Subtopic</h2>
    <h1 id="a-different-approach-to-all-this">A Different Approach to All This</h1>
    

    This is the farthest I’ve gotten:

    Search: <h(\d)>(.*?)</h\d>
    Replace: <h\1 id="\L\2\E">\2</h\1>

    It does everything I need, except replace the spaces in the capture group with dashes. Is this even possible?

    If not, how would you approach this problem?
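For readers who can step outside the editor's Replace dialog, a scripting language with a replacement callback can apply both transformations in one pass. The following is only a minimal sketch in Python, not something proposed in this thread: the names `HEADER` and `add_id` are hypothetical, and the pattern uses a backreference (`</h\1>`) instead of the original `</h\d>` so that the closing tag must match the opening level.

```python
import re

# Matches a bare heading such as <h1>Text</h1>; \1 forces the closing
# tag to use the same level as the opening tag.
HEADER = re.compile(r"<h([1-6])>(.*?)</h\1>")

def add_id(match):
    """Build the replacement for one heading (hypothetical helper)."""
    level, text = match.group(1), match.group(2)
    # Lowercase the heading text and turn spaces into hyphens.
    slug = text.lower().replace(" ", "-")
    return f'<h{level} id="{slug}">{text}</h{level}>'

html = """<h1>Primary Article</h1>
<h2>Subtopic</h2>
<h1>A Different Approach to All This</h1>"""

result = HEADER.sub(add_id, html)
print(result)
```

The callback is what makes this possible: it receives each match and can run arbitrary string operations on a captured group, which a plain replacement string cannot do.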

      Terry R @pbarney
      last edited by Mar 17, 2023, 7:33 PM

      @pbarney said in Is it possible to replace individual characters inside a regex capture group?:

      It does everything I need, except replace the spaces in the capture group with dashes. Is this even possible?

      It certainly is possible, just read the FAQ post “Generic Regular Expression (regex) Formulas” and the linked post “Replacing in a specific zone of text”.

      It is best to keep these two tasks separate; I would not even attempt to combine the two edits.

      Terry

        pbarney @Terry R
        last edited by Mar 17, 2023, 7:46 PM

        Thank you, @Terry-R. I’ll take a look at them now.

        For future searchers, here are links to the two posts Terry-R mentioned:

        • Generic Regular Expression (regex) Formulas
        • Replacing in a specific zone of text

        May I ask why you wouldn’t attempt to combine them?

          Terry R @pbarney
          last edited by Terry R Mar 17, 2023, 8:28 PM Mar 17, 2023, 8:25 PM

          @pbarney
          I’m not at a PC to do any testing, but I can’t think of a way to do it in a single step. Even if it can be done, the regex would be horribly complicated and practically unsupportable if changes were required later on.

          Often the easy way is to do the task in several steps.

          Terry

          PS also consider creating a macro which can run several steps so in the end you are just running a single process. Details are referenced in the online manual and lots of posts on this forum.

            guy038
            last edited by guy038 Mar 18, 2023, 12:15 AM Mar 17, 2023, 11:53 PM

            Hello, @pbarney, @terry-r and All,

            I agree with @terry-r’s statement:

            Often the easy way is to do the task in several steps

            Moreover, there is an objective reason why this cannot be done in one go: you first have to add all the id attributes globally, and only then can you replace each space character with a dash, individually, inside the id regions.


            So, given your INPUT text :

            <h1>Primary Article</h1>
            <h2>Subtopic</h2>
            <h1>A Different Approach to All This</h1>
            

            This first regex S/R adds an id attribute, built from the contents of each element, to every h# header:

            SEARCH (?x-si) ( ^ \x20* < h[1-6] ) (?= > ( .+? ) < )    OR    (?-si)(^\x20*<h[1-6])(?=>(.+?)<)

            REPLACE \1 id="\L\2"

            And you get this temporary text :

            <h1 id="primary article">Primary Article</h1>
            <h2 id="subtopic">Subtopic</h2>
            <h1 id="a different approach to all this">A Different Approach to All This</h1>
            

            Then, use this second regex S/R to replace every space character in the id region with a dash character:

            SEARCH (?x-si) (?: ^ \x20* < h[1-6] \x20 | (?! \A ) \G ) (?: (?! > ) . )*? \K \x20    OR    (?-si)(?:^\x20*<h[1-6]\x20|(?!\A)\G)(?:(?!>).)*?\K\x20

            REPLACE -

            And here is your expected OUTPUT text :

            <h1 id="primary-article">Primary Article</h1>
            <h2 id="subtopic">Subtopic</h2>
            <h1 id="a-different-approach-to-all-this">A Different Approach to All This</h1>
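The same two-pass idea can be sketched in Python to check the intermediate and final text. This is only an illustration under stated assumptions: Python's standard `re` module does not support `\L`, `\K` or `\G`, so lowercasing and space replacement are done in callbacks, and the second pass is scoped to the `id="..."` value rather than reusing the `\G`-based pattern above. The variable names are hypothetical.

```python
import re

text = """<h1>Primary Article</h1>
<h2>Subtopic</h2>
<h1>A Different Approach to All This</h1>"""

# Pass 1: after each opening "<hN", insert an id made of the lowercased
# heading text. The lookahead captures the text without consuming it.
step1 = re.sub(
    r"(?m)(^ *<h[1-6])(?=>(.+?)<)",
    lambda m: f'{m.group(1)} id="{m.group(2).lower()}"',
    text,
)

# Pass 2: inside every id="..." value, replace spaces with hyphens.
# Assumption: the only id attributes present are the ones added in pass 1.
step2 = re.sub(
    r'(?<= id=")[^"]*',
    lambda m: m.group(0).replace(" ", "-"),
    step1,
)

print(step1)
print(step2)
```

Here `step1` corresponds to the temporary text and `step2` to the expected output shown above.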
            

            Best Regards,

            guy038

              pbarney @guy038
              last edited by Mar 20, 2023, 8:21 PM

              @guy038, Thank you for that masterful example. I’ve seen a few of your posts before, and while the regex is mostly beyond me, I appreciate you offering workable examples.

              If I owned a software development business, I’d hire you as Chief RegEx Officer.

              While my constraints were small, future searchers who end up here might be interested in a regex that could handle <h?> elements that already have existing attributes…

              For example:
              <h1 class="article">Top Level</h1>
              yielding:
              <h1 id="top-level" class="article">Top Level</h1>

              Would it be difficult to add that to your suggested offerings?
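The thread does not include an answer to this follow-up. As a hedged sketch only, here is one way the existing-attributes case could be handled outside the editor in Python. The names `HEADING`, `_add_id` and `add_heading_ids` are hypothetical, and the code assumes headings sit on one line, contain no nested tags that matter for the id, and should be left alone if they already carry an id.

```python
import re

# Group 1: heading level; group 2: any existing attributes (possibly
# empty, otherwise starting with whitespace); group 3: heading text.
HEADING = re.compile(r"<h([1-6])((?:\s[^>]*)?)>(.*?)</h\1>")

def _add_id(match):
    level, attrs, text = match.groups()
    # Assumption: a heading that already has an id is left untouched.
    if re.search(r"\sid\s*=", attrs):
        return match.group(0)
    slug = text.lower().replace(" ", "-")
    # Put the new id first, then keep the original attributes as they were.
    return f'<h{level} id="{slug}"{attrs}>{text}</h{level}>'

def add_heading_ids(html):
    """Add an id attribute to every h1-h6 heading that lacks one."""
    return HEADING.sub(_add_id, html)

print(add_heading_ids('<h1 class="article">Top Level</h1>'))
```

Requiring whitespace before `id` in the check keeps attributes such as `data-id` from being mistaken for an existing id.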

                Alan Kilborn @pbarney
                last edited by Mar 20, 2023, 8:33 PM

                @pbarney said in Is it possible to replace individual characters inside a regex capture group?:

                If I owned a software development business, I’d hire you as Chief RegEx Officer.

                I fully endorse this.

                The Community of users of the Notepad++ text editor.