    Is it possible to replace individual characters inside a regex capture group?

    7 Posts 4 Posters 1.2k Views
    • pbarney

      I’ve searched the forum for other solutions, but they don’t seem to answer the question I have in mind.

      I want to add id attributes to all HTML header elements (h1, h2, h3, etc.), using the content of each element as the basis for the id attribute: the text is made lowercase and its spaces are replaced with hyphens.

      For example, I simply want to do a replacement from this:

      <h1>Primary Article</h1>
      <h2>Subtopic</h2>
      <h1>A Different Approach to All This</h1>
      

      to this:

      <h1 id="primary-article">Primary Article</h1>
      <h2 id="subtopic">Subtopic</h2>
      <h1 id="a-different-approach-to-all-this">A Different Approach to All This</h1>
      

      This is the farthest I’ve gotten:

      Search: <h(\d)>(.*?)</h\d>
      Replace: <h\1 id="\L\2\E">\2</h\1>

      It does everything I need, except replace the spaces in the capture group with dashes. Is this even possible?

      If not, how would you approach this problem?

      • Terry R @pbarney

        @pbarney said in Is it possible to replace individual characters inside a regex capture group?:

        It does everything I need, except replace the spaces in the capture group with dashes. Is this even possible?

        It certainly is possible; just read the FAQ post “Generic Regular Expression (regex) Formulas” and the linked post “Replacing in a specific zone of text”.

        It is best to keep these two tasks separate; I would not even attempt to combine the two edits.

        Terry

        • pbarney @Terry R

          Thank you, @Terry-R. I’ll take a look at them now.

          For future searchers, here are links to the two posts Terry-R mentioned:

          • Generic Regular Expression (regex) Formulas
          • Replacing in a specific zone of text

          May I ask why you wouldn’t attempt to combine them?

          • Terry R @pbarney

            @pbarney
            I’m not on a PC at the moment, so I can’t do any testing, but I can’t think of a way to do it in one step. Even if it can be done, the regex would be horribly complicated and practically unsupportable if changes were required later on.

            Often the easy way is to do the task in several steps.

            Terry

            PS also consider creating a macro which can run several steps so in the end you are just running a single process. Details are referenced in the online manual and lots of posts on this forum.

            • guy038

              Hello, @pbarney, @terry-r and All,

              I agree with @terry-r’s statement:

              Often the easy way is to do the task in several steps

              Moreover, there is an objective reason why this cannot be done in one go! You first have to add all the id attributes globally, and only then can you replace, one by one, each space character with a dash character inside the id regions!


              So, given your INPUT text :

              <h1>Primary Article</h1>
              <h2>Subtopic</h2>
              <h1>A Different Approach to All This</h1>
              

              With this first regex S/R, we’ll add the id attributes, from the contents of each element, to any h# header :

              SEARCH (?x-si) ( ^ \x20* < h[1-6] ) (?= > ( .+? ) < )    OR    (?-si)(^\x20*<h[1-6])(?=>(.+?)<)

              REPLACE \1 id="\L\2"

              And you get this temporary text :

              <h1 id="primary article">Primary Article</h1>
              <h2 id="subtopic">Subtopic</h2>
              <h1 id="a different approach to all this">A Different Approach to All This</h1>
              

              Then, use this second regex S/R to replace any space char, in the id region, with a dash character :

              SEARCH (?x-si) (?: ^ \x20* < h[1-6] \x20 | (?! \A ) \G ) (?: (?! > ) . )*? \K \x20    OR    (?-si)(?:^\x20*<h[1-6]\x20|(?!\A)\G)(?:(?!>).)*?\K\x20

              REPLACE -

              And here is your expected OUTPUT text :

              <h1 id="primary-article">Primary Article</h1>
              <h2 id="subtopic">Subtopic</h2>
              <h1 id="a-different-approach-to-all-this">A Different Approach to All This</h1>
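For readers who want to reproduce the same two passes outside Notepad++, here is a minimal Python sketch. It is not a translation of the regexes above: Python's `re` module has no `\L` case conversion in replacements and no `\K` or `\G`, so the sketch swaps in replacement callbacks instead. The function names `add_ids` and `dash_ids` are made up for this illustration, and the sketch assumes headers without existing attributes, as in the INPUT text.

```python
import re

# Pass 1 pattern: the header tag name, with the element content captured
# inside a lookahead so that only "<hN" itself is consumed by the match.
HEADER_OPEN = re.compile(r'^( *<h[1-6])(?=>(.+?)<)', re.MULTILINE)

# Pass 2 pattern: the id value that pass 1 has just inserted.
ID_VALUE = re.compile(r'(<h[1-6] id=")([^"]*)(")')


def add_ids(html: str) -> str:
    """Insert id="<lowercased content>" right after the tag name."""
    return HEADER_OPEN.sub(
        lambda m: f'{m.group(1)} id="{m.group(2).lower()}"', html
    )


def dash_ids(html: str) -> str:
    """Replace spaces with dashes, but only inside the id value."""
    return ID_VALUE.sub(
        lambda m: m.group(1) + m.group(2).replace(' ', '-') + m.group(3), html
    )


if __name__ == '__main__':
    text = (
        '<h1>Primary Article</h1>\n'
        '<h2>Subtopic</h2>\n'
        '<h1>A Different Approach to All This</h1>\n'
    )
    print(dash_ids(add_ids(text)))
```

Because a callback can run arbitrary code on each match, the two passes could also be merged into one in Python; they are kept separate here only to mirror the two S/R steps.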
              

              Best Regards,

              guy038

              • pbarney @guy038

                @guy038, Thank you for that masterful example. I’ve seen a few of your posts before, and while the regex is mostly beyond me, I appreciate you offering workable examples.

                If I owned a software development business, I’d hire you as Chief RegEx Officer.

                While my constraints were small, future searchers who end up here might be interested in a regex that could handle <h?> elements that already have existing attributes…

                For example:
                <h1 class="article">Top Level</h1>
                yielding:
                <h1 id="top-level" class="article">Top Level</h1>.

                Would it be difficult to add that to your suggested offerings?
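One possible approach to the existing-attributes case, for searchers who can run a script instead of a Notepad++ S/R, is the hedged Python sketch below. It is not an answer from this thread and not a Notepad++ regex; the name `add_header_ids` is hypothetical. It assumes each header sits on one line, contains no nested tags, and should be left alone if it already has an id.

```python
import re

# Tag name, optional existing attributes, content, matching close tag.
HEADER = re.compile(r'<(h[1-6])((?:\s[^>]*)?)>(.*?)</\1>')


def add_header_ids(html: str) -> str:
    """Add id="slug" before any existing attributes of h1..h6 elements."""

    def repl(m: re.Match) -> str:
        tag, attrs, text = m.groups()
        # Assumption: a header that already carries an id is not touched.
        if re.search(r'\sid\s*=', attrs):
            return m.group(0)
        # Lowercase the content and turn spaces into dashes.
        slug = text.lower().replace(' ', '-')
        return f'<{tag} id="{slug}"{attrs}>{text}</{tag}>'

    return HEADER.sub(repl, html)


if __name__ == '__main__':
    print(add_header_ids('<h1 class="article">Top Level</h1>'))
```

The callback does the lowercasing and the space replacement in ordinary code, which is what a plain regex replacement string cannot do on part of a capture group.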

                • Alan Kilborn @pbarney

                  @pbarney said in Is it possible to replace individual characters inside a regex capture group?:

                  If I owned a software development business, I’d hire you as Chief RegEx Officer.

                  I fully endorse this.

                  The Community of users of the Notepad++ text editor.