
    replace words between tags with regular expression

    • Robin Cruise

      Good day. I have this HTML code:

      <My Tag>
      	<div class="searchField">
                  <div align="right">
      
                    <a href="/website-1.html"><img src="index_files/flag_lang_ro.jpg" title="ro" alt="ro" width="28" height="19" /></a>&nbsp; <a href="/fr/website-2.html"><img src="index_files/flag_lang_fr.jpg" title="fr" alt="fr" width="28" height="19" /></a>&nbsp; <a href="website-3.html"><img src="index_files/flag_lang_en.jpg" title="en" alt="en" width="28" height="19" /></a>&nbsp; <a href="/es/website-4.html"><img src="index_files/flag_lang_es.jpg" title="es" alt="es" width="28" height="19" /></a>&nbsp; <a href="/pt/website-5.html"><img src="index_files/flag_lang_pt.jpg" title="pt" alt="pt" width="28" height="19" /></a>&nbsp; <a href="/ar/website-6.html"><img src="index_files/flag_lang_ae.jpg" width="28" height="19" title="ar" alt="ar" /></a>&nbsp; <a href="/zh/website-7.html"><img src="index_files/flag_lang_zh.jpg" width="28" height="19" title="zh" alt="zh" /></a>&nbsp; <a href="/hi/website-8.html"><img src="index_files/flag_lang_hi.jpg" width="28" height="19" title="hi" alt="hi" /></a>&nbsp; <a href="/de/website-9.html"><img src="index_files/flag_lang_de.jpg" width="28" height="19" title="de" alt="de" /></a>&nbsp; <a href="/ru/website-10.html"><img src="index_files/flag_lang_ru.jpg" width="28" height="19" title="ru" alt="ru" /></a></div>
      
      <My Tag>
      

      I want to replace every <a href="/ with <a href="https://link.ca/ between <My Tag> and <My Tag>.

      My solution is below, but it is not very good:

      SEARCH: <My Tag>(?s)(.*)(<a href="/).*?(?s)<My Tag>

      REPLACE BY: <My Tag>\1<a href="https://link.ca/\2<My Tag>

      • carypt

        I see that your search expression does not take into account the EOL (“carriage return” and “new line”) non-printable characters (menu View > Show Symbol > Show All Characters).

        I can get the first part to match up to <a href="/, so my not-so-clever search would be: (<My Tag>\r\n)(.*\r\n)(.*\r\n)(\r\n)(.*)(<a href="/) and the replacement: $1$2$3$4$5<a href="https://link.ca/ It could be wrong, so use it with care.

        • Alan Kilborn @Robin Cruise

          @Robin-Cruise

          If your solution works, is there reason for concern about it?
          Or would you just like someone to comment on how it could be better?

          @carypt

          The issue has nothing to do with “end-of-lines”, except that in your proposed solution, you turned it into something that involved end-of-lines. Probably best to refrain from offering solutions if your solution is going to be off-track, or even more complicated than prior ones proposed.

          • carypt

            @Alan-Kilborn I don't know, Alan. In the regex tester plugin I couldn't get Robin's search expression to match, so I varied it until it worked. But it looks poor.

            • guy038

              Hello, @robin-cruise, @alan-kilborn, @carypt and All

              First, thanks for trying to find a regex solution by yourself !

              Now, let’s start with this sample, which contains two sections <My Tag>.....<My Tag> and one section <My Old Tag>...........<My Old Tag>

              <My Tag>
              	<div class="searchField">
                          <div align="right">
              
                            <a href="/website-1.html"><img src="index_files/flag_lang_ro.jpg" title="ro" alt="ro" width="28" height="19" /></a>&nbsp; <a href="/fr/website-2.html"><img src="index_files/flag_lang_fr.jpg" title="fr" alt="fr" width="28" height="19" /></a>&nbsp; <a href="website-3.html"><img src="index_files/flag_lang_en.jpg" title="en" alt="en" width="28" height="19" /></a>&nbsp; <a href="/es/website-4.html"><img src="index_files/flag_lang_es.jpg" title="es" alt="es" width="28" height="19" /></a>&nbsp; <a href="/pt/website-5.html"><img src="index_files/flag_lang_pt.jpg" title="pt" alt="pt" width="28" height="19" /></a>&nbsp; <a href="/ar/website-6.html"><img src="index_files/flag_lang_ae.jpg" width="28" height="19" title="ar" alt="ar" /></a>&nbsp; <a href="/zh/website-7.html"><img src="index_files/flag_lang_zh.jpg" width="28" height="19" title="zh" alt="zh" /></a>&nbsp; <a href="/hi/website-8.html"><img src="index_files/flag_lang_hi.jpg" width="28" height="19" title="hi" alt="hi" /></a>&nbsp; <a href="/de/website-9.html"><img src="index_files/flag_lang_de.jpg" width="28" height="19" title="de" alt="de" /></a>&nbsp; <a href="/ru/website-10.html"><img src="index_files/flag_lang_ru.jpg" width="28" height="19" title="ru" alt="ru" /></a></div>
              <My Tag>
              
              <My Old Tag>
              	<div class="searchField">
                          <div align="right">
              
                            <a href="/website-1.html"><img src="index_files/flag_lang_ro.jpg" title="ro" alt="ro" width="28" height="19" /></a>&nbsp; <a href="/fr/website-2.html"><img src="index_files/flag_lang_fr.jpg" title="fr" alt="fr" width="28" height="19" /></a>&nbsp; <a href="website-3.html"><img src="index_files/flag_lang_en.jpg" title="en" alt="en" width="28" height="19" /></a>&nbsp; <a href="/es/website-4.html"><img src="index_files/flag_lang_es.jpg" title="es" alt="es" width="28" height="19" /></a>&nbsp; <a href="/pt/website-5.html"><img src="index_files/flag_lang_pt.jpg" title="pt" alt="pt" width="28" height="19" /></a>&nbsp; <a href="/ar/website-6.html"><img src="index_files/flag_lang_ae.jpg" width="28" height="19" title="ar" alt="ar" /></a>&nbsp; <a href="/zh/website-7.html"><img src="index_files/flag_lang_zh.jpg" width="28" height="19" title="zh" alt="zh" /></a>&nbsp; <a href="/hi/website-8.html"><img src="index_files/flag_lang_hi.jpg" width="28" height="19" title="hi" alt="hi" /></a>&nbsp; <a href="/de/website-9.html"><img src="index_files/flag_lang_de.jpg" width="28" height="19" title="de" alt="de" /></a>&nbsp; <a href="/ru/website-10.html"><img src="index_files/flag_lang_ru.jpg" width="28" height="19" title="ru" alt="ru" /></a></div>
              <My Old Tag>
              
              <My Tag>
              	<div class="searchField">
                          <div align="right">
              
                            <a href="/website-1.html"><img src="index_files/flag_lang_ro.jpg" title="ro" alt="ro" width="28" height="19" /></a>&nbsp; <a href="/fr/website-2.html"><img src="index_files/flag_lang_fr.jpg" title="fr" alt="fr" width="28" height="19" /></a>&nbsp; <a href="website-3.html"><img src="index_files/flag_lang_en.jpg" title="en" alt="en" width="28" height="19" /></a>&nbsp; <a href="/es/website-4.html"><img src="index_files/flag_lang_es.jpg" title="es" alt="es" width="28" height="19" /></a>&nbsp; <a href="/pt/website-5.html"><img src="index_files/flag_lang_pt.jpg" title="pt" alt="pt" width="28" height="19" /></a>&nbsp; <a href="/ar/website-6.html"><img src="index_files/flag_lang_ae.jpg" width="28" height="19" title="ar" alt="ar" /></a>&nbsp; <a href="/zh/website-7.html"><img src="index_files/flag_lang_zh.jpg" width="28" height="19" title="zh" alt="zh" /></a>&nbsp; <a href="/hi/website-8.html"><img src="index_files/flag_lang_hi.jpg" width="28" height="19" title="hi" alt="hi" /></a>&nbsp; <a href="/de/website-9.html"><img src="index_files/flag_lang_de.jpg" width="28" height="19" title="de" alt="de" /></a>&nbsp; <a href="/ru/website-10.html"><img src="index_files/flag_lang_ru.jpg" width="28" height="19" title="ru" alt="ru" /></a></div>
              <My Tag>
              

              BTW, shouldn't the closing tags be </My Tag> and </My Old Tag> ? This does not matter for the rest of this post, though !

              Your search regex <My Tag>(?s)(.*)(<a href="/).*?(?s)<My Tag> does not work as expected for several reasons :

              • First, it matches the complete range from the first <My Tag> string to the last <My Tag> string, so all 3 sections together, instead of one section only. To correct this behaviour, just add a ? right after the first .*, in order to match the smallest range of chars instead of the greatest !

              => <My Tag>(?s)(.*?)(<a href="/).*?(?s)<My Tag>

              • Secondly, I suppose that you wrongly defined your group 2. Indeed, I think that it’s the range between <a href="/ and the closing tag <My Tag> which should be stored as group 2. Note also that the second (?s) modifier is useless, as this mode is already set ! And it is better to place the first (?s) at the beginning of the regex, for readability !

              So, your regex S/R is, now :

              SEARCH (?s)<My Tag>(.*?)<a href="/(.*?)<My Tag>

              REPLACE <My Tag>\1<a href="https://link.ca/\2<My Tag>

              If we run this regex S/R against our sample text, we notice that only the first link, <a href="/website-1.html">, of each good section <My Tag>......<My Tag> is changed. After several successive clicks on the Replace All button, all the links of these sections are replaced, but if you keep going, the <a href="/website...."> parts of the wrong <My Old Tag>.......<My Old Tag> section are modified too :-((
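The behaviour described above can be reproduced outside Notepad++. The following is only a sketch: it uses Python's `re` module instead of the Boost engine behind Notepad++, a shortened stand-in sample (`a.html` to `d.html` are made-up names), and treats each `sub` call as one click on Replace All.

```python
import re

# Shortened stand-in for the sample: two <My Tag> sections around one <My Old Tag> section.
sample = (
    '<My Tag>\n<a href="/a.html"> <a href="/b.html">\n<My Tag>\n'
    '<My Old Tag>\n<a href="/c.html">\n<My Old Tag>\n'
    '<My Tag>\n<a href="/d.html">\n<My Tag>\n'
)

# Same regex S/R as above; re.DOTALL plays the role of the (?s) modifier.
pattern = re.compile(r'<My Tag>(.*?)<a href="/(.*?)<My Tag>', re.DOTALL)
replacement = r'<My Tag>\1<a href="https://link.ca/\2<My Tag>'

first = pattern.sub(replacement, sample)   # one "Replace All": first link of each good section
second = pattern.sub(replacement, first)   # second pass: the remaining link of section 1
third = pattern.sub(replacement, second)   # third pass: leaks into the <My Old Tag> section

for label, text in (("pass 1", first), ("pass 2", second), ("pass 3", third)):
    print(label, text.count("https://link.ca/"), '<a href="/c.html">' in text)
```

On the third pass the lazy `.*?` finds no link left in the first section, so it runs past that section's closing tag and picks up the link inside `<My Old Tag>`.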


              So we cannot go on this way ! Globally, the correct scheme is :

              • To search, first, for a <My Tag> string

              • To catch any range of any characters till the nearest string <a href="/

              • Do the appropriate replacement

              • Re-start the search, immediately from the next character, with the \G assertion and…

              • Search, again, for any range of any characters till the nearest string <a href="/

              • Do the appropriate replacement

              And so on…

              However, so that no replacement occurs inside a wrong section such as <My Old Tag>.......<My Old Tag>, we must add one condition : no < symbol at the beginning of a line may occur anywhere in the range of chars which precedes the string <a href="/ ! This can be achieved with the regex ((?!^<).)+?
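A quick way to see what this guarded range does is to try it on two small strings. This is a sketch with Python's `re` module, not the Notepad++ engine: Python needs `re.MULTILINE` so that `^` matches at every line start, and the two test strings are invented for the illustration.

```python
import re

# (?!^<). consumes one char unless that char is a "<" sitting at the start of a line.
guarded = re.compile(r'<My Tag>(?:(?!^<).)+?<a href="/', re.DOTALL | re.MULTILINE)

# The link is reached without meeting a "<" at the start of a line: match.
inside = '<My Tag>\n  <div>\n    <a href="/x.html">'

# A line starting with "<" sits between every <My Tag> and the link: no match.
crossing = '<My Tag>\n  <div>\n<My Tag>\n<My Old Tag>\n    <a href="/y.html">'

print(bool(guarded.search(inside)))    # True
print(bool(guarded.search(crossing)))  # False
```

Note that the indented `<div>` is consumed without trouble, because its `<` is not the first character of its line.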

              Now, the initial string to change is <a href="/ and the final string should be <a href="https://link.ca/

              This is strictly equivalent to saying that the empty location between the " character and the / symbol must be replaced with the string https://link.ca. This empty location can be matched thanks to the \K syntax

              Finally, my regex S/R, using the free-spacing mode (?x) is :

              SEARCH (?xs) ( <My[ ]Tag> | \G ) ((?!^<).)+? <a[ ]href=" \K (?=/)

              REPLACE https://link.ca

              Note that, because of the \K syntax, you must use the Replace All button exclusively (not the single Replace button)

              All replacements are done in all the good sections <My Tag>......<My Tag>, giving the expected text :

              <My Tag>
              	<div class="searchField">
                          <div align="right">
              
                            <a href="https://link.ca/website-1.html"><img src="index_files/flag_lang_ro.jpg" title="ro" alt="ro" width="28" height="19" /></a>&nbsp; <a href="https://link.ca/fr/website-2.html"><img src="index_files/flag_lang_fr.jpg" title="fr" alt="fr" width="28" height="19" /></a>&nbsp; <a href="website-3.html"><img src="index_files/flag_lang_en.jpg" title="en" alt="en" width="28" height="19" /></a>&nbsp; <a href="https://link.ca/es/website-4.html"><img src="index_files/flag_lang_es.jpg" title="es" alt="es" width="28" height="19" /></a>&nbsp; <a href="https://link.ca/pt/website-5.html"><img src="index_files/flag_lang_pt.jpg" title="pt" alt="pt" width="28" height="19" /></a>&nbsp; <a href="https://link.ca/ar/website-6.html"><img src="index_files/flag_lang_ae.jpg" width="28" height="19" title="ar" alt="ar" /></a>&nbsp; <a href="https://link.ca/zh/website-7.html"><img src="index_files/flag_lang_zh.jpg" width="28" height="19" title="zh" alt="zh" /></a>&nbsp; <a href="https://link.ca/hi/website-8.html"><img src="index_files/flag_lang_hi.jpg" width="28" height="19" title="hi" alt="hi" /></a>&nbsp; <a href="https://link.ca/de/website-9.html"><img src="index_files/flag_lang_de.jpg" width="28" height="19" title="de" alt="de" /></a>&nbsp; <a href="https://link.ca/ru/website-10.html"><img src="index_files/flag_lang_ru.jpg" width="28" height="19" title="ru" alt="ru" /></a></div>
              <My Tag>
              
              <My Old Tag>
              	<div class="searchField">
                          <div align="right">
              
                            <a href="/website-1.html"><img src="index_files/flag_lang_ro.jpg" title="ro" alt="ro" width="28" height="19" /></a>&nbsp; <a href="/fr/website-2.html"><img src="index_files/flag_lang_fr.jpg" title="fr" alt="fr" width="28" height="19" /></a>&nbsp; <a href="website-3.html"><img src="index_files/flag_lang_en.jpg" title="en" alt="en" width="28" height="19" /></a>&nbsp; <a href="/es/website-4.html"><img src="index_files/flag_lang_es.jpg" title="es" alt="es" width="28" height="19" /></a>&nbsp; <a href="/pt/website-5.html"><img src="index_files/flag_lang_pt.jpg" title="pt" alt="pt" width="28" height="19" /></a>&nbsp; <a href="/ar/website-6.html"><img src="index_files/flag_lang_ae.jpg" width="28" height="19" title="ar" alt="ar" /></a>&nbsp; <a href="/zh/website-7.html"><img src="index_files/flag_lang_zh.jpg" width="28" height="19" title="zh" alt="zh" /></a>&nbsp; <a href="/hi/website-8.html"><img src="index_files/flag_lang_hi.jpg" width="28" height="19" title="hi" alt="hi" /></a>&nbsp; <a href="/de/website-9.html"><img src="index_files/flag_lang_de.jpg" width="28" height="19" title="de" alt="de" /></a>&nbsp; <a href="/ru/website-10.html"><img src="index_files/flag_lang_ru.jpg" width="28" height="19" title="ru" alt="ru" /></a></div>
              <My Old Tag>
              
              <My Tag>
              	<div class="searchField">
                          <div align="right">
              
                            <a href="https://link.ca/website-1.html"><img src="index_files/flag_lang_ro.jpg" title="ro" alt="ro" width="28" height="19" /></a>&nbsp; <a href="https://link.ca/fr/website-2.html"><img src="index_files/flag_lang_fr.jpg" title="fr" alt="fr" width="28" height="19" /></a>&nbsp; <a href="website-3.html"><img src="index_files/flag_lang_en.jpg" title="en" alt="en" width="28" height="19" /></a>&nbsp; <a href="https://link.ca/es/website-4.html"><img src="index_files/flag_lang_es.jpg" title="es" alt="es" width="28" height="19" /></a>&nbsp; <a href="https://link.ca/pt/website-5.html"><img src="index_files/flag_lang_pt.jpg" title="pt" alt="pt" width="28" height="19" /></a>&nbsp; <a href="https://link.ca/ar/website-6.html"><img src="index_files/flag_lang_ae.jpg" width="28" height="19" title="ar" alt="ar" /></a>&nbsp; <a href="https://link.ca/zh/website-7.html"><img src="index_files/flag_lang_zh.jpg" width="28" height="19" title="zh" alt="zh" /></a>&nbsp; <a href="https://link.ca/hi/website-8.html"><img src="index_files/flag_lang_hi.jpg" width="28" height="19" title="hi" alt="hi" /></a>&nbsp; <a href="https://link.ca/de/website-9.html"><img src="index_files/flag_lang_de.jpg" width="28" height="19" title="de" alt="de" /></a>&nbsp; <a href="https://link.ca/ru/website-10.html"><img src="index_files/flag_lang_ru.jpg" width="28" height="19" title="ru" alt="ru" /></a></div>
              <My Tag>
              

              Notes :

              • Without the free-spacing mode the search regex becomes (?s)(<My Tag>|\G)((?!^<).)+?<a href="\K(?=/)

              • The first alternative <My Tag> occurs first, and once only, per section and the second alternative \G all the other times

              • As said above, the part ((?!^<).)+? represents the smallest range of any chars, which does not contain a < symbol at beginning of a line, till… the string <a href="

              • Then, the \K syntax resets the start of the match at the current location, so everything matched so far is discarded from the final match

              • As the remainder of the regex is only the look-ahead (?=/), the final match is this empty location, and only IF it is followed with a / symbol. It is simply replaced with the string https://link.ca
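For readers who want the same result from a script: Python's built-in `re` module has neither `\G` nor `\K`, so the sketch below swaps in a different technique. It first isolates each `<My Tag>...<My Tag>` pair, then inserts the base URL at the empty location between `<a href="` and `/` with a look-behind. The names `SECTION`, `SLOT` and `add_base` are made up for this example, and it assumes the tags always come in pairs.

```python
import re

# One whole section, from a <My Tag> to the nearest following <My Tag>.
SECTION = re.compile(r"<My Tag>.*?<My Tag>", re.DOTALL)
# Zero-width location right after <a href=" and right before a slash.
SLOT = re.compile(r'(?<=<a href=")(?=/)')


def add_base(text, base="https://link.ca"):
    """Insert base in front of root-relative hrefs, inside <My Tag> sections only."""
    return SECTION.sub(lambda m: SLOT.sub(lambda _: base, m.group(0)), text)


sample = (
    '<My Tag>\n<a href="/a.html"> <a href="b.html">\n<My Tag>\n'
    '<My Old Tag>\n<a href="/c.html">\n<My Old Tag>\n'
    '<My Tag>\n<a href="/d.html">\n<My Tag>\n'
)
result = add_base(sample)
print(result)
```

Unlike the single-pass regex above, this version relies on the closing `<My Tag>` to delimit a section rather than on the "no `<` at the start of a line" rule.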


              You'll probably notice that the part :

              <a href="website-3.html"><img src="index_files/flag_lang_en.jpg" title="en" alt="en" width="28" height="19" /></a>
              

              has not been changed ! But it’s quite logical, because it begins with <a href="website-3.html"> and not with <a href="/website-3.html"> !

              Best Regards,

              guy038

              • carypt

                Sorry, I must be blind.

                • carypt

                  I see the search expression <My Tag>.*?\K|(<a href="/)* matching all the <a href="/ strings in the Mark dialog, but it does not work in the Search/Replace dialog (giving totally different behavior). Also, @guy038's search expression isn't marking anything in the Mark dialog. Why is that? Is there no way to check the matching without trying blindly?

                  I would also say that the regex trainer plugin isn't working correctly, so I will leave it out. As an excuse for my earlier post: I did not read the original text well and overlooked many search matches.

                  And I want to thank @guy038 for his detailed explanations. :)

                  • Robin Cruise

                    @guy038 said in replace words between tags with regular expression:

                    https://link.ca

                    works fine, thank you @guy038

                    Another case. Suppose instead of <My Tag> … <My Tag> I have a comment such as <!-- BEGIN --> … <!-- BEGIN -->

                    I changed your regex a little, but I believe I made a mistake.

                    SEARCH: (?xs) (<\!-- BEGIN --> | \G ) ((?!^<).)+? <a[ ]href=" \K (?=/)

                    REPLACE: https://link.ca

                    what did I do wrong here?

                    • guy038

                      Hi, @robin-cruise and All,

                      Ah…, I immediately understood the problem !

                      When you use the free-spacing mode, with a leading (?x) modifier or with a (?x....) syntax like, for instance, (?xs-i) :

                      • Any ordinary space character is ignored and is not part of the overall regex

                      • Any text located after the first # character of a line is considered a comment and is not part of the overall regex either

                      Thus, you must respect two rules :

                      • Any literal space character to search for must be written with one of the three syntaxes below :

                        • A backslash char \ right before that specific space char

                        • The [ ] syntax, that is to say a space char between square brackets, i.e. a character class

                        • The escape syntaxes \x20 or \x{20} or \x{0020}

                      • Any literal sharp character # to search for must be written with one of the three syntaxes below :

                        • A backslash char \ right before that specific # char ( => \# )

                        • The [#] syntax, that is to say a sharp char between square brackets, i.e. a character class

                        • The escape syntaxes \x23 or \x{23} or \x{0023}

                      For instance, let’s imagine that you want to match three space chars, surrounded by # characters, with a regex expression, you have the choice between all these syntaxes :

                      
                      - WITHOUT the FREE-SPACING mode :
                      
                      #   #
                      
                      #\x20{3}#
                      
                      #[ ][ ][ ]#
                      
                      - WITH the FREE-SPACING mode :
                      
                      (?x)  \x23\ \ \ \x23      # ESCAPED SPACE char and HEXADECIMAL ESCAPE of # 
                      
                      (?x)  \#[ ][ ][ ]\#       # ESCAPED SHARP char and SPACE in a CHARACTER CLASS
                      
                      (?x)  [#]\x20\x20{2}[#]   # SHARP char in a CHARACTER CLASS and HEXADECIMAL ESCAPE of SPACE chars
                      ...
                      ...
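The six spellings listed above can be checked mechanically. This sketch uses Python's `re` module, whose `(?x)` mode follows the same rules for spaces and `#` as described here; it is not the Notepad++ engine, and the brace forms `\x{20}` / `\x{23}` are left out because Python does not accept them.

```python
import re

target = "#   #"  # three spaces between two # characters

patterns = [
    # Without the free-spacing mode
    r"#   #",
    r"#\x20{3}#",
    r"#[ ][ ][ ]#",
    # With the free-spacing mode
    r"(?x)  \x23\ \ \ \x23      # escaped spaces, hex escape for the # char",
    r"(?x)  \#[ ][ ][ ]\#       # escaped # char, spaces in a character class",
    r"(?x)  [#]\x20\x20{2}[#]   # # char in a character class, hex escapes for spaces",
]
results = [bool(re.fullmatch(p, target)) for p in patterns]
print(results)

# Counter-example: with (?x), a bare # starts a comment, so this pattern is empty.
naive = re.fullmatch(r"(?x)#   #", target)
print(naive)
```

All six patterns match the target, while the naive free-spacing version does not, because everything from its first `#` onwards is read as a comment.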
                      

                      Now, let’s go back to your new regex. I suppose that you’ve already guessed the problem ;-)) Yes, it is because of the space characters which surround the word BEGIN ! Note also that the ! char is not a special regex character on its own, so the backslash in front of it is not needed. So, the correct regex should be expressed as :

                      • (?xs) (<!--\ BEGIN\ --> | \G ) ((?!^<).)+? <a[ ]href=" \K (?=/)

                      Or

                      • (?xs) (<!--[ ]BEGIN[ ]--> | \G ) ((?!^<).)+? <a[ ]href=" \K (?=/)

                      Or

                      • (?xs) (<!--\x20BEGIN\x20--> | \G ) ((?!^<).)+? <a[ ]href=" \K (?=/)

                      And, without the free-spacing mode :

                      • (?s)(<!-- BEGIN -->|\G)((?!^<).)+?<a href="\K(?=/)
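The effect of the unprotected spaces is easy to observe on the comment marker alone. Again a sketch with Python's `re` module and its `re.VERBOSE` flag, which is assumed here to behave like the `(?x)` mode of Notepad++ for this purpose; only the `<!-- BEGIN -->` part of the regex is tested.

```python
import re

comment = "<!-- BEGIN -->"

# Plain spaces are dropped in free-spacing mode, so this really means <!--BEGIN-->
broken = re.compile(r"<!-- BEGIN -->", re.VERBOSE)

# The three protected spellings from the corrected regexes
with_escape = re.compile(r"<!--\ BEGIN\ -->", re.VERBOSE)
with_class = re.compile(r"<!--[ ]BEGIN[ ]-->", re.VERBOSE)
with_hex = re.compile(r"<!--\x20BEGIN\x20-->", re.VERBOSE)

print(broken.search(comment))              # None
print(bool(broken.search("<!--BEGIN-->"))) # True
print(all(p.search(comment) for p in (with_escape, with_class, with_hex)))  # True
```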

                      You may even use this layout, where I’m using the free-spacing mode with numerous comments and non-capturing groups :

                      (?xs-i)                    # FREE-SPACING mode - DOT means ANY char, even EOL chars - Search SENSITIVE to CASE ( NON-INSENSITIVE ! )
                      (?:                        # FIRST NON-CAPTURING group to DEFINE a group of ALTERNATIVES
                        <!--[ ]BEGIN[ ]-->       #   FIRST alternative : the string <!-- BEGIN --> with this EXACT case
                      |                          # The ALTERNATION regex symbol
                        \G                       #   SECOND alternative : the \G assertion which forces that the NEXT match begins RIGHT AFTER the PREVIOUS one
                      )                          # END of the FIRST NON-CAPTURING group
                      (?:                        # SECOND NON-CAPTURING group to define a SINGLE REPEATED char
                        (?!^<).                  #   ANY char, even an EOL char, IF this char is NOT an OPENING ANGLE bracket at BEGINNING of a line
                                                 #    ...That is to say that the regex engine MUST NOT enter a NEW section while doing a MATCH ATTEMPT
                      )+?                        # END of the SECOND NON-CAPTURING group, REPEATED from 1 to MORE, the MINIMUM of times till...
                      <a[ ]href="                # The LITERAL string <a href="
                      \K                         # RESETS the START of the match => the PRESENT match is ONLY the EMPTY string...
                      (?=/)                      # IF FOLLOWED with a SLASH character
                      

                      Now, @robin-cruise, follow these steps :

                      • Select all the text of the regex above, from (?xs-i) to the end of its last comment line

                      • Open the Replace dialog ( Ctrl + H )

                      • Type in https://link.ca in the Replace with field

                      • Select the Regular expression search mode

                      • Click on the Replace All button

                      Here you are ;-))

                      Cheers,

                      guy038

                      • Robin Cruise

                        thank you !

                        The Community of users of the Notepad++ text editor.