    Regex: Add html tags in the lines that doesn't have html tags

    • Robin Cruise

      OK, so I believe I took a step forward. This seems to work.

      FIND: ^(?!<p class="mb-40px">)(([a-zA-Z-].+))((?!</p>).)*$

      REPLACE BY: <p class="mb-40px">\2</p>
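
      The behaviour of this FIND/REPLACE pair can be checked outside Notepad++. Below is a minimal sketch using Python's re module, which is an assumption of this example (Notepad++ uses the Boost regex engine, and the two engines can differ in details). The names FIND, REPLACE and apply_first_attempt are illustrative only.

```python
import re

# The FIND/REPLACE pair copied as posted; re.MULTILINE makes ^ and $ work per line.
FIND = re.compile(r'^(?!<p class="mb-40px">)(([a-zA-Z-].+))((?!</p>).)*$', re.MULTILINE)
REPLACE = r'<p class="mb-40px">\2</p>'


def apply_first_attempt(text):
    """Run the first-attempt search/replace over every line of text."""
    return FIND.sub(REPLACE, text)


sample = "\n".join([
    'I need someone to take me home.',                      # no tag at all
    'I need someone to take me home.</p>',                  # closing tag only
    '<p class="mb-40px">I need someone to take me home.',   # opening tag only
])

result = apply_first_attempt(sample).split("\n")
for line in result:
    print(line)
```

      In this sketch the untagged line is wrapped as intended, but the line that already ends with </p> receives a second </p> (the greedy .+ swallows the existing one), and the line that has only the opening tag is skipped by the first look-ahead.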

      Now, I have to integrate this regex between section:
      <!-- START --> and <!-- FINAL -->

      I will use this generic formula:

      (?s)(?-i:REGION-START.+?">|\G(?!^))((?!REGION-FINAL).)*?\KFIND REGEX

      will become:

      FIND: (?s)(?-i:<\!-- START -->.+?">|\G(?!^))((?!<\!-- FINAL -->).)*?\K^(?!<p class="mb-40px">)(([a-zA-Z-].+))((?!</p>).)*$

      REPLACE: <p class="mb-40px">\2</p>

      In this case, the result is not very good : something does not work well in this final regex. Maybe @guy038 has a better idea.

      • guy038

        Hello @robin-cruise and All,

        No need to use the generic formula !

        Here is my general method :

        • From the beginning of the current line, I look for a line which :

          • Does not contain the string <!-- START -->, at any position
            AND
          • Does not contain the string <!-- FINAL -->, at any position
            AND
            (
          • Does not contain the tag <p class="mb-40px">, at any position
            OR
          • Does not contain the tag </p>, at any position
            )
        • Then I select all characters, of current line, which come :

          • After a possible <p class="mb-40px"> tag

          • Before a possible </p> tag


        So, given this INPUT text, below, with 3 lines to change :

        <!-- START -->
        
        <p class="mb-40px">I may go to cinema</p>
        
        I need someone to take me home.
        
        <p class="mb-40px">I may go to cinema</p>
        
        I need someone to take me home.</p>
        
        <p class="mb-40px">I may go to cinema</p>
        
        <p class="mb-40px">I need someone to take me home.
        
        <p class="mb-40px">I can love you now</p>
        
        <!-- FINAL -->
        

        I use the following regex S/R :

        SEARCH (?-is)^(?!.*<!-- START -->)(?!.*<!-- FINAL -->)(?:(?!.*<p class)|(?!.*</p>))(?:<p class="mb-40px">)?(?|(.+)</p>|(.+))

        REPLACE <p class="mb-40px">\1</p>

        And, after a click on the Replace All button, I get the expected OUTPUT text :

        <!-- START -->
        
        <p class="mb-40px">I may go to cinema</p>
        
        <p class="mb-40px">I need someone to take me home.</p>
        
        <p class="mb-40px">I may go to cinema</p>
        
        <p class="mb-40px">I need someone to take me home.</p>
        
        <p class="mb-40px">I may go to cinema</p>
        
        <p class="mb-40px">I need someone to take me home.</p>
        
        <p class="mb-40px">I can love you now</p>
        
        <!-- FINAL -->
        

        Notes :

        • First, after the usual modifiers, two negative look-aheads reject the boundary lines, which must not be matched : (?!.*<!-- START -->)(?!.*<!-- FINAL -->)

        • Then, a non-capturing group with an alternation requires that at least one of the two tags is missing from the current line : (?:(?!.*<p class)|(?!.*</p>))

        • Now, after a possible (?:<p class="mb-40px">)?, in a non-capturing group too, the regex selects either :

          • All chars before the </p> tag
            OR
          • All remaining chars of current line

        Remark :

        • Note the special syntax of this non-capturing group (?|(.+)</p>|(.+)), known as a branch reset group. It gives the groups of each alternative the same number. Thus, you just need the <p class="mb-40px">\1</p> syntax in the replacement part

        • If I had used a normal non-capturing group (?:(.+)</p>|(.+)), two groups, 1 and 2, would have been defined ! So the correct replacement regex would have been <p class="mb-40px">\1\2</p>, as these two groups are mutually exclusive !
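
        Python's re module has no branch reset group, so the variant described in the remark above (two ordinary groups and a \1\2 replacement) is the one that can be tried there. The following is a minimal sketch, not the method used in Notepad++ itself. It assumes Python 3.5 or later, where an unmatched group expands to an empty string in the replacement. The names SEARCH, REPLACE, add_missing_tags, INPUT and EXPECTED are illustrative, and the leading (?-is) is dropped because case-sensitive matching and a dot that stops at line ends are already Python's defaults.

```python
import re

# guy038's search regex with a plain non-capturing group instead of (?|...).
SEARCH = re.compile(
    r'^(?!.*<!-- START -->)(?!.*<!-- FINAL -->)'   # skip the two boundary lines
    r'(?:(?!.*<p class)|(?!.*</p>))'               # at least one tag must be missing
    r'(?:<p class="mb-40px">)?'                    # optional opening tag, not captured
    r'(?:(.+)</p>|(.+))',                          # group 1 OR group 2 holds the text
    re.MULTILINE,
)
# Groups 1 and 2 are mutually exclusive, so \1\2 always yields the line text.
REPLACE = r'<p class="mb-40px">\1\2</p>'


def add_missing_tags(text):
    """Complete the <p class="mb-40px">...</p> wrapper on lines that lack part of it."""
    return SEARCH.sub(REPLACE, text)


GOOD_1 = '<p class="mb-40px">I may go to cinema</p>'
GOOD_2 = '<p class="mb-40px">I need someone to take me home.</p>'
GOOD_3 = '<p class="mb-40px">I can love you now</p>'

INPUT = "\n\n".join([
    '<!-- START -->',
    GOOD_1,
    'I need someone to take me home.',
    GOOD_1,
    'I need someone to take me home.</p>',
    GOOD_1,
    '<p class="mb-40px">I need someone to take me home.',
    GOOD_3,
    '<!-- FINAL -->',
])

EXPECTED = "\n\n".join([
    '<!-- START -->',
    GOOD_1, GOOD_2, GOOD_1, GOOD_2, GOOD_1, GOOD_2, GOOD_3,
    '<!-- FINAL -->',
])

print(add_missing_tags(INPUT) == EXPECTED)
```

        The sample reproduces the INPUT and OUTPUT texts of this post, so the comparison is a quick way to confirm that the two-group variant behaves like the branch reset version on this data.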

        Best Regards,

        guy038

        • Robin Cruise @guy038

          @guy038 super, thanks.

          what should be the generic regex in this case? (because I cannot figure the last part )

          (?-is)^(?!.*REGION-START)(?!.*REGION-FINAL)(?:(?!.*<p class)|(?!.*</p>))(?:<p class="mb-40px">)?(?|(.+)</p>|(.+))

          • guy038

            Hi, @robin-cruise,

            You cannot use the generic regex, discussed in the topic :

            https://community.notepad-plus-plus.org/topic/22690/generic-regex-replacing-in-a-specific-zone-of-text

            to reach your present goal. Why ?


            Well, because that generic regex supposes :

            • First, to match the BSR ( Begin Search Region ), followed by any range of chars, possibly empty, which does not contain the ESR ( End Search Region ) and, after the \K syntax, to match the FR ( Find Regex )

            • Then, to match, from the current caret position ( the end of the previous match, thanks to \G ), any range of chars, possibly empty, which does not contain the ESR and, after the \K syntax, to match the FR


            But, in your present case, the INPUT lines to modify, like I need someone to take me home., do not contain the BSR and/or the ESR region. So, how do you expect the search regex to find these absent regions ??

            Best Regards,

            guy038

            • Robin Cruise @guy038

              @guy038

              SEARCH: (?-is)^(?!.*<!-- START -->)(?!.*<!-- FINAL -->)(?:(?!.*<p class)|(?!.*</p>))(?:<p class="mb-40px">)?(?|(.+)</p>|(.+))

              REPLACE: <p class="mb-40px">\1</p>

              Your regex seems to be very good, except for one thing : if my HTML pages also contain the code below, those lines will be changed too.

              So, I need to change only the lines between <!-- START --> and <!-- FINAL -->

              <html lang="en">
              <head>
                <!-- Meta Tags -->
                <meta charset="utf-8"/>
              
              <script type="application/ld+json">
              {
                "@context": "https://schema.org/", 
                "@type": "Product", 
                "name": "10 media farces of big days",
                "image": "icon.jpg",
                "description": "horses of Letea Delta Danube successfully saved,",
                "brand": {
                  "@type": "Brand",
                  "name": "something"
                },
                "sku": "NFL",
                "gtin8": "NFL",
                "offers": {
                  "@type": "Offer",
                  "url": "https://something.html",
                  "priceCurrency": "RON",
                  "price": "0",
                  "priceValidUntil": "2022-02-15",
                  "availability": "https://schema.org/OnlineOnly"
                },
                "aggregateRating": {
                  "@type": "AggregateRating",
                  "ratingValue": "5",
                  "bestRating": "5",
                  "ratingCount": "6"
                },
                "review": {
                  "@type": "Review",
                  "reviewRating": {
                    "@type": "Rating",
                    "ratingValue": "5",
                    "bestRating": "5"
                  },
                  "author": {"@type": "Person", "name": "omehing"},
                  "publisher": {"@type": "Organization", "name": "omehing"}
                }
              }
              </script>
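
              The side effect described here can be reproduced on a few of these lines. This is a minimal sketch using Python's re module, which is an assumption of the example (the thread itself uses Notepad++). Since Python has no branch reset group, the two-group form with a \1\2 replacement is used, and the names SEARCH, outside_lines and changed are illustrative.

```python
import re

# Python adaptation of the line-based search regex (two groups instead of (?|...)).
SEARCH = re.compile(
    r'^(?!.*<!-- START -->)(?!.*<!-- FINAL -->)'
    r'(?:(?!.*<p class)|(?!.*</p>))'
    r'(?:<p class="mb-40px">)?'
    r'(?:(.+)</p>|(.+))',
    re.MULTILINE,
)

# Four lines taken from the head / JSON-LD part, which lies outside the START..FINAL zone.
outside_lines = '<head>\n{\n  "sku": "NFL",\n}'

changed = SEARCH.sub(r'<p class="mb-40px">\1\2</p>', outside_lines).split("\n")
for line in changed:
    print(line)
```

              Every non-empty line is wrapped, because the regex only tests the content of each line and has no notion of being inside or outside the zone.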
              
              • guy038

                Hi, @robin-cruise,

                Once and for all, Robin, please post a complete and exact file, which represents all the data that you need to change !

                We cannot keep working this way, in the future, if you do not provide real examples, because a regex depends closely on the real text !

                BR

                guy038

                • Robin Cruise @guy038

                  @guy038

                  Yes, but I cannot copy/paste the entire HTML page : it is a very large file.

                  • guy038

                    Hi, @Robin-cruise

                    If you don’t mind, just send me your file by e-mail !

                    Here is my temporary mail address :

                    BR

                    guy038

                    • guy038

                      Hello @robin-cruise and All,

                      Ah… OK. Thanks for your attached HTML file with your mail. It’s always easier with a real example ;-))

                      Now, as you just have one <!-- ARTICOL START -->.......<!-- ARTICOL FINAL --> zone in your HTML file, the simple thing to do is :


                      • In search, to look for :

                        • Any char from the very start of file till the complete <!-- ARTICOL START --> line
                      • OR

                        • Any char from the <!-- ARTICOL FINAL --> line till the very end of your file
                      • OR ( Scan of lines between the <!-- ARTICOL START --> and <!-- ARTICOL FINAL --> boundaries )

                        • A possible <p class="mb-40px"> tag, beginning the current line

                        • Followed with a single-line range of characters :

                          • Till a </p> tag, ending the current line
                        • OR

                          • Till the end of current line
                      • In replacement, to rewrite :

                        • ( If scan within the <!-- ARTICOL START -->.........<!-- ARTICOL FINAL --> zone, so when the group 2 is defined )

                          • First, the <p class="mb-40px"> tag, if absent in the INPUT file ( group 1 not defined )

                          • Then all the contents of current line ( $0 )

                          • And, finally, the </p> tag, if absent in the INPUT file ( group 3 not defined )

                      • OR

                        • The two ranges of chars, unchanged : the one before the <!-- ARTICOL START --> boundary, included, and the one from the <!-- ARTICOL FINAL --> boundary onwards ( which occur when the group 2 is not defined )

                      For instance, from this INPUT file, below :

                      <!DOCTYPE html>
                      ....
                      bla bla
                      ....
                      blah bla
                      
                      <!-- ARTICOL START -->
                      
                      <p class="mb-40px">I need someone to take me home.</p>
                      
                      I need someone to take me home.
                      
                      I need someone to take me home.</p>
                      
                      <p class="mb-40px">I need someone to take me home.
                      
                      <!-- ARTICOL FINAL -->
                      
                      bla bla
                      ....
                      blah bla
                      ....
                      </html>
                      

                      The following regex S/R :

                      SEARCH (?s-i)^.+<!-- ARTICOL START -->\R|<!-- ARTICOL FINAL -->.+|(?-s)^(<p class="mb-40px">)?(?|(.+)(</p>)|(.+))$

                      REPLACE ?2(?1:<p class="mb-40px">)$0(?3:</p>):$0

                      Should give you the expected results :

                      <!DOCTYPE html>
                      ....
                      bla bla
                      ....
                      blah bla
                      
                      <!-- ARTICOL START -->
                      
                      <p class="mb-40px">I need someone to take me home.</p>
                      
                      <p class="mb-40px">I need someone to take me home.</p>
                      
                      <p class="mb-40px">I need someone to take me home.</p>
                      
                      <p class="mb-40px">I need someone to take me home.</p>
                      
                      <!-- ARTICOL FINAL -->
                      
                      bla bla
                      ....
                      blah bla
                      ....
                      </html>
                      

                      The message Replace All: 6 occurrences were replaced is displayed in the status bar :

                      • One for the part between <!DOCTYPE html> and <!-- ARTICOL START -->
                      • One for each non-empty line between <!-- ARTICOL START --> and <!-- ARTICOL FINAL --> ( 4 lines )
                      • One for the part between <!-- ARTICOL FINAL --> and the very end of file

                      Note that this final solution does not need any look-ahead structure nor the \G syntax or other goodies !!
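
                      For readers who want to try the same idea outside Notepad++, here is a hedged Python sketch of this S/R. Python's re module supports neither \R, nor the branch reset group, nor conditional replacements like ?2(...), so this adaptation is an assumption-laden port rather than the regex above : it uses scoped (?s:...) flags ( Python 3.6 or later ), assumes \n line endings, numbers the inner groups 2, 3 and 4, and moves the conditional logic into a callback. The names PATTERN, fix_match and wrap_article_lines are illustrative.

```python
import re

OPEN_TAG = '<p class="mb-40px">'
CLOSE_TAG = '</p>'

PATTERN = re.compile(
    r'(?s:\A.+<!-- ARTICOL START -->\n)'               # start of file up to the START line
    r'|(?s:<!-- ARTICOL FINAL -->.+)'                  # FINAL line up to the end of file
    r'|^(<p class="mb-40px">)?(?:(.+)(</p>)|(.+))$',   # one non-empty line inside the zone
    re.MULTILINE,
)


def fix_match(m):
    """Return the text that replaces one match."""
    if m.group(2) is None and m.group(4) is None:
        return m.group(0)                        # outside the zone: rewritten unchanged
    prefix = '' if m.group(1) else OPEN_TAG      # add the opening tag only if absent
    suffix = '' if m.group(3) else CLOSE_TAG     # add the closing tag only if absent
    return prefix + m.group(0) + suffix


def wrap_article_lines(html):
    """Return (new_text, number_of_replacements) for one HTML document."""
    return PATTERN.subn(fix_match, html)


FULL = '<p class="mb-40px">I need someone to take me home.</p>'
HEAD = ['<!DOCTYPE html>', '....', 'bla bla', '', '<!-- ARTICOL START -->', '']
TAIL = ['<!-- ARTICOL FINAL -->', '', 'bla bla', '....', '</html>', '']

INPUT = "\n".join(HEAD + [
    FULL, '',
    'I need someone to take me home.', '',
    'I need someone to take me home.</p>', '',
    '<p class="mb-40px">I need someone to take me home.', '',
] + TAIL)
EXPECTED = "\n".join(HEAD + [FULL, ''] * 4 + TAIL)

fixed, count = wrap_article_lines(INPUT)
print(fixed == EXPECTED, count)
```

                      On this sample, the sketch counts one replacement for each range outside the zone and one for each non-empty line inside it, which mirrors the count reported in the status bar.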

                      Best Regards,

                      guy038

                      • Robin Cruise @guy038

                        @guy038 said in Regex: Add html tags in the lines that doesn't have html tags:

                        SEARCH (?s-i)^.+<!-- ARTICOL START -->\R|<!-- ARTICOL FINAL -->.+|(?-s)^(<p class="mb-40px">)?(?|(.+)(</p>)|(.+))$
                        REPLACE ?2(?1:<p class="mb-40px">)$0(?3:</p>):$0

                        great answer, thank you @guy038
