    Regex: Add html tags in the lines that doesn't have html tags

    • Robin Cruise

      I have the paragraph below. One of its lines, "I need someone to take me home.", has no HTML tags. I need to find this line (and no others) and wrap it in tags:

      <!-- START -->
      
      <p class="mb-40px">I may go to cinema</p>
      
      I need someone to take me home.
      
      <p class="mb-40px">I can love you now</p>
      
      <!-- FINAL -->
      

      OUTPUT:

      <!-- START -->
      
      <p class="mb-40px">I may go to cinema</p>
      
      <p class="mb-40px">I need someone to take me home.</p>
      
      <p class="mb-40px">I can love you now</p>
      
      <!-- FINAL -->
      

      I don’t know why my regex doesn’t work.

      FIND: ^(?!<p class="mb-40px">)(.*?)((?!</p>).)*$

      REPLACE: <p class="mb-40px">\2\</p>
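One way to see what this first attempt actually captures is to run the FIND pattern through Python's `re` module. This is only a sketch: it assumes Python's engine treats these constructs like Notepad++'s Boost engine does, and the helper name `inspect` is hypothetical. It suggests two problems: group 2 is a repeated group, so it keeps only the last character it matched, and the pattern also matches lines such as the comment boundaries.

```python
import re

# The FIND pattern from the post, unchanged.
FIRST_ATTEMPT = re.compile(r'^(?!<p class="mb-40px">)(.*?)((?!</p>).)*$', re.MULTILINE)

def inspect(line):
    """Return (group 1, group 2) for one line, or None when nothing matches."""
    m = FIRST_ATTEMPT.search(line)
    return None if m is None else (m.group(1), m.group(2))

untagged = inspect("I need someone to take me home.")
comment = inspect("<!-- START -->")
tagged = inspect('<p class="mb-40px">I may go to cinema</p>')

print(untagged)  # ('', '.')  group 1 is empty, group 2 holds only the last char
print(comment)   # ('', '>')  the boundary comment is matched as well
print(tagged)    # None       a line that starts with the tag is skipped
```

Since \2 holds a single character, a replacement built on it cannot reproduce the whole line.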

          • Robin Cruise

          OK, so I believe I took a step forward. This seems to work:

          FIND: ^(?!<p class="mb-40px">)(([a-zA-Z-].+))((?!</p>).)*$

          REPLACE BY: <p class="mb-40px">\2</p>

          Now, I have to integrate this regex between section:
          <!-- START --> and <!-- FINAL -->

          I will use this generic formula:

          (?s)(?-i:REGION-START.+?">|\G(?!^))((?!REGION-FINAL).)*?\KFIND REGEX

          will become:

          FIND: (?s)(?-i:<\!-- START -->.+?">|\G(?!^))((?!<\!-- FINAL -->).)*?\K^(?!<p class="mb-40px">)(([a-zA-Z-].+))((?!</p>).)*$

          REPLACE: <p class="mb-40px">\2</p>

          In this case, the result is not very good: something does not work properly in this final regex. Maybe @guy038 has a better idea.

          • guy038

            Hello @robin-cruise and All,

            No need to use the generic formula !

            Here is my general method :

            • From the beginning of the current line, I look for a line which :

              • Does not contain the string <!-- START -->, at any position
                AND
              • Does not contain the string <!-- FINAL -->, at any position
                AND
                (
              • Does not contain a <p class="mb-40px"> tag, at any position
                OR
              • Does not contain a </p> tag, at any position
                )
            • Then I select all characters, of current line, which come :

              • After a possible <p class="mb-40px"> tag

              • Before a possible </p> tag


            So, given this INPUT text, below, with 3 lines to change :

            <!-- START -->
            
            <p class="mb-40px">I may go to cinema</p>
            
            I need someone to take me home.
            
            <p class="mb-40px">I may go to cinema</p>
            
            I need someone to take me home.</p>
            
            <p class="mb-40px">I may go to cinema</p>
            
            <p class="mb-40px">I need someone to take me home.
            
            <p class="mb-40px">I can love you now</p>
            
            <!-- FINAL -->
            

            I use the following regex S/R :

            SEARCH (?-is)^(?!.*<!-- START -->)(?!.*<!-- FINAL -->)(?:(?!.*<p class)|(?!.*</p>))(?:<p class="mb-40px">)?(?|(.+)</p>|(.+))

            REPLACE <p class="mb-40px">\1</p>

            And, after a click on the Replace All button, I get the expected OUTPUT text :

            <!-- START -->
            
            <p class="mb-40px">I may go to cinema</p>
            
            <p class="mb-40px">I need someone to take me home.</p>
            
            <p class="mb-40px">I may go to cinema</p>
            
            <p class="mb-40px">I need someone to take me home.</p>
            
            <p class="mb-40px">I may go to cinema</p>
            
            <p class="mb-40px">I need someone to take me home.</p>
            
            <p class="mb-40px">I can love you now</p>
            
            <!-- FINAL -->
            

            Notes :

            • First, after the usual modifiers, come the boundaries which the line must not contain : (?!.*<!-- START -->)(?!.*<!-- FINAL -->)

            • Then, within a non-capturing group, an alternation requires that at least one of the two tags is missing from the line : (?:(?!.*<p class)|(?!.*</p>))

            • Now, after an optional (?:<p class="mb-40px">)?, in a non-capturing group too, the regex selects either :

              • All chars before the </p> tag
                OR
              • All remaining chars of current line

            Remark :

            • Note the special syntax of the group (?|(.+)</p>|(.+)). This is a branch-reset group : the capture groups of each alternative are numbered from the same index, so both (.+) parts are group 1. Thus, you just need the <p class="mb-40px">\1</p> syntax in the replacement part

            • If I had used a normal non-capturing group (?:(.+)</p>|(.+)), two groups, 1 and 2, would have been defined ! So the correct replacement would have been <p class="mb-40px">\1\2</p>, because these two groups are mutually exclusive : only one of them is ever set
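For readers who want to try this search/replace outside Notepad++, here is a minimal Python sketch. Python's `re` module has no branch-reset group `(?|...)`, so the sketch uses the ordinary non-capturing group together with the `\1\2` replacement described in the remark. The `(?-is)` modifiers are omitted because they match Python's defaults, and the function name `add_missing_tags` is hypothetical.

```python
import re

# Same logic as the SEARCH regex, with (?:...) in place of the branch-reset group.
# Only one of the two capture groups takes part in any given match.
SEARCH = re.compile(
    r'^(?!.*<!-- START -->)(?!.*<!-- FINAL -->)'   # skip the two boundary lines
    r'(?:(?!.*<p class)|(?!.*</p>))'               # at least one tag is missing
    r'(?:<p class="mb-40px">)?'                    # optional opening tag, not captured
    r'(?:(.+)</p>|(.+))',                          # text before </p>, or rest of line
    re.MULTILINE,
)

# Since Python 3.5, a group that did not participate is replaced by an empty string.
REPLACE = r'<p class="mb-40px">\1\2</p>'

def add_missing_tags(text):
    """Wrap every incompletely tagged, non-empty line in the <p> tags."""
    return SEARCH.sub(REPLACE, text)

sample = 'I need someone to take me home.</p>\n<p class="mb-40px">I can love you now</p>\n'
print(add_missing_tags(sample))
```

Lines that already carry both tags fail the alternation of look-aheads and are left untouched, as are empty lines, because `.+` needs at least one character.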

            Best Regards,

            guy038

            • Robin Cruise @guy038

              @guy038 super, thanks.

              What should the generic regex be in this case? (I cannot figure out the last part.)

              (?-is)^(?!.*REGION-START)(?!.*REGION-FINAL)(?:(?!.*<p class)|(?!.*</p>))(?:<p class="mb-40px">)?(?|(.+)</p>|(.+))

              • guy038

                Hi, @robin-cruise,

                You cannot use the generic regex, discussed in the topic below, to solve your present goal :

                https://community.notepad-plus-plus.org/topic/22690/generic-regex-replacing-in-a-specific-zone-of-text

                Why ?


                Well, because that generic regex supposes that you :

                • First, match a BSR ( Begin Search Region ), followed by any range of chars, possibly empty, which does not contain the ESR ( End Search Region ), and, after a \K feature, match the FR ( Find Regex )

                • Then, from the current caret position, match any range of chars, possibly empty, which does not contain the ESR, and, after a \K feature, match the FR


                But, in your present case, the INPUT lines to modify, like I need someone to take me home., do not contain the BSR and/or the ESR region. So, how do you expect the search regex to find these absent regions ??

                Best Regards,

                guy038

                • Robin Cruise @guy038

                  @guy038

                  SEARCH: (?-is)^(?!.*<!-- START -->)(?!.*<!-- FINAL -->)(?:(?!.*<p class)|(?!.*</p>))(?:<p class="mb-40px">)?(?|(.+)</p>|(.+))

                  REPLACE: <p class="mb-40px">\1</p>

                  Your regex seems to be very good, except for one thing: if my HTML pages also contain code like the sample below, it gets changed too.

                  So I need the change to apply only to the section between <!-- START --> and <!-- FINAL -->

                  <html lang="en">
                  <head>
                    <!-- Meta Tags -->
                    <meta charset="utf-8"/>
                  
                  <script type="application/ld+json">
                  {
                    "@context": "https://schema.org/", 
                    "@type": "Product", 
                    "name": "10 media farces of big days",
                    "image": "icon.jpg",
                    "description": "horses of Letea Delta Danube successfully saved,",
                    "brand": {
                      "@type": "Brand",
                      "name": "something"
                    },
                    "sku": "NFL",
                    "gtin8": "NFL",
                    "offers": {
                      "@type": "Offer",
                      "url": "https://something.html",
                      "priceCurrency": "RON",
                      "price": "0",
                      "priceValidUntil": "2022-02-15",
                      "availability": "https://schema.org/OnlineOnly"
                    },
                    "aggregateRating": {
                      "@type": "AggregateRating",
                      "ratingValue": "5",
                      "bestRating": "5",
                      "ratingCount": "6"
                    },
                    "review": {
                      "@type": "Review",
                      "reviewRating": {
                        "@type": "Rating",
                        "ratingValue": "5",
                        "bestRating": "5"
                      },
                      "author": {"@type": "Person", "name": "omehing"},
                      "publisher": {"@type": "Organization", "name": "omehing"}
                    }
                  }
                  </script>
                  
                  • guy038

                    Hi, @robin-cruise,

                    Once and for all, Robin, please post a complete, exact file which represents all the data that you need to change !

                    We cannot keep working this way if you do not provide real examples, because a regex depends closely on the real text !

                    BR

                    guy038

                    • Robin Cruise @guy038

                      @guy038

                      Yes, but I cannot copy/paste the entire HTML page. It is a very large piece of HTML code.

                      • guy038

                        Hi, @Robin-cruise

                        If you don’t mind, just send me your file by e-mail !

                        Here is my temporary mail address :

                        BR

                        guy038

                        • guy038

                          Hello @robin-cruise and All,

                          Ah… OK. Thanks for your attached HTML file with your mail. It’s always easier with a real example ;-))

                          Now, as you just have one <!-- ARTICOL START -->.......<!-- ARTICOL FINAL --> zone in your HTML file, the simple thing to do is :


                          • In search, to look for :

                            • Any char from the very start of file till the complete <!-- ARTICOL START --> line
                          • OR

                            • Any char from the <!-- ARTICOL FINAL --> line till the very end of your file
                          • OR ( Scan of lines between the <!-- ARTICOL START --> and <!-- ARTICOL FINAL --> boundaries )

                            • A possible <p class="mb-40px"> tag, beginning the current line

                            • Followed with a single-line range of characters :

                              • Till a </p> tag, ending the current line
                            • OR

                              • Till the end of current line
                          • In replacement, to rewrite :

                            • ( If scan within the <!-- ARTICOL START -->.........<!-- ARTICOL FINAL --> zone, so when the group 2 is defined )

                              • First, the <p class="mb-40px"> tag, if absent in the INPUT file ( group 1 not defined )

                              • Then all the contents of current line ( $0 )

                              • And, finally, the </p> tag, if absent in the INPUT file ( group 3 not defined )

                          • OR

                            • The two ranges of chars, before the <!-- ARTICOL START -->, included and after the <!-- ARTICOL FINAL --> boundaries ( which occur when the group 2 is not defined )

                          For instance, from this INPUT file, below :

                          <!DOCTYPE html>
                          ....
                          bla bla
                          ....
                          blah bla
                          
                          <!-- ARTICOL START -->
                          
                          <p class="mb-40px">I need someone to take me home.</p>
                          
                          I need someone to take me home.
                          
                          I need someone to take me home.</p>
                          
                          <p class="mb-40px">I need someone to take me home.
                          
                          <!-- ARTICOL FINAL -->
                          
                          bla bla
                          ....
                          blah bla
                          ....
                          </html>
                          

                          The following regex S/R :

                          SEARCH (?s-i)^.+<!-- ARTICOL START -->\R|<!-- ARTICOL FINAL -->.+|(?-s)^(<p class="mb-40px">)?(?|(.+)(</p>)|(.+))$

                          REPLACE ?2(?1:<p class="mb-40px">)$0(?3:</p>):$0

                          Should give you the expected results :

                          <!DOCTYPE html>
                          ....
                          bla bla
                          ....
                          blah bla
                          
                          <!-- ARTICOL START -->
                          
                          <p class="mb-40px">I need someone to take me home.</p>
                          
                          <p class="mb-40px">I need someone to take me home.</p>
                          
                          <p class="mb-40px">I need someone to take me home.</p>
                          
                          <p class="mb-40px">I need someone to take me home.</p>
                          
                          <!-- ARTICOL FINAL -->
                          
                          bla bla
                          ....
                          blah bla
                          ....
                          </html>
                          

                          The message Replace All: 6 occurrences were replaced is displayed in the status bar :

                          • One for the part between <!DOCTYPE html> and <!-- ARTICOL START -->
                          • One for each non-empty line between <!-- ARTICOL START --> and <!-- ARTICOL FINAL --> ( 4 lines )
                          • One for the part between <!-- ARTICOL FINAL --> and the very end of file
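The same three-way alternation can be checked outside Notepad++ with a minimal Python sketch. It is not a drop-in copy of the regex above: Python's `re` module has neither the branch-reset group nor the conditional replacement syntax, so the sketch assumes named groups and a replacement callback instead, uses `\A` in place of the leading `^`, and expects `\n` line endings. The names `wrap_untagged_lines`, `_rewrite`, `open`, `body` and `close` are hypothetical.

```python
import re

OPEN_TAG = '<p class="mb-40px">'
CLOSE_TAG = '</p>'

# Assumptions: "\n" line endings; \A stands in for the original leading ^.
PATTERN = re.compile(
    r'\A(?s:.+)<!-- ARTICOL START -->\n'   # start of file up to the START line
    r'|<!-- ARTICOL FINAL -->(?s:.+)'      # FINAL line up to the very end of file
    # One non-empty line of the zone, with optional opening and closing tags.
    r'|^(?P<open><p class="mb-40px">)?(?P<body>.+?)(?P<close></p>)?$',
    re.MULTILINE,
)

def _rewrite(match):
    # 'body' is unset for the two outer ranges: rewrite them unchanged.
    if match.group('body') is None:
        return match.group(0)
    prefix = '' if match.group('open') else OPEN_TAG
    suffix = '' if match.group('close') else CLOSE_TAG
    return prefix + match.group(0) + suffix

def wrap_untagged_lines(text):
    """Return (new_text, number_of_replacements)."""
    return PATTERN.subn(_rewrite, text)
```

Run on the INPUT file above, the count returned by `subn` should correspond to the three kinds of replacement in the list: one per outer range and one per non-empty line of the zone.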

                          Note that this final solution does not need any look-ahead structure, the \G syntax or other goodies !!

                          Best Regards,

                          guy038

                          • Robin Cruise @guy038

                            @guy038 said in Regex: Add html tags in the lines that doesn't have html tags:

                            SEARCH (?s-i)^.+<!-- ARTICOL START -->\R|<!-- ARTICOL FINAL -->.+|(?-s)^(<p class="mb-40px">)?(?|(.+)(</p>)|(.+))$
                            REPLACE ?2(?1:<p class="mb-40px">)$0(?3:</p>):$0

                            great answer, thank you @guy038
