    Regex: Add html tags in the lines that doesn't have html tags

    12 Posts 2 Posters 1.9k Views
      • Robin CruiseR
        Robin Cruise
        last edited by

        OK, so I believe I took a step forward. It seems to work.

        FIND: ^(?!<p class="mb-40px">)(([a-zA-Z-].+))((?!</p>).)*$

        REPLACE BY: <p class="mb-40px">\2</p>
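
This first attempt can be tried outside Notepad++ with a small Python sketch. It assumes that Python's `re` module with `re.MULTILINE` treats this particular pattern the same way as Notepad++ does, and the three sample lines are picked only for illustration.

```python
import re

# The FIND / REPLACE pair from the post, unchanged.
# re.MULTILINE makes ^ and $ match at every line start and end.
FIND = re.compile(
    r'^(?!<p class="mb-40px">)(([a-zA-Z-].+))((?!</p>).)*$',
    re.MULTILINE,
)
REPLACE = r'<p class="mb-40px">\2</p>'

# Illustrative sample: no tag, closing tag only, opening tag only.
sample = "\n".join([
    'I need someone to take me home.',
    'I need someone to take me home.</p>',
    '<p class="mb-40px">I need someone to take me home.',
])

result = FIND.sub(REPLACE, sample)
print(result)
```

In this sketch the greedy `.+` also swallows an existing `</p>`, so the second line ends up with a doubled closing tag, and the third line is skipped by the first look-ahead, so its missing `</p>` is not added.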

        Now I have to restrict this regex to the section between:
        <!-- START --> and <!-- FINAL -->

        I will use this generic formula:

        (?s)(?-i:REGION-START.+?">|\G(?!^))((?!REGION-FINAL).)*?\KFIND REGEX

        will become:

        FIND: (?s)(?-i:<\!-- START -->.+?">|\G(?!^))((?!<\!-- FINAL -->).)*?\K^(?!<p class="mb-40px">)(([a-zA-Z-].+))((?!</p>).)*$

        REPLACE: <p class="mb-40px">\2</p>

        In this case, the result is not very good: something does not work well in this final regex. Maybe @guy038 has a better idea.

        • guy038G
          guy038
          last edited by guy038

          Hello @robin-cruise and All,

          No need to use the generic formula !

          Here is my general method :

          • From beginning of current line, I try to find a line which does not contain :

            • A string <!-- START --> at any position of current line
              AND
            • A string <!-- FINAL --> at any position of current line
              AND
              (
            • A tag <p class="mb-40px"> at any position of current line
              OR
            • A tag </p> at any position of current line
              )
          • Then I select all characters, of current line, which come :

            • After a possible <p class="mb-40px"> tag

            • Before a possible </p> tag
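
The method above can also be sketched procedurally, without any regex. This is a minimal Python illustration, not the regex posted in this thread: the function name `fix_line` and the constants are hypothetical, blank lines are left alone, and edge cases (for instance a `</p>` in the middle of a line) may behave differently from the regex.

```python
OPEN_TAG = '<p class="mb-40px">'
CLOSE_TAG = '</p>'

def fix_line(line: str) -> str:
    """Wrap one line in the paragraph tags unless it must be left alone."""
    # Blank lines are never touched.
    if not line.strip():
        return line
    # Boundary lines are never touched.
    if '<!-- START -->' in line or '<!-- FINAL -->' in line:
        return line
    # Lines that already contain both tags are left as they are.
    if '<p class' in line and CLOSE_TAG in line:
        return line
    # Keep what comes after a possible opening tag ...
    body = line
    if body.startswith(OPEN_TAG):
        body = body[len(OPEN_TAG):]
    # ... and before a possible closing tag.
    if body.endswith(CLOSE_TAG):
        body = body[:-len(CLOSE_TAG)]
    return OPEN_TAG + body + CLOSE_TAG

for line in ['I need someone to take me home.',
             'I need someone to take me home.</p>',
             '<p class="mb-40px">I need someone to take me home.']:
    print(fix_line(line))
```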


          So, given this INPUT text, below, with 3 lines to change :

          <!-- START -->
          
          <p class="mb-40px">I may go to cinema</p>
          
          I need someone to take me home.
          
          <p class="mb-40px">I may go to cinema</p>
          
          I need someone to take me home.</p>
          
          <p class="mb-40px">I may go to cinema</p>
          
          <p class="mb-40px">I need someone to take me home.
          
          <p class="mb-40px">I can love you now</p>
          
          <!-- FINAL -->
          

          I use the following regex S/R :

          SEARCH (?-is)^(?!.*<!-- START -->)(?!.*<!-- FINAL -->)(?:(?!.*<p class)|(?!.*</p>))(?:<p class="mb-40px">)?(?|(.+)</p>|(.+))

          REPLACE <p class="mb-40px">\1</p>

          And, after a click on the Replace All button, I get the expected OUTPUT text :

          <!-- START -->
          
          <p class="mb-40px">I may go to cinema</p>
          
          <p class="mb-40px">I need someone to take me home.</p>
          
          <p class="mb-40px">I may go to cinema</p>
          
          <p class="mb-40px">I need someone to take me home.</p>
          
          <p class="mb-40px">I may go to cinema</p>
          
          <p class="mb-40px">I need someone to take me home.</p>
          
          <p class="mb-40px">I can love you now</p>
          
          <!-- FINAL -->
          

          Notes :

          • First, after the usual modifiers, come the boundaries which must not be matched : (?!.*<!-- START -->)(?!.*<!-- FINAL -->)

          • Then come the two tags, at least one of which must be absent from the line, as an alternation within a non-capturing group : (?:(?!.*<p class)|(?!.*</p>))

          • Now, after a possible (?:<p class="mb-40px">)?, in a non-capturing group, too, the regex selects, either :

            • All chars before the </p> tag
              OR
            • All remaining chars of current line

          Remark :

          • Note the special syntax of the branch reset group (?|(.+)</p>|(.+)). It numbers the groups of each alternative from the same starting point, so both (.+) are group 1. Thus, you just need the <p class="mb-40px">\1</p> syntax in the replacement part

          • If I had used a normal non-capturing group (?:(.+)</p>|(.+)), two groups, 1 and 2, would have been defined ! So the correct replacement would have been <p class="mb-40px">\1\2</p>, as these two groups are mutually exclusive !
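
The `\1\2` variant from the remark above is the one that carries over to Python, whose `re` module has no branch reset group. The sketch below is an adaptation, not the Notepad++ search itself. It assumes Python 3.5 or later (where an unmatched group is replaced by an empty string), `\n` line endings, and it drops the `(?-is)` modifiers because case-sensitive matching and a dot that stops at line breaks are already Python's defaults.

```python
import re

# Same look-aheads as the Notepad++ regex; the branch reset group is
# replaced by a plain non-capturing group, so two groups are defined.
PATTERN = re.compile(
    r'^(?!.*<!-- START -->)(?!.*<!-- FINAL -->)'
    r'(?:(?!.*<p class)|(?!.*</p>))'
    r'(?:<p class="mb-40px">)?'
    r'(?:(.+)</p>|(.+))',
    re.MULTILINE,
)
# Groups 1 and 2 are mutually exclusive, so one of them is always empty.
REPLACEMENT = r'<p class="mb-40px">\1\2</p>'

sample = "\n".join([
    '<!-- START -->',
    '',
    '<p class="mb-40px">I may go to cinema</p>',
    '',
    'I need someone to take me home.',
    '',
    'I need someone to take me home.</p>',
    '',
    '<p class="mb-40px">I need someone to take me home.',
    '',
    '<!-- FINAL -->',
])

fixed = PATTERN.sub(REPLACEMENT, sample)
print(fixed)
```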

          Best Regards,

          guy038

          • Robin CruiseR
            Robin Cruise @guy038
            last edited by

            @guy038 super, thanks.

            What should the generic regex be in this case? (I cannot figure out the last part.)

            (?-is)^(?!.*REGION-START)(?!.*REGION-FINAL)(?:(?!.*<p class)|(?!.*</p>))(?:<p class="mb-40px">)?(?|(.+)</p>|(.+))

            • guy038G
              guy038
              last edited by guy038

              Hi, @robin-cruise,

              You cannot use the generic regex, discussed in the topic :

              https://community.notepad-plus-plus.org/topic/22690/generic-regex-replacing-in-a-specific-zone-of-text

              to reach your present goal. Why ?


              Well, because that generic regex supposes :

              • First, to match the BSR region, followed by any range of chars, possibly empty, which does not contain the ESR region and then, after a \K feature, to match the FR region

              • Then, to match, from the current caret position, any range of chars, possibly empty, which does not contain the ESR region and then, after a \K feature, to match the FR region
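
The idea behind the two steps above, replacing only inside a zone that really contains a start marker and an end marker, can be sketched in Python. Python's `re` module has neither `\K` nor `\G`, so this sketch swaps in a different, two-step technique: isolate the zone first, then run the inner search inside it. The function name `replace_in_region` and the `foo` / `bar` sample are hypothetical, and only the first zone is handled.

```python
import re

def replace_in_region(text, start, end, find, repl, flags=0):
    """Apply find -> repl only between the first start/end marker pair."""
    region = re.compile(
        "(" + re.escape(start) + ")(.*?)(" + re.escape(end) + ")",
        re.DOTALL,
    )

    def fix_region(match):
        # Only the text between the two markers goes through the inner search.
        inner = re.sub(find, repl, match.group(2), flags=flags)
        return match.group(1) + inner + match.group(3)

    return region.sub(fix_region, text, count=1)

text = "foo\n<!-- START -->\nfoo foo\n<!-- FINAL -->\nfoo\n"
print(replace_in_region(text, "<!-- START -->", "<!-- FINAL -->", r"foo", "bar"))
```

Both approaches need the markers to be present in the text that is scanned, which is exactly the condition described in the two bullets above.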


              But, in your present case, the INPUT lines to modify, like I need someone to take me home., contain neither the BSR nor the ESR region. So, how do you expect to match these absent regions in the search regex ??

              Best Regards,

              guy038

              • Robin CruiseR
                Robin Cruise @guy038
                last edited by Robin Cruise

                @guy038

                SEARCH: (?-is)^(?!.*<!-- START -->)(?!.*<!-- FINAL -->)(?:(?!.*<p class)|(?!.*</p>))(?:<p class="mb-40px">)?(?|(.+)</p>|(.+))

                REPLACE: <p class="mb-40px">\1</p>

                Your regex seems to be very good, except for one thing: if I also have the code below in my HTML pages, it gets changed too.

                So I need the change to apply only to the section between <!-- START --> and <!-- FINAL -->

                <html lang="en">
                <head>
                  <!-- Meta Tags -->
                  <meta charset="utf-8"/>
                
                <script type="application/ld+json">
                {
                  "@context": "https://schema.org/", 
                  "@type": "Product", 
                  "name": "10 media farces of big days",
                  "image": "icon.jpg",
                  "description": "horses of Letea Delta Danube successfully saved,",
                  "brand": {
                    "@type": "Brand",
                    "name": "something"
                  },
                  "sku": "NFL",
                  "gtin8": "NFL",
                  "offers": {
                    "@type": "Offer",
                    "url": "https://something.html",
                    "priceCurrency": "RON",
                    "price": "0",
                    "priceValidUntil": "2022-02-15",
                    "availability": "https://schema.org/OnlineOnly"
                  },
                  "aggregateRating": {
                    "@type": "AggregateRating",
                    "ratingValue": "5",
                    "bestRating": "5",
                    "ratingCount": "6"
                  },
                  "review": {
                    "@type": "Review",
                    "reviewRating": {
                      "@type": "Rating",
                      "ratingValue": "5",
                      "bestRating": "5"
                    },
                    "author": {"@type": "Person", "name": "omehing"},
                    "publisher": {"@type": "Organization", "name": "omehing"}
                  }
                }
                </script>
                
                • guy038G
                  guy038
                  last edited by

                  Hi, @robin-cruise,

                  Once and for all, Robin, please post a complete, exact file which represents all the data that you need to change !

                  We cannot keep working this way if you do not provide real examples, because a regex depends closely on the real text !

                  BR

                  guy038

                  • Robin CruiseR
                    Robin Cruise @guy038
                    last edited by

                    @guy038

                    Yes, but I cannot copy/paste the entire HTML page. It is a very large file.

                    • guy038G
                      guy038
                      last edited by guy038

                      Hi, @Robin-cruise

                      If you don’t mind, just send me your file by e-mail !

                      Here is my temporary mail address :

                      BR

                      guy038

                      • guy038G
                        guy038
                        last edited by guy038

                        Hello @robin-cruise and All,

                        Ah… OK. Thanks for the HTML file attached to your mail. It’s always easier with a real example ;-))

                        Now, as you just have one <!-- ARTICOL START -->.......<!-- ARTICOL FINAL --> zone in your HTML file, the simple thing to do is :


                        • In search, to look for :

                          • Any char from the very start of file till the complete <!-- ARTICOL START --> line
                        • OR

                          • Any char from the <!-- ARTICOL FINAL --> line till the very end of your file
                        • OR ( Scan of lines between the <!-- ARTICOL START --> and <!-- ARTICOL FINAL --> boundaries )

                          • A possible <p class="mb-40px"> tag, beginning the current line

                          • Followed with a single-line range of characters :

                            • Till a </p> tag, ending the current line
                          • OR

                            • Till the end of current line
                        • In replacement, to rewrite :

                          • ( If scan within the <!-- ARTICOL START -->.........<!-- ARTICOL FINAL --> zone, so when the group 2 is defined )

                            • First, the <p class="mb-40px"> tag, if absent in the INPUT file ( group 1 not defined )

                            • Then all the contents of current line ( $0 )

                            • And, finally, the </p> tag, if absent in the INPUT file ( group 3 not defined )

                        • OR

                          • The two ranges of chars, unchanged : the one up to and including the <!-- ARTICOL START --> line and the one from the <!-- ARTICOL FINAL --> line till the very end of file ( which occur when the group 2 is not defined )

                        For instance, from this INPUT file, below :

                        <!DOCTYPE html>
                        ....
                        bla bla
                        ....
                        blah bla
                        
                        <!-- ARTICOL START -->
                        
                        <p class="mb-40px">I need someone to take me home.</p>
                        
                        I need someone to take me home.
                        
                        I need someone to take me home.</p>
                        
                        <p class="mb-40px">I need someone to take me home.
                        
                        <!-- ARTICOL FINAL -->
                        
                        bla bla
                        ....
                        blah bla
                        ....
                        </html>
                        

                        The following regex S/R :

                        SEARCH (?s-i)^.+<!-- ARTICOL START -->\R|<!-- ARTICOL FINAL -->.+|(?-s)^(<p class="mb-40px">)?(?|(.+)(</p>)|(.+))$

                        REPLACE ?2(?1:<p class="mb-40px">)$0(?3:</p>):$0

                        Should give you the expected results :

                        <!DOCTYPE html>
                        ....
                        bla bla
                        ....
                        blah bla
                        
                        <!-- ARTICOL START -->
                        
                        <p class="mb-40px">I need someone to take me home.</p>
                        
                        <p class="mb-40px">I need someone to take me home.</p>
                        
                        <p class="mb-40px">I need someone to take me home.</p>
                        
                        <p class="mb-40px">I need someone to take me home.</p>
                        
                        <!-- ARTICOL FINAL -->
                        
                        bla bla
                        ....
                        blah bla
                        ....
                        </html>
                        

                        The message Replace All: 6 occurrences were replaced is displayed in the status bar :

                        • One for the part between <!DOCTYPE html> and <!-- ARTICOL START -->
                        • One for each non-empty line between <!-- ARTICOL START --> and <!-- ARTICOL FINAL --> ( 4 lines )
                        • One for the part between <!-- ARTICOL FINAL --> and the very end of file

                        Note that this final solution does not need any look-ahead structure nor the \G syntax or other goodies !!
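
For readers who want to try the same logic outside Notepad++, here is a rough Python adaptation. It is a sketch under several assumptions, not the S/R above: Python's `re` has no branch reset group, no `\R` and no conditional replacement syntax, so the groups are renumbered (1 = opening tag, 2 and 3 = contents plus `</p>`, 4 = contents alone), `\R` becomes `\r?\n`, and the replacement is a function. It also assumes `\n` line endings and a single ARTICOL zone. The names `fix` and `add_missing_tags` are hypothetical.

```python
import re

OPEN_TAG = '<p class="mb-40px">'
CLOSE_TAG = '</p>'

PATTERN = re.compile(
    r'(?s:^.+<!-- ARTICOL START -->\r?\n)'              # start of file up to the START line
    r'|(?s:<!-- ARTICOL FINAL -->.+)'                   # FINAL line up to the end of file
    r'|^(<p class="mb-40px">)?(?:(.+)(</p>)|(.+))$',    # one line inside the zone
    re.MULTILINE,
)

def fix(match):
    # Groups 2 or 4 are only defined when the third alternative matched.
    in_zone = match.group(2) is not None or match.group(4) is not None
    if not in_zone:
        return match.group(0)
    opening = "" if match.group(1) else OPEN_TAG
    closing = "" if match.group(3) else CLOSE_TAG
    return opening + match.group(0) + closing

def add_missing_tags(text):
    """Return (new_text, number_of_matches)."""
    return PATTERN.subn(fix, text)

sample = "\n".join([
    "<!DOCTYPE html>",
    "bla bla",
    "",
    "<!-- ARTICOL START -->",
    "",
    '<p class="mb-40px">I need someone to take me home.</p>',
    "",
    "I need someone to take me home.",
    "",
    "I need someone to take me home.</p>",
    "",
    '<p class="mb-40px">I need someone to take me home.',
    "",
    "<!-- ARTICOL FINAL -->",
    "",
    "bla bla",
    "</html>",
    "",
])

new_text, count = add_missing_tags(sample)
print(count)
print(new_text)
```

On this sample the count is 6, split the same way as in the list above: one match before the zone, four lines inside it and one match after it.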

                        Best Regards,

                        guy038

                        • Robin CruiseR
                          Robin Cruise @guy038
                          last edited by

                          @guy038 said in Regex: Add html tags in the lines that doesn't have html tags:

                          SEARCH (?s-i)^.+<!-- ARTICOL START -->\R|<!-- ARTICOL FINAL -->.+|(?-s)^(<p class="mb-40px">)?(?|(.+)(</p>)|(.+))$
                          REPLACE ?2(?1:<p class="mb-40px">)$0(?3:</p>):$0

                          great answer, thank you @guy038
