Regex: Add html tags in the lines that doesn't have html tags

  • R
    Robin Cruise
    last edited by May 25, 2022, 8:08 PM

    OK, so I believe I took a step forward. It seems to work.

    FIND: ^(?!<p class="mb-40px">)(([a-zA-Z-].+))((?!</p>).)*$

    REPLACE BY: <p class="mb-40px">\2</p>

    Now I have to restrict this regex to the section between
    <!-- START --> and <!-- FINAL -->

    I will use this generic formula:

    (?s)(?-i:REGION-START.+?">|\G(?!^))((?!REGION-FINAL).)*?\KFIND REGEX

    will become:

    FIND: (?s)(?-i:<\!-- START -->.+?">|\G(?!^))((?!<\!-- FINAL -->).)*?\K^(?!<p class="mb-40px">)(([a-zA-Z-].+))((?!</p>).)*$

    REPLACE: <p class="mb-40px">\2</p>

    In this case, it is not very good. Something does not work well in this final regex. Maybe @guy038 has a better idea.

    • G
      guy038
      last edited by guy038 May 26, 2022, 8:12 AM May 25, 2022, 10:02 PM

      Hello @robin-cruise and All,

      No need to use the generic formula !

      Here is my general method :

      • From the beginning of the current line, I try to find a line which :

        • Does not contain the string <!-- START --> at any position
          AND
        • Does not contain the string <!-- FINAL --> at any position
          AND
          (
        • Does not contain a <p class="mb-40px"> tag at any position
          OR
        • Does not contain a </p> tag at any position
          )
      • Then I select all characters, of current line, which come :

        • After a possible <p class="mb-40px"> tag

        • Before a possible </p> tag


      So, given this INPUT text, below, with 3 lines to change :

      <!-- START -->
      
      <p class="mb-40px">I may go to cinema</p>
      
      I need someone to take me home.
      
      <p class="mb-40px">I may go to cinema</p>
      
      I need someone to take me home.</p>
      
      <p class="mb-40px">I may go to cinema</p>
      
      <p class="mb-40px">I need someone to take me home.
      
      <p class="mb-40px">I can love you now</p>
      
      <!-- FINAL -->
      

      I use the following regex S/R :

      SEARCH (?-is)^(?!.*<!-- START -->)(?!.*<!-- FINAL -->)(?:(?!.*<p class)|(?!.*</p>))(?:<p class="mb-40px">)?(?|(.+)</p>|(.+))

      REPLACE <p class="mb-40px">\1</p>

      And, after a click on the Replace All button, I get the expected OUTPUT text :

      <!-- START -->
      
      <p class="mb-40px">I may go to cinema</p>
      
      <p class="mb-40px">I need someone to take me home.</p>
      
      <p class="mb-40px">I may go to cinema</p>
      
      <p class="mb-40px">I need someone to take me home.</p>
      
      <p class="mb-40px">I may go to cinema</p>
      
      <p class="mb-40px">I need someone to take me home.</p>
      
      <p class="mb-40px">I can love you now</p>
      
      <!-- FINAL -->
      

      Notes :

      • First, after the usual modifiers, come the two negative look-aheads for the boundaries, which must not be present in the current line (?!.*<!-- START -->)(?!.*<!-- FINAL -->)

      • Then, within a non-capturing group, an alternation of two negative look-aheads requires that at least one of the two tags is missing from the current line (?:(?!.*<p class)|(?!.*</p>))

      • Now, after a possible (?:<p class="mb-40px">)?, in a non-capturing group, too, the regex selects, either :

        • All chars before the </p> tag
          OR
        • All remaining chars of current line

      Remark :

      • Note the special syntax of the group (?|(.+)</p>|(.+)), a non-capturing group known as a branch reset group. It lets the capturing groups of each alternative share the same numbers, so both alternatives define group 1. Thus, you just need the <p class="mb-40px">\1</p> syntax in the replacement part

      • If I had used a normal non-capturing group (?:(.+)</p>|(.+)), two groups, 1 and 2, would have been defined ! The correct replacement would then have been <p class="mb-40px">\1\2</p>, as these two groups are mutually exclusive
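
The remark about group numbering can be checked outside Notepad++. The sketch below is a Python translation, not the Notepad++ regex itself: Python's standard `re` module has no branch reset group `(?|...)`, so the two alternatives get groups 1 and 2 and the replacement uses `\1\2`, which is the variant described in the remark. The name `wrap_lines` is made up for this illustration, and the code assumes Python 3.5 or later, where an unmatched group in a replacement template is substituted as an empty string.

```python
import re

# No branch reset in Python's re: the text before </p> is group 1,
# the text of a line without </p> is group 2. Only one of them is set.
PATTERN = re.compile(
    r'^(?!.*<!-- START -->)(?!.*<!-- FINAL -->)'   # skip the boundary lines
    r'(?:(?!.*<p class)|(?!.*</p>))'               # at least one tag is missing
    r'(?:<p class="mb-40px">)?'                    # optional opening tag
    r'(?:(.+)</p>|(.+))',                          # text, with or without </p>
    re.MULTILINE,                                  # ^ matches at each line start
)

# Groups 1 and 2 are mutually exclusive, so \1\2 yields the line text.
REPLACEMENT = r'<p class="mb-40px">\1\2</p>'


def wrap_lines(text):
    """Wrap every line that lacks one or both tags in a complete <p> pair."""
    return PATTERN.sub(REPLACEMENT, text)
```

Empty lines are left alone because `.+` needs at least one character, and lines that already carry both tags fail the look-ahead alternation.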

      Best Regards,

      guy038

      • R
        Robin Cruise @guy038
        last edited by May 26, 2022, 7:41 AM

        @guy038 super, thanks.

        What should the generic regex be in this case? (I cannot figure out the last part.)

        (?-is)^(?!.*REGION-START)(?!.*REGION-FINAL)(?:(?!.*<p class)|(?!.*</p>))(?:<p class="mb-40px">)?(?|(.+)</p>|(.+))

        • G
          guy038
          last edited by guy038 May 26, 2022, 8:49 AM May 26, 2022, 8:46 AM

          Hi, @robin-cruise,

          You cannot use the generic regex, discussed in the topic below, to reach your present goal :

          https://community.notepad-plus-plus.org/topic/22690/generic-regex-replacing-in-a-specific-zone-of-text

          Why ?


          Well, because that generic regex supposes :

          • First, that it matches a BSR region, followed by any range of chars, possibly empty, which does not contain the ESR region and, after a \K feature, matches the FR region

          • Then, that it matches, from the current caret position, any range of chars, possibly empty, which does not contain the ESR region and, after a \K feature, matches the FR region


          But, in your present case, the INPUT lines to modify, like I need someone to take me home., do not contain the BSR and/or the ESR region. So, how do you expect the search regex to find these absent regions ??
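
For readers who think in code rather than in \G and \K, the effect the generic regex aims for (apply a find regex only inside one zone) can be sketched in Python. This is a different, two-step technique and not the generic regex itself, since Python's standard `re` module supports neither `\G` nor `\K`. The function name `replace_in_region` and its parameters are hypothetical; `start`, `final` and `find` play roughly the roles of BSR, ESR and FR.

```python
import re


def replace_in_region(text, start, final, find, repl):
    """Apply find -> repl only between the first `start` and the next `final`.

    `start` and `final` are literal marker strings, `find` is a regex.
    """
    region = re.compile(
        "(" + re.escape(start) + ")(.*?)(" + re.escape(final) + ")",
        re.DOTALL,  # the zone may span several lines
    )

    def _inside(m):
        # Step 2: run the find regex on the zone body only.
        body = re.sub(find, repl, m.group(2), flags=re.MULTILINE)
        return m.group(1) + body + m.group(3)

    # Step 1: locate the zone; everything outside it is left untouched.
    return region.sub(_inside, text, count=1)
```

For example, `replace_in_region(text, "<!-- START -->", "<!-- FINAL -->", r"^foo$", "bar")` rewrites a `foo` line inside the zone and leaves identical lines before and after it alone.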

          Best Regards,

          guy038

          • R
            Robin Cruise @guy038
            last edited by Robin Cruise May 26, 2022, 12:22 PM May 26, 2022, 12:21 PM

            @guy038

            SEARCH: (?-is)^(?!.*<!-- START -->)(?!.*<!-- FINAL -->)(?:(?!.*<p class)|(?!.*</p>))(?:<p class="mb-40px">)?(?|(.+)</p>|(.+))

            REPLACE: <p class="mb-40px">\1</p>

            Your regex seems to be very good, except for one thing : if I also have the code below in my html pages, it will be changed as well.

            So I need to change only the lines between <!-- START --> and <!-- FINAL -->

            <html lang="en">
            <head>
              <!-- Meta Tags -->
              <meta charset="utf-8"/>
            
            <script type="application/ld+json">
            {
              "@context": "https://schema.org/", 
              "@type": "Product", 
              "name": "10 media farces of big days",
              "image": "icon.jpg",
              "description": "horses of Letea Delta Danube successfully saved,",
              "brand": {
                "@type": "Brand",
                "name": "something"
              },
              "sku": "NFL",
              "gtin8": "NFL",
              "offers": {
                "@type": "Offer",
                "url": "https://something.html",
                "priceCurrency": "RON",
                "price": "0",
                "priceValidUntil": "2022-02-15",
                "availability": "https://schema.org/OnlineOnly"
              },
              "aggregateRating": {
                "@type": "AggregateRating",
                "ratingValue": "5",
                "bestRating": "5",
                "ratingCount": "6"
              },
              "review": {
                "@type": "Review",
                "reviewRating": {
                  "@type": "Rating",
                  "ratingValue": "5",
                  "bestRating": "5"
                },
                "author": {"@type": "Person", "name": "omehing"},
                "publisher": {"@type": "Organization", "name": "omehing"}
              }
            }
            </script>
            
            • G
              guy038
              last edited by May 26, 2022, 2:54 PM

              Hi, @robin-cruise,

              Once and for all, Robin, please, post a complete / exact file, which represents all the data that you need to change !

              We cannot go on working this way if you do not provide real examples, because a regex depends closely on the real text !

              BR

              guy038

              • R
                Robin Cruise @guy038
                last edited by May 26, 2022, 3:01 PM

                @guy038

                Yes, but I cannot copy/paste the entire html page. It is a very large piece of html code.

                • G
                  guy038
                  last edited by guy038 May 27, 2022, 1:05 AM May 26, 2022, 3:12 PM

                  Hi, @Robin-cruise

                  If you don’t mind, just send me your file by e-mail !

                  Here is my temporary mail address :

                  BR

                  guy038

                  • G
                    guy038
                    last edited by guy038 May 27, 2022, 11:32 AM May 27, 2022, 11:23 AM

                    Hello @robin-cruise and All,

                    Ah… OK. Thanks for your attached HTML file with your mail. It’s always easier with a real example ;-))

                    Now, as you just have one <!-- ARTICOL START -->.......<!-- ARTICOL FINAL --> zone in your HTML file, the simple thing to do is :


                    • In search, to look for :

                      • Any char from the very start of file till the complete <!-- ARTICOL START --> line
                        OR
                      • Any char from the <!-- ARTICOL FINAL --> line till the very end of your file
                        OR ( scan of lines between the <!-- ARTICOL START --> and <!-- ARTICOL FINAL --> boundaries )
                      • A possible <p class="mb-40px"> tag, beginning the current line

                      • Followed by a single-line range of characters :

                        • Till a </p> tag, ending the current line
                          OR
                        • Till the end of current line

                    • In replacement, to rewrite :

                      • ( If the scan is within the <!-- ARTICOL START -->.........<!-- ARTICOL FINAL --> zone, so when group 2 is defined )

                        • First, the <p class="mb-40px"> tag, if absent in the INPUT file ( group 1 not defined )

                        • Then all the contents of current line ( $0 )

                        • And, finally, the </p> tag, if absent in the INPUT file ( group 3 not defined )

                        OR

                      • The two ranges of chars, before and including the <!-- ARTICOL START --> line, and from the <!-- ARTICOL FINAL --> line onwards, as they are ( which occur when group 2 is not defined )

                    For instance, from this INPUT file, below :

                    <!DOCTYPE html>
                    ....
                    bla bla
                    ....
                    blah bla
                    
                    <!-- ARTICOL START -->
                    
                    <p class="mb-40px">I need someone to take me home.</p>
                    
                    I need someone to take me home.
                    
                    I need someone to take me home.</p>
                    
                    <p class="mb-40px">I need someone to take me home.
                    
                    <!-- ARTICOL FINAL -->
                    
                    bla bla
                    ....
                    blah bla
                    ....
                    </html>
                    

                    The following regex S/R :

                    SEARCH (?s-i)^.+<!-- ARTICOL START -->\R|<!-- ARTICOL FINAL -->.+|(?-s)^(<p class="mb-40px">)?(?|(.+)(</p>)|(.+))$

                    REPLACE ?2(?1:<p class="mb-40px">)$0(?3:</p>):$0

                    Should give you the expected results :

                    <!DOCTYPE html>
                    ....
                    bla bla
                    ....
                    blah bla
                    
                    <!-- ARTICOL START -->
                    
                    <p class="mb-40px">I need someone to take me home.</p>
                    
                    <p class="mb-40px">I need someone to take me home.</p>
                    
                    <p class="mb-40px">I need someone to take me home.</p>
                    
                    <p class="mb-40px">I need someone to take me home.</p>
                    
                    <!-- ARTICOL FINAL -->
                    
                    bla bla
                    ....
                    blah bla
                    ....
                    </html>
                    

                    The message Replace All: 6 occurrences were replaced is displayed in the status bar :

                    • One for the part between <!DOCTYPE html> and <!-- ARTICOL START -->
                    • One for each non-empty line between <!-- ARTICOL START --> and <!-- ARTICOL FINAL --> ( 4 lines )
                    • One for the part between <!-- ARTICOL FINAL --> and the very end of file

                    Note that this final solution does not need any look-ahead structure nor the \G syntax or other goodies !!
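
The conditional replacement can be hard to read, so here is a hedged Python sketch of the same logic. It is a translation, not the Notepad++ syntax: Python's standard `re` has no `\R`, no branch reset group and no conditional replacement. The pattern therefore spells out the line break, the text of a line without a closing tag lands in group 4 instead of group 2, and a callback stands in for `?2(?1:...)$0(?3:...):$0`. The names `fix_article` and `_rewrite` are hypothetical, and the scoped `(?s:...)` flag assumes Python 3.6 or later.

```python
import re

OPEN_TAG = '<p class="mb-40px">'
CLOSE_TAG = "</p>"

PATTERN = re.compile(
    # 1st alternative: everything up to and including the START line
    r"(?s:^.+<!-- ARTICOL START -->(?:\r\n|\n|\r))"
    # 2nd alternative: everything from the FINAL line to the end of file
    r"|(?s:<!-- ARTICOL FINAL -->.+)"
    # 3rd alternative: one non-empty line inside the zone
    r'|^(<p class="mb-40px">)?(?:(.+)(</p>)|(.+))$',
    re.MULTILINE,
)


def _rewrite(m):
    # Group 2 or 4 is set only when the 3rd alternative matched.
    in_zone = m.group(2) is not None or m.group(4) is not None
    if not in_zone:
        return m.group(0)  # text outside the zone is rewritten as is
    prefix = "" if m.group(1) else OPEN_TAG   # add the tag only if absent
    suffix = "" if m.group(3) else CLOSE_TAG  # add </p> only if absent
    return prefix + m.group(0) + suffix


def fix_article(text):
    """Return (new_text, number_of_replacements)."""
    return PATTERN.subn(_rewrite, text)
```

On a file shaped like the INPUT above, `fix_article` reports the same count as the status bar message: one match per outer part plus one per non-empty line in the zone.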

                    Best Regards,

                    guy038

                    • R
                      Robin Cruise @guy038
                      last edited by May 27, 2022, 2:34 PM

                      @guy038 said in Regex: Add html tags in the lines that doesn't have html tags:

                      SEARCH (?s-i)^.+<!-- ARTICOL START -->\R|<!-- ARTICOL FINAL -->.+|(?-s)^(<p class="mb-40px">)?(?|(.+)(</p>)|(.+))$
                      REPLACE ?2(?1:<p class="mb-40px">)$0(?3:</p>):$0

                      great answer, thank you @guy038
