    Match everything except the text and <br> tags

    21 Posts 3 Posters 1.1k Views
    • guy038G
      guy038
      last edited by guy038

      Hi, @dr-ramaanand, @peterjones and All,

      Ah, I was a bit too slow and Peter just beat me to it! Note that I used the same process as Peter to determine where the error occurs!

      @dr-ramaanand, you just made a small typo in the regex that you provided!


      The correct regex, to match your text, is not this one, with the /s* syntax :

                                                                                                                                                                                            V
      (?s)<div style="margin-bottom:-15px;width: 100%;background-color:#EBF4FB;">\s*<p[^<>]*+>\s*<span[^<>]*+>\s*<span[^<>]*+>\s*<span[^<>]*+>.+?</span>\s*</span>\s*</span>\s*</p>\s*</div>/s*</div>\s*<div class="container">\s*<div class="left">
      

      but this one, with a correct \s* syntax :

                                                                                                                                                                                            V
      (?s)<div style="margin-bottom:-15px;width: 100%;background-color:#EBF4FB;">\s*<p[^<>]*+>\s*<span[^<>]*+>\s*<span[^<>]*+>\s*<span[^<>]*+>.+?</span>\s*</span>\s*</span>\s*</p>\s*</div>\s*</div>\s*<div class="container">\s*<div class="left">
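The difference between the two forms can be checked outside Notepad++. The following is a minimal sketch in Python, whose `re` module treats these two constructs the same way as the regexes above: `/s*` is a literal slash followed by zero or more `s` characters, while `\s*` is any run of whitespace. The two-line `text` value is a made-up sample, not the original poster's file.

```python
import re

# Two closing tags separated by a line break (hypothetical sample).
text = "</div>\n</div>"

# "/s*" means: a literal "/" followed by zero or more "s" characters.
typo = re.compile(r"</div>/s*</div>")

# "\s*" means: zero or more whitespace characters, line breaks included.
fixed = re.compile(r"</div>\s*</div>")

typo_match = typo.search(text)    # no "/" between the tags, so no match
fixed_match = fixed.search(text)  # matches both tags and the line break

print(typo_match)
print(repr(fixed_match.group()))
```

The typo version only matches input that really contains a slash at that position, which is why the search failed silently instead of raising a syntax error.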
      

      Remarks :

      • Maybe it would be preferable to add a \s* at the very end of your regex !

      • You could also simplify this regex, significantly, by using the version below :

      SEARCH (?s)<div style="margin-bottom:-15px;width: 100%;background-color:#EBF4FB;">\s*.+?\s*<div class="left">\s*

      BR

      guy038

      • dr ramaanandD
        dr ramaanand @guy038
        last edited by

        @guy038 I have more than one <div class="left">, so how do I make it stop searching after finding the first <div class="left"> ?

        • guy038G
          guy038
          last edited by

          Hello, @dr-ramaanand and All,

          To solve this case, I would use the following regex S/R :

          SEARCH (?s)\A.+?\R\K\s*<div style="margin-bottom:-15px;width: 100%;background-color:#EBF4FB;">\s*.+?\s*<div class="left">

          REPLACE Whatever you want to !

          Note that I did not add, this time, the \s* part at the end of the search regex.

          Also notice the two lazy quantifiers ( .+? ), right after \A and right before \s*<div class="left">, which ensure that only the first section \s*<div style=.....\s*<div class="left"> is selected !
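To see what the lazy quantifier changes, here is a minimal Python sketch. Python's `re` module has no `\K` or `\R`, so the `\A.+?\R\K` prefix is left out; `re.search` returns only the leftmost match, which is assumed to be close enough for this illustration. The sample `text` is hypothetical.

```python
import re

STYLE = '<div style="margin-bottom:-15px;width: 100%;background-color:#EBF4FB;">'
LEFT = '<div class="left">'

# Hypothetical sample: a heading, the styled <div>, then two "left" <div>s.
text = "\n".join([
    "Some heading line",
    STYLE,
    "<p>first section</p>",
    LEFT,
    "<p>second section</p>",
    LEFT,
]) + "\n"

# Lazy .+? stops at the nearest <div class="left"> that lets the regex succeed.
lazy = re.search("(?s)" + re.escape(STYLE) + r"\s*.+?\s*" + re.escape(LEFT), text)

# Greedy .+ runs to the end of the text and backtracks to the LAST one.
greedy = re.search("(?s)" + re.escape(STYLE) + r"\s*.+\s*" + re.escape(LEFT), text)

lazy_count = lazy.group().count(LEFT)
greedy_count = greedy.group().count(LEFT)
print(lazy_count, greedy_count)
```

In this sketch the lazy match contains one `<div class="left">` and the greedy match contains both.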

          BR

          guy038

          • dr ramaanandD
            dr ramaanand @guy038
            last edited by dr ramaanand

            @guy038 I used this as a sample:-

            <div style="margin-bottom:-15px;width: 100%;background-color:#EBF4FB;">
            <div class="left">
            <p class=MsoNormal><b><span style='font-size:13.5pt;line-height:115%;
                font-family:"Verdana","sans-serif";color:red'>SYNONYMS </span></b>
            </p>
            <div class="left">
            

            Your regular expression does not stop searching at the first occurrence of <div class="left">

            • dr ramaanandD
              dr ramaanand @dr ramaanand
              last edited by

              @guy038 This RegEx helped stop searching as soon as it found a <p........>:-
              (?s)\A.+?\R\K\s*<div style="margin-bottom:-15px;width: 100%;background-color:#EBF4FB;">\s*.+?\s*<div class="left">(?=\s*+<p[^<>]*+>)
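The effect of the trailing look-ahead can be reproduced in Python. This is a sketch under assumptions: the `\A.+?\R\K` prefix is dropped because Python's `re` has neither `\K` nor `\R`, the possessive `*+` quantifiers are written as plain `*` (which matches the same text here), and the `<p>` line of the sample is abridged.

```python
import re

STYLE = '<div style="margin-bottom:-15px;width: 100%;background-color:#EBF4FB;">'
LEFT = '<div class="left">'

# Abridged version of the sample posted above.
text = "\n".join([
    STYLE,
    LEFT,
    "<p class=MsoNormal><b>SYNONYMS </b>",
    "</p>",
    LEFT,
]) + "\n"

core = "(?s)" + re.escape(STYLE) + r"\s*.+?\s*" + re.escape(LEFT)

# Without the look-ahead, the match runs to the second <div class="left">.
plain = re.search(core, text)

# With the look-ahead, a <p ...> tag must follow the matched <div class="left">,
# which is only true for the first one in this sample.
guarded = re.search(core + r"(?=\s*<p[^<>]*>)", text)

plain_count = plain.group().count(LEFT)
guarded_count = guarded.group().count(LEFT)
print(plain_count, guarded_count)
```

The look-ahead works here only because a `<p ...>` tag happens to follow the first `<div class="left">`; it is a condition on the surrounding text rather than a general "stop at the first occurrence" rule.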

              • guy038G
                guy038
                last edited by

                Hi, @dr-ramaanand and All,

                Ah, of course, if you add a <div class="left"> line, right after the first <div style="..... line, it will not work !


                So, given this INPUT text, pasted in a new tab:

                <div style="margin-bottom:-15px;width: 100%;background-color:#EBF4FB;">
                <div class="left">
                <p class=MsoNormal><b><span style='font-size:13.5pt;line-height:115%;
                    font-family:"Verdana","sans-serif";color:red'>SYNONYMS </span></b>
                </p>
                <div class="left">
                

                Simply replace the previous search regex with this new version :

                (?s)\A.+?\R\s*\K<div style="margin-bottom:-15px;width: 100%;background-color:#EBF4FB;">\s*.*?\s*<div class="left">

                Note the difference : between #EBF4FB;">\s* and \s*<div class="left">, I changed the .+? part to .*?

                I also slightly changed the position of the \K feature


                As expected, this new regex will match the two consecutive lines :

                <div style="margin-bottom:-15px;width: 100%;background-color:#EBF4FB;">
                <div class="left">
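The difference between `.+?` and `.*?` in this position can be sketched in Python. As an assumption for the sketch, the `\A.+?\R\s*\K` prefix is omitted (Python's `re` supports neither `\R` nor `\K`) and the `<p>` line of the sample is abridged.

```python
import re

STYLE = '<div style="margin-bottom:-15px;width: 100%;background-color:#EBF4FB;">'
LEFT = '<div class="left">'

# Abridged sample: <div class="left"> comes right after the styled <div>.
text = "\n".join([
    STYLE,
    LEFT,
    "<p class=MsoNormal><b>SYNONYMS </b>",
    "</p>",
    LEFT,
]) + "\n"

# .+? needs at least one character after the leading \s*, so it starts
# inside the first <div class="left"> and runs on to the second one.
plus = re.search("(?s)" + re.escape(STYLE) + r"\s*.+?\s*" + re.escape(LEFT), text)

# .*? may match nothing at all, so the first <div class="left"> is found.
star = re.search("(?s)" + re.escape(STYLE) + r"\s*.*?\s*" + re.escape(LEFT), text)

plus_count = plus.group().count(LEFT)
star_count = star.group().count(LEFT)
print(plus_count, star_count)
```

In this sketch the `.*?` version matches exactly the two consecutive lines, while the `.+?` version extends to the second `<div class="left">`.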
                

                BR

                guy038

                • dr ramaanandD
                  dr ramaanand @dr ramaanand
                  last edited by

                  @guy038 This RegEx: (?s)\A.+?\R\K\s*<div style="margin-bottom:-15px;width: 100%;background-color:#EBF4FB;">.+(?=\s*+<div class="left">) would have stopped searching just before the second occurrence of <div class="left"> if the sample to be searched was like this:-

                  <div style="margin-bottom:-15px;width: 100%;background-color:#EBF4FB;">
                  <div class="left">
                  <div class="left">
                  
                  • guy038G
                    guy038
                    last edited by

                    @dr-ramaanand,

                    Yes, your regex does match the same amount of text as my version, but my regex seems simpler and more logical !

                    BR

                    guy038

                    • dr ramaanandD
                      dr ramaanand @guy038
                      last edited by

                      @guy038 okay, thank you very much!

                      • dr ramaanandD
                        dr ramaanand @dr ramaanand
                        last edited by

                        @guy038 your last RegEx finds the first occurrence of <div class="left"> even if there is some other text above it. Lovely!

                        • guy038G
                          guy038
                          last edited by

                          Hi, @dr-ramaanand and All,

                          Again, I did not check all the possibilities before posting. Sorry for the NOISE !

                          So, the right regex to use should be :

                          (?s)\A.*?\s*\K<div style="margin-bottom:-15px;width: 100%;background-color:#EBF4FB;">\s*.*?\s*<div class="left">


                          This time, it will work if you paste this text in a new tab :

                          <div style="margin-bottom:-15px;width: 100%;background-color:#EBF4FB;">
                          <div class="left">
                          <div class="left">
                          

                          But it will also work if you paste the following text in a new tab :

                          
                          First non-blank line
                          second line
                          
                          Third line before the block to match
                          
                          <div style="margin-bottom:-15px;width: 100%;background-color:#EBF4FB;">
                          <div class="left">
                          <div class="left">
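Both layouts can be checked with a short Python sketch. It rests on two assumptions: `\K` is emulated with a capture group (Python's `re` has no `\K`), and `\R` in the earlier regex is approximated by `(?:\r\n|\n|\r)`. The names `FINAL` and `EARLIER` are labels chosen for this sketch.

```python
import re

STYLE = '<div style="margin-bottom:-15px;width: 100%;background-color:#EBF4FB;">'
LEFT = '<div class="left">'
BODY = re.escape(STYLE) + r"\s*.*?\s*" + re.escape(LEFT)

# Latest regex: \A.*?\s*\K...  (group 1 plays the role of the text after \K).
FINAL = r"(?s)\A.*?\s*(" + BODY + r")"
# Earlier regex: \A.+?\R\s*\K...  (\R approximated, see the note above).
EARLIER = r"(?s)\A.+?(?:\r\n|\n|\r)\s*(" + BODY + r")"

block = "\n".join([STYLE, LEFT, LEFT]) + "\n"
with_preamble = (
    "\nFirst non-blank line\nsecond line\n\n"
    "Third line before the block to match\n\n" + block
)

final_at_start = re.search(FINAL, block)
final_after_text = re.search(FINAL, with_preamble)
earlier_at_start = re.search(EARLIER, block)

print(repr(final_at_start.group(1)))
print(repr(final_after_text.group(1)))
print(earlier_at_start)
```

In this emulation the latest regex selects the same two lines in both layouts, whereas the earlier `\A.+?\R` form finds nothing when the block starts on the very first line, because `.+?` and the line break must consume something before it.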
                          

                          Best Regards,

                          guy038

                          • dr ramaanandD
                            dr ramaanand @guy038
                            last edited by dr ramaanand

                            @guy038 I am not sure if I am allowed to do it (as the solution was provided by you), so I am requesting you to post the last Regular Expression you provided with the sample to be edited with a new heading, “How to find the first occurrence of a tag ?” so that people can search and find it online. Thank you!

                            • guy038G
                              guy038
                              last edited by guy038

                              Hello, @dr-ramaanand and All,

                              You said in your previous post :

                              … so I am requesting you to post the last Regular Expression you provided with the sample to be edited with a new heading, “How to find the first occurrence of a tag ?” so that people can search and find it online. Thank you!

                              But, actually, my regex finds the first occurrence of the <div class="left"> tag, AFTER a first occurrence of the <div style="margin-bottom:-15px;width: 100%;background-color:#EBF4FB;"> tag !


                              So, to my mind, the correct way to match the first occurrence of a specific tag, in the current file, is to use the generic regex :

                              (?s-i)\A.*?\K<TAG Name(?: .*?)?>

                              Just replace the generic TAG Name value with a valid HTML tag name

                              Note that, for the comment tag, you should replace the generic TAG Name, in the above regex, with the literal string !--.*?--


                              Similarly, the correct way to match the last occurrence of a specific tag, in the current file, is to use the generic regex :

                              (?s-i)\A.*\K<TAG Name(?: .*?)?>
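The only difference between the two generic regexes is the lazy `.*?` versus the greedy `.*` after `\A`. A minimal Python sketch follows, assuming `\K` is emulated with a capture group; the helper names `first_tag` and `last_tag` and the sample `html` are hypothetical.

```python
import re

def first_tag(html, name):
    """Return the first <name ...> tag: lazy .*? stops at the nearest one."""
    m = re.search(r"(?s)\A.*?(<" + re.escape(name) + r"(?: .*?)?>)", html)
    return m.group(1) if m else None

def last_tag(html, name):
    """Return the last <name ...> tag: greedy .* backtracks from the end."""
    m = re.search(r"(?s)\A.*(<" + re.escape(name) + r"(?: .*?)?>)", html)
    return m.group(1) if m else None

# Hypothetical sample with two <span> tags.
html = '<p>one</p>\n<span class="a">x</span>\n<span class="b">y</span>\n'

first_span = first_tag(html, "span")
last_span = last_tag(html, "span")
print(first_span)
print(last_span)
```

Python's `re` is case-sensitive by default, which corresponds to the `-i` part of `(?s-i)`.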

                              BR

                              guy038

                              • dr ramaanandD
                                dr ramaanand @guy038
                                last edited by

                                @guy038 said in Match everything except the text and <br> tags:

                                (?s-i)\A.*\K<TAG Name(?: .*?)?>

                                I think that that should be (?s-i)\A.*\K<TAG Name(?:.*?)?> with no spaces anywhere in the middle

                                • guy038G
                                  guy038
                                  last edited by guy038

                                  Hi, @dr-ramaanand and All,

                                  In order to use a valid INPUT text to do some tests, just open the main page of our forum. Then hit the Ctrl + U shortcut to open the HTML source page of our forum and paste its contents in a new tab


                                  My generic regex tries to match the syntax <TAG......, till the nearest > character and must be valid for any kind of tag.

                                  Thus, I prefer to insert a space char to verify that the tag name is complete (otherwise <head would also match the beginning of <header>). Indeed, this regex will match both bare tags like <head> and tags with attributes, for example the opening tag of <span style="color:blue">blue</span>

                                  If you replace the TAG Name in the generic regex (?s-i)\A.*?\K<TAG Name(?: .*?)?>, which matches the first tag, named TAG, in the current file, you get, from the examples, the regexes :

                                  • (?s-i)\A.*?\K<head(?: .*?)?>

                                  • (?s-i)\A.*?\K<span(?: .*?)?>
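The role of the space inside `(?: .*?)?` can be illustrated with a small Python sketch, again emulating `\K` with a capture group. The sample `html` is contrived so that a `<header>` tag comes before `<head>`.

```python
import re

# Contrived sample: <header> appears before <head>.
html = "<header>\n<head>\n"

# With the space: after "<head" only a space or ">" may follow.
with_space = re.search(r"(?s)\A.*?(<head(?: .*?)?>)", html).group(1)

# Without the space: ".*?" may absorb "er", so <header> is accepted too.
without_space = re.search(r"(?s)\A.*?(<head(?:.*?)?>)", html).group(1)

print(with_space)
print(without_space)
```

In this sample the version with the space finds `<head>`, while the version without it stops at `<header>`.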

                                  Just test them against the HTML source code of our forum


                                  Now, let’s suppose, for example, that you want to find the first <input ...> tag, AFTER the first <img ......> tag, in the HTML source code of our forum :

                                  Then, from my previous post, you would have to use the following regex :

                                  (?s-i)\A.*?<img(?: .*?)?>.*?\K<input(?: .*?)?>

                                  which matches, as expected, the following line :

                                  <input autocomplete="off" type="text" class="form-control hidden" name="term" placeholder="Search"/>
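The chained form can be tried on a small input with Python. This is a sketch under assumptions: `\K` is emulated with a capture group, and the `html` sample is invented for the illustration (it is not the forum's real source), with one `<input>` placed before the `<img>` to show that it is skipped.

```python
import re

# Group 1 stands for the text that \K would leave as the match.
pattern = re.compile(r"(?s)\A.*?<img(?: .*?)?>.*?(<input(?: .*?)?>)")

# Hypothetical sample: an <input> before the <img>, and two after it.
html = (
    '<input type="hidden" name="early"/>\n'
    '<img src="logo.png" alt="logo"/>\n'
    '<input autocomplete="off" type="text" class="form-control hidden" '
    'name="term" placeholder="Search"/>\n'
    '<input type="submit"/>\n'
)

found = pattern.search(html).group(1)
print(found)
```

Each lazy `.*?` advances only as far as needed, so the regex first locates the nearest `<img ...>` and then the nearest `<input ...>` after it.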
                                  

                                  BR

                                  guy038

                                  P.S. : You also replied in an old post, regarding this extra space char. However, I’ll not reply because this topic is old and not exactly related to the present discussion !

                                  • dr ramaanandD
                                    dr ramaanand @guy038
                                    last edited by

                                    @guy038 Okay, thank you!

                                    The Community of users of the Notepad++ text editor.