
Match everything except the text and <br> tags

  • D
    dr ramaanand
    last edited by Nov 14, 2024, 3:02 PM

    <div style="margin-bottom:-15px;width: 100%;background-color:#EBF4FB;">
    <p><span style="font-size:13.5pt"><span style="font-family:&quot;Verdana&quot;,&quot;sans-serif&quot;"><span style="color:#075296">Skin lumpy, thick, hard. Excoriations, cracks or fissures. Gluey moisture.<br />
    			<br />
    			Moist, crusty eruptions. Obesity. Sourness. </span></span></span></p>
    			</div>
    </div>
    <div class="container">
    <div class="left">
    

    This Regular expression failed to find the above: (?s)<div style="margin-bottom:-15px;width: 100%;background-color:#EBF4FB;">\s*<p[^<>]*+>\s*<span[^<>]*+>\s*<span[^<>]*+>\s*<span[^<>]*+>.+?</span>\s*</span>\s*</span>\s*</p>\s*</div>/s*</div>\s*<div class="container">\s*<div class="left">

    • D
      dr ramaanand @dr ramaanand
      last edited by dr ramaanand Nov 14, 2024, 3:23 PM Nov 14, 2024, 3:04 PM

      Please tweak my above Regular expression to match (and find) everything except the text and <br> tags. I am trying to replace the above in a couple of (multiple) files

      • P
        PeterJones @dr ramaanand
        last edited by PeterJones Nov 14, 2024, 5:52 PM Nov 14, 2024, 3:22 PM

        @dr-ramaanand ,

        Regex is not the best way to edit HTML. You have been posting HTML-related regex questions for years, and we have tried to communicate this to you over and over again.

        And even if you decide to choose the wrong tool for the job (which is your prerogative), you also don't seem to have learned enough about regex to muddle your way through by yourself. You must put in more effort to learn it yourself, rather than coming back every few months with a situation slightly different from all your others.

        As you have been told before, but you seem to have forgotten:

        Please note: This Community Forum is not a data transformation service; you should not expect to be able to always say “I have data like X and want it to look like Y” and have us do all the work for you. If you are new to the Forum, and new to regular expressions, we will often give help on the first one or two data-transformation questions, especially if they are well-asked and you show a willingness to learn; and we will point you to the documentation where you can learn how to do the data transformations for yourself in the future. But if you repeatedly ask us to do your work for you, you will find that the patience of usually-helpful Community members wears thin. The best way to learn regular expressions is by experimenting with them yourself, and getting a feel for how they work; having us spoon-feed you the answers without you putting in the effort doesn’t help you in the long term and is uninteresting and annoying for us.

        • P
          PeterJones @dr ramaanand
          last edited by PeterJones Nov 14, 2024, 3:30 PM Nov 14, 2024, 3:28 PM

          @dr-ramaanand said in Match everything except the text and <br> tags:

          (?s)<div style="margin-bottom:-15px;width: 100%;background-color:#EBF4FB;">\s*<p[^<>]*+>\s*<span[^<>]*+>\s*<span[^<>]*+>\s*<span[^<>]*+>.+?</span>\s*</span>\s*</span>\s*</p>\s*</div>/s*</div>\s*<div class="container">\s*<div class="left">

          In this case, your problem is easy. /s* doesn’t match what you think it matches. I think you meant \s*

          Do you know how I found this? I started with a smaller part of your regex, saw that it matched, then slowly added more and more until it didn’t match; then I backed up and found the exact section that caused it to fail, and the solution was easy. It was simple debugging skills, which you need to learn if you are going to continue to manipulate data using regex.
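The debugging approach described above (start with a small part of the regex, then add more until it stops matching) can be sketched in Python. This is only an illustration and not what either poster ran: the helper name `first_failing_piece`, the way the regex is split into pieces, and the shortened sample text are all assumptions. The possessive `*+` quantifiers are written as plain `*` so the sketch also runs on Python versions older than 3.11; that does not change which piece fails here.

```python
import re

# Shortened version of the sample from the first post (attributes simplified).
sample = '''<div style="margin-bottom:-15px;width: 100%;background-color:#EBF4FB;">
<p><span style="font-size:13.5pt"><span style="font-family:Verdana"><span style="color:#075296">Skin lumpy.<br />
<br />
Moist, crusty eruptions.</span></span></span></p>
</div>
</div>
<div class="container">
<div class="left">
'''

# The failing regex, cut into pieces that are re-joined one at a time.
pieces = [
    r'(?s)<div style="margin-bottom:-15px;width: 100%;background-color:#EBF4FB;">',
    r'\s*<p[^<>]*>',
    r'\s*<span[^<>]*>\s*<span[^<>]*>\s*<span[^<>]*>',
    r'.+?</span>\s*</span>\s*</span>',
    r'\s*</p>',
    r'\s*</div>',
    r'/s*</div>',                 # the typo: "/s*" instead of "\s*"
    r'\s*<div class="container">',
    r'\s*<div class="left">',
]


def first_failing_piece(pieces, text):
    """Return the index of the first piece whose addition breaks the match, or None."""
    for i in range(1, len(pieces) + 1):
        if re.search(''.join(pieces[:i]), text) is None:
            return i - 1
    return None


bad = first_failing_piece(pieces, sample)
print(bad, pieces[bad])           # points at the "/s*</div>" piece

# Same pieces with the typo corrected: every prefix now matches.
fixed = pieces[:6] + [r'\s*</div>'] + pieces[7:]
print(first_failing_piece(fixed, sample))
```

In Notepad++ itself the same idea applies by hand: paste a shorter version of the expression into the Find dialog and lengthen it step by step.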

          • D
            dr ramaanand @PeterJones
            last edited by Nov 14, 2024, 3:34 PM

            @PeterJones Thank you very much. This Regular expression worked: (?s)<div style="margin-bottom:-15px;width: 100%;background-color:#EBF4FB;">\s*<p[^<>]*+>\s*<span[^<>]*+>\s*<span[^<>]*+>\s*<span[^<>]*+>.+?</span>\s*</span>\s*</span>\s*</p>\s*</div>\s*</div>\s*<div class="container">\s*<div class="left">

            • G
              guy038
              last edited by guy038 Nov 14, 2024, 6:08 PM Nov 14, 2024, 3:53 PM

              Hi, @dr-ramaanand, @peterjones and All,

              Ah, I was a bit too slow and Peter just beat me to it! Note that I used the same process as Peter to determine where the error occurs!

              @dr-ramaanand, you just made a small typo in the regex that you provided!


              The correct regex, to match your text, is not that one, with a /s* syntax :

                                                                                                                                                                                                    V
              (?s)<div style="margin-bottom:-15px;width: 100%;background-color:#EBF4FB;">\s*<p[^<>]*+>\s*<span[^<>]*+>\s*<span[^<>]*+>\s*<span[^<>]*+>.+?</span>\s*</span>\s*</span>\s*</p>\s*</div>/s*</div>\s*<div class="container">\s*<div class="left">
              

              but this one, with a correct \s* syntax :

                                                                                                                                                                                                    V
              (?s)<div style="margin-bottom:-15px;width: 100%;background-color:#EBF4FB;">\s*<p[^<>]*+>\s*<span[^<>]*+>\s*<span[^<>]*+>\s*<span[^<>]*+>.+?</span>\s*</span>\s*</span>\s*</p>\s*</div>\s*</div>\s*<div class="container">\s*<div class="left">
              

              Remarks :

              • It may be preferable to add a \s* syntax at the very end of your regex!

              • You could also simplify this regex, significantly, by using the version below :

              SEARCH (?s)<div style="margin-bottom:-15px;width: 100%;background-color:#EBF4FB;">\s*.+?\s*<div class="left">\s*
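As a rough check of the simplified search above, the following Python sketch runs it against a shortened copy of the sample from the first post. Python's `re` module stands in for the Notepad++ regex engine here, which is an assumption; the simplified pattern uses only constructs that both engines share, and the sample text is abbreviated.

```python
import re

# Shortened version of the sample from the first post (attributes simplified).
sample = '''<div style="margin-bottom:-15px;width: 100%;background-color:#EBF4FB;">
<p><span style="font-size:13.5pt"><span style="color:#075296">Skin lumpy.<br />
<br />
Moist, crusty eruptions.</span></span></p>
</div>
</div>
<div class="container">
<div class="left">
'''

# The simplified SEARCH expression, copied as-is.
simplified = re.compile(
    r'(?s)<div style="margin-bottom:-15px;width: 100%;background-color:#EBF4FB;">'
    r'\s*.+?\s*<div class="left">\s*'
)

m = simplified.search(sample)
# The match runs from the styled <div> through <div class="left"> and,
# because of the trailing \s*, also takes the line break after it.
print(repr(m.group(0)[:20]), m.end() == len(sample))
```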

              BR

              guy038

              • D
                dr ramaanand @guy038
                last edited by Nov 14, 2024, 3:57 PM

                @guy038 I have more than one <div class="left">, so how do I make it stop searching after finding the first <div class="left"> ?

                • G
                  guy038
                  last edited by Nov 14, 2024, 5:58 PM

                  Hello, @dr-ramaanand and All,

                  To solve this case, I would use the following regex S/R :

                  SEARCH (?s)\A.+?\R\K\s*<div style="margin-bottom:-15px;width: 100%;background-color:#EBF4FB;">\s*.+?\s*<div class="left">

                  REPLACE Whatever you want to !

                  Note that I did not add, this time, the \s* part at the end of the search regex.

                  Also notice the two lazy syntaxes ( .+? ), right after \A and right before \s*<div class="left">, which make the regex select only the first section \s*<div style=.....\s*<div class="left">!
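The effect of the lazy `.+?` can be illustrated with a small Python sketch. Python's `re` module has no `\K` or `\R`, so this is an approximation under stated assumptions: a capturing group plays the role of `\K`, `\n` stands in for `\R` (Unix line endings assumed), and the sample text and variable names are made up for the demonstration.

```python
import re

STYLE_DIV = '<div style="margin-bottom:-15px;width: 100%;background-color:#EBF4FB;">'
LEFT_DIV = '<div class="left">'

# Made-up text with two <div class="left"> lines after the styled <div>.
text = (
    'Some line above the block\n'
    + STYLE_DIV + '\n<p>first section</p>\n'
    + LEFT_DIV + '\n<p>second section</p>\n'
    + LEFT_DIV + '\n'
)

head = r'\A.+?\n(\s*' + re.escape(STYLE_DIV) + r'\s*'
tail = r'\s*' + re.escape(LEFT_DIV) + ')'

lazy = re.compile(head + r'.+?' + tail, re.DOTALL)    # stops at the first LEFT_DIV
greedy = re.compile(head + r'.+' + tail, re.DOTALL)   # runs on to the last LEFT_DIV

lazy_hit = lazy.search(text).group(1)
greedy_hit = greedy.search(text).group(1)

print(lazy_hit.count(LEFT_DIV))    # 1
print(greedy_hit.count(LEFT_DIV))  # 2
```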

                  BR

                  guy038

                  • D
                    dr ramaanand @guy038
                    last edited by dr ramaanand Nov 14, 2024, 8:38 PM Nov 14, 2024, 8:20 PM

                    @guy038 I used this as a sample:-

                    <div style="margin-bottom:-15px;width: 100%;background-color:#EBF4FB;">
                    <div class="left">
                    <p class=MsoNormal><b><span style='font-size:13.5pt;line-height:115%;
                        font-family:"Verdana","sans-serif";color:red'>SYNONYMS </span></b>
                    </p>
                    <div class="left">
                    

                    Your Regular expression does not stop searching at the first occurrence of <div class="left">

                    • D
                      dr ramaanand @dr ramaanand
                      last edited by Nov 14, 2024, 8:45 PM

                      @guy038 This RegEx helped stop searching as soon as it found a <p........>:-
                      (?s)\A.+?\R\K\s*<div style="margin-bottom:-15px;width: 100%;background-color:#EBF4FB;">\s*.+?\s*<div class="left">(?=\s*+<p[^<>]*+>)

                      • G
                        guy038
                        last edited by Nov 14, 2024, 9:21 PM

                        Hi, @dr-ramaanand and All,

                        Ah, of course, if you add a <div class="left"> line, right after the first <div style="..... line, it will not work !


                        So, given this INPUT text, pasted in a new tab:

                        <div style="margin-bottom:-15px;width: 100%;background-color:#EBF4FB;">
                        <div class="left">
                        <p class=MsoNormal><b><span style='font-size:13.5pt;line-height:115%;
                            font-family:"Verdana","sans-serif";color:red'>SYNONYMS </span></b>
                        </p>
                        <div class="left">
                        

                        Simply replace the previous search regex with this new version:

                        (?s)\A.+?\R\s*\K<div style="margin-bottom:-15px;width: 100%;background-color:#EBF4FB;">\s*.*?\s*<div class="left">

                        Note the difference: between #EBF4FB;">\s* and \s*<div class="left">, I changed the part .+? to .*?

                        I also slightly changed the position of the \K feature


                        As expected, this new regex will match the two consecutive lines:

                        <div style="margin-bottom:-15px;width: 100%;background-color:#EBF4FB;">
                        <div class="left">
                        

                        BR

                        guy038

                        • D
                          dr ramaanand @dr ramaanand
                          last edited by Nov 14, 2024, 9:21 PM

                          @guy038 This RegEx: (?s)\A.+?\R\K\s*<div style="margin-bottom:-15px;width: 100%;background-color:#EBF4FB;">.+(?=\s*+<div class="left">) would have stopped searching just before the second occurrence of <div class="left"> if the sample to be searched was like this:-

                          <div style="margin-bottom:-15px;width: 100%;background-color:#EBF4FB;">
                          <div class="left">
                          <div class="left">
                          
                          • G
                            guy038
                            last edited by Nov 14, 2024, 9:27 PM

                            @dr-ramaanand,

                            Yes, your regex does match the same amount of text as my version, but my regex seems simpler and more logical!

                            BR

                            guy038

                            • D
                              dr ramaanand @guy038
                              last edited by Nov 14, 2024, 9:33 PM

                              @guy038 OK, thank you very much!

                              • D
                                dr ramaanand @dr ramaanand
                                last edited by Nov 14, 2024, 9:39 PM

                                @guy038 your last RegEx finds the first occurrence of <div class="left"> even if there is some other text above it. Lovely!

                                • G
                                  guy038
                                  last edited by Nov 14, 2024, 9:46 PM

                                  Hi, @dr-ramaanand and All,

                                  Again, I did not check all the possibilities before posting. Sorry for the NOISE !

                                  So, the right regex to use should be :

                                  (?s)\A.*?\s*\K<div style="margin-bottom:-15px;width: 100%;background-color:#EBF4FB;">\s*.*?\s*<div class="left">


                                  This time, it will work if you paste this text in a new tab:

                                  <div style="margin-bottom:-15px;width: 100%;background-color:#EBF4FB;">
                                  <div class="left">
                                  <div class="left">
                                  

                                  But it will also work if you paste the following text in a new tab:

                                  
                                  First non-blank line
                                  second line
                                  
                                  Third line before the block to match
                                  
                                  <div style="margin-bottom:-15px;width: 100%;background-color:#EBF4FB;">
                                  <div class="left">
                                  <div class="left">
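Both cases above can be checked with a short Python sketch. Since Python's `re` module does not support `\K`, a capturing group is used in its place, which is an assumption of this sketch rather than the Notepad++ syntax; the helper name `matched` is also made up. The two test texts are the ones quoted in this post.

```python
import re

STYLE_DIV = '<div style="margin-bottom:-15px;width: 100%;background-color:#EBF4FB;">'
LEFT_DIV = '<div class="left">'

# (?s)\A.*?\s*\K<div style=...>\s*.*?\s*<div class="left">
# with the part after \K placed in group 1.
pattern = re.compile(
    r'\A.*?\s*(' + re.escape(STYLE_DIV) + r'\s*.*?\s*' + re.escape(LEFT_DIV) + ')',
    re.DOTALL,
)

block = STYLE_DIV + '\n' + LEFT_DIV + '\n' + LEFT_DIV + '\n'
with_preamble = (
    '\nFirst non-blank line\nsecond line\n\n'
    'Third line before the block to match\n\n' + block
)


def matched(text):
    """Return the text that Notepad++ would select after \\K, or None."""
    m = pattern.search(text)
    return m.group(1) if m else None


# In both cases only the styled <div> line and the first
# <div class="left"> line are selected.
print(matched(block) == matched(with_preamble))   # True
print(matched(block).count(LEFT_DIV))             # 1
```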
                                  

                                  Best Regards,

                                  guy038

                                  • D
                                    dr ramaanand @guy038
                                    last edited by dr ramaanand Nov 15, 2024, 4:59 AM Nov 15, 2024, 4:16 AM

                                    @guy038 I am not sure if I am allowed to do it (as the solution was provided by you), so I am requesting you to post the last Regular Expression you provided with the sample to be edited with a new heading, “How to find the first occurrence of a tag ?” so that people can search and find it online. Thank you!

                                    • G
                                      guy038
                                      last edited by guy038 Nov 16, 2024, 12:15 PM Nov 16, 2024, 12:09 PM

                                      Hello, @dr-ramaanand and All,

                                      You said in your previous post :

                                      … so I am requesting you to post the last Regular Expression you provided with the sample to be edited with a new heading, “How to find the first occurrence of a tag ?” so that people can search and find it online. Thank you!

                                      But, actually, my regex finds the first occurrence of the <div class="left"> tag, AFTER a first occurrence of the <div style="margin-bottom:-15px;width: 100%;background-color:#EBF4FB;"> tag !


                                      So, to my mind, the correct way to match the first occurrence of a specific tag in the current file is to use the generic regex:

                                      (?s-i)\A.*?\K<TAG Name(?: .*?)?>

                                      Just replace the generic TAG Name value with a valid HTML tag

                                      Note that, for the comment tag, you replace the generic TAG Name in the above regex with the string !--.*?--


                                      Similarly, the correct way to match the last occurrence of a specific tag in the current file is to use the generic regex:

                                      (?s-i)\A.*\K<TAG Name(?: .*?)?>
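The only difference between the two generic regexes is `.*?` (lazy, first occurrence) versus `.*` (greedy, last occurrence). A minimal Python sketch of that difference follows. It rests on a few assumptions: Python's `re` has no `\K`, so a capturing group replaces it; `(?s-i)` is expressed as `re.DOTALL` with the default case-sensitive matching; and the helper names `first_tag` and `last_tag` and the sample HTML are invented for the illustration.

```python
import re


def first_tag(name, html):
    """First <name ...> tag: lazy .*? stops at the earliest candidate."""
    m = re.search(r'\A.*?(<' + re.escape(name) + r'(?: .*?)?>)', html, re.DOTALL)
    return m.group(1) if m else None


def last_tag(name, html):
    """Last <name ...> tag: greedy .* backtracks from the end of the text."""
    m = re.search(r'\A.*(<' + re.escape(name) + r'(?: .*?)?>)', html, re.DOTALL)
    return m.group(1) if m else None


html = '<p class="a">one</p>\n<p>two</p>\n<p id="z">three</p>\n'

print(first_tag('p', html))      # <p class="a">
print(last_tag('p', html))       # <p id="z">
print(first_tag('table', html))  # None
```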

                                      BR

                                      guy038

                                      • D
                                        dr ramaanand @guy038
                                        last edited by Nov 17, 2024, 5:27 PM

                                        @guy038 said in Match everything except the text and <br> tags:

                                        (?s-i)\A.*\K<TAG Name(?: .*?)?>

                                        I think that that should be (?s-i)\A.*\K<TAG Name(?:.*?)?> with no spaces anywhere in the middle

                                        • G
                                          guy038
                                          last edited by guy038 Nov 17, 2024, 5:55 PM Nov 17, 2024, 5:51 PM

                                          Hi, @dr-ramaanand and All,

                                          In order to use a valid INPUT text to do some tests, just open the main page of our forum. Then hit the Ctrl + U shortcut to open the HTML source page of our forum and paste its contents in a new tab


                                          My generic regex tries to match the syntax <TAG......, till the nearest > character and must be valid for any kind of tag.

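                                          Thus, I prefer to insert a space char to verify that the tag is a valid one: the space ensures that, for example, <head does not also match <header>. Indeed, this regex will match either bare tags like <head> or opening tags with attributes, for example the opening tag of <span style="color:blue">blue</span>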

                                          If you replace the TAG Name in the generic regex (?s-i)\A.*?\K<TAG Name(?: .*?)?>, which matches the first tag, named TAG, in current file, you get, from the examples, the regexes :

                                          • (?s-i)\A.*?\K<head(?: .*?)?>

                                          • (?s-i)\A.*?\K<span(?: .*?)?>

                                          Just test them against the HTML code source of our forum
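Why the space inside `(?: .*?)?` matters can be seen with a tiny Python sketch that compares the regex with and without it. As before, a capturing group replaces `\K` because Python's `re` does not support it, and the one-line sample HTML is made up for the demonstration.

```python
import re

html = '<body><b>bold</b></body>'

# Generic regex with the space: after "<b" only a space or ">" may follow.
with_space = re.search(r'\A.*?(<b(?: .*?)?>)', html, re.DOTALL).group(1)

# Variant without the space: "<b" followed by anything up to ">" is accepted.
without_space = re.search(r'\A.*?(<b(?:.*?)?>)', html, re.DOTALL).group(1)

print(with_space)     # <b>
print(without_space)  # <body>
```

Without the space, a search for the `b` tag is satisfied by `<body>`, which is a different tag whose name merely starts with the same letter.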


                                          Now, let’s suppose, for example, that you want to find the first <input ...> tag AFTER the first <img ...> tag in the HTML source code of our forum:

                                          Then, from my previous post, you would have to use the following regex :

                                          (?s-i)\A.*?<img(?: .*?)?>.*?\K<input(?: .*?)?>

                                          which matches, as expected, the following line :

                                          <input autocomplete="off" type="text" class="form-control hidden" name="term" placeholder="Search"/>
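The chained form (first tag B after the first tag A) can be tried on a small made-up snippet in Python. The sample HTML below is invented and is not the forum's source; a capturing group again stands in for `\K`, which Python's `re` lacks.

```python
import re

# Invented sample: one <input> before the first <img>, two after it.
html = (
    '<input id="before">\n'
    '<img src="a.png">\n'
    '<input id="after-1">\n'
    '<input id="after-2">\n'
)

# (?s-i)\A.*?<img(?: .*?)?>.*?\K<input(?: .*?)?>  with \K replaced by a group.
pattern = re.compile(r'\A.*?<img(?: .*?)?>.*?(<input(?: .*?)?>)', re.DOTALL)

hit = pattern.search(html).group(1)
print(hit)  # <input id="after-1">
```

The `<input>` that precedes the `<img>` is skipped because the lazy `.*?` after `\A` must first reach an `<img ...>` tag before the search for `<input ...>` begins.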
                                          

                                          BR

                                          guy038

                                          P.S. : You also replied in an old post, regarding this extra space char. However, I’ll not reply because this topic is old and not exactly related to the present discussion !

                                          The Community of users of the Notepad++ text editor.