    Regex: Find tags that contain one string but do not contain another string

    • rodica F
      rodica F last edited by

      1. <p class="mb-40px">My nick name is Prince and <a href="https://mywebsite.com/bla.html" class="color-gege" target="_new">my real name</a> is beyond magic.</p>
          
      2. <p class="mb-40px">I love my home s< because I stay with my lovely cat.</p>
      
      3. <p class="mb-40px">Because of this book t< I cannot sleep well.</p>
      

      I want to find only the lines where the operator < appears inside the HTML tag <p class="mb-40px"> </p>, except those lines that have

      In my example above, the output should be line 2 (which has s<) and line 3 (which has t<).

      So, I use @guy038's generic formula: (REGION-START)+(.)+\K(FIND REGEX)(?s:(?=.*(REGION-FINAL)))

      In my case FIND: (<p class="mb-40px">)+(.)+\K(\w<)(?s:(?=.*(</p>)))

      The problem is that my regex also finds the e< of e</a> on the first line, and I don't want to find the tags with </a>.

      Maybe @guy038 has a better generic formula for this kind of problem.

      • guy038
        guy038 last edited by guy038

        Hello, @rodica-f and All,

        Just consider this example :

        
        1. <p class="mb-40px">My nick name is Prince and <a href="https://mywebsite.com/bla.html" class="color-gege" target="_new">my real name </a> is z<beyond f< magic.</p>
            
        2. <p class="Test">I love my home s< because I stay with my b<lovely cat.</p>
        
        3. <p class="mb-40px">Because of this book t<I cannot a< sleep well.</p>
        

        Within this text :

        • Two tags begin with <p class="mb-40px"> and one begins with <p class="Test">

        • Each <p tag contains two < operators ( one followed by a space char, the other followed by a letter )


        So :

        • To find, in any <p... tag, any string \w< preceded by a space char, use the regex :

        SEARCH / MARK (?-si:<p class=".+?">|(?!\A)\G)(?s-i:(?!</p>).)*?\x20\K\w<

        • To find, in any <p... tag, any string \w< preceded and followed by a space char, use the regex :

        SEARCH / MARK (?-si:<p class=".+?">|(?!\A)\G)(?s-i:(?!</p>).)*?\x20\K\w<(?=\x20)

        • To find, in the specific tag <p class="mb-40px">, any string \w< preceded by a space char, use the regex :

        SEARCH / MARK (?-si:<p class="mb-40px">|(?!\A)\G)(?s-i:(?!</p>).)*?\x20\K\w<

        • To find, in the specific tag <p class="mb-40px">, any string \w< preceded and followed by a space char, use the regex :

        SEARCH / MARK (?-si:<p class="mb-40px">|(?!\A)\G)(?s-i:(?!</p>).)*?\x20\K\w<(?=\x20)

        Best Regards,

        guy038

        P.S. :

        BTW, no need to use a new profile. You’re certainly @robin-cruise !

        • rodica F
          rodica F @guy038 last edited by

          @guy038 thanks for the solution.

          1 mobile account and 1 desktop account. No difference. It’s all about where you are at that time…

          • rodica F
            rodica F @rodica F last edited by rodica F

            @guy038 So, the generic formulas for this kind of problem (contains one string, but doesn't contain another string) should be these:

            (?-si:BSR|(?!\A)\G)(?s-i:(?!ESR).)*?\x20\KFR

            (?-si:BSR|(?!\A)\G)(?s-i:(?!ESR).)*?\x20\KFR(?=\x20)

            BSR (begin part) = <p class="mb-40px">
            ESR (end part) = </p>
            FR - (FIND Regex) = \w<
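As a rough illustration of how the three placeholders slot into the formula, the sketch below assembles the Notepad++ search string from BSR, ESR and FR. It only builds text to paste into the Find dialog; it does not run the regex, since Python's `re` module has no `\G` or `\K`. The function name `build_region_regex` and its `space_after` parameter are invented for this example.

```python
def build_region_regex(bsr, esr, fr, space_after=False):
    """Assemble the generic Notepad++ regex from its three parts.

    bsr: regex for the beginning of the region (BSR)
    esr: regex for the end of the region (ESR)
    fr:  regex to find inside the region (FR)
    """
    parts = [
        r"(?-si:", bsr, r"|(?!\A)\G)",   # start at BSR, or resume after the last hit
        r"(?s-i:(?!", esr, r").)*?",     # advance, but never across ESR
        r"\x20\K", fr,                   # a space, then report only FR
    ]
    if space_after:
        parts.append(r"(?=\x20)")        # optionally require a space after FR
    return "".join(parts)


print(build_region_regex('<p class="mb-40px">', "</p>", r"\w<"))
print(build_region_regex('<p class="mb-40px">', "</p>", r"\w<", space_after=True))
```

With the values listed above, the two printed strings should be identical to the third and fourth SEARCH / MARK regexes from guy038's post.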

            • guy038
              guy038 last edited by

              Hi, @rodica-f and All,

              Just a reminder :

              • Don’t forget to move the caret to the very beginning of the file, before running the regex, with the Ctrl + Home shortcut

              BR

              guy038

              • Robin Cruise
                Robin Cruise @guy038 last edited by Robin Cruise

                @guy038 But what if I have the following case?

                I need a regex to find all lines that contain <p class="sd-23"> but do not contain the closing tag </p>

                <p class="sd-23">Somebody to love</p>
                
                <p class="sd-23">In 1495, the Grand Prince gave this icon as a blessing to his daughter Helen.
                
                <p class="sd-23">Holy Birth of God, have mercy on us!</p>
                

                My regex doesn’t work at all. It should have found the second line.

                FIND: (?<p class="sd-23">).*(?!</p>)

                • guy038
                  guy038 last edited by guy038

                  Hello, @Robin-cruise and All,

                  Well, not so difficult ! I assume that each line must end with the </p> tag and that you’re not speaking about any multi-line block !


                  Then, use the following regex in order to find out all the lines beginning with <p class="sd-23"> and not ending with the </p> tag :

                  (?-i)\h*<p class="sd-23">((?!</p>).)*$

                  For instance, using this four-line text :

                  <p class="sd-23">Somebody to love</p>
                  
                  <p class="sd-23">In 1495, the Grand Prince gave this icon as a blessing to his daughter Helen.
                  
                      <p class="sd-23">
                  
                      <p class="sd-23">Holy Birth of God, have mercy on us!</p>
                  

                  The regex would select the entire lines 2 and 3 !


                  Notes :

                  • The regex finds, first, the string <p class="sd-23">, with this exact case, after possible leading blank characters

                  • Then, it grabs all the remaining text, one character at a time ( ((?!</p>).)* ), till the end of the current line ( $ )…

                  • …But ONLY IF it does not meet the </p> tag at any position after <p class="sd-23">, till the end of current line
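The same check can be sketched in Python for anyone testing this outside Notepad++. Two things differ from the regex above and are assumptions of this sketch: Python's `re` has no `\h`, so the class `[ \t\xa0]` (the three characters usually listed for it) stands in, and each line is tested on its own with `fullmatch`, which also forces the tag to be the first non-blank thing on the line. The names `UNCLOSED` and `unclosed_lines` are made up for the example.

```python
import re

# The four-line sample from the post.
TEXT = "\n".join([
    '<p class="sd-23">Somebody to love</p>',
    '<p class="sd-23">In 1495, the Grand Prince gave this icon as a blessing to his daughter Helen.',
    '    <p class="sd-23">',
    '    <p class="sd-23">Holy Birth of God, have mercy on us!</p>',
])

# Optional leading blanks, the exact opening tag, then any run of
# characters in which no position starts a closing </p> tag.
UNCLOSED = re.compile(r'[ \t\xa0]*<p class="sd-23">(?:(?!</p>).)*')


def unclosed_lines(text):
    """Return the 1-based numbers of lines that open the tag but never close it."""
    return [
        number
        for number, line in enumerate(text.splitlines(), start=1)
        if UNCLOSED.fullmatch(line)
    ]


print(unclosed_lines(TEXT))
```

On the sample this should report lines 2 and 3, the same lines the Notepad++ regex selects.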

                  Best Regards,

                  guy038

                  • Robin Cruise
                    Robin Cruise @guy038 last edited by Robin Cruise

                    @guy038 Thanks ! But I don’t understand what this part is doing:

                    (?-i)\h*

                    • guy038
                      guy038 last edited by guy038

                      Hi, @robin-cruise,

                      • The (?-i) part means that, from that point, the search will be sensitive to case. So, it will match the string <p class="sd-23">, but not, for instance, the string <P class="sd-23"> nor the string <p CLASS="sd-23"> !

                      • Then, the \h character class represents any horizontal blank character ( so, either the \t [ Tabulation ] char, the \x20 [ Space ] char or the \xa0 [ No-Break Space ] char )

                      • Thus, the \h* syntax represents any range of horizontal blank chars, from 0 to n
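A small Python sketch of these two pieces, under stated assumptions: Python's `re` does not know `\h`, so the hypothetical constant `H` spells out the three characters listed above, and the pattern is compiled with `re.IGNORECASE` on purpose so that the scoped `(?-i:...)` group visibly switches case sensitivity back on for the tag.

```python
import re

# Stand-in for \h: tab, space, no-break space (an assumption of this sketch).
H = r"[ \t\xa0]"

# Case-insensitive overall, but (?-i:...) makes the tag itself case-sensitive.
TAG = re.compile(rf'{H}*(?-i:<p class="sd-23">)', re.IGNORECASE)

for candidate in [
    '<p class="sd-23">',        # exact case, no leading blanks
    '\t \xa0<p class="sd-23">', # leading tab, space and no-break space
    '<P class="sd-23">',        # wrong case in the tag name
    '<p CLASS="sd-23">',        # wrong case in the attribute name
    '\n<p class="sd-23">',      # a line break is not a horizontal blank
]:
    print(repr(candidate), bool(TAG.fullmatch(candidate)))
```

Only the first two candidates should match: zero or more horizontal blanks are accepted, while a change of case or a leading line break is not.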

                      BR

                      guy038
