
Regex: Find tags that contain one string but do not contain another string

9 Posts 3 Posters 631 Views
  • R
    rodica F
    last edited by Mar 18, 2022, 9:21 AM

    1. <p class="mb-40px">My nick name is Prince and <a href="https://mywebsite.com/bla.html" class="color-gege" target="_new">my real name</a> is beyond magic.</p>
        
    2. <p class="mb-40px">I love my home s< because I stay with my lovely cat.</p>
    
    3. <p class="mb-40px">Because of this book t< I cannot sleep well.</p>
    

    I want to find only the lines that have the operator < inside the HTML tag <p class="mb-40px"> </p>, except those lines that contain an </a> tag.

    In my example above, the output should be line 2 (which has s<) and line 3 (which has t<).

    So, I use @guy038's generic formula: (REGION-START)+(.)+\K(FIND REGEX)(?s:(?=.*(REGION-FINAL)))

    In my case, FIND: (<p class="mb-40px">)+(.)+\K(\w<)(?s:(?=.*(</p>)))

    The problem is that my regex also finds the e</a> from the first line, and I don't want to match the tags with </a>.

    Maybe @guy038 has a better generic formula for this kind of problem.
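
The unwanted match can be reproduced outside Notepad++ with a minimal Python sketch. It is an approximation, not the exact search: Python's `re` module has no `\K`, so the FIND part is captured in group 1 instead. The names `lines`, `PATTERN` and `last_operator` are invented for this illustration.

```python
import re

lines = [
    '<p class="mb-40px">My nick name is Prince and <a href="https://mywebsite.com/bla.html" class="color-gege" target="_new">my real name</a> is beyond magic.</p>',
    '<p class="mb-40px">I love my home s< because I stay with my lovely cat.</p>',
    '<p class="mb-40px">Because of this book t< I cannot sleep well.</p>',
]

# Same shape as the FIND regex above, with \K replaced by a capture group.
PATTERN = re.compile(r'(?:<p class="mb-40px">)+.+(\w<)(?s:(?=.*</p>))')


def last_operator(line):
    """Return the text captured by (\\w<) on this line, or None if no match."""
    match = PATTERN.search(line)
    return match.group(1) if match else None


hits = [last_operator(line) for line in lines]
print(hits)  # ['e<', 's<', 't<']
```

The first hit is the `e<` of `name</a>`: `\w<` on its own cannot tell an operator from a word character followed by a closing tag.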

    • G
      guy038
      last edited by guy038 Mar 19, 2022, 12:27 PM Mar 19, 2022, 12:26 PM

      Hello, @rodica-f and All,

      Just consider this example :

      
      1. <p class="mb-40px">My nick name is Prince and <a href="https://mywebsite.com/bla.html" class="color-gege" target="_new">my real name </a> is z<beyond f< magic.</p>
          
      2. <p class="Test">I love my home s< because I stay with my b<lovely cat.</p>
      
      3. <p class="mb-40px">Because of this book t<I cannot a< sleep well.</p>
      

      Within this text :

      • Two tags begin with <p class="mb-40px"> and one begins with <p class="Test">

      • Each <p tag contains two < operators ( one followed by a space char, the other followed by a letter )


      So :

      • To find any string \w<, preceded by a space char, inside any <p... tag, use the regex :

      SEARCH / MARK (?-si:<p class=".+?">|(?!\A)\G)(?s-i:(?!</p>).)*?\x20\K\w<

      • To find any string \w<, preceded and followed by a space char, inside any <p... tag, use the regex :

      SEARCH / MARK (?-si:<p class=".+?">|(?!\A)\G)(?s-i:(?!</p>).)*?\x20\K\w<(?=\x20)

      • To find any string \w<, preceded by a space char, inside the specific tag <p class="mb-40px">, use the regex :

      SEARCH / MARK (?-si:<p class="mb-40px">|(?!\A)\G)(?s-i:(?!</p>).)*?\x20\K\w<

      • To find any string \w<, preceded and followed by a space char, inside the specific tag <p class="mb-40px">, use the regex :

      SEARCH / MARK (?-si:<p class="mb-40px">|(?!\A)\G)(?s-i:(?!</p>).)*?\x20\K\w<(?=\x20)
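
The four searches above can be approximated in Python. This is a sketch under stated assumptions, not a port: Python's `re` supports neither `\G` nor `\K`, so the code first isolates each `<p class="...">` region and then looks for `\w<` inside it. The function name `find_operators` and its parameters are hypothetical. One known difference: a `\w<` whose `<` opens the closing `</p>` itself is not reported by this two-step version.

```python
import re

TEXT = '\n'.join([
    '<p class="mb-40px">My nick name is Prince and <a href="https://mywebsite.com/bla.html" class="color-gege" target="_new">my real name </a> is z<beyond f< magic.</p>',
    '<p class="Test">I love my home s< because I stay with my b<lovely cat.</p>',
    '<p class="mb-40px">Because of this book t<I cannot a< sleep well.</p>',
])


def find_operators(text, css_class=None, trailing_space=False):
    # css_class=None accepts any class value, like <p class=".+?"> above.
    name = re.escape(css_class) if css_class is not None else r'.+?'
    # Group 1 holds everything after the opening tag, up to the next </p>.
    region = re.compile(r'<p class="' + name + r'">((?s:(?:(?!</p>).)*))')
    # A lookbehind plays the role of \x20\K: the space is required, not returned.
    target = re.compile(r'(?<=\x20)\w<' + (r'(?=\x20)' if trailing_space else ''))
    hits = []
    for match in region.finditer(text):
        hits.extend(target.findall(match.group(1)))
    return hits


print(find_operators(TEXT))                  # ['z<', 'f<', 's<', 'b<', 't<', 'a<']
print(find_operators(TEXT, None, True))      # ['f<', 's<', 'a<']
print(find_operators(TEXT, 'mb-40px'))       # ['z<', 'f<', 't<', 'a<']
print(find_operators(TEXT, 'mb-40px', True)) # ['f<', 'a<']
```

Splitting the job into "find the region" and "find inside the region" is easier to read than a single pattern, at the cost of losing the absolute match positions that the editor's Mark feature gives you.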

      Best Regards,

      guy038

      P.S. :

      BTW, no need to use a new profile. You’re certainly @robin-cruise !

      • R
        rodica F @guy038
        last edited by Mar 19, 2022, 5:17 PM

        @guy038 thanks for the solution.

        1 mobile account and 1 desktop account. No difference. It’s all about where you are at that time…

        • R
          rodica F @rodica F
          last edited by rodica F Mar 19, 2022, 5:45 PM Mar 19, 2022, 5:45 PM

          @guy038 So, the generic formulas for this kind of problem (contains one string, but does not contain another string) should be these:

          (?-si:BSR|(?!\A)\G)(?s-i:(?!ESR).)*?\x20\KFR

          (?-si:BSR|(?!\A)\G)(?s-i:(?!ESR).)*?\x20\KFR(?=\x20)

          BSR (begin part) = <p class="mb-40px">
          ESR (end part) = </p>
          FR - (FIND Regex) = \w<
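
Filling in the template by hand is error-prone, so here is a small Python sketch that only assembles the search string. The helper name `build_region_regex` is hypothetical, and the result is meant to be pasted into Notepad++'s Find field: Python's own `re` module cannot compile it, because it uses `\G` and `\K`.

```python
def build_region_regex(bsr, esr, fr, trailing_space=False):
    # bsr, esr and fr are regex fragments, inserted as-is (no escaping).
    # If fr contains a top-level alternation, wrap it in (?:...) first.
    parts = [
        r'(?-si:', bsr, r'|(?!\A)\G)',  # start at BSR, or resume after the last match
        r'(?s-i:(?!', esr, r').)*?',    # advance lazily, never stepping onto ESR
        r'\x20\K', fr,                  # a space, then report only FR
    ]
    if trailing_space:
        parts.append(r'(?=\x20)')       # FR must also be followed by a space
    return ''.join(parts)


print(build_region_regex('<p class="mb-40px">', '</p>', r'\w<'))
# (?-si:<p class="mb-40px">|(?!\A)\G)(?s-i:(?!</p>).)*?\x20\K\w<
```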

          • G
            guy038
            last edited by Mar 19, 2022, 6:03 PM

            Hi, @rodica-f and All,

            Just a reminder :

            • Don’t forget to move the caret to the very beginning of file, before running the regex, with the Ctrl + Home shortcut

            BR

            guy038

            • R
              Robin Cruise @guy038
              last edited by Robin Cruise May 22, 2022, 8:13 PM May 22, 2022, 8:12 PM

              @guy038 but what if I have the following case?

              I need a regex to find all the lines that contain <p class="sd-23"> but do not contain the closing tag </p>

              <p class="sd-23">Somebody to love</p>
              
              <p class="sd-23">In 1495, the Grand Prince gave this icon as a blessing to his daughter Helen.
              
              <p class="sd-23">Holy Birth of God, have mercy on us!</p>
              

              My regex doesn’t work at all. It should have found the second line.

              FIND: (?<p class="sd-23">).*(?!</p>)

              • G
                guy038
                last edited by guy038 May 23, 2022, 12:57 PM May 23, 2022, 12:52 PM

                Hello, @Robin-cruise and All,

                Well, not so difficult ! I assume that each line must end with the </p> tag and that you’re not speaking about any multi-line block !


                Then, use the following regex in order to find out all the lines beginning with <p class="sd-23"> and not ending with the </p> tag :

                (?-i)\h*<p class="sd-23">((?!</p>).)*$

                For instance, using this four-line text :

                <p class="sd-23">Somebody to love</p>
                
                <p class="sd-23">In 1495, the Grand Prince gave this icon as a blessing to his daughter Helen.
                
                    <p class="sd-23">
                
                    <p class="sd-23">Holy Birth of God, have mercy on us!</p>
                

                The regex would select the entire lines 2 and 3 !


                Notes :

                • The regex finds, first, the string <p class="sd-23">, with this exact case, after possible leading blank characters

                • Then, it grabs all the remaining text, one character at a time ( ((?!</p>).)* ), till the end of the current line ( $ )…

                • …But ONLY IF it does not meet the </p> tag at any position after <p class="sd-23">, till the end of current line
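
The same line-by-line check can be sketched in Python. Two assumptions are made here: Python's `re` has no `\h`, so the class `[\t\x20\xa0]` stands in for it, and `(?-i)` is dropped because Python searches are case-sensitive unless told otherwise. The names `TEXT` and `UNCLOSED` are invented for this illustration.

```python
import re

TEXT = '\n'.join([
    '<p class="sd-23">Somebody to love</p>',
    '<p class="sd-23">In 1495, the Grand Prince gave this icon as a blessing to his daughter Helen.',
    '    <p class="sd-23">',
    '    <p class="sd-23">Holy Birth of God, have mercy on us!</p>',
])

# re.MULTILINE lets $ match at the end of every line, not only at the end of the text.
UNCLOSED = re.compile(r'[\t\x20\xa0]*<p class="sd-23">((?!</p>).)*$', re.MULTILINE)

unclosed = [match.group(0) for match in UNCLOSED.finditer(TEXT)]
for line in unclosed:
    print(repr(line))
```

Only the second and third lines are reported: the negative look-ahead `(?!</p>)` is tested before every character, so a line that contains `</p>` anywhere after the opening tag can never reach `$`.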

                Best Regards,

                guy038

                • R
                  Robin Cruise @guy038
                  last edited by Robin Cruise May 23, 2022, 1:30 PM May 23, 2022, 1:30 PM

                  @guy038 thanks ! But I don’t understand what this part does:

                  (?-i)\h*

                  • G
                    guy038
                    last edited by guy038 May 23, 2022, 4:53 PM May 23, 2022, 4:50 PM

                    Hi, @robin-cruise,

                    • The (?-i) part means that, from that point, the search will be case-sensitive. So, it will match the string <p class="sd-23">, but not, for instance, the string <P class="sd-23"> nor the string <p CLASS="sd-23"> !

                    • Then, the \h character class represents any horizontal blank character ( so, either the \t [ Tabulation ] char, the \x20 [ Space ] char or the \xa0 [ No-Break Space ] char )

                    • Thus, the \h* syntax represents any range of horizontal blank chars, from 0 to n
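
Both points can be checked with a short Python sketch. It rests on assumptions: `re.IGNORECASE` simulates a search that would otherwise ignore case, the scoped group `(?-i:...)` (Python 3.6+) switches case sensitivity back on, and `[\t\x20\xa0]` stands in for `\h`, which Python's `re` does not provide. The names `tag` and `leading_blanks` are invented for this illustration.

```python
import re

# Case-insensitive by default here; (?-i:...) makes the tag case-sensitive again.
tag = re.compile(r'(?-i:<p class="sd-23">)', re.IGNORECASE)

print(bool(tag.search('<p class="sd-23">')))  # True
print(bool(tag.search('<P class="sd-23">')))  # False
print(bool(tag.search('<p CLASS="sd-23">')))  # False

# Zero or more horizontal blanks: tab, space or no-break space.
leading_blanks = re.compile(r'[\t\x20\xa0]*')

print(repr(leading_blanks.match('\t \xa0<p class="sd-23">').group(0)))  # '\t \xa0'
print(repr(leading_blanks.match('<p class="sd-23">').group(0)))         # ''
```

Because `*` allows zero repetitions, the second `match` call still succeeds and returns an empty string, which is why the regex also accepts lines with no indentation at all.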

                    BR

                    guy038

                    The Community of users of the Notepad++ text editor.