Find words stuck to a bracket or parenthesis but skip those in the CSS and meta tags

    dr ramaanand @dr ramaanand
    last edited by dr ramaanand Dec 7, 2024, 3:36 PM Dec 7, 2024, 3:22 PM

    At regex101.com, my regex finds 21 matches, although if it skipped what is in the CSS section and the meta tags it should find just one: Sulph(Brimstone). See https://regex101.com/r/h9S2PI/1. In Notepad++, however, it selects everything from top to bottom instead of skipping what I want it to.

      dr ramaanand @dr ramaanand
      last edited by Dec 7, 2024, 3:39 PM

      @dr-ramaanand said in Find words stuck to a bracket or parenthesis but skip those in the CSS and meta tags:

      (<style[\S\s]*?</style>)(*SKIP)(*F)|(<script[\S\s]*?</script>)(*SKIP)(*F)|

      does not seem to skip what is between <style............ and </style>, or what is between <script............ and </script>

        Mark Olson
        last edited by Dec 7, 2024, 6:09 PM

        @dr-ramaanand
        First of all, this seems like a really obvious use case for an XML parser and not regular expressions (even more so than other examples where people are trying to parse XML with regex). Just parse the XML, recursively search the elements, and extract words in a bracket or parenthesis only if the tag name is not CSS or meta. Yes, I know you’re not going to listen to me because nobody ever listens to me when I give this recommendation, but I’m going to keep banging my head against that wall because I know I’m right.
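A minimal sketch of the parser approach described above, using Python's standard `html.parser`. The class name `ParenWordFinder`, the helper `find_paren_words`, and the sample markup are all made up for illustration, and the choice to skip `style` and `script` is an assumption based on the thread title. Text inside `<meta>` attributes never reaches `handle_data`, so it is ignored without extra work.

```python
import re
from html.parser import HTMLParser

# Elements whose text content is ignored (assumption: style and script).
SKIPPED_TAGS = {"style", "script"}
WORD_BEFORE_PAREN = re.compile(r"\w+\(")


class ParenWordFinder(HTMLParser):
    """Collect words immediately followed by '(' from visible text only."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0  # > 0 while inside <style> or <script>
        self.found = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIPPED_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        # Only text outside the skipped elements is searched.
        if self.skip_depth == 0:
            self.found.extend(WORD_BEFORE_PAREN.findall(data))


def find_paren_words(html_text):
    parser = ParenWordFinder()
    parser.feed(html_text)
    parser.close()
    return parser.found


# Hypothetical sample, not the poster's real file.
sample = """<html><head>
<meta name="description" content="Uses(of) sulphur">
<style>p { color: rgb(0, 0, 0); }</style>
<script>console.log(1);</script>
</head><body><p>Sulph(Brimstone) is a remedy.</p></body></html>"""

print(find_paren_words(sample))  # ['Sulph(']
```

The parser keeps track of which element it is in, so no single expression has to describe every region that should be skipped.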

        I’m not going to try to analyze your regexes because they’re godawful unhinged enormous monstrosities that should really be making you reconsider your life choices, but I have a more meta question for you: is there any particular reason why you’re doing x(*SKIP)(*F)|y(*SKIP)(*F)|z(*SKIP)(*F)|a instead of the more concise (?:x|y|z)(*SKIP)(*F)|a ?

          dr ramaanand @Mark Olson
          last edited by dr ramaanand Dec 7, 2024, 6:26 PM Dec 7, 2024, 6:22 PM

          @Mark-Olson Sorry, the alternatives have nothing in common, either at the start or at the end, that would let me club them together

            dr ramaanand @dr ramaanand
            last edited by dr ramaanand Dec 7, 2024, 6:48 PM Dec 7, 2024, 6:37 PM

            @Mark-Olson The regular expression (?:<html[\S\s]*?<\/h1>|<p[^>]*>[\S\s]*?uses\]<\/p>|<[\S\s]*?>|<style[^<>]*>[\S\s]*?<\/style>|<script[^<>]*>[\S\s]*?<\/script>)(*SKIP)(*F)|\w+\( also finds 21 matches; it does not skip what is between <style......> and </style> or between <script......> and </script> in the test block at the top

              dr ramaanand @dr ramaanand
              last edited by Dec 7, 2024, 7:00 PM

              @Mark-Olson (?:<html[\S\s]*?<\/h1>|<p[^>]*>[\S\s]*?uses\]<\/p>|<[\S\s]*?>|style[^>]*>[^<>]*<\/style>|script[^>]*>[^<>]*<\/script>)(*SKIP)(*F)|\w+\( is an invalid expression

                dr ramaanand @dr ramaanand
                last edited by dr ramaanand Dec 7, 2024, 7:11 PM Dec 7, 2024, 7:09 PM

                @Mark-Olson (?:<html[\S\s]*?<\/h1>|<p[^>]*>[\S\s]*?uses\]<\/p>|<[\S\s]*?>|style[^>]*>[^<>][\S\s]*<\/style>|script[^>]*>[^<>][\S\s]*<\/script>)(*SKIP)(*F)|\w+\( skips what is between <style......> and </style> but not what is between <script......> and </script> - can you tweak that regular expression to skip what comes between <script......> and </script> ?

                  Mark Olson @dr ramaanand
                  last edited by Dec 7, 2024, 7:18 PM

                  @dr-ramaanand said

                  @Mark-Olson The regular expression some...unhinged...monstrosity also finds 21 matches, blah blah blah

                  You misunderstood me. (?:x|y|z)(*SKIP)(*F)|a is exactly equivalent to x(*SKIP)(*F)|y(*SKIP)(*F)|z(*SKIP)(*F)|a except that it’s shorter. My proposal wasn’t trying to make your regex more correct, it was just trying to make it shorter without making it less correct.
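To try the grouped `(?:x|y|z)` shape outside Notepad++, here is a rough sketch in Python. Python's built-in `re` module does not support `(*SKIP)(*F)`, so this swaps in a different, commonly used technique: the unwanted alternatives go first without a capturing group, the wanted pattern goes last inside one, and only matches where that group took part are kept. The function name `words_before_paren` and the sample text are made up for illustration.

```python
import re

# Stand-in for (?:x|y|z)(*SKIP)(*F)|a. Things to skip come first;
# the thing to keep is the only capturing group.
PATTERN = re.compile(
    r"(?:<style[^<>]*>[\S\s]*?</style>"    # x: a whole <style> element
    r"|<script[^<>]*>[\S\s]*?</script>"    # y: a whole <script> element
    r"|<[^<>]*>)"                          # z: any other single tag
    r"|(\w+\()"                            # a: a word stuck to '('
)


def words_before_paren(text):
    # A match on x, y or z consumes that text and leaves group 1 unset,
    # so scanning resumes after it, much as (*SKIP)(*F) would do.
    return [m.group(1) for m in PATTERN.finditer(text) if m.group(1)]


# Hypothetical sample, not the poster's real file.
sample = (
    '<meta content="Uses(of) sulphur">'
    "<style>p { color: rgb(0, 0, 0); }</style>"
    "<script>console.log(1);</script>"
    "<p>Sulph(Brimstone)</p>"
)

print(words_before_paren(sample))  # ['Sulph(']
```

Order matters in this sketch: the generic tag alternative is listed last. If it came before the `<style` and `<script` alternatives, it would consume the opening tag by itself and the CSS or script text after it would then be scanned for `\w+\(`.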

                  @dr-ramaanand said

                  please help me

                  another request

                  yet another request

                  Look, please stop mentioning me. I am unfollowing this post because it’s cognitively draining just reading it. I’m done trying to help you with regular expressions. You are trying to use a hammer (regular expressions) to solve a problem that requires a screwdriver (XML). XML is a recursive data structure or context free language (for example, a div can contain a div, which can contain another div, and on and on and on…), and regular expressions are not suitable for such languages. Technically Notepad++'s Boost regex flavor can be used to handle such languages, but making a Notepad++ regex that properly handles all the subtleties of XML would make you cry tears of blood.

                  You are going to have a miserable time working with XML (and its little brother, HTML) until you learn how to use a real scripting language like Python. This is your final warning. I’m not trying to punish you, just trying to preserve my own sanity.

                    dr ramaanand @Mark Olson
                    last edited by dr ramaanand Dec 7, 2024, 7:52 PM Dec 7, 2024, 7:23 PM

                    This RegEx (?:<html[\S\s]*?<\/h1>|<p[^>]*>[\S\s]*?uses\]<\/p>|<[\S\s]*?>|style[^>]*>[^<>][\S\s]*<\/script>)(*SKIP)(*F)|\w+\( seems fine. It finds every word stuck to a simple/round bracket or parenthesis to its right but skips what comes between the <html........</h1>, <p...... uses]</p>, <.........>, as well as style.......... </script>

                      dr ramaanand @dr ramaanand
                      last edited by Dec 8, 2024, 10:01 AM

                      Sorry, the Regular expression (?:<html[\S\s]*?<\/h1>|<p[^>]*>[\S\s]*?uses\]<\/p>|<[\S\s]*?>|style[^>]*>[^<>][\S\s]*<\/script>)(*SKIP)(*F)|\w+\( finds every word stuck to a simple/round bracket or parenthesis to its right but skips what comes between the <html........ and </h1>, <p...... and uses]</p>, <......... and >, as well as style.......... and </script>
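One property of the last alternative is worth checking: `[\S\s]*` before `<\/script>` is greedy, so it runs from the first `style...>` to the last `</script>` in the file and skips everything in between, including any body text. A small sketch with Python's `re`, on a made-up sample, compares it with lazy, per-element alternatives (the forward slash is left unescaped, which Python accepts).

```python
import re

# Made-up sample with a <script> after the body text.
sample = (
    "<style>p { color: rgb(0, 0, 0); }</style>"
    "<p>Sulph(Brimstone)</p>"
    "<script>console.log(1);</script>"
)

# Greedy form of the last alternative in the post above.
greedy = re.compile(r"style[^>]*>[^<>][\S\s]*</script>")

# Lazy form, one alternative per element.
lazy = re.compile(r"<style[^>]*>[\S\s]*?</style>|<script[^>]*>[\S\s]*?</script>")

greedy_span = greedy.search(sample).group()
lazy_spans = lazy.findall(sample)

# The greedy span swallows the paragraph between the two elements.
print("Sulph(" in greedy_span)                  # True
# The lazy spans cover only the <style> and <script> elements.
print(any("Sulph(" in s for s in lazy_spans))   # False
```

Whether this matters for a given file depends on where its `<script>` elements sit; the sample here is only an illustration.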

                        dr ramaanand @dr ramaanand
                        last edited by Dec 9, 2024, 2:25 AM

                        When I ran the above RegEx on a folder of 300 files, in 80 of them the whole text of the file was selected instead of just a word stuck to a simple, round bracket/parenthesis to its right. Is that because those files contain no such word? (Only 2 such words were found, in two different files.)
