    Regex: Find String in HTML Not at Line Start or Following </p>

    • Mark Olson @dr ramaanand

      @dr-ramaanand said in Regex: Find String in HTML Not at Line Start or Following </p>:

      ^string_to_find(*SKIP)(*F)|<p>string_to_find(*SKIP)(*F)|string_to_find is a more, “easy to remember”

      I would never advocate using backtracking control verbs like (*SKIP) and (*F) when something else would suffice, unless the backtracking control approach is MUCH simpler, which this clearly is not. Very few regex implementations include backtracking control verbs, so you will usually need to rewrite this regex when you go somewhere else.
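To make the portability point concrete, here is a minimal Python sketch. It assumes only the standard library `re` module, which as far as I know has no backtracking control verbs; the helper name `engine_accepts` is made up for this illustration. It simply checks whether each style of pattern compiles at all in that engine.

```python
import re

def engine_accepts(pattern):
    """Return True if Python's built-in re module can compile the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

# The verb-based pattern quoted above, and a lookbehind pattern with the same intent.
verbs = r'^string_to_find(*SKIP)(*F)|<p>string_to_find(*SKIP)(*F)|string_to_find'
lookbehinds = r'(?<!^)(?<!<p>)string_to_find'

print("verbs accepted:", engine_accepts(verbs))
print("lookbehinds accepted:", engine_accepts(lookbehinds))
```

The lookbehind form should compile, while the verb form is expected to be rejected, because this engine reads `(*` as a quantifier with nothing to repeat.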

      • Sylvester Bullitt @dr ramaanand

        @dr-ramaanand As it turns out, we’ve discovered a number of additional “first on line” scenarios that we’ve since added to our negative lookbehinds:

        (?#Not 1st word in line)(?<!^)(?<!^<q>)(?<!^“)(?<!<p>)(?<!<p><q>)(?<!<p>“)(?<!<p class="chorus">)(?<!<br>)
        

        We’d be willing to consider other ways of doing this if there are simpler or more understandable techniques. However, I’ll be the first to admit I’m unfamiliar with most of the new regex you sent. Could you explain what its components do? And where is the documentation for them located?
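For anyone who wants to try the lookbehind list outside Notepad++, below is a minimal sketch in Python. It assumes the standard `re` module with `re.MULTILINE` (so that `^` matches at every line start), uses `Achtung` as a stand-in search word, and the sample text is invented for the test. Every lookbehind here has a fixed width, which is what that engine requires.

```python
import re

# Same lookbehind list as in the post, followed by a stand-in search word.
# re.MULTILINE makes ^ match at the start of every line, not just the text.
NOT_FIRST_ON_LINE = re.compile(
    r'(?#Not 1st word in line)'
    r'(?<!^)(?<!^<q>)(?<!^“)'
    r'(?<!<p>)(?<!<p><q>)(?<!<p>“)'
    r'(?<!<p class="chorus">)(?<!<br>)'
    r'Achtung',
    re.MULTILINE,
)

# Hypothetical sample: only the last line has occurrences that should be found.
sample = "\n".join([
    "Achtung at line start",
    "<p>Achtung after a paragraph tag",
    "<q>Achtung after a quote tag at line start",
    '<p class="chorus">Achtung in a chorus',
    "first<br>Achtung after a break",
    "Here Achtung appears mid-line, and again: Achtung",
])

hits = [m.start() for m in NOT_FIRST_ON_LINE.finditer(sample)]
print(hits)
```

Each new "first on line" scenario costs one more `(?<!...)` group, and nothing else in the pattern has to change.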

        • dr ramaanand @Sylvester Bullitt

          @Sylvester-Bullitt

          • https://community.notepad-plus-plus.org/post/55467
          • https://community.notepad-plus-plus.org/post/60429
          • https://community.notepad-plus-plus.org/topic/20432
          • https://community.notepad-plus-plus.org/post/64421
          • https://community.notepad-plus-plus.org/post/60332
          • https://community.notepad-plus-plus.org/post/60220
          • Mark Olson @Sylvester Bullitt

            @Sylvester-Bullitt
            I default to RexEgg.com for most regex-related questions. It is an excellent resource.

            guy038 has also written a good explanation of backtracking control verbs.

            • dr ramaanand @Sylvester Bullitt

              @Sylvester-Bullitt The (*SKIP)(*FAIL) method is easy because everything that needs to be skipped goes to the left of (*SKIP)(*F)| and what needs to be found goes on its right. You can skip anything else you need to by adding another string_to_skip(*SKIP)(*F)| alternative on the left.

              • Sylvester Bullitt @Mark Olson

                @Mark-Olson Thanks!

                • dr ramaanand @Sylvester Bullitt

                  @Sylvester-Bullitt
                  ^Achtung(*SKIP)(*F)|<p>Achtung(*SKIP)(*F)|Achtung will find the word Achtung except when it is at the beginning of a line or immediately preceded by <p>.
                  You may also use ^Achtung(*SKIP)(*F)|<p[^<>]*>Achtung(*SKIP)(*F)|<q>Achtung(*SKIP)(*F)|<p><q>Achtung(*SKIP)(*F)|Achtung, which skips Achtung when it directly follows any <p ...> tag (with or without attributes), <q>, or <p><q>, but finds it everywhere else.

                  To skip Achtung after <br> as well, use the regular expression ^Achtung(*SKIP)(*F)|<p[^<>]*>Achtung(*SKIP)(*F)|<q>Achtung(*SKIP)(*F)|<p><q>Achtung(*SKIP)(*F)|<br>Achtung(*SKIP)(*F)|Achtung (the <br> alternative has to include Achtung; a bare <br>(*SKIP)(*F) only skips the tag, and the search then resumes right at Achtung and finds it).
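The "skip on the left, find on the right" idea can also be tried in an engine without the verbs. The sketch below is a swapped-in technique, not the (*SKIP)(*F) regex itself: it assumes Python's standard `re` module, lets the left-hand alternatives consume the contexts to ignore, and puts the wanted word in a capture group. The function name `find_wanted` and the sample text are hypothetical.

```python
import re

# Alternatives on the left consume the contexts to ignore; only the last
# alternative captures. A match with group 1 unset is a skipped occurrence.
PATTERN = re.compile(
    r'^Achtung'
    r'|<p[^<>]*>Achtung'
    r'|<q>Achtung'
    r'|<br>Achtung'
    r'|(Achtung)',
    re.MULTILINE,
)

def find_wanted(text):
    """Return start offsets of occurrences that no skip alternative consumed."""
    return [m.start(1) for m in PATTERN.finditer(text) if m.group(1) is not None]

# Hypothetical sample: only the occurrence on the last line should be reported.
sample = "\n".join([
    "Achtung first on the line",
    '<p class="chorus">Achtung after a tag with attributes',
    "<p><q>Achtung after two tags",
    "one<br>Achtung after a break",
    "but Achtung here is wanted",
])

wanted = find_wanted(sample)
print(wanted)
```

The trade-off is that the caller has to inspect group 1, so this variant suits scripts better than a plain Find dialog, where the verbs or the lookbehinds do the filtering inside the pattern.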

                  • Sylvester Bullitt @dr ramaanand

                    @dr-ramaanand If I read this correctly, I’d have to repeat one of these (*SKIP)(*F) constructs for each of my current negative lookbehinds. So that would actually make the overall regex longer.

                    And Mark Olson makes a good point. It would also make our regex non-portable, which is a major consideration. We prefer to use grammar that is supported by the large majority of regex engines, when we have the choice.

                    • dr ramaanand @Sylvester Bullitt

                      @Sylvester-Bullitt Please do whatever suits you. I am not insisting that you use only the (*SKIP)(*FAIL) method!

                      • Sylvester Bullitt @dr ramaanand

                        @dr-ramaanand Understood. Thanks for your input and time!

                        The Community of users of the Notepad++ text editor.