Regex: Find words that contain html but don't have a dot before html.

Help wanted
    Hellena Crainicu
    last edited by Nov 10, 2021, 11:48 AM

    Hello, I need to find words that contain html but don’t have a dot before html. See the lines below:

    1. I go home now.html" and then
    2. Take me with you nowhtml" and then
    3. asfa asdfdsa
    4. <!DOCTYPE html>
    5. <!--<html xmlns="http://www.w3.org/1999/xhtml" dir="ltr" >
    6. <html lang="en-US" xmlns:og="schema/">

    The output should be line 2, because it contains WORD+HTML, but without .HTML:

    Take me with you nowhtml" and then

    I made a regex, but it is not very good, because it finds all lines that don’t have .HTML:

    SEARCH: ^(?=.*html)(?:(?!\w+\.html).)+$
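To see why this attempt over-matches, here is a minimal sketch that runs the same pattern line by line with Python's re module. This assumes the pattern behaves the same as in Notepad++ (which uses Boost regex); it only uses a lookahead and a tempered dot, which both engines support. The sample lines are the six from the question with their list numbers removed, and matching_line_numbers is a hypothetical helper written for this illustration.

```python
import re

# The six sample lines from the question, with their list numbers removed.
SAMPLE = [
    'I go home now.html" and then',
    'Take me with you nowhtml" and then',
    "asfa asdfdsa",
    "<!DOCTYPE html>",
    '<!--<html xmlns="http://www.w3.org/1999/xhtml" dir="ltr" >',
    '<html lang="en-US" xmlns:og="schema/">',
]

# The attempted pattern: the lookahead only demands "html" somewhere on the
# line; the tempered dot then rejects the line only where \w+\.html starts.
ATTEMPT = re.compile(r"^(?=.*html)(?:(?!\w+\.html).)+$")


def matching_line_numbers(pattern, lines):
    """Hypothetical helper: 1-based numbers of the lines the pattern matches."""
    return [n for n, line in enumerate(lines, start=1) if pattern.search(line)]


print(matching_line_numbers(ATTEMPT, SAMPLE))  # [2, 4, 5, 6]
```

Lines 4, 5 and 6 slip through because nothing in the pattern asks for a word character directly in front of html; it only rules out the \w+.html form.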

      Alan Kilborn @Hellena Crainicu
      last edited by Nov 10, 2021, 12:31 PM

      @hellena-crainicu

      You seem to ask a lot of regex questions.
      You’d be better served by really studying regex and solving your own problems.
      Likely soon this forum will quit just handing you the answers.

      But…

      You might try (?-s)^.*?\whtml.*, but it will match line 5 in addition to line 2. If this is not desired, you have to be more specific about what you need.
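A rough Python equivalent of this suggestion, assuming a Find All over the whole sample text: (?-s) is dropped because Python's dot already stops at line breaks unless re.DOTALL is set, and re.MULTILINE lets ^ match at the start of every line. The key part is \whtml, which requires a letter, digit or underscore immediately before html, so the "now.html" line is skipped while both "nowhtml" (line 2) and "xhtml" (line 5) qualify.

```python
import re

# The six sample lines from the question, joined as one document.
SAMPLE_TEXT = "\n".join([
    'I go home now.html" and then',
    'Take me with you nowhtml" and then',
    "asfa asdfdsa",
    "<!DOCTYPE html>",
    '<!--<html xmlns="http://www.w3.org/1999/xhtml" dir="ltr" >',
    '<html lang="en-US" xmlns:og="schema/">',
])

# (?-s) from Notepad++ is omitted: without re.DOTALL, "." never crosses "\n".
WORD_CHAR_BEFORE_HTML = re.compile(r"^.*?\whtml.*", re.MULTILINE)

hits = WORD_CHAR_BEFORE_HTML.findall(SAMPLE_TEXT)
for line in hits:
    print(line)
# Take me with you nowhtml" and then
# <!--<html xmlns="http://www.w3.org/1999/xhtml" dir="ltr" >
```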

        Hellena Crainicu @Alan Kilborn
        last edited by Nov 10, 2021, 12:36 PM

        @alan-kilborn

        (?-s)^.*?\whtml.*

        your regex is almost good, but it finds 2 lines, instead of one:

        Take me with you nowhtml" and then

        and

        <!--<html xmlns="http://www.w3.org/1999/xhtml" dir="ltr" >

        THE OUTPUT must be only:

        Take me with you nowhtml" and then

        P.S. I ask because my solution is not too good, and I am not an expert on regex. But as you can see, it is not easy to find the solution to this problem.

          Alan Kilborn @Hellena Crainicu
          last edited by Nov 10, 2021, 12:50 PM

          @hellena-crainicu said:

          your regex is almost good, but it finds 2 lines, instead of one:

          Yes. Did you read the part where I wrote this?

          it will match line 5 in addition to line 2. If this is not desired, you have to be more specific about what you need

            Robin Cruise
            last edited by Nov 10, 2021, 8:17 PM

            try this regex:

            SEARCH: (?:^|\h)\w+html"

            This will match your request.
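The same check in Python, as a sketch: Python's re module has no \h escape, so [ \t] (space or tab) stands in for it here. That is an approximation, since Boost's \h also covers other horizontal whitespace characters. The pattern asks for a word that starts at the beginning of a line or after a space or tab and ends in html followed by a double quote, so "xhtml" in line 5 (preceded by /) does not qualify.

```python
import re

# The six sample lines from the question, joined as one document.
SAMPLE_TEXT = "\n".join([
    'I go home now.html" and then',
    'Take me with you nowhtml" and then',
    "asfa asdfdsa",
    "<!DOCTYPE html>",
    '<!--<html xmlns="http://www.w3.org/1999/xhtml" dir="ltr" >',
    '<html lang="en-US" xmlns:og="schema/">',
])

# [ \t] approximates Boost's \h; re.MULTILINE lets ^ match at each line start.
WORD_ENDING_IN_HTML_QUOTE = re.compile(r'(?:^|[ \t])\w+html"', re.MULTILINE)

hits = [m.group() for m in WORD_ENDING_IN_HTML_QUOTE.finditer(SAMPLE_TEXT)]
print(hits)  # [' nowhtml"']
```

The match covers only the word (plus the space before it), not the whole line, which is still enough for Find All to list line 2.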

            But there can also be a 6th case which you missed. Suppose you have a link such as: https://mywebsite.com/prince-is-my-fatherhtml

            So, in this case, the [dot] before html is missing. But I don’t know how to handle this situation…
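A quick sketch of how the two earlier suggestions in this thread treat this 6th case, written in Python with two adaptations for the re module: (?-s) is dropped (a plain dot already stops at newlines) and \h is approximated by [ \t]. The quoted form needs a double quote right after html and a space or line start before the word, and the URL has neither; \whtml only needs a word character before html, which the "r" in "fatherhtml" supplies.

```python
import re

# The 6th case from the post.
URL_CASE = "https://mywebsite.com/prince-is-my-fatherhtml"

# Earlier suggestions in this thread, adapted to Python's re module:
# (?-s) is dropped and \h becomes [ \t].
WORD_CHAR_BEFORE_HTML = re.compile(r"^.*?\whtml.*")
WORD_ENDING_IN_HTML_QUOTE = re.compile(r'(?:^|[ \t])\w+html"')

found_by_word_char = WORD_CHAR_BEFORE_HTML.search(URL_CASE) is not None
found_by_quote_form = WORD_ENDING_IN_HTML_QUOTE.search(URL_CASE) is not None
print(found_by_word_char, found_by_quote_form)  # True False
```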

              Robin Cruise @Robin Cruise
              last edited by Nov 10, 2021, 9:35 PM

              OK, try this solution; it is very good for all your examples and for the 6th case:

              SEARCH: ^(?=.*https://)(?:(?!\.html).)+$
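A sketch of what this pattern selects, run per line with Python's re module (the pattern needs no changes for Python). Note that the lookahead (?=.*https://) makes https:// a requirement, so on the six original sample lines it matches none of them, line 2 included; it only picks up the URL from the 6th case. The last URL in the list is a hypothetical line added here to show that a line containing .html is still rejected.

```python
import re

LINES = [
    'I go home now.html" and then',
    'Take me with you nowhtml" and then',
    "asfa asdfdsa",
    "<!DOCTYPE html>",
    '<!--<html xmlns="http://www.w3.org/1999/xhtml" dir="ltr" >',
    '<html lang="en-US" xmlns:og="schema/">',
    "https://mywebsite.com/prince-is-my-fatherhtml",   # the 6th case
    "https://mywebsite.com/prince-is-my-father.html",  # hypothetical, has .html
]

# Lookahead: the line must contain "https://".
# Tempered dot: no character may be consumed where ".html" begins.
HTTPS_WITHOUT_DOT_HTML = re.compile(r"^(?=.*https://)(?:(?!\.html).)+$")

matched = [n for n, line in enumerate(LINES, start=1)
           if HTTPS_WITHOUT_DOT_HTML.search(line)]
print(matched)  # [7]
```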

              The Community of users of the Notepad++ text editor.