    Regex: Find html tags that only contain numbers

    • Hellena Crainicu

      3 cases:

      <p>My love is you</p>
      <p>2 5 x 7</p>
      <p>Love is me 5675</p>

      I need a regex that finds all html tags that only contain numbers.

      The only match should be: <p>2 5 x 7</p>

      My 2 regex expressions are almost right, but not quite. Maybe someone can give me a little help.

      FIND: <.*(\d+.*)</p>
      or
      FIND: <.*(\d+.*)(?!\w+)</p>

      • Mark Olson @Hellena Crainicu

        @Hellena-Crainicu
        The title of this topic is inconsistent with the request in your first post. You say that you want to “find html tags that contain only numbers” and say you want to find only <p>2 5 x 7</p> in

        <p>My love is you</p>
        <p>2 5 x 7</p>
        <p>Love is me 5675</p>
        

        but there is an additional match that you don’t want (<p>Love is me 5675</p>) and unless you tell us why you don’t want it, we won’t be able to properly help you.
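
To make the extra match concrete, here is a minimal sketch that runs the two FIND expressions from the first post over the three sample lines. It uses Python's `re` module as a stand-in for Notepad++'s regex engine, which is an assumption; the two engines agree on the constructs used here, but not on everything.

```python
import re

# The three sample lines from the first post.
lines = [
    "<p>My love is you</p>",
    "<p>2 5 x 7</p>",
    "<p>Love is me 5675</p>",
]

# The two expressions from the first post, copied unchanged.
first_attempt = re.compile(r"<.*(\d+.*)</p>")
second_attempt = re.compile(r"<.*(\d+.*)(?!\w+)</p>")

matched_first = [line for line in lines if first_attempt.search(line)]
matched_second = [line for line in lines if second_attempt.search(line)]

for line in matched_first:
    print("matched:", line)
```

Both expressions match the second and the third line, because `<.*` happily consumes any leading text before the digits. The lookahead `(?!\w+)` in the second expression sits directly before `<`, which is never a word character, so it never rejects anything.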

        • mkupper @Hellena Crainicu

          @Hellena-Crainicu said in Regex: Find html tags that only contain numbers:

          <p>2 5 x 7</p>

          That contains both spaces and the letter x, and thus is not something that contains only numbers.

          An expression that only matches one or more numbers separated by white space would be something like (\s*\d)+\s*

          • (\s*\d)+ Allows zero or more leading spaces followed by a decimal digit. The + lets the group repeat one or more times. This allows anything from a single one-digit number up to numbers of any length, separated by any number of spaces.
          • \s* Allows zero or more trailing spaces
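
A quick way to check what the expression accepts is to test it against a few strings. This is a sketch in Python rather than in Notepad++; `fullmatch` is used so that the whole string has to fit the pattern, and the sample strings other than the ones from the first post are made up for illustration.

```python
import re

# mkupper's expression, unchanged.
numbers_only = re.compile(r"(\s*\d)+\s*")

samples = ["2 5 7", " 42 ", "5675", "2 5 x 7", "Love is me 5675", ""]

# fullmatch requires the entire string to consist of digits and whitespace.
results = {text: numbers_only.fullmatch(text) is not None for text in samples}

for text, ok in results.items():
    print(repr(text), "->", ok)
```

The empty string is rejected because the group must repeat at least once, so at least one digit is required.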

          That expression may have issues as \s is fairly liberal in what it accepts as a “space.” For example, it accepts Unicode no-break spaces, thin spaces, Ogham space marks, the Mongolian vowel separator, etc. A more restricted “space” may just be a plain \x20. You need to decide what you accept as “space.”
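
The difference between \s and \x20 can be seen with a small comparison. This sketch again assumes Python's `re` module; the exact set of characters that \s accepts depends on the engine and its Unicode version, so only a no-break space, a thin space, and a tab are tested here.

```python
import re

liberal = re.compile(r"(\s*\d)+\s*")        # \s: any whitespace the engine knows
strict = re.compile(r"(\x20*\d)+\x20*")     # \x20: only the plain ASCII space

samples = {
    "plain space": "2 5",
    "no-break space": "2\u00a05",
    "thin space": "2\u20095",
    "tab": "2\t5",
}

liberal_ok = {name: liberal.fullmatch(s) is not None for name, s in samples.items()}
strict_ok = {name: strict.fullmatch(s) is not None for name, s in samples.items()}

for name in samples:
    print(f"{name}: liberal={liberal_ok[name]} strict={strict_ok[name]}")
```

The liberal version accepts all four strings, while the strict one accepts only the plain space.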

          If you want to support the letter “x” as a “number” then that is easy to add, if you can carefully define the rules for when “x” as a number is allowed. For example, is x by itself a number or not? Are x x and xx numbers or not?
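
As an illustration only, here is one possible rule, which is an assumption and not something the original poster stated: a lone "x" is allowed only as a separate token between two numbers. Under that rule, x by itself, x x, and xx are all rejected.

```python
import re

# Hypothetical rule: digits, then any number of " <number>" or " x <number>" groups.
# The x must be surrounded by whitespace and followed by another number.
numbers_with_x = re.compile(r"\s*\d+(?:\s+(?:x\s+)?\d+)*\s*")

samples = ["2 5 x 7", "3 x 4", "x", "x x", "xx", "2 x", "2x7"]

results = {text: numbers_with_x.fullmatch(text) is not None for text in samples}

for text, ok in results.items():
    print(repr(text), "->", ok)
```

A different answer to any of the questions above (for example, allowing 2x7 without spaces) would need a different expression.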

          I ignored the “HTML tags” part of the original message as HTML tags are irregular text that is next to impossible, and perhaps impossible, to fully parse using regular expressions. You can choose to define a limited set of things that are HTML tags that can be matched via regular expressions.
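
One example of such a limited definition, sketched in Python under stated assumptions: an element counts only if it has a simple opening tag, no nested elements, and a closing tag with the same name. The backreference `\1` enforces the matching name. This is not a general HTML parser.

```python
import re

# Limited definition (an assumption): <name ...>digits and whitespace</name>
limited_tag = re.compile(r"<(\w+)[^>]*>(?:\s*\d)+\s*</\1>")

simple = limited_tag.search("<p>12 34</p>")
with_attr = limited_tag.search('<p class="n">7</p>')
with_x = limited_tag.search("<p>2 5 x 7</p>")
mismatched = limited_tag.search("<p>1</div>")
nested = limited_tag.search("<p><b>1</b></p>")

# With nesting, only the innermost element fits the limited definition.
print(nested.group(0))
```

The nested case shows the limits of the approach: the outer <p> is not reported, only the inner <b> element.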

          • Hellena Crainicu @Hellena Crainicu

            I believe I found the solution:

            FIND: <[^>]*>(\d+(\.\d+)?(\s*\w*)*)<\/[^>]*>
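
For anyone who wants to check this expression outside Notepad++, here is a minimal sketch in Python (an assumption; Notepad++ uses a different regex engine). It runs the expression over the three sample lines and two extra strings that are made up for illustration.

```python
import re

# The FIND expression from this post, unchanged.
solution = re.compile(r"<[^>]*>(\d+(\.\d+)?(\s*\w*)*)<\/[^>]*>")

lines = [
    "<p>My love is you</p>",
    "<p>2 5 x 7</p>",
    "<p>Love is me 5675</p>",
]
matched = [line for line in lines if solution.search(line)]

# Extra, made-up inputs: content that merely starts with a digit,
# and a closing tag whose name differs from the opening tag.
starts_with_digit = solution.search("<p>5 apples</p>")
mismatched_close = solution.search("<p>5</div>")

print(matched)
```

On the three sample lines only <p>2 5 x 7</p> is matched. The expression really tests that the content starts with a number, so it also accepts text such as "5 apples" and does not check that the closing tag matches the opening one. The nested quantifier in `(\s*\w*)*` may also backtrack heavily on long lines that almost match.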

            The Community of users of the Notepad++ text editor.