Regex: Find html tags that only contain numbers

Hellena Crainicu

3 cases:

<p>My love is you</p>
<p>2 5 x 7</p>
<p>Love is me 5675</p>

I need a regex that finds all HTML tags that contain only numbers.

The only match should be: 2 5 x 7

My two regular expressions are close, but not quite right. Maybe someone can give me a little help.

FIND: <.*(\d+.*)
or
FIND: <.*(\d+.*)(?!\w+)

Mark Olson

@Hellena-Crainicu
The title of this topic is inconsistent with the request in your first post. You say that you want to “find html tags that contain only numbers” and say you want to find only 2 5 x 7 in

<p>My love is you</p>
<p>2 5 x 7</p>
<p>Love is me 5675</p>

but there is an additional match that you don’t want (Love is me 5675) and unless you tell us why you don’t want it, we won’t be able to properly help you.
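
For reference, here is a minimal sketch that runs the two patterns from the first post against the three lines. It assumes Python's re module, which the thread does not mention and which may differ in details from the engine actually used; the helper name matching_lines is made up for illustration.

```python
import re

lines = [
    "<p>My love is you</p>",
    "<p>2 5 x 7</p>",
    "<p>Love is me 5675</p>",
]

# The two patterns from the first post, unchanged.
first = re.compile(r"<.*(\d+.*)")
second = re.compile(r"<.*(\d+.*)(?!\w+)")

def matching_lines(pattern, candidates):
    """Return the candidates in which the pattern finds a match."""
    return [line for line in candidates if pattern.search(line)]

print(matching_lines(first, lines))
print(matching_lines(second, lines))
```

Both patterns only require a < followed, somewhere later on the line, by a digit, so under these assumptions they pick up the 5675 line as well as the 2 5 x 7 line.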

mkupper

@Hellena-Crainicu said in Regex: Find html tags that only contain numbers:

2 5 x 7

That contains both spaces and the letter x, and thus is not something that contains only numbers.

An expression that only matches one or more numbers separated by white space would be something like (\s*\d)+\s*

(\s*\d)+ Allow for zero or more leading spaces followed by a decimal digit. The + lets the group repeat one or more times. This allows for anything from a single one-digit number up to any count of numbers, each with any number of digits, separated by any number of spaces.
\s* Allow for zero or more trailing spaces
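
A small sketch of how that expression behaves when it has to cover the whole text. Python's re module and the function name is_numbers_only are assumptions made for illustration; fullmatch is used so that a partial match inside a longer string does not count.

```python
import re

# One or more digits, each optionally preceded by whitespace,
# followed by optional trailing whitespace.
NUMBERS_ONLY = re.compile(r"(\s*\d)+\s*")

def is_numbers_only(text: str) -> bool:
    """True if the whole text consists of digits and whitespace, with at least one digit."""
    return NUMBERS_ONLY.fullmatch(text) is not None

for sample in ["2 5 7", " 42 ", "2 5 x 7", "Love is me 5675", ""]:
    print(repr(sample), is_numbers_only(sample))
```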

That expression may have issues as \s is fairly liberal in what it accepts as a “space.” For example, it accepts Unicode no-break spaces, thin spaces, Ogham space marks, the Mongolian vowel separator, etc. A more restricted “space” may just be a plain \x20. You need to decide what you accept as “space.”
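
To see the difference, here is a rough comparison of \s against a plain \x20. It assumes Python's re module with ordinary Unicode strings; other regex engines may classify some of these characters differently, so treat the results as specific to that assumption. The names LIBERAL, STRICT and samples are illustrative.

```python
import re

LIBERAL = re.compile(r"(\s*\d)+\s*")       # any whitespace the engine recognises
STRICT = re.compile(r"(\x20*\d)+\x20*")    # only the plain ASCII space

samples = {
    "plain space": "2 5",
    "tab": "2\t5",
    "no-break space": "2\u00a05",
    "thin space": "2\u20095",
}

for name, text in samples.items():
    liberal_ok = LIBERAL.fullmatch(text) is not None
    strict_ok = STRICT.fullmatch(text) is not None
    print(f"{name}: liberal={liberal_ok} strict={strict_ok}")
```

Under these assumptions the liberal version accepts all four samples, while the strict version accepts only the one with a plain space.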

If you want to support the letter “x” as a “number” then that is easy to add, if you can carefully define the rules for when “x” as a number is allowed. For example, is x by itself a number or not? Are x x and xx numbers or not?
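
As an illustration only, here is one possible rule set, which is an assumption and not something stated in the thread: x is allowed only as a standalone token sitting between two numbers, and tokens are separated by plain spaces. Under that hypothetical rule, x, x x and xx are all rejected. The sketch uses Python's re module.

```python
import re

# Hypothetical rule: a number, then any count of further numbers, each
# separated by spaces and optionally preceded by a standalone "x".
NUMBERS_WITH_X = re.compile(r"\d+(?:\x20+(?:x\x20+)?\d+)*")

def is_numeric_expression(text: str) -> bool:
    """True if the whole text follows the hypothetical rule above."""
    return NUMBERS_WITH_X.fullmatch(text) is not None

for sample in ["2 5 x 7", "5675", "x", "x x", "xx", "2 x x 7"]:
    print(repr(sample), is_numeric_expression(sample))
```

A different answer to the questions above (for example, allowing x on its own) would need a different pattern.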

I ignored the “HTML tags” part of the original message as HTML tags are irregular text that is next to impossible, and perhaps impossible, to fully parse using regular expressions. You can choose to define a limited set of things that are HTML tags that can be matched via regular expressions.
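
If a scripting language is an option, one way to sidestep the problem is to let an HTML parser find the tags and apply the regular expression only to the text between them. The sketch below uses Python's standard html.parser module; the class name LeafTextCollector is made up, and the "limited set" it handles is an assumption: only text that sits directly between a start tag and the next end tag is considered.

```python
import re
from html.parser import HTMLParser

NUMBERS_ONLY = re.compile(r"(\s*\d)+\s*")

class LeafTextCollector(HTMLParser):
    """Collect the text found between a start tag and the next end tag."""

    def __init__(self):
        super().__init__()
        self.leaf_texts = []
        self._buffer = None  # None means "not inside a candidate element"

    def handle_starttag(self, tag, attrs):
        self._buffer = []    # any start tag begins a fresh candidate

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if self._buffer is not None:
            self.leaf_texts.append("".join(self._buffer))
            self._buffer = None

def numeric_leaf_texts(html: str) -> list:
    """Return the collected texts that consist only of digits and whitespace."""
    parser = LeafTextCollector()
    parser.feed(html)
    parser.close()
    return [t for t in parser.leaf_texts if NUMBERS_ONLY.fullmatch(t)]

sample = "<p>My love is you</p>\n<p>2 5 x 7</p>\n<p>Love is me 5675</p>\n<p>12 34</p>"
print(numeric_leaf_texts(sample))
```

With the strict digits-and-whitespace pattern, 2 5 x 7 is not reported because of the x; only the extra 12 34 line, added here for illustration, passes.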

Hellena Crainicu

I believe I found the solution:

FIND: <[^>]*>(\d+(\.\d+)?(\s*\w*)*)<\/[^>]*>
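
A quick check of that pattern against the three original lines, sketched with Python's re module (an assumption, since the thread does not name the tool; the helper content_of is illustrative).

```python
import re

SOLUTION = re.compile(r"<[^>]*>(\d+(\.\d+)?(\s*\w*)*)<\/[^>]*>")

def content_of(line: str):
    """Return the captured tag content if the pattern matches, else None."""
    match = SOLUTION.search(line)
    return match.group(1) if match else None

lines = [
    "<p>My love is you</p>",
    "<p>2 5 x 7</p>",
    "<p>Love is me 5675</p>",
    "<p>5 apples</p>",  # extra line, not from the thread
]

for line in lines:
    print(line, "->", content_of(line))
```

Under these assumptions only the 2 5 x 7 line matches among the three original cases. Note that the pattern requires the content to start with a number and then accepts any word characters after it, so the extra 5 apples line matches too. The nested quantifiers in (\s*\w*)* may also backtrack heavily on long content that ends up not matching.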