Find numbers greater than 101
-
@Scott-Nielson
width.*(\d{3,}|10[1-9]\d)px(?=<style)
doesn’t stop searching after the <style
-
@Scott-Nielson
I think the ColumnsPlusPlus plugin can do this sort of thing (searching for numbers with certain specifications) for you. PythonScript can also help with this kind of thing.
As far as a regex that matches integers greater than 101 goes (never mind floating-point numbers like 1e3), I would say:
(?<!-)(?:10[2-9]|1[1-9]\d|[2-9]\d{2}|[1-9]\d{3,})
because this covers the cases:
- no leading - sign: (?<!-)
- 102-109: 10[2-9]
- any number from 110 to 199: 1[1-9]\d
- any number from 200 to 999: [2-9]\d{2}
- any number 1000 or greater: [1-9]\d{3,}
but I haven’t tested that regex and have no intention of doing so because that is precisely the sort of thing that you shouldn’t be using regular expressions for.
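For anyone who wants to check that alternation, or avoid it altogether, here is a minimal sketch in plain Python (the standard re module, not the PythonScript plugin API). The helper names are made up for illustration. It tests the digit alternation against every integer from 0 to 19999, leaving out the lookbehind because fullmatch is applied to a bare number, and then shows the non-regex route: pull out every integer and compare it numerically.

```python
import re

# Digit alternation from the post above; the (?<!-) lookbehind is omitted
# here because fullmatch() is applied to a bare, unsigned number.
GT_101 = re.compile(r"(?:10[2-9]|1[1-9]\d|[2-9]\d{2}|[1-9]\d{3,})")

def regex_says_greater_than_101(n: int) -> bool:
    """True if the decimal form of n is matched in full by the alternation."""
    return GT_101.fullmatch(str(n)) is not None

# Every n where the regex and the numeric comparison disagree.
mismatches = [n for n in range(20000)
              if regex_says_greater_than_101(n) != (n > 101)]

def integers_greater_than(text: str, limit: int = 101) -> list[int]:
    """Numeric alternative: extract signed integers, then compare as numbers."""
    return [int(m) for m in re.findall(r"-?\d+", text) if int(m) > limit]

print(mismatches)
print(integers_greater_than("99 100 101 102 -500 1000"))
```

The numeric version is the easier one to change: a different threshold is a different argument, not a rewritten pattern.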
Oh, and BTW, width.*(\d{3,}|10[1-9]\d)px(?=<style) is probably giving you trouble because it includes .*, which will happily consume characters until the next newline, or until the end of the file if you have “. matches newline” selected. I would use width\h*[=:]\h* instead, which matches width followed by any amount of non-newline whitespace (\h*), a colon or equals sign, and then any amount of non-newline whitespace.
And if you continue to have trouble parsing HTML with regular expressions, and you start to question your sanity, read this.
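A small sketch of the difference, in plain Python and not in Notepad++ itself. Python's re module has no \h, so [ \t] stands in for it here (an approximation: \h covers other horizontal whitespace too). The sample line is invented for the demonstration.

```python
import re

sample = "width: 15px width: 1529px <style> width: 105px"

# Greedy .* runs to the end of the line and backtracks only as far as it
# must, so the captured number comes from the LAST width on the line.
greedy = re.search(r"width.*(\d{3,})px", sample)

# Tying the number directly to "width", optional blanks and : or = keeps
# each match local. [ \t] stands in for \h, which Python's re lacks.
local = re.findall(r"width[ \t]*[=:][ \t]*(\d{3,})px", sample)

print(greedy.group(0))  # the whole line, from the first width to the last px
print(greedy.group(1))
print(local)
```

The greedy pattern reports a single match spanning the entire line, while the local pattern reports each qualifying width separately.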
-
@Mark-Olson (?<!-)(?:10[2-9]|1[1-9]\d|[2-9]\d{2}|[1-9]\d{3,}) matches even the numbers after the <style. Please give me a RegEx that finds 3-4 digit numerals between width and px and stops searching after <style before I start questioning my sanity!
-
@Scott-Nielson said in Find numbers greater than 101:
@Mark-Olson (?<!-)(?:10[2-9]|1[1-9]\d|[2-9]\d{2}|[1-9]\d{3,}) matches even the numbers after the <style. Please give me a RegEx that finds 3-4 digit numerals between width and px and stops searching after <style before I start questioning my sanity!
I’m pretty sure that what you are trying to do cannot be done with a regular expression. The reason is that you’re trying to find something such that the last tag preceding it satisfies a certain criterion (is “</h1>”), but the text between the tag and the thing you are searching for is variable in length. Lookbehinds have to be fixed length.
That’s not to say there is no way to solve the problem, but it will have to be approached from some other direction. It might help if we know what you want to do with the numbers after you find them. (For example, if you’re replacing them with something else, it might be possible to match the entire string from the tag to the number and use a capture group in the replacement string. Applying that repeatedly could work if what you’re changing the numbers to is no longer a number greater than 101.)
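To make that concrete, here is a rough sketch in plain Python, whose re module happens to have the same fixed-width lookbehind restriction. It is not Notepad++'s engine, so treat it as an analogy. The replacement value 100, the helper name clamp_widths and the width/px context are assumptions chosen for illustration, not something the original poster specified.

```python
import re

# Python's re module has the same restriction: a lookbehind whose width can
# vary is rejected when the pattern is compiled.
try:
    re.compile(r"(?<=</h1>[^<]*)\d{3,}")
    lookbehind_ok = True
except re.error:
    lookbehind_ok = False

# Workaround: match from the tag up to the number and keep the prefix in a
# capture group. [^<]*? refuses to cross another tag.
PATTERN = re.compile(r"(</h1>[^<]*?width[:=][ \t]*)(?!10[01]px)\d{3,}(px)")

def clamp_widths(text: str, new_value: str = "100") -> str:
    """Repeat the substitution until no pass changes anything.

    new_value must not itself be matched by PATTERN (100 and 101 are safe),
    otherwise the loop would never end.
    """
    while True:
        text, count = PATTERN.subn(rf"\g<1>{new_value}\g<2>", text)
        if count == 0:
            return text

print(lookbehind_ok)
print(clamp_widths("</h1> width: 200px width: 300px <p> width: 400px"))
```

Each pass can fix only one number per </h1> tag, which is why the substitution is repeated. The width after <p> is left alone because [^<]*? cannot cross the tag.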
-
@Mark-Olson said in Find numbers greater than 101:
I think the ColumnsPlusPlus plugin can do this sort of thing (searching for numbers with certain specifications) for you.
Columns++ can form parts of the replacement string by manipulating matches and capture groups which it can parse as numbers, but (so far) it doesn’t do anything special with the matching itself. (It would be possible to match numbers and then make the replacement conditional on the value of the number, so that only numbers greater than 101 are changed, but the original poster asked to find the numbers, not to change them.)
Columns++ can search in marked regions, which might help here because the thing that makes this so difficult is trying to find the numbers only in certain contexts. Notepad++ can mark what it finds (e.g., </h1>[^<]*), but it has no way to then search within the marked regions; Columns++ could do that.
But since the original poster said he needs to do this in multiple files, and Columns++ only works on the current tab — no “all open documents” or “find in files” functions exist — it seems that won’t help, either. If there is a plugin that can search in marked regions and do multiple files at once, that would probably work.
-
@Coises Forget the </h1>. I think something like width\h*[=:]\h*(?<!-)(?:10[2-9]|1[1-9]\d|[2-9]\d{2}|[1-9]\d{3,})px(?=<style) should be tweaked a bit to stop matching/searching after <style.
-
@Scott-Nielson Someone at www.regex101.com gave me this RegEx as a solution, provided there is a closing </style> tag:
<style(?:[^<]+|<(?!\/style))*+</style>(*SKIP)(*F)|width[:=]\h*(?!10[01]px)\d{3,}px
and it is perfect!
-
@Scott-Nielson Just so that others may also understand, the test string used was tweaked to this:
Some text here width:15px width: 100% width=693px width: 105px width: 1529px <style> width:15px width: 100% width=693px width: 105px width: 1529px </style> width: 105px
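For readers who want to try this outside Notepad++: Python's standard re module has no (*SKIP)(*F), so the sketch below imitates the effect with a plain alternation. The first branch swallows a whole style block and captures nothing, and matches without a capture are thrown away afterwards. This is a stand-in technique, not the regex101 solution itself. [ \t] replaces \h, a lazy .*? with DOTALL replaces the possessive group, and the function name is made up.

```python
import re

# The test string from the post above.
TEST = ("Some text here width:15px width: 100% width=693px width: 105px "
        "width: 1529px <style> width:15px width: 100% width=693px "
        "width: 105px width: 1529px </style> width: 105px")

# First alternative swallows a whole <style>...</style> block and captures
# nothing; second alternative captures the number we actually want.
PATTERN = re.compile(
    r"<style.*?</style>|width[:=][ \t]*(?!10[01]px)(\d{3,})px",
    re.DOTALL,
)

def widths_outside_style(text: str) -> list[str]:
    """Return wanted numbers, dropping matches that came from a style block."""
    return [m.group(1) for m in PATTERN.finditer(text)
            if m.group(1) is not None]

print(widths_outside_style(TEST))
```

On this string the result should be 693, 105 and 1529 from before the style block, plus the final 105 after it.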
Everything between <style> and </style> is skipped, but everywhere else, each “width:” or “width=” followed by a number greater than 101 and “px” is matched.
-
@Mark-Olson Thanks!
-
Hello @scott-nielson, @mark-olson, @coises and All,
Scott, your regex can even be simplified as below:
- SEARCH (?s)<style.*?</style>(*SKIP)(*F)|(?-i)width[:=]\h*(?!10[01]px)\d{3,}px
or
- SEARCH (?xs) <style .*? </style> (*SKIP) (*F) | (?-i) width [:=] \h* (?! 10[01] px ) \d{3,} px
if you prefer to use the free-spacing mode (?x).
Just test it against the following text :
Some text here width:15px width: 100% width=693px width: 105px width: 1529px <style> width:15px width: 100% width=693px width: 105px width: 1529px </style> width: 105px width:47px width: 100% width=100px width: 105px width= 99% width= 100% width= 101% width= 102% width: 789px <style> width:1px width: 50% width=234px width: 101px width: 1000px </style> width: 10px width: 110px width= 99px width: 200px
Globally, this special syntax, which uses two backtracking control verbs, means:
<What I don't want>(*SKIP)(*F)|<What I want>
Refer to this post for further explanations on these special verbs :
https://community.notepad-plus-plus.org/post/55464
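As a rough illustration of the free-spacing idea outside Notepad++, the sketch below writes the same search twice in plain Python, once compactly and once with re.VERBOSE, and checks that both find the same numbers. Python's re lacks (*SKIP)(*F) and \h, so the "what I don't want" branch is an ordinary alternative whose matches are filtered out afterwards, and [ \t] stands in for \h. The lookahead keeps the px from the regex101 version so that values such as 1000px are not rejected. The sample is the second half of the test text above, and the helper name is hypothetical.

```python
import re

COMPACT = re.compile(
    r"<style.*?</style>|width[:=][ \t]*(?!10[01]px)(\d{3,})px", re.S)

VERBOSE = re.compile(
    r"""
    <style .*? </style>      # what I don't want: a whole style block
    |
    width [:=] [ \t]*        # what I want: width, then : or =, then blanks
    (?! 10[01] px )          # but not exactly 100px or 101px
    ( \d{3,} ) px            # three or more digits followed by px
    """,
    re.S | re.X,
)

SAMPLE = ("width:47px width: 100% width=100px width: 105px width= 99% "
          "width= 100% width= 101% width= 102% width: 789px <style> width:1px "
          "width: 50% width=234px width: 101px width: 1000px </style> "
          "width: 10px width: 110px width= 99px width: 200px")

def wanted(pattern: re.Pattern, text: str) -> list[str]:
    """Keep only matches from the second branch (the one with a capture)."""
    return [m.group(1) for m in pattern.finditer(text) if m.group(1)]

print(wanted(COMPACT, SAMPLE))
print(wanted(VERBOSE, SAMPLE))
```

Whitespace inside a character class such as [ \t] is still significant in verbose mode, which is why that part is written the same way in both patterns.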
Best Regards,
guy038
-
OK, thanks @guy038