Find numbers greater than 101

12 Posts 4 Posters 3.1k Views
    Scott Nielson
    last edited by Scott Nielson Sep 25, 2023, 3:04 AM Sep 25, 2023, 1:17 AM

    In another post (by someone else), I read that the regular expression \b(\d{3,}|2[5-9]|[3-9]\d)\b will help find numbers 25 and greater in multiple files.
    How do I find numbers greater than 101, but only when they are immediately followed by px, in multiple files, and only after </h1> but not after <style>, as in this example:

    </h1>
    Some text here
    width:15px
    width: 100%
    width=693px
    width: 105px
    width: 1529px
    <style>
    width:15px
    width: 100%
    width=693px
    width: 105px
    width: 1529px
    

    So, in the example above, only the first width=693px, width: 105px and width: 1529px should be found but nothing else

      Scott Nielson @Scott Nielson
      last edited by Scott Nielson Sep 25, 2023, 2:54 AM Sep 25, 2023, 2:51 AM

      @Scott-Nielson width.*(\d{3,}|10[1-9]\d)px(?=<style) doesn’t help stop searching after the <style

        Mark Olson @Scott Nielson
        last edited by Mark Olson Sep 25, 2023, 4:49 AM Sep 25, 2023, 4:31 AM

        @Scott-Nielson
        I think the ColumnsPlusPlus plugin can do this sort of thing (searching for numbers with certain specifications) for you.

        PythonScript can also help with this kind of thing.

        As far as a regex that matches integers greater than 101 (never mind floating-point numbers like 1e3), I would say:
        (?<!-)(?:10[2-9]|1[1-9]\d|[2-9]\d{2}|[1-9]\d{3,})
        because this covers the cases:

        • no leading - sign ((?<!-))
        • 102-109 (10[2-9])
        • any number from 110 to 199 (1[1-9]\d)
        • any number from 200 to 999 ([2-9]\d{2})
        • any number 1000 or greater ([1-9]\d{3,})

        but I haven’t tested that regex and have no intention of doing so because that is precisely the sort of thing that you shouldn’t be using regular expressions for.
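Since the pattern above is described as untested, one way to check it is to compare it against an ordinary integer comparison. The following is a minimal Python sketch, not something from the original post. It assumes Python's re module handles this particular pattern the same way as the Notepad++ regex engine, which seems reasonable because the pattern uses only a lookbehind, alternation, and digit classes. The names GT_101 and is_greater_than_101 are made up for illustration.

```python
import re

# The pattern from the post: integers greater than 101, no leading minus sign.
GT_101 = re.compile(r"(?<!-)(?:10[2-9]|1[1-9]\d|[2-9]\d{2}|[1-9]\d{3,})")


def is_greater_than_101(text: str) -> bool:
    """Return True if the whole string is matched by the pattern."""
    return GT_101.fullmatch(text) is not None


# Compare the regex against plain integer comparison over a range of values.
mismatches = [
    n for n in range(0, 20000)
    if is_greater_than_101(str(n)) != (n > 101)
]
print(mismatches)  # an empty list means the regex and the comparison agree
```

Note that the pattern has no \b anchors, so in a search (as opposed to a full match) it can match part of a longer run of digits. In the thread's use case the surrounding width prefix and px suffix supply that context.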

        Oh and BTW width.*(\d{3,}|10[1-9]\d)px(?=<style) is probably giving you trouble because it includes .*, which will happily consume characters until the next newline, or until end of file if you have . matches newline selected. I would use width\h*[=:]\h* instead, which matches width followed by any amount of non-newline whitespace (\h*), a colon or equals sign, and then any amount of non-newline whitespace.
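The difference between .* and an explicit separator can be seen on a single line that contains two properties. This is a small Python sketch under some assumptions: the sample line is invented, and [ \t]* is used as a rough stand-in for \h*, which Python's re module does not support.

```python
import re

line = "width:15px; height:300px"

# Greedy .* runs to the end of the line and backtracks only as far as needed,
# so the captured number belongs to "height", not to "width".
loose = re.compile(r"width.*(\d{3,})px")

# Only optional spaces/tabs and one ':' or '=' may sit between "width"
# and the digits, so the 2-digit width value cannot be skipped over.
strict = re.compile(r"width[ \t]*[=:][ \t]*(\d{3,})px")

loose_match = loose.search(line)
strict_match = strict.search(line)

print(loose_match.group(1) if loose_match else None)
print(strict_match.group(1) if strict_match else None)
```

The loose pattern reports a match for a line whose width is only 15px, while the strict pattern reports none.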

        And if you continue to have trouble parsing HTML with regular expressions, and you start to question your sanity, read this .

          Scott Nielson @Mark Olson
          last edited by Sep 25, 2023, 5:06 AM

          @Mark-Olson (?<!-)(?:10[2-9]|1[1-9]\d|[2-9]\d{2}|[1-9]\d{3,}) matches even the numbers after the <style. Please give me a RegEx that finds 3-4 digit numerals between width and px and stops searching after <style before I start questioning my sanity!

            Coises @Scott Nielson
            last edited by Sep 25, 2023, 5:21 AM

            @Scott-Nielson said in Find numbers greater than 101:

            @Mark-Olson (?<!-)(?:10[2-9]|1[1-9]\d|[2-9]\d{2}|[1-9]\d{3,}) matches even the numbers after the <style. Please give me a RegEx that finds 3-4 digit numerals between width and px and stops searching after <style before I start questioning my sanity!

            I’m pretty sure that what you are trying to do cannot be done with a regular expression. The reason is that you’re trying to find something such that the last tag preceding it satisfies a certain criterion (is “</h1>”), but the text between the tag and the thing for which you are searching is variable in length. Lookbehinds have to be fixed length.

            That’s not to say there is no way to solve the problem, but it will have to be approached from some other direction. It might help if we know what you want to do with the numbers after you find them. (For example, if you’re replacing them with something else, it might be possible to match the entire string from the tag to the number and use a capture group in the replacement string. Applying that repeatedly could work if what you’re changing the numbers to is no longer a number greater than 101.)

              Coises @Mark Olson
              last edited by Coises Sep 25, 2023, 5:48 AM Sep 25, 2023, 5:47 AM

              @Mark-Olson said in Find numbers greater than 101:

              I think the ColumnsPlusPlus plugin can do this sort of thing (searching for numbers with certain specifications) for you.

              Columns++ can form parts of the replacement string by manipulating matches and capture groups which it can parse as numbers, but (so far) it doesn’t do anything special with the matching itself. (It would be possible to match numbers and then make the replacement conditional on the value of the number, so that only numbers greater than 101 are changed, but the original poster asked to find the numbers, not to change them.)

              Columns++ can search in marked regions, which might help here because the thing that makes this so difficult is trying to find the numbers only in certain contexts. Notepad++ can mark what it finds (e.g., </h1>[^<]*), but it has no way to then search within the marked regions; Columns++ could do that.

              But since the original poster said he needs to do this in multiple files, and Columns++ only works on the current tab — no “all open documents” or “find in files” functions exist — it seems that won’t help, either. If there is a plugin that can search in marked regions and do multiple files at once, that would probably work.
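The "mark a region, then search inside it" idea can also be sketched outside the editor. The Python below is only an illustration under stated assumptions: the function and constant names are hypothetical, [ \t]* approximates \h*, and the region pattern is the </h1>[^<]* example mentioned above, so a region ends at the next "<" of any tag.

```python
import re

# Region: everything after </h1> up to (not including) the next "<".
REGION = re.compile(r"</h1>[^<]*")

# Width values greater than 101 that are immediately followed by "px".
WIDTH = re.compile(r"width[ \t]*[=:][ \t]*((?!10[01]px)\d{3,})px")


def widths_after_h1(text: str) -> list[str]:
    """Collect matching width numbers, searching only inside </h1> regions."""
    found = []
    for region in REGION.finditer(text):
        found.extend(m.group(1) for m in WIDTH.finditer(region.group(0)))
    return found


sample = """</h1>
Some text here
width:15px
width: 100%
width=693px
width: 105px
width: 1529px
<style>
width:15px
width: 100%
width=693px
width: 105px
width: 1529px
"""

print(widths_after_h1(sample))
```

To cover multiple files, the same function could be applied to the text of each file in turn, for example by looping over the results of pathlib.Path.rglob.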

                Scott Nielson @Coises
                last edited by Sep 25, 2023, 6:51 AM

                @Coises Forget the </h1>. I think something like width\h*[=:]\h*(?<!-)(?:10[2-9]|1[1-9]\d|[2-9]\d{2}|[1-9]\d{3,})px(?=<style) should be tweaked a bit to stop matching/searching after <style

                  Scott Nielson @Scott Nielson
                  last edited by Sep 25, 2023, 7:15 AM

                  @Scott-Nielson Someone at www.regex101.com gave me this RegEx as a solution, provided there is a closing </style> tag: <style(?:[^<]+|<(?!\/style))*+</style>(*SKIP)(*F)|width[:=]\h*(?!10[01]px)\d{3,}px and it is perfect!
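One detail of this pattern worth noting is the px inside the negative lookahead (?!10[01]px). A short Python sketch of why it matters, covering only the width part of the pattern (Python's re module does not support (*SKIP)(*F), and [ \t]* stands in for \h*); the sample strings are invented for illustration:

```python
import re

# Lookahead rejects exactly "100px" and "101px".
with_px = re.compile(r"width[:=][ \t]*(?!10[01]px)\d{3,}px")

# Lookahead rejects every number that merely starts with 100 or 101.
without_px = re.compile(r"width[:=][ \t]*(?!10[01])\d{3,}px")

samples = [
    "width=100px",
    "width: 101px",
    "width: 102px",
    "width: 1000px",
    "width: 1015px",
]

matched_with_px = [s for s in samples if with_px.search(s)]
matched_without_px = [s for s in samples if without_px.search(s)]

print(matched_with_px)
print(matched_without_px)
```

With px in the lookahead, 1000px and 1015px are still found; without it, they are rejected because they begin with 100 or 101.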

                    Scott Nielson @Scott Nielson
                    last edited by Scott Nielson Sep 25, 2023, 7:30 AM Sep 25, 2023, 7:23 AM

                    @Scott-Nielson Just so that others may also understand, the test string used was tweaked to this:

                    Some text here
                    width:15px
                    width: 100%
                    width=693px
                    width: 105px
                    width: 1529px
                    <style>
                    width:15px
                    width: 100%
                    width=693px
                    width: 105px
                    width: 1529px
                    </style>
                    width: 105px
                    

                    Everything between <style> and </style> is skipped, but everywhere else, each “width:”/“width=” value greater than 101 that is followed by “px” is matched

                      Scott Nielson @Mark Olson
                      last edited by Sep 25, 2023, 7:35 AM

                      @Mark-Olson Thanks!

                        guy038
                        last edited by Sep 25, 2023, 10:09 AM

                        Hello @scott-nielson, @mark-olson, @coises and All,

                        Scott, your regex can even be simplified as below:

                        • SEARCH (?s)<style.*?</style>(*SKIP)(*F)|(?-i)width[:=]\h*(?!10[01]px)\d{3,}px

                        or

                        • SEARCH (?xs) <style .*? </style> (*SKIP) (*F) | (?-i) width [:=] \h* (?! 10[01] px ) \d{3,} px if you prefer to use the free-spacing mode (?x)

                        Just test it against the following text:

                        Some text here
                        width:15px
                        width: 100%
                        width=693px
                        width: 105px
                        width: 1529px
                        <style>
                        width:15px
                        width: 100%
                        width=693px
                        width: 105px
                        width: 1529px
                        </style>
                        width: 105px
                        width:47px
                        width: 100%
                        width=100px
                        width: 105px
                        width= 99%
                        width= 100%
                        width= 101%
                        width= 102%
                        width: 789px
                        <style>
                        width:1px
                        width: 50%
                        width=234px
                        width: 101px
                        width: 1000px
                        </style>
                        width: 10px
                        width: 110px
                        width= 99px
                        width: 200px
                        

                        Overall, this special syntax, which uses two backtracking control verbs, means:

                        <What I don't want>(*SKIP)(*F)|<What I want>

                        Refer to this post for further explanations on these special verbs :

                        https://community.notepad-plus-plus.org/post/55464
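For readers working in an engine without these verbs, the same "what I don't want | what I want" idea is often approximated with a capture group. The sketch below uses Python, whose built-in re module does not support (*SKIP)(*F); it is an approximation rather than an equivalent of the Notepad++ search. The names PATTERN and find_widths and the shortened sample text are assumptions for illustration, and [ \t]* stands in for \h*.

```python
import re

# First alternative: what to skip (whole <style>...</style> blocks).
# Second alternative: what to keep, captured in group 1.
PATTERN = re.compile(
    r"<style.*?</style>|(width[:=][ \t]*(?!10[01]px)\d{3,}px)",
    re.DOTALL,
)


def find_widths(text: str) -> list[str]:
    """Return wanted matches; skipped blocks leave group 1 unset."""
    return [m.group(1) for m in PATTERN.finditer(text) if m.group(1) is not None]


sample = """width=693px
width: 100%
<style>
width=693px
width: 1529px
</style>
width: 105px
width=100px
width: 10px
width: 200px
"""

print(find_widths(sample))
```

Because the unwanted block is consumed as a match of its own, the scan resumes after </style>, and the caller simply discards every match that did not fill the capture group.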

                        Best Regards,

                        guy038

                          Scott Nielson @guy038
                          last edited by Sep 25, 2023, 10:16 AM

                          OK, thanks @guy038
