    Find numbers greater than 101

      Scott Nielson

      In another post (by someone else), I read that the regular expression \b(\d{3,}|2[5-9]|[3-9]\d)\b finds numbers 25 and greater in multiple files.
      How do I find numbers greater than 101, but only when they are immediately followed by px, and only after </h1> but not after <style>, in multiple files? For example:

      </h1>
      Some text here
      width:15px
      width: 100%
      width=693px
      width: 105px
      width: 1529px
      <style>
      width:15px
      width: 100%
      width=693px
      width: 105px
      width: 1529px
      

      So, in the example above, only the first width=693px, width: 105px and width: 1529px should be found, and nothing else.

        Scott Nielson @Scott Nielson

        @Scott-Nielson I tried width.*(\d{3,}|10[1-9]\d)px(?=<style), but it doesn’t stop searching after the <style

          Mark Olson @Scott Nielson

          @Scott-Nielson
          I think the ColumnsPlusPlus plugin can do this sort of thing (searching for numbers with certain specifications) for you.

          PythonScript can also help with this kind of thing.

          As far as a regex that matches integers greater than 101 (never mind floating-point numbers like 1e3), I would say:
          (?<!-)(?:10[2-9]|1[1-9]\d|[2-9]\d{2}|[1-9]\d{3,})
          because this covers the cases:

          • no leading - sign ((?<!-))
          • 102-109 (10[2-9])
          • any number from 110 to 199 (1[1-9]\d)
          • any number from 200 to 999 ([2-9]\d{2})
          • any number from 1000 upward ([1-9]\d{3,})

          but I haven’t tested that regex and have no intention of doing so because that is precisely the sort of thing that you shouldn’t be using regular expressions for.
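For anyone who wants to check the alternation before relying on it, here is a minimal sketch in Python. It copies the pattern unchanged and compares it against a plain numeric comparison for whole, non-negative integers. The helper name and the range tested are my own choices. The check says nothing about numbers embedded in longer digit runs, since the pattern has no word boundaries.

```python
import re

# The pattern from the post, copied unchanged.
GREATER_THAN_101 = re.compile(r"(?<!-)(?:10[2-9]|1[1-9]\d|[2-9]\d{2}|[1-9]\d{3,})")

def matches_whole_number(n: int) -> bool:
    """True if the pattern matches the full decimal form of n (hypothetical helper)."""
    return GREATER_THAN_101.fullmatch(str(n)) is not None

# Collect every integer where the regex disagrees with the comparison n > 101.
mismatches = [n for n in range(0, 20000) if matches_whole_number(n) != (n > 101)]
print(mismatches)
```

An empty list means the regex and the comparison agree on every value in the range.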

          Oh and BTW width.*(\d{3,}|10[1-9]\d)px(?=<style) is probably giving you trouble because it includes .*, which will happily consume characters until the next newline, or until end of file if you have . matches newline selected. I would use width\h*[=:]\h* instead, which matches width followed by any amount of non-newline whitespace (\h*), a colon or equals sign, and then any amount of non-newline whitespace.
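The difference between the greedy .* and the tighter prefix can be seen in a small Python sketch. Two assumptions: Python's re module has no \h, so [ \t] stands in for it here, and re.DOTALL plays the role of the ". matches newline" option. The two-line sample text is made up for illustration.

```python
import re

text = "width:15px\nwidth: 105px\n"

# Greedy .* with dot-matches-newline: runs from the first "width" to the last number.
greedy = re.compile(r"width.*\d{3,}px", re.DOTALL)

# Tight prefix: only whitespace and one separator may sit between "width" and the digits.
tight = re.compile(r"width[ \t]*[=:][ \t]*\d{3,}px")

print(greedy.findall(text))
print(tight.findall(text))
```

The greedy version returns one match that starts at width:15px and swallows the next line, while the tight version returns only width: 105px.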

          And if you continue to have trouble parsing HTML with regular expressions, and you start to question your sanity, read this.

            Scott Nielson @Mark Olson

            @Mark-Olson (?<!-)(?:10[2-9]|1[1-9]\d|[2-9]\d{2}|[1-9]\d{3,}) matches even the numbers after the <style. Please give me a RegEx that finds 3-4 digit numerals between width and px and stops searching after <style before I start questioning my sanity!

              Coises @Scott Nielson

              @Scott-Nielson said in Find numbers greater than 101:

              @Mark-Olson (?<!-)(?:10[2-9]|1[1-9]\d|[2-9]\d{2}|[1-9]\d{3,}) matches even the numbers after the <style. Please give me a RegEx that finds 3-4 digit numerals between width and px and stops searching after <style before I start questioning my sanity!

              I’m pretty sure that what you are trying to do cannot be done with a regular expression. The reason is that you’re trying to find something such that the last tag preceding it satisfies a certain criterion (is “</h1>”), but the amount of text between the tag and the thing you are searching for varies. Lookbehinds have to be fixed length.

              That’s not to say there is no way to solve the problem, but it will have to be approached from some other direction. It might help if we know what you want to do with the numbers after you find them. (For example, if you’re replacing them with something else, it might be possible to match the entire string from the tag to the number and use a capture group in the replacement string. Applying that repeatedly could work if what you’re changing the numbers to is no longer a number greater than 101.)
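The repeated-replacement idea in the parenthesis could look roughly like the following Python sketch. Everything specific here is an assumption for illustration: the replacement value 100, the function name, and the use of [ \t] in place of \h. The loop only terminates because 100 no longer matches the "greater than 101" alternation.

```python
import re

# Group 1 runs from </h1> up to the number and never crosses a "<".
PATTERN = re.compile(
    r"(</h1>[^<]*?width[ \t]*[=:][ \t]*)"
    r"(?:10[2-9]|1[1-9]\d|[2-9]\d{2}|[1-9]\d{3,})px"
)

def cap_widths(text: str, replacement: str = "100") -> str:
    """Replace each width above 101 between </h1> and the next tag (hypothetical helper).

    The replacement must not itself be a number above 101, or the loop never ends.
    """
    while True:
        text, count = PATTERN.subn(lambda m: m.group(1) + replacement + "px", text)
        if count == 0:
            return text

sample = (
    "</h1>\nSome text here\nwidth:15px\nwidth: 100%\n"
    "width=693px\nwidth: 105px\nwidth: 1529px\n"
    "<style>\nwidth=693px\n"
)
result = cap_widths(sample)
print(result)
```

Each pass changes one number per </h1>, so the loop repeats until a pass finds nothing left to change. The width after <style> is left alone because [^<]* cannot cross the tag.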

                Coises @Mark Olson

                @Mark-Olson said in Find numbers greater than 101:

                I think the ColumnsPlusPlus plugin can do this sort of thing (searching for numbers with certain specifications) for you.

                Columns++ can form parts of the replacement string by manipulating matches and capture groups which it can parse as numbers, but (so far) it doesn’t do anything special with the matching itself. (It would be possible to match numbers and then make the replacement conditional on the value of the number, so that only numbers greater than 101 are changed, but the original poster asked to find the numbers, not to change them.)

                Columns++ can search in marked regions, which might help here because the thing that makes this so difficult is trying to find the numbers only in certain contexts. Notepad++ can mark what it finds (e.g., </h1>[^<]*), but it has no way to then search within the marked regions; Columns++ could do that.

                But since the original poster said he needs to do this in multiple files, and Columns++ only works on the current tab — no “all open documents” or “find in files” functions exist — it seems that won’t help, either. If there is a plugin that can search in marked regions and do multiple files at once, that would probably work.
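Outside Notepad++, the "mark a region, then search inside it" approach can be sketched in a few lines of Python. This is only an illustration of the two-stage idea, not something Columns++ or Notepad++ provides. The function name is hypothetical, and [ \t] stands in for \h.

```python
import re

# Stage 1: the marking expression quoted above, from </h1> up to the next tag.
REGION = re.compile(r"</h1>[^<]*")

# Stage 2: widths greater than 101 followed by px, searched only inside a region.
WIDTH = re.compile(
    r"width[ \t]*[=:][ \t]*(?:10[2-9]|1[1-9]\d|[2-9]\d{2}|[1-9]\d{3,})px"
)

def widths_after_h1(text: str) -> list[str]:
    """Return width declarations above 101px found in regions that follow </h1>."""
    found = []
    for region in REGION.finditer(text):
        found.extend(m.group(0) for m in WIDTH.finditer(region.group(0)))
    return found

sample = (
    "</h1>\nSome text here\nwidth:15px\nwidth: 100%\n"
    "width=693px\nwidth: 105px\nwidth: 1529px\n"
    "<style>\nwidth:15px\nwidth=693px\nwidth: 105px\n"
)
print(widths_after_h1(sample))
```

To cover multiple files, the same function could be called on the text of each file in a loop, for example with pathlib's glob and read_text.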

                  Scott Nielson @Coises

                  @Coises Forget the </h1>. I think something like width\h*[=:]\h*(?<!-)(?:10[2-9]|1[1-9]\d|[2-9]\d{2}|[1-9]\d{3,})px(?=<style) should be tweaked a bit to stop matching/searching after <style

                    Scott Nielson @Scott Nielson

                    @Scott-Nielson Someone at www.regex101.com gave me this RegEx as a solution, provided there is a closing </style> tag: <style(?:[^<]+|<(?!\/style))*+</style>(*SKIP)(*F)|width[:=]\h*(?!10[01]px)\d{3,}px and it is perfect!

                      Scott Nielson @Scott Nielson

                      @Scott-Nielson Just so that others may also understand, the test string used was tweaked to this:

                      Some text here
                      width:15px
                      width: 100%
                      width=693px
                      width: 105px
                      width: 1529px
                      <style>
                      width:15px
                      width: 100%
                      width=693px
                      width: 105px
                      width: 1529px
                      </style>
                      width: 105px
                      

                      Everything between <style> and </style> is skipped. Everywhere else, width values greater than 101 that are followed by px are matched (width=693px, width: 105px and width: 1529px, but not width:15px or width: 100%)
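Python's built-in re module does not support (*SKIP)(*F), so to try the same idea there, one common workaround is to match the unwanted part without a capture group and keep only the matches where the capture group took part. A minimal sketch, assuming [ \t] in place of \h and a function name of my own:

```python
import re

PATTERN = re.compile(
    r"<style.*?</style>"                          # what to skip: matched, never captured
    r"|width[:=][ \t]*(?!10[01]px)(\d{3,})px",    # what to keep: group 1 holds the digits
    re.DOTALL,
)

def widths_outside_style(text: str) -> list[str]:
    """Return the digits of every wanted width that is not inside a <style> block."""
    return [m.group(1) for m in PATTERN.finditer(text) if m.group(1) is not None]

sample = (
    "Some text here\nwidth:15px\nwidth: 100%\nwidth=693px\nwidth: 105px\n"
    "width: 1529px\n<style>\nwidth:15px\nwidth: 100%\nwidth=693px\n"
    "width: 105px\nwidth: 1529px\n</style>\nwidth: 105px\n"
)
print(widths_outside_style(sample))
```

The <style> block is consumed as a single match with an empty group, so nothing inside it can be reported.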

                        Scott Nielson @Mark Olson

                        @Mark-Olson Thanks!

                          guy038

                          Hello @scott-nielson, @mark-olson, @coises and All,

                          Scott, your regex can even be simplified as follows:

                          • SEARCH (?s)<style.*?</style>(*SKIP)(*F)|(?-i)width[:=]\h*(?!10[01]px)\d{3,}px

                          or

                          • SEARCH (?xs) <style .*? </style> (*SKIP) (*F) | (?-i) width [:=] \h* (?! 10[01] px ) \d{3,} px if you prefer to use the free-spacing mode (?x)

                          Test it against the following text:

                          Some text here
                          width:15px
                          width: 100%
                          width=693px
                          width: 105px
                          width: 1529px
                          <style>
                          width:15px
                          width: 100%
                          width=693px
                          width: 105px
                          width: 1529px
                          </style>
                          width: 105px
                          width:47px
                          width: 100%
                          width=100px
                          width: 105px
                          width= 99%
                          width= 100%
                          width= 101%
                          width= 102%
                          width: 789px
                          <style>
                          width:1px
                          width: 50%
                          width=234px
                          width: 101px
                          width: 1000px
                          </style>
                          width: 10px
                          width: 110px
                          width= 99px
                          width: 200px
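One detail worth checking with four-digit values is whether the negative lookahead before \d{3,} includes the px unit. The Python sketch below compares both forms on a few hand-picked values that are not part of the test text above. It uses fullmatch so that each value is judged as a whole.

```python
import re

SHORT = re.compile(r"(?!10[01])\d{3,}px")    # lookahead without the unit
FULL = re.compile(r"(?!10[01]px)\d{3,}px")   # lookahead including the unit

samples = ["100px", "101px", "102px", "1000px", "1015px"]

short_hits = [s for s in samples if SHORT.fullmatch(s)]
full_hits = [s for s in samples if FULL.fullmatch(s)]

print(short_hits)
print(full_hits)
```

Without px in the lookahead, any value whose digits begin with 100 or 101 is rejected, which includes 1000px and 1015px. With px included, only 100px and 101px themselves are excluded.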
                          

                          In general, this special syntax, which uses two backtracking control verbs, means:

                          <What I don't want>(*SKIP)(*F)|<What I want>

                          Refer to this post for further explanations on these special verbs :

                          https://community.notepad-plus-plus.org/post/55464

                          Best Regards,

                          guy038

                            Scott Nielson @guy038

                            OK, thanks @guy038

                            The Community of users of the Notepad++ text editor.