
    Find a specific sequence on every line

    Help wanted
    12 Posts 4 Posters 720 Views
    • guy038

      Hello, @george-martinez, @neil-schipper and All,

      Neil, I would prefer this regex S/R :

      SEARCH (?-si)^.+port:\h*(\d+).*

      REPLACE port $1

      Compare the results with those of your version on the text below:

      sdfsdf3 sdff9dg port:      2522 dgfgdfg
      d*f@@fsdf #sdfd sdf s port:52
      sdf g2dfg53354 !df gdfgdf port: 81 sdf gfdgdfg
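To try this S/R outside Notepad++, here is a minimal Python sketch of the same substitution. It assumes Python's `re` module as a stand-in for N++'s Boost engine: `re.MULTILINE` makes `^` match at each line start, `.` already excludes `\n` when `re.DOTALL` is not passed (like `(?-s)`), matching is case-sensitive by default (like `(?-i)`), and `\h` is approximated by `[ \t]` because Python's `re` has no `\h` (Boost's `\h` also covers other horizontal blanks). The name `SAMPLE` is hypothetical.

```python
import re

# The three sample lines from the post above.
SAMPLE = """sdfsdf3 sdff9dg port:      2522 dgfgdfg
d*f@@fsdf #sdfd sdf s port:52
sdf g2dfg53354 !df gdfgdf port: 81 sdf gfdgdfg"""

# Python translation of  SEARCH (?-si)^.+port:\h*(\d+).*
#   no re.DOTALL -> '.' never matches '\n'  (like (?-s))
#   no re.I      -> case-sensitive          (like (?-i))
#   [ \t]*       -> approximation of Boost's \h*
PATTERN = re.compile(r"^.+port:[ \t]*(\d+).*", re.MULTILINE)

# Python writes the group reference as \1 where N++ uses $1.
result = PATTERN.sub(r"port \1", SAMPLE)
print(result)
```

Each line is reduced to `port` plus the bare number, since the blanks after the colon are matched outside the capturing group.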
      

      Note that my first attempt was the regex S/R below, which is correct but does not yet output aligned numbers ;-))

      SEARCH (?-si)^.+(port:\h*\d+).*

      REPLACE $1
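The same Python stand-in (an assumption: `re` for Boost, `[ \t]` for `\h`) shows why this first attempt does not align the numbers: the blanks after `port:` sit inside group 1, so they are copied to the output unchanged.

```python
import re

SAMPLE = """sdfsdf3 sdff9dg port:      2522 dgfgdfg
d*f@@fsdf #sdfd sdf s port:52
sdf g2dfg53354 !df gdfgdf port: 81 sdf gfdgdfg"""

# Python translation of  SEARCH (?-si)^.+(port:\h*\d+).*   REPLACE $1
# The run of blanks after "port:" is captured together with the number.
FIRST_ATTEMPT = re.compile(r"^.+(port:[ \t]*\d+).*", re.MULTILINE)

result = FIRST_ATTEMPT.sub(r"\1", SAMPLE)
print(result)
```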

      Best Regards,

      guy038

      P.S. :

      Neil, you don’t need to add the comment “. matches newline not checked” if you begin your regexes with the (?-s) modifier. This modifier forces the regex engine to treat any regex . as matching a single standard char (never a line-break char), EVEN IF the . matches newline option is ticked! Conversely, the (?s) modifier lets . match line-break chars too, even if that option is unticked.
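A minimal sketch of this override behaviour, using Python's `re` as an assumed stand-in for Boost: passing `re.DOTALL` plays the role of ticking ". matches newline", and an inline modifier in the pattern overrides it. Python only accepts switching a flag *off* in the scoped form `(?-s:...)`, so the sketch uses that instead of a bare leading `(?-s)`.

```python
import re

text = "port:\n80"  # "port:" and "80" are on two different lines

# Default: '.' does not match '\n' (like an unticked ". matches newline").
default = bool(re.search(r"port:.80", text))

# Passing re.DOTALL plays the role of ticking ". matches newline".
with_option = bool(re.search(r"port:.80", text, re.DOTALL))

# A scoped (?-s:...) group turns DOTALL off again inside the group,
# whatever the caller passed, much like a leading (?-s) in N++.
scoped_off = bool(re.search(r"(?-s:port:.80)", text, re.DOTALL))

# Conversely, a leading (?s) turns DOTALL on without the option.
global_on = bool(re.search(r"(?s)port:.80", text))

print(default, with_option, scoped_off, global_on)  # False True False True
```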

      • Neil Schipper @guy038

        Hello @guy038,

        Your solution is probably more robust and general than mine, but we can’t really say because the spec is incomplete, and all three of our solutions could fail on some data.

        • the spec does not say whether lines may start with 'port:', which my solution accepts and both of your solutions skip
        • the spec does not say whether 'port:' (when not at the start) must be preceded by whitespace, so all three of our solutions turn sdf g2fport: 5354 !df df port: 82 sdf into port 82, possibly failing to capture port 5354
        • the spec says neither “exactly one space” (me) nor “any run of whitespace” (you) after the colon
        • (this one is trivial, but) spec doesn’t say whether output should keep the colon (which your preferred solution, probably inadvertently, leaves out)
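The second bullet is easy to reproduce. In this Python sketch (`re` standing in for Boost, `[ \t]` approximating `\h`), the greedy pattern is guy038's; the lazy and whitespace-requiring variants are hypothetical illustrations of other readings of the spec, not solutions anyone posted.

```python
import re

line = "sdf g2fport: 5354 !df df port: 82 sdf"

# Greedy .+ (as in guy038's search) backtracks from the END of the line,
# so it settles on the LAST "port:" followed by a number.
greedy = re.sub(r"^.+port:[ \t]*(\d+).*", r"port \1", line)

# Hypothetical lazy variant: .+? stops at the FIRST "port:", even inside "g2fport:".
lazy = re.sub(r"^.+?port:[ \t]*(\d+).*", r"port \1", line)

# Hypothetical strict variant: (?<!\S) requires "port:" to start the line
# or follow whitespace, so "g2fport:" is skipped.
strict = re.sub(r"^.*?(?<!\S)port:[ \t]*(\d+).*", r"port \1", line)

print(greedy)  # port 82
print(lazy)    # port 5354
print(strict)  # port 82
```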

        So we’re all guessing to some degree.

        And based on the loose way things are often specified, people taking our advice are probably not brutally testing our solutions against reams of test cases, and we can only hope and pray that they’re not involved with anything life-critical.

        In regard to selecting “line oriented” vs. “sea of bytes oriented” behaviour by prefixing every search string with a (?-s)-style directive, I do recognize its power (which I discovered only recently, thanks to posts by yourself and the other two or three regex solution-providers on this site). But I also think that including it discourages regex noobs from digging in and learning what the whole search string is doing: it’s an extra subtlety on top of what are already somewhat subtle concepts and constructs, like ‘+’ vs. ‘*’, and dozens more.

        Actually, I regard placing the option that modifies ‘.’ in the Find dialog as something of a design flaw, and I feel it should really be hidden away in Preferences. My reasoning is that a super-majority of users are non-sophisticates, and a super-majority of use cases are line-oriented; a good number of these users will be able and willing to learn a few regex tricks now and again and build their competence over time.

        OTOH, people dealing with “sea of bytes” situations are more likely to be computer/data savvy and in a position to “read the docs” and become regex sophisticates; if not, they must seek advice from reliable sophisticates and apply their solutions in innocence/ignorance.

        • Neil Schipper @Neil Schipper

          @Neil-Schipper said in Find a specific sequence on every line:

          spec doesn’t say whether output should keep the colon

          Oops: he did include the colon in the required output. He also (maybe inadvertently) omitted the space between the colon and the number.

          So @george-martinez: test, test, test.

          • guy038

            Hi, @george-martinez, @neil-schipper and All,

            The P.S. part of my previous post was meant more specifically for your personal knowledge of N++ Boost regexes, since you seem to master the basics of regexes, and of course not for regex noobs!

            You may find more info on these modifiers at the beginning of my other post:

            https://community.notepad-plus-plus.org/post/70509


            Oh…, as you mentioned the colon char, I’ve just realized that I omitted it in the Replace regex :-(

            So my final version should be :

            SEARCH (?-si)^.+port:\h*(\d+).*

            REPLACE port: $1
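For completeness, the corrected S/R in the same Python stand-in (an assumption: `re` for Boost, `[ \t]` for `\h`, `\1` for `$1`), applied to the three sample lines from the first post:

```python
import re

SAMPLE = """sdfsdf3 sdff9dg port:      2522 dgfgdfg
d*f@@fsdf #sdfd sdf s port:52
sdf g2dfg53354 !df gdfgdf port: 81 sdf gfdgdfg"""

# SEARCH (?-si)^.+port:\h*(\d+).*   REPLACE port: $1
FINAL = re.compile(r"^.+port:[ \t]*(\d+).*", re.MULTILINE)

# The colon is now written back, followed by exactly one space.
result = FINAL.sub(r"port: \1", SAMPLE)
print(result)
```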

            BR

            guy038

            • Neil Schipper @guy038

              Hi @guy038,

              Your #70509 was quite extraordinary in depth and detail. I started a reply but did not publish it because I’m still absorbing some aspects of it, and I have some uncertainties. I do intend to reply, I hope in the next day or so.

              In regard to this thread, you did not respond to the main points of my last post.

              The first point is that because the spec is loose, it’s tough to be sure whether any of our three solutions is truly complete (and you only referred to the least interesting of the four bullets).

              For example, both of your solutions would leave this line unchanged:

              port: 184 06 dfghjk
              

              because they don’t match port: in column 1. But maybe that can’t occur in the data files @george-martinez has to process, and your solution is fine. We just don’t know. (Also, I could have had a 5th bullet to mention that your solutions force case sensitivity, but this was not specified.) Anyway, I’ll try to remember when I offer solutions to state if I think there are unspecified aspects of the data described/provided by the request-maker that could make the solution misfire.
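Both points (column 1 and case sensitivity) can be checked with a small Python sketch. It assumes Python's `re` as a stand-in for Boost, `[ \t]` approximating `\h`, and `re.IGNORECASE` playing the role of `(?i)`; the `^.*` variants and the `Port:` test line are hypothetical illustrations.

```python
import re

# Hypothetical test lines: "port:" in column 1, and "Port:" with a capital P.
LINES = ["port: 184 06 dfghjk", "abc Port: 99 xyz"]

PATTERNS = {
    # guy038's search: at least one char must precede "port:"
    "^.+ (case-sensitive)": re.compile(r"^.+port:[ \t]*(\d+).*"),
    # Hypothetical variant: ^.* also accepts "port:" in column 1
    "^.* (case-sensitive)": re.compile(r"^.*port:[ \t]*(\d+).*"),
    # Hypothetical variant: re.IGNORECASE plays the role of (?i)
    "^.* (ignore case)": re.compile(r"^.*port:[ \t]*(\d+).*", re.IGNORECASE),
}

results = {}
for name, rx in PATTERNS.items():
    # True means the line would be rewritten, False means left unchanged.
    results[name] = [bool(rx.match(line)) for line in LINES]
    print(f"{name:22} {results[name]}")
```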

              The second point questions whether it’s a good idea to prefix every solution with a (? directive on pedagogical grounds. I tried to convey that I learned it recently (from you, plural, since I don’t remember if I saw it first in a post by you or TJ, PJ, or AK), and that I recognize that it’s a robust technique.

              • guy038

                Hi, @george-martinez, @neil-schipper and All,

                I agree that the right regex solution highly depends on OP’s needs ! So I’ll try to be a bit more exhaustive ;-))


                The regexes below match ANY line (without its line-break chars) that contains the string port: followed by an INTEGER, under the following conditions:

                Case A  The string 'port....###', with this EXACT case, may occur at ANY position of the current line
                Case B  The string 'port....###', with this EXACT case, is FOLLOWED by some STANDARD characters
                Case C  The string 'port....###', with this EXACT case, is PRECEDED by some STANDARD characters
                Case D  The string 'port....###', with this EXACT case, is PRECEDED and FOLLOWED by some STANDARD characters
                Case E  The string 'port....###', with this EXACT case, BEGINS the current line
                Case F  The string 'port....###', with this EXACT case, BEGINS the current line and is FOLLOWED by some STANDARD characters
                Case G  The string 'port....###', with this EXACT case, ENDS the current line
                Case H  The string 'port....###', with this EXACT case, ENDS the current line and is PRECEDED by some STANDARD characters
                Case I  The string 'port....###', with this EXACT case, BEGINS and ENDS the current line
                
                Case J  The string 'port....###', WHATEVER its case,    may occur at ANY position of the current line
                Case K  The string 'port....###', WHATEVER its case,    is FOLLOWED by some STANDARD characters
                Case L  The string 'port....###', WHATEVER its case,    is PRECEDED by some STANDARD characters
                Case M  The string 'port....###', WHATEVER its case,    is PRECEDED and FOLLOWED by some STANDARD characters
                Case N  The string 'port....###', WHATEVER its case,    BEGINS the current line
                Case O  The string 'port....###', WHATEVER its case,    BEGINS the current line and is FOLLOWED by some STANDARD characters
                Case P  The string 'port....###', WHATEVER its case,    ENDS the current line
                Case Q  The string 'port....###', WHATEVER its case,    ENDS the current line and is PRECEDED by some STANDARD characters
                Case R  The string 'port....###', WHATEVER its case,    BEGINS and ENDS the current line
                
                and:
                
                Case 1  ANY range of consecutive HORIZONTAL BLANK char(s), even NONE,       between the string 'port:' and the INTEGER
                Case 2  ANY range of consecutive HORIZONTAL BLANK char(s), at least ONE,    between the string 'port:' and the INTEGER
                Case 3  A SINGLE SPACE      char                                            between the string 'port:' and the INTEGER
                Case 4  A SINGLE TABULATION char                                            between the string 'port:' and the INTEGER
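The four gap conditions can be compared in isolation. This Python sketch assumes `re` as a stand-in for Boost and `[ \t]` as an approximation of `\h`; it uses the Case I anchoring (`^...$`) so that only the gap varies, and the sample lines are hypothetical.

```python
import re

# Python stand-ins for the four "gap" conditions.
GAPS = {
    "Case 1": r"[ \t]*",  # \h*  : any run of blanks, even none
    "Case 2": r"[ \t]+",  # \h+  : at least one blank
    "Case 3": r"\x20",    # exactly one space
    "Case 4": r"\t",      # exactly one tab
}
# Hypothetical sample lines, each made only of "port:", a gap and an integer.
SAMPLES = ["port:80", "port: 80", "port:\t80", "port:  80"]

hits_by_case = {}
for case, gap in GAPS.items():
    rx = re.compile(r"^port:" + gap + r"(\d+)$")
    hits_by_case[case] = [s for s in SAMPLES if rx.match(s)]
    print(case, hits_by_case[case])
```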
                
                

                Thus, here is the table of search regexes, one for each combination of conditions.

                At the INTERSECTION of the column (the character(s) between the string port: and the integer) and the row (the location of the string port:....### and its case), you’ll find the appropriate search regex:


                         Case 1                      Case 2                       Case 3                      Case 4
                
                (?-is)^.*port:\h*(\d+).*    (?-is)^.*port:\h+(\d+).*    (?-is)^.*port:\x20(\d+).*    (?-is)^.*port:\t(\d+).*    Case A
                (?-is)^.*port:\h*(\d+).+    (?-is)^.*port:\h+(\d+).+    (?-is)^.*port:\x20(\d+).+    (?-is)^.*port:\t(\d+).+    Case B
                
                (?-is)^.+port:\h*(\d+).*    (?-is)^.+port:\h+(\d+).*    (?-is)^.+port:\x20(\d+).*    (?-is)^.+port:\t(\d+).*    Case C
                (?-is)^.+port:\h*(\d+).+    (?-is)^.+port:\h+(\d+).+    (?-is)^.+port:\x20(\d+).+    (?-is)^.+port:\t(\d+).+    Case D
                
                (?-is)^port:\h*(\d+).*      (?-is)^port:\h+(\d+).*      (?-is)^port:\x20(\d+).*      (?-is)^port:\t(\d+).*      Case E
                (?-is)^port:\h*(\d+).+      (?-is)^port:\h+(\d+).+      (?-is)^port:\x20(\d+).+      (?-is)^port:\t(\d+).+      Case F
                
                (?-is)^.*port:\h*(\d+)$     (?-is)^.*port:\h+(\d+)$     (?-is)^.*port:\x20(\d+)$     (?-is)^.*port:\t(\d+)$     Case G
                (?-is)^.+port:\h*(\d+)$     (?-is)^.+port:\h+(\d+)$     (?-is)^.+port:\x20(\d+)$     (?-is)^.+port:\t(\d+)$     Case H
                
                (?-is)^port:\h*(\d+)$       (?-is)^port:\h+(\d+)$       (?-is)^port:\x20(\d+)$       (?-is)^port:\t(\d+)$       Case I
                
                (?i-s)^.*port:\h*(\d+).*    (?i-s)^.*port:\h+(\d+).*    (?i-s)^.*port:\x20(\d+).*    (?i-s)^.*port:\t(\d+).*    Case J
                (?i-s)^.*port:\h*(\d+).+    (?i-s)^.*port:\h+(\d+).+    (?i-s)^.*port:\x20(\d+).+    (?i-s)^.*port:\t(\d+).+    Case K
                
                (?i-s)^.+port:\h*(\d+).*    (?i-s)^.+port:\h+(\d+).*    (?i-s)^.+port:\x20(\d+).*    (?i-s)^.+port:\t(\d+).*    Case L
                (?i-s)^.+port:\h*(\d+).+    (?i-s)^.+port:\h+(\d+).+    (?i-s)^.+port:\x20(\d+).+    (?i-s)^.+port:\t(\d+).+    Case M
                
                (?i-s)^port:\h*(\d+).*      (?i-s)^port:\h+(\d+).*      (?i-s)^port:\x20(\d+).*      (?i-s)^port:\t(\d+).*      Case N
                (?i-s)^port:\h*(\d+).+      (?i-s)^port:\h+(\d+).+      (?i-s)^port:\x20(\d+).+      (?i-s)^port:\t(\d+).+      Case O
                
                (?i-s)^.*port:\h*(\d+)$     (?i-s)^.*port:\h+(\d+)$     (?i-s)^.*port:\x20(\d+)$     (?i-s)^.*port:\t(\d+)$     Case P
                (?i-s)^.+port:\h*(\d+)$     (?i-s)^.+port:\h+(\d+)$     (?i-s)^.+port:\x20(\d+)$     (?i-s)^.+port:\t(\d+)$     Case Q
                
                (?i-s)^port:\h*(\d+)$       (?i-s)^port:\h+(\d+)$       (?i-s)^port:\x20(\d+)$       (?i-s)^port:\t(\d+)$       Case R
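The table is regular enough to be generated: each row fixes what may surround the string, each column fixes the gap. Below is a minimal Python sketch under stated assumptions; `LOCATIONS`, `GAPS`, `build()` and `to_python()` are hypothetical names, `build()` emits the N++ (Boost) strings exactly as in the table, and `to_python()` only handles those strings, approximating `\h` by `[ \t]`.

```python
import re

# Rows A..I: (text before "port:", text after the integer).
# Rows J..R are the same nine rows with case ignored.
LOCATIONS = {
    "A": ("^.*", ".*"),
    "B": ("^.*", ".+"),
    "C": ("^.+", ".*"),
    "D": ("^.+", ".+"),
    "E": ("^", ".*"),
    "F": ("^", ".+"),
    "G": ("^.*", "$"),
    "H": ("^.+", "$"),
    "I": ("^", "$"),
}
# Columns 1..4: what sits between "port:" and the integer (Boost syntax).
GAPS = {1: r"\h*", 2: r"\h+", 3: r"\x20", 4: r"\t"}


def build(location: str, gap: int, ignore_case: bool = False) -> str:
    """Return the N++ (Boost) search regex for one cell of the table."""
    before, after = LOCATIONS[location]
    modifiers = "(?i-s)" if ignore_case else "(?-is)"
    return modifiers + before + "port:" + GAPS[gap] + r"(\d+)" + after


def to_python(boost: str) -> re.Pattern:
    r"""Hypothetical helper: compile a build() result with Python's re.

    Both modifier prefixes are 6 chars long and are replaced by flags;
    Python's '.' already excludes '\n' (like -s); \h becomes [ \t].
    """
    flags = re.IGNORECASE if boost.startswith("(?i-s)") else 0
    return re.compile(boost[6:].replace(r"\h", r"[ \t]"), flags)


# Print the 18 x 4 table, rows A..I then J..R.
for ignore_case in (False, True):
    for location in LOCATIONS:
        print("  ".join(build(location, gap, ignore_case) for gap in GAPS))

# Case B, column 1: (\d+) can give back its last digit so that .+ succeeds.
case_b = to_python(build("B", 1))
print(case_b.match("port:123").group(1))  # 12

# One possible fix (an assumption, not part of the table): forbid that give-back.
fixed_b = re.compile(r"^.*port:[ \t]*(\d+)(?!\d).+")
print(fixed_b.match("port:123"))  # None
print(fixed_b.match("port:123 ok").group(1))  # 123
```

Running Case B through the sketch shows one subtlety worth checking in N++ as well: because `(\d+)` can backtrack, `.+` may be satisfied by the last digit of the number, so `port:123` at the very end of a line still matches the Case B regexes (and likewise D, F, K, M and O), with group 1 = `12`. Adding `(?!\d)` right after `(\d+)` is one possible way to enforce "FOLLOWED by some STANDARD characters".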
                

                Best Regards,

                guy038

                • Alan Kilborn @guy038

                  @guy038 said in Find a specific sequence on every line:

                  right regex solution highly depends on OP’s needs ! So I’ll try to be a bit more exhaustive

                  I think you are going to exhaust yourself if you solve every possible problem that someone could be asking for. :-)

                  Maybe our new moderator could come up with a set of rules for asking data manipulation questions. If you don’t ask your question correctly, you just get redirected back to the instructions until you meet the criteria for an adequately stated problem. Hmm, maybe that would exhaust him as well. :-)

                  • Neil Schipper @guy038

                    @guy038 Many of your posts are amazing, and they seem to be getting more amazing.

                    I’m not sure if you (or anyone) has done something like this before (i.e., coding an entire family of regex search expressions for all conceivable variations on a given requirement), but it’s intriguing to imagine an engine behind a friendly interface asking users to specify key aspects of their requirements in natural language, such that the engine would then generate a list like you’ve done here.

                    It could even be built into Np++ or an add-on. That would make a lot of requests here redundant. And some folk would have to find a new hobby.

                    • Neil Schipper @Alan Kilborn

                      @Alan-Kilborn said in Find a specific sequence on every line:

                      I think you are going to exhaust yourself if you solve every possible problem that someone could be asking for. :-)

                      I worry that if someone inadvertently posted nothing more than “I need a regex”, @guy038, cool and composed as ever, would charge forth, and how the terabytes would fly!

                      • Alan Kilborn @Neil Schipper

                        @Neil-Schipper said in Find a specific sequence on every line:

                        it’s intriguing to imagine an engine behind a friendly interface asking users to specify key aspects of their requirements in natural language

                        That would be, well, pure “magic”.

                        The Community of users of the Notepad++ text editor.