Find a specific sequence on every line
-
Hello, @george-martinez, @neil-schipper and All,
Neil, I would prefer this regex S/R :
SEARCH
(?-si)^.+port:\h*(\d+).*
REPLACE
port $1
See the difference with your version, against the text below :
sdfsdf3 sdff9dg port: 2522 dgfgdfg d*f@@fsdf #sdfd sdf s port:52 sdf g2dfg53354 !df gdfgdf port: 81 sdf gfdgdfg
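If you want to check the S/R outside Notepad++, here is a rough Python sketch. It is only an approximation. Python's re module has no \h, so [ \t] stands in for it. re.MULTILINE makes ^ anchor at every line start, as in N++. \1 plays the role of $1. The sample lines are hypothetical. ONE_SPACE is an assumed stand-in for a version that requires exactly one space after the colon; Neil's actual regex is not quoted here.

```python
import re

# Hypothetical sample lines (not the OP's real data); one 'port:' per line.
TEXT = "abc port: 2522 xyz\nd*f port:52 sdf\ngdf port:\t81 end\n"

# Approximates (?-si)^.+port:\h*(\d+).*  ->  Python's defaults are already
# case-sensitive with '.' not matching '\n'; [ \t] stands in for \h.
PREFERRED = re.compile(r"^.+port:[ \t]*(\d+).*", re.MULTILINE)

# Assumed stand-in for a version that requires exactly ONE space after the colon.
ONE_SPACE = re.compile(r"^.+port: (\d+).*", re.MULTILINE)

print(PREFERRED.sub(r"port \1", TEXT))  # every line becomes 'port NNNN'
print(ONE_SPACE.sub(r"port \1", TEXT))  # the 'port:52' and 'port:<TAB>81' lines are left unchanged
```

One caveat: Python's "." also matches "\r", so Windows line endings would need extra care. The N++ engine does not let "." match "\r".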
Note that my first attempt was the regex S/R below, which is correct but does not output aligned numbers, yet ;-))
SEARCH
(?-si)^.+(port:\h*\d+).*
REPLACE
$1
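The same kind of Python sketch for this first attempt shows why its output is not aligned. It again uses [ \t] in place of \h and hypothetical sample lines. The capture group includes whatever blanks follow the colon, so they survive the replacement.

```python
import re

# Hypothetical sample lines with varying blanks after the colon.
TEXT = "abc port: 2522 xyz\nd*f port:52 sdf\ngdf port:\t81 end\n"

# Approximates (?-si)^.+(port:\h*\d+).*  with [ \t] in place of \h.
FIRST_ATTEMPT = re.compile(r"^.+(port:[ \t]*\d+).*", re.MULTILINE)

# Group 1 keeps the blanks between 'port:' and the number, so they survive the replacement.
result = FIRST_ATTEMPT.sub(r"\1", TEXT)
print(repr(result))  # 'port: 2522\nport:52\nport:\t81\n'
```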
Best Regards,
guy038
P.S. :
Neil, you don’t need to add the comment “. matches newline not checked” if you begin your regexes with the (?-s) modifier. This modifier forces the regex engine to treat any regex . as a single standard char, EVEN IF the . matches newline option is ticked! And vice versa for the (?s) modifier.
-
Hello @guy038,
Your solution is probably more robust and general than mine, but we can’t really say because the spec is incomplete, and all three of our solutions could fail on some data.
- spec does not say whether lines may start with 'port:', which my solution accepts and both of your solutions skip
- spec does not say whether 'port:' (when not at start) must be preceded by whitespace, so all three of our solutions process
sdf g2fport: 5354 !df df port: 82 sdf
into port 82, possibly failing to capture port 5354
- spec says neither “exactly one space after the colon” (me) nor “any run of whitespace” (you) after the colon
- (this one is trivial, but) spec doesn’t say whether output should keep the colon (which your preferred solution, probably inadvertently, leaves out)
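The first two bullets can be checked with a small Python sketch, again with [ \t] approximating \h. The LAZY and WORD patterns are hypothetical variants for illustration; nobody proposed them in this thread. A leading .+ skips a line that begins with port:. A greedy prefix lands on the last port: of a line, while a lazy one lands on the first.

```python
import re

# Python approximations of the patterns' prefix shapes ([ \t] stands in for \h).
GREEDY = re.compile(r"^.+port:[ \t]*(\d+)", re.MULTILINE)   # shape used in guy038's solutions
LAZY = re.compile(r"^.*?port:[ \t]*(\d+)", re.MULTILINE)    # hypothetical: first occurrence
WORD = re.compile(r"^.*?\bport:[ \t]*(\d+)", re.MULTILINE)  # hypothetical: 'port' must start a word

def first_capture(pattern, line):
    """Return the captured port number, or None when the line does not match."""
    m = pattern.search(line)
    return m.group(1) if m else None

line_start = "port: 184 06 dfghjk"                  # bullet 1: 'port:' in column 1
two_ports = "sdf g2fport: 5354 !df df port: 82 sdf"  # bullet 2

print(first_capture(GREEDY, line_start))  # None: '.+' needs at least one char before 'port:'
print(first_capture(LAZY, line_start))    # 184
print(first_capture(GREEDY, two_ports))   # 82: greedy '.+' backtracks only to the LAST 'port:'
print(first_capture(LAZY, two_ports))     # 5354
print(first_capture(WORD, two_ports))     # 82: no word boundary between 'f' and 'p' in 'g2fport'
```

No variant is "right" here. Each one silently encodes a different answer to the unstated parts of the spec.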
So we’re all guessing to some degree.
And based on the loose way things are often specified, people taking our advice are probably not brutally testing our solutions against reams of test cases, and we can only hope and pray that they’re not involved with anything life-critical.
In regard to selecting “line oriented” vs. “sea of bytes oriented” by prefixing every search string with a (? directive, I do recognize its power (and only recently, thanks to posts by yourself and the other 2 or 3 regex solution-providers at this site). I also think that including it is a disincentive for regex noobs to dig in and learn what the whole search string is doing. It’s an extra subtlety on top of what are already somewhat subtle concepts & constructs, like ‘+’ vs. ‘*’, and dozens more.
Actually, I regard placing the option that modifies ‘.’ in the Find dialog as something of a design flaw, and I feel it should really be hidden away in Preferences. My reasoning is that a super-majority of users are non-sophisticates, and a super-majority of use cases are line-oriented; a good number of this kind of user will be able and willing to learn a few regex tricks now and again and build their competence over time.
OTOH, people dealing with “sea of bytes” situations are more likely to be computer/data-savvy people who are in a position to “read the docs” and become regex sophisticates; if not, they must seek advice from reliable sophisticates and apply their solutions in innocence/ignorance.
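Python offers a loose analogy to the trade-off between the (? directive and the ". matches newline" checkbox. A flag passed to re.search plays the role of the checkbox, and an inline modifier in the pattern overrides it. This is Python syntax, not Boost's. Python accepts a negative modifier such as -s only in the scoped form (?-s:...), available since Python 3.6.

```python
import re

TEXT = "port: 80\nsome other line"

# The DOTALL flag here is the analogue of ticking ". matches newline".
checkbox_on = re.DOTALL

# Without a modifier, the external flag wins: '.' crosses the line break.
print(bool(re.search(r"port:.*line", TEXT, checkbox_on)))        # True

# A scoped (?-s:...) modifier switches DOTALL off inside the pattern,
# whatever the external flag says.
print(bool(re.search(r"(?-s:port:.*line)", TEXT, checkbox_on)))  # False

# And vice versa: a leading (?s) switches it on even when no flag is passed.
print(bool(re.search(r"(?s)port:.*line", TEXT)))                 # True
```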
-
@Neil-Schipper said in Find a specific sequence on every line:
spec doesn’t say whether output should keep the colon
Oops: he did include the colon in required output. He also (maybe inadvertently) omitted the space between colon and the number.
So @george-martinez: test, test, test.
-
Hi, @george-martinez, @neil-schipper and All,
The P.S. of my previous post was meant for your personal knowledge of N++ Boost regexes, since you seem to have mastered the basics of regexes. It was not aimed at regex noobs, of course!
You may find some info on these modifiers at the beginning of my other post:
https://community.notepad-plus-plus.org/post/70509
Oh…, since you mentioned the colon char, I’ve just realized that I omitted it in the Replace regex :-(
So my final version should be :
SEARCH
(?-si)^.+port:\h*(\d+).*
REPLACE
port: $1
BR
guy038
-
Hi @guy038,
Your #70509 was quite extraordinary in depth and detail. I started a reply but did not publish it because I’m still absorbing some aspects of it, and I have some uncertainties. I do intend to reply, I hope in the next day or so.
In regard to this thread, you did not respond to the main points of my last post.
My first point is that, because the spec is loose, it’s tough to be sure whether any of our three solutions is truly complete (and you only referred to the least interesting of the four bullets).
For example, both of your solutions would leave this line unchanged:
port: 184 06 dfghjk
because they don’t match port: in column 1. But maybe that can’t occur in the data files @george-martinez has to process, and your solution is fine. We just don’t know. (Also, I could have had a 5th bullet to mention that your solutions force case sensitivity, but this was not specified.) Anyway, when I offer solutions, I’ll try to remember to state whether I think there are unspecified aspects of the data described/provided by the request-maker that could make the solution misfire.
The second point questions, on pedagogical grounds, whether it’s a good idea to prefix every solution with a (? directive. I tried to convey that I learned it recently (from you, plural, since I don’t remember if I saw it first in a post by you or TJ, PJ, or AK), and that I recognize that it’s a robust technique.
-
Hi, @george-martinez, @neil-schipper and All,
I agree that the right regex solution highly depends on OP’s needs ! So I’ll try to be a bit more exhaustive ;-))
The regexes below match ANY line, without its line-break chars, containing the string port:, followed by an INTEGER, with the following conditions:
Case A  The string 'port....###', with this EXACT case, may occur at ANY position of the current line
Case B  The string 'port....###', with this EXACT case, is FOLLOWED with some STANDARD characters
Case C  The string 'port....###', with this EXACT case, is PRECEDED with some STANDARD characters
Case D  The string 'port....###', with this EXACT case, is PRECEDED and FOLLOWED with some STANDARD characters
Case E  The string 'port....###', with this EXACT case, BEGINS the current line
Case F  The string 'port....###', with this EXACT case, BEGINS the current line and is FOLLOWED with some STANDARD characters
Case G  The string 'port....###', with this EXACT case, ENDS the current line
Case H  The string 'port....###', with this EXACT case, ENDS the current line and is PRECEDED with some STANDARD characters
Case I  The string 'port....###', with this EXACT case, BEGINS and ENDS the current line
Case J  The string 'port....###', WHATEVER its case, may occur at ANY position of the current line
Case K  The string 'port....###', WHATEVER its case, is FOLLOWED with some STANDARD characters
Case L  The string 'port....###', WHATEVER its case, is PRECEDED with some STANDARD characters
Case M  The string 'port....###', WHATEVER its case, is PRECEDED and FOLLOWED with some STANDARD characters
Case N  The string 'port....###', WHATEVER its case, BEGINS the current line
Case O  The string 'port....###', WHATEVER its case, BEGINS the current line and is FOLLOWED with some STANDARD characters
Case P  The string 'port....###', WHATEVER its case, ENDS the current line
Case Q  The string 'port....###', WHATEVER its case, ENDS the current line and is PRECEDED with some STANDARD characters
Case R  The string 'port....###', WHATEVER its case, BEGINS and ENDS the current line
and:
Case 1  ANY range of consecutive HORIZONTAL BLANK char(s), even NONE, between the string 'port:' and the INTEGER
Case 2  ANY range of consecutive HORIZONTAL BLANK char(s), at least ONE, between the string 'port:' and the INTEGER
Case 3  A SINGLE SPACE char between the string 'port:' and the INTEGER
Case 4  A SINGLE TABULATION char between the string 'port:' and the INTEGER
Thus, here is the table of the different search regexes, according to their respective conditions.
At the INTERSECTION of the column for the character(s) between the string port: and the integer (Case 1 to 4) and the row for the location and case of the string port:.......#### (Case A to R), you’ll find the appropriate search regex:
Case 1 Case 2 Case 3 Case 4 (?-is)^.*port:\h*(\d+).* (?-is)^.*port:\h+(\d+).* (?-is)^.*port:\x20(\d+).* (?-is)^.*port:\t(\d+).* Case A (?-is)^.*port:\h*(\d+).+ (?-is)^.*port:\h+(\d+).+ (?-is)^.*port:\x20(\d+).+ (?-is)^.*port:\t(\d+).+ Case B (?-is)^.+port:\h*(\d+).* (?-is)^.+port:\h+(\d+).* (?-is)^.+port:\x20(\d+).* (?-is)^.+port:\t(\d+).* Case C (?-is)^.+port:\h*(\d+).+ (?-is)^.+port:\h+(\d+).+ (?-is)^.+port:\x20(\d+).+ (?-is)^.+port:\t(\d+).+ Case D (?-is)^port:\h*(\d+).* (?-is)^port:\h+(\d+).* (?-is)^port:\x20(\d+).* (?-is)^port:\t(\d+).* Case E (?-is)^port:\h*(\d+).+ (?-is)^port:\h+(\d+).+ (?-is)^port:\x20(\d+).+ (?-is)^port:\t(\d+).+ Case F (?-is)^.*port:\h*(\d+)$ (?-is)^.*port:\h+(\d+)$ (?-is)^.*port:\x20(\d+)$ (?-is)^.*port:\t(\d+)$ Case G (?-is)^.+port:\h*(\d+)$ (?-is)^.+port:\h+(\d+)$ (?-is)^.+port:\x20(\d+)$ (?-is)^.+port:\t(\d+)$ Case H (?-is)^port:\h*(\d+)$ (?-is)^port:\h+(\d+)$ (?-is)^port:\x20(\d+)$ (?-is)^port:\t(\d+)$ Case I (?i-s)^.*port:\h*(\d+).* (?i-s)^.*port:\h+(\d+).* (?i-s)^.*port:\x20(\d+).* (?i-s)^.*port:\t(\d+).* Case J (?i-s)^.*port:\h*(\d+).+ (?i-s)^.*port:\h+(\d+).+ (?i-s)^.*port:\x20(\d+).+ (?i-s)^.*port:\t(\d+).+ Case K (?i-s)^.+port:\h*(\d+).* (?i-s)^.+port:\h+(\d+).* (?i-s)^.+port:\x20(\d+).* (?i-s)^.+port:\t(\d+).* Case L (?i-s)^.+port:\h*(\d+).+ (?i-s)^.+port:\h+(\d+).+ (?i-s)^.+port:\x20(\d+).+ (?i-s)^.+port:\t(\d+).+ Case M (?i-s)^port:\h*(\d+).* (?i-s)^port:\h+(\d+).* (?i-s)^port:\x20(\d+).* (?i-s)^port:\t(\d+).* Case N (?i-s)^port:\h*(\d+).+ (?i-s)^port:\h+(\d+).+ (?i-s)^port:\x20(\d+).+ (?i-s)^port:\t(\d+).+ Case O (?i-s)^.*port:\h*(\d+)$ (?i-s)^.*port:\h+(\d+)$ (?i-s)^.*port:\x20(\d+)$ (?i-s)^.*port:\t(\d+)$ Case P (?i-s)^.+port:\h*(\d+)$ (?i-s)^.+port:\h+(\d+)$ (?i-s)^.+port:\x20(\d+)$ (?i-s)^.+port:\t(\d+)$ Case Q (?i-s)^port:\h*(\d+)$ (?i-s)^port:\h+(\d+)$ (?i-s)^port:\x20(\d+)$ (?i-s)^port:\t(\d+)$ Case R
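The table is regular enough to be generated mechanically. Each regex combines a case modifier, a position-dependent head and tail, and a separator. The Python sketch below rebuilds the 72 Boost strings from those parts. to_python is a hypothetical helper that roughly translates them for Python's re: it drops the leading modifier in favour of flags and maps \h to [ \t]. It is meant only for these particular patterns, not as a general Boost-to-Python converter.

```python
import re
from itertools import product

# (part before 'port:', part after the integer) for each position, in row order.
POSITIONS = [
    ("^.*", ".*"),  # anywhere
    ("^.*", ".+"),  # followed by standard chars
    ("^.+", ".*"),  # preceded by standard chars
    ("^.+", ".+"),  # preceded and followed
    ("^", ".*"),    # begins the line
    ("^", ".+"),    # begins the line and is followed
    ("^.*", "$"),   # ends the line
    ("^.+", "$"),   # ends the line and is preceded
    ("^", "$"),     # begins and ends the line
]
CASES = ["(?-is)", "(?i-s)"]                   # exact case (rows A-I), any case (rows J-R)
SEPARATORS = [r"\h*", r"\h+", r"\x20", r"\t"]  # columns Case 1..4

def build_table():
    """Map (row letter, column number) to the Boost regex string."""
    table = {}
    letters = iter("ABCDEFGHIJKLMNOPQR")
    for modifier, (head, tail) in product(CASES, POSITIONS):
        row = next(letters)
        for col, sep in enumerate(SEPARATORS, start=1):
            table[(row, col)] = f"{modifier}{head}port:{sep}(\\d+){tail}"
    return table

def to_python(boost):
    """Hypothetical, rough translation of one table entry for Python's re."""
    modifier, body = boost[:6], boost[6:]  # both modifiers are 6 chars long
    assert modifier in CASES
    flags = re.MULTILINE                   # ^ and $ at line boundaries, as in N++
    if modifier == "(?i-s)":
        flags |= re.IGNORECASE
    return re.compile(body.replace(r"\h", r"[ \t]"), flags)

if __name__ == "__main__":
    table = build_table()
    print(table[("G", 2)])  # (?-is)^.*port:\h+(\d+)$
    print(to_python(table[("N", 1)]).search("PORT:  443 rest").group(1))  # 443
```

Building the cells from parts also makes a typo in a single cell easy to spot, because every cell must follow the same rule.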
Best Regards,
guy038
-
@guy038 said in Find a specific sequence on every line:
right regex solution highly depends on OP’s needs ! So I’ll try to be a bit more exhaustive
I think you are going to exhaust yourself if you solve every possible problem that someone could be asking for. :-)
Maybe our new moderator could come up with a set of rules for asking data manipulation questions. If you don’t ask your question correctly, you just get redirected back to the instructions, until you meet the criteria for an adequately stated problem. Hmm, maybe that would exhaust him as well. :-)
-
@guy038 Many of your posts are amazing, and they seem to be getting more amazing.
I’m not sure if you (or anyone) have done something like this before – i.e., coding an entire family of regex search expressions for all conceivable variations on a given requirement – but it’s intriguing to imagine an engine behind a friendly interface asking users to specify key aspects of their requirements in natural language, such that the engine would then generate a list like you’ve done here.
It could even be built into Np++ or an add-on. That would make a lot of requests here redundant. And some folk would have to find a new hobby.
-
@Alan-Kilborn said in Find a specific sequence on every line:
I think you are going to exhaust yourself if you solve every possible problem that someone could be asking for. :-)
I worry that if someone inadvertently posted nothing more than “I need a regex”, @guy038, cool and composed as ever, would charge forth, and how the terabytes would fly!
-
@Neil-Schipper said in Find a specific sequence on every line:
it’s intriguing to imagine an engine behind a friendly interface asking users to specify key aspects of their requirements in natural language
That would be, well, pure “magic”: