Search syntax help: keyword found, but also need data two lines down

Mike P

Hello N++ community,

Could you please help me with a search problem that I’m having?

The logfile I’m using will often have a common event code with different descriptive details. I’d like to search for this event code and only see results that also include the desired descriptive details, which exist two lines down. I’m thinking that I need to include \v or \Z in my search string, but so far I haven’t been able to make anything work.

In this example, I’m interested in the 255 RH_ANG event code and descriptive details [00453]

Before data:

E  29-Sep-20 09:01:48	255	RH_ANG           <I>RH diagnostic info    </I>
                                                                        Time: 29.9.2020, 09:01:48, Process: C:\Service\bin\Rep.exe_2780,
                                                                        Text: Backlog (31.12.2009 18:01:11) [00398]AbortByBlocked 

W  29-Sep-20 09:01:48	255	RH_ANG           <I>RH diagnostic info    </I>
                                                                        Time: 29.9.2020, 09:01:48, Process: C:\Service\bin\Rep.exe_2780,
                                                                        Text: Backlog (31.12.2009 18:01:11) [00001]Read T0=0												

E  29-Sep-20 12:56:10	255	RH_ANG           <I>RH diagnostic info    </I>
                                                                        Time: 29.9.2020, 12:56:10, Process: C:\Service\bin\Rep.exe_2776,
                                                                        Text: (29.09.2020 12:56:11) [00315]Manually disabled via ST

W  29-Sep-20 12:56:12	255	RH_ANG           <I>RH diagnostic info    </I>
                                                                        Time: 29.9.2020, 12:56:12, Process: C:\Service\bin\Rep.exe_2776,
                                                                        Text: (29.09.2020 12:56:13) [00453]0100: GP  120cm -67C -34R   0A 545ST   0[%] 320F 750fr no3d 1#1

W  29-Sep-20 13:56:35	255	RH_ANG           <I>RH diagnostic info    </I>
                                                                        Time: 29.9.2020, 13:56:35, Process: C:\Service\bin\Rep.exe_2776,
                                                                        Text: (29.09.2020 13:56:36) [00001]sequence no send: 0, received: 0, state of device: 3

Currently, I’m using this search string

(?-s)^W.*?\K(\t)255(\t)RH_ANG

Which gives this partially useful result

W  29-Sep-20 09:01:48	255	RH_ANG           <I>RH diagnostic info    </I>
W  29-Sep-20 12:56:12	255	RH_ANG           <I>RH diagnostic info    </I>
W  29-Sep-20 13:56:35	255	RH_ANG           <I>RH diagnostic info    </I>

After data - the search results that I’d like to see are these:

W  29-Sep-20 12:56:12	255	RH_ANG           <I>RH diagnostic info    </I>
                                                                        Text: (29.09.2020 12:56:13) [00453]0100: GP  120cm -67C -34R   0A 545ST   0[%] 320F 750fr no3d 1#1

Thank you!

—

moderator added code markdown around text; please don’t forget to use the </> button to mark example text as “code” so that characters don’t get changed by the forum

PeterJones

@Mike-P,

Currently, I’m using this search string

(?-s)^W.*?\K(\t)255(\t)RH_ANG

Which gives this partially useful result

W  29-Sep-20 09:01:48	255	RH_ANG           <I>RH diagnostic info    </I>
W  29-Sep-20 12:56:12	255	RH_ANG           <I>RH diagnostic info    </I>
W  29-Sep-20 13:56:35	255	RH_ANG           <I>RH diagnostic info    </I>

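Actually, because of the \K (which discards everything matched before it), that search only matches the TAB, then 255, then TAB, then RH_ANG on lines that start with W. You must be doing something else or something more if you were able to filter down to the “partially useful result”.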

After data - the search results that I’d like to see are these:

W  29-Sep-20 12:56:12	255	RH_ANG           <I>RH diagnostic info    </I>
                                                                        Text: (29.09.2020 12:56:13) [00453]0100: GP  120cm -67C -34R   0A 545ST   0[%] 320F 750fr no3d 1#1

I would do it in two steps.

To match the W line and the two lines that follow it, keeping the W line, skipping the next line, and keeping the Text line, I would do something like:
FIND = (?-s)^(W.*?\t255\tRH_ANG.*\R).*\R(.*\R)
REPLACE = $1$2
SEARCH MODE = regular expression
Fewer details than my previous explanation (you can follow the link in the previous discussion to the User Manual Regex section for exact details on each symbol):
- The first (...) puts the first W line (including newline) into group #1
- the .*\R matches one entire line, so this is the line between the W line and the Text line
- the (.*\R) matches one entire line (including newline) and puts it in group #2 – this will be the second line after the W line
- the replacement just uses group#1 and group#2, so the first and third lines
- if you don’t have a newline after your last Text: line, it won’t match properly; always make sure the file ends with a newline when doing complex regex with \R markers in your search.
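For anyone who wants to try the same idea outside Notepad++, here is a minimal Python sketch of that first replacement. It is only an approximation: Python's re module has no \R, so the sketch assumes plain \n line endings, and it needs re.MULTILINE so that ^ matches at the start of every line. The names STEP1, sample and keep_first_and_third are made up for illustration, and the sample is a shortened copy of one block from the log.

```python
import re

# Assumption: "\n" line endings, because Python's re has no \R.
# re.MULTILINE makes ^ match at the start of each line.
STEP1 = re.compile(r"^(W.*?\t255\tRH_ANG.*\n).*\n(.*\n)", re.MULTILINE)

# Shortened copy of one W block from the log (indentation trimmed).
sample = (
    "W  29-Sep-20 12:56:12\t255\tRH_ANG           <I>RH diagnostic info    </I>\n"
    "        Time: 29.9.2020, 12:56:12, Process: C:\\Service\\bin\\Rep.exe_2776,\n"
    "        Text: (29.09.2020 12:56:13) [00453]0100: GP  120cm\n"
    "\n"
)


def keep_first_and_third(text: str) -> str:
    """Keep group 1 (the W line) and group 2 (the second line after it)."""
    return STEP1.sub(r"\1\2", text)


result = keep_first_and_third(sample)
print(result)
```

The middle line is matched but sits outside both groups, which is why it disappears from the replacement.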
If you also want to delete any of the groups that start with E (which your “after” example implies), I would do a second replacement:
FIND = (?-s)^E(.*?\R)+?\R
REPLACE = (empty field)
SEARCH MODE = regular expression
- The ^E says that the first part of the match must be an E at the beginning of the line
- (.*?\R) will match the rest of the line, including newline
- modifying it with +? means it will match multiple lines (since the E is not inside the () group, the +? doesn’t modify that part, so only the first line needs to start with E), but non-greedily, so it will stop at the first instance of whatever comes next
- the last \R means it will stop the match on a blank line after the N lines that matched above. Since the previous modifier was non-greedy, it will stop at the first blank line it encounters after an E-prefixed line.
- replacing with nothing effectively deletes that block
- so this will delete the E groups without deleting the W groups
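The second replacement can be sketched in Python the same way, under the same assumptions (\n standing in for \R, re.MULTILINE for ^). STEP2, sample and drop_e_blocks are illustrative names, and the sample lines are shortened copies of the log data.

```python
import re

# Assumption: "\n" line endings stand in for \R.
# ^E anchors the block start; (.*?\n)+? lazily consumes whole lines;
# the final \n only matches at a blank line, which ends the block.
STEP2 = re.compile(r"^E(.*?\n)+?\n", re.MULTILINE)

# One E block followed by one W block, each terminated by a blank line.
sample = (
    "E  29-Sep-20 12:56:10\t255\tRH_ANG  <I>RH diagnostic info</I>\n"
    "    Time: 29.9.2020, 12:56:10, Process: C:\\Service\\bin\\Rep.exe_2776,\n"
    "    Text: (29.09.2020 12:56:11) [00315]Manually disabled via ST\n"
    "\n"
    "W  29-Sep-20 12:56:12\t255\tRH_ANG  <I>RH diagnostic info</I>\n"
    "    Text: (29.09.2020 12:56:13) [00453]0100: GP  120cm\n"
    "\n"
)


def drop_e_blocks(text: str) -> str:
    """Delete every block that starts with E, up to and including its blank line."""
    return STEP2.sub("", text)


result = drop_e_blocks(sample)
print(result)
```

Only the first line of a block has to start with E; the continuation lines are swallowed by the repeated group until the blank line is reached.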

With the data you gave, I ended up with the following after running both of those steps:

W  29-Sep-20 09:01:48	255	RH_ANG           <I>RH diagnostic info    </I>
                                                                        Text: Backlog (31.12.2009 18:01:11) [00001]Read T0=0												

W  29-Sep-20 12:56:12	255	RH_ANG           <I>RH diagnostic info    </I>
                                                                        Text: (29.09.2020 12:56:13) [00453]0100: GP  120cm -67C -34R   0A 545ST   0[%] 320F 750fr no3d 1#1

W  29-Sep-20 13:56:35	255	RH_ANG           <I>RH diagnostic info    </I>
                                                                        Text: (29.09.2020 13:56:36) [00001]sequence no send: 0, received: 0, state of device: 3
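Note that this result still contains the [00001] entries. If only the [00453] details are wanted, as in the original question, one possible variation is to require that literal tag on the second line after the W line. The Python sketch below is an assumption of mine and not part of the two steps above; it again uses \n in place of \R, and PATTERN, sample and extract_00453 are illustrative names working on shortened sample lines.

```python
import re

# Assumption: "\n" line endings; \[00453\] must appear on the second
# line after the W line, otherwise the block is not matched at all.
PATTERN = re.compile(
    r"^(W.*?\t255\tRH_ANG.*\n).*\n(.*\[00453\].*\n)", re.MULTILINE
)

# Two W blocks: only the second one carries the [00453] tag.
sample = (
    "W  29-Sep-20 09:01:48\t255\tRH_ANG  <I>RH diagnostic info</I>\n"
    "    Time: 29.9.2020, 09:01:48, Process: C:\\Service\\bin\\Rep.exe_2780,\n"
    "    Text: Backlog (31.12.2009 18:01:11) [00001]Read T0=0\n"
    "\n"
    "W  29-Sep-20 12:56:12\t255\tRH_ANG  <I>RH diagnostic info</I>\n"
    "    Time: 29.9.2020, 12:56:12, Process: C:\\Service\\bin\\Rep.exe_2776,\n"
    "    Text: (29.09.2020 12:56:13) [00453]0100: GP  120cm\n"
    "\n"
)


def extract_00453(text: str) -> str:
    """Return the W line plus Text line for each block tagged [00453]."""
    return "".join(w_line + text_line for w_line, text_line in PATTERN.findall(text))


result = extract_00453(sample)
print(result)
```

This extracts the wanted lines instead of editing the file in place, so the original log stays untouched.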

Please note: while we in the forum will help people get started with regex, having one user ask a whole bunch of regex questions gets rather uninteresting to us. Since you are at least putting in an effort, showing input and desired data, and the steps you tried, you’re likely to get more help than the lazy posters… but if that’s all you ever ask, we’ll get tired, because we aren’t really a “help me with my regex” forum. As I put it for people who are pushing that limit or beyond:

Please note: This Community Forum is not a data transformation service; you should not expect to be able to always say “I have data like X and want it to look like Y” and have us do all the work for you. If you are new to the Forum, and new to regular expressions, we will often give help on the first one or two data-transformation questions, especially if they are well-asked and you show a willingness to learn; and we will point you to the documentation where you can learn how to do the data transformations for yourself in the future. But if you repeatedly ask us to do your work for you, you will find that the patience of usually-helpful Community members wears thin. The best way to learn regular expressions is by experimenting with them yourself, and getting a feel for how they work; having us spoon-feed you the answers without you putting in the effort doesn’t help you in the long term and is uninteresting and annoying for us.