Remove unmarked lines (not just unbookmarked lines)
-
Could you give us an example of what you did?
Maybe a screenshot as well?
Cheers
Claudia
-
Sure, here’s what I did:
I used this regex to select the text I wanted:
(TERM1|TERM2|TERM3)(.*\r\n^[a-zA-Z](.*\r\n^[^2].*$)+)*
(or this one, which I have since switched to, using the “. matches newline” option:
(TERM1|TERM2|TERM3).*?(?=^2)
Both match the same selection of lines.)
In English, it’s going through a log file to pick out any line that mentions any of the 3 terms I care about. If a line with one of those terms is immediately followed by a java error (the only lines in the file that do not begin with “2”), the entirety of that error is selected as well.
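For readers who would rather express that selection rule outside the regex engine, here is a minimal line-based sketch in plain Python. The function name `select_lines` and the shortened sample are illustrative only, and it assumes the terms only matter when they appear on a timestamped line (one beginning with “2”), which is a slightly stricter reading than the regex.

```python
TERMS = ("TERM1", "TERM2", "TERM3")

def select_lines(lines, terms=TERMS):
    """Keep timestamped lines that mention a term, plus the
    continuation lines (not starting with "2") that follow them."""
    kept = []
    keep_block = False
    for line in lines:
        if line.startswith("2"):
            # A new log entry: decide afresh whether to keep it.
            keep_block = any(term in line for term in terms)
        # Continuation lines inherit the decision made for their entry.
        if keep_block:
            kept.append(line)
    return kept

# Shortened, hypothetical excerpt in the same shape as the log in this thread.
sample = [
    "2016-08-07 05:26:54,688 Place3-1463 ERROR Logger3 Error on randomThing",
    "java.util.concurrent.TimeoutException",
    "    at java.lang.Thread.run(Thread.java:745)",
    "2016-08-07 05:26:54,704 Item1-8608 DEBUG Logger2 Something happened",
    "2016-08-07 05:26:54,704 Item1-8608 DEBUG Logger2 Last attempt not useful TERM3",
    "2016-08-07 05:26:55,329 Place3-1463 ERROR Logger3 Error on TERM2.",
    "java.util.concurrent.TimeoutException",
    "    at java.lang.Thread.run(Thread.java:745)",
]
result = select_lines(sample)
```

The first error block is dropped because the entry that owns it mentions no term, while the block after “Error on TERM2.” is kept whole.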
Here’s a simulated example of what that log file looks like:
2016-08-07 05:26:54,688 Place3-1463 ERROR Logger3 Error on randomThing
java.util.concurrent.TimeoutException
    at com.java.class.functionClass.java:15
    at com.java.class.functionClass.java:15
    at com.java.class.functionClass.java:15
    at com.java.class.functionClass.java:15
    at com.java.class.functionClass.java:15
    at com.java.class.functionClass.java:15
    at com.java.class.functionClass.java:15
    at java.lang.Thread.run(Thread.java:745)
2016-08-07 05:26:54,704 Item1-8608 DEBUG Logger2 Something happened
2016-08-07 05:26:54,704 Item1-8608 DEBUG Logger2 Last attempt not useful TERM3
2016-08-07 05:26:54,719 Place3-1438 DEBUG Logger2 delete successful for thething{id=123456}
2016-08-07 05:26:54,719 Item1-8608 DEBUG Logger2 Successfully closed TERM2
2016-08-07 05:26:54,735 Place3-1438 DEBUG Logger2 Sending somthing place://thing.rpc:15478
2016-08-07 05:26:54,750 Item1-8607 DEBUG Logger2 stuff(12345) Cmd: 30 10 49 38 e4 a8 00 12 1a 4b 1e
2016-08-07 05:26:54,782 Item1-8605 DEBUG Logger2 function(TERM1) Rsp: 30 10 49 38 e4 a8 00 12 1a 4b 1e
2016-08-07 05:26:55,297 Place3-1463 DEBUG Logger2 delete successful
2016-08-07 05:26:55,329 Place3-1463 ERROR Logger3 Error on TERM2.
java.util.concurrent.TimeoutException
    at com.java.class.functionClass.java:15
    at com.java.class.functionClass.java:15
    at com.java.class.functionClass.java:15
    at com.java.class.functionClass.java:15
    at com.java.class.functionClass.java:15
    at com.java.class.functionClass.java:15
    at com.java.class.functionClass.java:15
    at java.lang.Thread.run(Thread.java:745)
2016-08-07 05:26:54,750 Item1-8607 DEBUG Logger2 stuff(50509) Cmd: 30 10 49 38 e4 a8 00 12 1a 4b 1e
2016-08-07 05:26:54,782 Item1-8605 DEBUG Logger2 function(TERM1) Rsp: 30 10 49 38 e4 a8 00 12 1a 4b 1e
And when I apply the mark with that regex:
[screenshot]
If “Bookmark line” applied a bookmark to every marked line, there would be no issue, but as you can see on line 19, only the first line of a match is bookmarked, even though the entire match is marked. This means that when I use the “Remove Unmarked Lines” function:
[screenshot]
only the 3 bookmarked lines will be preserved, rather than all 12 marked lines.
So, I need a way to do one of the following:
- Remove all unmarked lines (most helpful)
- Bookmark every line of a multiline selection
- Construct a new regex that will mark only the lines I don’t care about (least helpful, and I’ve already spent a few hours attempting this, to no avail)
-
Just noticed (too late to edit) that there was an inadvertent space at the beginning of my regex in that first screenshot. So line 17 should also be selected. With the regex corrected, this is what the marking produces:
-
The 5 marks (i.e. the matches for your RE) should be preserved; you would have liked them to produce 14 bookmarks instead of the 5 shown.
-
@MAPJe71 The marks are not preserved though; most of the 4th mark (which spans 10 lines) is removed when I select “Remove unmarked lines”, as can be seen in the gif below:
If 14 bookmarks had been generated, “Remove unmarked lines” would have done what I expected. I imagine this is best implemented as an additional checkbox: if “Bookmark line” is checked, then the option “Bookmark each line in multiline match” would become available to be checked.
Alternatively, if “Remove unmarked lines” actually removed lines that do not contain marks (rather than lines that do not contain bookmarks), that also would have accomplished what I needed. If that menu option were instead named “Remove unbookmarked lines”, I wouldn’t expect it to do what I wanted here.
-
Maybe use a macro or a Python script?
For the macro, I was thinking of extending the Find-what RE to
^(?:\d{4}(?:-\d{2}){2}).*?(TERM1|TERM2|TERM3)(.*\r\n^[a-zA-Z](.*\r\n^[^2].*$)+)*
(using the Find tab, not the Mark tab), copying the match to a new document, and running that multiple times.
-
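I don’t see how this could be done easily using native npp functionality.
A Python script could look like this:
from Npp import editor  # provided by the PythonScript plugin

# A TERM line plus any Java error lines that directly follow it
regex = r'(TERM1|TERM2|TERM3)(.*\r\n^[a-zA-Z](.*\r\n^[^2].*$)+)*'
MARK_BOOKMARK = 24  # bookmark marker ID (20 from Notepad++ 8.4.6 onwards)

# Collect the (start, end) character span of every match
matches = []
editor.research(regex, lambda m: matches.append(m.span()))

# Bookmark every line a match touches, not just the first one
for start_pos, end_pos in matches:
    start = editor.lineFromPosition(start_pos)
    end = editor.lineFromPosition(end_pos)
    for line in range(start, end + 1):
        editor.markerAdd(line, MARK_BOOKMARK)
Cheers
Claudia
-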
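To see what the span-to-line bookkeeping in that script amounts to without opening Notepad++, here is a rough standalone sketch using only Python’s `re` and `bisect`. The helper names `lines_touched` and `line_of` are made up for illustration (`line_of` plays the role of `editor.lineFromPosition`), the sample is a shortened stand-in for the log, and the pattern uses `\n` instead of `\r\n` because the sample string has Unix line endings.

```python
import bisect
import re

# Same idea as the regex in the thread, with \n in place of \r\n.
PATTERN = r"(TERM1|TERM2|TERM3)(.*\n^[a-zA-Z](.*\n^[^2].*$)+)*"

def lines_touched(text, pattern=PATTERN):
    """Return the sorted 0-based numbers of all lines covered by any match."""
    # Character offset at which each line starts.
    starts = [0] + [m.end() for m in re.finditer(r"\n", text)]

    def line_of(pos):
        # Index of the last line start that is <= pos.
        return bisect.bisect_right(starts, pos) - 1

    touched = set()
    for m in re.finditer(pattern, text, re.MULTILINE):
        touched.update(range(line_of(m.start()), line_of(m.end()) + 1))
    return sorted(touched)

# Shortened, hypothetical excerpt in the same shape as the log in this thread.
LOG = "\n".join([
    "2016-08-07 05:26:54,704 Item1-8608 DEBUG Logger2 Something happened",
    "2016-08-07 05:26:54,704 Item1-8608 DEBUG Logger2 Last attempt not useful TERM3",
    "2016-08-07 05:26:55,329 Place3-1463 ERROR Logger3 Error on TERM2.",
    "java.util.concurrent.TimeoutException",
    "    at com.java.class.functionClass.java:15",
    "    at java.lang.Thread.run(Thread.java:745)",
    "2016-08-07 05:26:54,750 Item1-8607 DEBUG Logger2 stuff(50509)",
])
touched = lines_touched(LOG)
```

Each number in `touched` is a line that would receive a bookmark; the multi-line match contributes every line from its first to its last.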
Hello SnoringFrog and All,
Sorry for the late reply. I have been on my summer holidays in Brittany, with wonderful weather, for a week!
SnoringFrog, in the end, your problem comes from the Java error reports, which span several lines in your file. So, why don’t you gather all these lines into a single line? That way, you don’t even need to bookmark anything!
Just one assumption: the lines of a Java error report, from the second one onwards, always begin with some blank characters (four spaces in your example).
Then, the job can be split into three steps:
- Gather all the lines of any Java error report block into a single physical line.
- Delete any line which contains neither the lower-case word java nor one of the upper-case words TERM1, TERM2 or TERM3, as well as a possible Java error report line which may follow it.
- Restore the original form of all the remaining Java error report blocks.
Right! Let’s go (all the following S/R need the Regular expression search mode).
First of all, make a copy of your original log file (one never knows!).
Now, from your original example, below:
2016-08-07 05:26:54,688 Place3-1463 ERROR Logger3 Error on randomThing
java.util.concurrent.TimeoutException
    at com.java.class.functionClass.java:15
    at com.java.class.functionClass.java:15
    at com.java.class.functionClass.java:15
    at com.java.class.functionClass.java:15
    at com.java.class.functionClass.java:15
    at com.java.class.functionClass.java:15
    at com.java.class.functionClass.java:15
    at java.lang.Thread.run(Thread.java:745)
2016-08-07 05:26:54,704 Item1-8608 DEBUG Logger2 Something happened
2016-08-07 05:26:54,704 Item1-8608 DEBUG Logger2 Last attempt not useful TERM3
2016-08-07 05:26:54,719 Place3-1438 DEBUG Logger2 delete successful for thething{id=123456}
2016-08-07 05:26:54,719 Item1-8608 DEBUG Logger2 Successfully closed TERM2
2016-08-07 05:26:54,735 Place3-1438 DEBUG Logger2 Sending somthing place://thing.rpc:15478
2016-08-07 05:26:54,750 Item1-8607 DEBUG Logger2 stuff(12345) Cmd: 30 10 49 38 e4 a8 00 12 1a 4b 1e
2016-08-07 05:26:54,782 Item1-8605 DEBUG Logger2 function(TERM1) Rsp: 30 10 49 38 e4 a8 00 12 1a 4b 1e
2016-08-07 05:26:55,297 Place3-1463 DEBUG Logger2 delete successful
2016-08-07 05:26:55,329 Place3-1463 ERROR Logger3 Error on TERM2.
java.util.concurrent.TimeoutException
    at com.java.class.functionClass.java:15
    at com.java.class.functionClass.java:15
    at com.java.class.functionClass.java:15
    at com.java.class.functionClass.java:15
    at com.java.class.functionClass.java:15
    at com.java.class.functionClass.java:15
    at com.java.class.functionClass.java:15
    at java.lang.Thread.run(Thread.java:745)
2016-08-07 05:26:54,750 Item1-8607 DEBUG Logger2 stuff(50509) Cmd: 30 10 49 38 e4 a8 00 12 1a 4b 1e
2016-08-07 05:26:54,782 Item1-8605 DEBUG Logger2 function(TERM1) Rsp: 30 10 49 38 e4 a8 00 12 1a 4b 1e
The following simple S/R:
SEARCH : \R(\h+at ) (with a space between the string “at” and the closing round bracket)
REPLACE : \1
gives the following text, below :
2016-08-07 05:26:54,688 Place3-1463 ERROR Logger3 Error on randomThing
java.util.concurrent.TimeoutException    at com.java.class.functionClass.java:15    at com.java.class.functionClass.java:15    at com.java.class.functionClass.java:15    at com.java.class.functionClass.java:15    at com.java.class.functionClass.java:15    at com.java.class.functionClass.java:15    at com.java.class.functionClass.java:15    at java.lang.Thread.run(Thread.java:745)
2016-08-07 05:26:54,704 Item1-8608 DEBUG Logger2 Something happened
2016-08-07 05:26:54,704 Item1-8608 DEBUG Logger2 Last attempt not useful TERM3
2016-08-07 05:26:54,719 Place3-1438 DEBUG Logger2 delete successful for thething{id=123456}
2016-08-07 05:26:54,719 Item1-8608 DEBUG Logger2 Successfully closed TERM2
2016-08-07 05:26:54,735 Place3-1438 DEBUG Logger2 Sending somthing place://thing.rpc:15478
2016-08-07 05:26:54,750 Item1-8607 DEBUG Logger2 stuff(12345) Cmd: 30 10 49 38 e4 a8 00 12 1a 4b 1e
2016-08-07 05:26:54,782 Item1-8605 DEBUG Logger2 function(TERM1) Rsp: 30 10 49 38 e4 a8 00 12 1a 4b 1e
2016-08-07 05:26:55,297 Place3-1463 DEBUG Logger2 delete successful
2016-08-07 05:26:55,329 Place3-1463 ERROR Logger3 Error on TERM2.
java.util.concurrent.TimeoutException    at com.java.class.functionClass.java:15    at com.java.class.functionClass.java:15    at com.java.class.functionClass.java:15    at com.java.class.functionClass.java:15    at com.java.class.functionClass.java:15    at com.java.class.functionClass.java:15    at com.java.class.functionClass.java:15    at java.lang.Thread.run(Thread.java:745)
2016-08-07 05:26:54,750 Item1-8607 DEBUG Logger2 stuff(50509) Cmd: 30 10 49 38 e4 a8 00 12 1a 4b 1e
2016-08-07 05:26:54,782 Item1-8605 DEBUG Logger2 function(TERM1) Rsp: 30 10 49 38 e4 a8 00 12 1a 4b 1e
Then, the second S/R gets rid of all the unwanted lines :
SEARCH : (?-is)^(?!.*(java|TERM[123])).*\R(?:\h*java.*\R)?
REPLACE : Leave EMPTY
and gives the resulting text, below :
2016-08-07 05:26:54,704 Item1-8608 DEBUG Logger2 Last attempt not useful TERM3
2016-08-07 05:26:54,719 Item1-8608 DEBUG Logger2 Successfully closed TERM2
2016-08-07 05:26:54,782 Item1-8605 DEBUG Logger2 function(TERM1) Rsp: 30 10 49 38 e4 a8 00 12 1a 4b 1e
2016-08-07 05:26:55,329 Place3-1463 ERROR Logger3 Error on TERM2.
java.util.concurrent.TimeoutException    at com.java.class.functionClass.java:15    at com.java.class.functionClass.java:15    at com.java.class.functionClass.java:15    at com.java.class.functionClass.java:15    at com.java.class.functionClass.java:15    at com.java.class.functionClass.java:15    at com.java.class.functionClass.java:15    at java.lang.Thread.run(Thread.java:745)
2016-08-07 05:26:54,782 Item1-8605 DEBUG Logger2 function(TERM1) Rsp: 30 10 49 38 e4 a8 00 12 1a 4b 1e
Finally, the simple S/R, below :
SEARCH : \h+at\x20
REPLACE : \r\n$0
restores the original appearance of the Java error report block!
2016-08-07 05:26:54,704 Item1-8608 DEBUG Logger2 Last attempt not useful TERM3
2016-08-07 05:26:54,719 Item1-8608 DEBUG Logger2 Successfully closed TERM2
2016-08-07 05:26:54,782 Item1-8605 DEBUG Logger2 function(TERM1) Rsp: 30 10 49 38 e4 a8 00 12 1a 4b 1e
2016-08-07 05:26:55,329 Place3-1463 ERROR Logger3 Error on TERM2.
java.util.concurrent.TimeoutException
    at com.java.class.functionClass.java:15
    at com.java.class.functionClass.java:15
    at com.java.class.functionClass.java:15
    at com.java.class.functionClass.java:15
    at com.java.class.functionClass.java:15
    at com.java.class.functionClass.java:15
    at com.java.class.functionClass.java:15
    at java.lang.Thread.run(Thread.java:745)
2016-08-07 05:26:54,782 Item1-8605 DEBUG Logger2 function(TERM1) Rsp: 30 10 49 38 e4 a8 00 12 1a 4b 1e
This text does contain lines 12, 14, 17, 19 and 30 of your original text, each of which contains one of the words TERM[123], as well as the 9 lines of the second Java error report, from line 20 to line 28 :-))
NOTES on the regexes :
- In the first S/R, the \R form stands for any kind of EOL characters, and the \1 syntax, in the replacement, represents the string “at” with its leading blank characters and a trailing space, stored as group 1 in the search part.
- The syntax (?-is), beginning the second S/R, forces the regex engine to treat the search as case-sensitive and the dot as matching standard characters only (not EOL characters).
- Then, the negative look-ahead ^(?!.*(java|TERM[123])) succeeds, from the beginning of a line, only if that line contains none of the strings “java”, “TERM1”, “TERM2” and “TERM3”, in this exact case.
- The following .*\R form represents all the contents of that line, with its EOL characters.
- Finally, the optional non-capturing group (?:\h*java.*\R)? stands for the line of a POSSIBLE Java error report which is NOT preceded by a line containing the string TERM[123] and which, consequently, must also be deleted.
- In the third S/R, $0 represents the whole of the matched string, that is to say, the word “at” with its leading blank characters and a trailing space.
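For anyone who wants to try the three steps outside Notepad++, here is a hedged Python sketch of the same collapse, filter and restore sequence. Python’s `re` module has no `\R` or `\h`, so `\r?\n` and `[ \t]` are used as stand-ins, and the leading `(?-is)` is dropped because those are already Python’s defaults. The function name `filter_log` and the shortened sample are illustrative, and the restore step writes `\n` rather than `\r\n`.

```python
import re

def filter_log(text):
    """Collapse Java error blocks, drop unwanted lines, then restore the blocks."""
    # Step 1: pull each "    at ..." line up onto the line before it.
    collapsed = re.sub(r"\r?\n([ \t]+at )", r"\1", text)
    # Step 2: delete lines with neither "java" nor a term, together with
    # a collapsed Java error line that may directly follow such a line.
    kept = re.sub(
        r"^(?!.*(?:java|TERM[123])).*\r?\n(?:[ \t]*java.*\r?\n)?",
        "",
        collapsed,
        flags=re.MULTILINE,
    )
    # Step 3: put every "at" entry back on its own line.
    return re.sub(r"[ \t]+at ", lambda m: "\n" + m.group(0), kept)

# Shortened, hypothetical excerpt in the same shape as the log in this thread.
LOG_LINES = [
    "2016-08-07 05:26:54,688 Place3-1463 ERROR Logger3 Error on randomThing",
    "java.util.concurrent.TimeoutException",
    "    at com.java.class.functionClass.java:15",
    "    at java.lang.Thread.run(Thread.java:745)",
    "2016-08-07 05:26:54,704 Item1-8608 DEBUG Logger2 Something happened",
    "2016-08-07 05:26:54,704 Item1-8608 DEBUG Logger2 Last attempt not useful TERM3",
    "2016-08-07 05:26:55,329 Place3-1463 ERROR Logger3 Error on TERM2.",
    "java.util.concurrent.TimeoutException",
    "    at com.java.class.functionClass.java:15",
    "    at java.lang.Thread.run(Thread.java:745)",
    "2016-08-07 05:26:54,750 Item1-8607 DEBUG Logger2 stuff(50509) Cmd: 30 10",
    "2016-08-07 05:26:54,782 Item1-8605 DEBUG Logger2 function(TERM1) Rsp: 30 10",
]
filtered = filter_log("\n".join(LOG_LINES))
```

As with the S/R version, step 3 would also split any ordinary log text that happens to contain blanks followed by “at ”, so it relies on that sequence appearing only in the stack traces.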
Best regards,
guy038
P.S. :
In addition, the regexes given above still work if all your text is itself already indented with some blank characters :-))
-
@guy038 This does seem like the best method I have available right now. I ended up changing some of your regexes a bit, but the basic idea was the same. Collapsing the java errors temporarily was definitely the trick I needed.
In case you are interested, here are the steps I used:
First S/R:
SEARCH: \R([^2])
REPLACE: qzqqz\1
I noticed a few lines in some java errors that did not begin with spaces (which my example does not account for), but I also know that java errors are the only lines in these logs that do not begin with “2”, so I used that to find them. Since I could not depend on \h to re-expand the errors later, I inserted “qzqqz” (a dummy value I am confident will not appear anywhere in the log) so that I could find these again later.
Second S/R:
SEARCH: ^(?!.*(TERM1|TERM2|TERM3)).*\R
REPLACE: leave empty
Because the change in the first S/R now also merges the Java errors into the line that precedes them, I only need to search for lines that do not contain my search terms and remove them. This leaves me with only the lines I care about, including any associated Java errors.
Third S/R:
SEARCH: qzqqz
REPLACE: \r\n
This just replaces the dummy value that I used in the first S/R with the line break it represented, returning the errors to their original formatting.
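The same sentinel trick can be sketched in plain Python, for example to run it over many log files. This is only an approximation of the S/R steps above: `\R` is replaced by `\r?\n` because Python’s `re` does not support it, the restore step writes `\n`, and the function name and shortened sample are made up for illustration.

```python
import re

SENTINEL = "qzqqz"  # dummy value assumed never to occur in the log

def filter_with_sentinel(text):
    """Merge non-"2" lines into the previous line, filter, then split again."""
    # Step 1: any line break followed by a character other than "2"
    # is replaced by the sentinel, gluing the line to the one before it.
    joined = re.sub(r"\r?\n([^2])", SENTINEL + r"\1", text)
    # Step 2: remove every (now merged) line that mentions none of the terms.
    kept = re.sub(
        r"^(?!.*(?:TERM1|TERM2|TERM3)).*\r?\n", "", joined, flags=re.MULTILINE
    )
    # Step 3: turn the sentinels back into line breaks.
    return kept.replace(SENTINEL, "\n")

# Shortened, hypothetical excerpt in the same shape as the log in this thread.
LOG_LINES = [
    "2016-08-07 05:26:54,688 Place3-1463 ERROR Logger3 Error on randomThing",
    "java.util.concurrent.TimeoutException",
    "    at com.java.class.functionClass.java:15",
    "    at java.lang.Thread.run(Thread.java:745)",
    "2016-08-07 05:26:54,704 Item1-8608 DEBUG Logger2 Something happened",
    "2016-08-07 05:26:54,704 Item1-8608 DEBUG Logger2 Last attempt not useful TERM3",
    "2016-08-07 05:26:55,329 Place3-1463 ERROR Logger3 Error on TERM2.",
    "java.util.concurrent.TimeoutException",
    "    at com.java.class.functionClass.java:15",
    "    at java.lang.Thread.run(Thread.java:745)",
    "2016-08-07 05:26:54,750 Item1-8607 DEBUG Logger2 stuff(50509) Cmd: 30 10",
    "2016-08-07 05:26:54,782 Item1-8605 DEBUG Logger2 function(TERM1) Rsp: 30 10",
]
filtered = filter_with_sentinel("\n".join(LOG_LINES))
```

Note that, as in the S/R version, a final line without a trailing line break is never removed by step 2, even if it mentions none of the terms.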
Once again, thanks! Your method definitely helped me a lot.
-
Hi, All,
After an e-mail from Scott Sumner, who suggested a possible connection between SnoringFrog’s problem and this newer post, below, of 11/16/16 :
https://notepad-plus-plus.org/community/topic/12710/marked-text-manipulation/8
we can, indeed, use the general method described there to build another regex which achieves the same goal :-))
So, starting with SnoringFrog’s original text, below :
2016-08-07 05:26:54,688 Place3-1463 ERROR Logger3 Error on randomThing
java.util.concurrent.TimeoutException
    at com.java.class.functionClass.java:15
    at com.java.class.functionClass.java:15
    at com.java.class.functionClass.java:15
    at com.java.class.functionClass.java:15
    at com.java.class.functionClass.java:15
    at com.java.class.functionClass.java:15
    at com.java.class.functionClass.java:15
    at java.lang.Thread.run(Thread.java:745)
2016-08-07 05:26:54,704 Item1-8608 DEBUG Logger2 Something happened
2016-08-07 05:26:54,704 Item1-8608 DEBUG Logger2 Last attempt not useful TERM3
2016-08-07 05:26:54,719 Place3-1438 DEBUG Logger2 delete successful for thething{id=123456}
2016-08-07 05:26:54,719 Item1-8608 DEBUG Logger2 Successfully closed TERM2
2016-08-07 05:26:54,735 Place3-1438 DEBUG Logger2 Sending somthing place://thing.rpc:15478
2016-08-07 05:26:54,750 Item1-8607 DEBUG Logger2 stuff(12345) Cmd: 30 10 49 38 e4 a8 00 12 1a 4b 1e
2016-08-07 05:26:54,782 Item1-8605 DEBUG Logger2 function(TERM1) Rsp: 30 10 49 38 e4 a8 00 12 1a 4b 1e
2016-08-07 05:26:55,297 Place3-1463 DEBUG Logger2 delete successful
2016-08-07 05:26:55,329 Place3-1463 ERROR Logger3 Error on TERM2.
java.util.concurrent.TimeoutException
    at com.java.class.functionClass.java:15
    at com.java.class.functionClass.java:15
    at com.java.class.functionClass.java:15
    at com.java.class.functionClass.java:15
    at com.java.class.functionClass.java:15
    at com.java.class.functionClass.java:15
    at com.java.class.functionClass.java:15
    at java.lang.Thread.run(Thread.java:745)
2016-08-07 05:26:54,750 Item1-8607 DEBUG Logger2 stuff(50509) Cmd: 30 10 49 38 e4 a8 00 12 1a 4b 1e
2016-08-07 05:26:54,782 Item1-8605 DEBUG Logger2 function(TERM1) Rsp: 30 10 49 38 e4 a8 00 12 1a 4b 1e
As he said, in his second post :
In English, it’s going through a log file to pick out any line that mentions any of the 3 terms (TERM1, TERM2 or TERM3) I care about. If a line with one of those terms is immediately followed by a java error (the only lines in the file that do not begin with “2”), the entirety of that error is selected as well.
If, in addition, we consider that all this text could be indented, the regex which matches everything he would like to keep is, therefore :
SEARCH : (?-is)^.+(?:TERM1|TERM2|TERM3).*\R?(?:(?s)\h*java.+?(?=^\h*2))?
Notes :
- I wrote \R? just in case a line containing the string TERM[123] would end the current file without any EOL character!
- The java error block (?s)java.+?(?=^2), which is optional, is then written ((?s)java.+?(?=^2))?
- The \h syntax matches either a single space (\x20), a tabulation (\x09) or a no-break space (\xa0).
- This regex contains only TWO non-capturing groups, with syntax beginning with (?:
So, when this regex is now EMBEDDED in the global regex (?s)^.*?(Your regex to match)|(?s).*\z , described in :
https://notepad-plus-plus.org/community/topic/12710/marked-text-manipulation/8
we obtain, for SnoringFrog’s problem, the final S/R, below :
SEARCH : (?s)^.*?((?-is)^.+(?:TERM1|TERM2|TERM3).*\R?((?s)\h*java.+?(?=^\h*2))?)|(?s).*\z
REPLACE : (?1\1)
Note : In the replacement, I do not add a line break \r\n, as the matched expression already contains these EOL characters.
After replacement, we get, with that second method, the same expected text, below :
2016-08-07 05:26:54,704 Item1-8608 DEBUG Logger2 Last attempt not useful TERM3
2016-08-07 05:26:54,719 Item1-8608 DEBUG Logger2 Successfully closed TERM2
2016-08-07 05:26:54,782 Item1-8605 DEBUG Logger2 function(TERM1) Rsp: 30 10 49 38 e4 a8 00 12 1a 4b 1e
2016-08-07 05:26:55,329 Place3-1463 ERROR Logger3 Error on TERM2.
java.util.concurrent.TimeoutException
    at com.java.class.functionClass.java:15
    at com.java.class.functionClass.java:15
    at com.java.class.functionClass.java:15
    at com.java.class.functionClass.java:15
    at com.java.class.functionClass.java:15
    at com.java.class.functionClass.java:15
    at com.java.class.functionClass.java:15
    at java.lang.Thread.run(Thread.java:745)
2016-08-07 05:26:54,782 Item1-8605 DEBUG Logger2 function(TERM1) Rsp: 30 10 49 38 e4 a8 00 12 1a 4b 1e
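The “keep only what matches” S/R can be approximated in Python as follows. This is a sketch under several substitutions, not a literal port: Python’s `re` has no `\R`, `\h`, `\z` or conditional replacement `(?1\1)`, so `\r?\n`, `[ \t]`, `\Z` and a replacement function are used instead, and the mid-pattern `(?s)` switches become scoped groups `(?s:...)`. The names `KEEP_ONLY` and `keep_matches` and the shortened sample are illustrative.

```python
import re

# Alternative 1: skip unwanted text lazily, then capture a wanted block
# (a term line, optionally followed by a Java error up to the next "2" line).
# Alternative 2: swallow whatever unwanted text remains at the end.
KEEP_ONLY = re.compile(
    r"(?s:.*?)"
    r"(^.+(?:TERM1|TERM2|TERM3).*(?:\r?\n)?"
    r"(?:[ \t]*java(?s:.+?)(?=^[ \t]*2))?)"
    r"|(?s:.*)\Z",
    re.MULTILINE,
)

def keep_matches(text):
    """Replace each match by its captured block, or by nothing."""
    # group(1) is None when the second alternative matched.
    return KEEP_ONLY.sub(lambda m: m.group(1) or "", text)

# Shortened, hypothetical excerpt in the same shape as the log in this thread.
LOG_LINES = [
    "2016-08-07 05:26:54,688 Place3-1463 ERROR Logger3 Error on randomThing",
    "java.util.concurrent.TimeoutException",
    "    at com.java.class.functionClass.java:15",
    "    at java.lang.Thread.run(Thread.java:745)",
    "2016-08-07 05:26:54,704 Item1-8608 DEBUG Logger2 Something happened",
    "2016-08-07 05:26:54,704 Item1-8608 DEBUG Logger2 Last attempt not useful TERM3",
    "2016-08-07 05:26:55,329 Place3-1463 ERROR Logger3 Error on TERM2.",
    "java.util.concurrent.TimeoutException",
    "    at com.java.class.functionClass.java:15",
    "    at java.lang.Thread.run(Thread.java:745)",
    "2016-08-07 05:26:54,750 Item1-8607 DEBUG Logger2 stuff(50509) Cmd: 30 10",
    "2016-08-07 05:26:54,782 Item1-8605 DEBUG Logger2 function(TERM1) Rsp: 30 10",
]
filtered = keep_matches("\n".join(LOG_LINES))
```

Like the original regex, this only captures a Java error block when a later line beginning with “2” exists to end the look-ahead, so a block at the very end of the file would be left out.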
Best Regards
guy038
-
The marker ID used for bookmarks changed in Notepad++ 8.4.6 (and later). It is now 20, instead of 24. So, all references to 24 in this thread and/or its script(s), should be changed to 20.
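A script that has to run on both older and newer installations could choose the ID from the version number, along these lines. This is only a sketch: the function name is made up, and passing the Notepad++ version as a tuple of integers is an assumption about how the caller obtains it.

```python
def bookmark_marker_id(version):
    """Return the bookmark marker ID for a Notepad++ version tuple.

    `version` is assumed to be a tuple such as (8, 4, 6).
    """
    # 20 from Notepad++ 8.4.6 onwards, 24 before that (as stated above).
    return 20 if tuple(version) >= (8, 4, 6) else 24

MARK_BOOKMARK = bookmark_marker_id((8, 4, 6))
```

The result can then be used wherever the earlier script hard-codes `MARK_BOOKMARK = 24`.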