Remove unmarked lines (not just unbookmarked lines)



  • Search -> Bookmark -> Remove Unmarked Lines only removes unbookmarked lines (which makes enough sense, given that it’s under the Bookmark menu, but maybe it should be renamed considering “Mark” is also a term used in NP++?). I’m looking for a way to remove all unmarked lines, but so far can’t find one.

    My specific case was this: I used a regex that matched multi-line results to mark/bookmark lines. However, the Bookmark line option of marking only bookmarks the first line of that match (while marking the entirety of it), so following this with a Remove Unmarked Lines did not preserve all of the lines I wanted. Is there a way in NP++ to accomplish what I’m trying to do more effectively? Or is the only way to do this through either adding a non-bookmark oriented “Remove Unmarked Lines” function or through enabling Bookmark line to bookmark all lines of a multiline match?



  • @SnoringFrog

    could you give us an example of what you did?
    Maybe a screenshot as well?

    Cheers
    Claudia



  • Sure, here’s what I did:

    I used this regex to select the text I wanted:

    (TERM1|TERM2|TERM3)(.*\r\n^[a-zA-Z](.*\r\n^[^2].*$)+)*
    

    (or this one I have since switched to, using the . matches newline option: (TERM1|TERM2|TERM3).*?(?=^2) (both match the same selection of lines))

    In English, it’s going through a log file to pick out any line that mentions any of the 3 terms I care about. If a line with one of those terms is immediately followed by a java error (the only lines in the file that do not begin with “2”), the entirety of that error is selected as well.
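
As a rough illustration outside Notepad++, the shorter regex can be tried with Python's `re` module. This is only a sketch under assumptions: Python's engine is not the Boost engine used by NP++, so `re.DOTALL` stands in for the ". matches newline" option, `re.MULTILINE` is needed for `^` to match at line starts, and the small sample log (with `\n` line endings) is made up for the example.

```python
import re

# Made-up sample: every ordinary log line starts with "2", Java error lines do not.
log = (
    "2016-08-07 05:26:54,704 Item1-8608 DEBUG Logger2 Something happened\n"
    "2016-08-07 05:26:54,719 Item1-8608 DEBUG Logger2 Successfully closed TERM2\n"
    "2016-08-07 05:26:55,329 Place3-1463 ERROR Logger3 Error on TERM2.\n"
    "java.util.concurrent.TimeoutException\n"
    "\tat java.lang.Thread.run(Thread.java:745)\n"
    "2016-08-07 05:26:55,400 Item1-8607 DEBUG Logger2 stuff(50509)\n"
)

# From a term, lazily extend until the next line that begins with "2".
pattern = re.compile(r"(TERM1|TERM2|TERM3).*?(?=^2)", re.DOTALL | re.MULTILINE)

matches = [m.group(0) for m in pattern.finditer(log)]
for text in matches:
    print(repr(text))
```

The first match stays on one line, while the second one also swallows the Java error that follows it, which is the multi-line case that causes the bookmark problem.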

    Here’s a simulated example of what that log file looks like:

    2016-08-07 05:26:54,688	Place3-1463	ERROR	Logger3	Error  on randomThing
    java.util.concurrent.TimeoutException
    	at com.java.class.functionClass.java:15
    	at com.java.class.functionClass.java:15
    	at com.java.class.functionClass.java:15
    	at com.java.class.functionClass.java:15
    	at com.java.class.functionClass.java:15
    	at com.java.class.functionClass.java:15
    	at com.java.class.functionClass.java:15
    	at java.lang.Thread.run(Thread.java:745)
    2016-08-07 05:26:54,704	Item1-8608	DEBUG	Logger2	Something happened
    2016-08-07 05:26:54,704	Item1-8608	DEBUG	Logger2	Last attempt not useful TERM3
    2016-08-07 05:26:54,719	Place3-1438	DEBUG	Logger2	delete successful for thething{id=123456}
    2016-08-07 05:26:54,719	Item1-8608	DEBUG	Logger2	Successfully closed TERM2
    2016-08-07 05:26:54,735	Place3-1438	DEBUG	Logger2	Sending somthing place://thing.rpc:15478
    2016-08-07 05:26:54,750	Item1-8607	DEBUG	Logger2	stuff(12345) Cmd: 30 10 49 38 e4 a8 00 12 1a 4b 1e  
    2016-08-07 05:26:54,782	Item1-8605	DEBUG	Logger2	function(TERM1) Rsp: 30 10 49 38 e4 a8 00 12 1a 4b 1e
    2016-08-07 05:26:55,297	Place3-1463	DEBUG	Logger2	delete successful
    2016-08-07 05:26:55,329	Place3-1463	ERROR	Logger3	Error  on TERM2.
    java.util.concurrent.TimeoutException
    	at com.java.class.functionClass.java:15
    	at com.java.class.functionClass.java:15
    	at com.java.class.functionClass.java:15
    	at com.java.class.functionClass.java:15
    	at com.java.class.functionClass.java:15
    	at com.java.class.functionClass.java:15
    	at com.java.class.functionClass.java:15
    	at java.lang.Thread.run(Thread.java:745)
    2016-08-07 05:26:54,750	Item1-8607	DEBUG	Logger2	stuff(50509) Cmd: 30 10 49 38 e4 a8 00 12 1a 4b 1e  
    2016-08-07 05:26:54,782	Item1-8605	DEBUG	Logger2	function(TERM1) Rsp: 30 10 49 38 e4 a8 00 12 1a 4b 1e
    

    And when I apply the mark with that regex:
    [Screenshot: marked log text]

    If Bookmark line applied a bookmark to every marked line, there would be no issue, but as you can see on line 19, only the first line of a match is bookmarked, even though the entire match is marked. This means that when I use the Remove Unmarked Lines function:

    [Screenshot: bookmark line]

    Only the 3 bookmarked lines will be preserved, rather than all 12 marked lines.

    So, I need a way to do one of the following:

    • Remove all unmarked lines (most helpful)
    • Bookmark every line of a multiline selection
    • Construct a new regex that will mark only the lines I don’t care about (least helpful, and I’ve already spent a few hours attempting this, to no avail)


  • Just noticed (too late to edit) that there was an inadvertent space at the beginning of my regex in that first screenshot. So line 17 should also be selected. With the regex corrected, this is what the marking produces:

    [Screenshot: marked lines]



  • So the 5 marks (i.e. the matches for your RE) should be preserved, and you would like them to have produced 14 bookmarks instead of the 5 shown.



  • @MAPJe71 The marks are not preserved though; most of the 4th mark (which spans 10 lines) is removed when I select Remove unmarked lines. As can be seen in the below gif:

    [Animated GIF: Remove unmarked lines]

    If 14 bookmarks had been generated, Remove unmarked lines would have done what I expected. I imagine this is best implemented as an additional checkbox. If Bookmark line is checked, then the option Bookmark each line in multiline match would become available to be checked.

    Alternatively, if Remove unmarked lines actually removed lines that do not contain marks (rather than lines that do not contain bookmarks), that also would have accomplished what I needed. If that menu option was instead named Remove unbookmarked lines, I wouldn’t expect it to do what I wanted here.
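
The behaviour asked for here, keeping every line touched by a mark rather than only the bookmarked ones, can be sketched in plain Python. This is not an NP++ feature or a PythonScript call: `keep_matched_lines` is a hypothetical helper, the sample log is made up, and Python's `re` module stands in for NP++'s regex engine.

```python
import re
from bisect import bisect_right

def keep_matched_lines(text, pattern):
    """Return only the lines of `text` that are touched by a match of `pattern`."""
    lines = text.splitlines(keepends=True)
    # starts[i] is the character offset at which line i begins.
    starts = []
    offset = 0
    for line in lines:
        starts.append(offset)
        offset += len(line)

    keep = set()
    for m in pattern.finditer(text):
        if m.end() == m.start():
            continue  # an empty match touches no line
        first = bisect_right(starts, m.start()) - 1
        # Use the last matched character, so a match that ends exactly at a
        # line start does not pull in the following line.
        last = bisect_right(starts, m.end() - 1) - 1
        keep.update(range(first, last + 1))
    return "".join(line for i, line in enumerate(lines) if i in keep)

log = (
    "2016-08-07 05:26:54,704 Item1-8608 DEBUG Logger2 Something happened\n"
    "2016-08-07 05:26:54,719 Item1-8608 DEBUG Logger2 Successfully closed TERM2\n"
    "2016-08-07 05:26:55,329 Place3-1463 ERROR Logger3 Error on TERM2.\n"
    "java.util.concurrent.TimeoutException\n"
    "\tat java.lang.Thread.run(Thread.java:745)\n"
    "2016-08-07 05:26:55,400 Item1-8607 DEBUG Logger2 stuff(50509)\n"
)
pattern = re.compile(r"(TERM1|TERM2|TERM3).*?(?=^2)", re.DOTALL | re.MULTILINE)
print(keep_matched_lines(log, pattern), end="")
```

Working on line numbers derived from match offsets is the same idea as bookmarking every line of a multi-line match and then removing the unbookmarked lines.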



  • Maybe use a macro or a Python script?

    For the macro I was thinking of extending the Find-what RE to ^(?:\d{4}(?:-\d{2}){2}).*?(TERM1|TERM2|TERM3)(.*\r\n^[a-zA-Z](.*\r\n^[^2].*$)+)* (using the Find tab not the Mark tab), copy the match to a new document and run that multiple times.



  • @SnoringFrog

    don’t see how this could be done using native npp functionality easily.
    A python script could look like this.

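    # Requires the PythonScript plugin, which provides the `editor` object.
    # Raw string so that the backslashes reach the regex engine unchanged.
    regex = r'(TERM1|TERM2|TERM3)(.*\r\n^[a-zA-Z](.*\r\n^[^2].*$)+)*'
    matches = []
    # Bookmark marker ID used here; it may differ between Notepad++ versions.
    MARK_BOOKMARK = 24

    # Collect the (start, end) character positions of every match.
    editor.research(regex, lambda m: matches.append(m.span()))
    for start_pos, end_pos in matches:
        start = editor.lineFromPosition(start_pos)
        end = editor.lineFromPosition(end_pos)
        # Bookmark every line the match touches (a single line when start == end).
        for line in range(start, end + 1):
            editor.markerAdd(line, MARK_BOOKMARK)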
    

    Cheers
    Claudia



  • Hello SnoringFrog and All,

    Sorry for the late reply. I’ve been on my summer holidays in Brittany, with wonderful weather, for a week now !

    SnoringFrog, in the end, your problem is about the Java error report, which spans several lines in your file. So, why don’t you gather all these lines into a single line ? That way, you don’t even need to bookmark anything !

    Just one assumption : the lines of the Java error report, from the second one onwards, always begin with some blank characters ( four spaces in your example )

    Then, the job can be split into three steps :

    • Gather all lines of any Java report error block, into a single physical line.

    • Delete any line which contains NEITHER the lower-case word java NOR one of the upper-case words TERM1, TERM2 or TERM3, as well as the single line of a Java error report block which may follow it.

    • Restore the original form of all the remaining Java report error blocks.


    Right ! Let’s go ( All the following S/R need the Regular expression search mode )

    First of all, make a copy of your original log file ( One never knows ! )

    Now, from your original example, below :

    2016-08-07 05:26:54,688 Place3-1463 ERROR   Logger3 Error  on randomThing
    java.util.concurrent.TimeoutException
        at com.java.class.functionClass.java:15
        at com.java.class.functionClass.java:15
        at com.java.class.functionClass.java:15
        at com.java.class.functionClass.java:15
        at com.java.class.functionClass.java:15
        at com.java.class.functionClass.java:15
        at com.java.class.functionClass.java:15
        at java.lang.Thread.run(Thread.java:745)
    2016-08-07 05:26:54,704 Item1-8608  DEBUG   Logger2 Something happened
    2016-08-07 05:26:54,704 Item1-8608  DEBUG   Logger2 Last attempt not useful TERM3
    2016-08-07 05:26:54,719 Place3-1438 DEBUG   Logger2 delete successful for thething{id=123456}
    2016-08-07 05:26:54,719 Item1-8608  DEBUG   Logger2 Successfully closed TERM2
    2016-08-07 05:26:54,735 Place3-1438 DEBUG   Logger2 Sending somthing place://thing.rpc:15478
    2016-08-07 05:26:54,750 Item1-8607  DEBUG   Logger2 stuff(12345) Cmd: 30 10 49 38 e4 a8 00 12 1a 4b 1e  
    2016-08-07 05:26:54,782 Item1-8605  DEBUG   Logger2 function(TERM1) Rsp: 30 10 49 38 e4 a8 00 12 1a 4b 1e
    2016-08-07 05:26:55,297 Place3-1463 DEBUG   Logger2 delete successful
    2016-08-07 05:26:55,329 Place3-1463 ERROR   Logger3 Error  on TERM2.
    java.util.concurrent.TimeoutException
        at com.java.class.functionClass.java:15
        at com.java.class.functionClass.java:15
        at com.java.class.functionClass.java:15
        at com.java.class.functionClass.java:15
        at com.java.class.functionClass.java:15
        at com.java.class.functionClass.java:15
        at com.java.class.functionClass.java:15
        at java.lang.Thread.run(Thread.java:745)
    2016-08-07 05:26:54,750 Item1-8607  DEBUG   Logger2 stuff(50509) Cmd: 30 10 49 38 e4 a8 00 12 1a 4b 1e  
    2016-08-07 05:26:54,782 Item1-8605  DEBUG   Logger2 function(TERM1) Rsp: 30 10 49 38 e4 a8 00 12 1a 4b 1e
    

    The following simple S/R :

    SEARCH : \R(\h+at ) , with a space, between the string “at” and the closing round bracket

    REPLACE : \1

    gives the following text, below :

    2016-08-07 05:26:54,688 Place3-1463 ERROR   Logger3 Error  on randomThing
    java.util.concurrent.TimeoutException    at com.java.class.functionClass.java:15    at com.java.class.functionClass.java:15    at com.java.class.functionClass.java:15    at com.java.class.functionClass.java:15    at com.java.class.functionClass.java:15    at com.java.class.functionClass.java:15    at com.java.class.functionClass.java:15    at java.lang.Thread.run(Thread.java:745)
    2016-08-07 05:26:54,704 Item1-8608  DEBUG   Logger2 Something happened
    2016-08-07 05:26:54,704 Item1-8608  DEBUG   Logger2 Last attempt not useful TERM3
    2016-08-07 05:26:54,719 Place3-1438 DEBUG   Logger2 delete successful for thething{id=123456}
    2016-08-07 05:26:54,719 Item1-8608  DEBUG   Logger2 Successfully closed TERM2
    2016-08-07 05:26:54,735 Place3-1438 DEBUG   Logger2 Sending somthing place://thing.rpc:15478
    2016-08-07 05:26:54,750 Item1-8607  DEBUG   Logger2 stuff(12345) Cmd: 30 10 49 38 e4 a8 00 12 1a 4b 1e  
    2016-08-07 05:26:54,782 Item1-8605  DEBUG   Logger2 function(TERM1) Rsp: 30 10 49 38 e4 a8 00 12 1a 4b 1e
    2016-08-07 05:26:55,297 Place3-1463 DEBUG   Logger2 delete successful
    2016-08-07 05:26:55,329 Place3-1463 ERROR   Logger3 Error  on TERM2.
    java.util.concurrent.TimeoutException    at com.java.class.functionClass.java:15    at com.java.class.functionClass.java:15    at com.java.class.functionClass.java:15    at com.java.class.functionClass.java:15    at com.java.class.functionClass.java:15    at com.java.class.functionClass.java:15    at com.java.class.functionClass.java:15    at java.lang.Thread.run(Thread.java:745)
    2016-08-07 05:26:54,750 Item1-8607  DEBUG   Logger2 stuff(50509) Cmd: 30 10 49 38 e4 a8 00 12 1a 4b 1e  
    2016-08-07 05:26:54,782 Item1-8605  DEBUG   Logger2 function(TERM1) Rsp: 30 10 49 38 e4 a8 00 12 1a 4b 1e
    

    Then, the second S/R gets rid of all the unwanted lines :

    SEARCH : (?-is)^(?!.*(java|TERM[123])).*\R(?:\h*java.*\R)?

    REPLACE : Leave EMPTY

    and gives the resulting text, below :

    2016-08-07 05:26:54,704 Item1-8608  DEBUG   Logger2 Last attempt not useful TERM3
    2016-08-07 05:26:54,719 Item1-8608  DEBUG   Logger2 Successfully closed TERM2
    2016-08-07 05:26:54,782 Item1-8605  DEBUG   Logger2 function(TERM1) Rsp: 30 10 49 38 e4 a8 00 12 1a 4b 1e
    2016-08-07 05:26:55,329 Place3-1463 ERROR   Logger3 Error  on TERM2.
    java.util.concurrent.TimeoutException    at com.java.class.functionClass.java:15    at com.java.class.functionClass.java:15    at com.java.class.functionClass.java:15    at com.java.class.functionClass.java:15    at com.java.class.functionClass.java:15    at com.java.class.functionClass.java:15    at com.java.class.functionClass.java:15    at java.lang.Thread.run(Thread.java:745)
    2016-08-07 05:26:54,782 Item1-8605  DEBUG   Logger2 function(TERM1) Rsp: 30 10 49 38 e4 a8 00 12 1a 4b 1e
    

    Finally, the simple S/R, below :

    SEARCH : \h+at\x20

    REPLACE : \r\n$0

    restores the original appearance of the Java error report block !

    2016-08-07 05:26:54,704 Item1-8608  DEBUG   Logger2 Last attempt not useful TERM3
    2016-08-07 05:26:54,719 Item1-8608  DEBUG   Logger2 Successfully closed TERM2
    2016-08-07 05:26:54,782 Item1-8605  DEBUG   Logger2 function(TERM1) Rsp: 30 10 49 38 e4 a8 00 12 1a 4b 1e
    2016-08-07 05:26:55,329 Place3-1463 ERROR   Logger3 Error  on TERM2.
    java.util.concurrent.TimeoutException
        at com.java.class.functionClass.java:15
        at com.java.class.functionClass.java:15
        at com.java.class.functionClass.java:15
        at com.java.class.functionClass.java:15
        at com.java.class.functionClass.java:15
        at com.java.class.functionClass.java:15
        at com.java.class.functionClass.java:15
        at java.lang.Thread.run(Thread.java:745)
    2016-08-07 05:26:54,782 Item1-8605  DEBUG   Logger2 function(TERM1) Rsp: 30 10 49 38 e4 a8 00 12 1a 4b 1e
    

    This text does contain lines 12, 14, 17, 19 and 30 of your original text, which contain the string TERM[123], as well as the 9 lines of the second Java error report, between lines 20 and 28 :-))


    NOTES on the regexes :

    • In the first S/R, the \R form stands for any kind of EOL characters, and the \1 syntax, in the replacement, represents the string at, with its leading blank characters and the space after it, stored as group 1 in the search part

    • The syntax (?-is), beginning the second S/R, makes the search case-sensitive and means that the dot matches a single standard character only, never an EOL character

    • Then, the negative look-ahead, in ^(?!.*(java|TERM[123])), matches only at the beginning of a line which does NOT contain any of the strings “java”, “TERM1”, “TERM2” or “TERM3”, with this exact case

    • The following .*\R form represents all the contents of that line, with its EOL characters

    • Finally, the optional non-capturing group (?:\h*java.*\R)? stands for the line of a POSSIBLE Java error report, which is NOT preceded by a line containing the string TERM[123] and which, consequently, must also be deleted

    • In the third S/R, the $0 represents the whole of the matched string, that is to say, the word at, with its leading blank characters and the space after it
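
The three S/R steps above can be sketched outside Notepad++ with Python's `re` module. This is a translation under assumptions, not the exact NP++ behaviour: Python has no `\R` or `\h`, so `\n` and `[ \t]` are used instead, `re.MULTILINE` is needed for `^`, the `(?-is)` prefix is dropped because those are already Python's defaults, and the shortened sample log is made up. `keep_term_lines` is a hypothetical name.

```python
import re

log = "\n".join([
    "2016-08-07 05:26:54,688 Place3-1463 ERROR Logger3 Error on randomThing",
    "java.util.concurrent.TimeoutException",
    "    at com.java.class.functionClass.java:15",
    "    at java.lang.Thread.run(Thread.java:745)",
    "2016-08-07 05:26:54,704 Item1-8608 DEBUG Logger2 Something happened",
    "2016-08-07 05:26:54,704 Item1-8608 DEBUG Logger2 Last attempt not useful TERM3",
    "2016-08-07 05:26:55,329 Place3-1463 ERROR Logger3 Error on TERM2.",
    "java.util.concurrent.TimeoutException",
    "    at com.java.class.functionClass.java:15",
    "    at java.lang.Thread.run(Thread.java:745)",
    "2016-08-07 05:26:54,750 Item1-8607 DEBUG Logger2 stuff(50509) Cmd: 30 10",
    "2016-08-07 05:26:54,782 Item1-8605 DEBUG Logger2 function(TERM1) Rsp: 30 10",
]) + "\n"

def keep_term_lines(text):
    # Step 1: join every indented "at ..." line onto the line before it.
    collapsed = re.sub(r"\n([ \t]+at )", r"\1", text)
    # Step 2: delete lines containing neither "java" nor TERM1/2/3,
    # together with a collapsed Java error line directly after them.
    filtered = re.sub(
        r"^(?!.*(?:java|TERM[123])).*\n(?:[ \t]*java.*\n)?",
        "",
        collapsed,
        flags=re.MULTILINE,
    )
    # Step 3: put a line break back in front of every "at ..." entry.
    return re.sub(r"[ \t]+at ", r"\n\g<0>", filtered)

result = keep_term_lines(log)
print(result, end="")
```

Like the original S/R, step 3 would also split any ordinary log line that happens to contain blanks followed by "at " and a space, so the sketch relies on that not occurring in the data.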


    Best regards,

    guy038

    P.S. :

    In addition, the regexes, given above, still work if all your text, itself, is already indented with some blank characters :-))



  • @guy038 This does seem like the best method I have available right now. I ended up changing some of your regexes a bit, but the basic idea was the same. Collapsing the java errors temporarily was definitely the trick I needed.

    In case you are interested, here are the steps I used:

    First S/R:
    SEARCH: \R([^2])
    REPLACE: qzqqz\1

    I noticed a few lines in some java errors that did not begin with spaces (which my example does not account for), but I also know that java errors are the only lines in these logs that do not begin with “2”, so I used that to find them. Since I could not depend on \h to re-expand the errors later, I inserted “qzqqz” (a dummy value I am confident will not appear anywhere in the log) so that I could find these again later.

    Second S/R:
    SEARCH: ^(?!.*(TERM1|TERM2|TERM3)).*\R
    REPLACE: leave empty

    Because the change in the first S/R now also makes the Java errors get merged into the line that precedes them, I only need to search for lines that do not contain my search terms and remove them. This leaves me with only the lines I care about, including any associated Java errors.

    Third S/R:
    SEARCH: qzqqz
    REPLACE: \r\n

    This just replaces the dummy value that I used in the first S/R with the line break it represented, returning the errors to their original formatting.
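
For comparison, the same three steps can be sketched with Python's `re` module. The assumptions are the same as for any port of these regexes: `\n` replaces `\R`, `re.MULTILINE` is needed for `^`, the sample log (including the unindented "Caused by" line) is made up, and `keep_term_blocks` is a hypothetical name.

```python
import re

SENTINEL = "qzqqz"  # dummy value assumed never to occur in the log

log = "\n".join([
    "2016-08-07 05:26:54,688 Place3-1463 ERROR Logger3 Error on randomThing",
    "java.util.concurrent.TimeoutException",
    "Caused by: java.lang.IllegalStateException",
    "2016-08-07 05:26:54,719 Item1-8608 DEBUG Logger2 Successfully closed TERM2",
    "2016-08-07 05:26:55,329 Place3-1463 ERROR Logger3 Error on TERM2.",
    "java.util.concurrent.TimeoutException",
    "\tat java.lang.Thread.run(Thread.java:745)",
    "2016-08-07 05:26:55,400 Place3-1463 DEBUG Logger2 delete successful",
]) + "\n"

def keep_term_blocks(text):
    # Step 1: merge every line that does not start with "2" into the line
    # before it, leaving the sentinel where the line break used to be.
    merged = re.sub(r"\n([^2])", SENTINEL + r"\1", text)
    # Step 2: drop every (merged) line that mentions none of the terms.
    filtered = re.sub(
        r"^(?!.*(?:TERM1|TERM2|TERM3)).*\n", "", merged, flags=re.MULTILINE
    )
    # Step 3: turn the sentinels back into line breaks.
    return filtered.replace(SENTINEL, "\n")

result = keep_term_blocks(log)
print(result, end="")
```

Because the sentinel does not depend on leading whitespace, error lines such as the unindented "Caused by" line are merged and restored just like the indented ones.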

    Once again, thanks! Your method definitely helped me a lot.



  • Hi, All,

    After an e-mail from Scott Sumner, who suggested a possible connection between SnoringFrog’s problem and this newer post, below, of 11/16/16 :

    https://notepad-plus-plus.org/community/topic/12710/marked-text-manipulation/8

    we can, indeed, use the general method described there to build another regex, which achieves the same goal :-))

    So, starting with SnoringFrog’s original text, below :

    2016-08-07 05:26:54,688 Place3-1463 ERROR   Logger3 Error  on randomThing
    java.util.concurrent.TimeoutException
        at com.java.class.functionClass.java:15
        at com.java.class.functionClass.java:15
        at com.java.class.functionClass.java:15
        at com.java.class.functionClass.java:15
        at com.java.class.functionClass.java:15
        at com.java.class.functionClass.java:15
        at com.java.class.functionClass.java:15
        at java.lang.Thread.run(Thread.java:745)
    2016-08-07 05:26:54,704 Item1-8608  DEBUG   Logger2 Something happened
    2016-08-07 05:26:54,704 Item1-8608  DEBUG   Logger2 Last attempt not useful TERM3
    2016-08-07 05:26:54,719 Place3-1438 DEBUG   Logger2 delete successful for thething{id=123456}
    2016-08-07 05:26:54,719 Item1-8608  DEBUG   Logger2 Successfully closed TERM2
    2016-08-07 05:26:54,735 Place3-1438 DEBUG   Logger2 Sending somthing place://thing.rpc:15478
    2016-08-07 05:26:54,750 Item1-8607  DEBUG   Logger2 stuff(12345) Cmd: 30 10 49 38 e4 a8 00 12 1a 4b 1e  
    2016-08-07 05:26:54,782 Item1-8605  DEBUG   Logger2 function(TERM1) Rsp: 30 10 49 38 e4 a8 00 12 1a 4b 1e
    2016-08-07 05:26:55,297 Place3-1463 DEBUG   Logger2 delete successful
    2016-08-07 05:26:55,329 Place3-1463 ERROR   Logger3 Error  on TERM2.
    java.util.concurrent.TimeoutException
        at com.java.class.functionClass.java:15
        at com.java.class.functionClass.java:15
        at com.java.class.functionClass.java:15
        at com.java.class.functionClass.java:15
        at com.java.class.functionClass.java:15
        at com.java.class.functionClass.java:15
        at com.java.class.functionClass.java:15
        at java.lang.Thread.run(Thread.java:745)
    2016-08-07 05:26:54,750 Item1-8607  DEBUG   Logger2 stuff(50509) Cmd: 30 10 49 38 e4 a8 00 12 1a 4b 1e  
    2016-08-07 05:26:54,782 Item1-8605  DEBUG   Logger2 function(TERM1) Rsp: 30 10 49 38 e4 a8 00 12 1a 4b 1e
    

    As he said, in his second post :

    In English, it’s going through a log file to pick out any line that mentions any of the 3 terms ( TERM1, TERM2 or TERM3 ) I care about. If a line with one of those terms is immediately followed by a java error (the only lines in the file that do not begin with “2”), the entirety of that error is selected as well.

    If, in addition, we consider that all this text could be indented, the regex which matches everything he wants is, therefore :

    SEARCH : (?-is)^.+(?:TERM1|TERM2|TERM3).*\R?(?:(?s)\h*java.+?(?=^\h*2))?

    Notes :

    • I wrote \R?, just in case a line, containing the string TERM[123], would end the current file, without any EOL character !

    • The Java error block (?s)\h*java.+?(?=^\h*2), which is optional, is, then, written (?:(?s)\h*java.+?(?=^\h*2))?

    • The \h syntax matches either a single space ( \x20 ), a tabulation ( \x09 ) or a non-breaking space ( \xa0 )

    • This regex contains only TWO non-capturing groups, whose syntax begins with (?:....


    So, when this regex is, now, EMBEDDED in the global regex (?s)^.*?(Your regex to match)|(?s).*\z, described in :

    https://notepad-plus-plus.org/community/topic/12710/marked-text-manipulation/8

    We obtain, for SnoringFrog’s problem, the final S/R, below :

    SEARCH : (?s)^.*?((?-is)^.+(?:TERM1|TERM2|TERM3).*\R?((?s)\h*java.+?(?=^\h*2))?)|(?s).*\z

    REPLACE : (?1\1)

    Note : In the replacement, I do not add a line break, \r\n, as the matched expression already contains these EOL characters

    After replacement, we get, with that second method, the same expected text, below :

    2016-08-07 05:26:54,704 Item1-8608  DEBUG   Logger2 Last attempt not useful TERM3
    2016-08-07 05:26:54,719 Item1-8608  DEBUG   Logger2 Successfully closed TERM2
    2016-08-07 05:26:54,782 Item1-8605  DEBUG   Logger2 function(TERM1) Rsp: 30 10 49 38 e4 a8 00 12 1a 4b 1e
    2016-08-07 05:26:55,329 Place3-1463 ERROR   Logger3 Error  on TERM2.
    java.util.concurrent.TimeoutException
        at com.java.class.functionClass.java:15
        at com.java.class.functionClass.java:15
        at com.java.class.functionClass.java:15
        at com.java.class.functionClass.java:15
        at com.java.class.functionClass.java:15
        at com.java.class.functionClass.java:15
        at com.java.class.functionClass.java:15
        at java.lang.Thread.run(Thread.java:745)
    2016-08-07 05:26:54,782 Item1-8605  DEBUG   Logger2 function(TERM1) Rsp: 30 10 49 38 e4 a8 00 12 1a 4b 1e
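
The single S/R above can also be sketched with Python's `re` module, under some assumptions. Python has no conditional replacement such as `(?1\1)`, so a replacement function plays that role. It also lacks `\R` and `\h`, and does not accept `(?s)` in the middle of a pattern, so `\n`, `[ \t]` and scoped groups like `(?s:...)` are used instead. The sample log is made up and `keep_wanted` is a hypothetical name.

```python
import re

PATTERN = re.compile(
    r"(?s:^.*?)"                             # lazily skip unwanted text
    r"(^.+(?:TERM1|TERM2|TERM3).*\n?"        # group 1: a line containing a term
    r"(?:[ \t]*java(?s:.+?)(?=^[ \t]*2))?)"  # plus an optional Java error block
    r"|(?s:.+)\Z",                           # or: whatever is left at the end
    re.MULTILINE,
)

def keep_wanted(text):
    # Keep group 1 when it took part in the match, otherwise delete the match.
    return PATTERN.sub(lambda m: m.group(1) or "", text)

log = "\n".join([
    "2016-08-07 05:26:54,688 Place3-1463 ERROR Logger3 Error on randomThing",
    "java.util.concurrent.TimeoutException",
    "\tat java.lang.Thread.run(Thread.java:745)",
    "2016-08-07 05:26:54,704 Item1-8608 DEBUG Logger2 Last attempt not useful TERM3",
    "2016-08-07 05:26:55,329 Place3-1463 ERROR Logger3 Error on TERM2.",
    "java.util.concurrent.TimeoutException",
    "\tat java.lang.Thread.run(Thread.java:745)",
    "2016-08-07 05:26:54,750 Item1-8607 DEBUG Logger2 stuff(50509) Cmd: 30 10",
    "2016-08-07 05:26:54,782 Item1-8605 DEBUG Logger2 function(TERM1) Rsp: 30 10",
]) + "\n"

result = keep_wanted(log)
print(result, end="")
```

As in the original regex, a Java error block is only kept when a later line beginning with "2" follows it, because of the look-ahead that ends the block.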
    

    Best Regards

    guy038

