    Remove unmarked lines (not just unbookmarked lines)

    12 Posts 5 Posters 16.0k Views
    • SnoringFrogS
      SnoringFrog
      last edited by SnoringFrog

      Sure, here’s what I did:

      I used this regex to select the text I wanted:

      (TERM1|TERM2|TERM3)(.*\r\n^[a-zA-Z](.*\r\n^[^2].*$)+)*
      

      (or this one I have since switched to, using the . matches newline option: (TERM1|TERM2|TERM3).*?(?=^2) (both match the same selection of lines))

      In English, it’s going through a log file to pick out any line that mentions any of the 3 terms I care about. If a line with one of those terms is immediately followed by a java error (the only lines in the file that do not begin with “2”), the entirety of that error is selected as well.

      Here’s a simulated example of what that log file looks like:

      2016-08-07 05:26:54,688	Place3-1463	ERROR	Logger3	Error  on randomThing
      java.util.concurrent.TimeoutException
      	at com.java.class.functionClass.java:15
      	at com.java.class.functionClass.java:15
      	at com.java.class.functionClass.java:15
      	at com.java.class.functionClass.java:15
      	at com.java.class.functionClass.java:15
      	at com.java.class.functionClass.java:15
      	at com.java.class.functionClass.java:15
      	at java.lang.Thread.run(Thread.java:745)
      2016-08-07 05:26:54,704	Item1-8608	DEBUG	Logger2	Something happened
      2016-08-07 05:26:54,704	Item1-8608	DEBUG	Logger2	Last attempt not useful TERM3
      2016-08-07 05:26:54,719	Place3-1438	DEBUG	Logger2	delete successful for thething{id=123456}
      2016-08-07 05:26:54,719	Item1-8608	DEBUG	Logger2	Successfully closed TERM2
      2016-08-07 05:26:54,735	Place3-1438	DEBUG	Logger2	Sending somthing place://thing.rpc:15478
      2016-08-07 05:26:54,750	Item1-8607	DEBUG	Logger2	stuff(12345) Cmd: 30 10 49 38 e4 a8 00 12 1a 4b 1e  
      2016-08-07 05:26:54,782	Item1-8605	DEBUG	Logger2	function(TERM1) Rsp: 30 10 49 38 e4 a8 00 12 1a 4b 1e
      2016-08-07 05:26:55,297	Place3-1463	DEBUG	Logger2	delete successful
      2016-08-07 05:26:55,329	Place3-1463	ERROR	Logger3	Error  on TERM2.
      java.util.concurrent.TimeoutException
      	at com.java.class.functionClass.java:15
      	at com.java.class.functionClass.java:15
      	at com.java.class.functionClass.java:15
      	at com.java.class.functionClass.java:15
      	at com.java.class.functionClass.java:15
      	at com.java.class.functionClass.java:15
      	at com.java.class.functionClass.java:15
      	at java.lang.Thread.run(Thread.java:745)
      2016-08-07 05:26:54,750	Item1-8607	DEBUG	Logger2	stuff(50509) Cmd: 30 10 49 38 e4 a8 00 12 1a 4b 1e  
      2016-08-07 05:26:54,782	Item1-8605	DEBUG	Logger2	function(TERM1) Rsp: 30 10 49 38 e4 a8 00 12 1a 4b 1e
      

      And when I apply the mark with that regex:
      [Screenshot: marked log text]

      If Bookmark line applied a bookmark to every marked line, there would be no issue, but as you can see on line 19, only the first line of a match is bookmarked, even though the entire match is marked. This means that when I use the Remove Unmarked Lines function:

      [Screenshot: bookmark line]

      Only the 3 bookmarked lines will be preserved, rather than all 12 marked lines.

      So, I need a way to do one of the following:

      • Remove all unmarked lines (most helpful)
      • Bookmark every line of a multiline selection
      • Construct a new regex that will mark only the lines I don’t care about (least helpful, and I’ve already spent a few hours attempting this, to no avail)
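
For reference, the behaviour asked for in the first option (keep every line that belongs to a match, drop the rest) can be sketched outside Notepad++ in plain Python. This is only an illustration of the intended result, not an existing Notepad++ command. The name `keep_marked_lines` and the shortened sample lines are made up for the example, and it relies on the assumption stated above that only Java error lines fail to start with "2".

```python
TERMS = ("TERM1", "TERM2", "TERM3")

def keep_marked_lines(lines, terms=TERMS):
    """Keep lines mentioning a term, plus the non-'2' lines directly after them."""
    kept = []
    in_block = False  # True while the previous kept line may still be continued
    for line in lines:
        if any(term in line for term in terms):
            kept.append(line)
            in_block = True
        elif in_block and not line.startswith("2"):
            # Continuation of a kept entry, e.g. a Java stack trace line.
            kept.append(line)
        else:
            in_block = False
    return kept

SAMPLE_LINES = [
    "2016-08-07 05:26:54,688 ERROR Error on randomThing",
    "java.util.concurrent.TimeoutException",
    "\tat com.java.class.functionClass.java:15",
    "2016-08-07 05:26:54,704 DEBUG Something happened",
    "2016-08-07 05:26:54,704 DEBUG Last attempt not useful TERM3",
    "2016-08-07 05:26:55,329 ERROR Error on TERM2.",
    "java.util.concurrent.TimeoutException",
    "\tat com.java.class.functionClass.java:15",
    "2016-08-07 05:26:55,297 DEBUG delete successful",
]

for line in keep_marked_lines(SAMPLE_LINES):
    print(line)
```

The first error block is dropped because the line before it mentions none of the terms; the second one is kept together with its "Error on TERM2." line.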
      • SnoringFrogS
        SnoringFrog
        last edited by

        Just noticed (too late to edit) that there was an inadvertent space at the beginning of my regex in that first screenshot. So line 17 should also be selected. With the regex corrected, this is what the marking produces:

        [Screenshot: marked lines]

        • MAPJe71M
          MAPJe71
          last edited by

          So the 5 marks (i.e. the matches for your RE) are what you want preserved, and you would have liked 14 bookmarks (one per marked line) instead of the 5 shown.

          • SnoringFrogS
            SnoringFrog
            last edited by

            @MAPJe71 The marks are not preserved though; most of the 4th mark (which spans 10 lines) is removed when I select Remove unmarked lines. As can be seen in the below gif:

            [Animated screenshot: Remove unmarked lines]

            If 14 bookmarks had been generated, Remove unmarked lines would have done what I expected. I imagine this is best implemented as an additional checkbox. If Bookmark line is checked, then the option Bookmark each line in multiline match would become available to be checked.

            Alternatively, if Remove unmarked lines actually removed lines that do not contain marks (rather than lines that do not contain bookmarks), that also would have accomplished what I needed. If that menu option was instead named Remove unbookmarked lines, I wouldn’t expect it to do what I wanted here.

            • MAPJe71M
              MAPJe71
              last edited by

              Maybe use a macro or a Python script?

              For the macro I was thinking of extending the Find-what RE to ^(?:\d{4}(?:-\d{2}){2}).*?(TERM1|TERM2|TERM3)(.*\r\n^[a-zA-Z](.*\r\n^[^2].*$)+)* (using the Find tab not the Mark tab), copy the match to a new document and run that multiple times.

              • Claudia FrankC
                Claudia Frank @SnoringFrog
                last edited by

                @SnoringFrog

                I don’t see how this could easily be done with native Notepad++ functionality.
                A Python script could look like this:

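                # PythonScript (Notepad++ plugin): bookmark every line touched by a regex match.
                regex = r'(TERM1|TERM2|TERM3)(.*\r\n^[a-zA-Z](.*\r\n^[^2].*$)+)*'
                matches = []
                # Bookmark marker ID: 24 here; 20 from Notepad++ 8.4.6 on (see the last post).
                MARK_BOOKMARK = 24

                # Collect the (start, end) character positions of every match.
                editor.research(regex, lambda m: matches.append(m.span()))
                for start_pos, end_pos in matches:
                    start = editor.lineFromPosition(start_pos)
                    end = editor.lineFromPosition(end_pos)
                    # range() also covers the single-line case (start == end).
                    for line in range(start, end + 1):
                        editor.markerAdd(line, MARK_BOOKMARK)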
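
The same span-to-line bookkeeping can be tried without Notepad++. The sketch below is plain Python with the standard `re` module rather than the PythonScript `editor` object. `lines_covered` and `remove_uncovered_lines` are hypothetical helper names. The regex is adapted to `\n` line endings, which is an assumption of this sketch: Python's `$` does not treat `\r\n` as a single line break, so the original pattern cannot be reused unchanged.

```python
import re

# Original regex adapted to "\n" line endings (assumption for this sketch).
PATTERN = r"(?:TERM1|TERM2|TERM3)(?:.*\n^[a-zA-Z](?:.*\n^[^2].*$)+)*"

def lines_covered(text, pattern=PATTERN):
    """Return the sorted 0-based indexes of every line touched by a match."""
    covered = set()
    for m in re.finditer(pattern, text, flags=re.MULTILINE):
        first = text.count("\n", 0, m.start())
        # Use the last matched character so a trailing "\n" does not
        # pull in the following line.
        last_char = max(m.end() - 1, m.start())
        last = text.count("\n", 0, last_char)
        covered.update(range(first, last + 1))
    return sorted(covered)

def remove_uncovered_lines(text, pattern=PATTERN):
    """Drop every line that no match touches."""
    keep = set(lines_covered(text, pattern))
    lines = text.splitlines(keepends=True)
    return "".join(line for i, line in enumerate(lines) if i in keep)
```

Calling `remove_uncovered_lines(log_text)` on a log with `\n` line endings returns only the lines that the Mark feature would have highlighted.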
                

                Cheers
                Claudia

                • guy038G
                  guy038
                  last edited by guy038

                  Hello SnoringFrog and All,

                  Sorry for the late reply. I have been on my summer holidays in Brittany for a week, with wonderful weather!

                  SnoringFrog, in the end, your problem comes down to the Java error reports, which span several lines in your file. So, why not join all the lines of each report into a single line? That way, you don’t even need to bookmark anything!

                  Just one assumption: from the second line onwards, the lines of a Java error report always begin with some blank characters (four spaces in your example).

                  Then, the job can be split into three steps:

                  • Gather all lines of any Java report error block, into a single physical line.

                  • Delete any line that contains neither the lowercase word java nor one of the uppercase words TERM1, TERM2 or TERM3, together with the joined Java error report line that may follow it.

                  • Restore the original form of all the remaining Java report error blocks.


                  Right ! Let’s go ( All the following S/R need the Regular expression search mode )

                  First of all, make a copy of your original log file ( One never knows ! )

                  Now, from your original example, below :

                  2016-08-07 05:26:54,688 Place3-1463 ERROR   Logger3 Error  on randomThing
                  java.util.concurrent.TimeoutException
                      at com.java.class.functionClass.java:15
                      at com.java.class.functionClass.java:15
                      at com.java.class.functionClass.java:15
                      at com.java.class.functionClass.java:15
                      at com.java.class.functionClass.java:15
                      at com.java.class.functionClass.java:15
                      at com.java.class.functionClass.java:15
                      at java.lang.Thread.run(Thread.java:745)
                  2016-08-07 05:26:54,704 Item1-8608  DEBUG   Logger2 Something happened
                  2016-08-07 05:26:54,704 Item1-8608  DEBUG   Logger2 Last attempt not useful TERM3
                  2016-08-07 05:26:54,719 Place3-1438 DEBUG   Logger2 delete successful for thething{id=123456}
                  2016-08-07 05:26:54,719 Item1-8608  DEBUG   Logger2 Successfully closed TERM2
                  2016-08-07 05:26:54,735 Place3-1438 DEBUG   Logger2 Sending somthing place://thing.rpc:15478
                  2016-08-07 05:26:54,750 Item1-8607  DEBUG   Logger2 stuff(12345) Cmd: 30 10 49 38 e4 a8 00 12 1a 4b 1e  
                  2016-08-07 05:26:54,782 Item1-8605  DEBUG   Logger2 function(TERM1) Rsp: 30 10 49 38 e4 a8 00 12 1a 4b 1e
                  2016-08-07 05:26:55,297 Place3-1463 DEBUG   Logger2 delete successful
                  2016-08-07 05:26:55,329 Place3-1463 ERROR   Logger3 Error  on TERM2.
                  java.util.concurrent.TimeoutException
                      at com.java.class.functionClass.java:15
                      at com.java.class.functionClass.java:15
                      at com.java.class.functionClass.java:15
                      at com.java.class.functionClass.java:15
                      at com.java.class.functionClass.java:15
                      at com.java.class.functionClass.java:15
                      at com.java.class.functionClass.java:15
                      at java.lang.Thread.run(Thread.java:745)
                  2016-08-07 05:26:54,750 Item1-8607  DEBUG   Logger2 stuff(50509) Cmd: 30 10 49 38 e4 a8 00 12 1a 4b 1e  
                  2016-08-07 05:26:54,782 Item1-8605  DEBUG   Logger2 function(TERM1) Rsp: 30 10 49 38 e4 a8 00 12 1a 4b 1e
                  

                  The following simple S/R:

                  SEARCH : \R(\h+at ) , with a space, between the string “at” and the closing round bracket

                  REPLACE : \1

                  gives the following text, below :

                  2016-08-07 05:26:54,688 Place3-1463 ERROR   Logger3 Error  on randomThing
                  java.util.concurrent.TimeoutException    at com.java.class.functionClass.java:15    at com.java.class.functionClass.java:15    at com.java.class.functionClass.java:15    at com.java.class.functionClass.java:15    at com.java.class.functionClass.java:15    at com.java.class.functionClass.java:15    at com.java.class.functionClass.java:15    at java.lang.Thread.run(Thread.java:745)
                  2016-08-07 05:26:54,704 Item1-8608  DEBUG   Logger2 Something happened
                  2016-08-07 05:26:54,704 Item1-8608  DEBUG   Logger2 Last attempt not useful TERM3
                  2016-08-07 05:26:54,719 Place3-1438 DEBUG   Logger2 delete successful for thething{id=123456}
                  2016-08-07 05:26:54,719 Item1-8608  DEBUG   Logger2 Successfully closed TERM2
                  2016-08-07 05:26:54,735 Place3-1438 DEBUG   Logger2 Sending somthing place://thing.rpc:15478
                  2016-08-07 05:26:54,750 Item1-8607  DEBUG   Logger2 stuff(12345) Cmd: 30 10 49 38 e4 a8 00 12 1a 4b 1e  
                  2016-08-07 05:26:54,782 Item1-8605  DEBUG   Logger2 function(TERM1) Rsp: 30 10 49 38 e4 a8 00 12 1a 4b 1e
                  2016-08-07 05:26:55,297 Place3-1463 DEBUG   Logger2 delete successful
                  2016-08-07 05:26:55,329 Place3-1463 ERROR   Logger3 Error  on TERM2.
                  java.util.concurrent.TimeoutException    at com.java.class.functionClass.java:15    at com.java.class.functionClass.java:15    at com.java.class.functionClass.java:15    at com.java.class.functionClass.java:15    at com.java.class.functionClass.java:15    at com.java.class.functionClass.java:15    at com.java.class.functionClass.java:15    at java.lang.Thread.run(Thread.java:745)
                  2016-08-07 05:26:54,750 Item1-8607  DEBUG   Logger2 stuff(50509) Cmd: 30 10 49 38 e4 a8 00 12 1a 4b 1e  
                  2016-08-07 05:26:54,782 Item1-8605  DEBUG   Logger2 function(TERM1) Rsp: 30 10 49 38 e4 a8 00 12 1a 4b 1e
                  

                  Then, the second S/R gets rid of all the unwanted lines :

                  SEARCH : (?-is)^(?!.*(java|TERM[123])).*\R(?:\h*java.*\R)?

                  REPLACE : Leave EMPTY

                  and gives the resulting text, below :

                  2016-08-07 05:26:54,704 Item1-8608  DEBUG   Logger2 Last attempt not useful TERM3
                  2016-08-07 05:26:54,719 Item1-8608  DEBUG   Logger2 Successfully closed TERM2
                  2016-08-07 05:26:54,782 Item1-8605  DEBUG   Logger2 function(TERM1) Rsp: 30 10 49 38 e4 a8 00 12 1a 4b 1e
                  2016-08-07 05:26:55,329 Place3-1463 ERROR   Logger3 Error  on TERM2.
                  java.util.concurrent.TimeoutException    at com.java.class.functionClass.java:15    at com.java.class.functionClass.java:15    at com.java.class.functionClass.java:15    at com.java.class.functionClass.java:15    at com.java.class.functionClass.java:15    at com.java.class.functionClass.java:15    at com.java.class.functionClass.java:15    at java.lang.Thread.run(Thread.java:745)
                  2016-08-07 05:26:54,782 Item1-8605  DEBUG   Logger2 function(TERM1) Rsp: 30 10 49 38 e4 a8 00 12 1a 4b 1e
                  

                  Finally, the simple S/R, below :

                  SEARCH : \h+at\x20

                  REPLACE : \r\n$0

                  restores the original appearance of the Java error report block!

                  2016-08-07 05:26:54,704 Item1-8608  DEBUG   Logger2 Last attempt not useful TERM3
                  2016-08-07 05:26:54,719 Item1-8608  DEBUG   Logger2 Successfully closed TERM2
                  2016-08-07 05:26:54,782 Item1-8605  DEBUG   Logger2 function(TERM1) Rsp: 30 10 49 38 e4 a8 00 12 1a 4b 1e
                  2016-08-07 05:26:55,329 Place3-1463 ERROR   Logger3 Error  on TERM2.
                  java.util.concurrent.TimeoutException
                      at com.java.class.functionClass.java:15
                      at com.java.class.functionClass.java:15
                      at com.java.class.functionClass.java:15
                      at com.java.class.functionClass.java:15
                      at com.java.class.functionClass.java:15
                      at com.java.class.functionClass.java:15
                      at com.java.class.functionClass.java:15
                      at java.lang.Thread.run(Thread.java:745)
                  2016-08-07 05:26:54,782 Item1-8605  DEBUG   Logger2 function(TERM1) Rsp: 30 10 49 38 e4 a8 00 12 1a 4b 1e
                  

                  This text contains lines 12, 14, 17, 19 and 30 of your original text, which contain one of the words TERM[123], as well as the 9 lines of the second Java error report, lines 20 to 28 :-))


                  NOTES on the regexes :

                  • In the first S/R, \R stands for any kind of EOL character(s), and \1, in the replacement, represents the string at with its leading blank characters and trailing space, stored as group 1 in the search part

                  • The (?-is) syntax, at the start of the second S/R, forces the regex engine to perform a case-sensitive search and to treat the dot as matching a single standard character only (never an EOL character)

                  • Then, the negative look-ahead ^(?!.*(java|TERM[123])) succeeds only at the beginning of a line that contains NONE of the strings “java”, “TERM1”, “TERM2” or “TERM3”, in this exact case

                  • The following .*\R part matches the entire contents of that line, with its EOL character(s)

                  • Finally, the optional non-capturing group (?:\h*java.*\R)? matches the joined line of a POSSIBLE Java error report that follows; as it is NOT preceded by a line containing TERM[123], it must also be deleted

                  • In the third S/R, $0 represents the whole match, that is to say, the word at with its leading blank characters and trailing space
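
As a sanity check, the three S/R above can be reproduced with Python's `re` module. This is a rough translation, not the Notepad++ (Boost) regexes themselves: `\R` becomes `\n` and `\h` becomes `[ \t]`, which assumes `\n` line endings and only spaces or tabs as blanks. `filter_log` is a hypothetical name, and the sample is a shortened version of the log above.

```python
import re

SAMPLE = (
    "2016-08-07 05:26:54,688 ERROR Error on randomThing\n"
    "java.util.concurrent.TimeoutException\n"
    "    at com.java.class.functionClass.java:15\n"
    "    at java.lang.Thread.run(Thread.java:745)\n"
    "2016-08-07 05:26:54,704 DEBUG Something happened\n"
    "2016-08-07 05:26:54,704 DEBUG Last attempt not useful TERM3\n"
    "2016-08-07 05:26:55,329 ERROR Error on TERM2.\n"
    "java.util.concurrent.TimeoutException\n"
    "    at com.java.class.functionClass.java:15\n"
    "    at java.lang.Thread.run(Thread.java:745)\n"
    "2016-08-07 05:26:54,782 DEBUG function(TERM1) Rsp: 30 10 49\n"
)

def filter_log(text):
    # Step 1: join each "at ..." line onto the previous line.
    text = re.sub(r"\n([ \t]+at )", r"\1", text)
    # Step 2: drop lines with neither "java" nor TERM1/2/3, plus a joined
    # Java error line that directly follows such a line.
    text = re.sub(
        r"(?m)^(?!.*(?:java|TERM[123])).*\n(?:[ \t]*java.*\n)?", "", text
    )
    # Step 3: put every "at ..." back on its own line.
    return re.sub(r"[ \t]+at ", lambda m: "\n" + m.group(0), text)

print(filter_log(SAMPLE), end="")
```

As in the original S/R, step 3 splits on any blank-separated "at ", so a kept log line that happens to contain the word "at" would also be broken in two; the sample avoids that case.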


                  Best regards,

                  guy038

                  P.S. :

                  In addition, the regexes, given above, still work if all your text, itself, is already indented with some blank characters :-))

                  • SnoringFrogS
                    SnoringFrog @guy038
                    last edited by SnoringFrog

                    @guy038 This does seem like the best method I have available right now. I ended up changing some of your regexes a bit, but the basic idea was the same. Collapsing the java errors temporarily was definitely the trick I needed.

                    In case you are interested, here are the steps I used:

                    First S/R:
                    SEARCH: \R([^2])
                    REPLACE: qzqqz\1

                    I noticed a few lines in some java errors that did not begin with spaces (which my example does not account for), but I also know that java errors are the only lines in these logs that do not begin with “2”, so I used that to find them. Since I could not depend on \h to re-expand the errors later, I inserted “qzqqz” (a dummy value I am confident will not appear anywhere in the log) so that I could find these again later.

                    Second S/R:
                    SEARCH: ^(?!.*(TERM1|TERM2|TERM3)).*\R
                    REPLACE: leave empty

                    Because the change in the first S/R now also makes the Java errors get merged into the line that precedes them, I only need to search for lines that do not contain my search terms and remove them. This leaves me with only the lines I care about, including any associated Java errors.

                    Third S/R:
                    SEARCH: qzqqz
                    REPLACE: \r\n

                    This just replaces the dummy value that I used in the first S/R with the line break it represented, returning the errors to their original formatting.
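
For anyone who wants to try the sentinel variant outside Notepad++, here is a minimal Python sketch of the same three steps. It assumes `\n` line endings (so `\R` becomes `\n`) and that the dummy value never occurs in the log; `filter_log_with_sentinel` is a hypothetical name and the sample is a shortened log.

```python
import re

SENTINEL = "qzqqz"  # dummy value from the post, assumed absent from the log

SAMPLE = (
    "2016-08-07 05:26:54,688 ERROR Error on randomThing\n"
    "java.util.concurrent.TimeoutException\n"
    "    at com.java.class.functionClass.java:15\n"
    "2016-08-07 05:26:54,704 DEBUG Something happened\n"
    "2016-08-07 05:26:54,704 DEBUG Last attempt not useful TERM3\n"
    "2016-08-07 05:26:55,329 ERROR Error on TERM2.\n"
    "java.util.concurrent.TimeoutException\n"
    "    at com.java.class.functionClass.java:15\n"
    "2016-08-07 05:26:54,782 DEBUG function(TERM1) Rsp: 30 10 49\n"
)

def filter_log_with_sentinel(text):
    # Step 1: glue every line that does not start with "2" onto the line before it.
    text = re.sub(r"\n([^2])", SENTINEL + r"\1", text)
    # Step 2: remove every (merged) line that mentions none of the terms.
    text = re.sub(r"(?m)^(?!.*(?:TERM1|TERM2|TERM3)).*\n", "", text)
    # Step 3: turn the sentinel back into the line break it stood for.
    return text.replace(SENTINEL, "\n")

print(filter_log_with_sentinel(SAMPLE), end="")
```

Because the errors are merged into the preceding line before filtering, step 2 needs no special case for Java lines, which is the simplification described above.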

                    Once again, thanks! Your method definitely helped me a lot.

                    • guy038G
                      guy038
                      last edited by guy038

                      Hi, All,

                      Following an e-mail from Scott Sumner, who suggested a possible connection between SnoringFrog’s problem and the newer post below, from 11/16/16:

                      https://notepad-plus-plus.org/community/topic/12710/marked-text-manipulation/8

                      we can, indeed, use the general method described there to build another regex, which achieves the same goal :-))

                      So, starting with SnoringFrog’s original text, below :

                      2016-08-07 05:26:54,688 Place3-1463 ERROR   Logger3 Error  on randomThing
                      java.util.concurrent.TimeoutException
                          at com.java.class.functionClass.java:15
                          at com.java.class.functionClass.java:15
                          at com.java.class.functionClass.java:15
                          at com.java.class.functionClass.java:15
                          at com.java.class.functionClass.java:15
                          at com.java.class.functionClass.java:15
                          at com.java.class.functionClass.java:15
                          at java.lang.Thread.run(Thread.java:745)
                      2016-08-07 05:26:54,704 Item1-8608  DEBUG   Logger2 Something happened
                      2016-08-07 05:26:54,704 Item1-8608  DEBUG   Logger2 Last attempt not useful TERM3
                      2016-08-07 05:26:54,719 Place3-1438 DEBUG   Logger2 delete successful for thething{id=123456}
                      2016-08-07 05:26:54,719 Item1-8608  DEBUG   Logger2 Successfully closed TERM2
                      2016-08-07 05:26:54,735 Place3-1438 DEBUG   Logger2 Sending somthing place://thing.rpc:15478
                      2016-08-07 05:26:54,750 Item1-8607  DEBUG   Logger2 stuff(12345) Cmd: 30 10 49 38 e4 a8 00 12 1a 4b 1e  
                      2016-08-07 05:26:54,782 Item1-8605  DEBUG   Logger2 function(TERM1) Rsp: 30 10 49 38 e4 a8 00 12 1a 4b 1e
                      2016-08-07 05:26:55,297 Place3-1463 DEBUG   Logger2 delete successful
                      2016-08-07 05:26:55,329 Place3-1463 ERROR   Logger3 Error  on TERM2.
                      java.util.concurrent.TimeoutException
                          at com.java.class.functionClass.java:15
                          at com.java.class.functionClass.java:15
                          at com.java.class.functionClass.java:15
                          at com.java.class.functionClass.java:15
                          at com.java.class.functionClass.java:15
                          at com.java.class.functionClass.java:15
                          at com.java.class.functionClass.java:15
                          at java.lang.Thread.run(Thread.java:745)
                      2016-08-07 05:26:54,750 Item1-8607  DEBUG   Logger2 stuff(50509) Cmd: 30 10 49 38 e4 a8 00 12 1a 4b 1e  
                      2016-08-07 05:26:54,782 Item1-8605  DEBUG   Logger2 function(TERM1) Rsp: 30 10 49 38 e4 a8 00 12 1a 4b 1e
                      

                      As he said, in his second post :

                      In English, it’s going through a log file to pick out any line that mentions any of the 3 terms ( TERM1, TERM2 or TERM3 ) I care about. If a line with one of those terms is immediately followed by a java error (the only lines in the file that do not begin with “2”), the entirety of that error is selected as well.

                      If, in addition, we consider that all this text could be indented, the regex that matches everything he wants is, therefore:

                      SEARCH : (?-is)^.+(?:TERM1|TERM2|TERM3).*\R?(?:(?s)\h*java.+?(?=^\h*2))?

                      Notes :

                      • I wrote \R? just in case a line containing the string TERM[123] ends the current file without any EOL character!

                      • The Java error block, (?s)\h*java.+?(?=^\h*2), is optional and is, therefore, written (?:(?s)\h*java.+?(?=^\h*2))?

                      • The \h syntax matches a single space ( \x20 ), tabulation ( \x09 ) or no-break space ( \xa0 )

                      • This regex contains only TWO groups, both non-capturing, with the syntax beginning with (?:....


                      So, when this regex is, now, EMBEDDED in the global regex (?s)^.*?(Your regex to match)|(?s).*\z, described in :

                      https://notepad-plus-plus.org/community/topic/12710/marked-text-manipulation/8

                      We obtain, for SnoringFrog’s problem, the final S/R below:

                      SEARCH : (?s)^.*?((?-is)^.+(?:TERM1|TERM2|TERM3).*\R?((?s)\h*java.+?(?=^\h*2))?)|(?s).*\z

                      REPLACE : (?1\1)

                      Note: In the replacement, I do not add a line break, \r\n, as the matched expression already contains these EOL characters

                      After replacement, we get, with that second method, the same expected text, below :

                      2016-08-07 05:26:54,704 Item1-8608  DEBUG   Logger2 Last attempt not useful TERM3
                      2016-08-07 05:26:54,719 Item1-8608  DEBUG   Logger2 Successfully closed TERM2
                      2016-08-07 05:26:54,782 Item1-8605  DEBUG   Logger2 function(TERM1) Rsp: 30 10 49 38 e4 a8 00 12 1a 4b 1e
                      2016-08-07 05:26:55,329 Place3-1463 ERROR   Logger3 Error  on TERM2.
                      java.util.concurrent.TimeoutException
                          at com.java.class.functionClass.java:15
                          at com.java.class.functionClass.java:15
                          at com.java.class.functionClass.java:15
                          at com.java.class.functionClass.java:15
                          at com.java.class.functionClass.java:15
                          at com.java.class.functionClass.java:15
                          at com.java.class.functionClass.java:15
                          at java.lang.Thread.run(Thread.java:745)
                      2016-08-07 05:26:54,782 Item1-8605  DEBUG   Logger2 function(TERM1) Rsp: 30 10 49 38 e4 a8 00 12 1a 4b 1e
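
The single-pass idea (match either "unwanted text followed by a wanted block" or "the unwanted tail", and write back only group 1) can also be sketched with Python's `re` module. This is an approximation under stated assumptions, not the Boost regex itself: `\n` line endings, `[ \t]` in place of `\h`, scoped flags such as `(?s:...)` (Python 3.6+) in place of the inline `(?s)` / `(?-is)` switches, and a replacement function in place of the conditional `(?1\1)`. `keep_matches` is a hypothetical name.

```python
import re

PATTERN = re.compile(
    r"^(?s:.*?)"                             # lazily skip unwanted text
    r"(^.+(?:TERM1|TERM2|TERM3).*\n?"        # group 1: a line containing a term ...
    r"(?:[ \t]*java(?s:.+?)(?=^[ \t]*2))?)"  # ... plus an optional Java error block
    r"|(?s:.*)\Z",                           # or: whatever unwanted text is left
    re.MULTILINE,
)

def keep_matches(text):
    # Write back group 1 when it took part in the match, nothing otherwise.
    return PATTERN.sub(lambda m: m.group(1) or "", text)

SAMPLE = (
    "2016-08-07 05:26:54,688 ERROR Error on randomThing\n"
    "java.util.concurrent.TimeoutException\n"
    "    at com.java.class.functionClass.java:15\n"
    "2016-08-07 05:26:54,704 DEBUG Something happened\n"
    "2016-08-07 05:26:54,704 DEBUG Last attempt not useful TERM3\n"
    "2016-08-07 05:26:55,329 ERROR Error on TERM2.\n"
    "java.util.concurrent.TimeoutException\n"
    "    at com.java.class.functionClass.java:15\n"
    "2016-08-07 05:26:54,782 DEBUG function(TERM1) Rsp: 30 10 49\n"
)

print(keep_matches(SAMPLE), end="")
```

As with the S/R above, a Java error block is only kept when a later line starting with "2" exists to end it, because of the look-ahead.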
                      

                      Best Regards

                      guy038

                      • Alan KilbornA
                        Alan Kilborn
                        last edited by

                        The marker ID used for bookmarks changed in Notepad++ 8.4.6 (and later). It is now 20, instead of 24. So, all references to 24 in this thread and/or its script(s), should be changed to 20.
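
A script that has to work on both older and newer releases could choose the ID from the version number. A minimal sketch, assuming the caller already knows the Notepad++ version as a tuple (how to obtain it is not shown here); `bookmark_marker_id` is a hypothetical helper, and only the two IDs and the 8.4.6 threshold come from the statement above.

```python
def bookmark_marker_id(npp_version):
    """Return the bookmark marker ID for a version tuple such as (8, 4, 6)."""
    # 20 from Notepad++ 8.4.6 on, 24 before that.
    return 20 if tuple(npp_version) >= (8, 4, 6) else 24

MARK_BOOKMARK = bookmark_marker_id((8, 4, 6))
```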
