Delete all lines except some lines with specific words plus their top and bottom lines
-
@Neil Schipper
The ‘1112’ is the specific word that repeats. As you can see, I need the lines containing this word, plus the two lines above and the one line below those lines.
-
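Read literally, the request is a context filter: keep each line containing the word, plus two lines before it and one line after it. A minimal Python sketch of that idea, run outside Notepad++, could look like the following. The helper names and the abbreviated sample records are hypothetical and only stand in for the real data.

```python
def keep_with_context(lines, word, before=2, after=1):
    """Keep every line containing `word`, plus `before` lines above and `after` lines below."""
    keep = set()
    for i, line in enumerate(lines):
        if word in line:
            first = max(0, i - before)
            last = min(len(lines) - 1, i + after)
            keep.update(range(first, last + 1))
    return [line for i, line in enumerate(lines) if i in keep]


def make_record(rec_id):
    """Abbreviated, hypothetical record: 2 header lines, 4 data lines, 1 closing line."""
    return (
        ["sunday 10-may-2020 00:00 cc 71", "Int SA/LK hp tp#"]
        + [f" {rec_id} SA {n}" for n in range(1, 5)]
        + ["A=25 B=<50> C=25"]
    )


lines = make_record("1109") + make_record("1112") + make_record("1086")
kept = keep_with_context(lines, " 1112 ")
print("\n".join(kept))
```

A plain substring test like this also matches the word anywhere else in a line, so it is only safe when the word cannot appear outside the lines of interest.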
Can we think about this data as consisting of records, with each record consisting of 7 lines, with the 1st line always a date, the 2nd always exactly (or something like) Int SA/LK, the next 4 lines always of the form
dddd SA ddd
(where d is a decimal digit, and with the exception that the 1st of the 4 lines has a trailing space + dblQuote), and the 7th line always of the form A=dd ...?
Can you say how uniform (regular, consistent) each element of each line of each record of each file you need to process is?
If this could be nailed down, the problem could become a fairly easy one, in which we identify (match) complete records based on reliable characteristics for keeping or removing.
-
@Neil Schipper
1 - No, there are records with 8 and 11 lines, too.
2 - The headers are completely consistent in all records.
-
@alimirzaei5778 said in Delete all lines except some lines with specific words plus their top and bottom lines:
1 - No, there are records with 8 and 11 lines, too.
And you’re not willing to elaborate on the characteristics of those additional lines, such as whether the additional lines are perfect or approximate repeats of lines 4-6 in the original 7 line records?
Records with a non-constant number of lines can still be matched and acted upon.
2 - The headers are completely consistent in all records.
And you’re expecting me to fucking guess what your definition of a header is?
Some advice:
Do the intellectual work necessary to establish a rock solid description of a record.
Don’t only communicate one or two sentences at a time like a 14 year old.
Don’t make people have to beg you for precision and completeness.
-
@alimirzaei5778
I agree entirely with @Neil-Schipper's remarks (excluding the profanity). If you want help you do need to explain more fully. Possibly English isn’t your native language and we do try to make allowances for that.
There is one thing you haven’t done and that is to put the examples inside of the “black box”. This is a method of preventing the posting engine from possibly altering the example data as it tries to format the posts. There is a pinned post at the start of the “Help Wanted” and “General Discussion” categories called “Please Read this before Posting”. It’s not there to look nice but to help the newcomer, that’s you, by showing how you might provide sufficient information to be clear enough for anyone who wants to help to do so.
So before you do any more postings, how about reading that and trying to follow those guidelines.
That said, I do think I have a solution for you; however, it relies on the example being correctly shown. As you didn’t use the “black box” method of showing the data, it’s possible your real data is “indented” (preceded by spaces) and that those spaces were removed during posting, so I will withhold my solution until you can provide more information, including the example data inside of a “black box”.
Terry
-
Here’s some good info on asking these types of questions: https://community.notepad-plus-plus.org/topic/22022/faq-desk-template-for-search-replace-questions
-
@Terry-R
@neil-schipper
First of all, I have to apologize for causing you all trouble by not following the instructions. I attach a larger piece of text
in more detail which I hope could help.sunday 10-may-2020 00:00 cc 8M+ lp 3.3 svp4.3 ts 40 +2 lr 35' SA 168 ab 50 Int SA/LK hp tp# ab cd ef# ab cd ef# ab cd ef# ab cd ef# Aab 1109 SA 19 AB 20# 20 2 2# - -# - -# - -# 14 1109 SA 20 ' A 11# 62 3 4# 60 3 3# - -# - -# 86 1109 SA 21 ' B 11# 60 2 3# 62 2 3# 81 4 5# 61 3 3# 83 1109 SA 22 C 13# 0 0 0# 0 0 0# - -# - -# 15 A=35 B=<40> C=25 sunday 10-may-2020 00:00 cc 71 lp 1.3 svp1.3 ts 63 +0 lr 63" SA 378 ab 104 Int SA/LK hp tp# ab cd ef# ab cd ef# ab cd ef# ab cd ef# Aab 1112 SA 378 " B 21# 93 6 9# 98 7 11> 104 6 11# - -# 92 1112 SA 379 B 21# 29 4 3# - -# - -# - -# 24 1112 SA 380 ^" A 12# 96 3 7# 30 2 2# - -# - -# 80 1112 SA 381 " C 16# 38 2 3# 64 4 4# - -# - -# 67 A=25 B=<50> C=25 sunday 10-may-2020 00:00 cc 31 lp 4.3 svp4.3 ts 62 +0 lr 53' SA 215 ab 64 Int SA/LK hp tp# ab cd ef# ab cd ef# ab cd ef# ab cd ef# Aab 1086 SA 178 ' A 25# 18 3 2# 48 6 6# - -# - -# 68 1086 SA 179 A 25# 44 4 5# - -# - -# - -# 61 1086 SA 180 B 19# 19 2 1# - -# - -# - -# 23 1086 SA 181 B 19# 0 0 0# - -# - -# - -# 14 1086 SA 182 B 19# 46 4 4# - -# - -# - -# 46 1086 SA 183 B 19# 49 3 4# - -# - -# - -# 38 1086 SA 184 ' C 22# 44 5 4# 40 5 4# - -# - -# 61 1086 SA 185 C 22# 29 4 3# - -# - -# - -# 44 A=35 B=25 C=<40> sunday 10-may-2020 00:00 cc 11 lp 1.1 svp1.1 ts 66 +0 lr 65" SA 38 ab 31 Int SA/LK hp tp# ab cd ef# ab cd ef# ab cd ef# ab cd ef# Aab 1116 SA 36 " AB 24# 13 2 1# 17 3 2# 29 4 3# - -# 36 1116 SA 37 ' CD 51# 9 3 2# 25 6 5# - -# - -# 41 1116 SA 38 " B 24# 23 6 2# 31 6 3# 31 7 3# - -# 38 1116 SA 39 ' D 26# 31 5 4# 30 5 4# 25 3 3# - -# 34 1116 SA 40 C 25# 24 5 3# - -# - -# - -# 31 A=1 B=<40> C=25 D=35 sunday 10-may-2020 00:00 cc 8M+ lp 1.3 svp1.3 ts 40 +0 lr 35' SA 167 ab 90 Int SA/LK hp tp# ab cd ef# ab cd ef# ab cd ef# ab cd ef# Aab 1109 SA 19 AB 26# 26 2 3# - -# - -# - -# 19 1109 SA 20 ' A 13# 69 4 5# 70 3 5# - -# - -# 74 1109 SA 21 ' B 15# 57 2 4# 53 2 4# 65 3 5# 62 4 4# 75 1109 SA 22 C 13# 34 2 2# 66 3 4# - -# - -# 35 A=30 B=<35> C=35 sunday 
10-may-2020 00:00 cc 1 lp 2.3 svp2.3 ts 92 +0 lr 76" SA 1 ab 64 Int SA/LK hp tp# ab cd ef# ab cd ef# ab cd ef# ab cd ef# Aab 1080 SA 1 D 16# 0 0 0# 34 3 2# 18 2 1# - -# 64 1080 SA 2 BC 52# 13 4 3# 29 6 8# 0 0 0# - -# 52 1080 SA 3 " AB 68# 40 14 13# 47 20 17# - -# - -# 51 1080 SA 4 A 32# 34 6 6# 0 0 0# - -# - -# 36 A=26 B=<34> C=15# D=24 sunday 10-may-2020 00:01 cc 71 lp 2.3 svp2.3 ts 61 +0 lr 50" SA 381 ab 73 Int SA/LK hp tp# ab cd ef# ab cd ef# ab cd ef# ab cd ef# Aab 1112 SA 378 " B 25# 42 5 5# 65 7 9# 62 5 8# - -# 86 1112 SA 379 B 25# 31 5 4# - -# - -# - -# 28 1112 SA 380 ^" A 13# 49 3 4# 79 4 5# - -# - -# 77 1112 SA 381 " C 18# 73 3 6# 53 4 4# - -# - -# 74 A=35 B=<40> C=25 sunday 10-may-2020 00:01 cc 31 lp 4.3 svp4.3 ts 50 +0 lr 50' SA 215 ab 63 Int SA/LK hp tp# ab cd ef# ab cd ef# ab cd ef# ab cd ef# Aab 1086 SA 178 ' A 24# 22 3 2# 56 6 7# - -# - -# 58 1086 SA 179 A 24# 49 6 5# - -# - -# - -# 52 1086 SA 180 B 19# 19 2 1# - -# - -# - -# 26 1086 SA 181 B 19# 0 0 0# - -# - -# - -# 5 1086 SA 182 B 19# 46 4 4# - -# - -# - -# 51 1086 SA 183 B 19# 49 3 4# - -# - -# - -# 49 1086 SA 184 ' C 23# 27 2 3# 46 6 5# - -# - -# 50 1086 SA 185 C 23# 34 4 3# - -# - -# - -# 39 A=35 B=25 C=<40> sunday 10-may-2020 00:01 cc 8M+ lp 1.3 svp1.3 ts 40 +0 lr 35' SA 167 ab 90 Int SA/LK hp tp# ab cd ef# ab cd ef# ab cd ef# ab cd ef# Aab 1109 SA 19 AB 26# 26 2 3# - -# - -# - -# 23 1109 SA 20 ' A 13# 69 4 5# 70 3 5# - -# - -# 60 1109 SA 21 ' B 15# 57 2 4# 53 2 4# 65 3 5# 62 4 4# 71 1109 SA 22 C 13# 34 2 2# 66 3 4# - -# - -# 55 A=30 B=<35> C=35 sunday 10-may-2020 00:01 cc 11 lp 1.1 svp1.1 ts 60 +0 lr 60" SA 37 ab 46 Int SA/LK hp tp# ab cd ef# ab cd ef# ab cd ef# ab cd ef# Aab 1116 SA 36 " AB 27# 11 2 1# 25 5 3# 15 4 1# - -# 32 1116 SA 37 " CD 58# 11 4 2# 46 13 10# - -# - -# 47 1116 SA 38 " B 27# 30 4 3# 34 4 4# 35 10 4# - -# 33 1116 SA 39 " D 27# 30 5 4# 37 6 5# 31 3 4# - -# 36 1116 SA 40 C 31# 37 8 5# - -# - -# - -# 46 A=1 B=<40> C=25 D=35 sunday 10-may-2020 00:02 cc 8M+ lp 4.3 svp4.3 ts 
49-10 lr 50' SA 167 ab 90 Int SA/LK hp tp# ab cd ef# ab cd ef# ab cd ef# ab cd ef# Aab 1109 SA 19 AB 38# 14 3 2# - -# - -# - -# 21 1109 SA 20 ' A 12# 99 6 7# 58 4 4# - -# - -# 72 1109 SA 21 ' B 28# 0 0 0# 22 2 3# 39 4 6# 21 3 3# 73 1109 SA 22 C 14# 41 2 3# 57 3 3# - -# - -# 65 A=30 B=<45> C=25 sunday 10-may-2020 00:02 cc 71 lp 2.3 svp3.3 ts 55 +0 lr 50" SA 381 ab 73 Int SA/LK hp tp# ab cd ef# ab cd ef# ab cd ef# ab cd ef# Aab 1112 SA 378 " B 25# 42 5 5# 65 7 9# 62 5 8# - -# 74 1112 SA 379 B 25# 31 5 4# - -# - -# - -# 31 1112 SA 380 ^' A 16# 39 3 4# 39 2 3# - -# - -# 59 1112 SA 381 " C 18# 73 3 6# 53 4 4# - -# - -# 84 A=35 B=<40> C=25 sunday 10-may-2020 00:02 cc 31 lp 2.3 svp2.3 ts 45 +0 lr 43' SA 215 ab 65 Int SA/LK hp tp# ab cd ef# ab cd ef# ab cd ef# ab cd ef# Aab 1086 SA 178 ' A 24# 22 3 2# 56 6 7# - -# - -# 65 1086 SA 179 A 24# 49 6 5# - -# - -# - -# 57 1086 SA 180 B 18# 0 0 0# - -# - -# - -# 12 1086 SA 181 B 18# 0 0 0# - -# - -# - -# 0 1086 SA 182 B 18# 35 3 3# - -# - -# - -# 51 1086 SA 183 B 18# 60 3 4# - -# - -# - -# 68 1086 SA 184 ' C 23# 27 2 3# 46 6 5# - -# - -# 53 1086 SA 185 C 23# 34 4 3# - -# - -# - -# 38 A=35 B=30 C=<35> sunday 10-may-2020 00:02 cc 1 lp 3.3 svp3.3 ts 83 +0 lr 74" SA 3 ab 51 Int SA/LK hp tp# ab cd ef# ab cd ef# ab cd ef# ab cd ef# Aab 1080 SA 1 D 18# 42 2 3# 36 2 3# 28 2 2# - -# 49 1080 SA 2 BC 45# 21 4 5# 63 13 14# 0 0 0# - -# 61 1080 SA 3 " AB 56# 35 9 10# 48 13 14# - -# - -# 55 1080 SA 4 A 28# 46 7 7# 0 0 0# - -# - -# 39 A=28 B=<34> C=15# D=20 sunday 10-may-2020 00:02 cc 11 lp 2.1 svp2.1 ts 60 +0 lr 60" SA 40 ab 74 Int SA/LK hp tp# ab cd ef# ab cd ef# ab cd ef# ab cd ef# Aab 1116 SA 36 " AB 27# 34 3 4# 59 4 7# 49 4 5# - -# 50 1116 SA 37 " CD 35# 27 4 3# 54 8 7# - -# - -# 53 1116 SA 38 " B 27# 39 4 4# 35 4 4# 24 3 3# - -# 39 1116 SA 39 " D 17# 34 2 3# 34 2 3# 32 2 3# - -# 37 1116 SA 40 C 18# 74 5 6# - -# - -# - -# 69 A=1 B=<45> C=25 D=30 sunday 10-may-2020 00:02 cc 8M+ lp 4.3 svp2.3 ts 54 -6 lr 35' SA 166 ab 90 Int SA/LK hp tp# ab cd 
ef# ab cd ef# ab cd ef# ab cd ef# Aab 1109 SA 19 ' AB 33# 39 3 6# - -# - -# - -# 29 1109 SA 20 ' A 12# 70 3 5# 28 2 2# - -# - -# 69 1109 SA 21 ' B 23# 10 2 1# 36 3 4# 42 4 5# 42 3 5# 67 1109 SA 22 C 13# 0 0 0# 28 2 2# - -# - -# 50 A=30 B=<45> C=25 sunday 10-may-2020 00:03 cc 31 lp 2.3 svp2.3 ts 45 +0 lr 45" SA 179 ab 72 Int SA/LK hp tp# ab cd ef# ab cd ef# ab cd ef# ab cd ef# Aab 1086 SA 178 ' A 18# 52 4 4# 44 4 4# - -# - -# 64 1086 SA 179 ' A 18> 101 4 8# - -# - -# - -# 84 1086 SA 180 B 13> 102 4 5# - -# - -# - -# 51 1086 SA 181 B 13# 66 2 3# - -# - -# - -# 30 1086 SA 182 B 13# 75 4 4# - -# - -# - -# 61 1086 SA 183 B 13# 76 3 4# - -# - -# - -# 74 1086 SA 184 " C 23# 54 6 6# 67 7 8# - -# - -# 79 1086 SA 185 C 23# 61 6 6# - -# - -# - -# 66 A=35 B=30 C=<35> sunday 10-may-2020 00:03 cc 71 lp 1.3 svp1.3 ts 57 +0 lr 73" SA 378 ab 126 Int SA/LK hp tp# ab cd ef# ab cd ef# ab cd ef# ab cd ef# Aab 1112 SA 378 " B 16# 66 4 5> 111 4 9> 126 6 10# - -# 92 1112 SA 379 B 16# 39 4 3# - -# - -# - -# 35 1112 SA 380 ^' A 17# 49 5 5# 50 2 4# - -# - -# 48 1112 SA 381 " C 16# 15 2 1# 31 3 2# - -# - -# 63 A=25 B=<50> C=25 sunday 10-may-2020 00:03 cc 11 lp 2.1 svp3.1 ts 60 +0 lr 60" SA 39 ab 79 Int SA/LK hp tp# ab cd ef# ab cd ef# ab cd ef# ab cd ef# Aab 1116 SA 36 " AB 25# 17 2 2# 33 3 3# 38 3 3# - -# 50 1116 SA 37 " CD 34# 26 3 3# 52 6 6# - -# - -# 57 1116 SA 38 " B 25# 42 4 4# 76 7 8# 55 4 6# - -# 57 1116 SA 39 " D 20# 79 4 7# 63 5 7# 67 5 7# - -# 60 1116 SA 40 C 14# 30 3 2# - -# - -# - -# 58 A=1 B=<45> C=25 D=30 sunday 10-may-2020 00:03 cc 31 lp 4.3 svp4.3 ts 45 +0 lr 37' SA 215 ab 80 Int SA/LK hp tp# ab cd ef# ab cd ef# ab cd ef# ab cd ef# Aab 1086 SA 178 ' A 16# 20 2 1# 60 5 5# - -# - -# 62 1086 SA 179 A 16# 56 3 4# - -# - -# - -# 77 1086 SA 180 B 14# 0 0 0# - -# - -# - -# 34 1086 SA 181 B 14# 27 2 1# - -# - -# - -# 34 1086 SA 182 B 14# 30 2 2# - -# - -# - -# 49 1086 SA 183 B 14# 0 0 0# - -# - -# - -# 43 1086 SA 184 ' C 17# 0 0 0# 60 3 5# - -# - -# 81 1086 SA 185 C 17# 53 4 4# - -# 
- -# - -# 70 A=35 B=25 C=<40> sunday 10-may-2020 00:03 cc 1 lp 3.3 svp1.3 ts 66 +0 lr 66" SA 2 ab 61 Int SA/LK hp tp# ab cd ef# ab cd ef# ab cd ef# ab cd ef# Aab 1080 SA 1 D 17# 39 2 3# 84 4 6# 86 5 6# - -# 60 1080 SA 2 ' BC 44# 36 6 8# 75 14 16# 0 0 0# - -# 84 1080 SA 3 " AB 52# 32 8 8# 35 11 9# - -# - -# 48 1080 SA 4 A 24# 40 4 5# 0 0 0# - -# - -# 37 A=28 B=<29> C=15# D=20 sunday 10-may-2020 00:03 cc 8M+ lp 4.3 svp4.3 ts 53 +0 lr 39' SA 166 ab 84 Int SA/LK hp tp# ab cd ef# ab cd ef# ab cd ef# ab cd ef# Aab 1109 SA 19 AB 26# 17 2 2# - -# - -# - -# 24 1109 SA 20 ' A 11# 79 4 5# 69 3 4# - -# - -# 68 1109 SA 21 ' B 17# 0 0 0# 54 3 4# 77 4 7# 39 3 3# 71 1109 SA 22 C 13# 68 2 4# 36 2 2# - -# - -# 57 A=30 B=<45> C=25
The output must be:
sunday 10-may-2020 00:00 cc 71 lp 1.3 svp1.3 ts 63 +0 lr 63" SA 378 ab 104
Int SA/LK hp tp# ab cd ef# ab cd ef# ab cd ef# ab cd ef# Aab
 1112 SA 378 " B 21# 93 6 9# 98 7 11> 104 6 11# - -# 92
 1112 SA 379 B 21# 29 4 3# - -# - -# - -# 24
 1112 SA 380 ^" A 12# 96 3 7# 30 2 2# - -# - -# 80
 1112 SA 381 " C 16# 38 2 3# 64 4 4# - -# - -# 67
A=25 B=<50> C=25
sunday 10-may-2020 00:01 cc 71 lp 2.3 svp2.3 ts 61 +0 lr 50" SA 381 ab 73
Int SA/LK hp tp# ab cd ef# ab cd ef# ab cd ef# ab cd ef# Aab
 1112 SA 378 " B 25# 42 5 5# 65 7 9# 62 5 8# - -# 86
 1112 SA 379 B 25# 31 5 4# - -# - -# - -# 28
 1112 SA 380 ^" A 13# 49 3 4# 79 4 5# - -# - -# 77
 1112 SA 381 " C 18# 73 3 6# 53 4 4# - -# - -# 74
A=35 B=<40> C=25
sunday 10-may-2020 00:02 cc 71 lp 2.3 svp3.3 ts 55 +0 lr 50" SA 381 ab 73
Int SA/LK hp tp# ab cd ef# ab cd ef# ab cd ef# ab cd ef# Aab
 1112 SA 378 " B 25# 42 5 5# 65 7 9# 62 5 8# - -# 74
 1112 SA 379 B 25# 31 5 4# - -# - -# - -# 31
 1112 SA 380 ^' A 16# 39 3 4# 39 2 3# - -# - -# 59
 1112 SA 381 " C 18# 73 3 6# 53 4 4# - -# - -# 84
A=35 B=<40> C=25
sunday 10-may-2020 00:03 cc 71 lp 1.3 svp1.3 ts 57 +0 lr 73" SA 378 ab 126
Int SA/LK hp tp# ab cd ef# ab cd ef# ab cd ef# ab cd ef# Aab
 1112 SA 378 " B 16# 66 4 5> 111 4 9> 126 6 10# - -# 92
 1112 SA 379 B 16# 39 4 3# - -# - -# - -# 35
 1112 SA 380 ^' A 17# 49 5 5# 50 2 4# - -# - -# 48
 1112 SA 381 " C 16# 15 2 1# 31 3 2# - -# - -# 63
A=25 B=<50> C=25
-
A good amount of before and after text is a very big help for these kinds of problems, but it doesn’t eliminate the need to think about which features of the text are guaranteed to be present, so that they can be used as match criteria.
You continue to not offer much guidance, so I made my own simplifying assumptions about each record:
- Line 1 always starts with <any 3-5 letters><day><space> and we won’t care about the rest (so assumption is there will never be lines that aren’t headers that start this exact way)
- Line 2 will be non-empty but otherwise we won’t care about contents
- Between 4 & 8 lines all of which start with <space><number used to identify records to keep><space> and we won’t care about the rest of the line
- Final line always starts with “A=” and we won’t care about the rest
Thus:
Ctrl+H
Find what: ^\w{3,5}day .*?\R.+?\R((?! 1112).*?\R){4,8}A=.*?(\R|\z)
Replace with: completely empty, checked that there are no spaces
Mode=regex; option box unchecked
Then: Replace All
You can alter the to-keep records by carefully altering just the numeric string in the expression that is now 1112.
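For anyone who wants to try the same idea outside Notepad++, here is a hedged Python sketch of the expression above. Python's re module does not support \R, so \r?\n stands in for it, and \Z is used in place of \z; the capturing groups are made non-capturing, which does not change what is deleted. The make_record helper and its abbreviated records are hypothetical and only mimic the shape of the sample data.

```python
import re

KEEP_ID = "1112"  # records whose data lines start with " 1112" survive

pattern = re.compile(
    r"^\w{3,5}day .*?\r?\n"                    # line 1: 3-5 word chars, "day", space
    r".+?\r?\n"                                # line 2: any non-empty line
    r"(?:(?! " + KEEP_ID + r").*?\r?\n){4,8}"  # 4 to 8 lines not starting with " 1112"
    r"A=.*?(?:\r?\n|\Z)",                      # final line starts with "A="
    re.MULTILINE,
)


def make_record(rec_id, n_data_lines):
    """Abbreviated, hypothetical record shaped like the sample in this thread."""
    lines = ["sunday 10-may-2020 00:00 cc 71", "Int SA/LK hp tp# ab cd ef# Aab"]
    lines += [f" {rec_id} SA {n}" for n in range(1, n_data_lines + 1)]
    lines.append("A=25 B=<50> C=25")
    return "\n".join(lines) + "\n"


sample = make_record("1109", 4) + make_record("1112", 4) + make_record("1086", 8)
result = pattern.sub("", sample)  # same effect as Replace All with an empty replacement
print(result)
```

The negative lookahead is what protects the 1112 record: a match needs at least four consecutive lines that do not start with " 1112" right after the two header lines, so a header followed by 1112 lines can never start a match.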
-
@neil-schipper said in Delete all lines except some lines with specific words plus their top and bottom lines:
^\w{3,5}day .?\R.+?\R((?! 1112).?\R){4,8}A=.*?(\R|\z)
When I try the suggested solution on the sample data provided by the OP, it retains the lines with 1116 in them. :-(
-
Super interesting.
I get just the four “1112” records, and I just rechecked after freshly copying the regex from my last post (the one embedded in your last post underwent asterisk chomping) and I get the same happy output. Checked on both v8.1.9 (32-bit) and v8.3.3 (64-bit).
Wormhole in the time dilating quark engine plasmatron, gotta be.
-
Hmm, I checked back to the source data I copied for the starting point (since I still had it). It appears to be “messed up”, not sure how it got that way, but user headspace error (mine) is likely to blame.
Copying a fresh set of data, your solution does indeed seem to work. Sorry for the misfire.
-
@Neil-Schipper
Thank you very much indeed. It works great.
-
Hello, @alimirzaei5778, @neil-schipper, @terry-r, @alan-kilborn and All,
We can even speed up the regex search with this syntax:
(?-is)^sunday.+\R.+\R(?:(?! 1112).+\R){4,}A=.+\R?
Indeed, if there are several consecutive blocks without the string \x201112 at the beginning of lines, it will select all these blocks in one go!
You could ask: why does it work that way? Well, the regex part (?:(?! 1112).+\R){4,} finds any consecutive range of lines which do not begin with \x201112
And that is particularly the case for the lines:
- Beginning with sunday
- Beginning with Int
- Beginning with A=
Luckily, this accumulation of matched lines stops as soon as it meets a line beginning with \x201112. But, as it must also satisfy the end of the regex, A=.+\R?, the regex engine is forced to backtrack 3 lines (by decreasing the quantifier count by 3, one at a time), in order to match the last line of the previous block, beginning with A=!
Best Regards,
guy038
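The "in one go" behaviour can be checked with a small Python sketch. As assumptions of this sketch, \R is written as \r?\n, \z as \Z, and the leading (?-is) is dropped because Python's re is already case-sensitive and its dot does not match newlines by default. The make_record helper and its abbreviated records are hypothetical.

```python
import re

# Per-record variant ({4,8}) and the open-ended variant ({4,}), translated for Python.
per_record = re.compile(
    r"^\w{3,5}day .*?\r?\n.+?\r?\n(?:(?! 1112).*?\r?\n){4,8}A=.*?(?:\r?\n|\Z)",
    re.MULTILINE,
)
open_ended = re.compile(
    r"^sunday.+\r?\n.+\r?\n(?:(?! 1112).+\r?\n){4,}A=.+(?:\r?\n)?",
    re.MULTILINE,
)


def make_record(rec_id, n_data_lines):
    """Abbreviated, hypothetical record shaped like the sample in this thread."""
    lines = ["sunday 10-may-2020 00:00 cc 71", "Int SA/LK hp tp# ab cd ef# Aab"]
    lines += [f" {rec_id} SA {n}" for n in range(1, n_data_lines + 1)]
    lines.append("A=25 B=<50> C=25")
    return "\n".join(lines) + "\n"


sample = (
    make_record("1109", 4) + make_record("1086", 8)    # two blocks to delete
    + make_record("1112", 4)                           # block to keep
    + make_record("1116", 5) + make_record("1109", 4)  # two more blocks to delete
)

per_record_matches = [m.group(0) for m in per_record.finditer(sample)]
open_ended_matches = [m.group(0) for m in open_ended.finditer(sample)]
print(len(per_record_matches), len(open_ended_matches))
```

On this sample the per-record expression needs one match per deleted block, while the open-ended one swallows each run of neighbouring blocks as a single match; both leave the same text behind after replacement.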
-