How to delete ‘placeholder lines’ from a log file
-
Fellow Notepad++ Users,
Could you please help me understand how to properly create the syntax for this? I’m just not “getting it”, no matter how many posts I read here with similar circumstances.
I have to edit daily input logs for a radio station that I consult on programming. The logs are exported from music scheduling software incremented by one minute for each event (because the automation software reads and executes the logs sequentially). In order for events like commercial breaks to fall at the correct minutes for the automation to insert them from that other log, there are “placeholder lines” for each unused minute. However, those placeholders have to be removed before the log is loaded into the automation.
Here is a sample section of a log (“before” data):
09:01:00+J8 Legal ID / AUDIO 00:08
09:02:00+22405 THRILLER /MICHAEL JACKSON MUSIC 05:16
09:03:00+20913 THE SAFETY DANCE /MEN WITHOUT HATS MUSIC 03:00
09:04:00+MMS5 Music Marathon Sw/ AUDIO 00:10
09:05:00+22420 SHADOWS OF THE NI/PAT BENATAR MUSIC 03:32
09:06:00+10204 GOOD THING /FINE YOUNG CANNI MUSIC 03:07
09:07:00+J7 Sweeper - The Eig/ AUDIO 00:07
09:08:00+22116 TAKE ME HOME TONI/EDDIE MONEY MUSIC 03:23
09:09:00+33210 TOO MUCH TIME ON /STYX MUSIC 04:22
09:10:00+ 10:00 AUDIO 00:00
09:11:00+ 11:00 AUDIO 00:00
09:12:00+ 12:00 AUDIO 00:00
09:13:00+ 13:00 AUDIO 00:00
09:14:00+ 14:00 AUDIO 00:00
09:15:00+ 15:00 AUDIO 00:00
09:16:00+ 16:00 AUDIO 00:00
09:17:00+ 17:00 AUDIO 00:00
09:18:00+MME2 Music Marathon En/ AUDIO 00:05
09:19:00+ 19:00 AUDIO 00:00
09:20:00+ 20:00 AUDIO 00:00
09:21:00+ 21:00 AUDIO 00:00
09:22:00+ 22:00 AUDIO 00:00
09:23:00+ 23:00 AUDIO 00:00
09:24:00+J4F Shotgun 4 / AUDIO 00:05
09:25:00+21806 TAKE MY BREATH AW/BERLIN MUSIC 04:03
09:26:00+20802 LET'S DANCE /DAVID BOWIE MUSIC 03:53
Here is how that log needs to look for the automation to accept and run it:
09:01:00+J8 Legal ID / AUDIO 00:08
09:02:00+22405 THRILLER /MICHAEL JACKSON MUSIC 05:16
09:03:00+20913 THE SAFETY DANCE /MEN WITHOUT HATS MUSIC 03:00
09:04:00+MMS5 Music Marathon Sw/ AUDIO 00:10
09:05:00+22420 SHADOWS OF THE NI/PAT BENATAR MUSIC 03:32
09:06:00+10204 GOOD THING /FINE YOUNG CANNI MUSIC 03:07
09:07:00+J7 Sweeper - The Eig/ AUDIO 00:07
09:08:00+22116 TAKE ME HOME TONI/EDDIE MONEY MUSIC 03:23
09:09:00+33210 TOO MUCH TIME ON /STYX MUSIC 04:22
09:18:00+MME2 Music Marathon En/ AUDIO 00:05
09:24:00+J4F Shotgun 4 / AUDIO 00:05
09:25:00+21806 TAKE MY BREATH AW/BERLIN MUSIC 04:03
09:26:00+20802 LET'S DANCE /DAVID BOWIE MUSIC 03:53
I have tried using the Mark and Bookmark commands to find all instances of :00 but that marks every line with that string in it, including the lines that need to remain. And because each line also has to have nine spaces at the end, I can’t look for :00 with a space, either, because that would mark any line where the event length happens to end at a precise minute.
If I knew what I was doing, I would think that formatting the Mark search to only match the :00 string if it appears a specific number of characters into each line (23, in this case) … but I do not understand the syntax well enough to do that.
What I am hoping for is a solution that will accomplish this and let me learn more about doing this sort of thing in the future.
Thank you.
-
@K-M-Richards said in I don't understand the command line syntax to accomplish this:
If I knew what I was doing, I would think that formatting the Mark search to only match the :00 string if it appears a specific number of characters into each line (23, in this case) … but I do not understand the syntax well enough to do that.
It’s good that you have tried something. As you said, you only want to mark the line with those characters at specific columns.
Try using
^.{22}:00
The first character (^) denotes the start of a line, but does not consume any characters. The dot then matches any single character, whatever it is. The number in curly brackets that follows tells the regular expression exactly how many times the dot is to match, so .{22} consumes 22 characters. The next 3 characters are then the ones you seek, provided you counted correctly.
Try that and see how you go. Be aware that search mode MUST be regular expression as we are using meta characters that are recognized as meaning something special.
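For anyone who wants to experiment with the column idea outside Notepad++, here is a minimal Python sketch. The two sample lines are built by hand with a hypothetical fixed-width layout (the forum display collapses the real spacing, so the padding widths are assumptions), chosen so that the placeholder has its :00 exactly 22 characters in.

```python
import re

# Terry's expression: 22 arbitrary characters from the start of the line, then ":00".
COLUMN_PATTERN = re.compile(r"^.{22}:00")


def is_placeholder(line: str) -> bool:
    """True when ':00' sits right after the first 22 characters of the line."""
    return COLUMN_PATTERN.match(line) is not None


# Hypothetical fixed-width lines; the amount of padding is an assumption.
placeholder = "09:10:00+" + " " * 11 + "10:00" + " " * 31 + "AUDIO    00:00"
song = "09:03:00+20913 THE SAFETY DANCE         /MEN WITHOUT HATS  MUSIC    03:00"

# The song also contains ":00" (twice), but not at that column, so it is kept.
kept = [line for line in (song, placeholder) if not is_placeholder(line)]
print(kept)
```

If the count in the real log differs by one, only the number inside the curly brackets needs to change.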
Terry
-
Looks like @Terry-R provided you with something workable (and compact), but I’d already typed this up, so I’ll post anyway. Maybe you’ll get some benefit from it.
A fairly loose spec to validate a time value in the song name column is:
<1 or more spaces><4 or 5 chars of format dd:dd><1 or more spaces><a non-space common character>
this building block does that:
\h+\d{1,2}:\d\d\h+\w
to match the whole line, we’d extend the above with
<start of line><anything><above spec><anything><newline sequence>
this find expression does that:
^.*?\h+\d{1,2}:\d\d\h+\w.*?\R
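As a rough way to test this loose expression away from Notepad++, the sketch below ports it to Python. Python's re module has no \h or \R, so [ \t] and \r?\n stand in for them here; that substitution is my assumption and is not character-for-character what Notepad++ does. The sample lines use the single-space form shown in this thread.

```python
import re

# Port of ^.*?\h+\d{1,2}:\d\d\h+\w.*?\R with [ \t] for \h and \r?\n (or end of text) for \R.
LOOSE_PATTERN = re.compile(
    r"^.*?[ \t]+\d{1,2}:\d\d[ \t]+\w.*?(?:\r?\n|\Z)",
    re.MULTILINE,
)


def strip_placeholders(text: str) -> str:
    """Delete every whole line that has a dd:dd value followed by spaces and a word."""
    return LOOSE_PATTERN.sub("", text)


sample = (
    "09:09:00+33210 TOO MUCH TIME ON /STYX MUSIC 04:22\n"
    "09:10:00+ 10:00 AUDIO 00:00\n"
    "09:18:00+MME2 Music Marathon En/ AUDIO 00:05\n"
)
print(strip_placeholders(sample), end="")
```

The event length at the end of each line does not trigger a match, because nothing word-like follows it on the same line.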
A somewhat tighter full line spec is:
<start of line><19 of anything><5 chars of format dd:dd><31 spaces><A or M keyword><anything><newline sequence>
The find expression is:
^.{19}\d{2}:\d\d\h{31}(AUDIO|MUSIC).*\R
The above would break if times earlier than 10:00 did not have a leading 0.
You have to choose the Regular expression search mode, and leave that goofy little box to its right (“. matches newline”) unchecked.
Also, although “(Book)Mark, visually check, Delete” is fine as a sequence, for more automation use ‘Replace All’, making sure the replacement text field is properly empty (no invisible spaces, checked with cursor and backspace).
-
Hello @k-m-richards, @terry-r, @neil-schipper and All,
Here is, to my mind, a nice solution, as it does not depend on the location of each column, assuming the same layout, of course!
So, from this INPUT text :
09:01:00+J8 Legal ID / AUDIO 00:08
09:02:00+22405 THRILLER /MICHAEL JACKSON MUSIC 05:16
09:03:00+20913 THE SAFETY DANCE /MEN WITHOUT HATS MUSIC 03:00
09:04:00+MMS5 Music Marathon Sw/ AUDIO 00:10
09:05:00+22420 SHADOWS OF THE NI/PAT BENATAR MUSIC 03:32
09:06:00+10204 GOOD THING /FINE YOUNG CANNI MUSIC 03:07
09:07:00+J7 Sweeper - The Eig/ AUDIO 00:07
09:08:00+22116 TAKE ME HOME TONI/EDDIE MONEY MUSIC 03:23
09:09:00+33210 TOO MUCH TIME ON /STYX MUSIC 04:22
09:10:00+ 10:00 AUDIO 00:00
09:11:00+ 11:00 AUDIO 00:00
09:12:00+ 12:00 AUDIO 00:00
09:13:00+ 13:00 AUDIO 00:00
09:14:00+ 14:00 AUDIO 00:00
09:15:00+ 15:00 AUDIO 00:00
09:16:00+ 16:00 AUDIO 00:00
09:17:00+ 17:00 AUDIO 00:00
09:18:00+MME2 Music Marathon En/ AUDIO 00:05
09:19:00+ 19:00 AUDIO 00:00
09:20:00+ 20:00 AUDIO 00:00
09:21:00+ 21:00 AUDIO 00:00
09:22:00+ 22:00 AUDIO 00:00
09:23:00+ 23:00 AUDIO 00:00
09:24:00+J4F Shotgun 4 / AUDIO 00:05
09:25:00+21806 TAKE MY BREATH AW/BERLIN MUSIC 04:03
09:26:00+20802 LET'S DANCE /DAVID BOWIE MUSIC 03:53
- Open the Replace dialog (Ctrl + H)
- SEARCH (?x-si) ^ .+ AUDIO \x20+ 00:00 .* \R
- REPLACE Leave EMPTY
- Untick all box options
- Tick the Wrap around option
- Select the Regular expression search mode
- Click once on the Replace All button
You should get your expected OUTPUT text :
09:01:00+J8 Legal ID / AUDIO 00:08
09:02:00+22405 THRILLER /MICHAEL JACKSON MUSIC 05:16
09:03:00+20913 THE SAFETY DANCE /MEN WITHOUT HATS MUSIC 03:00
09:04:00+MMS5 Music Marathon Sw/ AUDIO 00:10
09:05:00+22420 SHADOWS OF THE NI/PAT BENATAR MUSIC 03:32
09:06:00+10204 GOOD THING /FINE YOUNG CANNI MUSIC 03:07
09:07:00+J7 Sweeper - The Eig/ AUDIO 00:07
09:08:00+22116 TAKE ME HOME TONI/EDDIE MONEY MUSIC 03:23
09:09:00+33210 TOO MUCH TIME ON /STYX MUSIC 04:22
09:18:00+MME2 Music Marathon En/ AUDIO 00:05
09:24:00+J4F Shotgun 4 / AUDIO 00:05
09:25:00+21806 TAKE MY BREATH AW/BERLIN MUSIC 04:03
09:26:00+20802 LET'S DANCE /DAVID BOWIE MUSIC 03:53
Here you are !
Best Regards
guy038
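For readers who would like to try the same replacement in a script, here is a hedged Python sketch of that search. Python does not accept the leading (?x-si) group in this form, so the free-spacing and multi-line behaviour is requested through flags instead, and \R is approximated by \r?\n or end of text. Both substitutions are my assumptions and not part of the original expression.

```python
import re

# Free-spacing port: blanks in the pattern are ignored, \x20 is a literal space.
PLACEHOLDER = re.compile(
    r"^ .+ AUDIO \x20+ 00:00 .* (?:\r?\n|\Z)",
    re.VERBOSE | re.MULTILINE,
)


def remove_placeholders(text: str) -> str:
    """Replace every line containing 'AUDIO', spaces, '00:00' with nothing."""
    return PLACEHOLDER.sub("", text)


sample = (
    "09:03:00+20913 THE SAFETY DANCE /MEN WITHOUT HATS MUSIC 03:00\n"
    "09:04:00+MMS5 Music Marathon Sw/ AUDIO 00:10\n"
    "09:10:00+ 10:00 AUDIO 00:00\n"
    "09:11:00+ 11:00 AUDIO 00:00\n"
    "09:18:00+MME2 Music Marathon En/ AUDIO 00:05\n"
)
print(remove_placeholders(sample), end="")
```

Only the two placeholder lines disappear: the MUSIC line ending in 03:00 has no AUDIO keyword, and the other AUDIO lines have a non-zero length.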
-
-
@guy038 said in I don't understand the command line syntax to accomplish this:
as it does not depend on the location of each column
My first spec + expression did not depend on location!
Here’s another likely workable spec for ID’ing lines that should be removed:
<start of line><bunch of non-space chars><literal ‘+’><one space><anything><newline sequence>
Advantage of this one is eyeballs don’t have to stray far from the left.
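No expression was posted for this spec, so the Python sketch below uses one possible rendering of it, ^\S+\+ followed by a space. Treat the expression as an assumption on my part. It relies on placeholder lines having nothing but blanks right after the '+'.

```python
import re

# Assumed rendering of the spec: non-space characters, a literal '+', then one space.
PLUS_THEN_SPACE = re.compile(r"^\S+\+ ")

lines = [
    "09:09:00+33210 TOO MUCH TIME ON /STYX MUSIC 04:22",
    "09:10:00+ 10:00 AUDIO 00:00",
    "09:23:00+ 23:00 AUDIO 00:00",
    "09:24:00+J4F Shotgun 4 / AUDIO 00:05",
]

# Keep every line whose '+' is followed by an event code instead of a space.
kept = [line for line in lines if not PLUS_THEN_SPACE.match(line)]
print("\n".join(kept))
```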
-
@K-M-Richards said in I don't understand the command line syntax to accomplish this:
What I am hoping for is a solution that will accomplish this and let me learn more about doing this sort of thing in the future.
If you look at all of the proposed solutions, you will see a common theme, one of using regular expressions. The limitation you were having was due to looking for static text which wasn’t unique to just those lines you needed to remove and possibly also using the “normal” mode of searching.
So, as you found, when looking for static text it is very hard to account for variability of content; that’s where regular expressions (called regex for short) come in handy. To learn more about regex, look in our FAQ section for a post titled “Where to find REGular EXpressions (RegEx) documentation ?”. Consider taking some lessons from those links. If you do, remember to start simply first. Regex has a lot to offer, but in the same way as building a house, it all starts with a good foundation, so understanding the basics will be helpful.
After looking more closely at your very well structured data I can see there seems to be an alternative using static text, one that @Guy038 alluded to in his regex. So you could keep the search mode as “normal” and in the find window enter:
    AUDIO    00:00
I have 4 spaces at the start and another 4 between the AUDIO and the time. From your sample it does seem that this portion of the line is unique to ONLY those lines you wish to remove. If so, then it may be sufficient for you.
So, although I have now provided a “normal” mode search solution, hopefully you may have also tried one or more of the regex solutions and have had your eyes opened to the possibilities of regex. As you said, “…and let me learn more about doing this sort of thing in the future.”.
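The static-text idea needs no regex at all, which the following Python sketch tries to show with a plain substring test. The marker spacing follows the description above (4 spaces before AUDIO and 4 before the time); the sample lines are hypothetical reconstructions, since the exact padding of the real log is not visible in this thread.

```python
# Static marker: 4 spaces, "AUDIO", 4 spaces, "00:00" (spacing assumed from the description).
MARKER = " " * 4 + "AUDIO" + " " * 4 + "00:00"

# Hypothetical fixed-width lines; the padding widths are assumptions.
lines = [
    "09:04:00+MMS5  Music Marathon Sw/" + " " * 10 + "AUDIO    00:10",
    "09:10:00+      10:00" + " " * 31 + "AUDIO    00:00",
    "09:03:00+20913 THE SAFETY DANCE  /MEN WITHOUT HATS  MUSIC    03:00",
]

# A line is a placeholder when the literal marker appears anywhere in it.
kept = [line for line in lines if MARKER not in line]
print("\n".join(kept))
```

This works only as long as the spacing in the file matches the marker exactly, which is the trade-off against the regex versions that accept any number of spaces.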
If you do start learning and get stuck, remember the forum members are only a post away. We are only too keen to help someone on their Notepad++ journey!
Terry
-
Hi, @k-m-richards, @terry-r, @neil-schipper and All,
@neil-schipper : I completely apologize about my previous statement :
Here is, to my mind, a nice solution, as it does not depend on the location of each column, assuming the same layout, of course!
Indeed, in my previous post, I just searched for a solution which could work without any specification on the exact location of each column, forgetting to try your own regexes. And, indeed, your first ( \h+\d{1,2}:\d\d\h+\w ) and second ( ^.*?\h+\d{1,2}:\d\d\h+\w.*?\R ) regexes do respect this condition!
Well, in general, I try, at first, to think about a new problem without referring to other solutions already proposed. This is both to get a “fresh” look at the problem and to find my own solution.
However, very often, I end up with ways of doing things and expressions that turn out to be too sophisticated and lead me to think that the solutions already mentioned are the best ones !!
Best Regards
guy038
-
I can see … an alternative using static text … So you could keep the search mode as “normal”
This is true, and interesting, but in this case it supports only a multi-step (bookmark-and-delete) workflow, not a single-step Replace All.
Well, in general, I try, at first, to think about a new problem without referring to other solutions … both to get a “fresh” look at the problem and to find my own solution.
I do that too, a good practice.
A small corrective: “command line syntax” applies to command line interfaces (most famously the DOS or Windows console CLI’s, but there are many others); “search expression syntax” is what we’ve been using here.
-
@Neil-Schipper said in I don't understand the command line syntax to accomplish this:
This is true, and interesting, but in this case it only supports a multi-step (bookmark-and-delete) workflow, but not a single step ReplAll.
And that’s what the OP was doing, using (Book)mark to find those lines. Of course, if he wasn’t aware, he will be now through this series of posts that he can achieve it all in 1 step.
However, when I was first learning regex I used the bookmark and find functions quite a bit to understand the boundaries of my regex, to get a feel for how different “formulae” would work on data. I probably still do, to be honest. I think it is a valuable tool in the learning process. Even if it does take multiple steps, it’s still far quicker than manually marking each line.
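A scripted analogue of that "mark first, delete later" habit is a dry run that only reports which lines an expression would hit. The Python sketch below is one way to do it; the helper name preview_matches is hypothetical, and the sample reuses lines from this thread in their single-space form.

```python
import re


def preview_matches(pattern: str, text: str) -> list[tuple[int, str]]:
    """Return (line number, line) pairs the pattern would hit, changing nothing."""
    regex = re.compile(pattern)
    return [
        (number, line)
        for number, line in enumerate(text.splitlines(), start=1)
        if regex.search(line)
    ]


sample = (
    "09:03:00+20913 THE SAFETY DANCE /MEN WITHOUT HATS MUSIC 03:00\n"
    "09:10:00+ 10:00 AUDIO 00:00\n"
    "09:18:00+MME2 Music Marathon En/ AUDIO 00:05\n"
)

# The naive search hits every line; the narrower one hits only the placeholder.
for pattern in (r":00", r"AUDIO +00:00"):
    hits = [number for number, _ in preview_matches(pattern, sample)]
    print(pattern, "->", hits)
```

Comparing the two result lists shows the same over-matching the original question ran into with a bare :00 search.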
Terry
-
@Terry-R said in I don't understand the command line syntax to accomplish this:
… I used the bookmark and find functions quite a bit to understand the boundaries of my regex … I probably still do to be honest.
Interesting how our histories differ. I use bookmarks mainly for navigation and am hesitant to perform operations that would pollute the list or blow them all away. I use find-match-mark to colorize, but I don’t think I’ve ever used find-match-bookmark to solve a practical problem of my own, and I consider it an oddity.
-
Thank you all for your detailed suggestions and explanations of how they would work.
I am going to print out this entire thread and read it over several times and then experiment some more. It looks to me that the solution that will work best for me is somewhere in there!
It is also gratifying that there are still places where if you ask politely for help, you receive it with equal measures of politeness. You are all very nice people for trying to help.
-
With the topic named “I don’t understand the command line syntax to accomplish this”: as future readers come browsing or searching through old forum posts, they will have no clue what it’s about. And even if you come back some time later, reading through your old topic titles, you might not know what question you were asking originally.
Would you mind if I used admin powers to rename your topic to “how to delete ‘placeholder lines’ from a log file” ?
-
@PeterJones By all means, if you think it will help others with similar difficulties.
-
I just wanted to come back and thank everyone again for their help. As I had hoped, I took parts of multiple suggestions and came up with a sequence of search-and-replace steps which has reduced my log editing time to under ten minutes for a full week’s worth. Two months into the new process and not a single glitch!
Best of all, I increased my knowledge base in the process. I love it when solving a problem results in my learning something new!
Again, thank you all for your help.