How to delete ‘placeholder lines’ from a log file

K.M. Richards

Fellow Notepad++ Users,

Could you please help me understand how to properly create the syntax for this? I’m just not “getting it”, no matter how many posts I read here with similar circumstances.

I have to edit daily input logs for a radio station that I consult on programming. The logs are exported from music scheduling software incremented by one minute for each event (because the automation software reads and executes the logs sequentially). In order for events like commercial breaks to fall at the correct minutes for the automation to insert them from that other log, there are “placeholder lines” for each unused minute. However, those placeholders have to be removed before the log is loaded into the automation.

Here is a sample section of a log (“before” data):

09:01:00+J8        Legal ID         /                  AUDIO    00:08         
09:02:00+22405     THRILLER         /MICHAEL JACKSON   MUSIC    05:16         
09:03:00+20913     THE SAFETY DANCE /MEN WITHOUT HATS  MUSIC    03:00         
09:04:00+MMS5      Music Marathon Sw/                  AUDIO    00:10         
09:05:00+22420     SHADOWS OF THE NI/PAT BENATAR       MUSIC    03:32         
09:06:00+10204     GOOD THING       /FINE YOUNG CANNI  MUSIC    03:07         
09:07:00+J7        Sweeper - The Eig/                  AUDIO    00:07         
09:08:00+22116     TAKE ME HOME TONI/EDDIE MONEY       MUSIC    03:23         
09:09:00+33210     TOO MUCH TIME ON /STYX              MUSIC    04:22         
09:10:00+          10:00                               AUDIO    00:00         
09:11:00+          11:00                               AUDIO    00:00         
09:12:00+          12:00                               AUDIO    00:00         
09:13:00+          13:00                               AUDIO    00:00         
09:14:00+          14:00                               AUDIO    00:00         
09:15:00+          15:00                               AUDIO    00:00         
09:16:00+          16:00                               AUDIO    00:00         
09:17:00+          17:00                               AUDIO    00:00         
09:18:00+MME2      Music Marathon En/                  AUDIO    00:05         
09:19:00+          19:00                               AUDIO    00:00         
09:20:00+          20:00                               AUDIO    00:00         
09:21:00+          21:00                               AUDIO    00:00         
09:22:00+          22:00                               AUDIO    00:00         
09:23:00+          23:00                               AUDIO    00:00         
09:24:00+J4F       Shotgun 4        /                  AUDIO    00:05         
09:25:00+21806     TAKE MY BREATH AW/BERLIN            MUSIC    04:03         
09:26:00+20802     LET'S DANCE      /DAVID BOWIE       MUSIC    03:53

Here is how that log needs to look for the automation to accept and run it:

09:01:00+J8        Legal ID         /                  AUDIO    00:08         
09:02:00+22405     THRILLER         /MICHAEL JACKSON   MUSIC    05:16         
09:03:00+20913     THE SAFETY DANCE /MEN WITHOUT HATS  MUSIC    03:00         
09:04:00+MMS5      Music Marathon Sw/                  AUDIO    00:10         
09:05:00+22420     SHADOWS OF THE NI/PAT BENATAR       MUSIC    03:32         
09:06:00+10204     GOOD THING       /FINE YOUNG CANNI  MUSIC    03:07         
09:07:00+J7        Sweeper - The Eig/                  AUDIO    00:07         
09:08:00+22116     TAKE ME HOME TONI/EDDIE MONEY       MUSIC    03:23         
09:09:00+33210     TOO MUCH TIME ON /STYX              MUSIC    04:22         
09:18:00+MME2      Music Marathon En/                  AUDIO    00:05         
09:24:00+J4F       Shotgun 4        /                  AUDIO    00:05         
09:25:00+21806     TAKE MY BREATH AW/BERLIN            MUSIC    04:03         
09:26:00+20802     LET'S DANCE      /DAVID BOWIE       MUSIC    03:53

I have tried using the Mark and Bookmark commands to find all instances of :00 but that marks every line with that string in it, including the lines that need to remain. And because each line also has to have nine spaces at the end, I can’t look for :00 with a space, either, because that would mark any line where the event length happens to end at a precise minute.

If I knew what I was doing, I would think that formatting the Mark search to only match the :00 string if it appears a specific number of characters into each line (23, in this case) … but I do not understand the syntax well enough to do that.

What I am hoping for is a solution that will accomplish this and let me learn more about doing this sort of thing in the future.

Thank you.

Terry R

@K-M-Richards said in I don't understand the command line syntax to accomplish this:

If I knew what I was doing, I would think that formatting the Mark search to only match the :00 string if it appears a specific number of characters into each line (23, in this case) … but I do not understand the syntax well enough to do that.

It’s good that you have tried something. As you said, you only want to mark the line with those characters at specific columns.

Try using
^.{22}:00

The first character, ^, denotes the start of a line but does not consume any characters. The dot then specifies a single character (whatever the character is), and the number in the curly brackets that follow tells the regular expression exactly how many positions the dot is to consume. At this point the next 3 characters are those you seek, that is, if you counted correctly.

Try that and see how you go. Be aware that search mode MUST be regular expression as we are using meta characters that are recognized as meaning something special.
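
Because the count is the easy part to get wrong, here is a minimal Python sketch for checking it outside Notepad++. It is only an illustration: the field widths (a 10-character event code, a 36-character title/artist area, a 9-character type column) are assumptions read off the sample log, and the variable names are made up for this example.

```python
import re

# Assumed fixed-width layout, read off the sample log in this thread:
# 8 characters of time, "+", a 10-character event code, 36 characters of
# title/artist, a 9-character type column, the length, 9 trailing spaces.
placeholder = "09:10:00+" + "".ljust(10) + "10:00".ljust(36) + "AUDIO".ljust(9) + "00:00" + " " * 9
song = "09:03:00+" + "20913".ljust(10) + "THE SAFETY DANCE /MEN WITHOUT HATS".ljust(36) + "MUSIC".ljust(9) + "03:00" + " " * 9

# Count the characters in front of the first ":00" that comes after the "+".
offset = placeholder.index(":00", placeholder.index("+"))

# ^, . and {n} behave the same way in Python's re module for this pattern.
pattern = re.compile(r"^.{%d}:00" % offset)

print(offset)
print(bool(pattern.search(placeholder)), bool(pattern.search(song)))
```

With these assumed widths the count comes out as 21, so the expression would be ^.{21}:00. If your real file is laid out differently, adjust the number until the placeholder lines (and only those) are marked. Note that the song line, which ends in 03:00, is not matched, because its :00 sits at a different column.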

Terry

Neil Schipper

@K-M-Richards,

Looks like @Terry-R provided you with something workable (and compact), but I’d already typed this up, so I’ll post anyway. Maybe you’ll get some benefit from it.

A fairly loose spec to validate a time value in the song name column is:

<1 or more spaces><4 or 5 chars of format dd:dd><1 or more spaces><a non-space common character>

this building block does that: \h+\d{1,2}:\d\d\h+\w

to match the whole line, we’d extend the above with

<start of line><anything><above spec><anything><newline sequence>

this find expression does that: ^.*?\h+\d{1,2}:\d\d\h+\w.*?\R

A somewhat tighter full line spec is:

<start of line><19 of anything><5 chars of format dd:dd><31 spaces><A or M keyword><anything><newline sequence>

Find: ^.{19}\d{2}:\d\d\h{31}(AUDIO|MUSIC).*\R

The above would break if times earlier than 10:00 did not have a leading 0.

You have to choose the Regular expression search mode, and leave that goofy little box to its right (“. matches newline”) unchecked.

Also, although “(Book)Mark, visually check, Delete” is fine as a sequence, for more automation use ‘Replace All’, making sure the replacement text field is properly empty (no invisible spaces, checked with cursor and backspace).
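
For anyone who wants to try the loose and tight expressions outside the editor, here is a rough Python sketch. Python's re module has no \h or \R, so [ \t] and \r?\n are swapped in for them. The row() helper and the field widths are assumptions based on the sample log, not anything the scheduling software defines.

```python
import re

def row(time, code, middle, kind, length):
    # Hypothetical helper: builds one line using widths assumed from the sample.
    return time + "+" + code.ljust(10) + middle.ljust(36) + kind.ljust(9) + length + " " * 9

log = "\n".join([
    row("09:09:00", "33210", "TOO MUCH TIME ON /STYX", "MUSIC", "04:22"),
    row("09:10:00", "", "10:00", "AUDIO", "00:00"),
    row("09:11:00", "", "11:00", "AUDIO", "00:00"),
    row("09:18:00", "MME2", "Music Marathon En/", "AUDIO", "00:05"),
]) + "\n"

# [ \t] stands in for \h and \r?\n for \R; re.MULTILINE makes ^ match at each line start.
loose = re.compile(r"^.*?[ \t]+\d{1,2}:\d\d[ \t]+\w.*?\r?\n", re.MULTILINE)
tight = re.compile(r"^.{19}\d{2}:\d\d[ \t]{31}(AUDIO|MUSIC).*\r?\n", re.MULTILINE)

# Substituting the empty string is the equivalent of Replace All with an empty field.
cleaned_loose = loose.sub("", log)
cleaned_tight = tight.sub("", log)

print(cleaned_tight, end="")
```

On this small sample both expressions remove the two placeholder lines and leave the 09:09 and 09:18 events alone.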

guy038

Hello @k-m-richards, @terry-r, @neil-schipper and All,

Here is, to my mind, a nice solution, as it does not depend on the location of each column, supposing the same layout, of course !

So, from this INPUT text :

    09:01:00+J8            Legal ID         /                           AUDIO              00:08          
    09:02:00+22405         THRILLER         /MICHAEL JACKSON            MUSIC              05:16          
    09:03:00+20913         THE SAFETY DANCE /MEN WITHOUT HATS           MUSIC              03:00          
    09:04:00+MMS5          Music Marathon Sw/                           AUDIO              00:10          
    09:05:00+22420         SHADOWS OF THE NI/PAT BENATAR                MUSIC              03:32          
    09:06:00+10204         GOOD THING       /FINE YOUNG CANNI           MUSIC              03:07          
    09:07:00+J7            Sweeper - The Eig/                           AUDIO              00:07          
    09:08:00+22116         TAKE ME HOME TONI/EDDIE MONEY                MUSIC              03:23          
    09:09:00+33210         TOO MUCH TIME ON /STYX                       MUSIC              04:22          
    09:10:00+              10:00                                        AUDIO              00:00          
    09:11:00+              11:00                                        AUDIO              00:00          
    09:12:00+              12:00                                        AUDIO              00:00          
    09:13:00+              13:00                                        AUDIO              00:00          
    09:14:00+              14:00                                        AUDIO              00:00          
    09:15:00+              15:00                                        AUDIO              00:00          
    09:16:00+              16:00                                        AUDIO              00:00          
    09:17:00+              17:00                                        AUDIO              00:00          
    09:18:00+MME2          Music Marathon En/                           AUDIO              00:05          
    09:19:00+              19:00                                        AUDIO              00:00          
    09:20:00+              20:00                                        AUDIO              00:00          
    09:21:00+              21:00                                        AUDIO              00:00          
    09:22:00+              22:00                                        AUDIO              00:00          
    09:23:00+              23:00                                        AUDIO              00:00          
    09:24:00+J4F           Shotgun 4        /                           AUDIO              00:05          
    09:25:00+21806         TAKE MY BREATH AW/BERLIN                     MUSIC              04:03          
    09:26:00+20802         LET'S DANCE      /DAVID BOWIE                MUSIC              03:53

Open the Replace dialog ( Ctrl + H )
SEARCH (?x-si) ^ .+ AUDIO \x20+ 00:00 .* \R
REPLACE Leave EMPTY
Untick all box options
Tick the Wrap around option
Select the Regular expression search mode
Click once on the Replace All button

You should get your expected OUTPUT text :

    09:01:00+J8            Legal ID         /                           AUDIO              00:08          
    09:02:00+22405         THRILLER         /MICHAEL JACKSON            MUSIC              05:16          
    09:03:00+20913         THE SAFETY DANCE /MEN WITHOUT HATS           MUSIC              03:00          
    09:04:00+MMS5          Music Marathon Sw/                           AUDIO              00:10          
    09:05:00+22420         SHADOWS OF THE NI/PAT BENATAR                MUSIC              03:32          
    09:06:00+10204         GOOD THING       /FINE YOUNG CANNI           MUSIC              03:07          
    09:07:00+J7            Sweeper - The Eig/                           AUDIO              00:07          
    09:08:00+22116         TAKE ME HOME TONI/EDDIE MONEY                MUSIC              03:23          
    09:09:00+33210         TOO MUCH TIME ON /STYX                       MUSIC              04:22          
    09:18:00+MME2          Music Marathon En/                           AUDIO              00:05          
    09:24:00+J4F           Shotgun 4        /                           AUDIO              00:05          
    09:25:00+21806         TAKE MY BREATH AW/BERLIN                     MUSIC              04:03          
    09:26:00+20802         LET'S DANCE      /DAVID BOWIE                MUSIC              03:53

Here you are !
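
As a cross-check, the same free-spacing idea can be sketched in Python. This is an approximation rather than the Notepad++ expression itself: re.VERBOSE plays the role of (?x), leaving out re.DOTALL and re.IGNORECASE corresponds to -si, and \r?\n replaces \R, which Python does not support. The row() helper and its widths are assumptions taken from the sample.

```python
import re

def row(time, code, middle, kind, length):
    # Hypothetical helper: builds one line using widths assumed from the sample.
    return time + "+" + code.ljust(10) + middle.ljust(36) + kind.ljust(9) + length + " " * 9

log = "\n".join([
    row("09:03:00", "20913", "THE SAFETY DANCE /MEN WITHOUT HATS", "MUSIC", "03:00"),
    row("09:04:00", "MMS5", "Music Marathon Sw/", "AUDIO", "00:10"),
    row("09:10:00", "", "10:00", "AUDIO", "00:00"),
    row("09:11:00", "", "11:00", "AUDIO", "00:00"),
]) + "\n"

# In verbose mode unescaped whitespace in the pattern is ignored, so the
# literal spaces between AUDIO and 00:00 have to be written as \x20.
placeholder_line = re.compile(
    r"^ .+ AUDIO \x20+ 00:00 .* \r?\n",
    re.VERBOSE | re.MULTILINE,
)

cleaned = placeholder_line.sub("", log)
print(cleaned, end="")
```

The 09:04 line survives because its length is 00:10, and the 09:03 line survives because its 03:00 follows MUSIC rather than AUDIO.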

Best Regards

guy038

Neil Schipper

@guy038 said in I don't understand the command line syntax to accomplish this:

as it does not depend on the location of each column

My first spec + expression did not depend on location!

Here’s another likely workable spec for ID’ing lines that should be removed:

<start of line><bunch of non-space chars><literal ‘+’><one space><anything><newline sequence>

Advantage of this one is eyeballs don’t have to stray far from the left.
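
One possible reading of that spec, sketched in Python for illustration (in Notepad++ the equivalent would presumably be along the lines of ^\S+\+\x20.*\R). The row() helper and field widths are assumptions based on the sample, and \r?\n stands in for \R.

```python
import re

def row(time, code, middle, kind, length):
    # Hypothetical helper: builds one line using widths assumed from the sample.
    return time + "+" + code.ljust(10) + middle.ljust(36) + kind.ljust(9) + length + " " * 9

log = "\n".join([
    row("09:09:00", "33210", "TOO MUCH TIME ON /STYX", "MUSIC", "04:22"),
    row("09:17:00", "", "17:00", "AUDIO", "00:00"),
    row("09:18:00", "MME2", "Music Marathon En/", "AUDIO", "00:05"),
    row("09:19:00", "", "19:00", "AUDIO", "00:00"),
]) + "\n"

# \S+\+ matches the run of non-space characters at the start of the line that
# ends in the plus sign; the single space right after it means the event-code
# column is empty, which is what marks a placeholder line.
empty_code = re.compile(r"^\S+\+ .*\r?\n", re.MULTILINE)

cleaned = empty_code.sub("", log)
print(cleaned, end="")
```

Lines with an event code directly after the plus sign (33210, MME2) never match, so only the two placeholder lines are removed here.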

Terry R

@K-M-Richards said in I don't understand the command line syntax to accomplish this:

What I am hoping for is a solution that will accomplish this and let me learn more about doing this sort of thing in the future.

If you look at all of the proposed solutions, you will see a common theme, one of using regular expressions. The limitation you were having was due to looking for static text which wasn’t unique to just those lines you needed to remove and possibly also using the “normal” mode of searching.

So, as you found, when looking for static text it is very hard to account for variability of content; that’s where regular expressions (called regex for short) come in handy. To learn more about regex, look in our FAQ section for a post titled “Where to find REGular EXpressions (RegEx) documentation ?”. Consider taking some lessons from those links. If you do, remember to start simply first. Regex has a lot to offer, but in the same way as building a house, it all starts with a good foundation, so understanding the basics will be helpful.

After looking more closely at your very well structured data I can see there seems to be an alternative using static text, one that @Guy038 alluded to in his regex. So you could keep the search mode as “normal” and in the find window enter:
    AUDIO    00:00
I have 4 spaces at the start and another 4 between the AUDIO and the time. From your sample it does seem that this portion of the line is unique to ONLY those lines you wish to remove. If so, then it may be sufficient for you.
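
To illustrate why that static text is enough for this sample, here is a small Python sketch that uses a plain substring test instead of a regex. The row() helper and the field widths are assumptions drawn from the sample; the search string is 4 spaces, AUDIO, 4 spaces, 00:00.

```python
def row(time, code, middle, kind, length):
    # Hypothetical helper: builds one line using widths assumed from the sample.
    return time + "+" + code.ljust(10) + middle.ljust(36) + kind.ljust(9) + length + " " * 9

lines = [
    row("09:03:00", "20913", "THE SAFETY DANCE /MEN WITHOUT HATS", "MUSIC", "03:00"),
    row("09:04:00", "MMS5", "Music Marathon Sw/", "AUDIO", "00:10"),
    row("09:10:00", "", "10:00", "AUDIO", "00:00"),
    row("09:18:00", "MME2", "Music Marathon En/", "AUDIO", "00:05"),
]

# The literal text to look for: no metacharacters, just a fixed string.
needle = "    AUDIO    00:00"

# Keep every line that does NOT contain the fixed string.
kept = [line for line in lines if needle not in line]

for line in kept:
    print(line)
```

Only the 09:10 placeholder contains the string. The song ending in 03:00 is a MUSIC line, and the other AUDIO events have non-zero lengths, so all three are kept.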

So, although I have now provided a “normal” mode search solution, hopefully you may have also tried one or more of the regex solutions and have had your eyes opened to the possibilities of regex. As you said, “…and let me learn more about doing this sort of thing in the future.”.

If you do start learning and get stuck, remember the forum members are only a post away. We are only too keen to help someone on their Notepad++ journey!

Terry

guy038

Hi, @k-m-richards, @terry-r, @neil-schipper and All,

@neil-schipper : I completely apologize about my previous statement :

Here is, to my mind, a nice solution, as it does not depend on the location of each column, supposing the same layout, of course !

Indeed, in my previous post, I just searched for a solution which could work without any specification on the exact location of each column, forgetting to try your own regexes. And, indeed, your first ( \h+\d{1,2}:\d\d\h+\w ) and second regex ( ^.*?\h+\d{1,2}:\d\d\h+\w.*?\R ) do respect this condition !

Well, in general, I try, at first, to think about a new problem without referring to other solutions already proposed. This is both to get a “fresh” look at the problem and to find my own solution.

However, very often, I end up with ways of doing things and expressions that turn out to be too sophisticated and lead me to think that the solutions already mentioned are the best ones !!

Best Regards

guy038

Neil Schipper

@Terry-R

I can see … an alternative using static text … So you could keep the search mode as “normal”

This is true, and interesting, but in this case it only supports a multi-step (bookmark-and-delete) workflow, but not a single step ReplAll.

@guy038

Well, in general, I try, at first, to think about a new problem without referring to other solutions … both to get a “fresh” look at the problem and to find my own solution.

I do that too, a good practice.

@k-m-richards

A small corrective: “command line syntax” applies to command line interfaces (most famously the DOS or Windows console CLIs, but there are many others); “search expression syntax” is what we’ve been using here.

Terry R

@Neil-Schipper said in I don't understand the command line syntax to accomplish this:

This is true, and interesting, but in this case it only supports a multi-step (bookmark-and-delete) workflow, but not a single step ReplAll.

And that’s what the OP was doing, using (Book)mark to find those lines. Of course, if he wasn’t aware, he will be now through this series of posts that he can achieve it all in 1 step.

However, when I was first learning regex I used the bookmark and find functions quite a bit to understand the boundaries of my regex, to get a feel for how different “formulae” would work on data; I probably still do, to be honest. I think it is a valuable tool in the learning process. Even if it does take multiple steps, it’s still far quicker than manually marking each line.

Terry

Neil Schipper

@Terry-R said in I don't understand the command line syntax to accomplish this:

… I used the bookmark and find functions quite a bit to understand the boundaries of my regex … I probably still do to be honest.

Interesting how our histories differ. I use bookmarks mainly for navigation and am hesitant to perform operations that would pollute the list or blow them all away. I use find-match-mark to colorize, but I don’t think I’ve ever used find-match-bookmark to solve a practical problem of my own, and consider it an oddity.

K.M. Richards

Thank you all for your detailed suggestions and explanations of how they would work.

I am going to print out this entire thread and read it over several times and then experiment some more. It looks to me that the solution that will work best for me is somewhere in there!

It is also gratifying that there are still places where if you ask politely for help, you receive it with equal measures of politeness. You are all very nice people for trying to help.

PeterJones

@K-M-Richards ,

With the topic named “I don’t understand the command line syntax to accomplish this”: as future readers come browsing or searching through old forum posts, they will have no clue what it’s about. And even if you come back some time later, reading through your old topic titles, you might not know what question you were asking originally.

Would you mind if I used admin powers to rename your topic to “how to delete ‘placeholder lines’ from a log file” ?

K.M. Richards

@PeterJones By all means, if you think it will help others with similar difficulties.

K.M. Richards

I just wanted to come back and thank everyone again for their help. As I had hoped, I took parts of multiple suggestions and came up with a sequence of search-and-replace steps which has reduced my log editing time to under ten minutes for a full week’s worth. Two months into the new process and not a single glitch!

Best of all, I increased my knowledge base in the process. I love it when solving a problem results in my learning something new!

Again, thank you all for your help.
