Removing duplicated lines from a log file



  • Hello,

    I want to clean up a log file by removing the duplicated lines between two errors.
    Because the log file is time-based, the lines are not exactly the same.
    And if the error reappears later, it has to be logged again.
    Maybe a small example:

    2018-10-01 09:35:14.101 -04:00 [Debug] Button pressed
    2018-10-01 09:35:14.120 -04:00 [Debug] Button pressed
    2018-10-01 09:35:15.345 -04:00 [Debug] Button pressed
    2018-10-01 09:35:15.824 -04:00 [Debug] Button pressed
    2018-10-01 09:35:16.826 -04:00 [Debug] Button pressed
    2018-10-01 09:36:16.253 -04:00 [Debug] Button pressed
    2018-10-01 09:39:27.014 -04:00 [Debug] Stack light to Yellow
    2018-10-01 09:39:28.954 -04:00 [Debug] Current state changed: "MainState: Initial, PreviousMainState: Initial, AggregatedState: Operational, HardwareState: ControllerState: Undefined,AirflowState: Undefined,Airpressuretate: Undefined,VacuumState: Unknown,OutletState: Operational,FrontDoorState: Locked,BackDoorState: Locked,HandlingGateDoorState: Open,ExhaustState: On,IsVacuumApplied: False,VacuumFlow: 0.0979033783078194,AreAllDoorsClosed: False,IsSupplyDoorClosed: True,IsSupplyStationAlarm: False,IsTankInkLevelLow: False,IsEmergencyActive: False,LeftSensorTriggered: False,RightSensorTriggered: False,AirFlow1Active: False,
    2018-10-01 09:39:29.954 -04:00 [Debug] Current state changed: "MainState: Initial, PreviousMainState: Initial, AggregatedState: Operational, HardwareState: ControllerState: Undefined,AirflowState: Undefined,Airpressuretate: Undefined,VacuumState: Unknown,OutletState: Operational,FrontDoorState: Locked,BackDoorState: Locked,HandlingGateDoorState: Open,ExhaustState: On,IsVacuumApplied: False,VacuumFlow: 0.0979033783078194,AreAllDoorsClosed: False,IsSupplyDoorClosed: True,IsSupplyStationAlarm: False,IsTankInkLevelLow: False,IsEmergencyActive: False,LeftSensorTriggered: False,RightSensorTriggered: False,AirFlow1Active: False,
    2018-10-01 09:39:30.014 -04:00 [Debug] Stack light to Red
    2018-10-01 09:40:15.824 -04:00 [Debug] Button pressed
    2018-10-01 09:41:13.824 -04:00 [Debug] Button pressed
    2018-10-01 09:42:15.924 -04:00 [Debug] Button pressed
    2018-10-01 09:43:11.254 -04:00 [Debug] Button pressed
    2018-10-01 09:43:27.014 -04:00 [Debug] Stack light to Yellow
    2018-10-01 09:44:27.789 -04:00 [Debug] Stack light to Yellow
    2018-10-01 09:44:28.105 -04:00 [Debug] Stack light to Yellow
    2018-10-01 09:44:31.014 -04:00 [Debug] Stack light to Yellow
    2018-10-01 09:45:11.254 -04:00 [Debug] Button pressed

    needs to become

    2018-10-01 09:35:14.101 -04:00 [Debug] Button pressed
    2018-10-01 09:39:27.014 -04:00 [Debug] Stack light to Yellow
    2018-10-01 09:39:28.954 -04:00 [Debug] Current state changed: "MainState: Initial, PreviousMainState: Initial, AggregatedState: Operational, HardwareState: ControllerState: Undefined,AirflowState: Undefined,Airpressuretate: Undefined,VacuumState: Unknown,OutletState: Operational,FrontDoorState: Locked,BackDoorState: Locked,HandlingGateDoorState: Open,ExhaustState: On,IsVacuumApplied: False,VacuumFlow: 0.0979033783078194,AreAllDoorsClosed: False,IsSupplyDoorClosed: True,IsSupplyStationAlarm: False,IsTankInkLevelLow: False,IsEmergencyActive: False,LeftSensorTriggered: False,RightSensorTriggered: False,AirFlow1Active: False,
    2018-10-01 09:39:30.014 -04:00 [Debug] Stack light to Red
    2018-10-01 09:40:15.824 -04:00 [Debug] Button pressed
    2018-10-01 09:43:27.014 -04:00 [Debug] Stack light to Yellow
    2018-10-01 09:45:11.254 -04:00 [Debug] Button pressed

    Thanks in advance
    Danny



  • @D

    Try this:

    Invoke Replace dialog (default key: ctrl+h)
    Find what zone: (?-s)^(.{39}(.+)\R)(.{39}\2\R)+
    Replace with zone: \1
    Wrap around checkbox: ticked
    Search mode selection: Regular expression
    Action: Press Replace All button

    Here are the details of how it works:

    THE FIND EXPRESSION:

    (?-s)^(.{39}(.+)\R)(.{39}\2\R)+

    • [Use these options for the whole regular expression][1] (?-s)
      • [(hyphen inverts the meaning of the letters that follow)][1] -
      • [Dot doesn’t match line breaks][1] s
    • [Assert position at the beginning of a line (at beginning of the string or after a line break character) (carriage return and line feed, form feed)][2] ^
    • [Match the regex below and capture its match into backreference number 1][3] (.{39}(.+)\R)
      • [Match any single character that is NOT a line break character (line feed, carriage return, form feed)][4] .{39}
        • [Exactly 39 times][5] {39}
      • [Match the regex below and capture its match into backreference number 2][3] (.+)
        • [Match any single character that is NOT a line break character (line feed, carriage return, form feed)][4] .+
          • [Between one and unlimited times, as many times as possible, giving back as needed (greedy)][6] +
      • [Match a line break (carriage return and line feed pair, sole line feed, sole carriage return, vertical tab, form feed)][7] \R
    • [Match the regex below and capture its match into backreference number 3][3] (.{39}\2\R)+
      • [Between one and unlimited times, as many times as possible, giving back as needed (greedy)][6] +
        • [You repeated the capturing group itself. The group will capture only the last iteration. Put a capturing group around the repeated group to capture all iterations.][8] +
        • [Or, if you don’t want to capture anything, replace the capturing group with a non-capturing group to make your regex more efficient.][8]
      • [Match any single character that is NOT a line break character (line feed, carriage return, form feed)][4] .{39}
        • [Exactly 39 times][5] {39}
      • [Match the same text that was most recently matched by capturing group number 2 (case sensitive; fail if the group did not participate in the match so far)][9] \2
      • [Match a line break (carriage return and line feed pair, sole line feed, sole carriage return, vertical tab, form feed)][7] \R

    THE REPLACE EXPRESSION:

    \1

    • [Insert the text that was last matched by capturing group number 1][10] \1

    Created with RegexBuddy

    [1]: https://www.regular-expressions.info/modifiers.html
    [2]: https://www.regular-expressions.info/anchors.html
    [3]: https://www.regular-expressions.info/brackets.html
    [4]: https://www.regular-expressions.info/dot.html
    [5]: https://www.regular-expressions.info/repeat.html#limit
    [6]: https://www.regular-expressions.info/repeat.html
    [7]: https://www.regular-expressions.info/nonprint.html
    [8]: https://www.regular-expressions.info/captureall.html
    [9]: https://www.regular-expressions.info/backref.html
    [10]: https://www.regular-expressions.info/replacebackref.html

    RegexBuddy settings to emulate N++ regex engine: Application=boost::regex 1.54-1.57 / flavor=Default flavor / replacement flavor=All flavor / ^$ match at line breaks / Numbered capture / Allow zero-length matches

    This will turn the original text:

    2018-10-01 09:35:14.101 -04:00 [Debug] Button pressed
    2018-10-01 09:35:14.120 -04:00 [Debug] Button pressed
    2018-10-01 09:35:15.345 -04:00 [Debug] Button pressed
    2018-10-01 09:35:15.824 -04:00 [Debug] Button pressed
    2018-10-01 09:35:16.826 -04:00 [Debug] Button pressed
    2018-10-01 09:36:16.253 -04:00 [Debug] Button pressed
    2018-10-01 09:39:27.014 -04:00 [Debug] Stack light to Yellow
    2018-10-01 09:39:28.954 -04:00 [Debug] Current state changed: "MainState: Initial, PreviousMainState: Initial, AggregatedState: Operational, HardwareState: ControllerState: Undefined,AirflowState: Undefined,Airpressuretate: Undefined,VacuumState: Unknown,OutletState: Operational,FrontDoorState: Locked,BackDoorState: Locked,HandlingGateDoorState: Open,ExhaustState: On,IsVacuumApplied: False,VacuumFlow: 0.0979033783078194,AreAllDoorsClosed: False,IsSupplyDoorClosed: True,IsSupplyStationAlarm: False,IsTankInkLevelLow: False,IsEmergencyActive: False,LeftSensorTriggered: False,RightSensorTriggered: False,AirFlow1Active: False,
    2018-10-01 09:39:29.954 -04:00 [Debug] Current state changed: "MainState: Initial, PreviousMainState: Initial, AggregatedState: Operational, HardwareState: ControllerState: Undefined,AirflowState: Undefined,Airpressuretate: Undefined,VacuumState: Unknown,OutletState: Operational,FrontDoorState: Locked,BackDoorState: Locked,HandlingGateDoorState: Open,ExhaustState: On,IsVacuumApplied: False,VacuumFlow: 0.0979033783078194,AreAllDoorsClosed: False,IsSupplyDoorClosed: True,IsSupplyStationAlarm: False,IsTankInkLevelLow: False,IsEmergencyActive: False,LeftSensorTriggered: False,RightSensorTriggered: False,AirFlow1Active: False,
    2018-10-01 09:39:30.014 -04:00 [Debug] Stack light to Red
    2018-10-01 09:40:15.824 -04:00 [Debug] Button pressed
    2018-10-01 09:41:13.824 -04:00 [Debug] Button pressed
    2018-10-01 09:42:15.924 -04:00 [Debug] Button pressed
    2018-10-01 09:43:11.254 -04:00 [Debug] Button pressed
    2018-10-01 09:43:27.014 -04:00 [Debug] Stack light to Yellow
    2018-10-01 09:44:27.789 -04:00 [Debug] Stack light to Yellow
    2018-10-01 09:44:28.105 -04:00 [Debug] Stack light to Yellow
    2018-10-01 09:44:31.014 -04:00 [Debug] Stack light to Yellow
    2018-10-01 09:45:11.254 -04:00 [Debug] Button pressed
    

    Into the desired text:

    2018-10-01 09:35:14.101 -04:00 [Debug] Button pressed
    2018-10-01 09:39:27.014 -04:00 [Debug] Stack light to Yellow
    2018-10-01 09:39:28.954 -04:00 [Debug] Current state changed: "MainState: Initial, PreviousMainState: Initial, AggregatedState: Operational, HardwareState: ControllerState: Undefined,AirflowState: Undefined,Airpressuretate: Undefined,VacuumState: Unknown,OutletState: Operational,FrontDoorState: Locked,BackDoorState: Locked,HandlingGateDoorState: Open,ExhaustState: On,IsVacuumApplied: False,VacuumFlow: 0.0979033783078194,AreAllDoorsClosed: False,IsSupplyDoorClosed: True,IsSupplyStationAlarm: False,IsTankInkLevelLow: False,IsEmergencyActive: False,LeftSensorTriggered: False,RightSensorTriggered: False,AirFlow1Active: False,
    2018-10-01 09:39:30.014 -04:00 [Debug] Stack light to Red
    2018-10-01 09:40:15.824 -04:00 [Debug] Button pressed
    2018-10-01 09:43:27.014 -04:00 [Debug] Stack light to Yellow
    2018-10-01 09:45:11.254 -04:00 [Debug] Button pressed
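
For anyone who would rather script this outside Notepad++, the same idea can be sketched in Python. This is only a rough equivalent and not the regex itself: Python's `re` module has no `\R`, so the sketch compares lines directly instead. The names `collapse_repeats` and `PREFIX_LEN` are made up for illustration, and the lines are assumed to be already split, without their line endings.

```python
PREFIX_LEN = 39  # width of "2018-10-01 09:35:14.101 -04:00 [Debug] "


def collapse_repeats(lines, prefix_len=PREFIX_LEN):
    """Keep the first line of each run of consecutive lines whose text
    after the fixed-width timestamp prefix is identical."""
    kept = []
    previous_message = None
    for line in lines:
        # Like .{39}(.+) in the regex, a line needs at least one character
        # after the prefix; shorter lines are never treated as repeats.
        message = line[prefix_len:] if len(line) > prefix_len else None
        if message is None or message != previous_message:
            kept.append(line)
        previous_message = message
    return kept


if __name__ == "__main__":
    sample = [
        "2018-10-01 09:35:14.101 -04:00 [Debug] Button pressed",
        "2018-10-01 09:35:14.120 -04:00 [Debug] Button pressed",
        "2018-10-01 09:39:27.014 -04:00 [Debug] Stack light to Yellow",
        "2018-10-01 09:40:15.824 -04:00 [Debug] Button pressed",
    ]
    for line in collapse_repeats(sample):
        print(line)
```

As with the regex, a message that comes back after a different message is kept again, because only consecutive repeats are compared.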
    


  • Thank you for your quick response and also for the clear explanation and links to a manual.
    This expression does exactly what I meant.



  • @D said:

    Thank you for your quick response and also for the clear explanation and links to a manual.

    THAT is what we helpers like to hear for a response! :-)

    Note that the search could be made more restrictive, if necessary. For instance, one could verify that the lines start with a date followed by a time followed by the string [Debug]…but I did not include this in my solution because it was more work. :-)
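
To give an idea of what such a stricter check might look like, here is a hedged Python sketch rather than a Notepad++ expression. The pattern `LOG_LINE` and the function `collapse_debug_repeats` are names invented for this example, and the timestamp layout is assumed from the sample log above. Lines that do not look like a dated [Debug] entry are always kept and never count as repeats.

```python
import re

# Assumed layout: date, time with milliseconds, UTC offset, then "[Debug] ".
LOG_LINE = re.compile(
    r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3} [+-]\d{2}:\d{2} \[Debug\] )(.+)"
)


def collapse_debug_repeats(lines):
    """Drop consecutive repeats of the same message, but only for lines
    that really start with a date, a time, an offset and [Debug]."""
    kept = []
    previous_message = None
    for line in lines:
        match = LOG_LINE.fullmatch(line)
        message = match.group(2) if match else None
        if message is None or message != previous_message:
            kept.append(line)
        previous_message = message
    return kept
```

The trade-off is the usual one: the fixed-width version is shorter, while this one will not accidentally merge two unrelated lines that merely happen to share the same text from column 40 onward.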



    Some lines in my file were deleted by the above expression even though they differ by a few words. What if I only want to delete lines that match completely?
    And what if I want to remove the duplicated lines together with the line they duplicate? I mean delete both copies. I hope to get your answer.



  • @Sarah-Duong , welcome to the forum.

    If I understand correctly, you have a data file – I’ll assume it looks similar to D’s original logfile – where there’s a possibility that two rows might be exactly alike, as in this example data:

    2018-10-01 09:35:14.101 -04:00 [Debug] Button pressed
    2018-10-01 09:35:14.120 -04:00 [Debug] Button pressed EXACTLY THE SAME
    2018-10-01 09:35:14.120 -04:00 [Debug] Button pressed EXACTLY THE SAME
    2018-10-01 09:35:14.120 -04:00 [Debug] Button pressed EXACTLY THE SAME
    2018-10-01 09:35:15.345 -04:00 [Debug] Button pressed
    2018-10-01 09:35:15.824 -04:00 [Debug] Button pressed
    2018-10-01 09:35:16.826 -04:00 [Debug] Button pressed EXACTLY THE SAME
    2018-10-01 09:35:16.826 -04:00 [Debug] Button pressed EXACTLY THE SAME
    2018-10-01 09:36:16.253 -04:00 [Debug] Button pressed
    

    And you want to delete both of the matching lines, so that the log above would be filtered down to

    2018-10-01 09:35:14.101 -04:00 [Debug] Button pressed
    2018-10-01 09:35:15.345 -04:00 [Debug] Button pressed
    2018-10-01 09:35:15.824 -04:00 [Debug] Button pressed
    2018-10-01 09:36:16.253 -04:00 [Debug] Button pressed
    

    If that’s the case, then

    • Find What = (?-s)^(.+\R)(\1)+
    • Replace With = (empty)
    • Search Mode = Regular Expression
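
If a script is more convenient than the Replace dialog, the same filtering can be approximated in Python. This is a minimal sketch under my own assumptions: the function name `drop_repeated_runs` and its `keep_one` switch are invented for illustration, and the lines are assumed to be already split, without line endings. Empty lines are left alone, mirroring the fact that `.+` needs at least one character.

```python
from itertools import groupby


def drop_repeated_runs(lines, keep_one=False):
    """Remove every run of consecutive, exactly identical lines.

    With keep_one=True a single copy of each run is kept instead
    (an extra option of this sketch, not part of the regex above).
    """
    kept = []
    for text, run in groupby(lines):
        run = list(run)
        if len(run) == 1 or text == "":
            kept.extend(run)   # unique line, or blank lines: keep as-is
        elif keep_one:
            kept.append(text)  # collapse the run to one copy
        # otherwise the whole run is dropped
    return kept
```

Only neighbouring duplicates are considered, just like the expression above; identical lines separated by other lines are not touched.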

    If that’s not the case, then you’ll need to provide more information – like example data, including both the BEFORE and the AFTER data, so we know exactly how you want it to change. In order to help us help you, please markup your post in a way that the data comes through exactly as you entered it: make use of your PREVIEW pane, and the formatting/Markdown-in-this-forum instructions given in the “FYI” section I am quoting below.

    -----
    FYI:

    This forum is formatted using Markdown, with a help link buried on the little grey ? in the COMPOSE window/pane when writing your post. For more about how to use Markdown in this forum, please see @Scott-Sumner’s post in the “how to markdown code on this forum” topic, and my updates near the end. It is very important that you use these formatting tips – using single backtick marks around small snippets, and using code-quoting for pasting multiple lines from your example data files – because otherwise, the forum will change normal quotes ("") to curly “smart” quotes (“”), will change hyphens to dashes, will sometimes hide asterisks (or if your text is c:\folder\*.txt, it will show up as c:\folder*.txt, missing the backslash). If you want to clearly communicate your text data to us, you need to properly format it.
    If you have further search-and-replace (“matching”, “marking”, “bookmarking”, regular expression, “regex”) needs, study this FAQ and the documentation it points to. Before asking a new regex question, understand that for future requests, many of us will expect you to show what data you have (exactly), what data you want (exactly), what regex you already tried (to show that you’re showing effort), why you thought that regex would work (to prove it wasn’t just something randomly typed), and what data you’re getting with an explanation of why that result is wrong. When you show that effort, you’ll see us bend over backward to get things working for you. If you need help formatting, see the paragraph above.
    Please note that for all regex and related queries, it is best if you are explicit about what needs to match, and what shouldn’t match, and have multiple examples of both in your example dataset. Often, what shouldn’t match helps define the regular expression as much or more than what should match.

