Wanting to Format 2000000 Lines of Text



  • Howdy and good Day to you all,

    I have a problem that I am hoping that someone will have a solution to, or something to point me in the right direction.

    I have a report about 2 000 000 Lines long, its a outputted formatted report from a *nux based access control system, that the client would like changed from what I’ve given them, to something that can be brought into excel or something similar.

    The issue is as follows, the outputted text, can have 1/2/3/4 lines per incident, which means that all I’ve come up with so far is to go through and delete out a lot of the text, I’ve been at it for a few hours, and I am sure there is an easier way to do this.

    They havent used TAB’s or Commas to delim the fields, rather they have used spaces… Why? I dont know, perhaps its something to do with the export procedure from the *nux box to the windows terminal, I am unsure (Sadly I cannot get back to the terminal to see if there is something else I can get, but when I exported it a few months ago (yes the client got back to me and said they want it in something they can view etc) the only output format available through the terminal was this text file.

    So the basic structure is

    Line 1:
    Line 2:
    Line 3:
    Line 4:

    The basic Fields are

    Field 1 - Incident Number (194xxxx —> Onwards)
    Field 2 - Date
    Field 3 - Time
    Field 4 - The Access Controls Time (So the time in the equipment as apposed to field 2/3 which is the time on the terminal)
    Field 5 - The verbose output from the transaction, which is split over 1/2/3 lines (sometimes 4)

    or

    1947728 Sun Nov 26 06:57:54 2017 (CH: Nov 26 07:00:57)ENTRY GRANTED to USER
    6123: Michelle Coghlan at Main Entry Gate
    1947729 Sun Nov 26 06:59:32 2017 (CH: Nov 26 07:02:35)ENTRY GRANTED to USER
    6123: Michelle Coghlan at ACUTE CARE AT HOME
    1947730 Sun Nov 26 06:59:33 2017 (CH: Nov 26 07:02:36)ENTRY GRANTED to USER
    6123: Michelle Coghlan at ACUTE CARE AT HOME
    1947743 Sun Nov 26 06:59:42 2017 (CH: Nov 26 07:02:38)ENTRY GRANTED to USER
    6123: Michelle Coghlan at ACUTE CARE AT HOME
    1947744 Sun Nov 26 07:00:38 2017 (CH: Nov 26 07:03:42)ENTRY GRANTED to USER
    6512: Stephen Knynenburg at Main Entry Gate
    1947745 Sun Nov 26 07:00:39 2017 (CH: Nov 26 07:03:43)ENTRY GRANTED to USER
    6512: Stephen Knynenburg at Main Entry Gate
    1947746 Sun Nov 26 07:01:58 2017 (CH: Nov 26 07:05:02)ENTRY GRANTED to USER
    6512: Stephen Knynenburg at ACUTE CARE AT
    HOME

    I’ve tried a few things, but basically all I’ve come up with is removing the (spaces) on the second and third line, then going to the end of each line and deleting the cr/lf and repeating…

    I am really hoping someone can help with this…

    Thanks!



  • @Mark-Allen

    You’ve given a pretty good example of what you are starting with, but maybe you should give an example (using the same data) as what you want to end up with as well (instead of just loosely describing it: “all I’ve come up with…”).

    You say the results of your effort should be “brought into Excel or…similar”, so I’d suggest that you also verify that the example transformed data actually imports into Excel cleanly and as you expect.

    At that point probably someone here will jump in and offer some helpful advice on how to transform your data.



  • AND!

    Every 80-90 Lines of Text there is a page header as well that I would like to remove, I can simply run CTRL-H and replace the text and make white space and delete it, bit is there a nicer way, or did I hit in on the head the first time?

    What would like to end up with is either

    1947729 Sun Nov 26 06:59:32 2017 (CH: Nov 26 07:02:35)ENTRY GRANTED to USER6123: Michelle Coghlan at ACUTE CARE AT HOME
    1947730 Sun Nov 26 06:59:33 2017 (CH: Nov 26 07:02:36)ENTRY GRANTED to USER6123: Michelle Coghlan at ACUTE CARE AT HOME
    1947743 Sun Nov 26 06:59:42 2017 (CH: Nov 26 07:02:38)ENTRY GRANTED to USER6123: Michelle Coghlan at ACUTE CARE AT HOME
    1947744 Sun Nov 26 07:00:38 2017 (CH: Nov 26 07:03:42)ENTRY GRANTED to USER6512: Stephen Knynenburg at Main Entry Gate
    1947745 Sun Nov 26 07:00:39 2017 (CH: Nov 26 07:03:43)ENTRY GRANTED to USER6512: Stephen Knynenburg at Main Entry Gate
    1947746 Sun Nov 26 07:01:58 2017 (CH: Nov 26 07:05:02)ENTRY GRANTED to USER 6512: Stephen Knynenburg at ACUTE CARE AT HOME

    All on one line, rather than being spread over multiple lines OR

    1947729,Sun Nov 26 06:59:32 2017,(CH: Nov 26 07:02:35),ENTRY GRANTED to USER6123: Michelle Coghlan at ACUTE CARE AT HOME
    1947730,Sun Nov 26 06:59:33 2017,(CH: Nov 26 07:02:36),ENTRY GRANTED to USER6123: Michelle Coghlan at ACUTE CARE AT HOME
    1947743,Sun Nov 26 06:59:42 2017,(CH: Nov 26 07:02:38),ENTRY GRANTED to USER6123: Michelle Coghlan at ACUTE CARE AT HOME
    1947744,Sun Nov 26 07:00:38 2017,(CH: Nov 26 07:03:42),ENTRY GRANTED to USER6512: Stephen Knynenburg at Main Entry Gate
    1947745,Sun Nov 26 07:00:39 2017,(CH: Nov 26 07:03:43),ENTRY GRANTED to USER6512: Stephen Knynenburg at Main Entry Gate
    1947746,Sun Nov 26 07:01:58 2017,(CH: Nov 26 07:05:02),ENTRY GRANTED to USER 6512: Stephen Knynenburg at ACUTE CARE AT HOME

    Something that is delimited so I can get some columns going with excel, or such product.

    Thanks!



  • Essentially I would like to find a way to remove all the mutliple lines that belong to the same lines data, so that its easier to read for the client, and they can import it directly into an excel

    eg current if you where to pull it into excel you get ( For : 1947729 Sun Nov 26 06:59:32 2017 (CH: Nov 26 07:02:35)ENTRY GRANTED to USER6123: Michelle Coghlan at ACUTE CARE AT HOME)

    (The output from the TXT File is as follows

    1947729 Sun Nov 26 06:59:32 2017 (CH: Nov 26 07:02:35)ENTRY GRANTED to USER
    6123: Michelle Coghlan at ACUTE CARE AT HOME
    )

    Line 1: Cell 1 Cell 2 Cell 3 Cell 4
    Line 2: Cell 5
    Line 3: Cell 6 (sometimes)
    Line 4: Cell 7 (Sometimes) (Depends on the length of the verbose output and how many characters it has.
    Example
    Line 1: Cell 1 1947729 Cell 2 Sun Nov 26 06:59:32 2017 Cell 3 (CH: Nov 26 07:02:35) Cell 4 ENTRY GRANTED to USER6
    Line 2: Cell 5 123: Michelle Coghlan at ACUTE CARE AT HOME

    Now when I pull the raw txt file into excel I get lines like this

    Screen Capture 1



  • @Mark-Allen

    If I were you, I’d do it in several steps.

    First, remove your page header. Since you didn’t provide any detail, you’re on your own on that. Hopefully you know what to do with this.

    Second, combine your multi-line data into a single line. Basically what you want to do here is to look at how a line starts and if it doesn’t start with a 7-digit number followed by a day of the week, then you want to combine that line with the following line. Here’s what I’d do:

    Find what zone: (?-is)^(\d{7} (?:Sun|Mon|Tue).+)\R(?!\d{7} (?:Sun|Mon|Tue))(.+)
    Replace with zone: \1\2
    Search mode: Regular expression
    Wrap around: ticked
    Action: Press Replace All button (do this multiple times, I guess up to 4 or 5 based upon your original problem statement, until all of the broken lines are combined

    Note: I left out the remaining days of the week, you can easily fill those in between | characters. Exercise for the reader.

    Third, reformat the data for Excel purposes. That part isn’t so hard, so again, left as exercise for the reader.



  • Hello, @mark-allen, and All,

    This morning ( in France ), I succeeded to find a correct regex which should do what your want to. But, I was away from home, this afternoon ! This regex changes each multi-lines transaction into a single line transaction and, in addition, separates the different fields with commas ;-))

    Of course, I needed some time to figure out all the problem ! And, as Scott said, in his last post, we need to know the structure of your page header in order to build the correct final regex ! In the meanwhile, I supposed that your file does not contain any header

    In your first post, yous said :

    the outputted text, can have 1/2/3/4 lines per incident

    and

    Field 5 - The verbose output from the transaction, which is split over 1/2/3 lines (sometimes 4)

    So, I imagined the simple test text , below, that you’ll paste in a new tab :

    1947728 Sun Nov 26 06:57:54 2017 (CH: Nov 26 07:00:57)<Output_1>
    <Output_2>
    <Output_3>
    <Output_4>
    1941234 .........
    
    1947728 Sun Nov 26 06:57:54 2017 (CH: Nov 26 07:00:57)<Output_1>
    <Output_2>
    <Output_3>
    1941234 .........
    
    1947728 Sun Nov 26 06:57:54 2017 (CH: Nov 26 07:00:57)<Output_1>
    <Output_2>
    1941234 .........
    

    As you can notice, I supposed that the verbose output is always split in, at least, 2 lines, and sometimes in 3 or 4 lines ( Hence, no verbose output in a single line only ! ). If I’m wrong about it, just tell in order to modify the regex ;-)


    Now, open the Replace dialog

    Insert the regex (?-s)(^194\d{4})\x20(.+)\x20(\(.+\))(.+)\R(.+)(?:\R(.+)(?:\R(.+))??)??(?=\R194\d{4}\x20|\Z) , in the Find what: zone

    Insert the regex \1,\2,\3,\4\5\6\7 , in the Replace with: zone

    Tick the Wrap around option

    Click, once, on the Replace All button or several times on the Replace button

    We get the following text :

    1947728,Sun Nov 26 06:57:54 2017,(CH: Nov 26 07:00:57),<Output_1><Output_2><Output_3><Output_4>
    1941234 .........
    
    1947728,Sun Nov 26 06:57:54 2017,(CH: Nov 26 07:00:57),<Output_1><Output_2><Output_3>
    1941234 .........
    
    1947728,Sun Nov 26 06:57:54 2017,(CH: Nov 26 07:00:57),<Output_1><Output_2>
    1941234 .........
    

    So the important question is : Do you expect such a text ?


    Secondly, using your list of transactions, also given in your first post :

    1947728 Sun Nov 26 06:57:54 2017 (CH: Nov 26 07:00:57)ENTRY GRANTED to USER
    6123: Michelle Coghlan at Main Entry Gate
    1947729 Sun Nov 26 06:59:32 2017 (CH: Nov 26 07:02:35)ENTRY GRANTED to USER
    6123: Michelle Coghlan at ACUTE CARE AT HOME
    1947730 Sun Nov 26 06:59:33 2017 (CH: Nov 26 07:02:36)ENTRY GRANTED to USER
    6123: Michelle Coghlan at ACUTE CARE AT HOME
    1947743 Sun Nov 26 06:59:42 2017 (CH: Nov 26 07:02:38)ENTRY GRANTED to USER
    6123: Michelle Coghlan at ACUTE CARE AT HOME
    1947744 Sun Nov 26 07:00:38 2017 (CH: Nov 26 07:03:42)ENTRY GRANTED to USER
    6512: Stephen Knynenburg at Main Entry Gate
    1947745 Sun Nov 26 07:00:39 2017 (CH: Nov 26 07:03:43)ENTRY GRANTED to USER
    6512: Stephen Knynenburg at Main Entry Gate
    1947746 Sun Nov 26 07:01:58 2017 (CH: Nov 26 07:05:02)ENTRY GRANTED to USER
    6512: Stephen Knynenburg at ACUTE CARE AT
    HOME
    

    This same regex S/R produced :

    1947728,Sun Nov 26 06:57:54 2017,(CH: Nov 26 07:00:57),ENTRY GRANTED to USER6123: Michelle Coghlan at Main Entry Gate
    1947729,Sun Nov 26 06:59:32 2017,(CH: Nov 26 07:02:35),ENTRY GRANTED to USER6123: Michelle Coghlan at ACUTE CARE AT HOME
    1947730,Sun Nov 26 06:59:33 2017,(CH: Nov 26 07:02:36),ENTRY GRANTED to USER6123: Michelle Coghlan at ACUTE CARE AT HOME
    1947743,Sun Nov 26 06:59:42 2017,(CH: Nov 26 07:02:38),ENTRY GRANTED to USER6123: Michelle Coghlan at ACUTE CARE AT HOME
    1947744,Sun Nov 26 07:00:38 2017,(CH: Nov 26 07:03:42),ENTRY GRANTED to USER6512: Stephen Knynenburg at Main Entry Gate
    1947745,Sun Nov 26 07:00:39 2017,(CH: Nov 26 07:03:43),ENTRY GRANTED to USER6512: Stephen Knynenburg at Main Entry Gate
    1947746,Sun Nov 26 07:01:58 2017,(CH: Nov 26 07:05:02),ENTRY GRANTED to USER6512: Stephen Knynenburg at ACUTE CARE ATHOME
    

    May be, you noticed that the last line, after replacement, ends with at ACUTE CARE ATHOME, where the words AT and HOME are attached. Quite logical as there is, only, a line-break between …at ACUTE CARE AT and the last line HOME

    Then, in a new Excel page, I inserted that text as External data, indicating that the fields were comma separated and saved it as a common .xls file, without any problem :-)


    Finally, in your first post, you also said :

    I have a report about 2 000 000 Lines long,

    So, I tried to determine if this regex S/R could be done in a sensible time, without any bug or catastrophic backtracking of the N++ regex engine !

    I duplicated your 15 lines text, below :

    1947728 Sun Nov 26 06:57:54 2017 (CH: Nov 26 07:00:57)ENTRY GRANTED to USER
    6123: Michelle Coghlan at Main Entry Gate
    1947729 Sun Nov 26 06:59:32 2017 (CH: Nov 26 07:02:35)ENTRY GRANTED to USER
    6123: Michelle Coghlan at ACUTE CARE AT HOME
    1947730 Sun Nov 26 06:59:33 2017 (CH: Nov 26 07:02:36)ENTRY GRANTED to USER
    6123: Michelle Coghlan at ACUTE CARE AT HOME
    1947743 Sun Nov 26 06:59:42 2017 (CH: Nov 26 07:02:38)ENTRY GRANTED to USER
    6123: Michelle Coghlan at ACUTE CARE AT HOME
    1947744 Sun Nov 26 07:00:38 2017 (CH: Nov 26 07:03:42)ENTRY GRANTED to USER
    6512: Stephen Knynenburg at Main Entry Gate
    1947745 Sun Nov 26 07:00:39 2017 (CH: Nov 26 07:03:43)ENTRY GRANTED to USER
    6512: Stephen Knynenburg at Main Entry Gate
    1947746 Sun Nov 26 07:01:58 2017 (CH: Nov 26 07:05:02)ENTRY GRANTED to USER
    6512: Stephen Knynenburg at ACUTE CARE AT
    HOME
    

    68,000 times ! So, I was left with a file of 58,412,000 bytes, containing 1,020,000 lines. Then running the above regex S/R, on my old (2005 ) Windows XP laptop ( with only 1GB of RAM ! ) the 476,000 replacements took 69s, getting a file of 57,800,000 bytes, containing 476,000 lines ;-))

    Consequently, I suppose that your file with 2,000,000 lines should be, correctly, handled, with a recent laptop and OS !

    So, just tell us about your header and it’ll be only a matter of minutes to get the job done ;-))

    See you later,

    Best Regards,

    guy038



  • Morning All! (Or Afternoon or Evening!)

    Scott! You are correct I didnt give you the header!

    This is the header, which is static throughout, the only thing that changes is the Page #

    This is the header, I can go through and essentially remove it manually or use CTRL-H and replace the Text with a “space”

    Door User Activity Reported on Fri Mar 2 09:50:00 2018
    Initiated by Master from Main Office Page 6292
    Messia Plains Ambulatory

    I’ve tried your expression, and sadly it didnt work for me,

    In the Find What Line :(?-is)^(\d{7} (?:Sun|Mon|Tue|Wed|Thu|Fri|Sat).+)\R(?!\d{7} (?:Sun|Mon|Tue|Wed|Thu|Fri|Sat))(.+)
    In the Replace with: \1\2

    Each time it has come up with Zero Lines Replaced

    I made the sample size smaller, just 1 Page, and the same issue, made it just one line, and he same issue (Zero Lines replaced), I thought perhaps i might have an outdated version, and I’ve updated to the latest version effective today (7.5.6 32 Bit) I have no plug-ins enabled or installed

    Sample Set 1
    Expression 1

    I then thought perhaps the data because it only has monday (Mon) in it, was breaking the expression (I know, I doubted it as well, but its a bit above my head, but wanted to try!)

    I removed Tuesday-Sunday, and left monday only, Still the same result.

    Monday Only Expression

    So at a bit of a loss with that one, I HAVE to be missing something there or some plig in or something for that… :(

    I’ve put up 4 pages of the text so you can see the formatting

    Sample Text

    @guy038

    Thanks for your response as well!!

    I’ve tried your expression as well, and I get the same response, it tells me that there are zero changes, I’ve hit the button many times just to ensure I am not the reason for this

    One thing I’ve noticed is that there is a Space before the first 7 digit character, will that upset things?

    Space before first 7 Digit Number

    What you have outputted is exactly what I am after

    1947728,Sun Nov 26 06:57:54 2017,(CH: Nov 26 07:00:57),ENTRY GRANTED to USER6123: Michelle Coghlan at Main Entry Gate
    1947729,Sun Nov 26 06:59:32 2017,(CH: Nov 26 07:02:35),ENTRY GRANTED to USER6123: Michelle Coghlan at ACUTE CARE AT HOME
    1947730,Sun Nov 26 06:59:33 2017,(CH: Nov 26 07:02:36),ENTRY GRANTED to USER6123: Michelle Coghlan at ACUTE CARE AT HOME
    1947743,Sun Nov 26 06:59:42 2017,(CH: Nov 26 07:02:38),ENTRY GRANTED to USER6123: Michelle Coghlan at ACUTE CARE AT HOME
    1947744,Sun Nov 26 07:00:38 2017,(CH: Nov 26 07:03:42),ENTRY GRANTED to USER6512: Stephen Knynenburg at Main Entry Gate
    1947745,Sun Nov 26 07:00:39 2017,(CH: Nov 26 07:03:43),ENTRY GRANTED to USER6512: Stephen Knynenburg at Main Entry Gate
    1947746,Sun Nov 26 07:01:58 2017,(CH: Nov 26 07:05:02),ENTRY GRANTED to USER6512: Stephen Knynenburg at ACUTE CARE ATHOME

    One Line, with comma’s that can be brought into excel nice and neat!

    I will keep playing, but any thoughts would be appreciated! and I will be sure to save this away if asked again!

    Mark

    nb. Sorry dont know the syntax to nicely put things into code box’s



  • @Mark-Allen

    The simplest way to represent your data accurately here is by pasting it between a line containing ```z and a line containing ```

    Thus, if you do the following:

    ```z
    (copy and paste your real data HERE)
    ```

    It (some of what I think your real data is) will render as:

     1947729 Sun Nov 26 06:59:32 2017 (CH: Nov 26 07:02:35)ENTRY GRANTED to USER
                                      6123: Michelle Coghlan at ACUTE CARE AT 
                                      HOME
     1947729 Sun Nov 26 06:59:32 2017 (CH: Nov 26 07:02:35)ENTRY GRANTED to USER
                                      6123: Michelle Coghlan at ACUTE CARE AT 
                                      HOME
    

    And really, accurately representing your data is the only way to get good advice. Otherwise, responders are guessing, and when you try their advice you get Replace: No occurrence was found. Then when you say, “Oh, that’s not really what I meant…” and attempt to clarify, the helpers have lost interest and have moved on to more interesting pursuits…

    Note: You can also indent your entire dataset by 4 spaces and it will render like the following, but I find the above method easier (and the colors are often wacky in what follows).

     1947729 Sun Nov 26 06:59:32 2017 (CH: Nov 26 07:02:35)ENTRY GRANTED to USER
                                      6123: Michelle Coghlan at ACUTE CARE AT 
                                      HOME
     1947729 Sun Nov 26 06:59:32 2017 (CH: Nov 26 07:02:35)ENTRY GRANTED to USER
                                      6123: Michelle Coghlan at ACUTE CARE AT 
                                      HOME
    

    Also note: Screenshots aren’t going to cut it. Nobody is going to retype your data from a screenshot to test out a proposed match expression.



  • @Mark-Allen

    I forgot to mention that your “Sample Text” link is good in that it has copyable text, but it is bad in that it doesn’t preserve the leading spaces (as shown in some of the screenshot versions) on the lines! :-(



  • agh! Cant Win :) But this is the data

    1947727 Sun Nov 26 00:44:12 2017 (CH: Nov 26 00:47:14)ENTRY GRANTED to USER 
                                      1313: State Government Security at Main 
                                      Entry Gate 
     1947728 Sun Nov 26 06:57:54 2017 (CH: Nov 26 07:00:57)ENTRY GRANTED to USER 
                                      6123: Michelle Coghlan at Main Entry Gate 
     1947729 Sun Nov 26 06:59:32 2017 (CH: Nov 26 07:02:35)ENTRY GRANTED to USER 
                                      6123: Michelle Coghlan at ACUTE CARE AT HOME 
     1947730 Sun Nov 26 06:59:33 2017 (CH: Nov 26 07:02:36)ENTRY GRANTED to USER 
                                      6123: Michelle Coghlan at ACUTE CARE AT HOME 
     1947743 Sun Nov 26 06:59:42 2017 (CH: Nov 26 07:02:38)ENTRY GRANTED to USER 
                                      6123: Michelle Coghlan at ACUTE CARE AT HOME 
     1947744 Sun Nov 26 07:00:38 2017 (CH: Nov 26 07:03:42)ENTRY GRANTED to USER 
                                      6512: Stephen Knynenburg at Main Entry Gate 
     1947745 Sun Nov 26 07:00:39 2017 (CH: Nov 26 07:03:43)ENTRY GRANTED to USER 
                                      6512: Stephen Knynenburg at Main Entry Gate 
     1947746 Sun Nov 26 07:01:58 2017 (CH: Nov 26 07:05:02)ENTRY GRANTED to USER 
                                      6512: Stephen Knynenburg at ACUTE CARE AT 
                                      HOME 
     1947747 Sun Nov 26 07:27:03 2017 (CH: Nov 26 07:30:07)ENTRY GRANTED to USER 
                                      6123: Michelle Coghlan at Population Health 
                                      Rear Entry 
     1947748 Sun Nov 26 08:08:34 2017 (CH: Nov 26 08:11:39)ENTRY GRANTED to USER 
                                      6512: Stephen Knynenburg at ACUTE CARE AT 
                                      HOME 
     1947749 Sun Nov 26 09:08:22 2017 (CH: Nov 26 09:11:27)ENTRY GRANTED to USER 
                                      6512: Stephen Knynenburg at Population 
                                      Health Rear Entry 
     1947750 Sun Nov 26 09:43:14 2017 (CH: Nov 26 09:46:19)ENTRY GRANTED to USER 
                                      6512: Stephen Knynenburg at Population 
                                      Health Rear Entry 
     1947751 Sun Nov 26 09:46:48 2017 (CH: Nov 26 09:49:53)ENTRY GRANTED to USER 
                                      6005: Rose Gorman at Main Entry Gate 
     1947752 Sun Nov 26 09:49:50 2017 (CH: Nov 26 09:52:55)ENTRY GRANTED to USER 
                                      6005: Rose Gorman at ACUTE CARE AT HOME 
     1947753 Sun Nov 26 09:56:07 2017 (CH: Nov 26 09:59:11)ENTRY GRANTED to USER 
                                      6513: Sheela Tuazon at Main Entry Gate 
     1947754 Sun Nov 26 09:58:25 2017 (CH: Nov 26 10:01:29)ENTRY GRANTED to USER 
                                      6513: Sheela Tuazon at ACUTE CARE AT HOME 
     1947755 Sun Nov 26 10:01:56 2017 (CH: Nov 26 10:05:00)ENTRY GRANTED to USER 
                                      6513: Sheela Tuazon at Population Health 
                                      Rear Entry 
     1947756 Sun Nov 26 10:08:53 2017 (CH: Nov 26 10:11:58)ENTRY GRANTED to USER 
                                      6512: Stephen Knynenburg at Population 
                                      Health Rear Entry 
    
    Door User Activity                        Reported on Fri Mar  2 09:50:00 2018
    Initiated by Master from Main Office                                    Page 2
    Messia Plains Ambulatory
    ================================================================================
    


  • Hello, @mark-allen, @scott-sumner and All,

    Ah ! Of course, it’s easy to understand why my regex could not match any string ;-)) Indeed, when regular expressions are involved, even a single space, added or deleted in text, is of importance !

    From your recent picture, I noticed four particularities, in your list :

    • Any incident number is separated from beginning of line by a space character

    • The verbose output zones are, both :

      • Preceded by 1 or more space character(s)

      • Followed by 1 space character

    • Each header block :

      • Begins with the Form Feed character ( the ASCII page-breaking control character, of Unicode value \x{000C}, which can be matched, in regular expressions, with the \f syntax

      • Ends with several equal signs ( = )

    Refer to https://en.wikipedia.org/wiki/Page_break#Form_feed

    With all these elements, it’s not difficult to modify my previous regex, to be close to your real text ;-))


    So again, let’s begin with this short example :

     1947728 Sun Nov 26 06:57:54 2017 (CH: Nov 26 07:00:57)<Output_1> 
                                      <Output_2> 
                                      <Output_3> 
                                      <Output_4> 
     1947728 Sun Nov 26 06:57:54 2017 (CH: Nov 26 07:00:57)<Output_1> 
                                      <Output_2> 
                                      <Output_3> 
    
    Door User Activity                        Reported on Fri Mar  2 09:50:00 2018
    Initiated by Master from Main Office                                    Page 2
    Messia Plains Ambulatory
    ================================================================================
     1947728 Sun Nov 26 06:57:54 2017 (CH: Nov 26 07:00:57)<Output_1> 
                                      <Output_2> 
     1941234 ......... 
    

    Using my new regex, below :

    SEARCH (?-s)^\x20(194\d{4})\x20(.+)\x20(\(.+\))(.+)\x20\R\x20+(.+)\x20(?:\R\x20+(.+)\x20(?:\R\x20+(.+)\x20)??)??(?=\Z|\R(\f|\x20194\d{4}\x20))|(?s)\f.*?===\R

    REPLACE ?1\1,\2,\3,\4\5\6\7

    We get the following text :

    1947728,Sun Nov 26 06:57:54 2017,(CH: Nov 26 07:00:57),<Output_1><Output_2><Output_3><Output_4>
    1947728,Sun Nov 26 06:57:54 2017,(CH: Nov 26 07:00:57),<Output_1><Output_2><Output_3>
    1947728,Sun Nov 26 06:57:54 2017,(CH: Nov 26 07:00:57),<Output_1><Output_2>
     1941234 ......... 
    

    Secondly, given your list of transactions, below, of your LAST post, where I simply added :

    • One space before the first incident number, which was missing

    • A header block, after the 5th and 11th transactions

     1947727 Sun Nov 26 00:44:12 2017 (CH: Nov 26 00:47:14)ENTRY GRANTED to USER 
                                      1313: State Government Security at Main 
                                      Entry Gate 
     1947728 Sun Nov 26 06:57:54 2017 (CH: Nov 26 07:00:57)ENTRY GRANTED to USER 
                                      6123: Michelle Coghlan at Main Entry Gate 
     1947729 Sun Nov 26 06:59:32 2017 (CH: Nov 26 07:02:35)ENTRY GRANTED to USER 
                                      6123: Michelle Coghlan at ACUTE CARE AT HOME 
     1947730 Sun Nov 26 06:59:33 2017 (CH: Nov 26 07:02:36)ENTRY GRANTED to USER 
                                      6123: Michelle Coghlan at ACUTE CARE AT HOME 
     1947743 Sun Nov 26 06:59:42 2017 (CH: Nov 26 07:02:38)ENTRY GRANTED to USER 
                                      6123: Michelle Coghlan at ACUTE CARE AT HOME 
    
    Door User Activity                        Reported on Fri Mar  2 09:50:00 2018
    Initiated by Master from Main Office                                    Page 26
    Messia Plains Ambulatory
    ================================================================================
     1947744 Sun Nov 26 07:00:38 2017 (CH: Nov 26 07:03:42)ENTRY GRANTED to USER 
                                      6512: Stephen Knynenburg at Main Entry Gate 
     1947745 Sun Nov 26 07:00:39 2017 (CH: Nov 26 07:03:43)ENTRY GRANTED to USER 
                                      6512: Stephen Knynenburg at Main Entry Gate 
     1947746 Sun Nov 26 07:01:58 2017 (CH: Nov 26 07:05:02)ENTRY GRANTED to USER 
                                      6512: Stephen Knynenburg at ACUTE CARE AT 
                                      HOME 
     1947747 Sun Nov 26 07:27:03 2017 (CH: Nov 26 07:30:07)ENTRY GRANTED to USER 
                                      6123: Michelle Coghlan at Population Health 
                                      Rear Entry 
     1947748 Sun Nov 26 08:08:34 2017 (CH: Nov 26 08:11:39)ENTRY GRANTED to USER 
                                      6512: Stephen Knynenburg at ACUTE CARE AT 
                                      HOME 
     1947749 Sun Nov 26 09:08:22 2017 (CH: Nov 26 09:11:27)ENTRY GRANTED to USER 
                                      6512: Stephen Knynenburg at Population 
                                      Health Rear Entry 
    
    Door User Activity                        Reported on Fri Mar  2 09:50:00 2018
    Initiated by Master from Main Office                                    Page 27
    Messia Plains Ambulatory
    ================================================================================
     1947750 Sun Nov 26 09:43:14 2017 (CH: Nov 26 09:46:19)ENTRY GRANTED to USER 
                                      6512: Stephen Knynenburg at Population 
                                      Health Rear Entry 
     1947751 Sun Nov 26 09:46:48 2017 (CH: Nov 26 09:49:53)ENTRY GRANTED to USER 
                                      6005: Rose Gorman at Main Entry Gate 
     1947752 Sun Nov 26 09:49:50 2017 (CH: Nov 26 09:52:55)ENTRY GRANTED to USER 
                                      6005: Rose Gorman at ACUTE CARE AT HOME 
     1947753 Sun Nov 26 09:56:07 2017 (CH: Nov 26 09:59:11)ENTRY GRANTED to USER 
                                      6513: Sheela Tuazon at Main Entry Gate 
     1947754 Sun Nov 26 09:58:25 2017 (CH: Nov 26 10:01:29)ENTRY GRANTED to USER 
                                      6513: Sheela Tuazon at ACUTE CARE AT HOME 
     1947755 Sun Nov 26 10:01:56 2017 (CH: Nov 26 10:05:00)ENTRY GRANTED to USER 
                                      6513: Sheela Tuazon at Population Health 
                                      Rear Entry 
     1947756 Sun Nov 26 10:08:53 2017 (CH: Nov 26 10:11:58)ENTRY GRANTED to USER 
                                      6512: Stephen Knynenburg at Population 
                                      Health Rear Entry 
    
    Door User Activity                        Reported on Fri Mar  2 09:50:00 2018
    Initiated by Master from Main Office                                    Page 28
    Messia Plains Ambulatory
    ================================================================================
    

    After replacement, the new regex S/R gives :

    1947727,Sun Nov 26 00:44:12 2017,(CH: Nov 26 00:47:14),ENTRY GRANTED to USER1313: State Government Security at MainEntry Gate
    1947728,Sun Nov 26 06:57:54 2017,(CH: Nov 26 07:00:57),ENTRY GRANTED to USER6123: Michelle Coghlan at Main Entry Gate
    1947729,Sun Nov 26 06:59:32 2017,(CH: Nov 26 07:02:35),ENTRY GRANTED to USER6123: Michelle Coghlan at ACUTE CARE AT HOME
    1947730,Sun Nov 26 06:59:33 2017,(CH: Nov 26 07:02:36),ENTRY GRANTED to USER6123: Michelle Coghlan at ACUTE CARE AT HOME
    1947743,Sun Nov 26 06:59:42 2017,(CH: Nov 26 07:02:38),ENTRY GRANTED to USER6123: Michelle Coghlan at ACUTE CARE AT HOME
    1947744,Sun Nov 26 07:00:38 2017,(CH: Nov 26 07:03:42),ENTRY GRANTED to USER6512: Stephen Knynenburg at Main Entry Gate
    1947745,Sun Nov 26 07:00:39 2017,(CH: Nov 26 07:03:43),ENTRY GRANTED to USER6512: Stephen Knynenburg at Main Entry Gate
    1947746,Sun Nov 26 07:01:58 2017,(CH: Nov 26 07:05:02),ENTRY GRANTED to USER6512: Stephen Knynenburg at ACUTE CARE ATHOME
    1947747,Sun Nov 26 07:27:03 2017,(CH: Nov 26 07:30:07),ENTRY GRANTED to USER6123: Michelle Coghlan at Population HealthRear Entry
    1947748,Sun Nov 26 08:08:34 2017,(CH: Nov 26 08:11:39),ENTRY GRANTED to USER6512: Stephen Knynenburg at ACUTE CARE ATHOME
    1947749,Sun Nov 26 09:08:22 2017,(CH: Nov 26 09:11:27),ENTRY GRANTED to USER6512: Stephen Knynenburg at PopulationHealth Rear Entry
    1947750,Sun Nov 26 09:43:14 2017,(CH: Nov 26 09:46:19),ENTRY GRANTED to USER6512: Stephen Knynenburg at PopulationHealth Rear Entry
    1947751,Sun Nov 26 09:46:48 2017,(CH: Nov 26 09:49:53),ENTRY GRANTED to USER6005: Rose Gorman at Main Entry Gate
    1947752,Sun Nov 26 09:49:50 2017,(CH: Nov 26 09:52:55),ENTRY GRANTED to USER6005: Rose Gorman at ACUTE CARE AT HOME
    1947753,Sun Nov 26 09:56:07 2017,(CH: Nov 26 09:59:11),ENTRY GRANTED to USER6513: Sheela Tuazon at Main Entry Gate
    1947754,Sun Nov 26 09:58:25 2017,(CH: Nov 26 10:01:29),ENTRY GRANTED to USER6513: Sheela Tuazon at ACUTE CARE AT HOME
    1947755,Sun Nov 26 10:01:56 2017,(CH: Nov 26 10:05:00),ENTRY GRANTED to USER6513: Sheela Tuazon at Population HealthRear Entry
    1947756,Sun Nov 26 10:08:53 2017,(CH: Nov 26 10:11:58),ENTRY GRANTED to USER6512: Stephen Knynenburg at PopulationHealth Rear Entry
    

    Voilà !

    Quite sure that it’s what you expect to, don’t you ?

    Cheers,

    guy038

    P.S. :

    • I’ll give some explanations on my regex S/R, in a next post, if the resulting replacement is OK for you !

    • I also assume that the last non-blank line, of your file, is, either :

      • The line, with equal signs, of a header block, followed by a Windows line-break ( \r\n )

      • A verbose output line, possibly followed with any range of \r, \n or \f character(s)



  • Hello All!

    Sorry All! Was away from the desk for a few days, BUT I ran it and it works fine, for the first 30 pages, then after then first 30 pages it does this

    1949977,Tue Nov 28 11:48:52 2017,(CH: Nov 28 11:52:11),ENTRY GRANTED to USER6295: Rachael Downie at AMH Rear Entry
    
    1949978,Tue Nov 28 11:48:55 2017,(CH: Nov 28 11:52:13),ENTRY GRANTED to USER40: Authorised Student Card 17 at CH RearEntry
    1949979,Tue Nov 28 11:49:06 2017,(CH: Nov 28 11:52:25),ENTRY GRANTED to USER10885: Anne Volling at CH Rear Entry
    1949980,Tue Nov 28 11:49:14 2017,(CH: Nov 28 11:52:33),ENTRY GRANTED to USER6048: Elizabeth Apro at Basement comms Room
    1949981,Tue Nov 28 11:49:29 2017,(CH: Nov 28 11:52:48),ENTRY GRANTED to USER5990: Tracy Brighouse at AMH Reception
    1949982,Tue Nov 28 11:49:35 2017,(CH: Nov 28 11:52:54),ENTRY GRANTED to USER6130: Narelle Broad at OH Front Entry
    1949983,Tue Nov 28 11:50:23 2017,(CH: Nov 28 11:53:41),ENTRY GRANTED to USER6048: Elizabeth Apro at Plant Room to CommsRoom door
    1949984,Tue Nov 28 11:50:36 2017,(CH: Nov 28 11:53:54),ENTRY GRANTED to USER5990: Tracy Brighouse at AMH Front Entry
    1949985,Tue Nov 28 11:50:40 2017,(CH: Nov 28 11:53:58),ENTRY GRANTED to USER5990: Tracy Brighouse at AMH Reception
    1949986,Tue Nov 28 11:51:03 2017,(CH: Nov 28 11:54:22),ENTRY GRANTED to USER10885: Anne Volling at CH Rear Entry
    1949987,Tue Nov 28 11:51:09 2017,(CH: Nov 28 11:54:28),ENTRY GRANTED to USER9828: Rosalinda Whittmaack at AMH Rear Entry
    1949988,Tue Nov 28 11:51:32 2017,(CH: Nov 28 11:54:51),ENTRY GRANTED to USER5990: Tracy Brighouse at AMH Rear Entry
    1949989,Tue Nov 28 11:51:51 2017,(CH: Nov 28 11:55:10),ENTRY GRANTED to USER6048: Elizabeth Apro at Staff Entry GroundEast
    1949990,Tue Nov 28 11:54:54 2017,(CH: Nov 28 11:58:13),ENTRY GRANTED to USER6295: Rachael Downie at AMH Rear Entry
    1949991,Tue Nov 28 11:55:00 2017,(CH: Nov 28 11:58:18),ENTRY GRANTED to USER6295: Rachael Downie at AMH Consult B08
    1949992,Tue Nov 28 11:55:05 2017,(CH: Nov 28 11:58:24),ENTRY GRANTED to USER11235: Kristi Davies at Admin Entry
    1949993,Tue Nov 28 11:55:08 2017,(CH: Nov 28 11:58:27),ENTRY GRANTED to USER6295: Rachael Downie at AMH Rear Entry
    1949994,Tue Nov 28 11:55:10 2017,(CH: Nov 28 11:58:28),ENTRY GRANTED to USER6375: Ahmed Mansour at AMH Reception
    1949995,Tue Nov 28 11:57:23 2017,(CH: Nov 28 12:00:41),ENTRY GRANTED to USER10593: Emma Phelan at AMH Rear Entry
    1949996,Tue Nov 28 11:57:47 2017,(CH: Nov 28 12:01:06),ENTRY GRANTED to USER6127: Daniel Nguyen at AMH Reception
    1949997,Tue Nov 28 11:57:49 2017,(CH: Nov 28 12:01:07),ENTRY GRANTED to USER10593: Emma Phelan at AMH Clinic B15
    1949998,Tue Nov 28 11:57:56 2017,(CH: Nov 28 12:01:15),ENTRY GRANTED to USER7656: Angela McCoy at A & P Rear Entry
     1949999 Tue Nov 28 11:58:30 2017 (CH: Nov 28 12:01:48)ENTRY GRANTED to USER 
                                      7656: Angela McCoy at Admin Entry 
     1950000 Tue Nov 28 11:58:43 2017 (CH: Nov 28 12:02:01)ENTRY GRANTED to USER 
                                      9886: Janet Meade at Main Entry Gate 
     1950001 Tue Nov 28 11:58:59 2017 (CH: Nov 28 12:02:17)ENTRY GRANTED to USER 
                                      10041: Christina Davidson at A & P Foyer 
                                      Entry 
     1950002 Tue Nov 28 11:59:06 2017 (CH: Nov 28 12:02:24)ENTRY GRANTED to USER 
                                      7656: Angela McCoy at Gym Internal Entry 
    
     1950003 Tue Nov 28 11:59:22 2017 (CH: Nov 28 12:02:40)ENTRY GRANTED to USER 
                                      10041: Christina Davidson at A & P Rear 
                                      Entry 
     1950004 Tue Nov 28 12:00:21 2017 (CH: Nov 28 12:03:39)ENTRY GRANTED to USER 
                                      9886: Janet Meade at Staff Entry Ground East 
     1950005 Tue Nov 28 12:00:40 2017 (CH: Nov 28 12:03:58)ENTRY GRANTED to USER 
                                      6296: Peta Picton at Admin Entry 
     1950006 Tue Nov 28 12:00:52 2017 (CH: Nov 28 12:04:10)ENTRY GRANTED to USER 
                                      6048: Elizabeth Apro at CH Rear Entry 
     1950007 Tue Nov 28 12:01:29 2017 (CH: Nov 28 12:04:47)ENTRY GRANTED to USER 
                                      6296: Peta Picton at A & P Foyer Entry 
     1950008 Tue Nov 28 12:01:42 2017 (CH: Nov 28 12:05:00)ENTRY GRANTED to USER 
                                      10184: Jasmin-marie Lim at Gym Internal 
                                      Entry 
     1950009 Tue Nov 28 12:02:02 2017 (CH: Nov 28 12:05:20)ENTRY GRANTED to USER 
                                      10184: Jasmin-marie Lim at A & P Foyer Entry 
     1950010 Tue Nov 28 12:02:27 2017 (CH: Nov 28 12:05:45)ENTRY GRANTED to USER 
                                      6016: Rhonda Rainbird at Conference External 
                                      Entry 
     1950011 Tue Nov 28 12:02:29 2017 (CH: Nov 28 12:05:47)ENTRY GRANTED to USER 
                                      10184: Jasmin-marie Lim at A & P Rear Entry 
     1950012 Tue Nov 28 12:02:35 2017 (CH: Nov 28 12:05:53)ENTRY GRANTED to USER 
                                      6123: Michelle Coghlan at ACUTE CARE AT HOME 
     1950013 Tue Nov 28 12:02:52 2017 (CH: Nov 28 12:06:11)ENTRY GRANTED to USER 
                                      6016: Rhonda Rainbird at OH Front Entry 
     1950014 Tue Nov 28 12:02:52 2017 (CH: Nov 28 12:06:11)ENTRY GRANTED to USER 
                                      6286: Laura Fitzalan at CH Rear Entry 
     1950015 Tue Nov 28 12:03:05 2017 (CH: Nov 28 12:06:24)ENTRY GRANTED to USER 
                                      6048: Elizabeth Apro at Admin Entry 
     1950016 Tue Nov 28 12:03:10 2017 (CH: Nov 28 12:06:29)ENTRY GRANTED to USER 
                                      6103: Helen McGregor at Staff Entry Ground 
                                      East 
     1950017 Tue Nov 28 12:03:41 2017 (CH: Nov 28 12:06:59)ENTRY GRANTED to USER 
                                      7656: Angela McCoy at Gym Internal Entry 
     1950018 Tue Nov 28 12:03:57 2017 (CH: Nov 28 12:07:16)ENTRY GRANTED to USER 
                                      10041: Christina Davidson at Staff Entry 
                                      Ground East 
     1950019 Tue Nov 28 12:04:12 2017 (CH: Nov 28 12:07:31)ENTRY GRANTED to USER 
                                      10041: Christina Davidson at Basement comms 
                                      Room 
    

    [Output 1(https://ibb.co/cNXgexl)

    Repeating the process doesnt yield any changes, changing

    (?-s)^\x20(194\d{4})\x20(.+)\x20((.+))(.+)\x20\R\x20+(.+)\x20(?:\R\x20+(.+)\x20(?:\R\x20+(.+)\x20)??)??(?=\Z|\R(\f|\x20194\d{4}\x20))|(?s)\f.*?===\R

    to

    (?-s)^\x20(195\d{4})\x20(.+)\x20((.+))(.+)\x20\R\x20+(.+)\x20(?:\R\x20+(.+)\x20(?:\R\x20+(.+)\x20)??)??(?=\Z|\R(\f|\x20194\d{4}\x20))|(?s)\f.*?===\R

    For what I am guessing is it so filter the 195(next 4 digits) normally blanks the rest of the document and you end up with a single line of text somtimes :(

    I copied 10 pages from the 1950001 to 195xxxx and re-ran the expression, with the 194 changed to 195, and it would normally only change the last line of the page, sometimes get two lines merged into one line

    SO CLOSE! Any thoughts? I could email you the source file if you like?

    Full Source



  • Hello, @mark-allen, @scott-sumner and All,

    Ah, sorry ! I was really wrong about the Incident Number description… I should have read carefully, because you said, in your first post :

    Field 1 - Incident Number (194xxxx —> Onwards)

    And also :

    I have a report about 2 000 000 Lines long,

    So as each transaction is, at least, split in two lines, this means that your file could contain, roughly, 1,000,000 transactions, or a bit less !. As the first number is 1940000 => The last Incident Number should be nearby 2940000

    So I changed the regex, matching the Incident Number, to [12]\d{6}). Note that this regex definition, of an Incident Number, is present, twice, in the overall regex :

    • At its beginning (?-s)^\x20([12]\d{6})\x20(.+)\x20......

    • In the look-ahead part (?=\Z|\R(\f|\x20[12]\d{6}\x20))

    I, also, changed the regex matching the Header, to (?s)\f.*?===\R\R, as there is a pure blank line, after the ===== line !


    Now, the last version of the regex is :

    SEARCH (?-s)^\x20([12]\d{6})\x20(.+)\x20(\(.+\))(.+)\x20\R\x20+(.+)\x20(?:\R\x20+(.+)\x20(?:\R\x20+(.+)\x20)??)??(?=\Z|\R(\f|\x20[12]\d{6}\x20))|(?s)\f.*?===\R\R

    REPLACE ?1\1,\2,\3,\4\5\6\7

    Running that regex S/R against your attached file out.txt ( 14,399,825 bytes and 202,315 lines ) left me with the expected comma separated file, of 9,698,482 bytes, containing 82,068 lines

    Voilà :-))


    Here is, below, some notes about this regex S/R :

    (?-s)                                       #  The DOT regex character matches ONLY a SINGLE STANDARD character ( NOT the EOL ones )
    
    ^\x20([12]\d{6})\x20                        #  An 'Incident Number' ([12]\d{6}), as Group 1, PREDEDED and FOLLOWED with a SPACE character \x20
    
    (.+)\x20                                    #  The Date and Time (.+), as Group 2, which ENDS with a SPACE character \x20
    
    (\(.+\))                                    #  The 'Access Control Time' (\(.+\)), as Group 3, surrounded with PARENTHESES, which are ESCAPED to be considered as LITTERALS
    
    (.+)\x20\R                                  #  The FIRST 'Verbose Output' zone (.+), as Group 4, FOLLOWED with a SPACE character \x20 and a LINE-BREAK \R
    
    \x20+(.+)\x20                               #  The SECOND 'Verbose Output' zone (.+), as Group 5, NEXT line, PRECEDED with SPACE characters \x20+ and FOLLOWED with a SPACE character \x20
    
    (?:\R\x20+(.+)\x20(?:\R\x20+(.+)\x20)??)??  #  The part, between the OUTER parentheses, (?: and )??, may be ASBENT or PRESENT, with priority to ABSENCE
                                                #    It represents the possible THIRD and FOURTH 'Verbose Output' zones (.+), as Groups 6 and 7 :
                                                #        - PRECEDED with a LINE-BREAK \R and some SPACE characters \x20+
                                                #        - FOLLOWED with a SPACE character \x20
                                                #  
                                                #  Again, the part, between the INNER parentheses, (?: and )??, may be ABSENT or PRESENT, with priority to ABSENCE
                                                #    It represents the possible FOURTH 'Verbose Output' zone (.+), as Group 7 :
                                                #        - PRECEDED with a LINE-BREAK \R and some SPACE characters \x20+
                                                #        - FOLLOWED with a SPACE character \x20
    
    (?=\Z|\R(\f|\x20[12]\d{6}\x20))             #  IF followed with, EITHER :  ( LOOK-AHEAD syntax )
                                                #      - \Z that is to say SOME \r,\n or \f characters, at the VERY END of file
                                                #      - a LINE-BREAK \R, followed with, EITHER :
                                                #          - A CONTROL character \f, beginning a HEADER block
                                                #          - An INCIDENT number, beginning with '1' or '2', [12]\d{6}, PRECEDED and FOLLOWED with a SPACE character \x20
    
    |                                           #  OR
    
    (?s)                                        #  The DOT regex character matches, ABSOLUTELY, ANY character
                                                       
    \f.*?===\R\R                                #   The SHORTEST range of ANY character *.?, EVEN on SEVERAL lines, between :
                                                #       - A FORM-FEED character \f
                                                #       - THREE EQUAL signs, FOLLOWED with 2 LINE-BREAKS \R\R
                                                #  
                                                #   => The COMPLETE HEADER block, with LINE-BREAKS
    

    To end with, here is this same regex S/R, displayed in Free-spacing mode, with a minimum of comments ! You may run this syntax, as well. Just select all the text, below, and paste it, in the Find what: zone of the Replace dialog :-))

    (?x)                                            #  FRE-SPACING regex mode
                                                    #
                                                    #  IMPORTANT : In this mode, any \R syntax must be REPLACED with \r\n 
                                                    #  ---------
    
    (?-s)                                           #  The DOT regex character matches ONLY a SINGLE STANDARD character ( NOT the EOL ones )
                                                    
    ^\x20([12]\d{6})\x20                            #  Group 1        ( Incident Number )
                                                    
    (.+)\x20                                        #  Group 2        ( Date and Time )
                                                    
    (\(.+\))                                        #  Group 3        ( Access Control Time )
                                                    
    (.+)\x20\r\n                                    #  Group 4        ( 1ST Verbose Output )
                                                    
    \x20+(.+)\x20                                   #  Group 5        ( 2ND Verbose Output )
    
    (?:\r\n\x20+(.+)\x20(?:\r\n\x20+(.+)\x20)??)??  #  Group 6 and 7  ( 3RD and 4TH Verbose Output )
    
    (?=\Z|\r\n(\f|\x20[12]\d{6}\x20))               #  IF the VERY END of file or IF a LINE-BREAK followed with an INCIDENT NUMBER or a HEADER
    
    |                                               #  OR
    
    (?s)                                            #  The DOT regex character matches, ABSOLUTELY, ANY character
    
    \f.*?===\r\n\r\n                                #  The COMPLETE HEADER block, with its LINE-BREAKS
    

    Remember that the Find what: zone cannot store more than 2046 characters !

    Cheers,

    guy038



  • Mate

    Thank you so so so Much for your help with this, I truly mean that, it worked perfectly, If its okay, I am going to distribute this way to the supplier of the product so we can use it to make better reports from this program!!!

    Truly Thank you heaps, and thanks for Explaination as well, I should be able to use this in the future to write up my own rules…

    Again thanks heaps!!!

    Mark


Log in to reply