Community
    • Login

    Wanting to Format 2000000 Lines of Text

    Scheduled Pinned Locked Moved Help wanted · · · – – – · · ·
    formatting text
    14 Posts 3 Posters 4.0k Views
    Loading More Posts
    • Oldest to Newest
    • Newest to Oldest
    • Most Votes
    Reply
    • Reply as topic
    Log in to reply
    This topic has been deleted. Only users with topic management privileges can see it.
    • Mark AllenM
      Mark Allen
      last edited by

      Howdy and good Day to you all,

      I have a problem that I am hoping that someone will have a solution to, or something to point me in the right direction.

      I have a report about 2 000 000 Lines long, its a outputted formatted report from a *nux based access control system, that the client would like changed from what I’ve given them, to something that can be brought into excel or something similar.

      The issue is as follows, the outputted text, can have 1/2/3/4 lines per incident, which means that all I’ve come up with so far is to go through and delete out a lot of the text, I’ve been at it for a few hours, and I am sure there is an easier way to do this.

      They havent used TAB’s or Commas to delim the fields, rather they have used spaces… Why? I dont know, perhaps its something to do with the export procedure from the *nux box to the windows terminal, I am unsure (Sadly I cannot get back to the terminal to see if there is something else I can get, but when I exported it a few months ago (yes the client got back to me and said they want it in something they can view etc) the only output format available through the terminal was this text file.

      So the basic structure is

      Line 1:
      Line 2:
      Line 3:
      Line 4:

      The basic Fields are

      Field 1 - Incident Number (194xxxx —> Onwards)
      Field 2 - Date
      Field 3 - Time
      Field 4 - The Access Controls Time (So the time in the equipment as apposed to field 2/3 which is the time on the terminal)
      Field 5 - The verbose output from the transaction, which is split over 1/2/3 lines (sometimes 4)

      or

      1947728 Sun Nov 26 06:57:54 2017 (CH: Nov 26 07:00:57)ENTRY GRANTED to USER
      6123: Michelle Coghlan at Main Entry Gate
      1947729 Sun Nov 26 06:59:32 2017 (CH: Nov 26 07:02:35)ENTRY GRANTED to USER
      6123: Michelle Coghlan at ACUTE CARE AT HOME
      1947730 Sun Nov 26 06:59:33 2017 (CH: Nov 26 07:02:36)ENTRY GRANTED to USER
      6123: Michelle Coghlan at ACUTE CARE AT HOME
      1947743 Sun Nov 26 06:59:42 2017 (CH: Nov 26 07:02:38)ENTRY GRANTED to USER
      6123: Michelle Coghlan at ACUTE CARE AT HOME
      1947744 Sun Nov 26 07:00:38 2017 (CH: Nov 26 07:03:42)ENTRY GRANTED to USER
      6512: Stephen Knynenburg at Main Entry Gate
      1947745 Sun Nov 26 07:00:39 2017 (CH: Nov 26 07:03:43)ENTRY GRANTED to USER
      6512: Stephen Knynenburg at Main Entry Gate
      1947746 Sun Nov 26 07:01:58 2017 (CH: Nov 26 07:05:02)ENTRY GRANTED to USER
      6512: Stephen Knynenburg at ACUTE CARE AT
      HOME

      I’ve tried a few things, but basically all I’ve come up with is removing the (spaces) on the second and third line, then going to the end of each line and deleting the cr/lf and repeating…

      I am really hoping someone can help with this…

      Thanks!

      Scott SumnerS 1 Reply Last reply Reply Quote 0
      • Scott SumnerS
        Scott Sumner @Mark Allen
        last edited by Scott Sumner

        @Mark-Allen

        You’ve given a pretty good example of what you are starting with, but maybe you should give an example (using the same data) as what you want to end up with as well (instead of just loosely describing it: “all I’ve come up with…”).

        You say the results of your effort should be “brought into Excel or…similar”, so I’d suggest that you also verify that the example transformed data actually imports into Excel cleanly and as you expect.

        At that point probably someone here will jump in and offer some helpful advice on how to transform your data.

        1 Reply Last reply Reply Quote 0
        • Mark AllenM
          Mark Allen
          last edited by

          AND!

          Every 80-90 Lines of Text there is a page header as well that I would like to remove, I can simply run CTRL-H and replace the text and make white space and delete it, bit is there a nicer way, or did I hit in on the head the first time?

          What would like to end up with is either

          1947729 Sun Nov 26 06:59:32 2017 (CH: Nov 26 07:02:35)ENTRY GRANTED to USER6123: Michelle Coghlan at ACUTE CARE AT HOME
          1947730 Sun Nov 26 06:59:33 2017 (CH: Nov 26 07:02:36)ENTRY GRANTED to USER6123: Michelle Coghlan at ACUTE CARE AT HOME
          1947743 Sun Nov 26 06:59:42 2017 (CH: Nov 26 07:02:38)ENTRY GRANTED to USER6123: Michelle Coghlan at ACUTE CARE AT HOME
          1947744 Sun Nov 26 07:00:38 2017 (CH: Nov 26 07:03:42)ENTRY GRANTED to USER6512: Stephen Knynenburg at Main Entry Gate
          1947745 Sun Nov 26 07:00:39 2017 (CH: Nov 26 07:03:43)ENTRY GRANTED to USER6512: Stephen Knynenburg at Main Entry Gate
          1947746 Sun Nov 26 07:01:58 2017 (CH: Nov 26 07:05:02)ENTRY GRANTED to USER 6512: Stephen Knynenburg at ACUTE CARE AT HOME

          All on one line, rather than being spread over multiple lines OR

          1947729,Sun Nov 26 06:59:32 2017,(CH: Nov 26 07:02:35),ENTRY GRANTED to USER6123: Michelle Coghlan at ACUTE CARE AT HOME
          1947730,Sun Nov 26 06:59:33 2017,(CH: Nov 26 07:02:36),ENTRY GRANTED to USER6123: Michelle Coghlan at ACUTE CARE AT HOME
          1947743,Sun Nov 26 06:59:42 2017,(CH: Nov 26 07:02:38),ENTRY GRANTED to USER6123: Michelle Coghlan at ACUTE CARE AT HOME
          1947744,Sun Nov 26 07:00:38 2017,(CH: Nov 26 07:03:42),ENTRY GRANTED to USER6512: Stephen Knynenburg at Main Entry Gate
          1947745,Sun Nov 26 07:00:39 2017,(CH: Nov 26 07:03:43),ENTRY GRANTED to USER6512: Stephen Knynenburg at Main Entry Gate
          1947746,Sun Nov 26 07:01:58 2017,(CH: Nov 26 07:05:02),ENTRY GRANTED to USER 6512: Stephen Knynenburg at ACUTE CARE AT HOME

          Something that is delimited so I can get some columns going with excel, or such product.

          Thanks!

          1 Reply Last reply Reply Quote 1
          • Mark AllenM
            Mark Allen
            last edited by

            Essentially I would like to find a way to remove all the mutliple lines that belong to the same lines data, so that its easier to read for the client, and they can import it directly into an excel

            eg current if you where to pull it into excel you get ( For : 1947729 Sun Nov 26 06:59:32 2017 (CH: Nov 26 07:02:35)ENTRY GRANTED to USER6123: Michelle Coghlan at ACUTE CARE AT HOME)

            (The output from the TXT File is as follows

            1947729 Sun Nov 26 06:59:32 2017 (CH: Nov 26 07:02:35)ENTRY GRANTED to USER
            6123: Michelle Coghlan at ACUTE CARE AT HOME
            )

            Line 1: Cell 1 Cell 2 Cell 3 Cell 4
            Line 2: Cell 5
            Line 3: Cell 6 (sometimes)
            Line 4: Cell 7 (Sometimes) (Depends on the length of the verbose output and how many characters it has.
            Example
            Line 1: Cell 1 1947729 Cell 2 Sun Nov 26 06:59:32 2017 Cell 3 (CH: Nov 26 07:02:35) Cell 4 ENTRY GRANTED to USER6
            Line 2: Cell 5 123: Michelle Coghlan at ACUTE CARE AT HOME

            Now when I pull the raw txt file into excel I get lines like this

            Screen Capture 1

            Scott SumnerS 1 Reply Last reply Reply Quote 0
            • Scott SumnerS
              Scott Sumner @Mark Allen
              last edited by

              @Mark-Allen

              If I were you, I’d do it in several steps.

              First, remove your page header. Since you didn’t provide any detail, you’re on your own on that. Hopefully you know what to do with this.

              Second, combine your multi-line data into a single line. Basically what you want to do here is to look at how a line starts and if it doesn’t start with a 7-digit number followed by a day of the week, then you want to combine that line with the following line. Here’s what I’d do:

              Find what zone: (?-is)^(\d{7} (?:Sun|Mon|Tue).+)\R(?!\d{7} (?:Sun|Mon|Tue))(.+)
              Replace with zone: \1\2
              Search mode: Regular expression
              Wrap around: ticked
              Action: Press Replace All button (do this multiple times, I guess up to 4 or 5 based upon your original problem statement, until all of the broken lines are combined

              Note: I left out the remaining days of the week, you can easily fill those in between | characters. Exercise for the reader.

              Third, reformat the data for Excel purposes. That part isn’t so hard, so again, left as exercise for the reader.

              Mark AllenM 1 Reply Last reply Reply Quote 1
              • guy038G
                guy038
                last edited by guy038

                Hello, @mark-allen, and All,

                This morning ( in France ), I succeeded to find a correct regex which should do what your want to. But, I was away from home, this afternoon ! This regex changes each multi-lines transaction into a single line transaction and, in addition, separates the different fields with commas ;-))

                Of course, I needed some time to figure out all the problem ! And, as Scott said, in his last post, we need to know the structure of your page header in order to build the correct final regex ! In the meanwhile, I supposed that your file does not contain any header

                In your first post, yous said :

                the outputted text, can have 1/2/3/4 lines per incident

                and

                Field 5 - The verbose output from the transaction, which is split over 1/2/3 lines (sometimes 4)

                So, I imagined the simple test text , below, that you’ll paste in a new tab :

                1947728 Sun Nov 26 06:57:54 2017 (CH: Nov 26 07:00:57)<Output_1>
                <Output_2>
                <Output_3>
                <Output_4>
                1941234 .........
                
                1947728 Sun Nov 26 06:57:54 2017 (CH: Nov 26 07:00:57)<Output_1>
                <Output_2>
                <Output_3>
                1941234 .........
                
                1947728 Sun Nov 26 06:57:54 2017 (CH: Nov 26 07:00:57)<Output_1>
                <Output_2>
                1941234 .........
                

                As you can notice, I supposed that the verbose output is always split in, at least, 2 lines, and sometimes in 3 or 4 lines ( Hence, no verbose output in a single line only ! ). If I’m wrong about it, just tell in order to modify the regex ;-)


                Now, open the Replace dialog

                Insert the regex (?-s)(^194\d{4})\x20(.+)\x20(\(.+\))(.+)\R(.+)(?:\R(.+)(?:\R(.+))??)??(?=\R194\d{4}\x20|\Z) , in the Find what: zone

                Insert the regex \1,\2,\3,\4\5\6\7 , in the Replace with: zone

                Tick the Wrap around option

                Click, once, on the Replace All button or several times on the Replace button

                We get the following text :

                1947728,Sun Nov 26 06:57:54 2017,(CH: Nov 26 07:00:57),<Output_1><Output_2><Output_3><Output_4>
                1941234 .........
                
                1947728,Sun Nov 26 06:57:54 2017,(CH: Nov 26 07:00:57),<Output_1><Output_2><Output_3>
                1941234 .........
                
                1947728,Sun Nov 26 06:57:54 2017,(CH: Nov 26 07:00:57),<Output_1><Output_2>
                1941234 .........
                

                So the important question is : Do you expect such a text ?


                Secondly, using your list of transactions, also given in your first post :

                1947728 Sun Nov 26 06:57:54 2017 (CH: Nov 26 07:00:57)ENTRY GRANTED to USER
                6123: Michelle Coghlan at Main Entry Gate
                1947729 Sun Nov 26 06:59:32 2017 (CH: Nov 26 07:02:35)ENTRY GRANTED to USER
                6123: Michelle Coghlan at ACUTE CARE AT HOME
                1947730 Sun Nov 26 06:59:33 2017 (CH: Nov 26 07:02:36)ENTRY GRANTED to USER
                6123: Michelle Coghlan at ACUTE CARE AT HOME
                1947743 Sun Nov 26 06:59:42 2017 (CH: Nov 26 07:02:38)ENTRY GRANTED to USER
                6123: Michelle Coghlan at ACUTE CARE AT HOME
                1947744 Sun Nov 26 07:00:38 2017 (CH: Nov 26 07:03:42)ENTRY GRANTED to USER
                6512: Stephen Knynenburg at Main Entry Gate
                1947745 Sun Nov 26 07:00:39 2017 (CH: Nov 26 07:03:43)ENTRY GRANTED to USER
                6512: Stephen Knynenburg at Main Entry Gate
                1947746 Sun Nov 26 07:01:58 2017 (CH: Nov 26 07:05:02)ENTRY GRANTED to USER
                6512: Stephen Knynenburg at ACUTE CARE AT
                HOME
                

                This same regex S/R produced :

                1947728,Sun Nov 26 06:57:54 2017,(CH: Nov 26 07:00:57),ENTRY GRANTED to USER6123: Michelle Coghlan at Main Entry Gate
                1947729,Sun Nov 26 06:59:32 2017,(CH: Nov 26 07:02:35),ENTRY GRANTED to USER6123: Michelle Coghlan at ACUTE CARE AT HOME
                1947730,Sun Nov 26 06:59:33 2017,(CH: Nov 26 07:02:36),ENTRY GRANTED to USER6123: Michelle Coghlan at ACUTE CARE AT HOME
                1947743,Sun Nov 26 06:59:42 2017,(CH: Nov 26 07:02:38),ENTRY GRANTED to USER6123: Michelle Coghlan at ACUTE CARE AT HOME
                1947744,Sun Nov 26 07:00:38 2017,(CH: Nov 26 07:03:42),ENTRY GRANTED to USER6512: Stephen Knynenburg at Main Entry Gate
                1947745,Sun Nov 26 07:00:39 2017,(CH: Nov 26 07:03:43),ENTRY GRANTED to USER6512: Stephen Knynenburg at Main Entry Gate
                1947746,Sun Nov 26 07:01:58 2017,(CH: Nov 26 07:05:02),ENTRY GRANTED to USER6512: Stephen Knynenburg at ACUTE CARE ATHOME
                

                May be, you noticed that the last line, after replacement, ends with at ACUTE CARE ATHOME, where the words AT and HOME are attached. Quite logical as there is, only, a line-break between …at ACUTE CARE AT and the last line HOME

                Then, in a new Excel page, I inserted that text as External data, indicating that the fields were comma separated and saved it as a common .xls file, without any problem :-)


                Finally, in your first post, you also said :

                I have a report about 2 000 000 Lines long,

                So, I tried to determine if this regex S/R could be done in a sensible time, without any bug or catastrophic backtracking of the N++ regex engine !

                I duplicated your 15 lines text, below :

                1947728 Sun Nov 26 06:57:54 2017 (CH: Nov 26 07:00:57)ENTRY GRANTED to USER
                6123: Michelle Coghlan at Main Entry Gate
                1947729 Sun Nov 26 06:59:32 2017 (CH: Nov 26 07:02:35)ENTRY GRANTED to USER
                6123: Michelle Coghlan at ACUTE CARE AT HOME
                1947730 Sun Nov 26 06:59:33 2017 (CH: Nov 26 07:02:36)ENTRY GRANTED to USER
                6123: Michelle Coghlan at ACUTE CARE AT HOME
                1947743 Sun Nov 26 06:59:42 2017 (CH: Nov 26 07:02:38)ENTRY GRANTED to USER
                6123: Michelle Coghlan at ACUTE CARE AT HOME
                1947744 Sun Nov 26 07:00:38 2017 (CH: Nov 26 07:03:42)ENTRY GRANTED to USER
                6512: Stephen Knynenburg at Main Entry Gate
                1947745 Sun Nov 26 07:00:39 2017 (CH: Nov 26 07:03:43)ENTRY GRANTED to USER
                6512: Stephen Knynenburg at Main Entry Gate
                1947746 Sun Nov 26 07:01:58 2017 (CH: Nov 26 07:05:02)ENTRY GRANTED to USER
                6512: Stephen Knynenburg at ACUTE CARE AT
                HOME
                

                68,000 times ! So, I was left with a file of 58,412,000 bytes, containing 1,020,000 lines. Then running the above regex S/R, on my old (2005 ) Windows XP laptop ( with only 1GB of RAM ! ) the 476,000 replacements took 69s, getting a file of 57,800,000 bytes, containing 476,000 lines ;-))

                Consequently, I suppose that your file with 2,000,000 lines should be, correctly, handled, with a recent laptop and OS !

                So, just tell us about your header and it’ll be only a matter of minutes to get the job done ;-))

                See you later,

                Best Regards,

                guy038

                1 Reply Last reply Reply Quote 0
                • Mark AllenM
                  Mark Allen @Scott Sumner
                  last edited by

                  Morning All! (Or Afternoon or Evening!)

                  Scott! You are correct I didnt give you the header!

                  This is the header, which is static throughout, the only thing that changes is the Page #

                  This is the header, I can go through and essentially remove it manually or use CTRL-H and replace the Text with a “space”

                  Door User Activity Reported on Fri Mar 2 09:50:00 2018
                  Initiated by Master from Main Office Page 6292
                  Messia Plains Ambulatory

                  I’ve tried your expression, and sadly it didnt work for me,

                  In the Find What Line :(?-is)^(\d{7} (?:Sun|Mon|Tue|Wed|Thu|Fri|Sat).+)\R(?!\d{7} (?:Sun|Mon|Tue|Wed|Thu|Fri|Sat))(.+)
                  In the Replace with: \1\2

                  Each time it has come up with Zero Lines Replaced

                  I made the sample size smaller, just 1 Page, and the same issue, made it just one line, and he same issue (Zero Lines replaced), I thought perhaps i might have an outdated version, and I’ve updated to the latest version effective today (7.5.6 32 Bit) I have no plug-ins enabled or installed

                  Sample Set 1
                  Expression 1

                  I then thought perhaps the data because it only has monday (Mon) in it, was breaking the expression (I know, I doubted it as well, but its a bit above my head, but wanted to try!)

                  I removed Tuesday-Sunday, and left monday only, Still the same result.

                  Monday Only Expression

                  So at a bit of a loss with that one, I HAVE to be missing something there or some plig in or something for that… :(

                  I’ve put up 4 pages of the text so you can see the formatting

                  Sample Text

                  @guy038

                  Thanks for your response as well!!

                  I’ve tried your expression as well, and I get the same response, it tells me that there are zero changes, I’ve hit the button many times just to ensure I am not the reason for this

                  One thing I’ve noticed is that there is a Space before the first 7 digit character, will that upset things?

                  Space before first 7 Digit Number

                  What you have outputted is exactly what I am after

                  1947728,Sun Nov 26 06:57:54 2017,(CH: Nov 26 07:00:57),ENTRY GRANTED to USER6123: Michelle Coghlan at Main Entry Gate
                  1947729,Sun Nov 26 06:59:32 2017,(CH: Nov 26 07:02:35),ENTRY GRANTED to USER6123: Michelle Coghlan at ACUTE CARE AT HOME
                  1947730,Sun Nov 26 06:59:33 2017,(CH: Nov 26 07:02:36),ENTRY GRANTED to USER6123: Michelle Coghlan at ACUTE CARE AT HOME
                  1947743,Sun Nov 26 06:59:42 2017,(CH: Nov 26 07:02:38),ENTRY GRANTED to USER6123: Michelle Coghlan at ACUTE CARE AT HOME
                  1947744,Sun Nov 26 07:00:38 2017,(CH: Nov 26 07:03:42),ENTRY GRANTED to USER6512: Stephen Knynenburg at Main Entry Gate
                  1947745,Sun Nov 26 07:00:39 2017,(CH: Nov 26 07:03:43),ENTRY GRANTED to USER6512: Stephen Knynenburg at Main Entry Gate
                  1947746,Sun Nov 26 07:01:58 2017,(CH: Nov 26 07:05:02),ENTRY GRANTED to USER6512: Stephen Knynenburg at ACUTE CARE ATHOME

                  One Line, with comma’s that can be brought into excel nice and neat!

                  I will keep playing, but any thoughts would be appreciated! and I will be sure to save this away if asked again!

                  Mark

                  nb. Sorry dont know the syntax to nicely put things into code box’s

                  1 Reply Last reply Reply Quote 0
                  • Scott SumnerS
                    Scott Sumner
                    last edited by Scott Sumner

                    @Mark-Allen

                    The simplest way to represent your data accurately here is by pasting it between a line containing ```z and a line containing ```

                    Thus, if you do the following:

                    ```z
                    (copy and paste your real data HERE)
                    ```

                    It (some of what I think your real data is) will render as:

                     1947729 Sun Nov 26 06:59:32 2017 (CH: Nov 26 07:02:35)ENTRY GRANTED to USER
                                                      6123: Michelle Coghlan at ACUTE CARE AT 
                                                      HOME
                     1947729 Sun Nov 26 06:59:32 2017 (CH: Nov 26 07:02:35)ENTRY GRANTED to USER
                                                      6123: Michelle Coghlan at ACUTE CARE AT 
                                                      HOME
                    

                    And really, accurately representing your data is the only way to get good advice. Otherwise, responders are guessing, and when you try their advice you get Replace: No occurrence was found. Then when you say, “Oh, that’s not really what I meant…” and attempt to clarify, the helpers have lost interest and have moved on to more interesting pursuits…

                    Note: You can also indent your entire dataset by 4 spaces and it will render like the following, but I find the above method easier (and the colors are often wacky in what follows).

                     1947729 Sun Nov 26 06:59:32 2017 (CH: Nov 26 07:02:35)ENTRY GRANTED to USER
                                                      6123: Michelle Coghlan at ACUTE CARE AT 
                                                      HOME
                     1947729 Sun Nov 26 06:59:32 2017 (CH: Nov 26 07:02:35)ENTRY GRANTED to USER
                                                      6123: Michelle Coghlan at ACUTE CARE AT 
                                                      HOME
                    

                    Also note: Screenshots aren’t going to cut it. Nobody is going to retype your data from a screenshot to test out a proposed match expression.

                    Scott SumnerS 1 Reply Last reply Reply Quote 0
                    • Scott SumnerS
                      Scott Sumner @Scott Sumner
                      last edited by

                      @Mark-Allen

                      I forgot to mention that your “Sample Text” link is good in that it has copyable text, but it is bad in that it doesn’t preserve the leading spaces (as shown in some of the screenshot versions) on the lines! :-(

                      1 Reply Last reply Reply Quote 0
                      • Mark AllenM
                        Mark Allen
                        last edited by

                        agh! Cant Win :) But this is the data

                        1947727 Sun Nov 26 00:44:12 2017 (CH: Nov 26 00:47:14)ENTRY GRANTED to USER 
                                                          1313: State Government Security at Main 
                                                          Entry Gate 
                         1947728 Sun Nov 26 06:57:54 2017 (CH: Nov 26 07:00:57)ENTRY GRANTED to USER 
                                                          6123: Michelle Coghlan at Main Entry Gate 
                         1947729 Sun Nov 26 06:59:32 2017 (CH: Nov 26 07:02:35)ENTRY GRANTED to USER 
                                                          6123: Michelle Coghlan at ACUTE CARE AT HOME 
                         1947730 Sun Nov 26 06:59:33 2017 (CH: Nov 26 07:02:36)ENTRY GRANTED to USER 
                                                          6123: Michelle Coghlan at ACUTE CARE AT HOME 
                         1947743 Sun Nov 26 06:59:42 2017 (CH: Nov 26 07:02:38)ENTRY GRANTED to USER 
                                                          6123: Michelle Coghlan at ACUTE CARE AT HOME 
                         1947744 Sun Nov 26 07:00:38 2017 (CH: Nov 26 07:03:42)ENTRY GRANTED to USER 
                                                          6512: Stephen Knynenburg at Main Entry Gate 
                         1947745 Sun Nov 26 07:00:39 2017 (CH: Nov 26 07:03:43)ENTRY GRANTED to USER 
                                                          6512: Stephen Knynenburg at Main Entry Gate 
                         1947746 Sun Nov 26 07:01:58 2017 (CH: Nov 26 07:05:02)ENTRY GRANTED to USER 
                                                          6512: Stephen Knynenburg at ACUTE CARE AT 
                                                          HOME 
                         1947747 Sun Nov 26 07:27:03 2017 (CH: Nov 26 07:30:07)ENTRY GRANTED to USER 
                                                          6123: Michelle Coghlan at Population Health 
                                                          Rear Entry 
                         1947748 Sun Nov 26 08:08:34 2017 (CH: Nov 26 08:11:39)ENTRY GRANTED to USER 
                                                          6512: Stephen Knynenburg at ACUTE CARE AT 
                                                          HOME 
                         1947749 Sun Nov 26 09:08:22 2017 (CH: Nov 26 09:11:27)ENTRY GRANTED to USER 
                                                          6512: Stephen Knynenburg at Population 
                                                          Health Rear Entry 
                         1947750 Sun Nov 26 09:43:14 2017 (CH: Nov 26 09:46:19)ENTRY GRANTED to USER 
                                                          6512: Stephen Knynenburg at Population 
                                                          Health Rear Entry 
                         1947751 Sun Nov 26 09:46:48 2017 (CH: Nov 26 09:49:53)ENTRY GRANTED to USER 
                                                          6005: Rose Gorman at Main Entry Gate 
                         1947752 Sun Nov 26 09:49:50 2017 (CH: Nov 26 09:52:55)ENTRY GRANTED to USER 
                                                          6005: Rose Gorman at ACUTE CARE AT HOME 
                         1947753 Sun Nov 26 09:56:07 2017 (CH: Nov 26 09:59:11)ENTRY GRANTED to USER 
                                                          6513: Sheela Tuazon at Main Entry Gate 
                         1947754 Sun Nov 26 09:58:25 2017 (CH: Nov 26 10:01:29)ENTRY GRANTED to USER 
                                                          6513: Sheela Tuazon at ACUTE CARE AT HOME 
                         1947755 Sun Nov 26 10:01:56 2017 (CH: Nov 26 10:05:00)ENTRY GRANTED to USER 
                                                          6513: Sheela Tuazon at Population Health 
                                                          Rear Entry 
                         1947756 Sun Nov 26 10:08:53 2017 (CH: Nov 26 10:11:58)ENTRY GRANTED to USER 
                                                          6512: Stephen Knynenburg at Population 
                                                          Health Rear Entry 
                        
                        Door User Activity                        Reported on Fri Mar  2 09:50:00 2018
                        Initiated by Master from Main Office                                    Page 2
                        Messia Plains Ambulatory
                        ================================================================================
                        
                        1 Reply Last reply Reply Quote 0
                        • guy038G
                          guy038
                          last edited by guy038

                          Hello, @mark-allen, @scott-sumner and All,

                          Ah ! Of course, it’s easy to understand why my regex could not match any string ;-)) Indeed, when regular expressions are involved, even a single space, added or deleted in text, is of importance !

                          From your recent picture, I noticed four particularities, in your list :

                          • Any incident number is separated from beginning of line by a space character

                          • The verbose output zones are, both :

                            • Preceded by 1 or more space character(s)

                            • Followed by 1 space character

                          • Each header block :

                            • Begins with the Form Feed character ( the ASCII page-breaking control character, of Unicode value \x{000C}, which can be matched, in regular expressions, with the \f syntax

                            • Ends with several equal signs ( = )

                          Refer to https://en.wikipedia.org/wiki/Page_break#Form_feed

                          With all these elements, it’s not difficult to modify my previous regex, to be close to your real text ;-))


                          So again, let’s begin with this short example :

                           1947728 Sun Nov 26 06:57:54 2017 (CH: Nov 26 07:00:57)<Output_1> 
                                                            <Output_2> 
                                                            <Output_3> 
                                                            <Output_4> 
                           1947728 Sun Nov 26 06:57:54 2017 (CH: Nov 26 07:00:57)<Output_1> 
                                                            <Output_2> 
                                                            <Output_3> 
                          
                          Door User Activity                        Reported on Fri Mar  2 09:50:00 2018
                          Initiated by Master from Main Office                                    Page 2
                          Messia Plains Ambulatory
                          ================================================================================
                           1947728 Sun Nov 26 06:57:54 2017 (CH: Nov 26 07:00:57)<Output_1> 
                                                            <Output_2> 
                           1941234 ......... 
                          

                          Using my new regex, below :

                          SEARCH (?-s)^\x20(194\d{4})\x20(.+)\x20(\(.+\))(.+)\x20\R\x20+(.+)\x20(?:\R\x20+(.+)\x20(?:\R\x20+(.+)\x20)??)??(?=\Z|\R(\f|\x20194\d{4}\x20))|(?s)\f.*?===\R

                          REPLACE ?1\1,\2,\3,\4\5\6\7

                          We get the following text :

                          1947728,Sun Nov 26 06:57:54 2017,(CH: Nov 26 07:00:57),<Output_1><Output_2><Output_3><Output_4>
                          1947728,Sun Nov 26 06:57:54 2017,(CH: Nov 26 07:00:57),<Output_1><Output_2><Output_3>
                          1947728,Sun Nov 26 06:57:54 2017,(CH: Nov 26 07:00:57),<Output_1><Output_2>
                           1941234 ......... 
                          

                          Secondly, given your list of transactions, below, of your LAST post, where I simply added :

                          • One space before the first incident number, which was missing

                          • A header block, after the 5th and 11th transactions

                           1947727 Sun Nov 26 00:44:12 2017 (CH: Nov 26 00:47:14)ENTRY GRANTED to USER 
                                                            1313: State Government Security at Main 
                                                            Entry Gate 
                           1947728 Sun Nov 26 06:57:54 2017 (CH: Nov 26 07:00:57)ENTRY GRANTED to USER 
                                                            6123: Michelle Coghlan at Main Entry Gate 
                           1947729 Sun Nov 26 06:59:32 2017 (CH: Nov 26 07:02:35)ENTRY GRANTED to USER 
                                                            6123: Michelle Coghlan at ACUTE CARE AT HOME 
                           1947730 Sun Nov 26 06:59:33 2017 (CH: Nov 26 07:02:36)ENTRY GRANTED to USER 
                                                            6123: Michelle Coghlan at ACUTE CARE AT HOME 
                           1947743 Sun Nov 26 06:59:42 2017 (CH: Nov 26 07:02:38)ENTRY GRANTED to USER 
                                                            6123: Michelle Coghlan at ACUTE CARE AT HOME 
                          
                          Door User Activity                        Reported on Fri Mar  2 09:50:00 2018
                          Initiated by Master from Main Office                                    Page 26
                          Messia Plains Ambulatory
                          ================================================================================
                           1947744 Sun Nov 26 07:00:38 2017 (CH: Nov 26 07:03:42)ENTRY GRANTED to USER 
                                                            6512: Stephen Knynenburg at Main Entry Gate 
                           1947745 Sun Nov 26 07:00:39 2017 (CH: Nov 26 07:03:43)ENTRY GRANTED to USER 
                                                            6512: Stephen Knynenburg at Main Entry Gate 
                           1947746 Sun Nov 26 07:01:58 2017 (CH: Nov 26 07:05:02)ENTRY GRANTED to USER 
                                                            6512: Stephen Knynenburg at ACUTE CARE AT 
                                                            HOME 
                           1947747 Sun Nov 26 07:27:03 2017 (CH: Nov 26 07:30:07)ENTRY GRANTED to USER 
                                                            6123: Michelle Coghlan at Population Health 
                                                            Rear Entry 
                           1947748 Sun Nov 26 08:08:34 2017 (CH: Nov 26 08:11:39)ENTRY GRANTED to USER 
                                                            6512: Stephen Knynenburg at ACUTE CARE AT 
                                                            HOME 
                           1947749 Sun Nov 26 09:08:22 2017 (CH: Nov 26 09:11:27)ENTRY GRANTED to USER 
                                                            6512: Stephen Knynenburg at Population 
                                                            Health Rear Entry 
                          
                          Door User Activity                        Reported on Fri Mar  2 09:50:00 2018
                          Initiated by Master from Main Office                                    Page 27
                          Messia Plains Ambulatory
                          ================================================================================
                           1947750 Sun Nov 26 09:43:14 2017 (CH: Nov 26 09:46:19)ENTRY GRANTED to USER 
                                                            6512: Stephen Knynenburg at Population 
                                                            Health Rear Entry 
                           1947751 Sun Nov 26 09:46:48 2017 (CH: Nov 26 09:49:53)ENTRY GRANTED to USER 
                                                            6005: Rose Gorman at Main Entry Gate 
                           1947752 Sun Nov 26 09:49:50 2017 (CH: Nov 26 09:52:55)ENTRY GRANTED to USER 
                                                            6005: Rose Gorman at ACUTE CARE AT HOME 
                           1947753 Sun Nov 26 09:56:07 2017 (CH: Nov 26 09:59:11)ENTRY GRANTED to USER 
                                                            6513: Sheela Tuazon at Main Entry Gate 
                           1947754 Sun Nov 26 09:58:25 2017 (CH: Nov 26 10:01:29)ENTRY GRANTED to USER 
                                                            6513: Sheela Tuazon at ACUTE CARE AT HOME 
                           1947755 Sun Nov 26 10:01:56 2017 (CH: Nov 26 10:05:00)ENTRY GRANTED to USER 
                                                            6513: Sheela Tuazon at Population Health 
                                                            Rear Entry 
                           1947756 Sun Nov 26 10:08:53 2017 (CH: Nov 26 10:11:58)ENTRY GRANTED to USER 
                                                            6512: Stephen Knynenburg at Population 
                                                            Health Rear Entry 
                          
                          Door User Activity                        Reported on Fri Mar  2 09:50:00 2018
                          Initiated by Master from Main Office                                    Page 28
                          Messia Plains Ambulatory
                          ================================================================================
                          

                          After replacement, the new regex S/R gives :

                          1947727,Sun Nov 26 00:44:12 2017,(CH: Nov 26 00:47:14),ENTRY GRANTED to USER1313: State Government Security at MainEntry Gate
                          1947728,Sun Nov 26 06:57:54 2017,(CH: Nov 26 07:00:57),ENTRY GRANTED to USER6123: Michelle Coghlan at Main Entry Gate
                          1947729,Sun Nov 26 06:59:32 2017,(CH: Nov 26 07:02:35),ENTRY GRANTED to USER6123: Michelle Coghlan at ACUTE CARE AT HOME
                          1947730,Sun Nov 26 06:59:33 2017,(CH: Nov 26 07:02:36),ENTRY GRANTED to USER6123: Michelle Coghlan at ACUTE CARE AT HOME
                          1947743,Sun Nov 26 06:59:42 2017,(CH: Nov 26 07:02:38),ENTRY GRANTED to USER6123: Michelle Coghlan at ACUTE CARE AT HOME
                          1947744,Sun Nov 26 07:00:38 2017,(CH: Nov 26 07:03:42),ENTRY GRANTED to USER6512: Stephen Knynenburg at Main Entry Gate
                          1947745,Sun Nov 26 07:00:39 2017,(CH: Nov 26 07:03:43),ENTRY GRANTED to USER6512: Stephen Knynenburg at Main Entry Gate
                          1947746,Sun Nov 26 07:01:58 2017,(CH: Nov 26 07:05:02),ENTRY GRANTED to USER6512: Stephen Knynenburg at ACUTE CARE ATHOME
                          1947747,Sun Nov 26 07:27:03 2017,(CH: Nov 26 07:30:07),ENTRY GRANTED to USER6123: Michelle Coghlan at Population HealthRear Entry
                          1947748,Sun Nov 26 08:08:34 2017,(CH: Nov 26 08:11:39),ENTRY GRANTED to USER6512: Stephen Knynenburg at ACUTE CARE ATHOME
                          1947749,Sun Nov 26 09:08:22 2017,(CH: Nov 26 09:11:27),ENTRY GRANTED to USER6512: Stephen Knynenburg at PopulationHealth Rear Entry
                          1947750,Sun Nov 26 09:43:14 2017,(CH: Nov 26 09:46:19),ENTRY GRANTED to USER6512: Stephen Knynenburg at PopulationHealth Rear Entry
                          1947751,Sun Nov 26 09:46:48 2017,(CH: Nov 26 09:49:53),ENTRY GRANTED to USER6005: Rose Gorman at Main Entry Gate
                          1947752,Sun Nov 26 09:49:50 2017,(CH: Nov 26 09:52:55),ENTRY GRANTED to USER6005: Rose Gorman at ACUTE CARE AT HOME
                          1947753,Sun Nov 26 09:56:07 2017,(CH: Nov 26 09:59:11),ENTRY GRANTED to USER6513: Sheela Tuazon at Main Entry Gate
                          1947754,Sun Nov 26 09:58:25 2017,(CH: Nov 26 10:01:29),ENTRY GRANTED to USER6513: Sheela Tuazon at ACUTE CARE AT HOME
                          1947755,Sun Nov 26 10:01:56 2017,(CH: Nov 26 10:05:00),ENTRY GRANTED to USER6513: Sheela Tuazon at Population HealthRear Entry
                          1947756,Sun Nov 26 10:08:53 2017,(CH: Nov 26 10:11:58),ENTRY GRANTED to USER6512: Stephen Knynenburg at PopulationHealth Rear Entry
                          

                          Voilà !

                          Quite sure that it’s what you expect to, don’t you ?

                          Cheers,

                          guy038

                          P.S. :

                          • I’ll give some explanations on my regex S/R, in a next post, if the resulting replacement is OK for you !

                          • I also assume that the last non-blank line, of your file, is, either :

                            • The line, with equal signs, of a header block, followed by a Windows line-break ( \r\n )

                            • A verbose output line, possibly followed with any range of \r, \n or \f character(s)

                          1 Reply Last reply Reply Quote 1
                          • Mark AllenM
                            Mark Allen
                            last edited by

                            Hello All!

                            Sorry All! Was away from the desk for a few days, BUT I ran it and it works fine, for the first 30 pages, then after then first 30 pages it does this

                            1949977,Tue Nov 28 11:48:52 2017,(CH: Nov 28 11:52:11),ENTRY GRANTED to USER6295: Rachael Downie at AMH Rear Entry
                            
                            1949978,Tue Nov 28 11:48:55 2017,(CH: Nov 28 11:52:13),ENTRY GRANTED to USER40: Authorised Student Card 17 at CH RearEntry
                            1949979,Tue Nov 28 11:49:06 2017,(CH: Nov 28 11:52:25),ENTRY GRANTED to USER10885: Anne Volling at CH Rear Entry
                            1949980,Tue Nov 28 11:49:14 2017,(CH: Nov 28 11:52:33),ENTRY GRANTED to USER6048: Elizabeth Apro at Basement comms Room
                            1949981,Tue Nov 28 11:49:29 2017,(CH: Nov 28 11:52:48),ENTRY GRANTED to USER5990: Tracy Brighouse at AMH Reception
                            1949982,Tue Nov 28 11:49:35 2017,(CH: Nov 28 11:52:54),ENTRY GRANTED to USER6130: Narelle Broad at OH Front Entry
                            1949983,Tue Nov 28 11:50:23 2017,(CH: Nov 28 11:53:41),ENTRY GRANTED to USER6048: Elizabeth Apro at Plant Room to CommsRoom door
                            1949984,Tue Nov 28 11:50:36 2017,(CH: Nov 28 11:53:54),ENTRY GRANTED to USER5990: Tracy Brighouse at AMH Front Entry
                            1949985,Tue Nov 28 11:50:40 2017,(CH: Nov 28 11:53:58),ENTRY GRANTED to USER5990: Tracy Brighouse at AMH Reception
                            1949986,Tue Nov 28 11:51:03 2017,(CH: Nov 28 11:54:22),ENTRY GRANTED to USER10885: Anne Volling at CH Rear Entry
                            1949987,Tue Nov 28 11:51:09 2017,(CH: Nov 28 11:54:28),ENTRY GRANTED to USER9828: Rosalinda Whittmaack at AMH Rear Entry
                            1949988,Tue Nov 28 11:51:32 2017,(CH: Nov 28 11:54:51),ENTRY GRANTED to USER5990: Tracy Brighouse at AMH Rear Entry
                            1949989,Tue Nov 28 11:51:51 2017,(CH: Nov 28 11:55:10),ENTRY GRANTED to USER6048: Elizabeth Apro at Staff Entry GroundEast
                            1949990,Tue Nov 28 11:54:54 2017,(CH: Nov 28 11:58:13),ENTRY GRANTED to USER6295: Rachael Downie at AMH Rear Entry
                            1949991,Tue Nov 28 11:55:00 2017,(CH: Nov 28 11:58:18),ENTRY GRANTED to USER6295: Rachael Downie at AMH Consult B08
                            1949992,Tue Nov 28 11:55:05 2017,(CH: Nov 28 11:58:24),ENTRY GRANTED to USER11235: Kristi Davies at Admin Entry
                            1949993,Tue Nov 28 11:55:08 2017,(CH: Nov 28 11:58:27),ENTRY GRANTED to USER6295: Rachael Downie at AMH Rear Entry
                            1949994,Tue Nov 28 11:55:10 2017,(CH: Nov 28 11:58:28),ENTRY GRANTED to USER6375: Ahmed Mansour at AMH Reception
                            1949995,Tue Nov 28 11:57:23 2017,(CH: Nov 28 12:00:41),ENTRY GRANTED to USER10593: Emma Phelan at AMH Rear Entry
                            1949996,Tue Nov 28 11:57:47 2017,(CH: Nov 28 12:01:06),ENTRY GRANTED to USER6127: Daniel Nguyen at AMH Reception
                            1949997,Tue Nov 28 11:57:49 2017,(CH: Nov 28 12:01:07),ENTRY GRANTED to USER10593: Emma Phelan at AMH Clinic B15
                            1949998,Tue Nov 28 11:57:56 2017,(CH: Nov 28 12:01:15),ENTRY GRANTED to USER7656: Angela McCoy at A & P Rear Entry
                             1949999 Tue Nov 28 11:58:30 2017 (CH: Nov 28 12:01:48)ENTRY GRANTED to USER 
                                                              7656: Angela McCoy at Admin Entry 
                             1950000 Tue Nov 28 11:58:43 2017 (CH: Nov 28 12:02:01)ENTRY GRANTED to USER 
                                                              9886: Janet Meade at Main Entry Gate 
                             1950001 Tue Nov 28 11:58:59 2017 (CH: Nov 28 12:02:17)ENTRY GRANTED to USER 
                                                              10041: Christina Davidson at A & P Foyer 
                                                              Entry 
                             1950002 Tue Nov 28 11:59:06 2017 (CH: Nov 28 12:02:24)ENTRY GRANTED to USER 
                                                              7656: Angela McCoy at Gym Internal Entry 
                            
                             1950003 Tue Nov 28 11:59:22 2017 (CH: Nov 28 12:02:40)ENTRY GRANTED to USER 
                                                              10041: Christina Davidson at A & P Rear 
                                                              Entry 
                             1950004 Tue Nov 28 12:00:21 2017 (CH: Nov 28 12:03:39)ENTRY GRANTED to USER 
                                                              9886: Janet Meade at Staff Entry Ground East 
                             1950005 Tue Nov 28 12:00:40 2017 (CH: Nov 28 12:03:58)ENTRY GRANTED to USER 
                                                              6296: Peta Picton at Admin Entry 
                             1950006 Tue Nov 28 12:00:52 2017 (CH: Nov 28 12:04:10)ENTRY GRANTED to USER 
                                                              6048: Elizabeth Apro at CH Rear Entry 
                             1950007 Tue Nov 28 12:01:29 2017 (CH: Nov 28 12:04:47)ENTRY GRANTED to USER 
                                                              6296: Peta Picton at A & P Foyer Entry 
                             1950008 Tue Nov 28 12:01:42 2017 (CH: Nov 28 12:05:00)ENTRY GRANTED to USER 
                                                              10184: Jasmin-marie Lim at Gym Internal 
                                                              Entry 
                             1950009 Tue Nov 28 12:02:02 2017 (CH: Nov 28 12:05:20)ENTRY GRANTED to USER 
                                                              10184: Jasmin-marie Lim at A & P Foyer Entry 
                             1950010 Tue Nov 28 12:02:27 2017 (CH: Nov 28 12:05:45)ENTRY GRANTED to USER 
                                                              6016: Rhonda Rainbird at Conference External 
                                                              Entry 
                             1950011 Tue Nov 28 12:02:29 2017 (CH: Nov 28 12:05:47)ENTRY GRANTED to USER 
                                                              10184: Jasmin-marie Lim at A & P Rear Entry 
                             1950012 Tue Nov 28 12:02:35 2017 (CH: Nov 28 12:05:53)ENTRY GRANTED to USER 
                                                              6123: Michelle Coghlan at ACUTE CARE AT HOME 
                             1950013 Tue Nov 28 12:02:52 2017 (CH: Nov 28 12:06:11)ENTRY GRANTED to USER 
                                                              6016: Rhonda Rainbird at OH Front Entry 
                             1950014 Tue Nov 28 12:02:52 2017 (CH: Nov 28 12:06:11)ENTRY GRANTED to USER 
                                                              6286: Laura Fitzalan at CH Rear Entry 
                             1950015 Tue Nov 28 12:03:05 2017 (CH: Nov 28 12:06:24)ENTRY GRANTED to USER 
                                                              6048: Elizabeth Apro at Admin Entry 
                             1950016 Tue Nov 28 12:03:10 2017 (CH: Nov 28 12:06:29)ENTRY GRANTED to USER 
                                                              6103: Helen McGregor at Staff Entry Ground 
                                                              East 
                             1950017 Tue Nov 28 12:03:41 2017 (CH: Nov 28 12:06:59)ENTRY GRANTED to USER 
                                                              7656: Angela McCoy at Gym Internal Entry 
                             1950018 Tue Nov 28 12:03:57 2017 (CH: Nov 28 12:07:16)ENTRY GRANTED to USER 
                                                              10041: Christina Davidson at Staff Entry 
                                                              Ground East 
                             1950019 Tue Nov 28 12:04:12 2017 (CH: Nov 28 12:07:31)ENTRY GRANTED to USER 
                                                              10041: Christina Davidson at Basement comms 
                                                              Room 
                            

                            [Output 1(https://ibb.co/cNXgexl)

                            Repeating the process doesnt yield any changes, changing

                            (?-s)^\x20(194\d{4})\x20(.+)\x20((.+))(.+)\x20\R\x20+(.+)\x20(?:\R\x20+(.+)\x20(?:\R\x20+(.+)\x20)??)??(?=\Z|\R(\f|\x20194\d{4}\x20))|(?s)\f.*?===\R

                            to

                            (?-s)^\x20(195\d{4})\x20(.+)\x20((.+))(.+)\x20\R\x20+(.+)\x20(?:\R\x20+(.+)\x20(?:\R\x20+(.+)\x20)??)??(?=\Z|\R(\f|\x20194\d{4}\x20))|(?s)\f.*?===\R

                            For what I am guessing is it so filter the 195(next 4 digits) normally blanks the rest of the document and you end up with a single line of text somtimes :(

                            I copied 10 pages from the 1950001 to 195xxxx and re-ran the expression, with the 194 changed to 195, and it would normally only change the last line of the page, sometimes get two lines merged into one line

                            SO CLOSE! Any thoughts? I could email you the source file if you like?

                            Full Source

                            1 Reply Last reply Reply Quote 0
                            • guy038G
                              guy038
                              last edited by guy038

                              Hello, @mark-allen, @scott-sumner and All,

                              Ah, sorry ! I was really wrong about the Incident Number description… I should have read carefully, because you said, in your first post :

                              Field 1 - Incident Number (194xxxx —> Onwards)

                              And also :

                              I have a report about 2 000 000 Lines long,

                              So as each transaction is, at least, split in two lines, this means that your file could contain, roughly, 1,000,000 transactions, or a bit less !. As the first number is 1940000 => The last Incident Number should be nearby 2940000

                              So I changed the regex, matching the Incident Number, to [12]\d{6}). Note that this regex definition, of an Incident Number, is present, twice, in the overall regex :

                              • At its beginning (?-s)^\x20([12]\d{6})\x20(.+)\x20......

                              • In the look-ahead part (?=\Z|\R(\f|\x20[12]\d{6}\x20))

                              I, also, changed the regex matching the Header, to (?s)\f.*?===\R\R, as there is a pure blank line, after the ===== line !


                              Now, the last version of the regex is :

                              SEARCH (?-s)^\x20([12]\d{6})\x20(.+)\x20(\(.+\))(.+)\x20\R\x20+(.+)\x20(?:\R\x20+(.+)\x20(?:\R\x20+(.+)\x20)??)??(?=\Z|\R(\f|\x20[12]\d{6}\x20))|(?s)\f.*?===\R\R

                              REPLACE ?1\1,\2,\3,\4\5\6\7

                              Running that regex S/R against your attached file out.txt ( 14,399,825 bytes and 202,315 lines ) left me with the expected comma separated file, of 9,698,482 bytes, containing 82,068 lines

                              Voilà :-))


                              Here is, below, some notes about this regex S/R :

                              (?-s)                                       #  The DOT regex character matches ONLY a SINGLE STANDARD character ( NOT the EOL ones )
                              
                              ^\x20([12]\d{6})\x20                        #  An 'Incident Number' ([12]\d{6}), as Group 1, PREDEDED and FOLLOWED with a SPACE character \x20
                              
                              (.+)\x20                                    #  The Date and Time (.+), as Group 2, which ENDS with a SPACE character \x20
                              
                              (\(.+\))                                    #  The 'Access Control Time' (\(.+\)), as Group 3, surrounded with PARENTHESES, which are ESCAPED to be considered as LITTERALS
                              
                              (.+)\x20\R                                  #  The FIRST 'Verbose Output' zone (.+), as Group 4, FOLLOWED with a SPACE character \x20 and a LINE-BREAK \R
                              
                              \x20+(.+)\x20                               #  The SECOND 'Verbose Output' zone (.+), as Group 5, NEXT line, PRECEDED with SPACE characters \x20+ and FOLLOWED with a SPACE character \x20
                              
                              (?:\R\x20+(.+)\x20(?:\R\x20+(.+)\x20)??)??  #  The part, between the OUTER parentheses, (?: and )??, may be ASBENT or PRESENT, with priority to ABSENCE
                                                                          #    It represents the possible THIRD and FOURTH 'Verbose Output' zones (.+), as Groups 6 and 7 :
                                                                          #        - PRECEDED with a LINE-BREAK \R and some SPACE characters \x20+
                                                                          #        - FOLLOWED with a SPACE character \x20
                                                                          #  
                                                                          #  Again, the part, between the INNER parentheses, (?: and )??, may be ABSENT or PRESENT, with priority to ABSENCE
                                                                          #    It represents the possible FOURTH 'Verbose Output' zone (.+), as Group 7 :
                                                                          #        - PRECEDED with a LINE-BREAK \R and some SPACE characters \x20+
                                                                          #        - FOLLOWED with a SPACE character \x20
                              
                              (?=\Z|\R(\f|\x20[12]\d{6}\x20))             #  IF followed with, EITHER :  ( LOOK-AHEAD syntax )
                                                                          #      - \Z that is to say SOME \r,\n or \f characters, at the VERY END of file
                                                                          #      - a LINE-BREAK \R, followed with, EITHER :
                                                                          #          - A CONTROL character \f, beginning a HEADER block
                                                                          #          - An INCIDENT number, beginning with '1' or '2', [12]\d{6}, PRECEDED and FOLLOWED with a SPACE character \x20
                              
                              |                                           #  OR
                              
                              (?s)                                        #  The DOT regex character matches, ABSOLUTELY, ANY character
                                                                                 
                              \f.*?===\R\R                                #   The SHORTEST range of ANY character *.?, EVEN on SEVERAL lines, between :
                                                                          #       - A FORM-FEED character \f
                                                                          #       - THREE EQUAL signs, FOLLOWED with 2 LINE-BREAKS \R\R
                                                                          #  
                                                                          #   => The COMPLETE HEADER block, with LINE-BREAKS
                              

                              To end with, here is this same regex S/R, displayed in Free-spacing mode, with a minimum of comments ! You may run this syntax, as well. Just select all the text, below, and paste it, in the Find what: zone of the Replace dialog :-))

                              (?x)                                            #  FRE-SPACING regex mode
                                                                              #
                                                                              #  IMPORTANT : In this mode, any \R syntax must be REPLACED with \r\n 
                                                                              #  ---------
                              
                              (?-s)                                           #  The DOT regex character matches ONLY a SINGLE STANDARD character ( NOT the EOL ones )
                                                                              
                              ^\x20([12]\d{6})\x20                            #  Group 1        ( Incident Number )
                                                                              
                              (.+)\x20                                        #  Group 2        ( Date and Time )
                                                                              
                              (\(.+\))                                        #  Group 3        ( Access Control Time )
                                                                              
                              (.+)\x20\r\n                                    #  Group 4        ( 1ST Verbose Output )
                                                                              
                              \x20+(.+)\x20                                   #  Group 5        ( 2ND Verbose Output )
                              
                              (?:\r\n\x20+(.+)\x20(?:\r\n\x20+(.+)\x20)??)??  #  Group 6 and 7  ( 3RD and 4TH Verbose Output )
                              
                              (?=\Z|\r\n(\f|\x20[12]\d{6}\x20))               #  IF the VERY END of file or IF a LINE-BREAK followed with an INCIDENT NUMBER or a HEADER
                              
                              |                                               #  OR
                              
                              (?s)                                            #  The DOT regex character matches, ABSOLUTELY, ANY character
                              
                              \f.*?===\r\n\r\n                                #  The COMPLETE HEADER block, with its LINE-BREAKS
                              

                              Remember that the Find what: zone cannot store more than 2046 characters !

                              Cheers,

                              guy038

                              1 Reply Last reply Reply Quote 0
                              • Mark AllenM
                                Mark Allen
                                last edited by

                                Mate

                                Thank you so so so Much for your help with this, I truly mean that, it worked perfectly, If its okay, I am going to distribute this way to the supplier of the product so we can use it to make better reports from this program!!!

                                Truly Thank you heaps, and thanks for Explaination as well, I should be able to use this in the future to write up my own rules…

                                Again thanks heaps!!!

                                Mark

                                1 Reply Last reply Reply Quote 0
                                • First post
                                  Last post
                                The Community of users of the Notepad++ text editor.
                                Powered by NodeBB | Contributors