    Wanting to Format 2000000 Lines of Text

    • Scott Sumner @Mark Allen

      @Mark-Allen

      If I were you, I’d do it in several steps.

      First, remove your page header. Since you didn’t provide any detail, you’re on your own on that. Hopefully you know what to do with this.

      Second, combine your multi-line data into a single line. Basically, you look at how each line starts: if a line doesn’t start with a 7-digit number followed by a day of the week, it is a continuation line, and you want to join it onto the line before it. Here’s what I’d do:

      Find what zone: (?-is)^(\d{7} (?:Sun|Mon|Tue).+)\R(?!\d{7} (?:Sun|Mon|Tue))(.+)
      Replace with zone: \1\2
      Search mode: Regular expression
      Wrap around: ticked
      Action: Press the Replace All button (do this multiple times, I guess up to 4 or 5 based upon your original problem statement, until all of the broken lines are combined)

      Note: I left out the remaining days of the week, you can easily fill those in between | characters. Exercise for the reader.

      Third, reformat the data for Excel purposes. That part isn’t so hard, so again, left as exercise for the reader.
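
For anyone who prefers a script to repeated Replace All presses, the second step can be sketched in Python. This is only a sketch under assumptions: the page headers and blank lines are already gone (step one), record lines start at column 0 exactly as the regex above expects, and the function name `join_continuations` is made up for illustration.

```python
import re

# A record line starts with a 7-digit number, a space and a weekday abbreviation,
# mirroring the "Find what" expression above (with all seven days filled in).
RECORD_START = re.compile(r"^\d{7} (?:Sun|Mon|Tue|Wed|Thu|Fri|Sat)\b")


def join_continuations(lines):
    """Append every non-record line to the record line that precedes it."""
    joined = []
    for line in lines:
        text = line.rstrip("\r\n")
        if RECORD_START.match(text) or not joined:
            # A new record (or a leading line with nothing before it).
            joined.append(text)
        else:
            # Continuation line: glue it onto the previous line, as \1\2 does.
            joined[-1] += text
    return joined


sample = [
    "1947746 Sun Nov 26 07:01:58 2017 (CH: Nov 26 07:05:02)ENTRY GRANTED to USER\n",
    "6512: Stephen Knynenburg at ACUTE CARE AT\n",
    "HOME\n",
]
print(join_continuations(sample))
```

Because the loop keeps appending to the last record, a single pass handles any number of continuation lines, whereas the Replace All approach joins one continuation per record per press.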

      • guy038

        Hello, @mark-allen, and All,

        This morning (in France), I managed to find a regex which should do what you want. But I was away from home this afternoon! This regex changes each multi-line transaction into a single-line transaction and, in addition, separates the different fields with commas ;-))

        Of course, I needed some time to work through the whole problem! And, as Scott said in his last post, we need to know the structure of your page header in order to build the correct final regex. In the meantime, I have assumed that your file does not contain any header.

        In your first post, you said:

        the outputted text, can have 1/2/3/4 lines per incident

        and

        Field 5 - The verbose output from the transaction, which is split over 1/2/3 lines (sometimes 4)

        So I made up the simple test text below, which you can paste into a new tab:

        1947728 Sun Nov 26 06:57:54 2017 (CH: Nov 26 07:00:57)<Output_1>
        <Output_2>
        <Output_3>
        <Output_4>
        1941234 .........
        
        1947728 Sun Nov 26 06:57:54 2017 (CH: Nov 26 07:00:57)<Output_1>
        <Output_2>
        <Output_3>
        1941234 .........
        
        1947728 Sun Nov 26 06:57:54 2017 (CH: Nov 26 07:00:57)<Output_1>
        <Output_2>
        1941234 .........
        

        As you can see, I assumed that the verbose output is always split over at least 2 lines, and sometimes over 3 or 4 (hence, never a verbose output on a single line only!). If I’m wrong about that, just tell me so that I can modify the regex ;-)


        Now, open the Replace dialog

        Insert the regex (?-s)(^194\d{4})\x20(.+)\x20(\(.+\))(.+)\R(.+)(?:\R(.+)(?:\R(.+))??)??(?=\R194\d{4}\x20|\Z) in the Find what: zone

        Insert the replacement \1,\2,\3,\4\5\6\7 in the Replace with: zone

        Select the Regular expression search mode

        Tick the Wrap around option

        Click once on the Replace All button, or several times on the Replace button

        We get the following text :

        1947728,Sun Nov 26 06:57:54 2017,(CH: Nov 26 07:00:57),<Output_1><Output_2><Output_3><Output_4>
        1941234 .........
        
        1947728,Sun Nov 26 06:57:54 2017,(CH: Nov 26 07:00:57),<Output_1><Output_2><Output_3>
        1941234 .........
        
        1947728,Sun Nov 26 06:57:54 2017,(CH: Nov 26 07:00:57),<Output_1><Output_2>
        1941234 .........
        

        So the important question is: is this the kind of text you expect?
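
The same search and replace can be tried outside the editor. Below is a hedged Python port of the regex above, not the N++ engine itself. Assumptions: the text uses `\n` line endings (so `\R` becomes `\n`), `\n*\Z` stands in for the editor's `\Z`, and the function name `to_csv_lines` is made up for illustration.

```python
import re

# Python port of the "Find what" regex. Groups 1-3 are the number, the timestamp
# and the (CH: ...) part; groups 4-7 are the 2 to 4 verbose-output lines.
RECORD = re.compile(
    r"^(194\d{4})\x20(.+)\x20(\(.+\))(.+)\n(.+)(?:\n(.+)(?:\n(.+))??)??"
    r"(?=\n194\d{4}\x20|\n*\Z)",
    re.MULTILINE,
)


def to_csv_lines(text):
    """Collapse each multi-line transaction into one comma-separated line."""
    def build(match):
        # Unmatched optional groups come back as "" thanks to default="".
        number, stamp, ch, *outputs = match.groups(default="")
        return f"{number},{stamp},{ch},{''.join(outputs)}"
    return RECORD.sub(build, text)


sample = (
    "1947728 Sun Nov 26 06:57:54 2017 (CH: Nov 26 07:00:57)<Output_1>\n"
    "<Output_2>\n"
    "<Output_3>\n"
    "1941234 .........\n"
)
print(to_csv_lines(sample))
```

The callback plays the role of the `\1,\2,\3,\4\5\6\7` replacement; lines that do not match, such as the `1941234 .........` placeholder, are left untouched.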


        Secondly, using your list of transactions, also given in your first post :

        1947728 Sun Nov 26 06:57:54 2017 (CH: Nov 26 07:00:57)ENTRY GRANTED to USER
        6123: Michelle Coghlan at Main Entry Gate
        1947729 Sun Nov 26 06:59:32 2017 (CH: Nov 26 07:02:35)ENTRY GRANTED to USER
        6123: Michelle Coghlan at ACUTE CARE AT HOME
        1947730 Sun Nov 26 06:59:33 2017 (CH: Nov 26 07:02:36)ENTRY GRANTED to USER
        6123: Michelle Coghlan at ACUTE CARE AT HOME
        1947743 Sun Nov 26 06:59:42 2017 (CH: Nov 26 07:02:38)ENTRY GRANTED to USER
        6123: Michelle Coghlan at ACUTE CARE AT HOME
        1947744 Sun Nov 26 07:00:38 2017 (CH: Nov 26 07:03:42)ENTRY GRANTED to USER
        6512: Stephen Knynenburg at Main Entry Gate
        1947745 Sun Nov 26 07:00:39 2017 (CH: Nov 26 07:03:43)ENTRY GRANTED to USER
        6512: Stephen Knynenburg at Main Entry Gate
        1947746 Sun Nov 26 07:01:58 2017 (CH: Nov 26 07:05:02)ENTRY GRANTED to USER
        6512: Stephen Knynenburg at ACUTE CARE AT
        HOME
        

        This same regex S/R produced :

        1947728,Sun Nov 26 06:57:54 2017,(CH: Nov 26 07:00:57),ENTRY GRANTED to USER6123: Michelle Coghlan at Main Entry Gate
        1947729,Sun Nov 26 06:59:32 2017,(CH: Nov 26 07:02:35),ENTRY GRANTED to USER6123: Michelle Coghlan at ACUTE CARE AT HOME
        1947730,Sun Nov 26 06:59:33 2017,(CH: Nov 26 07:02:36),ENTRY GRANTED to USER6123: Michelle Coghlan at ACUTE CARE AT HOME
        1947743,Sun Nov 26 06:59:42 2017,(CH: Nov 26 07:02:38),ENTRY GRANTED to USER6123: Michelle Coghlan at ACUTE CARE AT HOME
        1947744,Sun Nov 26 07:00:38 2017,(CH: Nov 26 07:03:42),ENTRY GRANTED to USER6512: Stephen Knynenburg at Main Entry Gate
        1947745,Sun Nov 26 07:00:39 2017,(CH: Nov 26 07:03:43),ENTRY GRANTED to USER6512: Stephen Knynenburg at Main Entry Gate
        1947746,Sun Nov 26 07:01:58 2017,(CH: Nov 26 07:05:02),ENTRY GRANTED to USER6512: Stephen Knynenburg at ACUTE CARE ATHOME
        

        You may have noticed that the last line, after replacement, ends with at ACUTE CARE ATHOME, where the words AT and HOME are run together. That is logical, as there is only a line-break between …at ACUTE CARE AT and the last line HOME, and the replacement puts nothing in its place.
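
If the run-together words are unwanted, one option is to join the verbose-output pieces with a single space instead of nothing. A minimal Python sketch of that idea follows; the helper name `join_outputs` is made up for illustration and is not part of the regex S/R above.

```python
def join_outputs(parts):
    """Join verbose-output fragments with one space, skipping empty ones."""
    cleaned = (part.strip() for part in parts if part)
    return " ".join(piece for piece in cleaned if piece)


fragments = [
    "ENTRY GRANTED to USER",
    "6512: Stephen Knynenburg at ACUTE CARE AT",
    "HOME",
    "",  # an output line that was not present in this transaction
]
print(join_outputs(fragments))
# ENTRY GRANTED to USER 6512: Stephen Knynenburg at ACUTE CARE AT HOME
```

Skipping the empty fragments matters because transactions with only 2 or 3 output lines would otherwise end in stray spaces.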

        Then, in a new Excel page, I inserted that text as External data, indicating that the fields were comma separated and saved it as a common .xls file, without any problem :-)


        Finally, in your first post, you also said :

        I have a report about 2 000 000 Lines long,

        So, I tried to determine if this regex S/R could be done in a sensible time, without any bug or catastrophic backtracking of the N++ regex engine !

        I duplicated your 15-line text, below:

        1947728 Sun Nov 26 06:57:54 2017 (CH: Nov 26 07:00:57)ENTRY GRANTED to USER
        6123: Michelle Coghlan at Main Entry Gate
        1947729 Sun Nov 26 06:59:32 2017 (CH: Nov 26 07:02:35)ENTRY GRANTED to USER
        6123: Michelle Coghlan at ACUTE CARE AT HOME
        1947730 Sun Nov 26 06:59:33 2017 (CH: Nov 26 07:02:36)ENTRY GRANTED to USER
        6123: Michelle Coghlan at ACUTE CARE AT HOME
        1947743 Sun Nov 26 06:59:42 2017 (CH: Nov 26 07:02:38)ENTRY GRANTED to USER
        6123: Michelle Coghlan at ACUTE CARE AT HOME
        1947744 Sun Nov 26 07:00:38 2017 (CH: Nov 26 07:03:42)ENTRY GRANTED to USER
        6512: Stephen Knynenburg at Main Entry Gate
        1947745 Sun Nov 26 07:00:39 2017 (CH: Nov 26 07:03:43)ENTRY GRANTED to USER
        6512: Stephen Knynenburg at Main Entry Gate
        1947746 Sun Nov 26 07:01:58 2017 (CH: Nov 26 07:05:02)ENTRY GRANTED to USER
        6512: Stephen Knynenburg at ACUTE CARE AT
        HOME
        

        68,000 times! So I was left with a file of 58,412,000 bytes, containing 1,020,000 lines. Then, running the above regex S/R on my old (2005) Windows XP laptop (with only 1 GB of RAM!), the 476,000 replacements took 69 s, producing a file of 57,800,000 bytes containing 476,000 lines ;-))

        Consequently, I expect that your file of 2,000,000 lines should be handled correctly on a recent laptop and OS!
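
The figures in this test are consistent with each other, which a few lines of arithmetic can confirm. The sketch below assumes Windows (CRLF, 2-byte) line breaks, which the post does not state explicitly; the variable names are made up for illustration.

```python
LINES_PER_BLOCK = 15     # lines in the sample block quoted above
RECORDS_PER_BLOCK = 7    # transactions in that block
COPIES = 68_000

total_lines = LINES_PER_BLOCK * COPIES
total_records = RECORDS_PER_BLOCK * COPIES

# Every joined continuation line loses one 2-byte CRLF (assumed). Every record
# gains one byte: two commas replace existing spaces, the third is new.
joined_lines_per_block = LINES_PER_BLOCK - RECORDS_PER_BLOCK
bytes_saved = COPIES * (2 * joined_lines_per_block - RECORDS_PER_BLOCK)

print(total_lines)    # 1020000
print(total_records)  # 476000
print(bytes_saved)    # 612000, i.e. 58,412,000 - 57,800,000
```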

        So, just tell us about your header and it’ll be only a matter of minutes to get the job done ;-))

        See you later,

        Best Regards,

        guy038

        • Mark Allen @Scott Sumner

          Morning All! (Or Afternoon or Evening!)

          Scott! You are correct, I didn’t give you the header!

          This is the header, which is static throughout; the only thing that changes is the Page #. I can go through and essentially remove it manually, or use CTRL-H and replace the text with a “space”:

          Door User Activity Reported on Fri Mar 2 09:50:00 2018
          Initiated by Master from Main Office Page 6292
          Messia Plains Ambulatory

          I’ve tried your expression, and sadly it didn’t work for me.

          In the Find what line: (?-is)^(\d{7} (?:Sun|Mon|Tue|Wed|Thu|Fri|Sat).+)\R(?!\d{7} (?:Sun|Mon|Tue|Wed|Thu|Fri|Sat))(.+)
          In the Replace with line: \1\2

          Each time it has come up with zero lines replaced.

          I made the sample size smaller, just 1 page, and got the same issue; I made it just one line, and got the same issue (zero lines replaced). I thought perhaps I might have an outdated version, so I’ve updated to the latest version as of today (7.5.6, 32-bit). I have no plug-ins enabled or installed.

          Sample Set 1
          Expression 1

          I then thought that perhaps the data was breaking the expression because it only has Monday (Mon) in it. (I know, I doubted it as well; it’s a bit above my head, but I wanted to try!)

          I removed Tuesday to Sunday and left Monday only. Still the same result.

          Monday Only Expression

          So I’m at a bit of a loss with that one. I HAVE to be missing something there, or some plug-in or something for that… :(

          I’ve put up 4 pages of the text so you can see the formatting

          Sample Text

          @guy038

          Thanks for your response as well!!

          I’ve tried your expression as well, and I get the same response: it tells me that there are zero changes. I’ve hit the button many times just to make sure I am not the reason for this.

          One thing I’ve noticed is that there is a space before the first 7-digit number. Will that upset things?

          Space before first 7 Digit Number

          What you have outputted is exactly what I am after

          1947728,Sun Nov 26 06:57:54 2017,(CH: Nov 26 07:00:57),ENTRY GRANTED to USER6123: Michelle Coghlan at Main Entry Gate
          1947729,Sun Nov 26 06:59:32 2017,(CH: Nov 26 07:02:35),ENTRY GRANTED to USER6123: Michelle Coghlan at ACUTE CARE AT HOME
          1947730,Sun Nov 26 06:59:33 2017,(CH: Nov 26 07:02:36),ENTRY GRANTED to USER6123: Michelle Coghlan at ACUTE CARE AT HOME
          1947743,Sun Nov 26 06:59:42 2017,(CH: Nov 26 07:02:38),ENTRY GRANTED to USER6123: Michelle Coghlan at ACUTE CARE AT HOME
          1947744,Sun Nov 26 07:00:38 2017,(CH: Nov 26 07:03:42),ENTRY GRANTED to USER6512: Stephen Knynenburg at Main Entry Gate
          1947745,Sun Nov 26 07:00:39 2017,(CH: Nov 26 07:03:43),ENTRY GRANTED to USER6512: Stephen Knynenburg at Main Entry Gate
          1947746,Sun Nov 26 07:01:58 2017,(CH: Nov 26 07:05:02),ENTRY GRANTED to USER6512: Stephen Knynenburg at ACUTE CARE ATHOME

          One line, with commas, that can be brought into Excel nice and neat!

          I will keep playing, but any thoughts would be appreciated! And I will be sure to save this away in case I’m asked again!

          Mark

          NB: Sorry, I don’t know the syntax to nicely put things into code boxes.

          • Scott Sumner

            @Mark-Allen

            The simplest way to represent your data accurately here is by pasting it between a line containing ```z and a line containing ```

            Thus, if you do the following:

            ```z
            (copy and paste your real data HERE)
            ```

            It (some of what I think your real data is) will render as:

             1947729 Sun Nov 26 06:59:32 2017 (CH: Nov 26 07:02:35)ENTRY GRANTED to USER
                                              6123: Michelle Coghlan at ACUTE CARE AT 
                                              HOME
             1947729 Sun Nov 26 06:59:32 2017 (CH: Nov 26 07:02:35)ENTRY GRANTED to USER
                                              6123: Michelle Coghlan at ACUTE CARE AT 
                                              HOME
            

            And really, accurately representing your data is the only way to get good advice. Otherwise, responders are guessing, and when you try their advice you get Replace: No occurrence was found. Then when you say, “Oh, that’s not really what I meant…” and attempt to clarify, the helpers have lost interest and have moved on to more interesting pursuits…

            Note: You can also indent your entire dataset by 4 spaces and it will render like the following, but I find the above method easier (and the colors are often wacky in what follows).

             1947729 Sun Nov 26 06:59:32 2017 (CH: Nov 26 07:02:35)ENTRY GRANTED to USER
                                              6123: Michelle Coghlan at ACUTE CARE AT 
                                              HOME
             1947729 Sun Nov 26 06:59:32 2017 (CH: Nov 26 07:02:35)ENTRY GRANTED to USER
                                              6123: Michelle Coghlan at ACUTE CARE AT 
                                              HOME
            

            Also note: Screenshots aren’t going to cut it. Nobody is going to retype your data from a screenshot to test out a proposed match expression.

            • Scott Sumner @Scott Sumner

              @Mark-Allen

              I forgot to mention that your “Sample Text” link is good in that it has copyable text, but it is bad in that it doesn’t preserve the leading spaces (as shown in some of the screenshot versions) on the lines! :-(

              • Mark Allen

                Agh! Can’t win :) But this is the data:

                1947727 Sun Nov 26 00:44:12 2017 (CH: Nov 26 00:47:14)ENTRY GRANTED to USER 
                                                  1313: State Government Security at Main 
                                                  Entry Gate 
                 1947728 Sun Nov 26 06:57:54 2017 (CH: Nov 26 07:00:57)ENTRY GRANTED to USER 
                                                  6123: Michelle Coghlan at Main Entry Gate 
                 1947729 Sun Nov 26 06:59:32 2017 (CH: Nov 26 07:02:35)ENTRY GRANTED to USER 
                                                  6123: Michelle Coghlan at ACUTE CARE AT HOME 
                 1947730 Sun Nov 26 06:59:33 2017 (CH: Nov 26 07:02:36)ENTRY GRANTED to USER 
                                                  6123: Michelle Coghlan at ACUTE CARE AT HOME 
                 1947743 Sun Nov 26 06:59:42 2017 (CH: Nov 26 07:02:38)ENTRY GRANTED to USER 
                                                  6123: Michelle Coghlan at ACUTE CARE AT HOME 
                 1947744 Sun Nov 26 07:00:38 2017 (CH: Nov 26 07:03:42)ENTRY GRANTED to USER 
                                                  6512: Stephen Knynenburg at Main Entry Gate 
                 1947745 Sun Nov 26 07:00:39 2017 (CH: Nov 26 07:03:43)ENTRY GRANTED to USER 
                                                  6512: Stephen Knynenburg at Main Entry Gate 
                 1947746 Sun Nov 26 07:01:58 2017 (CH: Nov 26 07:05:02)ENTRY GRANTED to USER 
                                                  6512: Stephen Knynenburg at ACUTE CARE AT 
                                                  HOME 
                 1947747 Sun Nov 26 07:27:03 2017 (CH: Nov 26 07:30:07)ENTRY GRANTED to USER 
                                                  6123: Michelle Coghlan at Population Health 
                                                  Rear Entry 
                 1947748 Sun Nov 26 08:08:34 2017 (CH: Nov 26 08:11:39)ENTRY GRANTED to USER 
                                                  6512: Stephen Knynenburg at ACUTE CARE AT 
                                                  HOME 
                 1947749 Sun Nov 26 09:08:22 2017 (CH: Nov 26 09:11:27)ENTRY GRANTED to USER 
                                                  6512: Stephen Knynenburg at Population 
                                                  Health Rear Entry 
                 1947750 Sun Nov 26 09:43:14 2017 (CH: Nov 26 09:46:19)ENTRY GRANTED to USER 
                                                  6512: Stephen Knynenburg at Population 
                                                  Health Rear Entry 
                 1947751 Sun Nov 26 09:46:48 2017 (CH: Nov 26 09:49:53)ENTRY GRANTED to USER 
                                                  6005: Rose Gorman at Main Entry Gate 
                 1947752 Sun Nov 26 09:49:50 2017 (CH: Nov 26 09:52:55)ENTRY GRANTED to USER 
                                                  6005: Rose Gorman at ACUTE CARE AT HOME 
                 1947753 Sun Nov 26 09:56:07 2017 (CH: Nov 26 09:59:11)ENTRY GRANTED to USER 
                                                  6513: Sheela Tuazon at Main Entry Gate 
                 1947754 Sun Nov 26 09:58:25 2017 (CH: Nov 26 10:01:29)ENTRY GRANTED to USER 
                                                  6513: Sheela Tuazon at ACUTE CARE AT HOME 
                 1947755 Sun Nov 26 10:01:56 2017 (CH: Nov 26 10:05:00)ENTRY GRANTED to USER 
                                                  6513: Sheela Tuazon at Population Health 
                                                  Rear Entry 
                 1947756 Sun Nov 26 10:08:53 2017 (CH: Nov 26 10:11:58)ENTRY GRANTED to USER 
                                                  6512: Stephen Knynenburg at Population 
                                                  Health Rear Entry 
                
                Door User Activity                        Reported on Fri Mar  2 09:50:00 2018
                Initiated by Master from Main Office                                    Page 2
                Messia Plains Ambulatory
                ================================================================================
                
                • guy038

                  Hello, @mark-allen, @scott-sumner and All,

                  Ah ! Of course, it’s easy to understand why my regex could not match any string ;-)) Indeed, when regular expressions are involved, even a single space, added or deleted in text, is of importance !

                  From your recent picture, I noticed four particularities, in your list :

                  • Any incident number is separated from beginning of line by a space character

                  • The verbose output zones are, both :

                    • Preceded by 1 or more space character(s)

                    • Followed by 1 space character

                  • Each header block :

                    • Begins with the Form Feed character ( the ASCII page-breaking control character, of Unicode value \x{000C}, which can be matched, in regular expressions, with the \f syntax )

                    • Ends with several equal signs ( = )

                  Refer to https://en.wikipedia.org/wiki/Page_break#Form_feed
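
The header-block observation can be sketched on its own in Python. This is a minimal sketch under assumptions: each header really does start with a form feed and end with a line of equal signs, as described above; where exactly the form feed sits relative to the first header line is a guess; and the name `strip_headers` is made up for illustration.

```python
import re

# A header block runs from a form feed (\f) through the end of the "=====" line.
# re.DOTALL lets ".*?" cross line breaks; "\r?\n" accepts both LF and CRLF.
HEADER = re.compile(r"\f.*?=+\r?\n", re.DOTALL)


def strip_headers(text):
    """Remove every page-header block from the report text."""
    return HEADER.sub("", text)


page = (
    " 1947743 Sun Nov 26 06:59:42 2017 (CH: Nov 26 07:02:38)ENTRY GRANTED to USER \n"
    "\f\n"
    "Door User Activity        Reported on Fri Mar  2 09:50:00 2018\n"
    "Initiated by Master from Main Office        Page 26\n"
    "Messia Plains Ambulatory\n"
    "====================\n"
    " 1947744 Sun Nov 26 07:00:38 2017 (CH: Nov 26 07:03:42)ENTRY GRANTED to USER \n"
)
print(strip_headers(page))
```

The lazy `.*?` keeps each match inside a single header, so the transactions between two headers are never swallowed.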

                  With all these elements, it’s not difficult to modify my previous regex, to be close to your real text ;-))


                  So again, let’s begin with this short example :

                   1947728 Sun Nov 26 06:57:54 2017 (CH: Nov 26 07:00:57)<Output_1> 
                                                    <Output_2> 
                                                    <Output_3> 
                                                    <Output_4> 
                   1947728 Sun Nov 26 06:57:54 2017 (CH: Nov 26 07:00:57)<Output_1> 
                                                    <Output_2> 
                                                    <Output_3> 
                  
                  Door User Activity                        Reported on Fri Mar  2 09:50:00 2018
                  Initiated by Master from Main Office                                    Page 2
                  Messia Plains Ambulatory
                  ================================================================================
                   1947728 Sun Nov 26 06:57:54 2017 (CH: Nov 26 07:00:57)<Output_1> 
                                                    <Output_2> 
                   1941234 ......... 
                  

                  Using my new regex, below :

                  SEARCH (?-s)^\x20(194\d{4})\x20(.+)\x20(\(.+\))(.+)\x20\R\x20+(.+)\x20(?:\R\x20+(.+)\x20(?:\R\x20+(.+)\x20)??)??(?=\Z|\R(\f|\x20194\d{4}\x20))|(?s)\f.*?===\R

                  REPLACE ?1\1,\2,\3,\4\5\6\7

                  We get the following text :

                  1947728,Sun Nov 26 06:57:54 2017,(CH: Nov 26 07:00:57),<Output_1><Output_2><Output_3><Output_4>
                  1947728,Sun Nov 26 06:57:54 2017,(CH: Nov 26 07:00:57),<Output_1><Output_2><Output_3>
                  1947728,Sun Nov 26 06:57:54 2017,(CH: Nov 26 07:00:57),<Output_1><Output_2>
                   1941234 ......... 
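
For testing outside the editor, here is a hedged Python port of the SEARCH/REPLACE pair above. Assumptions: `\n` line endings, `\n*\Z` in place of the editor's `\Z`, and a callback standing in for the conditional `?1…` replacement, which Python's `re` does not have. The `number` parameter and the names `build_pattern` and `convert` are made up for illustration. Note that the default `194\d{4}` can only match incident numbers from 1940000 to 1949999; passing `\d{7}` instead assumes every incident number has exactly seven digits.

```python
import re


def build_pattern(number=r"194\d{4}"):
    """Python port of the SEARCH regex; `number` matches an incident number."""
    record = (
        rf"^\x20({number})\x20(.+)\x20(\(.+\))(.+)\x20\n"
        rf"\x20+(.+)\x20(?:\n\x20+(.+)\x20(?:\n\x20+(.+)\x20)??)??"
        rf"(?=\n*\Z|\n(?:\f|\x20{number}\x20))"
    )
    header = r"(?s:\f.*?===\n)"  # scoped flag replaces the inline (?s)
    return re.compile(f"{record}|{header}", re.MULTILINE)


def convert(text, pattern=None):
    """Turn transactions into CSV lines and delete page-header blocks."""
    if pattern is None:
        pattern = build_pattern()

    def build(match):
        if match.group(1) is None:
            return ""  # the header alternative matched: delete it
        number, stamp, ch, *outputs = match.groups(default="")
        return f"{number},{stamp},{ch},{''.join(outputs)}"

    return pattern.sub(build, text)


sample = (
    " 1947728 Sun Nov 26 06:57:54 2017 (CH: Nov 26 07:00:57)<Output_1> \n"
    "                                  <Output_2> \n"
    "\f\n"
    "Door User Activity\n"
    "=====\n"
    " 1947728 Sun Nov 26 06:57:54 2017 (CH: Nov 26 07:00:57)<Output_1> \n"
    "                                  <Output_2> \n"
    " 1941234 ......... \n"
)
print(convert(sample))
print(convert(sample, build_pattern(r"\d{7}")))  # generalized number pattern
```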
                  

                  Secondly, given your list of transactions, below, of your LAST post, where I simply added :

                  • One space before the first incident number, which was missing

                  • A header block, after the 5th and 11th transactions

                   1947727 Sun Nov 26 00:44:12 2017 (CH: Nov 26 00:47:14)ENTRY GRANTED to USER 
                                                    1313: State Government Security at Main 
                                                    Entry Gate 
                   1947728 Sun Nov 26 06:57:54 2017 (CH: Nov 26 07:00:57)ENTRY GRANTED to USER 
                                                    6123: Michelle Coghlan at Main Entry Gate 
                   1947729 Sun Nov 26 06:59:32 2017 (CH: Nov 26 07:02:35)ENTRY GRANTED to USER 
                                                    6123: Michelle Coghlan at ACUTE CARE AT HOME 
                   1947730 Sun Nov 26 06:59:33 2017 (CH: Nov 26 07:02:36)ENTRY GRANTED to USER 
                                                    6123: Michelle Coghlan at ACUTE CARE AT HOME 
                   1947743 Sun Nov 26 06:59:42 2017 (CH: Nov 26 07:02:38)ENTRY GRANTED to USER 
                                                    6123: Michelle Coghlan at ACUTE CARE AT HOME 
                  
                  Door User Activity                        Reported on Fri Mar  2 09:50:00 2018
                  Initiated by Master from Main Office                                    Page 26
                  Messia Plains Ambulatory
                  ================================================================================
                   1947744 Sun Nov 26 07:00:38 2017 (CH: Nov 26 07:03:42)ENTRY GRANTED to USER 
                                                    6512: Stephen Knynenburg at Main Entry Gate 
                   1947745 Sun Nov 26 07:00:39 2017 (CH: Nov 26 07:03:43)ENTRY GRANTED to USER 
                                                    6512: Stephen Knynenburg at Main Entry Gate 
                   1947746 Sun Nov 26 07:01:58 2017 (CH: Nov 26 07:05:02)ENTRY GRANTED to USER 
                                                    6512: Stephen Knynenburg at ACUTE CARE AT 
                                                    HOME 
                   1947747 Sun Nov 26 07:27:03 2017 (CH: Nov 26 07:30:07)ENTRY GRANTED to USER 
                                                    6123: Michelle Coghlan at Population Health 
                                                    Rear Entry 
                   1947748 Sun Nov 26 08:08:34 2017 (CH: Nov 26 08:11:39)ENTRY GRANTED to USER 
                                                    6512: Stephen Knynenburg at ACUTE CARE AT 
                                                    HOME 
                   1947749 Sun Nov 26 09:08:22 2017 (CH: Nov 26 09:11:27)ENTRY GRANTED to USER 
                                                    6512: Stephen Knynenburg at Population 
                                                    Health Rear Entry 
                  
                  Door User Activity                        Reported on Fri Mar  2 09:50:00 2018
                  Initiated by Master from Main Office                                    Page 27
                  Messia Plains Ambulatory
                  ================================================================================
                   1947750 Sun Nov 26 09:43:14 2017 (CH: Nov 26 09:46:19)ENTRY GRANTED to USER 
                                                    6512: Stephen Knynenburg at Population 
                                                    Health Rear Entry 
                   1947751 Sun Nov 26 09:46:48 2017 (CH: Nov 26 09:49:53)ENTRY GRANTED to USER 
                                                    6005: Rose Gorman at Main Entry Gate 
                   1947752 Sun Nov 26 09:49:50 2017 (CH: Nov 26 09:52:55)ENTRY GRANTED to USER 
                                                    6005: Rose Gorman at ACUTE CARE AT HOME 
                   1947753 Sun Nov 26 09:56:07 2017 (CH: Nov 26 09:59:11)ENTRY GRANTED to USER 
                                                    6513: Sheela Tuazon at Main Entry Gate 
                   1947754 Sun Nov 26 09:58:25 2017 (CH: Nov 26 10:01:29)ENTRY GRANTED to USER 
                                                    6513: Sheela Tuazon at ACUTE CARE AT HOME 
                   1947755 Sun Nov 26 10:01:56 2017 (CH: Nov 26 10:05:00)ENTRY GRANTED to USER 
                                                    6513: Sheela Tuazon at Population Health 
                                                    Rear Entry 
                   1947756 Sun Nov 26 10:08:53 2017 (CH: Nov 26 10:11:58)ENTRY GRANTED to USER 
                                                    6512: Stephen Knynenburg at Population 
                                                    Health Rear Entry 
                  
                  Door User Activity                        Reported on Fri Mar  2 09:50:00 2018
                  Initiated by Master from Main Office                                    Page 28
                  Messia Plains Ambulatory
                  ================================================================================
                  

                  After replacement, the new regex S/R gives :

                  1947727,Sun Nov 26 00:44:12 2017,(CH: Nov 26 00:47:14),ENTRY GRANTED to USER1313: State Government Security at MainEntry Gate
                  1947728,Sun Nov 26 06:57:54 2017,(CH: Nov 26 07:00:57),ENTRY GRANTED to USER6123: Michelle Coghlan at Main Entry Gate
                  1947729,Sun Nov 26 06:59:32 2017,(CH: Nov 26 07:02:35),ENTRY GRANTED to USER6123: Michelle Coghlan at ACUTE CARE AT HOME
                  1947730,Sun Nov 26 06:59:33 2017,(CH: Nov 26 07:02:36),ENTRY GRANTED to USER6123: Michelle Coghlan at ACUTE CARE AT HOME
                  1947743,Sun Nov 26 06:59:42 2017,(CH: Nov 26 07:02:38),ENTRY GRANTED to USER6123: Michelle Coghlan at ACUTE CARE AT HOME
                  1947744,Sun Nov 26 07:00:38 2017,(CH: Nov 26 07:03:42),ENTRY GRANTED to USER6512: Stephen Knynenburg at Main Entry Gate
                  1947745,Sun Nov 26 07:00:39 2017,(CH: Nov 26 07:03:43),ENTRY GRANTED to USER6512: Stephen Knynenburg at Main Entry Gate
                  1947746,Sun Nov 26 07:01:58 2017,(CH: Nov 26 07:05:02),ENTRY GRANTED to USER6512: Stephen Knynenburg at ACUTE CARE ATHOME
                  1947747,Sun Nov 26 07:27:03 2017,(CH: Nov 26 07:30:07),ENTRY GRANTED to USER6123: Michelle Coghlan at Population HealthRear Entry
                  1947748,Sun Nov 26 08:08:34 2017,(CH: Nov 26 08:11:39),ENTRY GRANTED to USER6512: Stephen Knynenburg at ACUTE CARE ATHOME
                  1947749,Sun Nov 26 09:08:22 2017,(CH: Nov 26 09:11:27),ENTRY GRANTED to USER6512: Stephen Knynenburg at PopulationHealth Rear Entry
                  1947750,Sun Nov 26 09:43:14 2017,(CH: Nov 26 09:46:19),ENTRY GRANTED to USER6512: Stephen Knynenburg at PopulationHealth Rear Entry
                  1947751,Sun Nov 26 09:46:48 2017,(CH: Nov 26 09:49:53),ENTRY GRANTED to USER6005: Rose Gorman at Main Entry Gate
                  1947752,Sun Nov 26 09:49:50 2017,(CH: Nov 26 09:52:55),ENTRY GRANTED to USER6005: Rose Gorman at ACUTE CARE AT HOME
                  1947753,Sun Nov 26 09:56:07 2017,(CH: Nov 26 09:59:11),ENTRY GRANTED to USER6513: Sheela Tuazon at Main Entry Gate
                  1947754,Sun Nov 26 09:58:25 2017,(CH: Nov 26 10:01:29),ENTRY GRANTED to USER6513: Sheela Tuazon at ACUTE CARE AT HOME
                  1947755,Sun Nov 26 10:01:56 2017,(CH: Nov 26 10:05:00),ENTRY GRANTED to USER6513: Sheela Tuazon at Population HealthRear Entry
                  1947756,Sun Nov 26 10:08:53 2017,(CH: Nov 26 10:11:58),ENTRY GRANTED to USER6512: Stephen Knynenburg at PopulationHealth Rear Entry
                  

                  Voilà !

                  I’m quite sure that’s what you expect, isn’t it?

                  Cheers,

                  guy038

                  P.S. :

                  • I’ll give some explanations on my regex S/R, in a next post, if the resulting replacement is OK for you !

                  • I also assume that the last non-blank line, of your file, is, either :

                    • The line, with equal signs, of a header block, followed by a Windows line-break ( \r\n )

                    • A verbose output line, possibly followed with any range of \r, \n or \f character(s)

                  • Mark Allen

                    Hello All!

                    Sorry All! I was away from the desk for a few days, BUT I ran it and it works fine for the first 30 pages; then, after the first 30 pages, it does this:

                    1949977,Tue Nov 28 11:48:52 2017,(CH: Nov 28 11:52:11),ENTRY GRANTED to USER6295: Rachael Downie at AMH Rear Entry
                    
                    1949978,Tue Nov 28 11:48:55 2017,(CH: Nov 28 11:52:13),ENTRY GRANTED to USER40: Authorised Student Card 17 at CH RearEntry
                    1949979,Tue Nov 28 11:49:06 2017,(CH: Nov 28 11:52:25),ENTRY GRANTED to USER10885: Anne Volling at CH Rear Entry
                    1949980,Tue Nov 28 11:49:14 2017,(CH: Nov 28 11:52:33),ENTRY GRANTED to USER6048: Elizabeth Apro at Basement comms Room
                    1949981,Tue Nov 28 11:49:29 2017,(CH: Nov 28 11:52:48),ENTRY GRANTED to USER5990: Tracy Brighouse at AMH Reception
                    1949982,Tue Nov 28 11:49:35 2017,(CH: Nov 28 11:52:54),ENTRY GRANTED to USER6130: Narelle Broad at OH Front Entry
                    1949983,Tue Nov 28 11:50:23 2017,(CH: Nov 28 11:53:41),ENTRY GRANTED to USER6048: Elizabeth Apro at Plant Room to CommsRoom door
                    1949984,Tue Nov 28 11:50:36 2017,(CH: Nov 28 11:53:54),ENTRY GRANTED to USER5990: Tracy Brighouse at AMH Front Entry
                    1949985,Tue Nov 28 11:50:40 2017,(CH: Nov 28 11:53:58),ENTRY GRANTED to USER5990: Tracy Brighouse at AMH Reception
                    1949986,Tue Nov 28 11:51:03 2017,(CH: Nov 28 11:54:22),ENTRY GRANTED to USER10885: Anne Volling at CH Rear Entry
                    1949987,Tue Nov 28 11:51:09 2017,(CH: Nov 28 11:54:28),ENTRY GRANTED to USER9828: Rosalinda Whittmaack at AMH Rear Entry
                    1949988,Tue Nov 28 11:51:32 2017,(CH: Nov 28 11:54:51),ENTRY GRANTED to USER5990: Tracy Brighouse at AMH Rear Entry
                    1949989,Tue Nov 28 11:51:51 2017,(CH: Nov 28 11:55:10),ENTRY GRANTED to USER6048: Elizabeth Apro at Staff Entry GroundEast
                    1949990,Tue Nov 28 11:54:54 2017,(CH: Nov 28 11:58:13),ENTRY GRANTED to USER6295: Rachael Downie at AMH Rear Entry
                    1949991,Tue Nov 28 11:55:00 2017,(CH: Nov 28 11:58:18),ENTRY GRANTED to USER6295: Rachael Downie at AMH Consult B08
                    1949992,Tue Nov 28 11:55:05 2017,(CH: Nov 28 11:58:24),ENTRY GRANTED to USER11235: Kristi Davies at Admin Entry
                    1949993,Tue Nov 28 11:55:08 2017,(CH: Nov 28 11:58:27),ENTRY GRANTED to USER6295: Rachael Downie at AMH Rear Entry
                    1949994,Tue Nov 28 11:55:10 2017,(CH: Nov 28 11:58:28),ENTRY GRANTED to USER6375: Ahmed Mansour at AMH Reception
                    1949995,Tue Nov 28 11:57:23 2017,(CH: Nov 28 12:00:41),ENTRY GRANTED to USER10593: Emma Phelan at AMH Rear Entry
                    1949996,Tue Nov 28 11:57:47 2017,(CH: Nov 28 12:01:06),ENTRY GRANTED to USER6127: Daniel Nguyen at AMH Reception
                    1949997,Tue Nov 28 11:57:49 2017,(CH: Nov 28 12:01:07),ENTRY GRANTED to USER10593: Emma Phelan at AMH Clinic B15
                    1949998,Tue Nov 28 11:57:56 2017,(CH: Nov 28 12:01:15),ENTRY GRANTED to USER7656: Angela McCoy at A & P Rear Entry
                     1949999 Tue Nov 28 11:58:30 2017 (CH: Nov 28 12:01:48)ENTRY GRANTED to USER 
                                                      7656: Angela McCoy at Admin Entry 
                     1950000 Tue Nov 28 11:58:43 2017 (CH: Nov 28 12:02:01)ENTRY GRANTED to USER 
                                                      9886: Janet Meade at Main Entry Gate 
                     1950001 Tue Nov 28 11:58:59 2017 (CH: Nov 28 12:02:17)ENTRY GRANTED to USER 
                                                      10041: Christina Davidson at A & P Foyer 
                                                      Entry 
                     1950002 Tue Nov 28 11:59:06 2017 (CH: Nov 28 12:02:24)ENTRY GRANTED to USER 
                                                      7656: Angela McCoy at Gym Internal Entry 
                    
                     1950003 Tue Nov 28 11:59:22 2017 (CH: Nov 28 12:02:40)ENTRY GRANTED to USER 
                                                      10041: Christina Davidson at A & P Rear 
                                                      Entry 
                     1950004 Tue Nov 28 12:00:21 2017 (CH: Nov 28 12:03:39)ENTRY GRANTED to USER 
                                                      9886: Janet Meade at Staff Entry Ground East 
                     1950005 Tue Nov 28 12:00:40 2017 (CH: Nov 28 12:03:58)ENTRY GRANTED to USER 
                                                      6296: Peta Picton at Admin Entry 
                     1950006 Tue Nov 28 12:00:52 2017 (CH: Nov 28 12:04:10)ENTRY GRANTED to USER 
                                                      6048: Elizabeth Apro at CH Rear Entry 
                     1950007 Tue Nov 28 12:01:29 2017 (CH: Nov 28 12:04:47)ENTRY GRANTED to USER 
                                                      6296: Peta Picton at A & P Foyer Entry 
                     1950008 Tue Nov 28 12:01:42 2017 (CH: Nov 28 12:05:00)ENTRY GRANTED to USER 
                                                      10184: Jasmin-marie Lim at Gym Internal 
                                                      Entry 
                     1950009 Tue Nov 28 12:02:02 2017 (CH: Nov 28 12:05:20)ENTRY GRANTED to USER 
                                                      10184: Jasmin-marie Lim at A & P Foyer Entry 
                     1950010 Tue Nov 28 12:02:27 2017 (CH: Nov 28 12:05:45)ENTRY GRANTED to USER 
                                                      6016: Rhonda Rainbird at Conference External 
                                                      Entry 
                     1950011 Tue Nov 28 12:02:29 2017 (CH: Nov 28 12:05:47)ENTRY GRANTED to USER 
                                                      10184: Jasmin-marie Lim at A & P Rear Entry 
                     1950012 Tue Nov 28 12:02:35 2017 (CH: Nov 28 12:05:53)ENTRY GRANTED to USER 
                                                      6123: Michelle Coghlan at ACUTE CARE AT HOME 
                     1950013 Tue Nov 28 12:02:52 2017 (CH: Nov 28 12:06:11)ENTRY GRANTED to USER 
                                                      6016: Rhonda Rainbird at OH Front Entry 
                     1950014 Tue Nov 28 12:02:52 2017 (CH: Nov 28 12:06:11)ENTRY GRANTED to USER 
                                                      6286: Laura Fitzalan at CH Rear Entry 
                     1950015 Tue Nov 28 12:03:05 2017 (CH: Nov 28 12:06:24)ENTRY GRANTED to USER 
                                                      6048: Elizabeth Apro at Admin Entry 
                     1950016 Tue Nov 28 12:03:10 2017 (CH: Nov 28 12:06:29)ENTRY GRANTED to USER 
                                                      6103: Helen McGregor at Staff Entry Ground 
                                                      East 
                     1950017 Tue Nov 28 12:03:41 2017 (CH: Nov 28 12:06:59)ENTRY GRANTED to USER 
                                                      7656: Angela McCoy at Gym Internal Entry 
                     1950018 Tue Nov 28 12:03:57 2017 (CH: Nov 28 12:07:16)ENTRY GRANTED to USER 
                                                      10041: Christina Davidson at Staff Entry 
                                                      Ground East 
                     1950019 Tue Nov 28 12:04:12 2017 (CH: Nov 28 12:07:31)ENTRY GRANTED to USER 
                                                      10041: Christina Davidson at Basement comms 
                                                      Room 
                    

                    [Output 1](https://ibb.co/cNXgexl)

                    Repeating the process doesn’t yield any changes. Changing

                    (?-s)^\x20(194\d{4})\x20(.+)\x20(\(.+\))(.+)\x20\R\x20+(.+)\x20(?:\R\x20+(.+)\x20(?:\R\x20+(.+)\x20)??)??(?=\Z|\R(\f|\x20194\d{4}\x20))|(?s)\f.*?===\R

                    to

                    (?-s)^\x20(195\d{4})\x20(.+)\x20(\(.+\))(.+)\x20\R\x20+(.+)\x20(?:\R\x20+(.+)\x20(?:\R\x20+(.+)\x20)??)??(?=\Z|\R(\f|\x20194\d{4}\x20))|(?s)\f.*?===\R

                    (which, I am guessing, makes it filter on 195 plus the next 4 digits) normally blanks the rest of the document, and sometimes you end up with a single line of text :(

                    I copied 10 pages, from 1950001 to 195xxxx, and re-ran the expression with the 194 changed to 195. It would normally only change the last line of the page, and sometimes merge two lines into one.

                    SO CLOSE! Any thoughts? I could email you the source file if you like?

                    Full Source

                    • guy038G
                      guy038
                      last edited by guy038

                      Hello, @mark-allen, @scott-sumner and All,

                      Ah, sorry ! I was really wrong about the Incident Number description… I should have read more carefully, because you said, in your first post :

                      Field 1 - Incident Number (194xxxx —> Onwards)

                      And also :

                      I have a report about 2 000 000 Lines long,

                      So, as each transaction is split over at least two lines, your file could contain roughly 1,000,000 transactions, or a bit less ! As the first number is 1940000, the last Incident Number should be close to 2940000.

                      So I changed the regex matching the Incident Number to [12]\d{6}. Note that this definition of an Incident Number is present twice in the overall regex :

                      • At its beginning (?-s)^\x20([12]\d{6})\x20(.+)\x20......

                      • In the look-ahead part (?=\Z|\R(\f|\x20[12]\d{6}\x20))
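As a rough illustration of why the 194 prefix stopped working once the numbers reached 1950000, here is a small Python sketch (Python's `re` module, not the Notepad++ engine; the helper name `matches` is made up for this example). It only compares the two Incident Number sub-patterns:

```python
import re

# First version: only numbers from 1940000 to 1949999.
OLD_ID = re.compile(r"194\d{4}")
# Revised version: any 7-digit number starting with 1 or 2.
NEW_ID = re.compile(r"[12]\d{6}")

def matches(pattern, incident_number):
    """Return True if the whole string is matched by the pattern."""
    return pattern.fullmatch(incident_number) is not None

for number in ("1949998", "1950000", "2940000"):
    print(number, matches(OLD_ID, number), matches(NEW_ID, number))
```

The old sub-pattern rejects 1950000 and everything after it, which fits the conversion stopping at the end of the 194xxxx range.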

                      I also changed the regex matching the header to (?s)\f.*?===\R\R, as there is a completely blank line after the ===== line !


                      Now, the last version of the regex is :

                      SEARCH (?-s)^\x20([12]\d{6})\x20(.+)\x20(\(.+\))(.+)\x20\R\x20+(.+)\x20(?:\R\x20+(.+)\x20(?:\R\x20+(.+)\x20)??)??(?=\Z|\R(\f|\x20[12]\d{6}\x20))|(?s)\f.*?===\R\R

                      REPLACE ?1\1,\2,\3,\4\5\6\7

                      Running that regex S/R against your attached file out.txt ( 14,399,825 bytes and 202,315 lines ) left me with the expected comma separated file, of 9,698,482 bytes, containing 82,068 lines

                      Voilà :-))
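For anyone who wants to try the same idea outside Notepad++, below is a minimal Python sketch of this S/R. It is a port under stated assumptions, not the exact Notepad++ behaviour: line endings are assumed to be already normalised to `\n` (so `\R` is written `\n`), `\s*\Z` stands in for the Notepad++ `\Z`, the conditional replacement is done with a function, and the page header in the sample is invented. The names `PATTERN`, `to_csv` and `convert` are hypothetical.

```python
import re

# Port of the search regex. Assumptions: "\n" line endings, and
# "\s*\Z" used where the Notepad++ regex has \Z.
PATTERN = re.compile(
    r"^\x20([12]\d{6})\x20(.+)\x20(\(.+\))(.+)\x20\n"
    r"\x20+(.+)\x20(?:\n\x20+(.+)\x20(?:\n\x20+(.+)\x20)??)??"
    r"(?=\s*\Z|\n(?:\f|\x20[12]\d{6}\x20))"
    r"|\f(?s:.*?)===\n\n",
    re.MULTILINE,
)

def to_csv(match):
    """Build the replacement: a CSV line, or nothing for a header."""
    if match.group(1) is None:  # header branch matched: delete it
        return ""
    head = ",".join(match.group(1, 2, 3))
    # Groups 6 and 7 are None when the transaction has fewer lines.
    tail = "".join(g or "" for g in match.group(4, 5, 6, 7))
    return head + "," + tail

def convert(text):
    return PATTERN.sub(to_csv, text)

INDENT = " " * 34
SAMPLE = (
    " 1949999 Tue Nov 28 11:58:30 2017 (CH: Nov 28 12:01:48)ENTRY GRANTED to USER \n"
    + INDENT + "7656: Angela McCoy at Admin Entry \n"
    + "\fMADE-UP PAGE HEADER\n=====\n\n"  # invented header, for the demo only
    + " 1950001 Tue Nov 28 11:58:59 2017 (CH: Nov 28 12:02:17)ENTRY GRANTED to USER \n"
    + INDENT + "10041: Christina Davidson at A & P Foyer \n"
    + INDENT + "Entry \n"
)

print(convert(SAMPLE))
```

As in the Notepad++ replacement, the verbose output parts are concatenated without a separator, so a location split across lines comes out joined (for example `FoyerEntry`).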


                      Here is, below, some notes about this regex S/R :

                      (?-s)                                       #  The DOT regex character matches ONLY a SINGLE STANDARD character ( NOT the EOL ones )
                      
                      ^\x20([12]\d{6})\x20                        #  An 'Incident Number' ([12]\d{6}), as Group 1, PRECEDED and FOLLOWED with a SPACE character \x20
                      
                      (.+)\x20                                    #  The Date and Time (.+), as Group 2, which ENDS with a SPACE character \x20
                      
                      (\(.+\))                                    #  The 'Access Control Time' (\(.+\)), as Group 3, surrounded with PARENTHESES, which are ESCAPED to be considered as LITERALS
                      
                      (.+)\x20\R                                  #  The FIRST 'Verbose Output' zone (.+), as Group 4, FOLLOWED with a SPACE character \x20 and a LINE-BREAK \R
                      
                      \x20+(.+)\x20                               #  The SECOND 'Verbose Output' zone (.+), as Group 5, NEXT line, PRECEDED with SPACE characters \x20+ and FOLLOWED with a SPACE character \x20
                      
                      (?:\R\x20+(.+)\x20(?:\R\x20+(.+)\x20)??)??  #  The part, between the OUTER parentheses, (?: and )??, may be ABSENT or PRESENT, with priority to ABSENCE
                                                                  #    It represents the possible THIRD and FOURTH 'Verbose Output' zones (.+), as Groups 6 and 7 :
                                                                  #        - PRECEDED with a LINE-BREAK \R and some SPACE characters \x20+
                                                                  #        - FOLLOWED with a SPACE character \x20
                                                                  #  
                                                                  #  Again, the part, between the INNER parentheses, (?: and )??, may be ABSENT or PRESENT, with priority to ABSENCE
                                                                  #    It represents the possible FOURTH 'Verbose Output' zone (.+), as Group 7 :
                                                                  #        - PRECEDED with a LINE-BREAK \R and some SPACE characters \x20+
                                                                  #        - FOLLOWED with a SPACE character \x20
                      
                      (?=\Z|\R(\f|\x20[12]\d{6}\x20))             #  IF followed with, EITHER :  ( LOOK-AHEAD syntax )
                                                                  #      - \Z that is to say SOME \r,\n or \f characters, at the VERY END of file
                                                                  #      - a LINE-BREAK \R, followed with, EITHER :
                                                                  #          - A CONTROL character \f, beginning a HEADER block
                                                                  #          - An INCIDENT number, beginning with '1' or '2', [12]\d{6}, PRECEDED and FOLLOWED with a SPACE character \x20
                      
                      |                                           #  OR
                      
                      (?s)                                        #  The DOT regex character matches, ABSOLUTELY, ANY character
                                                                         
                      \f.*?===\R\R                                #   The SHORTEST range of ANY character .*?, EVEN on SEVERAL lines, between :
                                                                  #       - A FORM-FEED character \f
                                                                  #       - THREE EQUAL signs, FOLLOWED with 2 LINE-BREAKS \R\R
                                                                  #  
                                                                  #   => The COMPLETE HEADER block, with LINE-BREAKS
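The "priority to ABSENCE" behaviour of the `??` quantifier described above can be seen in isolation with a tiny Python sketch (Python's `re` module, used here only as an illustration; the pattern names are made up). A lazy optional group stays empty unless the rest of the pattern, here a look-ahead, cannot match without it:

```python
import re

lazy = re.compile(r"a(b)??")      # optional group, absence tried first
greedy = re.compile(r"a(b)?")     # optional group, presence tried first

print(lazy.match("ab").group(0))    # prints "a"
print(greedy.match("ab").group(0))  # prints "ab"

# With a look-ahead after it, the lazy group is taken only when needed.
anchored = re.compile(r"a(b)??(?=c)")
print(anchored.match("abc").group(0))  # prints "ab"
print(anchored.match("ac").group(1))   # prints "None"
```

This is the same mechanism that lets the third and fourth 'Verbose Output' lines be picked up only when the look-ahead for the next Incident Number or header fails without them.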
                      

                      To end with, here is this same regex S/R, displayed in Free-spacing mode, with a minimum of comments ! You may run this syntax, as well. Just select all the text, below, and paste it, in the Find what: zone of the Replace dialog :-))

                      (?x)                                            #  FREE-SPACING regex mode
                                                                      #
                                                                      #  IMPORTANT : In this mode, any \R syntax must be REPLACED with \r\n 
                                                                      #  ---------
                      
                      (?-s)                                           #  The DOT regex character matches ONLY a SINGLE STANDARD character ( NOT the EOL ones )
                                                                      
                      ^\x20([12]\d{6})\x20                            #  Group 1        ( Incident Number )
                                                                      
                      (.+)\x20                                        #  Group 2        ( Date and Time )
                                                                      
                      (\(.+\))                                        #  Group 3        ( Access Control Time )
                                                                      
                      (.+)\x20\r\n                                    #  Group 4        ( 1ST Verbose Output )
                                                                      
                      \x20+(.+)\x20                                   #  Group 5        ( 2ND Verbose Output )
                      
                      (?:\r\n\x20+(.+)\x20(?:\r\n\x20+(.+)\x20)??)??  #  Group 6 and 7  ( 3RD and 4TH Verbose Output )
                      
                      (?=\Z|\r\n(\f|\x20[12]\d{6}\x20))               #  IF the VERY END of file or IF a LINE-BREAK followed with an INCIDENT NUMBER or a HEADER
                      
                      |                                               #  OR
                      
                      (?s)                                            #  The DOT regex character matches, ABSOLUTELY, ANY character
                      
                      \f.*?===\r\n\r\n                                #  The COMPLETE HEADER block, with its LINE-BREAKS
                      

                      Remember that the Find what: zone cannot store more than 2046 characters !
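To show why the regex spells every space as `\x20`, here is a short Python sketch using `re.VERBOSE`, which is Python's counterpart of free-spacing mode (an analogy only; the Notepad++ engine may differ in details). In this mode literal whitespace in the pattern is ignored and `#` starts a comment, so only the escaped `\x20` still matches a space. The sketch covers just the first line of a transaction:

```python
import re

# Compact form of the first-line part of the regex.
compact = re.compile(r"^\x20([12]\d{6})\x20(.+)\x20(\(.+\))(.+)\x20$")

# Same pattern in free-spacing (verbose) mode, with comments.
verbose = re.compile(
    r"""
    ^\x20([12]\d{6})\x20   # Group 1 ( Incident Number )
    (.+)\x20               # Group 2 ( Date and Time )
    (\(.+\))               # Group 3 ( Access Control Time )
    (.+)\x20$              # Group 4 ( 1st Verbose Output )
    """,
    re.VERBOSE,
)

line = " 1950000 Tue Nov 28 11:58:43 2017 (CH: Nov 28 12:02:01)ENTRY GRANTED to USER "
print(verbose.match(line).groups())
print(compact.match(line).groups() == verbose.match(line).groups())
```

Both forms capture the same four groups, so the layout and comments are purely for readability.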

                      Cheers,

                      guy038

                      • Mark AllenM
                        Mark Allen
                        last edited by

                        Mate

                        Thank you so, so, so much for your help with this. I truly mean that; it worked perfectly. If it’s okay, I am going to pass this method on to the supplier of the product so we can use it to make better reports from this program!!!

                        Truly, thank you heaps, and thanks for the explanation as well. I should be able to use this in the future to write up my own rules…

                        Again thanks heaps!!!

                        Mark
