Community
    • Login

    Wanting to Format 2000000 Lines of Text

    Scheduled Pinned Locked Moved Help wanted · · · – – – · · ·
    formatting text
    14 Posts 3 Posters 4.0k Views
    Loading More Posts
    • Oldest to Newest
    • Newest to Oldest
    • Most Votes
    Reply
    • Reply as topic
    Log in to reply
    This topic has been deleted. Only users with topic management privileges can see it.
    • Mark AllenM
      Mark Allen
      last edited by

      Essentially I would like to find a way to remove all the mutliple lines that belong to the same lines data, so that its easier to read for the client, and they can import it directly into an excel

      eg current if you where to pull it into excel you get ( For : 1947729 Sun Nov 26 06:59:32 2017 (CH: Nov 26 07:02:35)ENTRY GRANTED to USER6123: Michelle Coghlan at ACUTE CARE AT HOME)

      (The output from the TXT File is as follows

      1947729 Sun Nov 26 06:59:32 2017 (CH: Nov 26 07:02:35)ENTRY GRANTED to USER
      6123: Michelle Coghlan at ACUTE CARE AT HOME
      )

      Line 1: Cell 1 Cell 2 Cell 3 Cell 4
      Line 2: Cell 5
      Line 3: Cell 6 (sometimes)
      Line 4: Cell 7 (Sometimes) (Depends on the length of the verbose output and how many characters it has.
      Example
      Line 1: Cell 1 1947729 Cell 2 Sun Nov 26 06:59:32 2017 Cell 3 (CH: Nov 26 07:02:35) Cell 4 ENTRY GRANTED to USER6
      Line 2: Cell 5 123: Michelle Coghlan at ACUTE CARE AT HOME

      Now when I pull the raw txt file into excel I get lines like this

      Screen Capture 1

      Scott SumnerS 1 Reply Last reply Reply Quote 0
      • Scott SumnerS
        Scott Sumner @Mark Allen
        last edited by

        @Mark-Allen

        If I were you, I’d do it in several steps.

        First, remove your page header. Since you didn’t provide any detail, you’re on your own on that. Hopefully you know what to do with this.

        Second, combine your multi-line data into a single line. Basically what you want to do here is to look at how a line starts and if it doesn’t start with a 7-digit number followed by a day of the week, then you want to combine that line with the following line. Here’s what I’d do:

        Find what zone: (?-is)^(\d{7} (?:Sun|Mon|Tue).+)\R(?!\d{7} (?:Sun|Mon|Tue))(.+)
        Replace with zone: \1\2
        Search mode: Regular expression
        Wrap around: ticked
        Action: Press Replace All button (do this multiple times, I guess up to 4 or 5 based upon your original problem statement, until all of the broken lines are combined

        Note: I left out the remaining days of the week, you can easily fill those in between | characters. Exercise for the reader.

        Third, reformat the data for Excel purposes. That part isn’t so hard, so again, left as exercise for the reader.

        Mark AllenM 1 Reply Last reply Reply Quote 1
        • guy038G
          guy038
          last edited by guy038

          Hello, @mark-allen, and All,

          This morning ( in France ), I succeeded to find a correct regex which should do what your want to. But, I was away from home, this afternoon ! This regex changes each multi-lines transaction into a single line transaction and, in addition, separates the different fields with commas ;-))

          Of course, I needed some time to figure out all the problem ! And, as Scott said, in his last post, we need to know the structure of your page header in order to build the correct final regex ! In the meanwhile, I supposed that your file does not contain any header

          In your first post, yous said :

          the outputted text, can have 1/2/3/4 lines per incident

          and

          Field 5 - The verbose output from the transaction, which is split over 1/2/3 lines (sometimes 4)

          So, I imagined the simple test text , below, that you’ll paste in a new tab :

          1947728 Sun Nov 26 06:57:54 2017 (CH: Nov 26 07:00:57)<Output_1>
          <Output_2>
          <Output_3>
          <Output_4>
          1941234 .........
          
          1947728 Sun Nov 26 06:57:54 2017 (CH: Nov 26 07:00:57)<Output_1>
          <Output_2>
          <Output_3>
          1941234 .........
          
          1947728 Sun Nov 26 06:57:54 2017 (CH: Nov 26 07:00:57)<Output_1>
          <Output_2>
          1941234 .........
          

          As you can notice, I supposed that the verbose output is always split in, at least, 2 lines, and sometimes in 3 or 4 lines ( Hence, no verbose output in a single line only ! ). If I’m wrong about it, just tell in order to modify the regex ;-)


          Now, open the Replace dialog

          Insert the regex (?-s)(^194\d{4})\x20(.+)\x20(\(.+\))(.+)\R(.+)(?:\R(.+)(?:\R(.+))??)??(?=\R194\d{4}\x20|\Z) , in the Find what: zone

          Insert the regex \1,\2,\3,\4\5\6\7 , in the Replace with: zone

          Tick the Wrap around option

          Click, once, on the Replace All button or several times on the Replace button

          We get the following text :

          1947728,Sun Nov 26 06:57:54 2017,(CH: Nov 26 07:00:57),<Output_1><Output_2><Output_3><Output_4>
          1941234 .........
          
          1947728,Sun Nov 26 06:57:54 2017,(CH: Nov 26 07:00:57),<Output_1><Output_2><Output_3>
          1941234 .........
          
          1947728,Sun Nov 26 06:57:54 2017,(CH: Nov 26 07:00:57),<Output_1><Output_2>
          1941234 .........
          

          So the important question is : Do you expect such a text ?


          Secondly, using your list of transactions, also given in your first post :

          1947728 Sun Nov 26 06:57:54 2017 (CH: Nov 26 07:00:57)ENTRY GRANTED to USER
          6123: Michelle Coghlan at Main Entry Gate
          1947729 Sun Nov 26 06:59:32 2017 (CH: Nov 26 07:02:35)ENTRY GRANTED to USER
          6123: Michelle Coghlan at ACUTE CARE AT HOME
          1947730 Sun Nov 26 06:59:33 2017 (CH: Nov 26 07:02:36)ENTRY GRANTED to USER
          6123: Michelle Coghlan at ACUTE CARE AT HOME
          1947743 Sun Nov 26 06:59:42 2017 (CH: Nov 26 07:02:38)ENTRY GRANTED to USER
          6123: Michelle Coghlan at ACUTE CARE AT HOME
          1947744 Sun Nov 26 07:00:38 2017 (CH: Nov 26 07:03:42)ENTRY GRANTED to USER
          6512: Stephen Knynenburg at Main Entry Gate
          1947745 Sun Nov 26 07:00:39 2017 (CH: Nov 26 07:03:43)ENTRY GRANTED to USER
          6512: Stephen Knynenburg at Main Entry Gate
          1947746 Sun Nov 26 07:01:58 2017 (CH: Nov 26 07:05:02)ENTRY GRANTED to USER
          6512: Stephen Knynenburg at ACUTE CARE AT
          HOME
          

          This same regex S/R produced :

          1947728,Sun Nov 26 06:57:54 2017,(CH: Nov 26 07:00:57),ENTRY GRANTED to USER6123: Michelle Coghlan at Main Entry Gate
          1947729,Sun Nov 26 06:59:32 2017,(CH: Nov 26 07:02:35),ENTRY GRANTED to USER6123: Michelle Coghlan at ACUTE CARE AT HOME
          1947730,Sun Nov 26 06:59:33 2017,(CH: Nov 26 07:02:36),ENTRY GRANTED to USER6123: Michelle Coghlan at ACUTE CARE AT HOME
          1947743,Sun Nov 26 06:59:42 2017,(CH: Nov 26 07:02:38),ENTRY GRANTED to USER6123: Michelle Coghlan at ACUTE CARE AT HOME
          1947744,Sun Nov 26 07:00:38 2017,(CH: Nov 26 07:03:42),ENTRY GRANTED to USER6512: Stephen Knynenburg at Main Entry Gate
          1947745,Sun Nov 26 07:00:39 2017,(CH: Nov 26 07:03:43),ENTRY GRANTED to USER6512: Stephen Knynenburg at Main Entry Gate
          1947746,Sun Nov 26 07:01:58 2017,(CH: Nov 26 07:05:02),ENTRY GRANTED to USER6512: Stephen Knynenburg at ACUTE CARE ATHOME
          

          May be, you noticed that the last line, after replacement, ends with at ACUTE CARE ATHOME, where the words AT and HOME are attached. Quite logical as there is, only, a line-break between …at ACUTE CARE AT and the last line HOME

          Then, in a new Excel page, I inserted that text as External data, indicating that the fields were comma separated and saved it as a common .xls file, without any problem :-)


          Finally, in your first post, you also said :

          I have a report about 2 000 000 Lines long,

          So, I tried to determine if this regex S/R could be done in a sensible time, without any bug or catastrophic backtracking of the N++ regex engine !

          I duplicated your 15 lines text, below :

          1947728 Sun Nov 26 06:57:54 2017 (CH: Nov 26 07:00:57)ENTRY GRANTED to USER
          6123: Michelle Coghlan at Main Entry Gate
          1947729 Sun Nov 26 06:59:32 2017 (CH: Nov 26 07:02:35)ENTRY GRANTED to USER
          6123: Michelle Coghlan at ACUTE CARE AT HOME
          1947730 Sun Nov 26 06:59:33 2017 (CH: Nov 26 07:02:36)ENTRY GRANTED to USER
          6123: Michelle Coghlan at ACUTE CARE AT HOME
          1947743 Sun Nov 26 06:59:42 2017 (CH: Nov 26 07:02:38)ENTRY GRANTED to USER
          6123: Michelle Coghlan at ACUTE CARE AT HOME
          1947744 Sun Nov 26 07:00:38 2017 (CH: Nov 26 07:03:42)ENTRY GRANTED to USER
          6512: Stephen Knynenburg at Main Entry Gate
          1947745 Sun Nov 26 07:00:39 2017 (CH: Nov 26 07:03:43)ENTRY GRANTED to USER
          6512: Stephen Knynenburg at Main Entry Gate
          1947746 Sun Nov 26 07:01:58 2017 (CH: Nov 26 07:05:02)ENTRY GRANTED to USER
          6512: Stephen Knynenburg at ACUTE CARE AT
          HOME
          

          68,000 times ! So, I was left with a file of 58,412,000 bytes, containing 1,020,000 lines. Then running the above regex S/R, on my old (2005 ) Windows XP laptop ( with only 1GB of RAM ! ) the 476,000 replacements took 69s, getting a file of 57,800,000 bytes, containing 476,000 lines ;-))

          Consequently, I suppose that your file with 2,000,000 lines should be, correctly, handled, with a recent laptop and OS !

          So, just tell us about your header and it’ll be only a matter of minutes to get the job done ;-))

          See you later,

          Best Regards,

          guy038

          1 Reply Last reply Reply Quote 0
          • Mark AllenM
            Mark Allen @Scott Sumner
            last edited by

            Morning All! (Or Afternoon or Evening!)

            Scott! You are correct I didnt give you the header!

            This is the header, which is static throughout, the only thing that changes is the Page #

            This is the header, I can go through and essentially remove it manually or use CTRL-H and replace the Text with a “space”

            Door User Activity Reported on Fri Mar 2 09:50:00 2018
            Initiated by Master from Main Office Page 6292
            Messia Plains Ambulatory

            I’ve tried your expression, and sadly it didnt work for me,

            In the Find What Line :(?-is)^(\d{7} (?:Sun|Mon|Tue|Wed|Thu|Fri|Sat).+)\R(?!\d{7} (?:Sun|Mon|Tue|Wed|Thu|Fri|Sat))(.+)
            In the Replace with: \1\2

            Each time it has come up with Zero Lines Replaced

            I made the sample size smaller, just 1 Page, and the same issue, made it just one line, and he same issue (Zero Lines replaced), I thought perhaps i might have an outdated version, and I’ve updated to the latest version effective today (7.5.6 32 Bit) I have no plug-ins enabled or installed

            Sample Set 1
            Expression 1

            I then thought perhaps the data because it only has monday (Mon) in it, was breaking the expression (I know, I doubted it as well, but its a bit above my head, but wanted to try!)

            I removed Tuesday-Sunday, and left monday only, Still the same result.

            Monday Only Expression

            So at a bit of a loss with that one, I HAVE to be missing something there or some plig in or something for that… :(

            I’ve put up 4 pages of the text so you can see the formatting

            Sample Text

            @guy038

            Thanks for your response as well!!

            I’ve tried your expression as well, and I get the same response, it tells me that there are zero changes, I’ve hit the button many times just to ensure I am not the reason for this

            One thing I’ve noticed is that there is a Space before the first 7 digit character, will that upset things?

            Space before first 7 Digit Number

            What you have outputted is exactly what I am after

            1947728,Sun Nov 26 06:57:54 2017,(CH: Nov 26 07:00:57),ENTRY GRANTED to USER6123: Michelle Coghlan at Main Entry Gate
            1947729,Sun Nov 26 06:59:32 2017,(CH: Nov 26 07:02:35),ENTRY GRANTED to USER6123: Michelle Coghlan at ACUTE CARE AT HOME
            1947730,Sun Nov 26 06:59:33 2017,(CH: Nov 26 07:02:36),ENTRY GRANTED to USER6123: Michelle Coghlan at ACUTE CARE AT HOME
            1947743,Sun Nov 26 06:59:42 2017,(CH: Nov 26 07:02:38),ENTRY GRANTED to USER6123: Michelle Coghlan at ACUTE CARE AT HOME
            1947744,Sun Nov 26 07:00:38 2017,(CH: Nov 26 07:03:42),ENTRY GRANTED to USER6512: Stephen Knynenburg at Main Entry Gate
            1947745,Sun Nov 26 07:00:39 2017,(CH: Nov 26 07:03:43),ENTRY GRANTED to USER6512: Stephen Knynenburg at Main Entry Gate
            1947746,Sun Nov 26 07:01:58 2017,(CH: Nov 26 07:05:02),ENTRY GRANTED to USER6512: Stephen Knynenburg at ACUTE CARE ATHOME

            One Line, with comma’s that can be brought into excel nice and neat!

            I will keep playing, but any thoughts would be appreciated! and I will be sure to save this away if asked again!

            Mark

            nb. Sorry dont know the syntax to nicely put things into code box’s

            1 Reply Last reply Reply Quote 0
            • Scott SumnerS
              Scott Sumner
              last edited by Scott Sumner

              @Mark-Allen

              The simplest way to represent your data accurately here is by pasting it between a line containing ```z and a line containing ```

              Thus, if you do the following:

              ```z
              (copy and paste your real data HERE)
              ```

              It (some of what I think your real data is) will render as:

               1947729 Sun Nov 26 06:59:32 2017 (CH: Nov 26 07:02:35)ENTRY GRANTED to USER
                                                6123: Michelle Coghlan at ACUTE CARE AT 
                                                HOME
               1947729 Sun Nov 26 06:59:32 2017 (CH: Nov 26 07:02:35)ENTRY GRANTED to USER
                                                6123: Michelle Coghlan at ACUTE CARE AT 
                                                HOME
              

              And really, accurately representing your data is the only way to get good advice. Otherwise, responders are guessing, and when you try their advice you get Replace: No occurrence was found. Then when you say, “Oh, that’s not really what I meant…” and attempt to clarify, the helpers have lost interest and have moved on to more interesting pursuits…

              Note: You can also indent your entire dataset by 4 spaces and it will render like the following, but I find the above method easier (and the colors are often wacky in what follows).

               1947729 Sun Nov 26 06:59:32 2017 (CH: Nov 26 07:02:35)ENTRY GRANTED to USER
                                                6123: Michelle Coghlan at ACUTE CARE AT 
                                                HOME
               1947729 Sun Nov 26 06:59:32 2017 (CH: Nov 26 07:02:35)ENTRY GRANTED to USER
                                                6123: Michelle Coghlan at ACUTE CARE AT 
                                                HOME
              

              Also note: Screenshots aren’t going to cut it. Nobody is going to retype your data from a screenshot to test out a proposed match expression.

              Scott SumnerS 1 Reply Last reply Reply Quote 0
              • Scott SumnerS
                Scott Sumner @Scott Sumner
                last edited by

                @Mark-Allen

                I forgot to mention that your “Sample Text” link is good in that it has copyable text, but it is bad in that it doesn’t preserve the leading spaces (as shown in some of the screenshot versions) on the lines! :-(

                1 Reply Last reply Reply Quote 0
                • Mark AllenM
                  Mark Allen
                  last edited by

                  agh! Cant Win :) But this is the data

                  1947727 Sun Nov 26 00:44:12 2017 (CH: Nov 26 00:47:14)ENTRY GRANTED to USER 
                                                    1313: State Government Security at Main 
                                                    Entry Gate 
                   1947728 Sun Nov 26 06:57:54 2017 (CH: Nov 26 07:00:57)ENTRY GRANTED to USER 
                                                    6123: Michelle Coghlan at Main Entry Gate 
                   1947729 Sun Nov 26 06:59:32 2017 (CH: Nov 26 07:02:35)ENTRY GRANTED to USER 
                                                    6123: Michelle Coghlan at ACUTE CARE AT HOME 
                   1947730 Sun Nov 26 06:59:33 2017 (CH: Nov 26 07:02:36)ENTRY GRANTED to USER 
                                                    6123: Michelle Coghlan at ACUTE CARE AT HOME 
                   1947743 Sun Nov 26 06:59:42 2017 (CH: Nov 26 07:02:38)ENTRY GRANTED to USER 
                                                    6123: Michelle Coghlan at ACUTE CARE AT HOME 
                   1947744 Sun Nov 26 07:00:38 2017 (CH: Nov 26 07:03:42)ENTRY GRANTED to USER 
                                                    6512: Stephen Knynenburg at Main Entry Gate 
                   1947745 Sun Nov 26 07:00:39 2017 (CH: Nov 26 07:03:43)ENTRY GRANTED to USER 
                                                    6512: Stephen Knynenburg at Main Entry Gate 
                   1947746 Sun Nov 26 07:01:58 2017 (CH: Nov 26 07:05:02)ENTRY GRANTED to USER 
                                                    6512: Stephen Knynenburg at ACUTE CARE AT 
                                                    HOME 
                   1947747 Sun Nov 26 07:27:03 2017 (CH: Nov 26 07:30:07)ENTRY GRANTED to USER 
                                                    6123: Michelle Coghlan at Population Health 
                                                    Rear Entry 
                   1947748 Sun Nov 26 08:08:34 2017 (CH: Nov 26 08:11:39)ENTRY GRANTED to USER 
                                                    6512: Stephen Knynenburg at ACUTE CARE AT 
                                                    HOME 
                   1947749 Sun Nov 26 09:08:22 2017 (CH: Nov 26 09:11:27)ENTRY GRANTED to USER 
                                                    6512: Stephen Knynenburg at Population 
                                                    Health Rear Entry 
                   1947750 Sun Nov 26 09:43:14 2017 (CH: Nov 26 09:46:19)ENTRY GRANTED to USER 
                                                    6512: Stephen Knynenburg at Population 
                                                    Health Rear Entry 
                   1947751 Sun Nov 26 09:46:48 2017 (CH: Nov 26 09:49:53)ENTRY GRANTED to USER 
                                                    6005: Rose Gorman at Main Entry Gate 
                   1947752 Sun Nov 26 09:49:50 2017 (CH: Nov 26 09:52:55)ENTRY GRANTED to USER 
                                                    6005: Rose Gorman at ACUTE CARE AT HOME 
                   1947753 Sun Nov 26 09:56:07 2017 (CH: Nov 26 09:59:11)ENTRY GRANTED to USER 
                                                    6513: Sheela Tuazon at Main Entry Gate 
                   1947754 Sun Nov 26 09:58:25 2017 (CH: Nov 26 10:01:29)ENTRY GRANTED to USER 
                                                    6513: Sheela Tuazon at ACUTE CARE AT HOME 
                   1947755 Sun Nov 26 10:01:56 2017 (CH: Nov 26 10:05:00)ENTRY GRANTED to USER 
                                                    6513: Sheela Tuazon at Population Health 
                                                    Rear Entry 
                   1947756 Sun Nov 26 10:08:53 2017 (CH: Nov 26 10:11:58)ENTRY GRANTED to USER 
                                                    6512: Stephen Knynenburg at Population 
                                                    Health Rear Entry 
                  
                  Door User Activity                        Reported on Fri Mar  2 09:50:00 2018
                  Initiated by Master from Main Office                                    Page 2
                  Messia Plains Ambulatory
                  ================================================================================
                  
                  1 Reply Last reply Reply Quote 0
                  • guy038G
                    guy038
                    last edited by guy038

                    Hello, @mark-allen, @scott-sumner and All,

                    Ah ! Of course, it’s easy to understand why my regex could not match any string ;-)) Indeed, when regular expressions are involved, even a single space, added or deleted in text, is of importance !

                    From your recent picture, I noticed four particularities, in your list :

                    • Any incident number is separated from beginning of line by a space character

                    • The verbose output zones are, both :

                      • Preceded by 1 or more space character(s)

                      • Followed by 1 space character

                    • Each header block :

                      • Begins with the Form Feed character ( the ASCII page-breaking control character, of Unicode value \x{000C}, which can be matched, in regular expressions, with the \f syntax

                      • Ends with several equal signs ( = )

                    Refer to https://en.wikipedia.org/wiki/Page_break#Form_feed

                    With all these elements, it’s not difficult to modify my previous regex, to be close to your real text ;-))


                    So again, let’s begin with this short example :

                     1947728 Sun Nov 26 06:57:54 2017 (CH: Nov 26 07:00:57)<Output_1> 
                                                      <Output_2> 
                                                      <Output_3> 
                                                      <Output_4> 
                     1947728 Sun Nov 26 06:57:54 2017 (CH: Nov 26 07:00:57)<Output_1> 
                                                      <Output_2> 
                                                      <Output_3> 
                    
                    Door User Activity                        Reported on Fri Mar  2 09:50:00 2018
                    Initiated by Master from Main Office                                    Page 2
                    Messia Plains Ambulatory
                    ================================================================================
                     1947728 Sun Nov 26 06:57:54 2017 (CH: Nov 26 07:00:57)<Output_1> 
                                                      <Output_2> 
                     1941234 ......... 
                    

                    Using my new regex, below :

                    SEARCH (?-s)^\x20(194\d{4})\x20(.+)\x20(\(.+\))(.+)\x20\R\x20+(.+)\x20(?:\R\x20+(.+)\x20(?:\R\x20+(.+)\x20)??)??(?=\Z|\R(\f|\x20194\d{4}\x20))|(?s)\f.*?===\R

                    REPLACE ?1\1,\2,\3,\4\5\6\7

                    We get the following text :

                    1947728,Sun Nov 26 06:57:54 2017,(CH: Nov 26 07:00:57),<Output_1><Output_2><Output_3><Output_4>
                    1947728,Sun Nov 26 06:57:54 2017,(CH: Nov 26 07:00:57),<Output_1><Output_2><Output_3>
                    1947728,Sun Nov 26 06:57:54 2017,(CH: Nov 26 07:00:57),<Output_1><Output_2>
                     1941234 ......... 
                    

                    Secondly, given your list of transactions, below, of your LAST post, where I simply added :

                    • One space before the first incident number, which was missing

                    • A header block, after the 5th and 11th transactions

                     1947727 Sun Nov 26 00:44:12 2017 (CH: Nov 26 00:47:14)ENTRY GRANTED to USER 
                                                      1313: State Government Security at Main 
                                                      Entry Gate 
                     1947728 Sun Nov 26 06:57:54 2017 (CH: Nov 26 07:00:57)ENTRY GRANTED to USER 
                                                      6123: Michelle Coghlan at Main Entry Gate 
                     1947729 Sun Nov 26 06:59:32 2017 (CH: Nov 26 07:02:35)ENTRY GRANTED to USER 
                                                      6123: Michelle Coghlan at ACUTE CARE AT HOME 
                     1947730 Sun Nov 26 06:59:33 2017 (CH: Nov 26 07:02:36)ENTRY GRANTED to USER 
                                                      6123: Michelle Coghlan at ACUTE CARE AT HOME 
                     1947743 Sun Nov 26 06:59:42 2017 (CH: Nov 26 07:02:38)ENTRY GRANTED to USER 
                                                      6123: Michelle Coghlan at ACUTE CARE AT HOME 
                    
                    Door User Activity                        Reported on Fri Mar  2 09:50:00 2018
                    Initiated by Master from Main Office                                    Page 26
                    Messia Plains Ambulatory
                    ================================================================================
                     1947744 Sun Nov 26 07:00:38 2017 (CH: Nov 26 07:03:42)ENTRY GRANTED to USER 
                                                      6512: Stephen Knynenburg at Main Entry Gate 
                     1947745 Sun Nov 26 07:00:39 2017 (CH: Nov 26 07:03:43)ENTRY GRANTED to USER 
                                                      6512: Stephen Knynenburg at Main Entry Gate 
                     1947746 Sun Nov 26 07:01:58 2017 (CH: Nov 26 07:05:02)ENTRY GRANTED to USER 
                                                      6512: Stephen Knynenburg at ACUTE CARE AT 
                                                      HOME 
                     1947747 Sun Nov 26 07:27:03 2017 (CH: Nov 26 07:30:07)ENTRY GRANTED to USER 
                                                      6123: Michelle Coghlan at Population Health 
                                                      Rear Entry 
                     1947748 Sun Nov 26 08:08:34 2017 (CH: Nov 26 08:11:39)ENTRY GRANTED to USER 
                                                      6512: Stephen Knynenburg at ACUTE CARE AT 
                                                      HOME 
                     1947749 Sun Nov 26 09:08:22 2017 (CH: Nov 26 09:11:27)ENTRY GRANTED to USER 
                                                      6512: Stephen Knynenburg at Population 
                                                      Health Rear Entry 
                    
                    Door User Activity                        Reported on Fri Mar  2 09:50:00 2018
                    Initiated by Master from Main Office                                    Page 27
                    Messia Plains Ambulatory
                    ================================================================================
                     1947750 Sun Nov 26 09:43:14 2017 (CH: Nov 26 09:46:19)ENTRY GRANTED to USER 
                                                      6512: Stephen Knynenburg at Population 
                                                      Health Rear Entry 
                     1947751 Sun Nov 26 09:46:48 2017 (CH: Nov 26 09:49:53)ENTRY GRANTED to USER 
                                                      6005: Rose Gorman at Main Entry Gate 
                     1947752 Sun Nov 26 09:49:50 2017 (CH: Nov 26 09:52:55)ENTRY GRANTED to USER 
                                                      6005: Rose Gorman at ACUTE CARE AT HOME 
                     1947753 Sun Nov 26 09:56:07 2017 (CH: Nov 26 09:59:11)ENTRY GRANTED to USER 
                                                      6513: Sheela Tuazon at Main Entry Gate 
                     1947754 Sun Nov 26 09:58:25 2017 (CH: Nov 26 10:01:29)ENTRY GRANTED to USER 
                                                      6513: Sheela Tuazon at ACUTE CARE AT HOME 
                     1947755 Sun Nov 26 10:01:56 2017 (CH: Nov 26 10:05:00)ENTRY GRANTED to USER 
                                                      6513: Sheela Tuazon at Population Health 
                                                      Rear Entry 
                     1947756 Sun Nov 26 10:08:53 2017 (CH: Nov 26 10:11:58)ENTRY GRANTED to USER 
                                                      6512: Stephen Knynenburg at Population 
                                                      Health Rear Entry 
                    
                    Door User Activity                        Reported on Fri Mar  2 09:50:00 2018
                    Initiated by Master from Main Office                                    Page 28
                    Messia Plains Ambulatory
                    ================================================================================
                    

                    After replacement, the new regex S/R gives :

                    1947727,Sun Nov 26 00:44:12 2017,(CH: Nov 26 00:47:14),ENTRY GRANTED to USER1313: State Government Security at MainEntry Gate
                    1947728,Sun Nov 26 06:57:54 2017,(CH: Nov 26 07:00:57),ENTRY GRANTED to USER6123: Michelle Coghlan at Main Entry Gate
                    1947729,Sun Nov 26 06:59:32 2017,(CH: Nov 26 07:02:35),ENTRY GRANTED to USER6123: Michelle Coghlan at ACUTE CARE AT HOME
                    1947730,Sun Nov 26 06:59:33 2017,(CH: Nov 26 07:02:36),ENTRY GRANTED to USER6123: Michelle Coghlan at ACUTE CARE AT HOME
                    1947743,Sun Nov 26 06:59:42 2017,(CH: Nov 26 07:02:38),ENTRY GRANTED to USER6123: Michelle Coghlan at ACUTE CARE AT HOME
                    1947744,Sun Nov 26 07:00:38 2017,(CH: Nov 26 07:03:42),ENTRY GRANTED to USER6512: Stephen Knynenburg at Main Entry Gate
                    1947745,Sun Nov 26 07:00:39 2017,(CH: Nov 26 07:03:43),ENTRY GRANTED to USER6512: Stephen Knynenburg at Main Entry Gate
                    1947746,Sun Nov 26 07:01:58 2017,(CH: Nov 26 07:05:02),ENTRY GRANTED to USER6512: Stephen Knynenburg at ACUTE CARE ATHOME
                    1947747,Sun Nov 26 07:27:03 2017,(CH: Nov 26 07:30:07),ENTRY GRANTED to USER6123: Michelle Coghlan at Population HealthRear Entry
                    1947748,Sun Nov 26 08:08:34 2017,(CH: Nov 26 08:11:39),ENTRY GRANTED to USER6512: Stephen Knynenburg at ACUTE CARE ATHOME
                    1947749,Sun Nov 26 09:08:22 2017,(CH: Nov 26 09:11:27),ENTRY GRANTED to USER6512: Stephen Knynenburg at PopulationHealth Rear Entry
                    1947750,Sun Nov 26 09:43:14 2017,(CH: Nov 26 09:46:19),ENTRY GRANTED to USER6512: Stephen Knynenburg at PopulationHealth Rear Entry
                    1947751,Sun Nov 26 09:46:48 2017,(CH: Nov 26 09:49:53),ENTRY GRANTED to USER6005: Rose Gorman at Main Entry Gate
                    1947752,Sun Nov 26 09:49:50 2017,(CH: Nov 26 09:52:55),ENTRY GRANTED to USER6005: Rose Gorman at ACUTE CARE AT HOME
                    1947753,Sun Nov 26 09:56:07 2017,(CH: Nov 26 09:59:11),ENTRY GRANTED to USER6513: Sheela Tuazon at Main Entry Gate
                    1947754,Sun Nov 26 09:58:25 2017,(CH: Nov 26 10:01:29),ENTRY GRANTED to USER6513: Sheela Tuazon at ACUTE CARE AT HOME
                    1947755,Sun Nov 26 10:01:56 2017,(CH: Nov 26 10:05:00),ENTRY GRANTED to USER6513: Sheela Tuazon at Population HealthRear Entry
                    1947756,Sun Nov 26 10:08:53 2017,(CH: Nov 26 10:11:58),ENTRY GRANTED to USER6512: Stephen Knynenburg at PopulationHealth Rear Entry
                    

                    Voilà !

                    Quite sure that it’s what you expect to, don’t you ?

                    Cheers,

                    guy038

                    P.S. :

                    • I’ll give some explanations on my regex S/R, in a next post, if the resulting replacement is OK for you !

                    • I also assume that the last non-blank line, of your file, is, either :

                      • The line, with equal signs, of a header block, followed by a Windows line-break ( \r\n )

                      • A verbose output line, possibly followed with any range of \r, \n or \f character(s)

                    1 Reply Last reply Reply Quote 1
                    • Mark AllenM
                      Mark Allen
                      last edited by

                      Hello All!

                      Sorry All! Was away from the desk for a few days, BUT I ran it and it works fine, for the first 30 pages, then after then first 30 pages it does this

                      1949977,Tue Nov 28 11:48:52 2017,(CH: Nov 28 11:52:11),ENTRY GRANTED to USER6295: Rachael Downie at AMH Rear Entry
                      
                      1949978,Tue Nov 28 11:48:55 2017,(CH: Nov 28 11:52:13),ENTRY GRANTED to USER40: Authorised Student Card 17 at CH RearEntry
                      1949979,Tue Nov 28 11:49:06 2017,(CH: Nov 28 11:52:25),ENTRY GRANTED to USER10885: Anne Volling at CH Rear Entry
                      1949980,Tue Nov 28 11:49:14 2017,(CH: Nov 28 11:52:33),ENTRY GRANTED to USER6048: Elizabeth Apro at Basement comms Room
                      1949981,Tue Nov 28 11:49:29 2017,(CH: Nov 28 11:52:48),ENTRY GRANTED to USER5990: Tracy Brighouse at AMH Reception
                      1949982,Tue Nov 28 11:49:35 2017,(CH: Nov 28 11:52:54),ENTRY GRANTED to USER6130: Narelle Broad at OH Front Entry
                      1949983,Tue Nov 28 11:50:23 2017,(CH: Nov 28 11:53:41),ENTRY GRANTED to USER6048: Elizabeth Apro at Plant Room to CommsRoom door
                      1949984,Tue Nov 28 11:50:36 2017,(CH: Nov 28 11:53:54),ENTRY GRANTED to USER5990: Tracy Brighouse at AMH Front Entry
                      1949985,Tue Nov 28 11:50:40 2017,(CH: Nov 28 11:53:58),ENTRY GRANTED to USER5990: Tracy Brighouse at AMH Reception
                      1949986,Tue Nov 28 11:51:03 2017,(CH: Nov 28 11:54:22),ENTRY GRANTED to USER10885: Anne Volling at CH Rear Entry
                      1949987,Tue Nov 28 11:51:09 2017,(CH: Nov 28 11:54:28),ENTRY GRANTED to USER9828: Rosalinda Whittmaack at AMH Rear Entry
                      1949988,Tue Nov 28 11:51:32 2017,(CH: Nov 28 11:54:51),ENTRY GRANTED to USER5990: Tracy Brighouse at AMH Rear Entry
                      1949989,Tue Nov 28 11:51:51 2017,(CH: Nov 28 11:55:10),ENTRY GRANTED to USER6048: Elizabeth Apro at Staff Entry GroundEast
                      1949990,Tue Nov 28 11:54:54 2017,(CH: Nov 28 11:58:13),ENTRY GRANTED to USER6295: Rachael Downie at AMH Rear Entry
                      1949991,Tue Nov 28 11:55:00 2017,(CH: Nov 28 11:58:18),ENTRY GRANTED to USER6295: Rachael Downie at AMH Consult B08
                      1949992,Tue Nov 28 11:55:05 2017,(CH: Nov 28 11:58:24),ENTRY GRANTED to USER11235: Kristi Davies at Admin Entry
                      1949993,Tue Nov 28 11:55:08 2017,(CH: Nov 28 11:58:27),ENTRY GRANTED to USER6295: Rachael Downie at AMH Rear Entry
                      1949994,Tue Nov 28 11:55:10 2017,(CH: Nov 28 11:58:28),ENTRY GRANTED to USER6375: Ahmed Mansour at AMH Reception
                      1949995,Tue Nov 28 11:57:23 2017,(CH: Nov 28 12:00:41),ENTRY GRANTED to USER10593: Emma Phelan at AMH Rear Entry
                      1949996,Tue Nov 28 11:57:47 2017,(CH: Nov 28 12:01:06),ENTRY GRANTED to USER6127: Daniel Nguyen at AMH Reception
                      1949997,Tue Nov 28 11:57:49 2017,(CH: Nov 28 12:01:07),ENTRY GRANTED to USER10593: Emma Phelan at AMH Clinic B15
                      1949998,Tue Nov 28 11:57:56 2017,(CH: Nov 28 12:01:15),ENTRY GRANTED to USER7656: Angela McCoy at A & P Rear Entry
                       1949999 Tue Nov 28 11:58:30 2017 (CH: Nov 28 12:01:48)ENTRY GRANTED to USER 
                                                        7656: Angela McCoy at Admin Entry 
                       1950000 Tue Nov 28 11:58:43 2017 (CH: Nov 28 12:02:01)ENTRY GRANTED to USER 
                                                        9886: Janet Meade at Main Entry Gate 
                       1950001 Tue Nov 28 11:58:59 2017 (CH: Nov 28 12:02:17)ENTRY GRANTED to USER 
                                                        10041: Christina Davidson at A & P Foyer 
                                                        Entry 
                       1950002 Tue Nov 28 11:59:06 2017 (CH: Nov 28 12:02:24)ENTRY GRANTED to USER 
                                                        7656: Angela McCoy at Gym Internal Entry 
                      
                       1950003 Tue Nov 28 11:59:22 2017 (CH: Nov 28 12:02:40)ENTRY GRANTED to USER 
                                                        10041: Christina Davidson at A & P Rear 
                                                        Entry 
                       1950004 Tue Nov 28 12:00:21 2017 (CH: Nov 28 12:03:39)ENTRY GRANTED to USER 
                                                        9886: Janet Meade at Staff Entry Ground East 
                       1950005 Tue Nov 28 12:00:40 2017 (CH: Nov 28 12:03:58)ENTRY GRANTED to USER 
                                                        6296: Peta Picton at Admin Entry 
                       1950006 Tue Nov 28 12:00:52 2017 (CH: Nov 28 12:04:10)ENTRY GRANTED to USER 
                                                        6048: Elizabeth Apro at CH Rear Entry 
                       1950007 Tue Nov 28 12:01:29 2017 (CH: Nov 28 12:04:47)ENTRY GRANTED to USER 
                                                        6296: Peta Picton at A & P Foyer Entry 
                       1950008 Tue Nov 28 12:01:42 2017 (CH: Nov 28 12:05:00)ENTRY GRANTED to USER 
                                                        10184: Jasmin-marie Lim at Gym Internal 
                                                        Entry 
                       1950009 Tue Nov 28 12:02:02 2017 (CH: Nov 28 12:05:20)ENTRY GRANTED to USER 
                                                        10184: Jasmin-marie Lim at A & P Foyer Entry 
                       1950010 Tue Nov 28 12:02:27 2017 (CH: Nov 28 12:05:45)ENTRY GRANTED to USER 
                                                        6016: Rhonda Rainbird at Conference External 
                                                        Entry 
                       1950011 Tue Nov 28 12:02:29 2017 (CH: Nov 28 12:05:47)ENTRY GRANTED to USER 
                                                        10184: Jasmin-marie Lim at A & P Rear Entry 
                       1950012 Tue Nov 28 12:02:35 2017 (CH: Nov 28 12:05:53)ENTRY GRANTED to USER 
                                                        6123: Michelle Coghlan at ACUTE CARE AT HOME 
                       1950013 Tue Nov 28 12:02:52 2017 (CH: Nov 28 12:06:11)ENTRY GRANTED to USER 
                                                        6016: Rhonda Rainbird at OH Front Entry 
                       1950014 Tue Nov 28 12:02:52 2017 (CH: Nov 28 12:06:11)ENTRY GRANTED to USER 
                                                        6286: Laura Fitzalan at CH Rear Entry 
                       1950015 Tue Nov 28 12:03:05 2017 (CH: Nov 28 12:06:24)ENTRY GRANTED to USER 
                                                        6048: Elizabeth Apro at Admin Entry 
                       1950016 Tue Nov 28 12:03:10 2017 (CH: Nov 28 12:06:29)ENTRY GRANTED to USER 
                                                        6103: Helen McGregor at Staff Entry Ground 
                                                        East 
                       1950017 Tue Nov 28 12:03:41 2017 (CH: Nov 28 12:06:59)ENTRY GRANTED to USER 
                                                        7656: Angela McCoy at Gym Internal Entry 
                       1950018 Tue Nov 28 12:03:57 2017 (CH: Nov 28 12:07:16)ENTRY GRANTED to USER 
                                                        10041: Christina Davidson at Staff Entry 
                                                        Ground East 
                       1950019 Tue Nov 28 12:04:12 2017 (CH: Nov 28 12:07:31)ENTRY GRANTED to USER 
                                                        10041: Christina Davidson at Basement comms 
                                                        Room 
                      

                      [Output 1(https://ibb.co/cNXgexl)

                      Repeating the process doesnt yield any changes, changing

                      (?-s)^\x20(194\d{4})\x20(.+)\x20((.+))(.+)\x20\R\x20+(.+)\x20(?:\R\x20+(.+)\x20(?:\R\x20+(.+)\x20)??)??(?=\Z|\R(\f|\x20194\d{4}\x20))|(?s)\f.*?===\R

                      to

                      (?-s)^\x20(195\d{4})\x20(.+)\x20((.+))(.+)\x20\R\x20+(.+)\x20(?:\R\x20+(.+)\x20(?:\R\x20+(.+)\x20)??)??(?=\Z|\R(\f|\x20194\d{4}\x20))|(?s)\f.*?===\R

                      For what I am guessing is it so filter the 195(next 4 digits) normally blanks the rest of the document and you end up with a single line of text somtimes :(

                      I copied 10 pages from the 1950001 to 195xxxx and re-ran the expression, with the 194 changed to 195, and it would normally only change the last line of the page, sometimes get two lines merged into one line

                      SO CLOSE! Any thoughts? I could email you the source file if you like?

                      Full Source

                      1 Reply Last reply Reply Quote 0
                      • guy038G
                        guy038
                        last edited by guy038

                        Hello, @mark-allen, @scott-sumner and All,

                        Ah, sorry ! I was really wrong about the Incident Number description… I should have read carefully, because you said, in your first post :

                        Field 1 - Incident Number (194xxxx —> Onwards)

                        And also :

                        I have a report about 2 000 000 Lines long,

                        So as each transaction is, at least, split in two lines, this means that your file could contain, roughly, 1,000,000 transactions, or a bit less !. As the first number is 1940000 => The last Incident Number should be nearby 2940000

                        So I changed the regex, matching the Incident Number, to [12]\d{6}). Note that this regex definition, of an Incident Number, is present, twice, in the overall regex :

                        • At its beginning (?-s)^\x20([12]\d{6})\x20(.+)\x20......

                        • In the look-ahead part (?=\Z|\R(\f|\x20[12]\d{6}\x20))

                        I, also, changed the regex matching the Header, to (?s)\f.*?===\R\R, as there is a pure blank line, after the ===== line !


                        Now, the last version of the regex is :

                        SEARCH (?-s)^\x20([12]\d{6})\x20(.+)\x20(\(.+\))(.+)\x20\R\x20+(.+)\x20(?:\R\x20+(.+)\x20(?:\R\x20+(.+)\x20)??)??(?=\Z|\R(\f|\x20[12]\d{6}\x20))|(?s)\f.*?===\R\R

                        REPLACE ?1\1,\2,\3,\4\5\6\7

                        Running that regex S/R against your attached file out.txt ( 14,399,825 bytes and 202,315 lines ) left me with the expected comma separated file, of 9,698,482 bytes, containing 82,068 lines

                        Voilà :-))


                        Here is, below, some notes about this regex S/R :

                        (?-s)                                       #  The DOT regex character matches ONLY a SINGLE STANDARD character ( NOT the EOL ones )
                        
                        ^\x20([12]\d{6})\x20                        #  An 'Incident Number' ([12]\d{6}), as Group 1, PREDEDED and FOLLOWED with a SPACE character \x20
                        
                        (.+)\x20                                    #  The Date and Time (.+), as Group 2, which ENDS with a SPACE character \x20
                        
                        (\(.+\))                                    #  The 'Access Control Time' (\(.+\)), as Group 3, surrounded with PARENTHESES, which are ESCAPED to be considered as LITTERALS
                        
                        (.+)\x20\R                                  #  The FIRST 'Verbose Output' zone (.+), as Group 4, FOLLOWED with a SPACE character \x20 and a LINE-BREAK \R
                        
                        \x20+(.+)\x20                               #  The SECOND 'Verbose Output' zone (.+), as Group 5, NEXT line, PRECEDED with SPACE characters \x20+ and FOLLOWED with a SPACE character \x20
                        
                        (?:\R\x20+(.+)\x20(?:\R\x20+(.+)\x20)??)??  #  The part, between the OUTER parentheses, (?: and )??, may be ASBENT or PRESENT, with priority to ABSENCE
                                                                    #    It represents the possible THIRD and FOURTH 'Verbose Output' zones (.+), as Groups 6 and 7 :
                                                                    #        - PRECEDED with a LINE-BREAK \R and some SPACE characters \x20+
                                                                    #        - FOLLOWED with a SPACE character \x20
                                                                    #  
                                                                    #  Again, the part, between the INNER parentheses, (?: and )??, may be ABSENT or PRESENT, with priority to ABSENCE
                                                                    #    It represents the possible FOURTH 'Verbose Output' zone (.+), as Group 7 :
                                                                    #        - PRECEDED with a LINE-BREAK \R and some SPACE characters \x20+
                                                                    #        - FOLLOWED with a SPACE character \x20
                        
                        (?=\Z|\R(\f|\x20[12]\d{6}\x20))             #  IF followed with, EITHER :  ( LOOK-AHEAD syntax )
                                                                    #      - \Z that is to say SOME \r,\n or \f characters, at the VERY END of file
                                                                    #      - a LINE-BREAK \R, followed with, EITHER :
                                                                    #          - A CONTROL character \f, beginning a HEADER block
                                                                    #          - An INCIDENT number, beginning with '1' or '2', [12]\d{6}, PRECEDED and FOLLOWED with a SPACE character \x20
                        
                        |                                           #  OR
                        
                        (?s)                                        #  The DOT regex character matches, ABSOLUTELY, ANY character
                                                                           
                        \f.*?===\R\R                                #   The SHORTEST range of ANY character *.?, EVEN on SEVERAL lines, between :
                                                                    #       - A FORM-FEED character \f
                                                                    #       - THREE EQUAL signs, FOLLOWED with 2 LINE-BREAKS \R\R
                                                                    #  
                                                                    #   => The COMPLETE HEADER block, with LINE-BREAKS
                        

                        To end with, here is this same regex S/R, displayed in Free-spacing mode, with a minimum of comments ! You may run this syntax, as well. Just select all the text, below, and paste it, in the Find what: zone of the Replace dialog :-))

                        (?x)                                            #  FRE-SPACING regex mode
                                                                        #
                                                                        #  IMPORTANT : In this mode, any \R syntax must be REPLACED with \r\n 
                                                                        #  ---------
                        
                        (?-s)                                           #  The DOT regex character matches ONLY a SINGLE STANDARD character ( NOT the EOL ones )
                                                                        
                        ^\x20([12]\d{6})\x20                            #  Group 1        ( Incident Number )
                                                                        
                        (.+)\x20                                        #  Group 2        ( Date and Time )
                                                                        
                        (\(.+\))                                        #  Group 3        ( Access Control Time )
                                                                        
                        (.+)\x20\r\n                                    #  Group 4        ( 1ST Verbose Output )
                                                                        
                        \x20+(.+)\x20                                   #  Group 5        ( 2ND Verbose Output )
                        
                        (?:\r\n\x20+(.+)\x20(?:\r\n\x20+(.+)\x20)??)??  #  Group 6 and 7  ( 3RD and 4TH Verbose Output )
                        
                        (?=\Z|\r\n(\f|\x20[12]\d{6}\x20))               #  IF the VERY END of file or IF a LINE-BREAK followed with an INCIDENT NUMBER or a HEADER
                        
                        |                                               #  OR
                        
                        (?s)                                            #  The DOT regex character matches, ABSOLUTELY, ANY character
                        
                        \f.*?===\r\n\r\n                                #  The COMPLETE HEADER block, with its LINE-BREAKS
                        

                        Remember that the Find what: zone cannot store more than 2046 characters !

                        Cheers,

                        guy038

                        1 Reply Last reply Reply Quote 0
                        • Mark AllenM
                          Mark Allen
                          last edited by

                          Mate

                          Thank you so so so Much for your help with this, I truly mean that, it worked perfectly, If its okay, I am going to distribute this way to the supplier of the product so we can use it to make better reports from this program!!!

                          Truly Thank you heaps, and thanks for Explaination as well, I should be able to use this in the future to write up my own rules…

                          Again thanks heaps!!!

                          Mark

                          1 Reply Last reply Reply Quote 0
                          • First post
                            Last post
                          The Community of users of the Notepad++ text editor.
                          Powered by NodeBB | Contributors