Community
    • Login

    Find and replace line not starting with pattern and copy text from previous line

    Scheduled Pinned Locked Moved General Discussion
    16 Posts 5 Posters 5.8k Views 2 Watching
    Loading More Posts
    • Oldest to Newest
    • Newest to Oldest
    • Most Votes
    Reply
    • Reply as topic
    Log in to reply
    This topic has been deleted. Only users with topic management privileges can see it.
    • nitin jainN Offline
      nitin jain
      last edited by PeterJones

      I have CSV text file which have Chat logs starting with “[” and after that it contains system date & time, there are few lines which not starting with square brackets as chat user may have sent multiples line in that message. Now i want to find all those lines and copy the previous line date & time with sender name in starting of that.

      how do I find lines not starting with “[” and than replace text from previous line user with “[ ]” time stamp.

      Here is input and desired output file .

      INPUT

      [25/11/19, 16:26:33] Roger: Not received mail
      [25/11/19, 16:27:04] Niks: Refresh
      [25/11/19, 16:28:12] Roger: Plz send again
      [25/11/19, 16:28:55] Niks: ok sent
      [25/11/19, 16:29:14] Roger: Received ok thanks
      [25/11/19, 16:29:38] Niks: 👍🏻
      ‎[26/11/19, 13:20:31] Roger: ‎<attached: 00000110-PHOTO-2019-11-26-13-20-31.jpg>
      [26/11/19, 13:20:57] Roger: For 60000 units
      balance trf 59160
      [26/11/19, 13:30:55] Niks: Units Batch code
      SEGX-TWHB5Z 2500
      3CRD-QAMXD9 2500
      E4ZY-7HNK35 2500
      SGMV-FR4P5Y 2500
      [26/11/19, 13:32:55] Roger: Ok thanks

      ============================================
      Output![alt text](image url)

      [25/11/19, 16:26:33] Roger: Not received mail
      [25/11/19, 16:27:04] Niks: Refresh
      [25/11/19, 16:28:12] Roger: Plz send again
      [25/11/19, 16:28:55] Niks: ok sent
      [25/11/19, 16:29:14] Roger: Received ok thanks
      [25/11/19, 16:29:38] Niks: 👍🏻
      ‎[26/11/19, 13:20:31] Roger: ‎<attached: 00000110-PHOTO-2019-11-26-13-20-31.jpg>
      [26/11/19, 13:20:57] Roger: For 60000 units
      [26/11/19, 13:20:57] Roger: balance trf 59160
      [26/11/19, 13:30:55] Niks: Units Batch code
      [26/11/19, 13:30:55] Niks: SEGX-TWHB5Z 2500
      [26/11/19, 13:30:55] Niks: 3CRD-QAMXD9 2500
      [26/11/19, 13:30:55] Niks: E4ZY-7HNK35 2500
      [26/11/19, 13:30:55] Niks: SGMV-FR4P5Y 2500
      [26/11/19, 13:32:55] Roger: Ok thanks

      Finding lines in Yellow color & replace with Orange text which is taken from previous lines.

      Thanks in advance
      1.PNG

      PeterJonesP 1 Reply Last reply Reply Quote 0
      • PeterJonesP Online
        PeterJones @nitin jain
        last edited by PeterJones

        @nitin-jain ,

        I came up with something that mostly works, except for on the line after the 👍🏻. I haven’t figured out why that is messing up my regex:

        • FIND = (?-s)^(\x5B.*?\x5D.*?:\h*).*$\R\K(?!\x5B)
        • REPLACE = $1
        • SEARCH MODE = regular expression
        • REPLACE ALL multiple times, until it’s really all done

        The line after the thumbs-up has difficulty because, as @Alan-Kilborn found a couple years back, such chat logs will occasionally have the U+200E LEFT-TO-RIGHT MARK peppered throughout… and that photo line contains two instances. So before running the regex I showed, also replace \x{200E} with empty … or I’d suggest [\x{200B}-\x{200F}\x{202A}-\x{202F}] with empty (which will get rid of other zero-width characters).

        So run the zero-width replacement first, then REPLACE ALL multiple times on the first one I shared until all the lines have that. (If your longest block is four lines that don’t start with [, you will have to REPLACE ALL four times)

        Ahamed Nawas AliA 1 Reply Last reply Reply Quote 2
        • guy038G Offline
          guy038
          last edited by guy038

          Hello, @nitin-jain, @peterjones and All,

          Oh… Peter beats me at it ! Here is my solution, quite similar !

          If you are sure that all the dates are in increasing order, simply use the following regex S/R :

          • Open the Replace dialog ( Ctrl + H )

          • SEARCH (?-s)^(\[.+\]).+\R\K(?=\w)

          • REPLACE \1$0\x20

          • Tick the Wrap around option

          • Select the Regular repression search mode

          • Click several times on the Replace All button, till you see the message 0 occurrences were replaced in entire file

          So, from the INPUT text :

          [25/11/19, 16:26:33] Roger: Not received mail
          [25/11/19, 16:27:04] Niks: Refresh
          [25/11/19, 16:28:12] Roger: Plz send again
          [25/11/19, 16:28:55] Niks: ok sent
          [25/11/19, 16:29:14] Roger: Received ok thanks
          [25/11/19, 16:29:38] Niks: 👍🏻
          ‎[26/11/19, 13:20:31] Roger: ‎<attached: 00000110-PHOTO-2019-11-26-13-20-31.jpg>
          [26/11/19, 13:20:57] Roger: For 60000 units
          balance trf 59160
          [26/11/19, 13:30:55] Niks: Units Batch code
          SEGX-TWHB5Z 2500
          3CRD-QAMXD9 2500
          E4ZY-7HNK35 2500
          SGMV-FR4P5Y 2500
          [26/11/19, 13:32:55] Roger: Ok thanks
          

          you’ll get the expected OUTPUT result :

          [25/11/19, 16:26:33] Roger: Not received mail
          [25/11/19, 16:27:04] Niks: Refresh
          [25/11/19, 16:28:12] Roger: Plz send again
          [25/11/19, 16:28:55] Niks: ok sent
          [25/11/19, 16:29:14] Roger: Received ok thanks
          [25/11/19, 16:29:38] Niks: 👍🏻
          ‎[26/11/19, 13:20:31] Roger: ‎<attached: 00000110-PHOTO-2019-11-26-13-20-31.jpg>
          [26/11/19, 13:20:57] Roger: For 60000 units
          [26/11/19, 13:20:57] balance trf 59160
          [26/11/19, 13:30:55] Niks: Units Batch code
          [26/11/19, 13:30:55] SEGX-TWHB5Z 2500
          [26/11/19, 13:30:55] 3CRD-QAMXD9 2500
          [26/11/19, 13:30:55] E4ZY-7HNK35 2500
          [26/11/19, 13:30:55] SGMV-FR4P5Y 2500
          [26/11/19, 13:32:55] Roger: Ok thanks
          

          Best regards,

          guy038

          1 Reply Last reply Reply Quote 0
          • guy038G Offline
            guy038
            last edited by guy038

            Hi, all,

            I suppose I find out a bug, in our Boost regex engine !

            Let’s take this simple example tet :

            []xyz
            ‎[]xyz
            []xyz
            ABC
            []xyz
            DEF
            

            Now, the four regexes below, should look for the literal string [] beginning a line, followed with any non-null string and its line-ending chars, ONLY IF not followed with a leading [ symbol !

            (?-s)^\\[\\].+\R(?!\\[)

            (?-s)^\\[\\].+\r\n(?!\\[)

            (?-s)^\x5b\x5d.+\R(?!\x5b)

            (?-s)^\x5b\x5d.+\r\n(?!\x5b)


            Thus, it should only match the txo lines, below :

            • The []xyz line before the string ABC

            • The []xyz line before the string DEF

            But, unfortunately, it also matches the first line []xyz ???

            Am I wrong in any way, in this matter ?

            BR

            guy038

            Alan KilbornA 1 Reply Last reply Reply Quote 0
            • Alan KilbornA Offline
              Alan Kilborn @guy038
              last edited by

              @guy038 said in Find and replace line not starting with pattern and copy text from previous line:

              But, unfortunately, it also matches the first line []xyz ???
              Am I wrong in any way, in this matter ?

              When I copy -n-paste your black box data, there is an LRM in it, which seems to cause your erroneous match!

              07559618-c16f-4265-94b8-6f9f5f27d0d1-image.png

              PeterJonesP 1 Reply Last reply Reply Quote 3
              • PeterJonesP Online
                PeterJones @Alan Kilborn
                last edited by

                @alan-kilborn said in Find and replace line not starting with pattern and copy text from previous line:

                there is an LRM in it, which seems to cause your erroneous match!

                Indeed. I was originally going to ask Guy why my regex wasn’t working with the supplied data (the same question Guy asked us), when I happened to left arrow from the [ and stayed on that same line! That told me there was a hidden character, which is why I ran the reveal-hidden-characters script from the old conversation, and I saw the infamous LRM – which is why I added the paragraph to tell @nitin-jain to do the zero-width search/replace before doing the main search/replace.

                1 Reply Last reply Reply Quote 0
                • guy038G Offline
                  guy038
                  last edited by guy038

                  Hi, @nitin-jain, @peterjones, @alan-kilborn and All,

                  Ah ah ! Alan, I, first, didn’t understand why you had the LRM sigle in the second line of my text. My second thought was that you created a Python script to make all these fancy Unicode format characters clearly visible ! But, luckily, marking any \x{200e} character did the trick and showed me a thin red mark when this special char is present !


                  So, @nitin-jain, as @peterjones said, use this simple regex S/R, below, to get rid of these format characters !

                  SEARCH [\x{200B}-\x{200F}\x{202A}-\x{202F}]

                  REPLACE Leave EMPTY

                  However, verify that this operation does not break down your text in any way ! I personally saw this case, while pasting Unicode characters from a long list, produced by this excellent and valuable site, regarding Unicode :

                  https://r12a.github.io/uniview/


                  Now, I’m pleased to note that there is no bug of our Boost regex engine, in this matter, as that special LRM char is quite a character different from a [ symbol !

                  BR

                  guy038

                  Alan KilbornA 1 Reply Last reply Reply Quote 1
                  • Alan KilbornA Offline
                    Alan Kilborn @guy038
                    last edited by

                    @guy038 said in Find and replace line not starting with pattern and copy text from previous line:

                    Alan, … you created a Python script to make all these fancy Unicode format characters clearly visible

                    Well, yes, I did. :-)

                    1 Reply Last reply Reply Quote 0
                    • Ahamed Nawas AliA Offline
                      Ahamed Nawas Ali @PeterJones
                      last edited by

                      @PeterJones I have a similar scenario where i have 10K lines i need to fix, is there any shorter way? Also, is there any way we can unmark the line number for those identified lines which does not start with a pattern.

                      Example: My line number starts with datetime (2021-09-14T21:10:55+00:00)

                      And can i make all these lines which does not start with “2021-” without line numbers provided by notepad++?

                      PeterJonesP 1 Reply Last reply Reply Quote 0
                      • PeterJonesP Online
                        PeterJones @Ahamed Nawas Ali
                        last edited by

                        @Ahamed-Nawas-Ali said in Find and replace line not starting with pattern and copy text from previous line:

                        is there any shorter way?

                        The way described above is reasonably short. I am not sure what “improvement” you think is necessary (or even possible).

                        is there any way we can unmark the line number for those identified lines which does not start with a pattern.

                        Sorry, I don’t understand how that’s different than the original question.

                        You’ll have to give a better example – use the </> button on the toolbar when you are writing the post to create pairs of ```, between which you can paste your actual data:, something like

                        **data I have**:
                        ```
                        [1234] abcxyz
                        next line
                        [5678] pdq
                        aonther
                        ```
                        
                        **desired data after transformation**
                        ```
                        [1234] abcxyz
                        [1234] next line
                        [5678] pdq
                        [5678] aonther
                        ```
                        

                        …
                        This would be rendered as the following, so we know exactly what your “before” and “after” data needs to be.
                        -–
                        data I have:

                        [1234] abcxyz
                        next line
                        [5678] pdq
                        aonther
                        

                        desired data after transformation

                        [1234] abcxyz
                        [1234] next line
                        [5678] pdq
                        [5678] aonther
                        

                        -—

                        Useful References

                        • Please Read Before Posting
                        • Template for Search/Replace Questions
                        • Formatting Forum Posts
                        • Notepad++ Online User Manual: Searching/Regex
                        • FAQ: Where to find other regular expressions (regex) documentation
                        1 Reply Last reply Reply Quote 0
                        • Ahamed Nawas AliA Offline
                          Ahamed Nawas Ali
                          last edited by

                          @PeterJones, Thanks for your reply. I am sorry, I am new to this platform.

                          Example scenario i am dealing with is with Date_Time Sender Recepients Message delimited with ‘Tab’

                          2021-09-14T21:10:55+00:00	Nawas	Ram Kumar,Ahamed Ali	Learning
                          Selection
                          B. Home
                          Webinar 
                          IDB
                          20214980
                          202216
                          2021-09-15T11:19:14+00:00	Ahamed Ali	Nawas	Thanks!
                          

                          And i should make it like below

                          2021-09-14T21:10:55+00:00	Nawas	Ram Kumar,Ahamed Ali	Learning
                          2021-09-14T21:10:55+00:00	Nawas	Ram Kumar,Ahamed Ali	Selection
                          2021-09-14T21:10:55+00:00	Nawas	Ram Kumar,Ahamed Ali	B. Home
                          2021-09-14T21:10:55+00:00	Nawas	Ram Kumar,Ahamed Ali	Webinar 
                          2021-09-14T21:10:55+00:00	Nawas	Ram Kumar,Ahamed Ali	IDB
                          2021-09-14T21:10:55+00:00	Nawas	Ram Kumar,Ahamed Ali	20214980
                          2021-09-14T21:10:55+00:00	Nawas	Ram Kumar,Ahamed Ali	202216
                          2021-09-15T11:19:14+00:00	Ahamed Ali	Nawas	Thanks!
                          
                          Alan KilbornA 1 Reply Last reply Reply Quote 0
                          • Alan KilbornA Offline
                            Alan Kilborn @Ahamed Nawas Ali
                            last edited by

                            @Ahamed-Nawas-Ali said :

                            Example scenario i am dealing with is with Date_Time Sender Recepients Message delimited with ‘Tab’

                            For this one, I considered the following to be enough to distinguish a timestamp line leader: 2021-09-15T

                            Thus I tried (based upon @guy038’s solution earlier in this thread):

                            Find: (?-s)^(\d{4}-\d\d-\d\dT.+\t).+\R\K(?!\d{4}-\d\d-\d\dT)
                            Replace: ${1}
                            Options: Wrap around, Regular expression
                            Action: Replace All (multiple times, until no more changes occur)

                            And after several Replace All presses, I obtained the following:

                            2021-09-14T21:10:55+00:00	Nawas	Ram Kumar,Ahamed Ali	Learning
                            2021-09-14T21:10:55+00:00	Nawas	Ram Kumar,Ahamed Ali	Selection
                            2021-09-14T21:10:55+00:00	Nawas	Ram Kumar,Ahamed Ali	B. Home
                            2021-09-14T21:10:55+00:00	Nawas	Ram Kumar,Ahamed Ali	Webinar 
                            2021-09-14T21:10:55+00:00	Nawas	Ram Kumar,Ahamed Ali	IDB
                            2021-09-14T21:10:55+00:00	Nawas	Ram Kumar,Ahamed Ali	20214980
                            2021-09-14T21:10:55+00:00	Nawas	Ram Kumar,Ahamed Ali	202216
                            2021-09-15T11:19:14+00:00	Ahamed Ali	Nawas	Thanks!
                            2021-09-15T11:19:14+00:00	Ahamed Ali	Nawas	
                            

                            Note that the last line of this output is “extra” and should be manually removed.

                            Ahamed Nawas AliA 1 Reply Last reply Reply Quote 1
                            • Ahamed Nawas AliA Offline
                              Ahamed Nawas Ali @Alan Kilborn
                              last edited by

                              @Alan-Kilborn I tried with below in order to keep clicking on the buttons to replace everything and it worked in removing the line numbers however the strings are concatenated.

                              Find box: \n([^2021-])
                              Replace box: $1

                              Result:

                              2021-09-14T21:10:55+00:00	Nawas	Ram Kumar,Ahamed Ali	LearningSelectionB. HomeWebinar IDB20214980202216
                              2021-09-15T11:19:14+00:00	Ahamed Ali	Nawas	Thanks!
                              

                              And the results are messed up a bit. Anyway, thank you so much for the time stamp line leader and for now, i will have to use it anyway to avoid further delay in my project! Thanks @guy038 & @PeterJones for your guidance! Greatly appreciate your guidance to this community! God bless you all!

                              Alan KilbornA 1 Reply Last reply Reply Quote 0
                              • Alan KilbornA Offline
                                Alan Kilborn @Ahamed Nawas Ali
                                last edited by

                                @Ahamed-Nawas-Ali said:

                                \n([^2021-])

                                That’s totally wrong for what you’re wanting… in several ways…
                                But since you seem to be in a hurry…and you can’t reasonably do anything with regex in a hurry…I won’t explain and I’ll just wish you good luck.

                                Ahamed Nawas AliA 1 Reply Last reply Reply Quote 0
                                • Ahamed Nawas AliA Offline
                                  Ahamed Nawas Ali @Alan Kilborn
                                  last edited by

                                  @Alan-Kilborn Sorry Alan! I know i am wrong with that “\n([^2021-])” as it will spoil my delimiter as well and there could be some other issues as well. Its true that one can’t learn Regex in a hurry! I am using yours snippet and thank you for that!

                                  1 Reply Last reply Reply Quote 0
                                  • guy038G Offline
                                    guy038
                                    last edited by guy038

                                    Hello, @ahamed-nawas-ali, @peterjones, @alan-kilborn and All,

                                    @ahamed-nawas-ali, I’ll use a similar search regex to the @alan-kilborn’s one !


                                    For example , given this INPUT text , below :

                                    2021-09-14T21:10:55+00:00	ATX	Field3	Guy	Field5	Learning
                                    Selection
                                    B. Home
                                    Webinar 
                                    IDB
                                    20214980
                                    2021420214202216
                                    2021-09-15T11:19:14+00:00	BYQ	Field3	Alan	Field5	Test
                                    B. Home
                                    Webinar 
                                    IDB
                                    20214980
                                    2021-09-16T15:07:46+00:00	ATX	Field3	Peter	Field5	Try
                                    Selection
                                    B. Home
                                    Webinar 
                                    IDB
                                    20214980
                                    2021420214202216
                                    Blablah
                                    OK
                                    END of story
                                    
                                    • Open the Replace dialog ( Ctrl+H )

                                    • Uncheck all box options

                                    • Search (?-s)^(\d{4}-.+\t).+\R\K(?!\d{4}-|\R|\z)

                                    • Replace $1

                                    • If necessary, check the Wrap around option

                                    • Select the Regular expression search mode

                                    • Click, exclusively, on the Replace All button, several times, till the message Replace All: 0 occurrences were replaced... is displayed !

                                    At the end, you should get this expected OUTPUT text :

                                    2021-09-14T21:10:55+00:00	ATX	Field3	Guy	Field5	Learning
                                    2021-09-14T21:10:55+00:00	ATX	Field3	Guy	Field5	Selection
                                    2021-09-14T21:10:55+00:00	ATX	Field3	Guy	Field5	B. Home
                                    2021-09-14T21:10:55+00:00	ATX	Field3	Guy	Field5	Webinar 
                                    2021-09-14T21:10:55+00:00	ATX	Field3	Guy	Field5	IDB
                                    2021-09-14T21:10:55+00:00	ATX	Field3	Guy	Field5	20214980
                                    2021-09-14T21:10:55+00:00	ATX	Field3	Guy	Field5	2021420214202216
                                    2021-09-15T11:19:14+00:00	BYQ	Field3	Alan	Field5	Test
                                    2021-09-15T11:19:14+00:00	BYQ	Field3	Alan	Field5	B. Home
                                    2021-09-15T11:19:14+00:00	BYQ	Field3	Alan	Field5	Webinar 
                                    2021-09-15T11:19:14+00:00	BYQ	Field3	Alan	Field5	IDB
                                    2021-09-15T11:19:14+00:00	BYQ	Field3	Alan	Field5	20214980
                                    2021-09-16T15:07:46+00:00	ATX	Field3	Peter	Field5	Try
                                    2021-09-16T15:07:46+00:00	ATX	Field3	Peter	Field5	Selection
                                    2021-09-16T15:07:46+00:00	ATX	Field3	Peter	Field5	B. Home
                                    2021-09-16T15:07:46+00:00	ATX	Field3	Peter	Field5	Webinar 
                                    2021-09-16T15:07:46+00:00	ATX	Field3	Peter	Field5	IDB
                                    2021-09-16T15:07:46+00:00	ATX	Field3	Peter	Field5	20214980
                                    2021-09-16T15:07:46+00:00	ATX	Field3	Peter	Field5	2021420214202216
                                    2021-09-16T15:07:46+00:00	ATX	Field3	Peter	Field5	Blablah
                                    2021-09-16T15:07:46+00:00	ATX	Field3	Peter	Field5	OK
                                    2021-09-16T15:07:46+00:00	ATX	Field3	Peter	Field5	END of story
                                    

                                    Voila :-))


                                    Notes :

                                    • As you can see, the number of columns, before the last one, is not a problem !

                                    • From beginning of line ( ^ ), the regex looks for a line beginning with 4 digits, followed with a dash character (\d{4}- ) and anything else till the last tabulation ( .+\t ) of current line

                                    • This search, so far, is memorized and stored as group 1

                                    • After the last field of the line and the line-break ( .+\R ), all the matched string is discarded ( \K )

                                    • Thus, the regex engine is now searching for a zero-length string, at beginning of the next line, but ONLY IF this next line does not begin with :

                                      • 4 digits and a dash char

                                      • An other line-break

                                      • The very end of current file

                                    • When this assertion is true, it just inserts the group 1 contents at the very beginning of current line

                                    Best Regards

                                    guy038

                                    P.S. :

                                    If the condition to detect the header lines seems not restrictive enough, you may use this alternate search regex :

                                    • Search (?-is)^(20\d\d-\d\d-\d\dT.+\t).+\R\K(?!20\d\d-\d\d-\d\dT|\R|\z)
                                    1 Reply Last reply Reply Quote 0

                                    Hello! It looks like you're interested in this conversation, but you don't have an account yet.

                                    Getting fed up of having to scroll through the same posts each visit? When you register for an account, you'll always come back to exactly where you were before, and choose to be notified of new replies (either via email, or push notification). You'll also be able to save bookmarks and upvote posts to show your appreciation to other community members.

                                    With your input, this post could be even better 💗

                                    Register Login
                                    • First post
                                      Last post
                                    The Community of users of the Notepad++ text editor.
                                    Powered by NodeBB | Contributors