Find and replace line not starting with pattern and copy text from previous line

16 Posts 5 Posters 2.0k Views
  • G
    guy038
    last edited by guy038 Apr 25, 2022, 4:25 PM Apr 25, 2022, 4:23 PM

    Hello, @nitin-jain, @peterjones and All,

    Oh… Peter beat me to it! Here is my solution, which is quite similar!

    If you are sure that all the dates are in increasing order, simply use the following regex S/R :

    • Open the Replace dialog ( Ctrl + H )

    • SEARCH (?-s)^(\[.+\]).+\R\K(?=\w)

    • REPLACE \1$0\x20

    • Tick the Wrap around option

    • Select the Regular expression search mode

    • Click several times on the Replace All button, till you see the message 0 occurrences were replaced in entire file

    So, from the INPUT text :

    [25/11/19, 16:26:33] Roger: Not received mail
    [25/11/19, 16:27:04] Niks: Refresh
    [25/11/19, 16:28:12] Roger: Plz send again
    [25/11/19, 16:28:55] Niks: ok sent
    [25/11/19, 16:29:14] Roger: Received ok thanks
    [25/11/19, 16:29:38] Niks: 👍🏻
    ‎[26/11/19, 13:20:31] Roger: ‎<attached: 00000110-PHOTO-2019-11-26-13-20-31.jpg>
    [26/11/19, 13:20:57] Roger: For 60000 units
    balance trf 59160
    [26/11/19, 13:30:55] Niks: Units Batch code
    SEGX-TWHB5Z 2500
    3CRD-QAMXD9 2500
    E4ZY-7HNK35 2500
    SGMV-FR4P5Y 2500
    [26/11/19, 13:32:55] Roger: Ok thanks
    

    you’ll get the expected OUTPUT result :

    [25/11/19, 16:26:33] Roger: Not received mail
    [25/11/19, 16:27:04] Niks: Refresh
    [25/11/19, 16:28:12] Roger: Plz send again
    [25/11/19, 16:28:55] Niks: ok sent
    [25/11/19, 16:29:14] Roger: Received ok thanks
    [25/11/19, 16:29:38] Niks: 👍🏻
    ‎[26/11/19, 13:20:31] Roger: ‎<attached: 00000110-PHOTO-2019-11-26-13-20-31.jpg>
    [26/11/19, 13:20:57] Roger: For 60000 units
    [26/11/19, 13:20:57] balance trf 59160
    [26/11/19, 13:30:55] Niks: Units Batch code
    [26/11/19, 13:30:55] SEGX-TWHB5Z 2500
    [26/11/19, 13:30:55] 3CRD-QAMXD9 2500
    [26/11/19, 13:30:55] E4ZY-7HNK35 2500
    [26/11/19, 13:30:55] SGMV-FR4P5Y 2500
    [26/11/19, 13:32:55] Roger: Ok thanks
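
For readers who would rather do the same fill-down in one pass outside Notepad++, here is a minimal Python sketch. It is not part of the solution above: the function name `fill_down` and the header pattern are assumptions made for illustration. It prefixes every non-empty continuation line, whereas the regex above only touches lines that start with a word character. A hidden character in front of a `[` would make a header line look like a continuation line, so the input is assumed to be free of such characters.

```python
import re

# Assumption: a header line starts with a bracketed timestamp,
# e.g. "[26/11/19, 13:20:57]".
HEADER = re.compile(r"^(\[[^\]]+\])")

def fill_down(lines):
    """Prefix each continuation line with the most recent bracketed timestamp."""
    current = None
    out = []
    for line in lines:
        m = HEADER.match(line)
        if m:
            # Header line: remember its timestamp and keep the line unchanged.
            current = m.group(1)
            out.append(line)
        elif current is not None and line.strip():
            # Continuation line: copy the timestamp from the last header.
            out.append(f"{current} {line}")
        else:
            # Empty line, or text before any header: leave as-is.
            out.append(line)
    return out

sample = [
    "[26/11/19, 13:20:57] Roger: For 60000 units",
    "balance trf 59160",
    "[26/11/19, 13:30:55] Niks: Units Batch code",
    "SEGX-TWHB5Z 2500",
]
print("\n".join(fill_down(sample)))
```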
    

    Best regards,

    guy038

    • G
      guy038
      last edited by guy038 Apr 25, 2022, 4:45 PM Apr 25, 2022, 4:42 PM

      Hi, all,

      I think I have found a bug in our Boost regex engine!

      Let’s take this simple example text:

      []xyz
      ‎[]xyz
      []xyz
      ABC
      []xyz
      DEF
      

      Now, each of the four regexes below should look for the literal string [] at the beginning of a line, followed by any non-empty string and its line-ending chars, ONLY IF the next line does not start with a [ symbol:

      (?-s)^\[\].+\R(?!\[)

      (?-s)^\[\].+\r\n(?!\[)

      (?-s)^\x5b\x5d.+\R(?!\x5b)

      (?-s)^\x5b\x5d.+\r\n(?!\x5b)


      Thus, it should only match the two lines below:

      • The []xyz line before the string ABC

      • The []xyz line before the string DEF

      But, unfortunately, it also matches the first line []xyz ???

      Am I wrong in any way, in this matter ?

      BR

      guy038

      • A
        Alan Kilborn @guy038
        last edited by Apr 25, 2022, 4:58 PM

        @guy038 said in Find and replace line not starting with pattern and copy text from previous line:

        But, unfortunately, it also matches the first line []xyz ???
        Am I wrong in any way, in this matter ?

        When I copy-n-paste your black box data, there is an LRM in it, which seems to cause your erroneous match!

        (Screenshot: the sample text with hidden characters revealed, showing an LRM marker in the second line.)

        • P
          PeterJones @Alan Kilborn
          last edited by Apr 25, 2022, 6:41 PM

          @alan-kilborn said in Find and replace line not starting with pattern and copy text from previous line:

          there is an LRM in it, which seems to cause your erroneous match!

          Indeed. I was originally going to ask Guy why my regex wasn’t working with the supplied data (the same question Guy asked us), when I happened to left arrow from the [ and stayed on that same line! That told me there was a hidden character, which is why I ran the reveal-hidden-characters script from the old conversation, and I saw the infamous LRM – which is why I added the paragraph to tell @nitin-jain to do the zero-width search/replace before doing the main search/replace.
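
The effect of the hidden character on the negative look-ahead can be reproduced outside Notepad++. The following is a small Python sketch, not something posted in this thread: `matched_line_numbers` is a hypothetical helper, and because Python's `re` module has no `\R`, the sketch assumes plain `\n` line endings.

```python
import re

LRM = "\u200e"  # LEFT-TO-RIGHT MARK, invisible in most editors

# Same idea as the first regex above; "\n" stands in for \R (assumption).
PATTERN = re.compile(r"^\[\].+\n(?!\[)", re.MULTILINE)

def matched_line_numbers(text):
    """Return the 1-based line numbers at which PATTERN starts a match."""
    return [text.count("\n", 0, m.start()) + 1 for m in PATTERN.finditer(text)]

clean = "[]xyz\n[]xyz\n[]xyz\nABC\n[]xyz\nDEF\n"
dirty = "[]xyz\n" + LRM + "[]xyz\n[]xyz\nABC\n[]xyz\nDEF\n"

print(matched_line_numbers(clean))  # [3, 5]
print(matched_line_numbers(dirty))  # [1, 3, 5]
```

With the LRM in place, the second line starts with U+200E rather than `[`, so the look-ahead `(?!\[)` succeeds after the first line and the first line is matched as well.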

          • G
            guy038
            last edited by guy038 Apr 26, 2022, 5:00 PM Apr 25, 2022, 11:27 PM

            Hi, @nitin-jain, @peterjones, @alan-kilborn and All,

            Ah ah! Alan, at first I didn’t understand why you had the LRM label in the second line of my text. My next thought was that you created a Python script to make all these fancy Unicode format characters clearly visible! But, luckily, marking any \x{200e} character did the trick and showed me a thin red mark wherever this special char is present!


            So, @nitin-jain, as @peterjones said, use this simple regex S/R, below, to get rid of these format characters !

            SEARCH [\x{200B}-\x{200F}\x{202A}-\x{202F}]

            REPLACE Leave EMPTY

            However, verify that this operation does not break your text in any way! I personally ran into this case while pasting Unicode characters from a long list produced by this excellent and valuable Unicode site:

            https://r12a.github.io/uniview/
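
To check what the cleanup would delete before running it, the same character ranges can be inspected in Python. This is only a sketch under assumptions: the helper names `find_format_chars` and `strip_format_chars` are made up here, and this is not the reveal-hidden-characters script mentioned earlier in the thread.

```python
import re
import unicodedata

# Same two ranges as the SEARCH expression above.
FORMAT_CHARS = re.compile("[\u200B-\u200F\u202A-\u202F]")

def find_format_chars(text):
    """List (line, column, Unicode name) for every character the regex would delete."""
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for m in FORMAT_CHARS.finditer(line):
            name = unicodedata.name(m.group(), "UNNAMED")
            hits.append((line_no, m.start() + 1, name))
    return hits

def strip_format_chars(text):
    """Delete the characters, like REPLACE with an empty field."""
    return FORMAT_CHARS.sub("", text)

sample = "[a] first\n\u200e[b] second\n"
print(find_format_chars(sample))          # [(2, 1, 'LEFT-TO-RIGHT MARK')]
print(repr(strip_format_chars(sample)))   # '[a] first\n[b] second\n'
```

Note that the ranges also cover U+200D (ZERO WIDTH JOINER) and U+202F (NARROW NO-BREAK SPACE), so deleting them can change emoji sequences or spacing. That is one way the cleanup could alter text, which is why listing the hits first may be worthwhile.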


            Now, I’m pleased to note that there is no bug in our Boost regex engine in this matter, as that special LRM char is quite a different character from a [ symbol!

            BR

            guy038

            • A
              Alan Kilborn @guy038
              last edited by Apr 26, 2022, 12:05 PM

              @guy038 said in Find and replace line not starting with pattern and copy text from previous line:

              Alan, … you created a Python script to make all these fancy Unicode format characters clearly visible

              Well, yes, I did. :-)

              • A
                Ahamed Nawas Ali @PeterJones
                last edited by Oct 25, 2024, 6:09 PM

                @PeterJones I have a similar scenario where i have 10K lines i need to fix, is there any shorter way? Also, is there any way we can unmark the line number for those identified lines which does not start with a pattern.

                Example: My line number starts with datetime (2021-09-14T21:10:55+00:00)

                And can i make all these lines which does not start with “2021-” without line numbers provided by notepad++?

                • P
                  PeterJones @Ahamed Nawas Ali
                  last edited by Oct 25, 2024, 6:39 PM

                  @Ahamed-Nawas-Ali said in Find and replace line not starting with pattern and copy text from previous line:

                  is there any shorter way?

                  The way described above is reasonably short. I am not sure what “improvement” you think is necessary (or even possible).

                  is there any way we can unmark the line number for those identified lines which does not start with a pattern.

                  Sorry, I don’t understand how that’s different than the original question.

                  You’ll have to give a better example – use the </> button on the toolbar when you are writing the post to create pairs of ```, between which you can paste your actual data, something like:

                  **data I have**:
                  ```
                  [1234] abcxyz
                  next line
                  [5678] pdq
                  aonther
                  ```
                  
                  **desired data after transformation**
                  ```
                  [1234] abcxyz
                  [1234] next line
                  [5678] pdq
                  [5678] aonther
                  ```
                  

                  …
                  This would be rendered as the following, so we know exactly what your “before” and “after” data needs to be.
                  —
                  data I have:

                  [1234] abcxyz
                  next line
                  [5678] pdq
                  aonther
                  

                  desired data after transformation

                  [1234] abcxyz
                  [1234] next line
                  [5678] pdq
                  [5678] aonther
                  

                  ----

                  Useful References

                  • Please Read Before Posting
                  • Template for Search/Replace Questions
                  • Formatting Forum Posts
                  • Notepad++ Online User Manual: Searching/Regex
                  • FAQ: Where to find other regular expressions (regex) documentation
                  • A
                    Ahamed Nawas Ali
                    last edited by Oct 26, 2024, 7:22 AM

                    @PeterJones, Thanks for your reply. I am sorry, I am new to this platform.

                    Example scenario i am dealing with is with Date_Time Sender Recepients Message delimited with ‘Tab’

                    2021-09-14T21:10:55+00:00	Nawas	Ram Kumar,Ahamed Ali	Learning
                    Selection
                    B. Home
                    Webinar 
                    IDB
                    20214980
                    202216
                    2021-09-15T11:19:14+00:00	Ahamed Ali	Nawas	Thanks!
                    

                    And i should make it like below

                    2021-09-14T21:10:55+00:00	Nawas	Ram Kumar,Ahamed Ali	Learning
                    2021-09-14T21:10:55+00:00	Nawas	Ram Kumar,Ahamed Ali	Selection
                    2021-09-14T21:10:55+00:00	Nawas	Ram Kumar,Ahamed Ali	B. Home
                    2021-09-14T21:10:55+00:00	Nawas	Ram Kumar,Ahamed Ali	Webinar 
                    2021-09-14T21:10:55+00:00	Nawas	Ram Kumar,Ahamed Ali	IDB
                    2021-09-14T21:10:55+00:00	Nawas	Ram Kumar,Ahamed Ali	20214980
                    2021-09-14T21:10:55+00:00	Nawas	Ram Kumar,Ahamed Ali	202216
                    2021-09-15T11:19:14+00:00	Ahamed Ali	Nawas	Thanks!
                    
                    • A
                      Alan Kilborn @Ahamed Nawas Ali
                      last edited by Oct 26, 2024, 10:22 AM

                      @Ahamed-Nawas-Ali said :

                      Example scenario i am dealing with is with Date_Time Sender Recepients Message delimited with ‘Tab’

                      For this one, I considered the following to be enough to distinguish a timestamp line leader: 2021-09-15T

                      Thus I tried (based upon @guy038’s solution earlier in this thread):

                      Find: (?-s)^(\d{4}-\d\d-\d\dT.+\t).+\R\K(?!\d{4}-\d\d-\d\dT)
                      Replace: ${1}
                      Options: Wrap around, Regular expression
                      Action: Replace All (multiple times, until no more changes occur)

                      And after several Replace All presses, I obtained the following:

                      2021-09-14T21:10:55+00:00	Nawas	Ram Kumar,Ahamed Ali	Learning
                      2021-09-14T21:10:55+00:00	Nawas	Ram Kumar,Ahamed Ali	Selection
                      2021-09-14T21:10:55+00:00	Nawas	Ram Kumar,Ahamed Ali	B. Home
                      2021-09-14T21:10:55+00:00	Nawas	Ram Kumar,Ahamed Ali	Webinar 
                      2021-09-14T21:10:55+00:00	Nawas	Ram Kumar,Ahamed Ali	IDB
                      2021-09-14T21:10:55+00:00	Nawas	Ram Kumar,Ahamed Ali	20214980
                      2021-09-14T21:10:55+00:00	Nawas	Ram Kumar,Ahamed Ali	202216
                      2021-09-15T11:19:14+00:00	Ahamed Ali	Nawas	Thanks!
                      2021-09-15T11:19:14+00:00	Ahamed Ali	Nawas	
                      

                      Note that the last line of this output is “extra” and should be manually removed.
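
Why the “extra” last line appears can be seen in a rough Python emulation of the repeated Replace All. This is a sketch under stated assumptions, not the Notepad++ behaviour itself: Python's `re` has neither `\K` nor `\R`, so the header line is captured as group 1 and re-emitted, the prefix to copy is group 2, line endings are assumed to be `\n`, and `replace_all_until_stable` is a hypothetical helper name.

```python
import re

# Adapted from the Find expression above: group 1 = whole header line
# (stands in for \K), group 2 = the fields to copy, "\n" instead of \R.
ALAN = re.compile(r"^((\d{4}-\d\d-\d\dT.+\t).+\n)(?!\d{4}-\d\d-\d\dT)", re.MULTILINE)

def replace_all_until_stable(text, pattern=ALAN):
    """Repeat the substitution until nothing changes.

    Returns (final text, number of passes that changed something).
    """
    passes = 0
    while True:
        new_text, count = pattern.subn(r"\1\2", text)
        if count == 0:
            return text, passes
        text, passes = new_text, passes + 1

sample = (
    "2021-09-14T21:10:55+00:00\tNawas\tLearning\n"
    "Selection\n"
    "IDB\n"
    "2021-09-15T11:19:14+00:00\tAli\tThanks!\n"
)
result, passes = replace_all_until_stable(sample)
print(passes)  # 2
print(result)
```

In this emulation the look-ahead also succeeds at the very end of the text, where nothing follows the final line break, so the prefix of the last header is inserted once more after it. If the text does not end with a line break, no extra line is produced.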

                      • A
                        Ahamed Nawas Ali @Alan Kilborn
                        last edited by Oct 26, 2024, 10:54 AM

                        @Alan-Kilborn I tried the expression below to avoid having to keep clicking the button to replace everything. It worked in removing the line numbers; however, the strings are concatenated.

                        Find box: \n([^2021-])
                        Replace box: $1

                        Result:

                        2021-09-14T21:10:55+00:00	Nawas	Ram Kumar,Ahamed Ali	LearningSelectionB. HomeWebinar IDB20214980202216
                        2021-09-15T11:19:14+00:00	Ahamed Ali	Nawas	Thanks!
                        

                        And the results are messed up a bit. Anyway, thank you so much for the timestamp line leader; for now, I will have to use it to avoid further delay in my project! Thanks @guy038 & @PeterJones for your guidance! I greatly appreciate your guidance to this community! God bless you all!

                        • A
                          Alan Kilborn @Ahamed Nawas Ali
                          last edited by Oct 26, 2024, 11:03 AM

                          @Ahamed-Nawas-Ali said:

                          \n([^2021-])

                          That’s totally wrong for what you’re wanting… in several ways…
                          But since you seem to be in a hurry…and you can’t reasonably do anything with regex in a hurry…I won’t explain and I’ll just wish you good luck.
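
For later readers, one reading of what goes wrong with that expression: `[^2021-]` is a negated character class, so it matches any single character other than `2`, `0`, `1` or `-`; it does not mean “a line that does not start with 2021-”. The replacement keeps that one character but drops the line feed in front of it, so lines get joined instead of prefixed. The Python sketch below illustrates this on made-up sample data; the exact outcome in Notepad++ may differ depending on the file's line endings.

```python
import re

# The Find expression from the post above, unchanged.
ATTEMPT = re.compile(r"\n([^2021-])")

sample = (
    "2021-09-14T21:10:55+00:00\tNawas\tLearning\n"
    "Selection\n"
    "20214980\n"
    "2021-09-15T11:19:14+00:00\tAli\tThanks!\n"
)

# Replace with group 1 only: the "\n" is removed, the next character is kept.
joined = ATTEMPT.sub(r"\1", sample)
print(joined)
```

In this sketch `Selection` is glued onto `Learning`, while `20214980` stays on its own line without any timestamp, because its first character is one of the excluded ones.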

                          • A
                            Ahamed Nawas Ali @Alan Kilborn
                            last edited by Oct 26, 2024, 1:13 PM

                            @Alan-Kilborn Sorry Alan! I know I am wrong with that “\n([^2021-])” as it will spoil my delimiter as well, and there could be some other issues too. It’s true that one can’t learn regex in a hurry! I am using your snippet, and thank you for that!

                            • G
                              guy038
                              last edited by guy038 Oct 26, 2024, 3:50 PM Oct 26, 2024, 3:43 PM

                              Hello, @ahamed-nawas-ali, @peterjones, @alan-kilborn and All,

                              @ahamed-nawas-ali, I’ll use a search regex similar to @alan-kilborn’s!


                              For example, given the INPUT text below:

                              2021-09-14T21:10:55+00:00	ATX	Field3	Guy	Field5	Learning
                              Selection
                              B. Home
                              Webinar 
                              IDB
                              20214980
                              2021420214202216
                              2021-09-15T11:19:14+00:00	BYQ	Field3	Alan	Field5	Test
                              B. Home
                              Webinar 
                              IDB
                              20214980
                              2021-09-16T15:07:46+00:00	ATX	Field3	Peter	Field5	Try
                              Selection
                              B. Home
                              Webinar 
                              IDB
                              20214980
                              2021420214202216
                              Blablah
                              OK
                              END of story
                              
                              • Open the Replace dialog ( Ctrl+H )

                              • Uncheck all box options

                              • Search (?-s)^(\d{4}-.+\t).+\R\K(?!\d{4}-|\R|\z)

                              • Replace $1

                              • If necessary, check the Wrap around option

                              • Select the Regular expression search mode

                              • Click, exclusively, on the Replace All button, several times, till the message Replace All: 0 occurrences were replaced... is displayed !

                              At the end, you should get this expected OUTPUT text :

                              2021-09-14T21:10:55+00:00	ATX	Field3	Guy	Field5	Learning
                              2021-09-14T21:10:55+00:00	ATX	Field3	Guy	Field5	Selection
                              2021-09-14T21:10:55+00:00	ATX	Field3	Guy	Field5	B. Home
                              2021-09-14T21:10:55+00:00	ATX	Field3	Guy	Field5	Webinar 
                              2021-09-14T21:10:55+00:00	ATX	Field3	Guy	Field5	IDB
                              2021-09-14T21:10:55+00:00	ATX	Field3	Guy	Field5	20214980
                              2021-09-14T21:10:55+00:00	ATX	Field3	Guy	Field5	2021420214202216
                              2021-09-15T11:19:14+00:00	BYQ	Field3	Alan	Field5	Test
                              2021-09-15T11:19:14+00:00	BYQ	Field3	Alan	Field5	B. Home
                              2021-09-15T11:19:14+00:00	BYQ	Field3	Alan	Field5	Webinar 
                              2021-09-15T11:19:14+00:00	BYQ	Field3	Alan	Field5	IDB
                              2021-09-15T11:19:14+00:00	BYQ	Field3	Alan	Field5	20214980
                              2021-09-16T15:07:46+00:00	ATX	Field3	Peter	Field5	Try
                              2021-09-16T15:07:46+00:00	ATX	Field3	Peter	Field5	Selection
                              2021-09-16T15:07:46+00:00	ATX	Field3	Peter	Field5	B. Home
                              2021-09-16T15:07:46+00:00	ATX	Field3	Peter	Field5	Webinar 
                              2021-09-16T15:07:46+00:00	ATX	Field3	Peter	Field5	IDB
                              2021-09-16T15:07:46+00:00	ATX	Field3	Peter	Field5	20214980
                              2021-09-16T15:07:46+00:00	ATX	Field3	Peter	Field5	2021420214202216
                              2021-09-16T15:07:46+00:00	ATX	Field3	Peter	Field5	Blablah
                              2021-09-16T15:07:46+00:00	ATX	Field3	Peter	Field5	OK
                              2021-09-16T15:07:46+00:00	ATX	Field3	Peter	Field5	END of story
                              

                              Voila :-))


                              Notes :

                              • As you can see, the number of columns, before the last one, is not a problem !

                              • From the beginning of a line ( ^ ), the regex looks for 4 digits followed by a dash character ( \d{4}- ) and anything else up to the last tab character ( .+\t ) of the current line

                              • This part of the match is stored as group 1

                              • After the last field of the line and the line-break ( .+\R ), everything matched so far is discarded ( \K )

                              • Thus, the regex engine is now searching for a zero-length string at the beginning of the next line, but ONLY IF this next line does not begin with :

                                • 4 digits and a dash char

                                • Another line-break

                                • The very end of the current file

                              • When this assertion is true, it just inserts the group 1 contents at the very beginning of current line
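
The logic described in these notes can also be sketched as a single pass in Python, which avoids the repeated Replace All. This is only an illustration under assumptions: `fill_down_fields` is a hypothetical name, line endings are assumed to be `\n`, and the code mirrors the rules above (header test, prefix up to the last tab, empty lines left alone) rather than using the Notepad++ regex itself.

```python
import re

DATE_START = re.compile(r"\d{4}-")      # "4 digits and a dash char"
HEADER = re.compile(r"^(\d{4}-.+\t).+$")  # group 1 = everything up to the last tab

def fill_down_fields(text):
    """Copy the leading fields of each header line onto the lines below it."""
    prefix = ""
    out = []
    for line in text.split("\n"):
        # Insert the prefix only when the previous line was a header (or an
        # already-filled line) and this line is neither empty nor date-like.
        if prefix and line and not DATE_START.match(line):
            line = prefix + line
        # The (possibly filled) line becomes the source of the next prefix;
        # anything that is not a header breaks the chain.
        m = HEADER.match(line)
        prefix = m.group(1) if m else ""
        out.append(line)
    return "\n".join(out)

sample = (
    "2021-09-14T21:10:55+00:00\tATX\tGuy\tLearning\n"
    "Selection\n"
    "2021420214202216\n"
    "2021-09-15T11:19:14+00:00\tBYQ\tAlan\tTest\n"
    "IDB\n"
)
print(fill_down_fields(sample))
```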

                              Best Regards

                              guy038

                              P.S. :

                              If the condition used to detect the header lines does not seem restrictive enough, you may use this alternate search regex :

                              • Search (?-is)^(20\d\d-\d\d-\d\dT.+\t).+\R\K(?!20\d\d-\d\d-\d\dT|\R|\z)
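
The difference between the two header tests can be checked quickly in Python. This is a small sketch, not part of the post: `looks_like_header` is a hypothetical helper and the second sample line is invented to show a continuation line that the looser test would mistake for a header.

```python
import re

LOOSE = re.compile(r"\d{4}-")               # header test of the main regex
STRICT = re.compile(r"20\d\d-\d\d-\d\dT")   # header test of the P.S. regex

def looks_like_header(line, pattern):
    """True if the line would be treated as a header and therefore left untouched."""
    return pattern.match(line) is not None

for line in (
    "2021-09-14T21:10:55+00:00\tATX",
    "1234-5678 order ref",   # invented continuation line
    "2021420214202216",
):
    print(repr(line), looks_like_header(line, LOOSE), looks_like_header(line, STRICT))
```

Only the first line passes both tests; the invented `1234-5678 order ref` line passes the loose test but not the strict one, so the stricter regex would still fill it.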