    Need to merge all lines before a blank line

    General Discussion · 10 Posts · 4 Posters · 2.8k Views
    • Jesse Mwangi
      last edited by

      I have some data that I would like to edit. My data looks like this:

      A. M. R. E. F.
      THE AVIATION DIRECTOR
      P.O. BOX 30125.WILSON AIRPORT.
      NAIROBI. KENYA

      ABERCROMBIE & KENT LTD…
      P.O.BOX 59749.
      NAIROBI ~ 00200
      KENYA. USD

      ACHARYA TRAVEL AGENCIES LTD.
      P.O.BOX 42590.
      NAIROBI.
      KENYA. KES

      The records are separated by a blank line.
      I would like to have the data as follows:

      A. M. R. E. F. THE AVIATION DIRECTOR P.O. BOX 30125.WILSON AIRPORT. NAIROBI. KENYA

      ABERCROMBIE & KENT LTD… P.O.BOX 59749. NAIROBI ~ 00200 KENYA. USD

      ACHARYA TRAVEL AGENCIES LTD. P.O.BOX 42590. NAIROBI. KENYA. KES

      Could someone point me in the right direction as to which regex I could use to achieve this? I have a lot of data to process, so automating it would help.
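      For comparison, here is a minimal Python sketch of the transformation described above, done outside Notepad++ (a script-based alternative, not something proposed in this thread). It assumes the data is plain text, treats any line that is empty or contains only spaces/tabs as a record separator, and joins the stripped lines of each record with single spaces. The function name merge_records is hypothetical.

```python
def merge_records(text):
    """Join the lines of each blank-line-separated record with single spaces."""
    records, current = [], []
    for line in text.splitlines():      # handles \n, \r\n and \r endings
        stripped = line.strip()
        if stripped:
            current.append(stripped)    # part of the current record
        elif current:                   # empty or whitespace-only line ends a record
            records.append(" ".join(current))
            current = []
    if current:                         # last record may not be followed by a blank line
        records.append(" ".join(current))
    return records


sample = (
    "A. M. R. E. F.\nTHE AVIATION DIRECTOR\nP.O. BOX 30125.WILSON AIRPORT.\nNAIROBI. KENYA\n"
    "\n"
    "ACHARYA TRAVEL AGENCIES LTD.\nP.O.BOX 42590.\nNAIROBI.\nKENYA. KES\n"
)
print("\n\n".join(merge_records(sample)))
```

      Because it works line by line, a separator line holding only a stray space is treated the same as a truly empty one.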

      Thanks

      • Scott Sumner @Jesse Mwangi
        last edited by

        @Jesse-Mwangi

        I suppose the simplest thing is to do this:

        Find what zone: \R(?!\R)
        Replace with zone: make sure this is EMPTY
        Search mode: Regular expression

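        This searches for a line-ending which is not directly followed by another line-ending, and removes it. Line-endings inside a record are deleted, which joins its lines; where two line-endings sit back to back (a blank line), only the second one matches, so the blank line also disappears and each record ends up on a line of its own.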
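        A rough Python equivalent of this S/R, handy for experimenting. Assumption: Python's re module has no \R, so the alternation (?:\r\n|\r|\n) stands in for it (Notepad++'s Boost-based \R also covers some rarer line-break characters); the names LONE_EOL and remove_single_eols are hypothetical. It shows the joined lines touching with no space between them, and the blank line between records disappearing.

```python
import re

# Python's re module has no \R, so spell out the usual line endings.
# \r\n is tried first, so a Windows line ending is consumed as one unit.
EOL = r"(?:\r\n|\r|\n)"

# \R(?!\R): a line ending NOT directly followed by another line ending.
LONE_EOL = re.compile(EOL + "(?!" + EOL + ")")


def remove_single_eols(text, replacement=""):
    """Apply the \\R(?!\\R) search with the given replacement (empty by default)."""
    return LONE_EOL.sub(replacement, text)


sample = "A. M. R. E. F.\nTHE AVIATION DIRECTOR\n\nACHARYA TRAVEL AGENCIES LTD.\nNAIROBI.\n"
print(repr(remove_single_eols(sample)))
# 'A. M. R. E. F.THE AVIATION DIRECTOR\nACHARYA TRAVEL AGENCIES LTD.NAIROBI.'
```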

        • Terry R
          last edited by

          @Scott-Sumner, @Jesse-Mwangi, might I suggest one small amendment?

          Have the “Replace with” field contain a single space. That way, each line’s text won’t butt up against the previous line’s. It could even be a comma instead, so that, down the track, the data could be reconstituted as it originally looked if needed.
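          A small Python sketch of this amendment, assuming (?:\r\n|\r|\n) as a stand-in for \R since Python's re lacks \R (LONE_EOL is a hypothetical name). With a space as the replacement, the joined lines no longer butt up against each other. Note that the separator also replaces the second line-ending of each blank line and the final line-ending, so it lands at the start of every record after the first and at the very end; a per-line strip tidies that up. A comma replacement behaves the same way.

```python
import re

EOL = r"(?:\r\n|\r|\n)"                         # stand-in for \R (not in Python's re)
LONE_EOL = re.compile(EOL + "(?!" + EOL + ")")  # \R(?!\R)

sample = "A. M. R. E. F.\nTHE AVIATION DIRECTOR\n\nACHARYA TRAVEL AGENCIES LTD.\nNAIROBI.\n"

# Replace each matched line ending with a space instead of nothing.
spaced = LONE_EOL.sub(" ", sample)
print(repr(spaced))
# 'A. M. R. E. F. THE AVIATION DIRECTOR\n ACHARYA TRAVEL AGENCIES LTD. NAIROBI. '

# Strip each resulting line to drop the leading/trailing separator.
records = [line.strip() for line in spaced.splitlines()]
print(records)
```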

          As Leonardo da Vinci said:
          Simplicity is the ultimate sophistication

          Terry

          • Scott Sumner @Terry R
            last edited by

            @Terry-R said:

            Have “Replace with” field include a single space

            I don’t see where you are going with this…can you explain further? It seems like it would just put the space or the comma in front of the second and subsequent lines…hmmmm…

            • Terry R
              last edited by

              @Scott-Sumner
              In the example text provided, all the lines ended directly after the text. Removing the CR/LF meant the first character of the next line butted up against the last character of the previous line, whereas the OP’s requested output shows a space between them.

              Terry

              • Jesse Mwangi @Scott Sumner
                last edited by

                @Scott-Sumner With the sample data I shared, your suggestion works like a charm. But when I try it on the thousands of lines of data, where the number of lines before each blank line differs, the regex puts all the data on one line. Would you mind having a look at the data, or letting me know where I am making a mistake?

                https://wetransfer.com/downloads/1d4c4aa16a819bf8f6f80d4eb8545cc320181002213144/32535c909443d7d7272f68bd674cdce520181002213144/16fe92

                • guy038
                  last edited by guy038

                  Hello, @jesse-mwangi, @Scott-sumner, @terry-r and All,

                  So, I downloaded your WILSON_1.csv file without any trouble. Fine!

                  Then, before thinking about any regex, I noted some facts about this file:

                  • It contains 3221 lines/rows

                  • Text is always written in uppercase and is mostly located in column A

                  • I noticed only 3 exceptions:

                    • At row 21, we have “ATT” in column A and “WILLIAM OREMBO” in column B
                    • At row 1139, we have “ATTN” in column A and “AART MULDER” in column B
                    • At row 1160, we have “ATTN” in column A and “SALLY” in column B

                  => So, for these 3 rows, I moved the text in column B to the end of the text in column A. Then, I selected all the .CSV text in column A and pasted it into a new N++ tab

                  • Now, after clicking on the Show All Characters icon, I noticed that:

                    • Non-blank lines generally begin with a letter, but a few begin with a space character

                    • Non-blank lines generally end with a space character, but some don’t

                    • The seemingly blank lines between paragraphs generally contain a space character

                  => I thought it would be better to trim all these characters first, with the menu command Edit > Blank operations > Trim Leading and Trailing Spaces. (Of course, this can also be done with the SEARCH regex ^\h+|\h+$ and the REPLACEMENT zone left empty!)
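                  A Python approximation of that trim S/R, for anyone scripting the clean-up. Assumptions: [ \t] stands in for Notepad++'s \h (which also matches other horizontal space characters), and the look-ahead (?=\r?$) is added so trailing blanks before a Windows \r\n ending are caught too; TRIM and trim_lines are hypothetical names.

```python
import re

# ^\h+|\h+$ in Python: [ \t] approximates \h (Python's re has no \h), and the
# look-ahead (?=\r?$) also catches trailing blanks sitting before a \r\n ending.
TRIM = re.compile(r"^[ \t]+|[ \t]+(?=\r?$)", re.MULTILINE)


def trim_lines(text):
    """Remove leading and trailing spaces/tabs from every line."""
    return TRIM.sub("", text)


print(repr(trim_lines(" ACHARYA TRAVEL AGENCIES LTD. \n \nNAIROBI.\n")))
# 'ACHARYA TRAVEL AGENCIES LTD.\n\nNAIROBI.\n'
```

                  Whitespace-only lines come out truly empty, which is what the next step relies on.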


                  Well, now, our text is quite clean :-)) The next step is to replace every line-break that is both preceded and followed by a standard character with a single space character, so that the former lines remain easy to tell apart!

                  Hence, the regex S/R below:

                  • SEARCH (?-s)(?<=.)\R(?=.)

                  • REPLACE \x20

                  • Preferably, select the Wrap around option

                  • Click the Replace All button, once

                  => 1506 replacements occur, and your file now contains only 1715 lines!
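                  The same S/R can be sketched in Python to check the arithmetic: each replacement merges two lines into one, so the line count drops by exactly the number of replacements (for the file, 3221 - 1506 = 1715). Assumptions: [^\r\n] stands in for the (?-s) dot, because Python's "." still matches \r, and (?:\r\n|\r|\n) stands in for \R, which Python's re does not support; JOIN and join_paragraph_lines are hypothetical names, and the input is assumed to be already trimmed.

```python
import re

# (?-s)(?<=.)\R(?=.) in Python:
#   [^\r\n]        stands in for the (?-s) dot, since Python's "." still matches \r
#   (?:\r\n|\r|\n) stands in for \R, which Python's re does not support
JOIN = re.compile(r"(?<=[^\r\n])(?:\r\n|\r|\n)(?=[^\r\n])")


def join_paragraph_lines(text):
    """Replace each line break inside a paragraph with one space.

    Returns (new_text, number_of_replacements).
    """
    return JOIN.subn(" ", text)


trimmed = ("A. M. R. E. F.\nTHE AVIATION DIRECTOR\nNAIROBI. KENYA\n"
           "\n"
           "ACHARYA TRAVEL AGENCIES LTD.\nKENYA. KES\n")
joined, count = join_paragraph_lines(trimmed)
lines_before = len(trimmed.splitlines())
lines_after = len(joined.splitlines())
print(count, lines_before, lines_after)  # 3 6 3
print(joined)
```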

                  • Finally, select all this new text, start Excel, paste it with Ctrl + V into cell A1, and re-save it in an Excel format

                  Et voilà !

                  Cheers,

                  guy038

                  • Jesse Mwangi
                    last edited by

                    This works very well and I have my data the way I wanted it. Thanks a lot guys for your time, I have learnt so much.

                    • guy038
                      last edited by guy038

                      Hello, @jesse-mwangi and All,

                      I forgot to explain my previous regex S/R! So:

                      • First, (?-s) is an in-line modifier which tells the regex engine that the dot meta-character ( . ) matches any single standard character, but not the EOL chars

                      • Then it searches for \R, which represents any kind of line-break ( \r\n in Windows files, \n in Unix files or \r in classic Mac files ), but ONLY IF:

                        • The line-break is preceded by a standard character, due to the positive look-behind (?<=.) (so it ends a non-blank line of the current paragraph)

                        • The line-break is followed by a standard character, due to the positive look-ahead (?=.) (so the next line is a non-blank line of the current paragraph)

                      • In replacement, the EOL character(s) are simply replaced with a single space char, \x20. Note that you may just as well type a plain space!

                      Refer to the post below for further information on regular expressions ;-))
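                      To see why both look-arounds are needed, this hypothetical Python sketch compares the full regex with variants that keep only one of them. Assumptions: [^\r\n] stands in for the (?-s) dot and (?:\r\n|\r|\n) for \R, since Python's re has no \R. With only the look-ahead, the empty line's own line-break is replaced, so the next paragraph gets glued on with a leading space; with only the look-behind, the line-break ending each paragraph is replaced instead.

```python
import re

EOL = r"(?:\r\n|\r|\n)"   # stand-in for \R
CHAR = r"[^\r\n]"         # stand-in for the (?-s) dot

both = re.compile(f"(?<={CHAR}){EOL}(?={CHAR})")
ahead_only = re.compile(f"{EOL}(?={CHAR})")
behind_only = re.compile(f"(?<={CHAR}){EOL}")

text = "LINE 1\nLINE 2\n\nLINE 3\n"
for name, rx in [("both", both), ("look-ahead only", ahead_only),
                 ("look-behind only", behind_only)]:
    print(f"{name:17} {rx.sub(' ', text)!r}")
# both              'LINE 1 LINE 2\n\nLINE 3\n'
# look-ahead only   'LINE 1 LINE 2\n LINE 3\n'
# look-behind only  'LINE 1 LINE 2 \nLINE 3 '
```

                      Only the version with both look-arounds keeps the empty line between paragraphs intact.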

                      https://notepad-plus-plus.org/community/topic/15765/faq-desk-where-to-find-regex-documentation/1

                      Cheers,

                      guy038

                      • Scott Sumner @Jesse Mwangi
                        last edited by Scott Sumner

                        @Terry-R is right. I botched the original try at it because I didn’t notice that the replace action didn’t put a space between the lines that were joined. I totally missed that–TWICE! Sorry about that. (Thanks to Terry for pointing that out!)

                        @Jesse-Mwangi said:

                        the regex brings all the data in one line

                        The reason it all ends up on one line is that the lines I took to be totally empty in your sample data actually contain a single space. So a line-ending is never directly followed by another line-ending, and every one of them gets removed. I figured this out by downloading your file (otherwise that would have been quite difficult to discover!).
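                        A Python sketch of what happened. The sample is hypothetical, modelled on the posted data but with a single space on the separator line; (?:\r\n|\r|\n) stands in for \R since Python's re has no \R, and LONE_EOL is a hypothetical name. Once whitespace-only lines are emptied first, the \R(?!\R) search leaves one record per line again.

```python
import re

EOL = r"(?:\r\n|\r|\n)"                         # stand-in for \R
LONE_EOL = re.compile(EOL + "(?!" + EOL + ")")  # Python take on \R(?!\R)

# Separator line holds a single space instead of being truly empty.
sample = "A. M. R. E. F.\nNAIROBI. KENYA\n \nACHARYA TRAVEL AGENCIES LTD.\nKENYA. KES\n"

# Every line ending is followed by text or a space, never by another line
# ending, so all of them are removed and the records collapse onto one line.
print(repr(LONE_EOL.sub("", sample)))

# Emptying whitespace-only lines first restores the blank separator.
cleaned = re.sub(r"^[ \t]+$", "", sample, flags=re.MULTILINE)
print(repr(LONE_EOL.sub("", cleaned)))
```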

                        @Jesse-Mwangi, so I presume that @guy038’s help has you on the right track now…please post again if not and we’ll take another look.

                        The Community of users of the Notepad++ text editor.