
Need to merge all lines before a blank line

General Discussion
10 Posts 4 Posters 2.8k Views
  • J
    Jesse Mwangi
    last edited by Oct 2, 2018, 9:52 AM

    I have some data that I would like to edit. My data looks like this:

    A. M. R. E. F.
    THE AVIATION DIRECTOR
    P.O. BOX 30125.WILSON AIRPORT.
    NAIROBI. KENYA

    ABERCROMBIE & KENT LTD…
    P.O.BOX 59749.
    NAIROBI ~ 00200
    KENYA. USD

    ACHARYA TRAVEL AGENCIES LTD.
    P.O.BOX 42590.
    NAIROBI.
    KENYA. KES

    The records are separated by a blank line.
    I would like to have the data as follows:

    A. M. R. E. F. THE AVIATION DIRECTOR P.O. BOX 30125.WILSON AIRPORT. NAIROBI. KENYA

    ABERCROMBIE & KENT LTD… P.O.BOX 59749. NAIROBI ~ 00200 KENYA. USD

    ACHARYA TRAVEL AGENCIES LTD. P.O.BOX 42590. NAIROBI. KENYA. KES

    Would someone point me in the right direction as to which regex I could use to achieve this? I have a lot of data to process, so the automation would help.

    Thanks

    • S
      Scott Sumner @Jesse Mwangi
      last edited by Oct 2, 2018, 12:31 PM

      @Jesse-Mwangi

      I suppose the simplest thing is to do this:

      Find what zone: \R(?!\R)
      Replace with zone: make sure this is EMPTY
      Search mode: Regular expression

      This searches for a line-ending which is not followed directly by another line-ending, and effectively removes the (first) line-ending.
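For anyone who wants to script this outside Notepad++, here is a minimal Python sketch of the same idea. It is not part of the original suggestion: Python's `re` module has no `\R`, so the line ending is spelled out by hand (assuming only `\r\n`, `\n` and `\r` occur in the data), and `remove_unpaired_eols` is a hypothetical helper name.

```python
import re

# Stand-in for \R, which Python's re module does not support.
# Assumption: the data only uses \r\n, \n or \r as line endings.
EOL = r"(?:\r\n|\n|\r)"

# A line ending that is NOT directly followed by another line ending.
UNPAIRED_EOL = re.compile(EOL + r"(?!" + EOL + r")")


def remove_unpaired_eols(text: str, replacement: str = "") -> str:
    """Python approximation of: Find \\R(?!\\R), Replace with <replacement>."""
    return UNPAIRED_EOL.sub(replacement, text)


# Two records taken from the sample data in the first post.
sample = (
    "ACHARYA TRAVEL AGENCIES LTD.\n"
    "P.O.BOX 42590.\n"
    "NAIROBI.\n"
    "KENYA. KES\n"
    "\n"
    "A. M. R. E. F.\n"
    "THE AVIATION DIRECTOR\n"
)

print(remove_unpaired_eols(sample))
```

With an empty replacement the joined lines run together (`LTD.P.O.BOX`), and in a run of two line endings it is the last one that gets removed, so each record ends up on its own line with no blank line in between.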

      • T
        Terry R
        last edited by Oct 2, 2018, 8:03 PM

        @Scott-Sumner, @Jesse-Mwangi , might I suggest 1 small amendment.

        Have the “Replace with” field contain a single space. That way each line’s text won’t butt up against the previous line. It could even contain a comma instead, so that later on the text could be restored to its original layout if needed.

        As Leonardo Da Vinci said:
        Simplicity is the ultimate sophistication

        Terry

        • S
          Scott Sumner @Terry R
          last edited by Oct 2, 2018, 8:29 PM

          @Terry-R said:

          Have “Replace with” field include a single space

          I don’t see where you are going with this…can you explain further? It seems like it would just put the space or the comma on the front of the second and greater lines…hmmmm…

          • T
            Terry R
            last edited by Oct 2, 2018, 8:33 PM

            @Scott-Sumner
            In the example text provided, all the lines ended directly after the text. Removing the CR/LF meant that the first character of the next line sat right against the last character of the previous line, rather than being separated by a space as the OP’s desired output shows.

            Terry

            • J
              Jesse Mwangi @Scott Sumner
              last edited by Oct 2, 2018, 9:41 PM

              @Scott-Sumner With the sample data I had shared, your suggestion actually works like a charm. When I try it on the thousands of lines of data, where the number of lines before the line break differs, the regex brings all the data in one line. Would you mind having a look at the data, or maybe let me know where I am making a mistake?

              https://wetransfer.com/downloads/1d4c4aa16a819bf8f6f80d4eb8545cc320181002213144/32535c909443d7d7272f68bd674cdce520181002213144/16fe92

              • G
                guy038
                last edited by guy038 Oct 3, 2018, 10:56 AM Oct 3, 2018, 12:31 AM

                Hello, @jesse-mwangi, @Scott-sumner, @terry-r and All,

                So, I downloaded your WILSON_1.csv file, without any trouble. Fine !

                Then, before thinking anything about regex, I just deduced some facts, about this file :

                • It contains 3221 lines/rows

                • Text is always written in uppercase and is mainly located in column A

                • I noticed 3 exceptions, only :

                  • At row 21, we have “ATT”, in column A and “WILLIAM OREMBO”, in column B
                  • At row 1139, we have “ATTN”, in column A and “AART MULDER”, in column B
                  • At row 1160, we have “ATTN”, in column A and “SALLY”, in column B

                => So, for these 3 rows, I appended the text in column B to the existing text in column A. Then, I selected all the .CSV text located in column A and pasted it into a new N++ tab

                • Now, clicking on the Show all characters icon, I noticed that :

                  • Non-blank lines begin, generally, with a letter, but a few of them begin with a space character

                  • Non-blank lines end, generally, with a space character, but some don’t

                  • The blank lines that separate paragraphs generally begin with a space character, rather than being truly empty

                => I thought it would be better to trim all these characters first, with the menu command Edit > Blank operations > Trim Leading and Trailing Spaces. ( Of course, this can also be done with the SEARCH regex ^\h+|\h+$ and the REPLACEMENT zone left empty ! )
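As a rough illustration of the trimming step outside Notepad++, the sketch below does the same thing in Python. It rests on assumptions: Python's `re` has no `\h`, so `[ \t]` is used as an approximation (spaces and tabs only), the lines are assumed to end with `\n`, and `trim_lines` is a hypothetical helper name.

```python
import re

# [ \t] approximates \h (horizontal whitespace); Python's re has no \h.
# Assumption: lines end with \n, so that $ in MULTILINE mode sits right
# after any trailing blanks.
EDGE_BLANKS = re.compile(r"^[ \t]+|[ \t]+$", re.MULTILINE)


def trim_lines(text: str) -> str:
    """Strip leading and trailing spaces/tabs from every line."""
    return EDGE_BLANKS.sub("", text)


# The middle line holds a single space, like the separators in the real file.
print(repr(trim_lines(" NAIROBI. \n \nKENYA. KES \n")))
```

After this pass a separator line that held only a space is truly empty, which is what the later line-joining step relies on.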


                Well, now, our text is quite clean:-)) The next step consists of replacing any line-break which is both preceded and followed by a standard character with a single space character, so that the former line boundaries remain easy to see !

                Thus, the regex S/R, below :

                • SEARCH (?-s)(?<=.)\R(?=.)

                • REPLACE \x20

                • Select, preferably, the Wrap around option

                • Click, once, on the Replace All button

                => 1506 replacements occur and your file contains, now, 1715 lines only !

                • Finally, select all this new text, start Excel, do a Ctrl + V operation in cell A1 and re-save it in an Excel format

                Et voilà !

                Cheers,

                guy038

                • J
                  Jesse Mwangi
                  last edited by Oct 3, 2018, 10:54 AM

                  This works very well and I have my data the way I wanted it. Thanks a lot guys for your time, I have learnt so much.

                  • G
                    guy038
                    last edited by guy038 Oct 3, 2018, 11:45 AM Oct 3, 2018, 11:14 AM

                    Hello, @jesse-mwangi and All,

                    I just forgot to explain my previous regex S/R ! So :

                    • First, the (?-s) is an in-line modifier which forces the regex engine to consider that the dot meta-character ( . ) matches any single standard character, but not the EOL chars

                    • Then it searches for the \R part which, globally, represents any kind of EOL characters ( \r\n in Windows files, \n in Unix files or \r in MAC files ), but ONLY IF :

                      • The line-break is preceded by a single standard character, due to the positive look-behind (?<=.) ( so it comes right after a non-blank line of the present paragraph )

                      • The line-break is followed by a single standard character, due to the positive look-ahead (?=.) ( so it comes right before a non-blank line of the present paragraph )

                    • In replacement, this/these EOL character(s) are, simply, replaced with a single space char, \x20. Note that you may, as well, simply hit the space key !
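The look-around logic described above can be tried out in Python with a small sketch. This is an approximation under stated assumptions, not the Notepad++ regex itself: `\R` is spelled out as an explicit alternation, and the dot is replaced by `[^\r\n]` because Python's `.` would otherwise also match `\r`. The name `join_paragraph_lines` is hypothetical.

```python
import re

# Stand-in for \R (not supported by Python's re module).
EOL = r"(?:\r\n|\n|\r)"
# Stand-in for "." under (?-s): any character that is not part of a line ending.
NOT_EOL = r"[^\r\n]"

# A line ending with ordinary text directly before AND directly after it.
INNER_EOL = re.compile(r"(?<=" + NOT_EOL + r")" + EOL + r"(?=" + NOT_EOL + r")")


def join_paragraph_lines(text: str) -> tuple[str, int]:
    """Replace line endings inside a paragraph with one space.

    Returns the new text and the number of replacements made.
    """
    return INNER_EOL.subn(" ", text)


# Two records taken from the sample data in the first post.
sample = (
    "ABERCROMBIE & KENT LTD…\n"
    "P.O.BOX 59749.\n"
    "NAIROBI ~ 00200\n"
    "KENYA. USD\n"
    "\n"
    "ACHARYA TRAVEL AGENCIES LTD.\n"
    "P.O.BOX 42590.\n"
    "NAIROBI.\n"
    "KENYA. KES\n"
)

joined, count = join_paragraph_lines(sample)
print(joined)
print(count, "replacements")
```

Because a line ending next to an empty line fails one of the two look-arounds, the blank separator lines are left untouched and each record collapses onto a single line.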

                    Refer to the post, below, for further information on Regular expressions ;-))

                    https://notepad-plus-plus.org/community/topic/15765/faq-desk-where-to-find-regex-documentation/1

                    Cheers,

                    guy038

                    • S
                      Scott Sumner @Jesse Mwangi
                      last edited by Scott Sumner Oct 3, 2018, 1:01 PM Oct 3, 2018, 1:00 PM

                      @Terry-R is right. I botched the original try at it because I didn’t notice that the replace action didn’t put a space between the lines that were joined. I totally missed that–TWICE! Sorry about that. (Thanks to Terry for pointing that out!)

                      @Jesse-Mwangi said:

                      the regex brings all the data in one line

                      So the reason that it all ends up on one line is that you have a single blank space on the lines that I interpreted as being totally empty from your sample data. I figured this out by downloading your file (otherwise that would have been quite difficult to discover!).
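To make that failure mode concrete, here is a small Python sketch, again only an approximation: `\R` is written out as an alternation and `[ \t]` stands in for horizontal whitespace, both assumptions forced by Python's `re` module. It applies the first regex to a record separator that holds a single space, with and without trimming first.

```python
import re

# Stand-in for \R (not supported by Python's re module).
EOL = r"(?:\r\n|\n|\r)"
# Python approximation of \R(?!\R).
UNPAIRED_EOL = re.compile(EOL + r"(?!" + EOL + r")")
# Leading/trailing spaces and tabs on each line (assumes \n line endings).
EDGE_BLANKS = re.compile(r"^[ \t]+|[ \t]+$", re.MULTILINE)

# The "blank" separator line below actually holds one space.
data = "NAIROBI.\nKENYA. KES\n \nA. M. R. E. F.\nTHE AVIATION DIRECTOR\n"

# Without trimming, no line ending is directly followed by another one,
# so every single line ending is removed.
as_is = UNPAIRED_EOL.sub("", data)

# After trimming, the separator is truly empty and one line ending survives.
trimmed_first = UNPAIRED_EOL.sub("", EDGE_BLANKS.sub("", data))

print(as_is.count("\n"))          # 0: everything is on one line
print(trimmed_first.count("\n"))  # 1: the two records stay on separate lines
```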

                      @Jesse-Mwangi, so I presume that @guy038 's help has you on the right track now…please post again if not and we’ll take another look.
