Need to merge all lines before a blank line



  • I have some data that I would like to edit. My data is as looks like this;

    A. M. R. E. F.
    THE AVIATION DIRECTOR
    P.O. BOX 30125.WILSON AIRPORT.
    NAIROBI. KENYA

    ABERCROMBIE & KENT LTD…
    P.O.BOX 59749.
    NAIROBI ~ 00200
    KENYA. USD

    ACHARYA TRAVEL AGENCIES LTD.
    P.O.BOX 42590.
    NAIROBI.
    KENYA. KES

    The data is separated by a blank line:
    I would like to have the data as follows;

    A. M. R. E. F. THE AVIATION DIRECTOR P.O. BOX 30125.WILSON AIRPORT. NAIROBI. KENYA

    ABERCROMBIE & KENT LTD… P.O.BOX 59749. NAIROBI ~ 00200 KENYA. USD

    ACHARYA TRAVEL AGENCIES LTD. P.O.BOX 42590. NAIROBI. KENYA. KES

    Would someone point in the right direction as per which regex I could use to achieve this. I have a lot of data to process so the automation would help.

    Thanks



  • @Jesse-Mwangi

    I suppose the simplest thing is to do this:

    Find what zone: \R(?!\R)
    Replace with zone: make sure this is EMPTY
    Search mode: Regular expression

    This searches for a line-ending which is not followed directly by another line-ending, and effectively removes the (first) line-ending.



  • @Scott-Sumner, @Jesse-Mwangi , might I suggest 1 small amendment.

    Have “Replace with” field include a single space. That way successive line’s text won’t butt up against the previous line. It could even have a , instead. That way, down the track it could be reconstituted as it originally looked if needed.

    As Leonardo Da Vinci said:
    Simplicity is the ultimate sophistication

    Terry



  • @Terry-R said:

    Have “Replace with” field include a single space

    I don’t see where you are going with this…can you explain further? It seems like it would just put the space or the comma on the front of the second and greater lines…hmmmm…



  • @Scott-Sumner
    in the example text provided, the all the lines ended directly behind the text, Removing the CR/LF meant the 1st character of the next line was against the last character of the previous line, rather then the OP’s request showing a space between.

    Terry



  • @Scott-Sumner With the sample data I had shared your suggestion actually works like a charm. When I try it on the thousands of lines of data where the number of lines before the line break differ, the regex brings all the data in one line. Would you mind having a look at the data or maybe let me know where I am making a mistake?

    https://wetransfer.com/downloads/1d4c4aa16a819bf8f6f80d4eb8545cc320181002213144/32535c909443d7d7272f68bd674cdce520181002213144/16fe92



  • Hello, @jesse-mwangi, @Scott-sumner, @terry-r and All,

    So, I downloaded your WILSON_1.csv file, without no trouble. Fine !

    Then, before thinking anything about regex, I just deduced some facts, about this file :

    • It contains 3221 lines/rows

    • Text is always written, in an uppercase way and, mainly, located in column A

    • I noticed 3 exceptions, only :

      • At row 21, we have “ATT”, in column A and “WILLIAM OREMBO”, in column B
      • At row 1139, we have “ATTN”, in column A and “AART MULDER”, in column B
      • At row 1160, we have “ATTN”, in column A and “SALLY”, in column B

    => So, I moved, for these 3 rows, the text, in column B, after the present text of column A. Then, I selected all that .CSV text, located in column A and pasted it in a new N++ tab

    • Now, clicking on the Show all characters icon, I noticed that :

      • Non-blank lines begin, generally, with a letter but few of them begin with a space character

      • Non-blank lines end, generally, with a space character, but some don’t

      • The paragraph line-breaks, generally, begin with a space character

    => I thought it would be better to trim all these characters, first, with the menu command Edit > Blank operations > Trim Leading and Trailing Spaces. ( Of course, this can be done, either, with the SEARCH regex ^\h+|\h+$ and the REPLACEMENT zone left empty ! )


    Well, now, our text is quite clean:-)) The next step consists to replace any line-break, which is, both, preceded and followed with a standard character with a single space character, in order to easily visualize the former lines !

    Thus, the regex S/R, below :

    • SEARCH (?-s)(?<=.)\R(?=.)

    • REPLACE \x20

    • Select, preferably, the Wrap around option

    • Click, once, on the Replace All button

    => 1506 replacements occur and your file contains, now, 1715 lines only !

    • Finally, select all this new text, start Excel, do a Ctrl + V operation, in cell 1A and re-save it, in an Excel format

    Et voilà !

    Cheers,

    guy038



  • This works very well and I have my data the way I wanted it. Thanks a lot guys for your time, I have learnt so much.



  • Hello, @jesse-mwangi and All,

    I just forgot to explain my previous regex S/R ! So :

    • First, the (?-s) is an on-line modifier which forces the regex engine to consider that the dot meta-character ( . ) will match any single standard character and not the EOL chars

    • Then it searches for the \R part which, globally, represents any kind of EOL characters ( \r\n in Windows files, \n in Unix files or \r in MAC files ), but ONLY IF :

      • The line-break is preceded with a single standard character, due to the positive look-behind (?<=.) ( so preceded with a non-blank line, of the present paragraph )

      • The line-break is followed with a single standard character, due to the positive look-ahead (?=.) ( so followed with a non-blank line, of the present paragraph )

    • In replacement, this/these EOL character(s) are, simply, replaced with a single space char, \x20. Note that you may, as well, simply hit the space key !

    Refer to the post, below, for further information on Regular expressions ;-))

    https://notepad-plus-plus.org/community/topic/15765/faq-desk-where-to-find-regex-documentation/1

    Cheers,

    guy038



  • @Terry-R is right. I botched the original try at it because I didn’t notice that the replace action didn’t put a space between the lines that were joined. I totally missed that–TWICE! Sorry about that. (Thanks to Terry for pointing that out!)

    @Jesse-Mwangi said:

    the regex brings all the data in one line

    So the reason that it all ends up on one line is that you have a single blank space on the lines that I interpreted as being totally empty from your sample data. I figured this out by downloading your file (otherwise that would have been quite difficult to discover!).

    @Jesse-Mwangi, so I presume that @guy038 's help has you on the right track now…please post again if not and we’ll take another look.


Log in to reply