Need to merge all lines before a blank line
-
I have some data that I would like to edit. My data looks like this:

A. M. R. E. F.
THE AVIATION DIRECTOR
P.O. BOX 30125.WILSON AIRPORT.
NAIROBI. KENYA

ABERCROMBIE & KENT LTD…
P.O.BOX 59749.
NAIROBI ~ 00200
KENYA. USD

ACHARYA TRAVEL AGENCIES LTD.
P.O.BOX 42590.
NAIROBI.
KENYA. KES

The data is separated by a blank line.
I would like to have the data as follows:

A. M. R. E. F. THE AVIATION DIRECTOR P.O. BOX 30125.WILSON AIRPORT. NAIROBI. KENYA
ABERCROMBIE & KENT LTD… P.O.BOX 59749. NAIROBI ~ 00200 KENYA. USD
ACHARYA TRAVEL AGENCIES LTD. P.O.BOX 42590. NAIROBI. KENYA. KES
Would someone point me in the right direction as to which regex I could use to achieve this? I have a lot of data to process, so automation would help.
Thanks
-
I suppose the simplest thing is to do this:
Find what zone: \R(?!\R)
Replace with zone: make sure this is EMPTY
Search mode: Regular expression
This searches for a line-ending which is not followed directly by another line-ending, and effectively removes the (first) line-ending.
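For anyone who would rather script this than run it in Notepad++, here is a minimal Python sketch of the same idea. Python's built-in re module has no \R, so the line ending is spelled out as (?:\r\n|\r|\n); that stand-in assumes only those three line endings occur. The function name remove_eol_not_before_eol is made up for illustration.

```python
import re

# Python's re has no \R, so the common line endings are written out explicitly.
# Stand-in for \R(?!\R): a line ending NOT directly followed by another one.
EOL_NOT_BEFORE_EOL = re.compile(r"(?:\r\n|\r|\n)(?![\r\n])")

def remove_eol_not_before_eol(text: str) -> str:
    """Delete every line ending that is not directly followed by another one."""
    return EOL_NOT_BEFORE_EOL.sub("", text)

# A few lines taken from the sample data in the question (Windows line endings).
sample = (
    "P.O.BOX 59749.\r\n"
    "NAIROBI ~ 00200\r\n"
    "\r\n"
    "P.O.BOX 42590.\r\n"
    "NAIROBI.\r\n"
)

# One output line per record; the joined pieces are not separated by anything.
print(remove_eol_not_before_eol(sample))
```

Note that the blank separator line disappears as well, because its own line ending is followed by ordinary text and therefore gets removed.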
-
@Scott-Sumner, @Jesse-Mwangi , might I suggest 1 small amendment.
Have the “Replace with” field include a single space. That way, each successive line’s text won’t butt up against the previous line. It could even have a , instead. That way, down the track, it could be reconstituted as it originally looked if needed.
As Leonardo Da Vinci said:
Simplicity is the ultimate sophistication
Terry
-
@Terry-R said:
Have “Replace with” field include a single space
I don’t see where you are going with this…can you explain further? It seems like it would just put the space or the comma on the front of the second and greater lines…hmmmm…
-
@Scott-Sumner
In the example text provided, all the lines ended directly after the text. Removing the CR/LF meant the 1st character of the next line was against the last character of the previous line, rather than the OP’s request showing a space between.
Terry
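Both observations can be checked with a small Python sketch. It assumes \n line endings and uses made-up placeholder lines (ONE, TWO, ...), with (?:\r\n|\r|\n) standing in for \R because Python's re module has no \R.

```python
import re

# Stand-in for \R(?!\R); Python's re module does not support \R.
EOL_NOT_BEFORE_EOL = re.compile(r"(?:\r\n|\r|\n)(?![\r\n])")

# Placeholder data: two records separated by one truly empty line.
sample = "ONE\nTWO\n\nTHREE\nFOUR"

# Replace with a single space instead of nothing.
with_space = EOL_NOT_BEFORE_EOL.sub(" ", sample)
records = with_space.split("\n")
print(records)  # ['ONE TWO', ' THREE FOUR']

# The space that replaced the blank line's own line ending ends up at the
# front of the second and later records; stripping each record removes it.
print([record.strip() for record in records])  # ['ONE TWO', 'THREE FOUR']
```

So the space does keep the joined lines apart, and it also lands on the front of every record after the first, which a follow-up trim takes care of.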
-
@Scott-Sumner With the sample data I had shared, your suggestion actually works like a charm. But when I try it on the thousands of lines of data, where the number of lines before the line break differs, the regex brings all the data in one line. Would you mind having a look at the data, or maybe letting me know where I am making a mistake?
-
Hello, @jesse-mwangi, @Scott-sumner, @terry-r and All,
So, I downloaded your WILSON_1.csv file without any trouble. Fine!
Then, before thinking about any regex, I just deduced some facts about this file:
- It contains 3221 lines/rows
- Text is always written in uppercase and is mainly located in column A
- I noticed only 3 exceptions:
    - At row 21, we have “ATT” in column A and “WILLIAM OREMBO” in column B
    - At row 1139, we have “ATTN” in column A and “AART MULDER” in column B
    - At row 1160, we have “ATTN” in column A and “SALLY” in column B
=> So, for these 3 rows, I moved the text in column B after the present text of column A. Then I selected all that CSV text located in column A and pasted it in a new N++ tab.
Now, clicking on the Show all characters icon, I noticed that:
- Non-blank lines generally begin with a letter, but a few of them begin with a space character
- Non-blank lines generally end with a space character, but some don’t
- The paragraph line-breaks generally begin with a space character
=> I thought it would be better to trim all these characters first, with the menu command Edit > Blank operations > Trim Leading and Trailing Spaces. (Of course, this can also be done with the SEARCH regex ^\h+|\h+$ and the REPLACEMENT zone left empty!)
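Outside Notepad++, the trimming step could look roughly like this in Python. It is only a sketch: [ \t] approximates \h (which also covers other horizontal whitespace), the text is assumed to use \n line endings, and the name trim_lines is made up for illustration.

```python
import re

# Approximation of ^\h+|\h+$ : Python's re has no \h, so only spaces and tabs
# are handled here. With re.MULTILINE, ^ and $ anchor at each line, but $ only
# matches before "\n", so "\r\n" files should be normalised to "\n" first.
TRIM = re.compile(r"^[ \t]+|[ \t]+$", re.MULTILINE)

def trim_lines(text: str) -> str:
    """Remove leading and trailing spaces/tabs from every line."""
    return TRIM.sub("", text)

# The middle line holds a single space, like the separator lines in the file.
dirty = " NAIROBI. \n \nKENYA. KES \n"
print(repr(trim_lines(dirty)))  # 'NAIROBI.\n\nKENYA. KES\n'
```

After this step the separator lines are truly empty, and spaces inside a line (as in KENYA. KES) are left alone.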
Well, now our text is quite clean :-)) The next step consists in replacing any line-break which is both preceded and followed by a standard character with a single space character, in order to easily visualize the former lines!
Thus, the regex S/R below:
- SEARCH (?-s)(?<=.)\R(?=.)
- REPLACE \x20
- Select, preferably, the Wrap around option
- Click, once, on the Replace All button
=> 1506 replacements occur and your file now contains 1715 lines only!
- Finally, select all this new text, start Excel, do a Ctrl + V operation in cell A1 and re-save it in an Excel format
Et voilà !
Cheers,
guy038
-
-
This works very well and I have my data the way I wanted it. Thanks a lot guys for your time, I have learnt so much.
-
Hello, @jesse-mwangi and All,
I just forgot to explain my previous regex S/R! So:
- First, the (?-s) part is an in-line modifier which forces the regex engine to consider that the dot meta-character (.) matches any single standard character and not the EOL chars
- Then it searches for the \R part which represents any kind of EOL characters (\r\n in Windows files, \n in Unix files or \r in MAC files), but ONLY IF:
    - The line-break is preceded by a single standard character, due to the positive look-behind (?<=.) (so preceded by a non-blank line of the present paragraph)
    - The line-break is followed by a single standard character, due to the positive look-ahead (?=.) (so followed by a non-blank line of the present paragraph)
- In replacement, this/these EOL character(s) are simply replaced with a single space char, \x20. Note that you may as well simply hit the space key!
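For comparison, the same search/replace can be sketched in Python. This is an approximation rather than a drop-in copy: Python's re module has no \R, and its dot matches \r even without the DOTALL flag, so both are written out explicitly as (?:\r\n|\r|\n) and [^\r\n]. The name join_paragraph_lines is made up for illustration.

```python
import re

# Approximation of (?-s)(?<=.)\R(?=.) :
#   (?<=[^\r\n])     the line ending is preceded by an ordinary character
#   (?:\r\n|\r|\n)   explicit stand-in for \R
#   (?=[^\r\n])      the line ending is followed by an ordinary character
JOIN = re.compile(r"(?<=[^\r\n])(?:\r\n|\r|\n)(?=[^\r\n])")

def join_paragraph_lines(text: str) -> str:
    """Replace line endings between two non-empty lines with one space."""
    return JOIN.sub(" ", text)

# Lines from the sample data in the question, already trimmed.
sample = (
    "P.O.BOX 59749.\nNAIROBI ~ 00200\nKENYA. USD\n"
    "\n"
    "P.O.BOX 42590.\nNAIROBI.\nKENYA. KES\n"
)
print(join_paragraph_lines(sample))
```

Because both look-arounds demand an ordinary character, the line endings around an empty line are never touched, so the blank separator lines survive and each record ends up on one line.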
Refer to the post, below, for further information on Regular expressions ;-))
https://notepad-plus-plus.org/community/topic/15765/faq-desk-where-to-find-regex-documentation/1
Cheers,
guy038
-
-
@Terry-R is right. I botched the original try at it because I didn’t notice that the replace action didn’t put a space between the lines that were joined. I totally missed that–TWICE! Sorry about that. (Thanks to Terry for pointing that out!)
@Jesse-Mwangi said:
the regex brings all the data in one line
So the reason that it all ends up on one line is that you have a single blank space on the lines that I interpreted as being totally empty from your sample data. I figured this out by downloading your file (otherwise that would have been quite difficult to discover!).
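Here is a small Python sketch of that failure mode, using made-up placeholder lines and \n line endings. As before, (?:\r\n|\r|\n) is only a stand-in for \R, which Python's re module does not support.

```python
import re

# Stand-in for \R(?!\R); Python's re module does not support \R.
EOL_NOT_BEFORE_EOL = re.compile(r"(?:\r\n|\r|\n)(?![\r\n])")

truly_blank = "ONE\nTWO\n\nTHREE\nFOUR"   # the separator line is empty
space_only = "ONE\nTWO\n \nTHREE\nFOUR"   # the separator line holds one space

# With an empty separator, one line ending per record survives.
print(EOL_NOT_BEFORE_EOL.sub("", truly_blank).split("\n"))
# ['ONETWO', 'THREEFOUR']

# With a space on the separator line, no line ending is ever directly
# followed by another one, so every one is removed.
print(EOL_NOT_BEFORE_EOL.sub("", space_only).split("\n"))
# ['ONETWO THREEFOUR']
```

Trimming the lines first, so that the separator lines become truly empty, avoids the problem.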
@Jesse-Mwangi, so I presume that @guy038's help has you on the right track now…please post again if not and we’ll take another look.