Need Assistance Regarding a Find/Replace

Justin E. Miller

Hello all,

I have a huge list of erratically wrapped text (nearly 10k lines) that I copied from a PDF file, and I am trying to get each entry onto a single line. A sample of the text is as follows:

Jai Quickcall, an expert farmer. Has dancing
brown eyes, curly black hair, a plain face, and a
willowy build. Reputed to be forthright and easygoing.
Spouse: Lael. 2 children
Will Amblehearth, a careless herder. Has dull blue
eyes, cropped black hair, a smooth face, and a
stout build. Reputed to be tolerant though narrowminded.
Spouse: Zoreene. 2 children
Evendur Blackbolt, a competent farmer. Has
languid blue eyes, straight red hair, a full face, and
a small build. Characterized as unfriendly and
stingy. Spouse: Alestra. 1 child
Sern Longsky, a capable farmer. Has wild brown
eyes, cropped brown hair, a thin face, and a lanky
build. Described as reserved and abrupt. Spouse:
Erin. 1 child

I have the regex necessary to match the beginning of the text, which is: ([A-Z])\w+\s+([A-Z])\w+,
I can confirm this works as I built it on RegExr.
What I want to do is, whenever it finds a string matching this format, add a newline before it while retaining the existing text, so that the text becomes:

Jai Quickcall, an expert farmer. Has dancing
brown eyes, curly black hair, a plain face, and a
willowy build. Reputed to be forthright and easygoing.
Spouse: Lael. 2 children

Will Amblehearth, a careless herder. Has dull blue
eyes, cropped black hair, a smooth face, and a
stout build. Reputed to be tolerant though narrowminded.
Spouse: Zoreene. 2 children

Evendur Blackbolt, a competent farmer. Has
languid blue eyes, straight red hair, a full face, and
a small build. Characterized as unfriendly and
stingy. Spouse: Alestra. 1 child

Sern Longsky, a capable farmer. Has wild brown
eyes, cropped brown hair, a thin face, and a lanky
build. Described as reserved and abrupt. Spouse:
Erin. 1 child

From there, I should be able to do a find\replace on the paragraphs to make them into this:

Jai Quickcall, an expert farmer. Has dancing brown eyes, curly black hair, a plain face, and a willowy build. Reputed to be forthright and easygoing. Spouse: Lael. 2 children
Will Amblehearth, a careless herder. Has dull blue eyes, cropped black hair, a smooth face, and a stout build. Reputed to be tolerant though narrowminded. Spouse: Zoreene. 2 children
Evendur Blackbolt, a competent farmer. Has languid blue eyes, straight red hair, a full face, and a small build. Characterized as unfriendly and stingy. Spouse: Alestra. 1 child
Sern Longsky, a capable farmer. Has wild brown eyes, cropped brown hair, a thin face, and a lanky build. Described as reserved and abrupt. Spouse: Erin. 1 child

I performed a find\replace of Find: ([A-Z])\w+\s+([A-Z])\w+, Replace: [\r\n]+([A-Z])\w+\s+([A-Z])\w+, but all that did was give me this:
[
]+[A-Z]w+s+[A-Z]w+, an expert farmer. Has dancing
[
]+[A-Z]w+s+[A-Z]w+, curly [
]+[A-Z]w+s+[A-Z]w+, a [
]+[A-Z]w+s+[A-Z]w+, and a
willowy build. Reputed to be forthright and easygoing.
Spouse: Lael. 2 children

Any idea how I can fix this or, even better, do it in fewer steps?
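
For comparison outside the editor, here is a minimal Python sketch of the same find/replace. It is only an illustration, not a description of the editor's internals. It assumes the garbled result has two causes: the Replace field is a substitution template rather than a regex, so the pattern typed there is inserted more or less literally, and the search apparently ran without match case, so pairs like "black hair," matched as well. The names sample, loose and strict are made up for this example.

```python
import re

sample = (
    "Spouse: Lael. 2 children\n"
    "Will Amblehearth, a careless herder. Has dull blue\n"
    "eyes, cropped black hair, a smooth face, and a\n"
)

# Unanchored and case-insensitive, the pattern also hits ordinary word pairs.
loose = re.compile(r"[A-Z]\w+\s+[A-Z]\w+,", re.IGNORECASE)
loose_hits = loose.findall(sample)

# Anchored to line starts and case-sensitive (Python's default), it only
# hits the name that opens an entry.
strict = re.compile(r"^[A-Z]\w+\s+[A-Z]\w+,", re.MULTILINE)
strict_hits = strict.findall(sample)

# The replacement is a template, not a pattern: \g<0> stands for the
# matched text, so the existing name is kept and a newline goes in front.
separated = strict.sub(r"\n\g<0>", sample)
print(separated)
```

In this sketch \g<0> plays the role of "whatever the search just matched", which is what the replacement needs instead of a second copy of the pattern.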

Terry R

@Justin-E.-Miller said:

A sample text is as follows

Given the sample text I’d just replace all carriage returns/line feeds with a single space. Possibly after this I might want to replace all multiple spaces with a single space, just in case the original text had some extra spaces at line ends.

So now all the text runs together on one line (true even if you have word wrap enabled). Then use a regex to find the text that you want to end each paragraph, and append a carriage return/line feed.

Again from the sample, it would appear that the end of the paragraph starts with “Spouse” and may also contain “child” or “children” at the end.

I'm not entirely sure it will take any fewer steps, but perhaps each regex will be easier to create.
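
The steps above can be sketched in Python roughly as follows. This is a sketch under assumptions, not a tested recipe for the full document: it assumes every entry ends with "Spouse: Name." optionally followed by a child count, and the function name join_then_split and the exact ending pattern are invented for illustration. Entries with no spouse at all would need a different ending pattern.

```python
import re

raw = (
    "Sern Longsky, a capable farmer. Has wild brown\r\n"
    "eyes, cropped brown hair, a thin face, and a lanky\r\n"
    "build. Described as reserved and abrupt. Spouse:\r\n"
    "Erin. 1 child\r\n"
    "Jakov Buckhorn, a talented trapper. Has close-set\r\n"
    "brown eyes, dry brown hair, a sculpted face, and a\r\n"
    "willowy build. Characterized as insecure and\r\n"
    "critical. Spouse: Arveera.\r\n"
)

# Assumed entry ending: "Spouse: Name." plus an optional " N child(ren)".
# The trailing " ?" swallows the space that would otherwise start the
# next line.
ENDING = re.compile(r"(Spouse: \w+\.(?: \d+ child(?:ren)?)?) ?")


def join_then_split(text):
    # Step 1: replace every run of CR/LF characters with a single space.
    one_line = re.sub(r"[\r\n]+", " ", text)
    # Step 2: collapse multiple spaces left over from line ends.
    one_line = re.sub(r" {2,}", " ", one_line).strip()
    # Step 3: append a line break after each entry ending.
    return ENDING.sub(r"\1\n", one_line)


lines = join_then_split(raw).splitlines()
for line in lines:
    print(line)
```

Making the child count optional in the ending pattern is one way to cope with entries that list no children.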

Terry

Justin E. Miller

That worked quite well on a small sample size. I will try that on a full document.

Justin E. Miller

Edit:
The sample I used was fine because every entry in it listed children. But the entries that have no kids listed do not get moved onto their own lines. That leaves 603 of them spread throughout a very large document (and I am repeating this for other sections), which I cannot handle by matching on child/children.

Justin E. Miller

I got it.
Find my requested string:
^([A-Z])\w+\s+([A-Z])\w+,

Replace with:
\n$0

Making sure match case is set.
This separates everything out so that it looks like this:
Bethys Grimlaw, a capable trapper. Has wild blue
eyes, straight blond hair, a plain face, and a lean
build. Described as sympathetic though dull.
Spouse: Senn. 1 child

Laval Heldsheath, a talented herder. Has large
blue eyes, wavy red hair, a smooth face, and a
stocky build. Described as adventurous though
stubborn. Spouse: Mirri. 2 children

Jakov Buckhorn, a talented trapper. Has close-set
brown eyes, dry brown hair, a sculpted face, and a
willowy build. Characterized as insecure and
critical. Spouse: Arveera.

Gothar Longborn, a competent farmer. Has
knowing blue eyes, brittle auburn hair, a
blemished face, and a sinewy build. Said to be
gentle though coy. Spouse: Shatha. 1 child

This eliminates the issue with entries that list no children.

Then do a find\replace of “\r\n”, replacing it with nothing, with the search mode changed to Extended.
This produced the 2500 lines I needed from a document that was originally 9,982 lines long.
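
A rough Python equivalent of these two steps is sketched below. It rests on two assumptions: that the file uses Windows (\r\n) line endings, so the bare \n added in the first step survives the second step and keeps the entries apart, and that the original lines may not end in a space, which is why the sketch joins with a single space instead of nothing, as Terry suggested. ENTRY_START and one_entry_per_line are names made up for this example.

```python
import re

# Same pattern as in the Find field; Python's re is case-sensitive by
# default, which corresponds to having match case set.
ENTRY_START = re.compile(r"^[A-Z]\w+\s+[A-Z]\w+,", re.MULTILINE)


def one_entry_per_line(text):
    # Step 1: put a bare "\n" in front of every entry start.
    marked = ENTRY_START.sub(lambda m: "\n" + m.group(0), text)
    # Step 2: remove the original CRLF breaks. The bare "\n" markers are
    # not matched by "\r\n", so they still separate the entries.
    joined = marked.replace("\r\n", " ")
    # Tidy up: drop empty pieces and stray spaces at the ends.
    return [line.strip() for line in joined.split("\n") if line.strip()]


text = (
    "Bethys Grimlaw, a capable trapper. Has wild blue\r\n"
    "eyes, straight blond hair, a plain face, and a lean\r\n"
    "build. Described as sympathetic though dull.\r\n"
    "Spouse: Senn. 1 child\r\n"
    "Jakov Buckhorn, a talented trapper. Has close-set\r\n"
    "brown eyes, dry brown hair, a sculpted face, and a\r\n"
    "willowy build. Characterized as insecure and\r\n"
    "critical. Spouse: Arveera.\r\n"
)

entries = one_entry_per_line(text)
for entry in entries:
    print(entry)
```

Because the split is driven by where an entry starts, not by how it ends, entries without children need no special handling.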