Need Assistance Regarding a Find/Replace



  • Hello all,

    I have a huge list of erratically wrapped text (nearly 10k lines) that I copied from a PDF file, and I am trying to get each entry onto a single line. A sample of the text is as follows:

    Jai Quickcall, an expert farmer. Has dancing
    brown eyes, curly black hair, a plain face, and a
    willowy build. Reputed to be forthright and easygoing.
    Spouse: Lael. 2 children
    Will Amblehearth, a careless herder. Has dull blue
    eyes, cropped black hair, a smooth face, and a
    stout build. Reputed to be tolerant though narrowminded.
    Spouse: Zoreene. 2 children
    Evendur Blackbolt, a competent farmer. Has
    languid blue eyes, straight red hair, a full face, and
    a small build. Characterized as unfriendly and
    stingy. Spouse: Alestra. 1 child
    Sern Longsky, a capable farmer. Has wild brown
    eyes, cropped brown hair, a thin face, and a lanky
    build. Described as reserved and abrupt. Spouse:
    Erin. 1 child

    I have the regex necessary to match the beginning of each entry, which is: ([A-Z])\w+\s+([A-Z])\w+,
    I can confirm this works, as I built it on RegExr.
    What I want to do is, whenever it finds a string matching this format, add a newline before it while retaining the existing text, so that the text then becomes:

    Jai Quickcall, an expert farmer. Has dancing
    brown eyes, curly black hair, a plain face, and a
    willowy build. Reputed to be forthright and easygoing.
    Spouse: Lael. 2 children

    Will Amblehearth, a careless herder. Has dull blue
    eyes, cropped black hair, a smooth face, and a
    stout build. Reputed to be tolerant though narrowminded.
    Spouse: Zoreene. 2 children

    Evendur Blackbolt, a competent farmer. Has
    languid blue eyes, straight red hair, a full face, and
    a small build. Characterized as unfriendly and
    stingy. Spouse: Alestra. 1 child

    Sern Longsky, a capable farmer. Has wild brown
    eyes, cropped brown hair, a thin face, and a lanky
    build. Described as reserved and abrupt. Spouse:
    Erin. 1 child

    From there, I should be able to do a find\replace on the paragraphs to make them into this:

    Jai Quickcall, an expert farmer. Has dancing brown eyes, curly black hair, a plain face, and a willowy build. Reputed to be forthright and easygoing. Spouse: Lael. 2 children
    Will Amblehearth, a careless herder. Has dull blue eyes, cropped black hair, a smooth face, and a stout build. Reputed to be tolerant though narrowminded. Spouse: Zoreene. 2 children
    Evendur Blackbolt, a competent farmer. Has languid blue eyes, straight red hair, a full face, and a small build. Characterized as unfriendly and stingy. Spouse: Alestra. 1 child
    Sern Longsky, a capable farmer. Has wild brown eyes, cropped brown hair, a thin face, and a lanky build. Described as reserved and abrupt. Spouse: Erin. 1 child

    I performed a find\replace of Find: ([A-Z])\w+\s+([A-Z])\w+, Replace: [\r\n]+([A-Z])\w+\s+([A-Z])\w+, but all that did was give me this:
    [
    ]+[A-Z]w+s+[A-Z]w+, an expert farmer. Has dancing
    [
    ]+[A-Z]w+s+[A-Z]w+, curly [
    ]+[A-Z]w+s+[A-Z]w+, a [
    ]+[A-Z]w+s+[A-Z]w+, and a
    willowy build. Reputed to be forthright and easygoing.
    Spouse: Lael. 2 children

    Any idea how I can fix this or, even better, do it in fewer steps?
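For readers following along outside the editor, here is a minimal Python sketch of the intended first step. It rests on one observation: the Replace field is a substitution string rather than a pattern, so regex syntax typed there is written out mostly literally instead of re-matching the text. The matched text has to be referenced back instead, which Python's `re` module spells `\g<0>`. The sample string, the `ENTRY_START` name, and the added `^` anchor are choices made for this illustration, not part of the original post.

```python
import re

# Two entries from the sample text, wrapped as they came out of the PDF.
sample = (
    "Jai Quickcall, an expert farmer. Has dancing\n"
    "brown eyes, curly black hair, a plain face, and a\n"
    "willowy build. Reputed to be forthright and easygoing.\n"
    "Spouse: Lael. 2 children\n"
    "Will Amblehearth, a careless herder. Has dull blue\n"
    "eyes, cropped black hair, a smooth face, and a\n"
    "stout build. Reputed to be tolerant though narrowminded.\n"
    "Spouse: Zoreene. 2 children\n"
)

# Two capitalised words followed by a comma, anchored to the start of a line.
# Python regexes are case-sensitive by default, so [A-Z] only matches capitals.
ENTRY_START = re.compile(r"^[A-Z]\w+\s+[A-Z]\w+,", re.MULTILINE)

# The replacement is a newline followed by the whole match (\g<0>),
# so the original text is kept and only a blank line is added before it.
separated = ENTRY_START.sub(r"\n\g<0>", sample)

print(separated)
```

Without case sensitivity, `[A-Z]` also matches lowercase letters, which would explain why phrases such as "brown eyes," were hit in the output above.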



  • @Justin-E.-Miller said:

    A sample text is as follows

    Given the sample text I’d just replace all carriage returns/line feeds with a single space. Possibly after this I might want to replace all multiple spaces with a single space, just in case the original text had some extra spaces at line ends.

    So now all the text runs together on 1 line (true even if you have word wrap enabled). Then use a regex to find the text that ends each paragraph and append a carriage return/line feed to it.

    Again from the sample, it would appear that the end of the paragraph starts with “Spouse” and may also contain “child” or “children” at the end.

    I'm not entirely sure this takes any fewer steps, but each regex might be easier to create.

    Terry
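The approach Terry describes can be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the `ENTRY_END` pattern assumes every entry ends with "Spouse: Name." optionally followed by a child count, which holds for the sample but is not confirmed for the full document, and the variable names are invented for the example.

```python
import re

# Two entries from the sample, with Windows line endings as an assumption.
raw = (
    "Evendur Blackbolt, a competent farmer. Has\r\n"
    "languid blue eyes, straight red hair, a full face, and\r\n"
    "a small build. Characterized as unfriendly and\r\n"
    "stingy. Spouse: Alestra. 1 child\r\n"
    "Sern Longsky, a capable farmer. Has wild brown\r\n"
    "eyes, cropped brown hair, a thin face, and a lanky\r\n"
    "build. Described as reserved and abrupt. Spouse:\r\n"
    "Erin. 1 child\r\n"
)

# Step 1: replace every run of CR/LF characters with a single space.
one_line = re.sub(r"[\r\n]+", " ", raw)

# Step 2: collapse runs of spaces left over from trailing spaces at line ends.
one_line = re.sub(r" {2,}", " ", one_line).strip()

# Step 3: end each entry after "Spouse: Name." plus an optional child count.
# The trailing " ?" swallows the separating space so lines do not start with one.
ENTRY_END = re.compile(r"Spouse: \w+\.(?: \d+ child(?:ren)?)? ?")
entries = ENTRY_END.sub(lambda m: m.group(0).rstrip() + "\n", one_line)

print(entries)
```

Making the child count optional means an entry that stops at the spouse's name is still terminated, as long as the "Spouse: Name." part is present.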



  • That worked quite well on a small sample size. I will try that on a full document.



  • Edit:
    The sample I had was fine because every entry in it listed children. But the entries that do not list kids do not get split onto their own lines. That leaves 603 entries spread throughout a very large document (and I am repeating this for other sections) that I cannot handle by matching on Child/Children.



  • I got it.
    Find my requested string:
    ^([A-Z])\w+\s+([A-Z])\w+,

    Replace with:
    \n$0

    Making sure match case is set.
    This separates everything out so that it looks like this:
    Bethys Grimlaw, a capable trapper. Has wild blue
    eyes, straight blond hair, a plain face, and a lean
    build. Described as sympathetic though dull.
    Spouse: Senn. 1 child

    Laval Heldsheath, a talented herder. Has large
    blue eyes, wavy red hair, a smooth face, and a
    stocky build. Described as adventurous though
    stubborn. Spouse: Mirri. 2 children

    Jakov Buckhorn, a talented trapper. Has close-set
    brown eyes, dry brown hair, a sculpted face, and a
    willowy build. Characterized as insecure and
    critical. Spouse: Arveera.

    Gothar Longborn, a competent farmer. Has
    knowing blue eyes, brittle auburn hair, a
    blemished face, and a sinewy build. Said to be
    gentle though coy. Spouse: Shatha. 1 child

    This eliminates the issue with entries that have no children.

    Then do a find\replace of “\r\n”, replacing it with nothing, after changing to extended search mode.
    This produced the 2500 lines I needed from a document that was originally 9,982 lines long.
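The same result can be reached in a single pass outside the editor. The Python sketch below is a hedged illustration, not the method used above: the function name `one_entry_per_line` is hypothetical, it joins wrapped lines with a space (which also guards against words running together if the source lines have no trailing spaces), and it assumes the file uses `\r\n` line endings. The editor version presumably works because the `\n` inserted in the first step is a bare line feed, so it survives when every `\r\n` pair is removed.

```python
import re

# Start of an entry: two capitalised words followed by a comma at line start.
ENTRY_START = re.compile(r"^[A-Z]\w+\s+[A-Z]\w+,", re.MULTILINE)


def one_entry_per_line(text: str) -> list[str]:
    """Return each entry as one string, with wrapped lines joined by spaces."""
    # Positions where entries begin; text before the first match is ignored.
    starts = [m.start() for m in ENTRY_START.finditer(text)]
    bounds = starts + [len(text)]
    # str.split() with no arguments drops \r, \n and repeated spaces alike.
    return [" ".join(text[a:b].split()) for a, b in zip(bounds, bounds[1:])]


sample = (
    "Jakov Buckhorn, a talented trapper. Has close-set\r\n"
    "brown eyes, dry brown hair, a sculpted face, and a\r\n"
    "willowy build. Characterized as insecure and\r\n"
    "critical. Spouse: Arveera.\r\n"
    "Gothar Longborn, a competent farmer. Has\r\n"
    "knowing blue eyes, brittle auburn hair, a\r\n"
    "blemished face, and a sinewy build. Said to be\r\n"
    "gentle though coy. Spouse: Shatha. 1 child\r\n"
)

entries = one_entry_per_line(sample)
print("\n".join(entries))
```

Because the split is driven by where entries start rather than how they end, an entry with no children listed needs no special handling.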

