    Need Assistance Regarding a Find/Replace

    • Justin E. Miller

      Hello all,

      I have a huge block of erratically wrapped text (nearly 10k lines), copied from a PDF file, and I am trying to get each entry onto a single line. A sample of the text follows:

      Jai Quickcall, an expert farmer. Has dancing
      brown eyes, curly black hair, a plain face, and a
      willowy build. Reputed to be forthright and easygoing.
      Spouse: Lael. 2 children
      Will Amblehearth, a careless herder. Has dull blue
      eyes, cropped black hair, a smooth face, and a
      stout build. Reputed to be tolerant though narrowminded.
      Spouse: Zoreene. 2 children
      Evendur Blackbolt, a competent farmer. Has
      languid blue eyes, straight red hair, a full face, and
      a small build. Characterized as unfriendly and
      stingy. Spouse: Alestra. 1 child
      Sern Longsky, a capable farmer. Has wild brown
      eyes, cropped brown hair, a thin face, and a lanky
      build. Described as reserved and abrupt. Spouse:
      Erin. 1 child

      I have a regex that matches the beginning of each entry: ([A-Z])\w+\s+([A-Z])\w+,
      I can confirm it works, as I built it on RegExr.
      What I want is to insert a newline before every string matching this pattern while keeping the existing text, so that the text becomes:

      Jai Quickcall, an expert farmer. Has dancing
      brown eyes, curly black hair, a plain face, and a
      willowy build. Reputed to be forthright and easygoing.
      Spouse: Lael. 2 children

      Will Amblehearth, a careless herder. Has dull blue
      eyes, cropped black hair, a smooth face, and a
      stout build. Reputed to be tolerant though narrowminded.
      Spouse: Zoreene. 2 children

      Evendur Blackbolt, a competent farmer. Has
      languid blue eyes, straight red hair, a full face, and
      a small build. Characterized as unfriendly and
      stingy. Spouse: Alestra. 1 child

      Sern Longsky, a capable farmer. Has wild brown
      eyes, cropped brown hair, a thin face, and a lanky
      build. Described as reserved and abrupt. Spouse:
      Erin. 1 child

      From there, I should be able to do a find/replace on the paragraphs to turn them into this:

      Jai Quickcall, an expert farmer. Has dancing brown eyes, curly black hair, a plain face, and a willowy build. Reputed to be forthright and easygoing. Spouse: Lael. 2 children
      Will Amblehearth, a careless herder. Has dull blue eyes, cropped black hair, a smooth face, and a stout build. Reputed to be tolerant though narrowminded. Spouse: Zoreene. 2 children
      Evendur Blackbolt, a competent farmer. Has languid blue eyes, straight red hair, a full face, and a small build. Characterized as unfriendly and stingy. Spouse: Alestra. 1 child
      Sern Longsky, a capable farmer. Has wild brown eyes, cropped brown hair, a thin face, and a lanky build. Described as reserved and abrupt. Spouse: Erin. 1 child

      I performed a find/replace with Find: ([A-Z])\w+\s+([A-Z])\w+, and Replace: [\r\n]+([A-Z])\w+\s+([A-Z])\w+, but all that gave me was this:
      [
      ]+[A-Z]w+s+[A-Z]w+, an expert farmer. Has dancing
      [
      ]+[A-Z]w+s+[A-Z]w+, curly [
      ]+[A-Z]w+s+[A-Z]w+, a [
      ]+[A-Z]w+s+[A-Z]w+, and a
      willowy build. Reputed to be forthright and easygoing.
      Spouse: Lael. 2 children

      Any idea how I can fix this or, even better, do it in fewer steps?
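A note on the output above: the Replace field is not a pattern. Its contents are written out as replacement text, so regex syntax placed there comes out garbled instead of matching anything. The replacement has to refer back to what was matched. The sketch below illustrates the idea with Python's `re` module rather than the editor; the `sample` string and the variable names are made up for illustration, `\g<0>` is Python's spelling of "the whole match", and the `^` anchor is an addition that is not in the pattern quoted above.

```python
import re

# Two entries taken from the sample text in the post.
sample = (
    "Jai Quickcall, an expert farmer. Has dancing\n"
    "brown eyes, curly black hair, a plain face, and a\n"
    "willowy build. Reputed to be forthright and easygoing.\n"
    "Spouse: Lael. 2 children\n"
    "Will Amblehearth, a careless herder. Has dull blue\n"
    "eyes, cropped black hair, a smooth face, and a\n"
    "stout build. Reputed to be tolerant though narrowminded.\n"
    "Spouse: Zoreene. 2 children\n"
)

# The pattern from the post, anchored to the start of a line (an added
# precaution) so that only "Name Surname," at a line start is matched.
name_at_line_start = re.compile(r"^([A-Z])\w+\s+([A-Z])\w+,", re.MULTILINE)

# The replacement is a newline followed by the matched text itself.
separated = name_at_line_start.sub(r"\n\g<0>", sample)
print(separated)
```

The matched name is kept because the replacement re-inserts it; only the extra newline is new.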

      • Terry R

        @Justin-E.-Miller said:

        A sample text is as follows

        Given the sample text, I’d just replace all carriage returns/line feeds with a single space. After that, I might also replace runs of multiple spaces with a single space, in case the original text had extra spaces at line ends.

        Now all the text runs together on one line (true even if you have word wrap enabled). Then use a regex to find the text that ends each paragraph and append a carriage return/line feed to it.

        Again from the sample, it appears that the end of each paragraph starts with “Spouse” and may also contain “child” or “children” at the very end.

        I’m not entirely sure this takes fewer steps, but each regex might be easier to create.

        Terry
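The flatten-then-split approach described above can be sketched as follows, in Python's `re` module rather than the editor. This is only an illustration under stated assumptions: the `entry_end` pattern is hypothetical and assumes every entry ends with "Spouse: Name.", optionally followed by a child count, which holds for the sample but may not hold for the whole document.

```python
import re

# Two entries from the thread; the second one lists no children.
sample = (
    "Sern Longsky, a capable farmer. Has wild brown\n"
    "eyes, cropped brown hair, a thin face, and a lanky\n"
    "build. Described as reserved and abrupt. Spouse:\n"
    "Erin. 1 child\n"
    "Jakov Buckhorn, a talented trapper. Has close-set\n"
    "brown eyes, dry brown hair, a sculpted face, and a\n"
    "willowy build. Characterized as insecure and\n"
    "critical. Spouse: Arveera.\n"
)

# Step 1: turn every line break into a single space.
flat = re.sub(r"[\r\n]+", " ", sample)

# Step 2: collapse runs of spaces left over from the line ends.
flat = re.sub(r" {2,}", " ", flat).strip()

# Step 3: break after the end of each entry. Assumed end marker:
# "Spouse: Name." plus an optional " N child" / " N children".
# The trailing " ?" swallows the space that would otherwise start the next line.
entry_end = re.compile(r"(Spouse: \w+\.(?: \d+ child(?:ren)?)?) ?")
result = entry_end.sub(r"\1\n", flat)

entries = result.splitlines()
print(entries)
```

Treating the child count as optional is a design choice of this sketch; keying only on "child"/"children" would miss entries that list none.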

        • Justin E. Miller

          That worked quite well on a small sample. I will try it on the full document.

          • Justin E. Miller

            Edit:
            The sample I had was fine because every entry in it listed children. But entries with no kids listed never get a line break. That leaves 603 of them spread throughout a very large document (and I am repeating this for other sections), which I cannot handle using child/children.

            • Justin E. Miller

              I got it.
              Find my requested string:
              ^([A-Z])\w+\s+([A-Z])\w+,

              Replace with:
              \n$0

              Make sure “Match case” is set.
              This separates the entries so that they look like this:
              Bethys Grimlaw, a capable trapper. Has wild blue
              eyes, straight blond hair, a plain face, and a lean
              build. Described as sympathetic though dull.
              Spouse: Senn. 1 child

              Laval Heldsheath, a talented herder. Has large
              blue eyes, wavy red hair, a smooth face, and a
              stocky build. Described as adventurous though
              stubborn. Spouse: Mirri. 2 children

              Jakov Buckhorn, a talented trapper. Has close-set
              brown eyes, dry brown hair, a sculpted face, and a
              willowy build. Characterized as insecure and
              critical. Spouse: Arveera.

              Gothar Longborn, a competent farmer. Has
              knowing blue eyes, brittle auburn hair, a
              blemished face, and a sinewy build. Said to be
              gentle though coy. Spouse: Shatha. 1 child

              This eliminates the issue with entries that have no children.

              Then do a find/replace of “\r\n” with nothing, in Extended search mode.
              This produced 2500 lines in the format I needed, from a document that was originally 9,982 lines long.
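The two steps above can be sketched outside the editor as well. The version below uses Python's `re` module and rests on two labelled assumptions: the file has Windows (`\r\n`) line endings, so the bare `\n` inserted in step 1 survives step 2, and each original break is replaced by a space rather than by nothing, a deliberate deviation in case the wrapped lines carry no trailing space. Python's `re` is case-sensitive by default, which plays the role of the Match case setting.

```python
import re

# Two entries from the thread, joined with "\r\n" (assumed line ending).
lines = [
    "Jakov Buckhorn, a talented trapper. Has close-set",
    "brown eyes, dry brown hair, a sculpted face, and a",
    "willowy build. Characterized as insecure and",
    "critical. Spouse: Arveera.",
    "Gothar Longborn, a competent farmer. Has",
    "knowing blue eyes, brittle auburn hair, a",
    "blemished face, and a sinewy build. Said to be",
    "gentle though coy. Spouse: Shatha. 1 child",
]
text = "\r\n".join(lines) + "\r\n"

# Step 1: put a bare "\n" in front of every line starting with "Name Surname,".
marked = re.sub(
    r"^([A-Z])\w+\s+([A-Z])\w+,", r"\n\g<0>", text, flags=re.MULTILINE
)

# Step 2: remove the original "\r\n" breaks; the bare "\n" markers remain.
# A space is substituted so that wrapped words do not fuse together.
joined = marked.replace("\r\n", " ")

# Split on the surviving markers and drop empty leftovers.
entries = [e.strip() for e in joined.split("\n") if e.strip()]
for entry in entries:
    print(entry)
```

If the source file used bare `\n` line endings instead, step 2 would need a different marker for the inserted breaks, since the two kinds of break would no longer be distinguishable.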
