Need Assistance Regarding a Find/Replace

Help wanted
regex
5 Posts 2 Posters 709 Views
    Justin E. Miller
    last edited by Jun 11, 2019, 2:44 AM

    Hello all,

    I have a huge list of erratically wrapped text (nearly 10k lines), copied from a PDF file, and I am trying to get each entry onto a single line. Here is a sample:

    Jai Quickcall, an expert farmer. Has dancing
    brown eyes, curly black hair, a plain face, and a
    willowy build. Reputed to be forthright and easygoing.
    Spouse: Lael. 2 children
    Will Amblehearth, a careless herder. Has dull blue
    eyes, cropped black hair, a smooth face, and a
    stout build. Reputed to be tolerant though narrowminded.
    Spouse: Zoreene. 2 children
    Evendur Blackbolt, a competent farmer. Has
    languid blue eyes, straight red hair, a full face, and
    a small build. Characterized as unfriendly and
    stingy. Spouse: Alestra. 1 child
    Sern Longsky, a capable farmer. Has wild brown
    eyes, cropped brown hair, a thin face, and a lanky
    build. Described as reserved and abrupt. Spouse:
    Erin. 1 child

    I have a regex that matches the beginning of each entry: ([A-Z])\w+\s+([A-Z])\w+,
    I can confirm it works; I built and tested it on RegExr.
    Whenever it finds a string matching this pattern, I want to insert a newline before it while keeping the matched text, so that the text becomes:

    Jai Quickcall, an expert farmer. Has dancing
    brown eyes, curly black hair, a plain face, and a
    willowy build. Reputed to be forthright and easygoing.
    Spouse: Lael. 2 children

    Will Amblehearth, a careless herder. Has dull blue
    eyes, cropped black hair, a smooth face, and a
    stout build. Reputed to be tolerant though narrowminded.
    Spouse: Zoreene. 2 children

    Evendur Blackbolt, a competent farmer. Has
    languid blue eyes, straight red hair, a full face, and
    a small build. Characterized as unfriendly and
    stingy. Spouse: Alestra. 1 child

    Sern Longsky, a capable farmer. Has wild brown
    eyes, cropped brown hair, a thin face, and a lanky
    build. Described as reserved and abrupt. Spouse:
    Erin. 1 child
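A minimal Python sketch of the insertion described above, offered as an illustration rather than a Notepad++ recipe. It assumes the text is held in a Python string with \n line endings (the variable names are illustrative). The replacement refers back to the whole match (\g<0> in Python, comparable to $0 in Notepad++), so only a newline is added and the name is kept; ^ together with re.MULTILINE limits matches to the start of a line.

```python
import re

# Pattern from the post: two capitalized words followed by a comma.
# ^ plus re.MULTILINE restricts matches to the start of a line.
ENTRY_START = re.compile(r"^([A-Z])\w+\s+([A-Z])\w+,", re.MULTILINE)

sample = (
    "Jai Quickcall, an expert farmer. Has dancing\n"
    "brown eyes, curly black hair, a plain face, and a\n"
    "willowy build. Reputed to be forthright and easygoing.\n"
    "Spouse: Lael. 2 children\n"
    "Will Amblehearth, a careless herder. Has dull blue\n"
    "eyes, cropped black hair, a smooth face, and a\n"
    "stout build. Reputed to be tolerant though narrowminded.\n"
    "Spouse: Zoreene. 2 children\n"
)

# \g<0> re-inserts the whole match, so only a newline is added in front of it.
# lstrip drops the extra newline placed before the very first entry.
separated = ENTRY_START.sub(r"\n\g<0>", sample).lstrip("\n")
print(separated)
```

Each entry after the first ends up preceded by a blank line, which is the intermediate layout shown above.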

    From there, I should be able to do a find/replace on the paragraphs to turn them into this:

    Jai Quickcall, an expert farmer. Has dancing brown eyes, curly black hair, a plain face, and a willowy build. Reputed to be forthright and easygoing. Spouse: Lael. 2 children
    Will Amblehearth, a careless herder. Has dull blue eyes, cropped black hair, a smooth face, and a stout build. Reputed to be tolerant though narrowminded. Spouse: Zoreene. 2 children
    Evendur Blackbolt, a competent farmer. Has languid blue eyes, straight red hair, a full face, and a small build. Characterized as unfriendly and stingy. Spouse: Alestra. 1 child
    Sern Longsky, a capable farmer. Has wild brown eyes, cropped brown hair, a thin face, and a lanky build. Described as reserved and abrupt. Spouse: Erin. 1 child
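Once the entries are separated by blank lines, this second step amounts to joining each paragraph's lines with spaces. A Python sketch, assuming the intermediate text is already in a string (the variable names are illustrative):

```python
import re

# Intermediate text: entries separated by a blank line.
separated = (
    "Evendur Blackbolt, a competent farmer. Has\n"
    "languid blue eyes, straight red hair, a full face, and\n"
    "a small build. Characterized as unfriendly and\n"
    "stingy. Spouse: Alestra. 1 child\n"
    "\n"
    "Sern Longsky, a capable farmer. Has wild brown\n"
    "eyes, cropped brown hair, a thin face, and a lanky\n"
    "build. Described as reserved and abrupt. Spouse:\n"
    "Erin. 1 child\n"
)

# Split on blank lines, then join each paragraph's words with single spaces
# (str.split() with no argument also swallows the line breaks).
paragraphs = re.split(r"\n\s*\n", separated.strip())
one_per_line = "\n".join(" ".join(p.split()) for p in paragraphs)
print(one_per_line)
```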

    I performed a find/replace with
    Find: ([A-Z])\w+\s+([A-Z])\w+,
    Replace: [\r\n]+([A-Z])\w+\s+([A-Z])\w+,
    but all it gave me was this:
    [
    ]+[A-Z]w+s+[A-Z]w+, an expert farmer. Has dancing
    [
    ]+[A-Z]w+s+[A-Z]w+, curly [
    ]+[A-Z]w+s+[A-Z]w+, a [
    ]+[A-Z]w+s+[A-Z]w+, and a
    willowy build. Reputed to be forthright and easygoing.
    Spouse: Lael. 2 children
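Judging from the output, two things went wrong. The replace field is not a regex, so [\r\n]+ and \w come out as literal text (a line break, brackets, and the letter w). The find pattern also matched lowercase pairs such as "brown eyes,", which is what happens if Match case was unchecked, since [A-Z] then matches lowercase letters too. The Python sketch below reproduces the matching, using re.IGNORECASE as an assumed stand-in for Match case being off:

```python
import re

PATTERN = r"([A-Z])\w+\s+([A-Z])\w+,"

entry = (
    "Jai Quickcall, an expert farmer. Has dancing\n"
    "brown eyes, curly black hair, a plain face, and a\n"
    "willowy build. Reputed to be forthright and easygoing.\n"
    "Spouse: Lael. 2 children"
)

# Case-insensitive and unanchored: [A-Z] also matches lowercase letters.
loose = [m.group(0) for m in re.finditer(PATTERN, entry, re.IGNORECASE)]

# Case-sensitive and anchored to line starts.
strict = [m.group(0) for m in re.finditer("^" + PATTERN, entry, re.MULTILINE)]

print(loose)   # ['Jai Quickcall,', 'brown eyes,', 'black hair,', 'plain face,']
print(strict)  # ['Jai Quickcall,']
```

The four loose matches line up with the four places where the broken replacement text appears in the output above.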

    Any idea how I can fix this, or even better, do it in fewer steps?

      Terry R
      last edited by Jun 11, 2019, 3:17 AM

      @Justin-E.-Miller said:

      A sample text is as follows

      Given the sample text, I’d just replace all carriage return/line feed pairs with a single space. After that I might also replace runs of multiple spaces with a single space, in case the original text had extra spaces at line ends.
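A minimal Python sketch of these two replacements, assuming CRLF line endings and a stray trailing space on one line. In Notepad++ they would presumably correspond to replacing \r\n with a space (Extended mode) and then replacing two or more spaces with one (Regular expression mode).

```python
import re

# First sample entry with Windows line endings; note the trailing space
# after "dancing", which is the case the second replacement guards against.
wrapped = (
    "Jai Quickcall, an expert farmer. Has dancing \r\n"
    "brown eyes, curly black hair, a plain face, and a\r\n"
    "willowy build. Reputed to be forthright and easygoing.\r\n"
    "Spouse: Lael. 2 children\r\n"
)

# Replace every line break (CRLF or bare LF) with a single space.
flat = re.sub(r"\r?\n", " ", wrapped)

# Collapse runs of spaces, then trim the space left by the final line break.
flat = re.sub(r" {2,}", " ", flat).strip()
print(flat)
```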

      Now all the text runs together on one line (even if word wrap displays it across several). Then use a regex to find the text that ends each paragraph and append a carriage return/line feed after it.

      Again going by the sample, the last part of each paragraph appears to start with “Spouse” and may end with “child” or “children”.
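A Python sketch of this second step on already-flattened text. The end-of-entry pattern (a number followed by "child" or "children") is an assumption drawn from the sample; the match is kept and a line break appended after it. The second entry here is taken from later in the thread and lists no children.

```python
import re

flat = (
    "Jai Quickcall, an expert farmer. Has dancing brown eyes, curly black hair, "
    "a plain face, and a willowy build. Reputed to be forthright and easygoing. "
    "Spouse: Lael. 2 children "
    "Jakov Buckhorn, a talented trapper. Has close-set brown eyes, dry brown hair, "
    "a sculpted face, and a willowy build. Characterized as insecure and critical. "
    "Spouse: Arveera. "
    "Sern Longsky, a capable farmer. Has wild brown eyes, cropped brown hair, "
    "a thin face, and a lanky build. Described as reserved and abrupt. "
    "Spouse: Erin. 1 child"
)

# Keep "N child" / "N children", drop any whitespace after it, add a line break.
split_text = re.sub(r"(\d+ child(?:ren)?)\s*", r"\1\n", flat)
lines = split_text.rstrip("\n").split("\n")
for line in lines:
    print(line)
```

Only two lines come out: the Jakov Buckhorn entry, which ends at "Spouse: Arveera.", is never split from the entry after it. This is the limitation that comes up later in the thread.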

      I’m not sure it takes any fewer steps, but each regex may be easier to write.

      Terry

        Justin E. Miller
        last edited by Jun 11, 2019, 3:30 AM

        That worked quite well on a small sample. I will try it on the full document.

          Justin E. Miller
          last edited by Jun 11, 2019, 3:54 AM

          Edit:
          My sample worked because every entry in it listed children. Entries with no children listed don't get moved onto their own lines. That leaves 603 of them spread throughout a very large document (and I am repeating this for other sections), which I cannot fix by matching “child”/“children”.

            Justin E. Miller
            last edited by Jun 11, 2019, 4:15 AM

            I got it.
            Find my original pattern, now anchored to the start of a line:
            ^([A-Z])\w+\s+([A-Z])\w+,

            Replace with:
            \n$0

            Make sure Match case is checked.
            This separates everything out so that it looks like this:
            Bethys Grimlaw, a capable trapper. Has wild blue
            eyes, straight blond hair, a plain face, and a lean
            build. Described as sympathetic though dull.
            Spouse: Senn. 1 child

            Laval Heldsheath, a talented herder. Has large
            blue eyes, wavy red hair, a smooth face, and a
            stocky build. Described as adventurous though
            stubborn. Spouse: Mirri. 2 children

            Jakov Buckhorn, a talented trapper. Has close-set
            brown eyes, dry brown hair, a sculpted face, and a
            willowy build. Characterized as insecure and
            critical. Spouse: Arveera.

            Gothar Longborn, a competent farmer. Has
            knowing blue eyes, brittle auburn hair, a
            blemished face, and a sinewy build. Said to be
            gentle though coy. Spouse: Shatha. 1 child

            This eliminates the problem with entries that list no children.

            Then find “\r\n” and replace it with nothing, with Search Mode set to Extended.
            This gave me 2,500 lines in the format I needed, from a document that was originally 9,982 lines long.
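A Python emulation of the two-step fix, assuming the pasted text has Windows CRLF line endings. The trick appears to be that \n in the Notepad++ replacement inserts a bare LF, which the second step's search for \r\n leaves alone, so only the inserted breaks survive. One deliberate deviation: the thread replaces \r\n with nothing, which would fuse the last word of one line with the first word of the next unless the PDF lines carried trailing spaces; this sketch replaces it with a space and trims afterwards.

```python
import re

# Two entries from the thread with CRLF line endings; the second has no children.
# Assumption: the wrapped lines do not end with a trailing space.
raw = (
    "Laval Heldsheath, a talented herder. Has large\r\n"
    "blue eyes, wavy red hair, a smooth face, and a\r\n"
    "stocky build. Described as adventurous though\r\n"
    "stubborn. Spouse: Mirri. 2 children\r\n"
    "Jakov Buckhorn, a talented trapper. Has close-set\r\n"
    "brown eyes, dry brown hair, a sculpted face, and a\r\n"
    "willowy build. Characterized as insecure and\r\n"
    "critical. Spouse: Arveera.\r\n"
)

# Step 1 (case-sensitive, anchored): put a bare LF before each entry start.
# \g<0> plays the role of $0 in the Notepad++ replacement.
step1 = re.sub(r"^([A-Z])\w+\s+([A-Z])\w+,", r"\n\g<0>", raw, flags=re.MULTILINE)

# Step 2: remove the original CRLF pairs (a space here, instead of nothing).
step2 = step1.replace("\r\n", " ")

# Only the bare LFs from step 1 remain as line breaks; trim and drop empties.
entries = [line.strip() for line in step2.split("\n") if line.strip()]
for entry in entries:
    print(entry)
```

Because the entries are split on their starting names rather than on how they end, the entry ending in "Spouse: Arveera." comes out on its own line like the others.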

            The Community of users of the Notepad++ text editor.
            Powered by NodeBB | Contributors