Breaking lines after full stops, not dots



  • Hi All,

    I hope the title makes sense.

    So I have a very long text and I’d like to display it sentence by sentence, so that every line contains exactly one sentence.
    With most punctuation marks I have no problem. If my idea is right, I only need to use
    CTRL+A then CTRL+J (making the whole text a single line)
    and then work with the following replacements:

    Find what: ?
    Replace with: ?\r\n

    or

    Find what: !
    Replace with: !\r\n

    with Extended search mode.

    But I have an issue with dots. The text has many occurrences of Mr., Mrs., Ms., other abbreviations, and dates, which make the above-mentioned process useless for most of the sentences.

    Is there any solution regarding this?

    It’s a very long text, so searching for every possible instance of a problematic word is not feasible, not to mention that I don’t even know all the problematic terms beforehand.

    Thank you so much and have a nice day,
    Viktoria



  • @Viktoria-Ontapado said in Breaking lines after full stops, not dots:

    Is there any solution regarding this?

    Not in general.

    However, if you are willing to change your workflow, and meet certain conditions, there may be a solution. (see below)

    -----
    First, an aside: your original solution won’t work for ? or ! if they are within quotes, because the line break would be inserted before the closing quote, such as

    I said, “No!”
    She asked, “Why not?”

    Further, even when no quotes are involved, each new line may start with an extra space, because the space that followed the punctuation mark is left behind. For example,

    This is exciting! Really? Okay.
    EOT
    

    would have become

    This is exciting!
     Really?
     Okay.
    EOT
    

    with extra spaces before Really? and Okay.
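
    The leftover-space effect can be reproduced outside Notepad++. The following is only a rough Python sketch of the two Extended-mode replacements from the question (the variable names are made up for illustration), followed by a regex variant that also swallows the spaces after the punctuation mark:

```python
import re

line = "This is exciting! Really? Okay."

# Plain text replacements, like the Extended-mode ones in the question:
# the space after each ? or ! stays behind and starts the next line.
naive = line.replace("?", "?\r\n").replace("!", "!\r\n")
naive_lines = naive.split("\r\n")

# Regex variant: capture the ? or ! and also consume any spaces after it.
trimmed = re.sub(r"([?!])\x20*", r"\1\r\n", line)
trimmed_lines = trimmed.split("\r\n")

print(naive_lines)
print(trimmed_lines)
```

    The first list still carries the leading spaces; the second does not.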
    -----

    Back to your question.

    If you have text like the following, where, if there are already sentence-ending full-stops, they have two spaces after them (which once-upon-a-time was the standard):

    123456789x123456789x123456789x123456789x123456789x123456789x123456789x.
    This is a sentence.  Are you sure, Dr. Somebody?  Yes, quite sure.
    This is a new line.
    This line talks to Mr. Rogers without anyone in the neighborhood.
    This is the end.
    

    Then instead of Ctrl+A Ctrl+J, start with a first replacement to change newlines (\r\n) to two spaces (\x20\x20: I use the hex notation for the space character, to make the number of space characters obvious) in Extended search mode (or Regular expression mode). For the second replacement, I would recommend using Regular expression search mode, with:

    • FIND = (\.(?=\x20\x20+)|[?!])\x20*
    • REPLACE = $1\r\n

    The outer parentheses are used to put either the . or ? or ! into group#1 (the $1 referenced in the REPLACE). The inner parentheses (?=\x20\x20+) will require at least two spaces after the dot, but won’t keep those spaces in the group#1; the vertical bar | is an alternation symbol (“or”); [?!] says “match either ? or !”. The final \x20* matches zero or more space characters that come after the period, question mark, or exclamation point. The example text becomes:

    123456789x123456789x123456789x123456789x123456789x123456789x123456789x.
    This is a sentence.
    Are you sure, Dr. Somebody?
    Yes, quite sure.
    This is a new line.
    This line talks to Mr. Rogers without anyone in the neighborhood.
    This is the end.
    

    … which is, I think, what you want.
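
    For anyone who wants to try the same two steps outside Notepad++, here is a minimal Python sketch. It assumes Windows line endings (\r\n) and uses Python's re module rather than Notepad++'s regex engine, so treat it as an approximation of the replacements above; the function name split_sentences is just an illustrative choice.

```python
import re

sample = (
    "123456789x123456789x123456789x123456789x123456789x123456789x123456789x.\r\n"
    "This is a sentence.  Are you sure, Dr. Somebody?  Yes, quite sure.\r\n"
    "This is a new line.\r\n"
    "This line talks to Mr. Rogers without anyone in the neighborhood.\r\n"
    "This is the end.\r\n"
)

def split_sentences(text):
    # Step 1: turn every newline into two spaces, so a dot that ended
    # a line looks like a dot followed by a double space.
    joined = text.replace("\r\n", "\x20\x20")
    # Step 2: break after a dot followed by 2+ spaces, or after ? or !,
    # and drop the spaces that follow the punctuation mark.
    return re.sub(r"(\.(?=\x20\x20+)|[?!])\x20*", r"\1\r\n", joined)

result = split_sentences(sample)
print(result)
```

    Dots followed by a single space (Dr., Mr.) are left alone, which is the whole point of the lookahead, and also the reason this only helps when real sentence endings use two spaces.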



  • Thank you for your detailed answer @PeterJones, I really appreciate it.
    I reproduced your exact tip with success. Though your solution is exactly what I’d like to achieve, unfortunately my texts don’t have two spaces after full stops, only one.

    So at the moment, I get this:

    123456789x123456789x123456789x123456789x123456789x123456789x123456789x.
    This is a sentence. Are you sure, Dr. Somebody?
    Yes, quite sure.
    This is a new line.
    This line talks to Mr. Rogers without anyone in the neighborhood.
    This is the end.
    

    Am I out of luck at this point?
    Can I manipulate the original text somehow so that my sentences ‘close’ with two spaces instead of one, so I can use your implementation?



  • @Viktoria-Ontapado said in Breaking lines after full stops, not dots:

    Am I out of luck at this point?

    Sorry, yes.

    There is no regex in existence that can differentiate between the period in “Like Harry Potter, I live on Privet Dr. Where do you live?” and “I talked with Dr. Where in the emergency room, and was boggled by her strange name” (and all the possible variants of abbreviations vs sentence ender confusions). That would take a very-well-trained language AI to differentiate – or a human brain.

    Basically, for the dots, you are going to have to look for “dot space”, and then manually decide on each whether to hit REPLACE with .\r\n or FIND NEXT to skip that instance. If you do it after the regex sequence I recommended, then at least some of the sentence-ending dots will already be at the end of the line (assuming some of your sentences ended on a line originally, so were replaced with dot-double-space).
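
    If it helps to see how much manual work is left, the “dot space” candidates can be listed with a bit of surrounding context before stepping through them. This is only a hypothetical Python helper (the name dot_space_candidates and the context width are my own choices, not a Notepad++ feature); it does not decide anything, it only shows each spot a human still has to judge.

```python
import re

def dot_space_candidates(text, context=15):
    """Return (offset, snippet) for every dot that is followed by a space."""
    found = []
    for match in re.finditer(r"\.\x20", text):
        start = max(0, match.start() - context)
        end = min(len(text), match.end() + context)
        found.append((match.start(), text[start:end]))
    return found

example = "Like Harry Potter, I live on Privet Dr. Where do you live?"
candidates = dot_space_candidates(example)
for offset, snippet in candidates:
    print(offset, repr(snippet))
```

    Each listed spot still needs a REPLACE or FIND NEXT decision by hand, since nothing in the text itself says whether “Dr.” ends the sentence here.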

    That’s the best I can do for you.



  • @PeterJones

    I see. Again, thank you for your explanation and assistance and have a nice weekend!

