    Breaking lines after full stops, not dots

    • Viktoria Ontapado

      Hi All,

      I hope the title makes sense.

      I have a very long text and I’d like to display it sentence by sentence, so that each line contains exactly one sentence.
      Most punctuation marks are no problem: if my idea is right, I only need to use
      CTRL+A then CTRL+J (making the whole text a single line)
      and working with the following formulas:

      Find what: ?
      Replace with: ?\r\n

      or

      Find what: !
      Replace with: !\r\n

      with Extended search mode.

      But I have an issue with dots. The text has many occurrences of Mr., Mrs., Ms., other abbreviations, and dates, which makes the above-mentioned process useless for most of the sentences.

      Is there any solution regarding this?

      It’s a very long text, so searching for every possible instance of a problematic word is not feasible, not to mention that I don’t even know all the problematic terms beforehand.

      Thank you so much and have a nice day,
      Viktoria

      • PeterJones

        @Viktoria-Ontapado said in Breaking lines after full stops, not dots:

        Is there any solution regarding this?

        Not in general.

        However, if you are willing to change your workflow, and meet certain conditions, there may be a solution. (see below)

        -----
        First, an aside: your original solution won’t work for ? or ! if they are within quotes, such as

        I said, “No!”
        She asked, “Why not?”

        Further, even when no quotes are involved, each new line may start with an extra space, because the space that followed the punctuation mark is left in place.

        This is exciting! Really? Okay.
        EOT
        

        would have become

        This is exciting!
         Really?
         Okay.
        EOT
        

        with extra spaces before Really? and Okay.
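
The leading-space effect can be reproduced outside the editor. This is only a minimal Python sketch, not anything Notepad++ runs itself: the variable names are made up for illustration, and it writes \n instead of \r\n to keep the output short. The first replacement is literal, like Extended search mode; the second also consumes the spaces after the mark.

```python
import re

text = "This is exciting! Really? Okay."

# Literal replacement, like Extended search mode: the space after each mark is kept.
naive = text.replace("!", "!\n").replace("?", "?\n")

# Regex replacement that also consumes the spaces following the mark.
trimmed = re.sub(r"([?!])\x20*", r"\1\n", text)

print(naive.split("\n"))
print(trimmed.split("\n"))
```

The first list shows the stray space at the start of the second and third lines; the second list does not.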
        -----

        Back to your question.

        If you have text like the following, where, if there are already sentence-ending full-stops, they have two spaces after them (which once-upon-a-time was the standard):

        123456789x123456789x123456789x123456789x123456789x123456789x123456789x.
        This is a sentence.  Are you sure, Dr. Somebody?  Yes, quite sure.
        This is a new line.
        This line talks to Mr. Rogers without anyone in the neighborhood.
        This is the end.
        

        Then instead of Ctrl+A Ctrl+J, start with a first replacement to change newlines (\r\n) to two spaces (\x20\x20: I use the hex notation for the space character, to make the number of space characters obvious) in Extended search mode (or Regular expression mode). For the second replacement, I would recommend using Regular expression search mode, with:

        • FIND = (\.(?=\x20\x20+)|[?!])\x20*
        • REPLACE = $1\r\n

        The outer parentheses are used to put either the . or ? or ! into group#1 (the $1 referenced in the REPLACE). The inner parentheses (?=\x20\x20+) are a lookahead that requires at least two spaces after the dot, but won’t keep those spaces in group#1; the vertical bar | is an alternation symbol (“or”); [?!] says “match either ? or !”. The final \x20* matches zero or more spaces that come after the period, question mark, or exclamation point, so the replacement removes them. The example text becomes:

        123456789x123456789x123456789x123456789x123456789x123456789x123456789x.
        This is a sentence.
        Are you sure, Dr. Somebody?
        Yes, quite sure.
        This is a new line.
        This line talks to Mr. Rogers without anyone in the neighborhood.
        This is the end.
        

        … which is, I think, what you want.
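
For anyone who wants to try the same two-step replacement outside Notepad++, here is a rough Python sketch. It is an assumption-laden port, not the editor's own behaviour: the function name split_sentences is hypothetical, Python's re module stands in for the Notepad++ regex engine, and \1\n replaces $1\r\n.

```python
import re

# Same FIND expression as above: a dot followed by two or more spaces,
# or a ? or !, plus any spaces that follow.
SENTENCE_END = re.compile(r"(\.(?=\x20\x20+)|[?!])\x20*")

def split_sentences(text: str) -> str:
    # Step 1: turn every newline into two spaces.
    joined = re.sub(r"\r?\n", "\x20\x20", text)
    # Step 2: put a newline after each sentence-ending mark.
    return SENTENCE_END.sub(r"\1\n", joined)

sample = (
    "This is a sentence.  Are you sure, Dr. Somebody?  Yes, quite sure.\n"
    "This line talks to Mr. Rogers without anyone in the neighborhood.\n"
)
print(split_sentences(sample))
```

The dots in "Dr." and "Mr." are followed by a single space, so the lookahead fails and they are left alone.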

        • Viktoria Ontapado @PeterJones

          Thank you for your detailed answer @PeterJones, I really appreciate it.
          I reproduced your exact tip with success. Though your solution is exactly what I’d like to achieve, unfortunately my texts have only one space after full stops, not two.

          So at the moment, I get this:

          123456789x123456789x123456789x123456789x123456789x123456789x123456789x.
          This is a sentence. Are you sure, Dr. Somebody?
          Yes, quite sure.
          This is a new line.
          This line talks to Mr. Rogers without anyone in the neighborhood.
          This is the end.
          

          Am I out of luck at this point?
          Can I manipulate the original text somehow so that my sentences end with two spaces instead of one, so I can use your implementation?

          • PeterJones @Viktoria Ontapado

            @Viktoria-Ontapado said in Breaking lines after full stops, not dots:

            Am I out of luck at this point?

            Sorry, yes.

            There is no regex in existence that can differentiate between the period in “Like Harry Potter, I live on Privet Dr. Where do you live?” and “I talked with Dr. Where in the emergency room, and was boggled by her strange name” (and all the possible variants of abbreviations vs sentence ender confusions). That would take a very-well-trained language AI to differentiate – or a human brain.
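
To make the ambiguity concrete, here is a small Python sketch using one plausible heuristic, "dot, space, capital letter". The pattern and the helper name are assumptions chosen for illustration, not a recommended rule. It happens to be right for the first sentence and wrong for the second, even though both contain the same "Dr. Where".

```python
import re

# Hypothetical heuristic: a dot, one space, then an uppercase letter ends a sentence.
BOUNDARY = re.compile(r"\.\x20(?=[A-Z])")

def split_on_heuristic(text: str) -> list[str]:
    return BOUNDARY.sub(".\n", text).split("\n")

street = "Like Harry Potter, I live on Privet Dr. Where do you live?"
doctor = ("I talked with Dr. Where in the emergency room, "
          "and was boggled by her strange name")

print(split_on_heuristic(street))  # split is correct here
print(split_on_heuristic(doctor))  # split is wrong here: "Dr." is a title
```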

            Basically, for the dots, you are going to have to look for “dot space”, and then manually decide on each whether to hit REPLACE with .\r\n or FIND NEXT to skip that instance. If you do it after the regex sequence I recommended, then at least some of the sentence-ending dots will already be at the end of the line (assuming some of your sentences ended on a line originally, so were replaced with dot-double-space).
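
If scripting is an option, the same review-each-dot workflow could be sketched in Python along these lines. Everything here is hypothetical (the function names, the context width, the idea of collecting accepted offsets); the decision on each dot still has to come from a person.

```python
import re

DOT_SPACE = re.compile(r"\.\x20+")

def dot_space_candidates(text: str, context: int = 20):
    """Yield (offset of the dot, surrounding snippet) for each 'dot space'."""
    for m in DOT_SPACE.finditer(text):
        start = max(0, m.start() - context)
        end = min(len(text), m.end() + context)
        yield m.start(), text[start:end]

def apply_breaks(text: str, accepted: set[int]) -> str:
    """Break the line after each dot whose offset the reviewer accepted."""
    pieces, last = [], 0
    for m in DOT_SPACE.finditer(text):
        if m.start() in accepted:
            pieces.append(text[last:m.start() + 1])  # keep the dot
            last = m.end()                           # drop the spaces
    pieces.append(text[last:])
    return "\n".join(pieces)

text = "This is a sentence. Are you sure, Dr. Somebody?"
offsets = [pos for pos, _ in dot_space_candidates(text)]
# The reviewer accepts the first dot and skips the one in "Dr."
print(apply_breaks(text, {offsets[0]}))
```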

            That’s the best I can do for you.

            • Viktoria Ontapado @PeterJones

              @PeterJones

              I see. Again, thank you for your explanation and assistance and have a nice weekend!

              The Community of users of the Notepad++ text editor.