    one sentence per line

    • Dragoon 35

      I have a .txt file in the following format:

      [Line 1]Health professionals are expected to undertake audit and
      [Line 2]service evaluation as part of quality assurance. These usually
      [Line 3]involve minimal additional risk, burden or intrusion for
      [Line 4]participants. It is important to determine at an early stage
      [Line 5]whether a project is audit or research, and sometimes that
      [Line 6]is not as easy as it seems. The decision will determine the
      [Line 7]framework in which the study is undertaken.

      What I want is one complete sentence per line (everything up to the ‘.’ character), so that [Line 1] holds the whole first sentence.
      Is there any way to automate this? (The text file has 119043 characters, so doing it manually would take a long time.)

      Thanks

      • PeterJones

        @Dragoon-35 ,

        1. Ctrl+A (Edit > Select All)
        2. Ctrl+J (Edit > Line Operations > Join Lines)
          • at this point, you should have one giant paragraph
        3. Ctrl+H (Search > Replace)
          • Find What = (?-s)\.\h+
          • Replace With = .\r\n
          • Search Mode = Regular Expression
        4. Click Replace All

        Given the exact data:

        Health professionals are expected to undertake audit and
        service evaluation as part of quality assurance. These usually
        involve minimal additional risk, burden or intrusion for
        participants. It is important to determine at an early stage
        whether a project is audit or research, and sometimes that
        is not as easy as it seems. The decision will determine the
        framework in which the study is undertaken.
        

        That will result in

        Health professionals are expected to undertake audit and service evaluation as part of quality assurance.
        These usually involve minimal additional risk, burden or intrusion for participants.
        It is important to determine at an early stage whether a project is audit or research, and sometimes that is not as easy as it seems.
        The decision will determine the framework in which the study is undertaken.
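
For readers who would rather script this than click through the dialog, here is a minimal Python sketch of the same two-step idea. It is not part of the original answer. It assumes that Join Lines amounts to replacing each line break with a single space, and it uses `[ \t]+` in place of `\h+`, which Python's `re` module does not support. The function name `one_sentence_per_line` is made up for illustration.

```python
import re


def one_sentence_per_line(text: str) -> str:
    # Steps 1-2: join every line into one giant paragraph,
    # with a single space where each line break used to be.
    paragraph = " ".join(text.splitlines())
    # Steps 3-4: a period followed by one or more blanks becomes
    # a period followed by a line break.
    return re.sub(r"\.[ \t]+", ".\n", paragraph)


sample = (
    "Health professionals are expected to undertake audit and\n"
    "service evaluation as part of quality assurance. These usually\n"
    "involve minimal additional risk, burden or intrusion for\n"
    "participants."
)
print(one_sentence_per_line(sample))
```

The same assumptions listed below apply to this sketch as well.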
        

        Assumptions:

        • [Line #] isn’t actually part of the text
        • All your sentences end with ., and none end in ? or ! or ." or other such endings
        • You don’t have any other period-space instances in your file.
          • Having text like Dr. Bob is a surgeon. will mess up the algorithm, because the . between Dr and Bob will be interpreted as a sentence-ender
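
The last assumption is easy to check outside Notepad++. The following Python snippet is only an illustration, with `[ \t]+` standing in for `\h+`; it shows the abbreviation being treated as a sentence end:

```python
import re

text = "Dr. Bob is a surgeon. He works nights."
# Every period followed by blanks is treated as a sentence end,
# including the one after the abbreviation "Dr".
broken = re.sub(r"\.[ \t]+", ".\n", text)
print(broken)
# Dr.
# Bob is a surgeon.
# He works nights.
```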

        If this isn’t sufficient for your needs, you will have to clarify what you really want. Please read the advice below.

        –
        Do you want regex search/replace help? Then please be patient and polite, show some effort, and be willing to learn; answer questions and requests for clarification that are made of you. All data / example text should be marked up as plaintext using the </> toolbar button or manual Markdown syntax; screenshots can be pasted in natively using Ctrl+V when you have the image in your clipboard. Show the data you have; show the regex you tried, and why you thought it should work; show what you get, and compare it to what you wanted to get; make sure to include examples of things that should match and be transformed, and things that don’t match and should be left alone. Read the official NPP Searching / Regex docs and the forum’s Regular Expression FAQ.
        We sometimes vent our frustration when all a particular user does is demand the answer be given to them after many changes of requirements, or comes back time and again for new “gimme” requests, without showing any effort. But if you follow these guidelines, you’re much more likely to get helpful replies that solve your problem in the shortest number of tries.

        • Dragoon 35

          @PeterJones
          Thanks. This works just fine.

          • guy038

            Hello, @dragoon-35, @peterjones and All,

            Here is an alternative regex solution, which does not need any line operation before the search/replace (S/R):

            SEARCH \.\h*(?!\R)|(?<!\.)(\r\n)

            REPLACE ?1\x20:.\r\n

            Notes :

            • This regex searches for :

              • Any literal . , followed with possible blank chars,  ONLY IF NOT followed with a line-break

              • A line-break  ONLY IF NOT preceded with a literal dot .

            • In replacement :

              • In case of the first alternative, as group 1 is NOT defined, the ELSE part is used and a dot . is rewritten, followed with a line-break

              • In case of the second alternative, as group 1 is defined, the THEN part is used and the line-break is replaced with a space character
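
For anyone who wants to try this logic outside Notepad++, here is a rough Python sketch. It is an approximation, not the Notepad++ behaviour itself: Python's `re` module has neither `\h`, `\R`, nor the conditional `?1…:…` replacement syntax, so `[ \t]` and `(?:\r\n|\n|\r)` stand in for the first two, and a replacement function plays the role of the conditional. The names `PATTERN`, `pick_replacement` and `single_pass` are invented for this example.

```python
import re

# First alternative: a dot plus optional blanks, NOT followed by a line break.
# Second alternative: a CR-LF line break NOT preceded by a dot (group 1).
PATTERN = re.compile(r"\.[ \t]*(?!\r\n|\n|\r)|(?<!\.)(\r\n)")


def pick_replacement(match):
    # Group 1 is defined only when the second alternative matched:
    # THEN part = a space, ELSE part = a dot followed by a line break.
    return " " if match.group(1) is not None else ".\r\n"


def single_pass(text):
    return PATTERN.sub(pick_replacement, text)


print(repr(single_pass("A b\r\nc. D e\r\nf.")))
# 'A b c.\r\nD e f.\r\n'
```

In this sketch the final dot of the text also counts as "not followed by a line break", so the result ends with a line break.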


            Remark :

            You may have been intrigued by the syntax (\r\n), which, obviously, could have been simplified to (\R) !

            Well, let’s use this two-line data, in a new tab, with no line-break after the word End

            This is a test.
            End
            

            We’ll just use the second alternative of the search regex, with the \R syntax and the corresponding replacement part

            SEARCH (?<!\.)\R

            REPLACE \x20

            As line 1 does end with a dot, this regex should not match against this text. However, one replacement does occur and we get :

            This is a test.
             End
            

            Note that line 1 now ends with the CR character only, and line 2 begins with a space char. WHY ? Well, the two-char CR-LF sequence is, indeed, preceded by a dot and so does not satisfy the regex. However, when the regex engine moves one position to the right, the LF char, which matches the regex \R too, is preceded by the CR, which is not a dot symbol, and so satisfies the regex. So, the LF is replaced with a space character !

            Now, if this regex is changed, as below :

            SEARCH (?<!\.)\r\n

            REPLACE \x20

            This time, there is no more ambiguity and no match at all occurs against our small piece of text !
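
The same effect can be reproduced in Python, as a rough illustration only. Python's `re` has no `\R`, so `(?:\r\n|\n|\r)` is used here as an approximate stand-in; the variable names are made up.

```python
import re

text = "This is a test.\r\nEnd"

# Loose version: any line-break form may match, so the lone LF
# (preceded by CR, not by a dot) gets replaced.
loose = re.sub(r"(?<!\.)(?:\r\n|\n|\r)", " ", text)
print(repr(loose))   # 'This is a test.\r End'

# Strict version: only the full CR-LF pair may match, and that pair
# is preceded by a dot, so nothing is replaced.
strict = re.sub(r"(?<!\.)\r\n", " ", text)
print(repr(strict))  # 'This is a test.\r\nEnd'
```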

            Best Regards,

            guy038

            • Dragoon 35

              Thank you @guy038
