
Is there any add-on, regex, or macro to split long text at sentence boundaries?

General Discussion
3 Posts 2 Posters 2.0k Views
    V S Rawat
    last edited by Apr 22, 2017, 7:11 AM

    I am doing translation work that requires me to break long pages or paragraphs at sentence boundaries, that is, at the full stop (.) in English and the Poorn-Viraam (।) in Hindi.

    This is called "aligning" or (probably) tokenizing.
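A minimal Python sketch of the rule exactly as stated (break after . or । whenever whitespace follows) might look like this. The name split_naive is hypothetical, and the last call reproduces the abbreviation problem described in this post:

```python
import re

# Zero-width lookbehind: the split happens right after '.' or '।' (Poorn-Viraam),
# and the run of whitespace that follows is consumed by the split.
SENTENCE_END = re.compile(r"(?<=[.।])\s+")


def split_naive(text: str) -> list[str]:
    """Hypothetical helper: split at '.' or '।' followed by whitespace."""
    return [s for s in SENTENCE_END.split(text.strip()) if s]


if __name__ == "__main__":
    print(split_naive("This is one. This is two."))
    # ['This is one.', 'This is two.']
    print(split_naive("यह पहला वाक्य है। यह दूसरा है।"))
    # ['यह पहला वाक्य है।', 'यह दूसरा है।']
    print(split_naive("See e.g. this."))
    # ['See e.g.', 'this.']  <- wrong split after the abbreviation
```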

    For Notepad++ (w9-32 bit), is there any add-on, regex, or macro to split long text into sentences (in plain-text .txt files)?

    In regex mode, I can’t just find \. (a literal full stop) and replace it with .\r\n, because the full stop also marks abbreviations such as i.e., e.g., pvt., ltd., inc. and etc., so the text would be broken there too, which is not desirable. Also, if several sentences appear within parentheses (), [] or {}, they should not be broken either.
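One way to handle both exceptions is to scan the text character by character, tracking bracket depth and checking the word that ends at each candidate terminator against a list of known abbreviations. This is a Python sketch, not a complete tokenizer; the ABBREVIATIONS set is a hypothetical list taken from the examples in this post:

```python
# Hypothetical abbreviation list, based on the examples above; extend as needed.
ABBREVIATIONS = {"i.e.", "e.g.", "pvt.", "ltd.", "inc.", "etc."}
OPENERS, CLOSERS = "([{", ")]}"
TERMINATORS = ".।"


def split_sentences(text: str) -> list[str]:
    """Split at '.' or '।' followed by whitespace (or end of text),
    skipping known abbreviations and anything inside (), [] or {}."""
    sentences: list[str] = []
    start = 0  # index where the current sentence begins
    depth = 0  # current bracket nesting level
    for i, ch in enumerate(text):
        if ch in OPENERS:
            depth += 1
        elif ch in CLOSERS:
            depth = max(depth - 1, 0)
        elif ch in TERMINATORS and depth == 0:
            # A terminator only counts if whitespace or the end of text follows.
            if i + 1 < len(text) and not text[i + 1].isspace():
                continue
            # The word ending at this terminator, e.g. "pvt." or "e.g.".
            word = text[start:i + 1].split()[-1].lstrip("\"'([{").lower()
            if word in ABBREVIATIONS:
                continue
            sentences.append(text[start:i + 1].strip())
            start = i + 1
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences
```

With this sketch, split_sentences("We use e.g. pvt. ltd. firms. Brackets stay (One. Two.) intact. Done.") gives three sentences. The trade-off of an abbreviation list is that a genuine sentence ending in "etc." will not be split.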

    So I guess a single regex just wouldn’t do, or it would become too complex to cover all the possibilities.

    It would have to be a set of regexes run one after another, or a macro.

    Or has someone developed an add-on for this?
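The "set of regexes run one after another" idea can be sketched in Python as three passes: hide the dots inside known abbreviations behind a placeholder, insert a line break after each remaining terminator, then put the dots back. The placeholder character and abbreviation list are assumptions, and this version does not handle brackets. Each pass is a single find/replace, so a similar sequence could in principle be run as separate regex Replace All steps in Notepad++ (for example, recorded as a macro), though the exact Notepad++ patterns are not worked out here.

```python
import re

# Assumption: U+E000 (Private Use Area) never occurs in the input text.
PLACEHOLDER = "\ue000"
# Hypothetical abbreviation list; matched case-insensitively.
ABBREVIATIONS = ["i.e.", "e.g.", "pvt.", "ltd.", "inc.", "etc."]


def protect_abbreviations(text: str) -> str:
    """Pass 1: replace the dots of known abbreviations with the placeholder."""
    for abbr in ABBREVIATIONS:
        # (?<!\w) stops an abbreviation from matching inside a longer word.
        pattern = re.compile(r"(?<!\w)" + re.escape(abbr), re.IGNORECASE)
        text = pattern.sub(lambda m: m.group(0).replace(".", PLACEHOLDER), text)
    return text


def break_lines(text: str) -> str:
    """Pass 2: replace spaces/tabs after '.' or '।' with a Windows line break."""
    return re.sub(r"([.।])[ \t]+", r"\1\r\n", text)


def restore_dots(text: str) -> str:
    """Pass 3: turn the placeholders back into dots."""
    return text.replace(PLACEHOLDER, ".")


def split_in_passes(text: str) -> str:
    # The order matters: pass 2 must not see the abbreviation dots.
    for step in (protect_abbreviations, break_lines, restore_dots):
        text = step(text)
    return text
```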

    Thanks.

    Rawat

      V S Rawat
      last edited by Apr 22, 2017, 7:12 AM

      Oops! That should be w8-32 bit, not w9-32 bit. Sorry.

        Per Isakson
        last edited by Apr 22, 2017, 5:59 PM

        Did you try Google? See "Python - RegEx for splitting text into sentences (sentence-tokenizing)".
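A lookbehind-based split is one common way to do this in Python. The sketch below is written in that spirit and is not necessarily the pattern given in the linked thread. It adds ? ! and । to the terminators and uses two negative lookbehinds as a rough abbreviation filter, which catches "e.g."/"i.e." and "Mr."/"Dr." style tokens but not, for example, "pvt." or "etc.":

```python
import re

# Split at whitespace that directly follows '.', '?', '!' or '।', unless the
# characters just before it look like an abbreviation.
SPLIT_AT = re.compile(
    r"(?<!\w\.\w.)"       # not after "e.g." / "i.e." style tokens
    r"(?<![A-Z][a-z]\.)"  # not after "Mr." / "Dr." style tokens
    r"(?<=[.?!।])"        # only right after a sentence terminator
    r"\s+"
)


def split_sentences(text: str) -> list[str]:
    return [s for s in SPLIT_AT.split(text.strip()) if s]


if __name__ == "__main__":
    sample = "Mr. Smith arrived, e.g. at noon. Was he late? Yes! यह है। अंत।"
    for sentence in split_sentences(sample):
        print(sentence)
```

Python's re module requires fixed-width lookbehinds, which is why each abbreviation check is written as its own fixed-length group rather than a single alternation of different lengths.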

        The Community of users of the Notepad++ text editor.