Is there any add-on, regex, or macro to split long text at sentence boundaries?
-
I am doing translation work that requires me to break long pages or paragraphs at sentence boundaries, that is, at the full stop (.) for English and the Poorn-Viraam (।) for Hindi.
This is called "aligning" or (probably) tokenizing.
For Notepad++ (w9-32 bit), is there any add-on, regex, or macro to split long text into sentences (in plain-text .txt files)?
With regex, I can't just find . and replace it with .\r\n, because . also marks abbreviations such as i.e., e.g., pvt., ltd., inc., etc., so it would break the text there too, which is not desirable. Also, if several sentences sit within brackets such as (), [] or {}, they should not be broken either.
So I guess a single regex just wouldn't do, or it would become too complex to cover all the possibilities.
It would have to be a set of regexes executed one after another, or a macro.
Or has someone developed an add-on for this?
Thanks.
Rawat
-
oops! w9-32 bit -> w8-32 bit. sorry.
-
Did you try Google? See Python - RegEx for splitting text into sentences (sentence-tokenizing)
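For what it's worth, here is a minimal Python sketch of the rules described in the question. It is not a Notepad++ add-on and not a single regex: it scans the text character by character, tracks bracket depth, and skips a full stop when the word before it is a known abbreviation. The function name `split_sentences`, the abbreviation list, and the choice to also break on ? and ! are assumptions made for illustration, so adjust them to your texts.

```python
# Illustrative abbreviation list (an assumption; extend it for real texts).
# Entries are lowercase and written without the final full stop.
ABBREVIATIONS = {"i.e", "e.g", "pvt", "ltd", "inc", "etc"}

OPENERS = "([{"
CLOSERS = ")]}"
TERMINATORS = ".?!।"  # "।" is the Hindi Poorn-Viraam (U+0964)


def split_sentences(text):
    """Return a list of sentences from text.

    A break is made after a terminator that is followed by whitespace
    (or the end of the text), unless the terminator is inside brackets
    or is a full stop that ends a known abbreviation.
    """
    sentences = []
    depth = 0   # current bracket nesting level
    start = 0   # index where the current sentence begins
    for i, ch in enumerate(text):
        if ch in OPENERS:
            depth += 1
        elif ch in CLOSERS:
            depth = max(depth - 1, 0)
        elif ch in TERMINATORS and depth == 0:
            # Require whitespace (or end of text) after the terminator,
            # so the inner dot of "e.g." or a decimal like 3.5 is skipped.
            nxt = text[i + 1] if i + 1 < len(text) else " "
            if not nxt.isspace():
                continue
            if ch == ".":
                # Last whitespace-delimited word before this full stop.
                words = text[start:i].split()
                last = words[-1].lower().lstrip("\"'([{") if words else ""
                if last in ABBREVIATIONS:
                    continue
            sentences.append(text[start:i + 1].strip())
            start = i + 1
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences


if __name__ == "__main__":
    sample = "ABC Pvt. Ltd. makes tools, e.g. hammers. It is old."
    # Join with \r\n to get one sentence per line in a Windows text file.
    print("\r\n".join(split_sentences(sample)))
```

One known limitation: an abbreviation that genuinely ends a sentence (for example a sentence ending in "etc.") will not trigger a break, because the sketch cannot tell the two cases apart.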