Sentences and words
-
Hello, I have some paragraphs (one paragraph in each separate text file). The files are numbered sequentially as p1, p2, p3, … I wish to count the number of sentences in each paragraph in these files, and the number of words in each of these sentences. For example, paragraph p1 might have 3 sentences (each identified by the full stop at its end), and the word counts of these 3 sentences might be 10, 23 and 16. How can I go about counting and listing the number of sentences, and the number of words in each sentence? If it can only be done individually for each text file (paragraph), that is fine too. Thanks.
-
I have some paragraphs (one paragraph in each separate text file)
Isn’t the number of “paragraphs” simply the number of files then?
I wish to count the number of sentences in each paragraph in these files
You could do it in ONE file like this (via the Count button), seeing the result on the status bar:
To do multiple files you would switch to the Find in Files tab and run the same search against multiple files and obtain your count in Search results:
How to go about counting…words in each of these sentences?
This would require writing a program.
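For what it's worth, here is a minimal sketch of such a program in Python. It is not something Notepad++ provides. It assumes the files are named p1.txt, p2.txt, … in one folder (the .txt extension and the function names are my own assumptions), and it uses the OP's definition of a sentence: whatever ends at a full stop.

```python
import re
from pathlib import Path

def sentence_word_counts(text):
    """Return the word count of each sentence in text.

    A sentence is taken to be any run of text ending in a full stop
    (the OP's definition). Trailing text without a full stop is
    counted as one more sentence.
    """
    pieces = [piece.strip() for piece in text.split(".")]
    return [len(piece.split()) for piece in pieces if piece]

def numbered_files(folder="."):
    # Assumed naming scheme: p1.txt, p2.txt, ... sorted by their number.
    files = [p for p in Path(folder).glob("p*.txt")
             if re.fullmatch(r"p\d+", p.stem)]
    return sorted(files, key=lambda p: int(p.stem[1:]))

def report(folder="."):
    # Print one line per paragraph file.
    for path in numbered_files(folder):
        counts = sentence_word_counts(path.read_text(encoding="utf-8"))
        print(f"{path.name}: {len(counts)} sentences, "
              f"words per sentence: {counts}")

if __name__ == "__main__":
    report()
```

Words are simply whitespace-separated tokens here, and any full stop splits a sentence, so abbreviations and decimal numbers will inflate the sentence count.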
-
You can get word counts right with npp’s View -> Summary command, but that might be too much busy-work if there are lots of files and/or you want fresh updates for files that are frequently changed.
Getting sentences right is not a trivial problem. You could examine your own data, determine a reasonable average number of words per sentence, and then calculate an estimated sentence count by dividing the word count by that average.
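As a rough sketch of that estimate in Python (the function name is made up, and the average is something you would have to measure from your own data):

```python
def estimate_sentence_count(text, avg_words_per_sentence):
    """Estimate the number of sentences from the word count alone."""
    if avg_words_per_sentence <= 0:
        raise ValueError("average words per sentence must be positive")
    # Words are whitespace-separated tokens, which may differ slightly
    # from the word count an editor reports.
    word_count = len(text.split())
    return round(word_count / avg_words_per_sentence)
```

With the OP's example of 10 + 23 + 16 = 49 words and an assumed average of 16 words per sentence, this gives 49 / 16 ≈ 3.06, which rounds to 3 sentences.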
But sentence counting is surely a problem that has been somewhat solved (i.e. with sophisticated models that give pretty good statistical results) many times. Best you do a search and try to find purpose-built tools.
<run of non-periods><period>
is not a great definition of a sentence due to false positives with decimal numbers, among other things.
A plausible alternative might be
<period><space or newline or end-of-file>
but that also misfires, as with “Mr. Kilborn and Mr. Jones frequently comment in this community.”
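To make both misfires concrete, here is a small Python sketch. The two regular expressions are my own translation of the two definitions above, and the decimal-number sentence is an invented example.

```python
import re

# <run of non-periods><period>
RUN_THEN_PERIOD = re.compile(r"[^.]+\.")
# <period><space or newline or end-of-file>
PERIOD_THEN_BREAK = re.compile(r"\.(?=\s|$)")

def count_matches(pattern, text):
    """Count how many 'sentences' the pattern finds in text."""
    return len(pattern.findall(text))

decimal = "The price rose by 2.5 percent."
titles = "Mr. Kilborn and Mr. Jones frequently comment in this community."

# Each text is a single sentence, so the correct answer is 1 every time.
print(count_matches(RUN_THEN_PERIOD, decimal))    # 2 (splits at "2.")
print(count_matches(PERIOD_THEN_BREAK, decimal))  # 1
print(count_matches(RUN_THEN_PERIOD, titles))     # 3 (each "Mr." counts)
print(count_matches(PERIOD_THEN_BREAK, titles))   # 3 (each "Mr." counts)
```

The second definition fixes the decimal case but is still fooled by the abbreviations.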
Isn’t the number of “paragraphs” simply the number of files then?
OP did not ask for paragraph count.
-
Isn’t the number of “paragraphs” simply the number of files then?
OP did not ask for paragraph count.
Correct. I was merely commenting on that, not supplying any kind of “solution” for it.
<run of non-periods><period> is not a great definition of a sentence due to false positives with decimal numbers.… among other things.
We know this… I was wanting the OP to try the solution and point this out, at which time we would explain that they are pretty much asking for an impossible task…at least with what Notepad++ can do.
However, if the need is not for a 100% exact count, but possibly one that is somewhat close, simple algorithms might work.
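One such simple algorithm, sketched in Python rather than inside Notepad++: count a period only when it is followed by whitespace or the end of the text, and skip it when the word before it is a known abbreviation. The abbreviation list here is a made-up starting point that you would need to extend for your own data.

```python
import re

# Assumed, deliberately short list; extend it for your own texts.
ABBREVIATIONS = {"mr", "mrs", "ms", "dr", "prof", "etc"}

# A period followed by whitespace or end of text, capturing the word
# (if any) that comes directly before it.
CANDIDATE = re.compile(r"(\w*)\.(?=\s|$)")

def count_sentences(text):
    """Approximate sentence count that skips listed abbreviations."""
    count = 0
    for match in CANDIDATE.finditer(text):
        if match.group(1).lower() not in ABBREVIATIONS:
            count += 1
    return count
```

This is still only close, not exact: a sentence that really does end in "etc." is missed, as is one that ends with a period followed by a closing quote or bracket.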