    Wordcount splitting words on apostrophes

    • Stewart Baker

      Hello!

      My Notepad++ has been giving me weird wordcount results for a while now. I finally figured out today that the issue is apostrophes: the word counter seems to treat apostrophes as whitespace. I have a document containing only the word it’s, and the program says there are two words instead of one.

      Is there any setting I can change to fix this, or should I file an issue in GitHub? I’m not sure if it’s just doing this because I messed up a setting somewhere or what. I just updated to the latest version in case that was the problem, but I am still seeing this.

      Thanks!

      • Alan Kilborn @Stewart Baker

        @Stewart-Baker

        It does this because it doesn’t consider an apostrophe to be a “word character”. You can certainly open a GitHub issue if you’d like to.

        You might try this as an alternate method for obtaining word count:

        Pull up the Find window, select the Regular expression search mode, enter \S+ in the search box, and press the Count button. Here’s a demo of it counting your it's as 1 word instead of 2:

        [Screenshot: the Find dialog's Count result for it's]

        BTW, \S+ may not be appropriate in all instances. What it means is: “Consider a match to be the longest string you can find between traditional whitespace characters”.
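For readers who want to check the idea outside Notepad++, here is a minimal sketch in Python. It assumes Python's `re` module treats `\S` closely enough to Notepad++'s regex engine for plain text; the function name `count_nonspace_runs` is made up for this illustration.

```python
import re

def count_nonspace_runs(text: str) -> int:
    """Count maximal runs of non-whitespace characters (the regex is \\S+)."""
    return len(re.findall(r"\S+", text))

# The apostrophe is not whitespace, so "it's" stays in one piece.
print(count_nonspace_runs("it's"))          # 1
print(count_nonspace_runs("it's a test."))  # 3
```

Note that punctuation sticks to its neighbouring word here ("test." is one match), and a stand-alone dash or bullet would also count as a "word", which is why \S+ is not always appropriate.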

        • guy038

          Hello, @stewart-baker, @alan-kilborn and *All,

          Here is an alternative to @alan-kilborn’s solution:

          Open the Find dialog ( Ctrl + F )

          • SEARCH [\w'’]+

          • Tick the Wrap around option

          • Untick all the other checkbox options

          • Select the Regular expression search mode

          • Click on the Count button or use the default Alt + T shortcut


          The regex [\w'’] forces the regex engine to treat the two Unicode characters APOSTROPHE ' ( \x{0027} ) and RIGHT SINGLE QUOTATION MARK ’ ( \x{2019} ) as word characters as well!
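The same character class can be tried in Python as a rough sketch. It assumes Python's `\w` is close enough to the one in Notepad++ for ordinary text, and `count_words` is a hypothetical helper name, not anything provided by Notepad++.

```python
import re

# \w plus the straight apostrophe (U+0027) and the curly one (U+2019).
WORD_WITH_APOSTROPHES = re.compile(r"[\w'\u2019]+")

def count_words(text: str) -> int:
    """Count runs of word characters, treating both apostrophes as word characters."""
    return len(WORD_WITH_APOSTROPHES.findall(text))

print(count_words("it's"))                       # 1
print(count_words("it\u2019s"))                  # 1
print(count_words("John's car, it\u2019s red."))  # 4
```

Unlike \S+, this version does not glue commas and full stops onto the words, but it does count digits and underscores as word characters.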

          In addition, you may find it interesting to have a look at this other post of mine about the Summary feature, especially the first part:

          https://community.notepad-plus-plus.org/post/59069

          Best Regards,

          guy038

          • Alan Kilborn

            Just for reference the “word count” function (found via View menu > Summary… and then looking at Words: in the output) in Notepad++ uses this regular expression to determine what is a word:

            [^\x20\t\\.,;:!?()+\r\n\-\*/=\]\[{}&~"'`|@$%<>\^]+

            Note that because this is a regex of the form [^...], the list of characters says what is NOT a word character rather than what IS a word character.

            So we basically have these characters which will terminate counting some bit of text as a “word”:

            \x20\t\\.,;:!?()+\r\n\-\*/=\]\[{}&~"'`|@$%<>\^

            I took some liberties with the \x20 and the \t: in the original they are a literal space and a literal tab character, which I replaced with escapes so that they are more easily seen.

            Anyway, I see an apostrophe in there, so that’s what is causing it's to be counted as two words.
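The expression quoted above can be tried outside Notepad++. The sketch below copies the character class from this post into Python's `re` module; that Python's engine handles the class the same way as Notepad++ is an assumption, and `summary_word_count` is a hypothetical name.

```python
import re

# Character class copied from the post: every listed character ends a "word".
SUMMARY_WORD = re.compile(r"""[^\x20\t\\.,;:!?()+\r\n\-\*/=\]\[{}&~"'`|@$%<>\^]+""")

def summary_word_count(text: str) -> int:
    """Count runs of characters that are not in the delimiter list."""
    return len(SUMMARY_WORD.findall(text))

print(SUMMARY_WORD.findall("it's"))  # ['it', 's']
print(summary_word_count("it's"))    # 2
```

The straight apostrophe splits the text into two runs, which matches the count of two words reported in the original question.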

            But how does this handle non-ASCII (Unicode) characters?
            If we copy the “it’s” from the OP (the bolded “it’s”) and paste it into a Notepad++ tab and then run the “word count” function, we see that it shows Words: 1. Success? Yes, but really No. :-(
            This is the curly Unicode apostrophe (RIGHT SINGLE QUOTATION MARK), and it isn’t accounted for in Notepad++'s expression for what is not a word character, so the count is right only by accident.

            Perhaps the conclusion to be drawn is that Notepad++ is not a great counter of words. :-)

            • Stewart Baker

              Thanks for all the comments!

              Perhaps the conclusion to be drawn is that Notepad++ is not a great counter of words. :-)

              This does, alas, seem to be the conclusion… :) I will live!

              (ETA: I didn’t realize it had turned my non-smart apostrophes into the special ones in the forum post. In the text files, I just use standard apostrophes. Oh well!)

              • Ken H @Alan Kilborn

                @Alan-Kilborn For analogous reasons, I just found that when the curly apostrophe is used in “JOHN’S CAR” and I apply Edit > Convert Case to > Proper Case, the result is “John’S Car”. I can understand and deal with that, but I was initially surprised that the possessive s was still upper case.
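As an analogy only (this is not Notepad++'s code), Python's built-in `str.title()` shows the same effect, because it starts a new word after any character that is not a cased letter. The `proper_case` helper below is a hypothetical workaround that treats both apostrophes as part of the word.

```python
import re

# str.title() restarts capitalization after any non-letter, apostrophes included.
print("JOHN\u2019S CAR".title())  # John’S Car
print("JOHN'S CAR".title())       # John'S Car

# Letters, optionally joined by a straight or curly apostrophe, form one word.
WORD = re.compile(r"[^\W\d_]+(?:['\u2019][^\W\d_]+)*")

def proper_case(text: str) -> str:
    """Capitalize each word, keeping letters after an apostrophe lower case."""
    return WORD.sub(lambda m: m.group(0).capitalize(), text)

print(proper_case("JOHN\u2019S CAR"))  # John’s Car
```

In Python both apostrophes trigger the effect, whereas the report here is specifically about the curly one in Notepad++, so the two implementations clearly differ in detail.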

                • Alan Kilborn @Ken H

                  @Ken-H said in Wordcount splitting words on apostrophes:

                  found that when the curly apostrophe is used in “JOHN’S CAR”, then Edit > Convert Case to > Proper Case, the result will be “John’S Car”

                  Sounds like a bug to me.
                  Feel free to report it; info on doing that is HERE.

                  • guy038

                    Hello, @ken-h, @stewart-baker, @alan-kilborn and All,

                    Refer also to my post here

                    Best Regards,

                    guy038
