SpeechPlugin.dll (text-to-speech) in Spanish language skips letters with accents/diacritics
-
Windows 10 Pro, Notepad++ v7.9.1 64bit. Notepad++/plugins/SpeechPlugin/SpeechPlugin.dll (0.2.2.0 Jim Xochellis)
Microsoft Narrator accessibility tool with Spanish-language voices Raul/Sabina works fine in Notepad++, but when using the same voices via text-to-speech plugin, accented letters are ignored. “áéíñóúü” is silent, “jabalí” is spoken as “jabal”.
So far I have tried changing Notepad++ localization and Windows 10 language settings so everything’s in Spanish, and various encoding formats & character sets (ANSI/UTF-8 etc). Any ideas? Thanks for reading.
-
I hadn’t remembered mentions of that plugin previously, and I didn’t see it in Plugins Admin. A search indicates that @Michael-Vincent updated the code to work with 64bit NPP a couple years ago, and I see that his github has v0.2.2.0, so I am guessing you got your copy from his fork.
If you are using his fork, he comes here frequently, so hopefully he can chime in with more knowledge of the plugin – both from the user perspective and the maintenance perspective.
My guess is that the plugin is just using Windows’ builtin speech API, so you (and the plugin) are limited to its requirements. It may be that the plugin is not passing the strings to the Windows API in the form it desires – my guess would be that a Windows API interface would require UCS-2-LE for unicode strings.
Given that the last release in the old sourceforge repo was in 2008, I think that may have been before the unicode version of Notepad++ (the sf repo only has one DLL, rather than the two DLLs that were used during the time that ANSI and UNICODE were separate). If my guess is correct, then maybe it’s using the SomeWin32ApiA() version of the call rather than the SomeWin32ApiW() version.
Hmm, a quick search of the main source code for 0.2.2.0 doesn’t show any ...A(... or ...W(... calls, so maybe I’m wrong. Digging in more detail, SpeakDocument() grabs the text from Scintilla directly and passes it to SpeakText(). I don’t see any conversion done on that text, and the internal Scintilla objects normally store things as UTF-8 (except when saving/loading from disk). I wonder if there needs to be a translation from UTF-8 to UCS-2-LE inside the plugin before sending the text to the speech API.
On the other hand, maybe I’ve misremembered, and if the document is stored as UCS-2-LE, Scintilla will hand the plugin UCS-2-LE-encoded text when the plugin requests it. You can test that by converting to UCS-2-LE, saving, and seeing if that improves the speech behavior. But given my experiments with the internals (I’ve never made a plugin, but I do access the Scintilla editor contents with SendMessage calls, like a plugin would), I really think Scintilla uses UTF-8 in all message-communication.
If the speech API really accepts only UCS-2-LE, and if the plugin is currently throwing UTF-8 at the API call, it will require a code change, which @Michael-Vincent may or may not be able to support. Hopefully, with all my @-mentions, he’ll chime in when he has a chance. :-)
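To make that guess concrete, here is a small Python sketch of the byte-level difference. It is an illustration only and not the plugin’s code: the plugin is C++ and would presumably need something like the Win32 MultiByteToWideChar call instead. The helper names utf8_to_utf16le and ascii_only are made up for this example, and ascii_only models just one hypothetical failure mode (non-ASCII bytes being dropped); I have not confirmed that this is what actually happens inside the speech API.

```python
# Illustration only: NOT the plugin's code. Helper names are hypothetical.

text = "jabal\u00ed"  # "jabalí"

# Assumption: this is what Scintilla hands to the plugin.
utf8_bytes = text.encode("utf-8")
# Assumption: this is what a wide-character Windows API expects.
utf16_bytes = text.encode("utf-16-le")


def utf8_to_utf16le(raw: bytes) -> bytes:
    """Decode UTF-8 bytes and re-encode them as UTF-16-LE."""
    return raw.decode("utf-8").encode("utf-16-le")


def ascii_only(raw: bytes) -> str:
    """Hypothetical failure mode: drop every byte outside 7-bit ASCII."""
    return bytes(b for b in raw if b < 0x80).decode("ascii")


print(utf8_bytes.hex())             # the accented letter takes two bytes in UTF-8
print(utf16_bytes.hex())            # every character here takes two bytes in UTF-16-LE
print(ascii_only(utf8_bytes))       # the accented letter disappears
print(utf8_to_utf16le(utf8_bytes) == utf16_bytes)
```

If non-ASCII bytes were being discarded somewhere along the way, an all-accented string would come out empty, which would at least be consistent with the symptoms reported at the top of the thread.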
(I am also quite curious if my analysis is correct. I don’t know anything about the speech interface, though a quick search while writing this paragraph suggests that it uses an old COM interface, and that the ISpVoice interface is documented.)
-
It appears @PeterJones knows more about “my” plugin than me! The full story is I thought it was a cool plugin, it had no x64 version, the author seemed to have abandoned it - so I forked it to GitHub, made a few VCXPROJ file changes to compile a 64-bit version and it worked! You can see by the commit history I’ve made no substantial changes to the code - I have no idea how the Speech API works.
@OwlLowell not sure I can be much help in updating the source, as I (like you, presumably) would have to learn the Speech API and, to be honest, I don’t use the plugin. The same goes for many “dead” plugins I’ve forked and made 64-bit versions of. Can you run the tests @PeterJones recommends and report back?
Cheers.
-
It’s over my head, but I do have a few more clues:
- “Spanish” is a red herring: same results in English (i.e. Narrator says “Café”, SpeechPlugin says “Caf”)
- Same in Windows 7, and in portable 32bit NP++ (with SpeechPlugin_0_2_1_dll circa 2008, using Win10)
- Same after convert/save/reload of the file with encodings: ANSI/UTF-8/UTF-8 BOM/UCS-2 BE BOM/UCS-2 LE BOM
- Also tried saving files in various formats in Notepad/Wordpad, then opening them in NP++, no luck
Even if this goes unresolved, a sincere thank you for chiming in and contributing to the excellent and noble Notepad++!