About a year ago, the Microsoft
Research Speech Technology Group released the long-awaited SAPI 5.0
SDK. It was a big hit with developers -- C++ developers, that is! The
gaping omission was an automation-friendly COM interface to the runtime so that VB and other
COM-compliant programming environments could use it. Predictably, there
was a huge outcry from the VB development community, which has grown quite
large, and Microsoft promised that SAPI 5.1 would have a more or
less complete COM interface. It got so bad that some enterprising developers
wrote their own COM wrappers in the interim.
Well, Microsoft delivered. SAPI
5.1 provides everything they promised and more - there are even
Interop assembly interfaces that allow programming against the SDK with
C# and VB.NET. In addition, Microsoft recently presented developer-attendees
at the Los Angeles PDC 2001 with the first beta of the .NET
SAPI SDK. This marks the evolution of Microsoft's long-dominant position
in speech technology, and signals that Microsoft is committed to providing
speech technology interfaces for its programming environments for the
foreseeable future.
I've been involved with SAPI, off and on, since version
3.0 (that's ancient history, by Internet Standard Time). I saw the SAPI
interface, particularly combined with the ease of VB programming, as a
way to explore new and useful ways to interact with the
PC. That's not only because I've always been fascinated by voice technologies,
but also because my son, Andrew, who is now 13, has autism. If you know
anything about autism and related disorders, then you know that often
the speech-language processing centers of the brain in autistic individuals
don't develop at the same rate as other areas of the brain. The result
can often be language- and communication-related learning disabilities. Many
autistic people never speak at all. In Andrew's case, he can talk just fine,
although his pronunciation, especially with "L's" and "R's",
still needs some work. Yet it's difficult for him to piece together more
than a few words at a time. This is one of the major causes
of social distress, as he'd love to play more with the rest of the kids,
but he's often unable to express himself properly, and as a result, some
kids who haven't had this explained to them may think he's "strange".
I think this is partly a biochemical thing - we've noticed that if he
is sick or very excited, the connections in the brain suddenly seem
to improve and he's able to express himself better with language.
There are so many unique applications of speech technology
to computing that I can't even begin to describe them, and they aren't just for people
with disabilities - they're for everyone. I think the easiest way to get involved
with this technology is to go ahead and download the SAPI 5.1 SDK and fire
up some of the samples. You'll be hooked!
The first thing you'll notice is that the quality of
the speech recognition engine - even before "training" - is
absolutely FIRST RATE. This means that you can dictate your favorite letters,
memos and so on into your favorite programming interface and be assured
that you won't have to make many corrections later. Another thing I found
is that you can pretty much talk as fast as you want - and although there
may be a delay while the engine sorts out the context and comes up with
its final result, it won't miss a beat.
The second thing you'll notice is how easy it is to use.
In Visual Basic, you basically just set a reference to the Microsoft Speech
Object Library, which contains all of the COM interfaces.
To use the Text to Speech interface, we would use the
following code:
Dim Voice As SpVoice
Set Voice = New SpVoice
Voice.Speak "Howdy!", SVSFlagsAsync
I don't think it could get much easier!
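As a small variation on the snippet above, here is a minimal sketch that slows the voice down and speaks synchronously. It assumes the same reference to the Microsoft Speech Object Library; the sub name, the rate and volume values, and the spoken phrase are illustrative choices only.

```vb
' Sketch only: assumes a reference to the Microsoft Speech Object Library.
' SpeakSlowly, the rate/volume values and the phrase are illustrative.
Private Sub SpeakSlowly()
    Dim Voice As SpVoice
    Set Voice = New SpVoice

    Voice.Rate = -2      ' valid range is -10 (slowest) to 10 (fastest)
    Voice.Volume = 80    ' valid range is 0 to 100

    ' SVSFDefault makes Speak wait until the phrase has been spoken;
    ' SVSFlagsAsync (used above) returns immediately instead.
    Voice.Speak "What is three plus four?", SVSFDefault
End Sub
```

A synchronous call is handy when the program should not move on (for example, to start listening) until the prompt has finished.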
And to use the Speech Recognition classes,
we would do the following:
Dim WithEvents RecoContext As SpSharedRecoContext
Dim Grammar As ISpeechRecoGrammar
Set RecoContext = New SpSharedRecoContext
Set Grammar = RecoContext.CreateGrammar(1)
Grammar.DictationLoad
Grammar.DictationSetState SGDSActive
' the following is the event handler for the recognition event...
Private Sub RecoContext_Recognition(ByVal StreamNumber As Long, _
ByVal StreamPosition As Variant, _
ByVal RecognitionType As SpeechRecognitionType, _
ByVal Result As ISpeechRecoResult _
)
Dim strText As String
strText = Result.PhraseInfo.GetText
' put strText in a TextBox, or whatever...
End Sub
The above is for the built-in dictation grammar, which
of course can be trained to your voice and speech inflection. (You may
be surprised at how well it performs with no training at all!)
Specific command-and-control grammars are now written
in XML. Here is a sample grammar I put together so my kid can play with
a Math Drill program I wrote for him in VB:
<GRAMMAR>
<RULE ID="1" Name="number" TOPLEVEL="ACTIVE">
<L PROPNAME="number">
<P VALSTR="ADD">add</P>
<P VALSTR="SUBTRACT">subtract</P>
<P VALSTR="MULTIPLY">multiply</P>
<P VALSTR="DIVIDE">divide</P>
<P VALSTR="QUIT">QUIT</P>
<P VALSTR="OK">OK</P>
<P VALSTR="NEW">NEW</P>
<P VAL="0">zero</P>
<P VAL="1">one</P>
<P VAL="2">two</P>
<P VAL="3">three</P>
<P VAL="4">four</P>
<P VAL="5">five</P>
<P VAL="6">six</P>
<P VAL="7">seven</P>
<P VAL="8">eight</P>
<P VAL="9">nine</P>
<P VAL="10">ten</P>
<P VAL="11">eleven</P>
<P VAL="12">twelve</P>
<P VAL="13">thirteen</P>
<P VAL="14">fourteen</P>
<P VAL="15">fifteen</P>
<P VAL="16">sixteen</P>
<P VAL="17">seventeen</P>
<P VAL="18">eighteen</P>
<P VAL="19">nineteen</P>
<P VAL="20">twenty</P>
<P VAL="21">twenty one</P>
<P VAL="22">twenty two</P>
<P VAL="23">twenty three</P>
<P VAL="24">twenty four</P>
<P VAL="25">twenty five</P>
<P VAL="26">twenty six</P>
<P VAL="27">twenty seven</P>
<P VAL="28">twenty eight</P>
<P VAL="29">twenty nine</P>
<P VAL="30">thirty</P>
</L>
</RULE>
</GRAMMAR>
Next, we load the grammar file, turn dictation off, and set the rule
active:
Private Sub initialize()
    Set Voice = New SpVoice
    If (RecoContext Is Nothing) Then
        Debug.Print "Initializing SAPI reco context object..."
        Set RecoContext = New SpSharedRecoContext
        Set Grammar = RecoContext.CreateGrammar(1)
        ' load the command grammar, turn dictation off, activate rule ID 1
        Grammar.CmdLoadFromFile App.Path & "\newnums.xml", SLOStatic
        Grammar.DictationSetState SGDSInactive
        Grammar.CmdSetRuleIdState 1, SGDSActive
    End If
End Sub
With this rule active, recognition occurs
only when the speaker says one of the phrases listed in the RULE "number".
We can then determine which property was matched with the following
code in the event routine:
strText = Result.PhraseInfo.GetText(0, -1, True) ' what they said
strNumber = Result.PhraseInfo.Properties(0).Value ' which property in the rule was matched
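To show how those two lines might sit inside a complete handler, here is a hedged sketch that tells the command phrases (VALSTR) apart from the numbers (VAL) in the grammar above. It assumes the module-level RecoContext declared WithEvents earlier and, like the snippet above, that Properties(0).Value carries the matched value. HandleCommand and HandleNumber are hypothetical helpers, not part of SAPI.

```vb
' Sketch only: HandleCommand and HandleNumber are hypothetical helpers.
Private Sub RecoContext_Recognition(ByVal StreamNumber As Long, _
        ByVal StreamPosition As Variant, _
        ByVal RecognitionType As SpeechRecognitionType, _
        ByVal Result As ISpeechRecoResult)
    Dim strText As String
    Dim varValue As Variant

    strText = Result.PhraseInfo.GetText(0, -1, True)   ' what they said
    If Result.PhraseInfo.Properties Is Nothing Then Exit Sub
    varValue = Result.PhraseInfo.Properties(0).Value   ' value of the matched phrase

    Select Case VarType(varValue)
        Case vbString
            HandleCommand CStr(varValue)     ' e.g. "ADD", "QUIT", "NEW"
        Case vbEmpty, vbNull
            Debug.Print "No value returned for: " & strText
        Case Else
            HandleNumber CLng(varValue)      ' 0 through 30
    End Select
End Sub

Private Sub HandleCommand(ByVal strCommand As String)
    Debug.Print "Command: " & strCommand
End Sub

Private Sub HandleNumber(ByVal lngNumber As Long)
    Debug.Print "Number: " & lngNumber
End Sub
```

Branching on VarType rather than IsNumeric avoids treating an empty value as zero, which matters in a math drill where zero is a valid answer.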
The above is the most rudimentary grammar; XML grammars can get much more
sophisticated. We can program in rules that enable us to follow the flow
of a conversation that would occur in, for example, booking an airline
ticket and a rental car: selecting a flight, a time, a carrier,
and much more.
With the code snippets above, you have about
90% of the coding you need to know to create virtually any kind of
speech-enabled application! You can match utterances against a database,
you can use it with SQL Server English Query, you can create sophisticated
command and control applications that let you check and send email, run
specific programs, dictate a letter, or, as in my case, construct learning
programs for specific groups of individuals or purposes.
I hope this intro will get you interested enough to get
started with SAPI 5.1. A short mention of thanks to the people of Microsoft
who worked hard at putting out a real quality product for developers.
Peter Bromberg is an independent consultant specializing in distributed .NET solutions, a Senior Programmer/Analyst
in Orlando, and a co-developer of the NullSkull.com
developer website. He can be reached at info@eggheadcafe.com