Make Your Controls Speak with C# .NET
by Peter A. Bromberg, Ph.D.


"If at first you don't succeed, try, try again. Then quit. There's no point in being a damn fool about it." -- W.C. Fields

Some time ago, I wrote an article here about creating a Text to Speech library using a combination of the COM SpeechLib interfaces and a revised version of Ianier Munoz' WaveOut player. One of the interesting comments received was a reader asking if it was possible to play the generated audio stream in memory without having to save a file. So, having all the ingredients handy, and an hour of extra time, I decided to put something together. I think you'll find it useful, particularly in those instances where audio and not just visual feedback is desired in an application, such as where one is coding for the benefit of the visually impaired.



The library offers a static "TextToWavPlay" method that can easily be called from the Click or selection-changed event handlers of virtually any Windows Forms control. A call looks like this:

WaveLib.Converter.TextToWavPlay(theText, null);

Boy. I hope that's easy enough! Here is what the Demo form looks like:

Let's take a look at what was involved in creating this puppy. First, here's the choice part of the codebehind of the sample Demo form included in the downloadable VS.NET 2003 Solution below:

  private void button1_Click(object sender, EventArgs e)
  {
   DataSet ds = new DataSet();
   ds.ReadXml(this.textBox1.Text);
   this.dataGrid1.DataSource = ds.Tables[2];
  }

  private void dataGrid1_Click(object sender, EventArgs e)
  {
   GetSelectedRow(this.dataGrid1);
  }

  public void GetSelectedRow(DataGrid dg)
  {
   // speaking a selected row / column from a grid
   CurrencyManager cm = (CurrencyManager)this.BindingContext[dg.DataSource, dg.DataMember];
   DataView dv = (DataView)cm.List;
   for (int i = 0; i < dv.Count; ++i)
   {
    if (dg.IsSelected(i))
    {
     string s = (string)dv.Table.Rows[i][0];
     s = HttpUtility.HtmlDecode(s);
     Converter.TextToWavPlay(s, this.selectedVoice);
    }
   }
  }

  private void frmDemo_Load(object sender, EventArgs e)
  {
   // Binding enumerated installed voices to a control
   Converter c = new Converter();
   string[] voices = c.getInstalledVoices();
   this.comboBox1.DataSource = voices;
  }

  private void comboBox1_SelectedIndexChanged(object sender, EventArgs e)
  {
   // speaking the selected value of a control
   string s = (string)comboBox1.SelectedValue;
   Converter.TextToWavPlay("You selected " + s, s);
   this.selectedVoice = s;
  }
 }

You can see that in the Load handler I create an instance of the Converter class and call its getInstalledVoices method, which returns a string array of the SAPI voices installed on the machine. That array is then data-bound to the ComboBox.

When the ComboBox's selected index is changed (the user selects a voice from the dropdown), we call the static TextToWavPlay method, which accepts a string of text to speak and the voice id string (e.g. "Microsoft Sam") taken from the control. You can also pass null for the second parameter, in which case no installed voice matches and the default voice profile is used.
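Because the Converter matches the voice id by exact string equality, a misspelled or differently cased name silently falls back to the default voice. A minimal sketch of a guard for that follows. The VoicePicker class and its Resolve method are hypothetical names for illustration and are not part of the downloadable library.

```csharp
using System;

public static class VoicePicker
{
    // Returns the installed voice whose description matches 'requested'
    // (ignoring case), spelled exactly as the system reports it.
    // Returns null when there is no match, so the caller can pass null
    // on and get the default voice.
    public static string Resolve(string[] installedVoices, string requested)
    {
        if (installedVoices == null || string.IsNullOrEmpty(requested))
            return null;

        foreach (string voice in installedVoices)
        {
            if (string.Equals(voice, requested, StringComparison.OrdinalIgnoreCase))
                return voice;
        }
        return null;
    }
}
```

In the demo you would feed it the array returned by getInstalledVoices and hand the result to TextToWavPlay as the second argument.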

Finally, when the user clicks the "Get Rss" button on the form, we retrieve an RSS feed from the URL specified in the textbox and populate our DataGrid with it. The GetSelectedRow method finds the row on the grid that the user has clicked and speaks the column that holds the Title of that RSS item.
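The demo relies on the table and column order that DataSet.ReadXml happens to infer from the feed. If you would rather address the titles by name, something along these lines may work. It is only a sketch: RssTitles and Extract are hypothetical names, it assumes a plain RSS 2.0 layout (rss/channel/item/title), and it swaps in System.Net.WebUtility.HtmlDecode (available from .NET 4.0) for the HttpUtility.HtmlDecode call used in the demo.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Xml;

public static class RssTitles
{
    // Returns the HTML-decoded <title> text of every <item>
    // in an RSS 2.0 document supplied as a string.
    public static List<string> Extract(string rssXml)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(rssXml);

        List<string> titles = new List<string>();
        foreach (XmlNode node in doc.SelectNodes("/rss/channel/item/title"))
        {
            // InnerText has already had its XML entities resolved;
            // HtmlDecode handles any HTML entities left inside the title.
            titles.Add(WebUtility.HtmlDecode(node.InnerText));
        }
        return titles;
    }
}
```

Each string in the returned list can then be passed straight to TextToWavPlay.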

Switching over to the actual Converter Class, which is now embedded with the rest of the CSWavPlay / Record library:

using System;
using System.IO;
using System.Threading;
using System.Web;
using SpeechLib;

namespace WaveLib
{
 public class Converter
 {
  public Converter()
  {
  }

  public string[] getInstalledVoices()
  {
   SpVoice speech = new SpVoice();
   ISpeechObjectTokens sot = speech.GetVoices("", "");
   string[] voiceIds = new string[sot.Count];
   for (int i = 0; i < sot.Count; i++)
    voiceIds[i] = sot.Item(i).GetDescription(1033);
   return voiceIds;
  }

  public void TextToWav(string inputText, string filePath, string voiceIdString)
  {
   try
   {
    HttpContext ctx = HttpContext.Current;

    if (ctx != null)
    {
     DirectoryInfo di = new DirectoryInfo(ctx.Server.MapPath("."));
     FileInfo[] fi = di.GetFiles("*.wav");
     foreach (FileInfo f in fi)
      File.Delete(ctx.Server.MapPath(f.Name));
    }

    SpeechVoiceSpeakFlags SpFlags = SpeechVoiceSpeakFlags.SVSFlagsAsync;
    SpVoice speech = new SpVoice();

    if (voiceIdString != String.Empty)
    {
     ISpeechObjectTokens sot = speech.GetVoices("", "");
     string[] voiceIds = new string[sot.Count];
     for (int i = 0; i < sot.Count; i++)
     {
      voiceIds[i] = sot.Item(i).GetDescription(1033);
      if (voiceIds[i] == voiceIdString)
       speech.Voice = sot.Item(i);
     }
    }
    SpeechStreamFileMode SpFileMode = SpeechStreamFileMode.SSFMCreateForWrite;
    SpFileStream SpFileStream = new SpFileStream();
    SpFileStream.Format.Type = SpeechAudioFormatType.SAFT11kHz8BitMono;
    if (! filePath.ToLower().EndsWith(".wav")) filePath += ".wav";
    SpFileStream.Open(filePath, SpFileMode, false);
    speech.AudioOutputStream = SpFileStream;
    speech.Speak(inputText, SpFlags);
    speech.WaitUntilDone(Timeout.Infinite);
    SpFileStream.Close();
   }
   catch
   {
    throw;
   }
  }

  public static void TextToWavPlay(string inputText, string voiceIdString)
  {
   byte[] b = TextToWav(inputText, voiceIdString);
   // play the raw PCM bytes directly from memory
   Player m_Player = Player.Instance;
   // SAFT11kHz8BitMono is 11,025 Hz, 8-bit, mono; the playback format must match it
   WaveFormat fmt = new WaveFormat(11025, 8, 1);
   m_Player.SetInput(new MemoryStream(b), fmt);
   m_Player.Start(b.Length);
  }

  public static byte[] TextToWav(string inputText, string voiceIdString)
  {
   byte[] b = null;
   try
   {
    SpeechVoiceSpeakFlags SpFlags = SpeechVoiceSpeakFlags.SVSFlagsAsync;
    SpVoice speech = new SpVoice();
    if (voiceIdString != String.Empty)
    {
     ISpeechObjectTokens sot = speech.GetVoices("", "");
     string[] voiceIds = new string[sot.Count];
     for (int i = 0; i < sot.Count; i++)
     {
      voiceIds[i] = sot.Item(i).GetDescription(1033);
      if (voiceIds[i] == voiceIdString)
       speech.Voice = sot.Item(i);
     }
    }

    SpMemoryStream spMemStream = new SpMemoryStream();
    spMemStream.Format.Type = SpeechAudioFormatType.SAFT11kHz8BitMono;
    object buf = new object();
    speech.AudioOutputStream = spMemStream;
    int r = speech.Speak(inputText, SpFlags);
    speech.WaitUntilDone(Timeout.Infinite);
    spMemStream.Seek(0, SpeechStreamSeekPositionType.SSSPTRelativeToStart);
    buf = spMemStream.GetData();
    b = (byte[]) buf;
   }
   catch
   {
    throw;
   }
   return b;
  }
 }
}

Above you can see how the various methods are implemented: one returns the installed voices, one writes the spoken text to a WAV file, one speaks a string of text entirely in memory, and one returns a byte array holding the raw PCM audio for the text (without a WAV header). I hope this concept is useful to you.
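Since the byte array comes back without a WAV header, you cannot write it to disk as-is and expect a media player to open it. If you ever need that, a standard 44-byte RIFF/WAVE header can be prepended. The following is a minimal sketch for uncompressed PCM only. WavWriter and AddWavHeader are hypothetical names, not part of the library.

```csharp
using System;
using System.IO;
using System.Text;

public static class WavWriter
{
    // Wraps raw PCM samples in a minimal 44-byte RIFF/WAVE header.
    public static byte[] AddWavHeader(byte[] pcm, int sampleRate,
                                      short bitsPerSample, short channels)
    {
        short blockAlign = (short)(channels * bitsPerSample / 8);
        int byteRate = sampleRate * blockAlign;
        int pad = pcm.Length & 1;   // RIFF chunks are padded to an even length

        using (MemoryStream ms = new MemoryStream())
        using (BinaryWriter w = new BinaryWriter(ms))
        {
            w.Write(Encoding.ASCII.GetBytes("RIFF"));
            w.Write(36 + pcm.Length + pad);   // bytes that follow this field
            w.Write(Encoding.ASCII.GetBytes("WAVE"));
            w.Write(Encoding.ASCII.GetBytes("fmt "));
            w.Write(16);                      // size of the PCM fmt chunk
            w.Write((short)1);                // format tag 1 = uncompressed PCM
            w.Write(channels);
            w.Write(sampleRate);
            w.Write(byteRate);
            w.Write(blockAlign);
            w.Write(bitsPerSample);
            w.Write(Encoding.ASCII.GetBytes("data"));
            w.Write(pcm.Length);              // size of the sample data
            w.Write(pcm);
            if (pad == 1) w.Write((byte)0);
            w.Flush();
            return ms.ToArray();
        }
    }
}
```

Pass the same sample rate, bit depth and channel count that you requested from SAPI (the 11 kHz 8-bit mono format is nominally 11,025 Hz, 8 bits, 1 channel), then write the result to a file ending in .wav.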

Download the Visual Studio.NET 2003 Solution accompanying this article


Peter Bromberg is a C# MVP, MCP, and .NET consultant who has worked in the banking and financial industry for 20 years. He has architected and developed web-based corporate distributed application solutions since 1995, and focuses exclusively on the .NET Platform.