Build a SAPI Text to Wav Converter Library
by Peter A. Bromberg, Ph.D.

Interesting things happen when you "find a need and fill it". Recently I was thinking about all the nice books and other text files on the web (such as at Project Gutenberg), and I thought how nice it would be to convert these text files to speech using one of the SAPI engines, save the resulting WAV file, perhaps use Windows Media Encoder to convert it to either MP3 or WMA, and put it on my Smartphone to listen to with its built-in Windows Media Player 10. That would be cool.

So I looked around. Couldn't find anything that was free. There were plenty of offerings, but they were all either crippled to get you to buy the full version, or too expensive. So I thought about it, and said to myself, "Wait a minute": we are in the VOIP business at work, and we could even use something like this to announce the current time and other needed phrases. Take a custom, dynamically assembled text message, convert it to speech, and play it.



Being familiar with SAPI, I wrote up the following Converter class library in less than an hour:

using SpeechLib;
using System;
using System.IO;
using System.Threading;

namespace TextToWavLib
{
    // Requires references to the SpeechLib COM interop assembly and System.Web.dll.
    public class Converter
    {
        // LCID 1033 = English (United States); used when reading voice descriptions.
        private const int EnglishLcid = 1033;

        public Converter()
        {
        }

        // Returns the description of every installed SAPI voice (e.g., "Microsoft Mike").
        public string[] getInstalledVoices()
        {
            SpVoice speech = new SpVoice();
            ISpeechObjectTokens sot = speech.GetVoices("", "");
            string[] voiceIds = new string[sot.Count];
            for (int i = 0; i < sot.Count; i++)
            {
                voiceIds[i] = sot.Item(i).GetDescription(EnglishLcid);
            }
            return voiceIds;
        }

        // Selects the voice whose description matches voiceIdString.
        // The default voice is kept if the string is empty or nothing matches.
        private static void SelectVoice(SpVoice speech, string voiceIdString)
        {
            if (voiceIdString == null || voiceIdString.Length == 0)
            {
                return;
            }
            ISpeechObjectTokens sot = speech.GetVoices("", "");
            for (int i = 0; i < sot.Count; i++)
            {
                if (sot.Item(i).GetDescription(EnglishLcid) == voiceIdString)
                {
                    speech.Voice = sot.Item(i);
                    break;
                }
            }
        }

        // Speaks inputText into a WAV file at filePath (".wav" is appended if missing).
        public void TextToWav(string inputText, string filePath, string voiceIdString)
        {
            // Under ASP.NET, delete the WAV files left in the page's folder by earlier requests.
            System.Web.HttpContext ctx = System.Web.HttpContext.Current;
            if (ctx != null)
            {
                DirectoryInfo di = new DirectoryInfo(ctx.Server.MapPath("."));
                foreach (FileInfo f in di.GetFiles("*.wav"))
                {
                    f.Delete();
                }
            }

            SpVoice speech = new SpVoice();
            SelectVoice(speech, voiceIdString);

            if (!filePath.ToLower().EndsWith(".wav"))
            {
                filePath += ".wav";
            }

            SpFileStream fileStream = new SpFileStream();
            fileStream.Format.Type = SpeechAudioFormatType.SAFT11kHz8BitMono;
            fileStream.Open(filePath, SpeechStreamFileMode.SSFMCreateForWrite, false);
            try
            {
                speech.AudioOutputStream = fileStream;
                speech.Speak(inputText, SpeechVoiceSpeakFlags.SVSFlagsAsync);
                speech.WaitUntilDone(Timeout.Infinite);
            }
            finally
            {
                // Close the file even if speaking fails, so the handle is not leaked.
                fileStream.Close();
            }
        }

        // Speaks inputText into memory and returns the raw audio data (no WAV header).
        public byte[] TextToWav(string inputText, string voiceIdString)
        {
            SpVoice speech = new SpVoice();
            SelectVoice(speech, voiceIdString);

            SpMemoryStream spMemStream = new SpMemoryStream();
            spMemStream.Format.Type = SpeechAudioFormatType.SAFT11kHz8BitMono;
            speech.AudioOutputStream = spMemStream;
            speech.Speak(inputText, SpeechVoiceSpeakFlags.SVSFlagsAsync);
            speech.WaitUntilDone(Timeout.Infinite);

            // GetData returns the whole buffer as an object wrapping a byte array.
            return (byte[])spMemStream.GetData();
        }
    }
}


The class library uses the SpeechLib COM component ("Microsoft Speech Object Library", or "SAPI.DLL"), which comes preinstalled on most Windows XP machines. You can also download the SAPI SDK from MSDN, which will install it. The library has three methods: one enumerates the installed voice engines and returns a string array of their description names (e.g., "Microsoft Mike"), and two are overloads of TextToWav, one that saves the completed WAV file to disk and another that returns the raw audio stream as a byte array.
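
Because the byte-array overload returns raw samples with no WAV header, most players will not open its output directly. One possible way around that, sketched here and not taken from the downloadable solution, is to prepend a standard 44-byte PCM RIFF header. The class name WavHeader and the method Wrap are hypothetical, and the 11,025 Hz sample rate mentioned afterwards is an assumption about what SAFT11kHz8BitMono produces.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper, not part of the article's library.
public static class WavHeader
{
    // Returns a new array holding a 44-byte PCM RIFF/WAVE header followed by the samples.
    public static byte[] Wrap(byte[] pcm, int sampleRate, short bitsPerSample, short channels)
    {
        short blockAlign = (short)(channels * bitsPerSample / 8); // bytes per sample frame
        int byteRate = sampleRate * blockAlign;                   // bytes per second
        int pad = pcm.Length & 1;                                 // RIFF chunks are word-aligned

        using (MemoryStream ms = new MemoryStream(44 + pcm.Length + pad))
        using (BinaryWriter w = new BinaryWriter(ms))             // BinaryWriter is little-endian
        {
            w.Write(Encoding.ASCII.GetBytes("RIFF"));
            w.Write(36 + pcm.Length + pad);   // bytes that follow this size field
            w.Write(Encoding.ASCII.GetBytes("WAVE"));
            w.Write(Encoding.ASCII.GetBytes("fmt "));
            w.Write(16);                      // size of the PCM format chunk
            w.Write((short)1);                // format tag 1 = uncompressed PCM
            w.Write(channels);
            w.Write(sampleRate);
            w.Write(byteRate);
            w.Write(blockAlign);
            w.Write(bitsPerSample);
            w.Write(Encoding.ASCII.GetBytes("data"));
            w.Write(pcm.Length);              // sample bytes, excluding any pad byte
            w.Write(pcm);
            if (pad == 1)
            {
                w.Write((byte)0);
            }
            w.Flush();
            return ms.ToArray();
        }
    }
}
```

For the format used in this article, the call would presumably be WavHeader.Wrap(rawBytes, 11025, 8, 1), where rawBytes is the array returned by the two-argument TextToWav. The result can then be written to a .wav file or handed to any API that expects a complete WAV image.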

Then I built a Windows Forms front end that accepts text typed into a textbox or loaded from a file, lets you set the output file and the voice personality to use, and converts the text to a WAV file and saves it. I also put in a method to exercise the raw audio option and save the byte array as "filename.raw", but you would have to custom load that into something like SoundForge, since there is no WAV header on it. I set the format type to 11kHz 8-bit mono to keep the file size down, since this is just spoken text. It looks like this:


Then I also built an ASP.NET web page to test it, and put in code to automatically clean up saved files each time it is run, plus a BGSOUND tag to play the converted text and also a Hyperlink to allow the user to download the wav file.
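
The cleanup in the library deletes every WAV file in the page's folder on each call, which is fine for a test page but could remove a file that another visitor is still downloading. A gentler variation, offered only as a sketch under that assumption, deletes files older than a chosen age. The names WavCleanup and DeleteOldWavFiles are hypothetical and do not appear in the downloadable solution.

```csharp
using System;
using System.IO;

// Hypothetical helper, not part of the article's solution.
public static class WavCleanup
{
    // Deletes the *.wav files in 'directory' that were last written more than
    // 'maxAge' ago, and returns how many files were deleted.
    public static int DeleteOldWavFiles(string directory, TimeSpan maxAge)
    {
        int deleted = 0;
        DateTime cutoff = DateTime.UtcNow - maxAge;
        foreach (string path in Directory.GetFiles(directory, "*.wav"))
        {
            if (File.GetLastWriteTimeUtc(path) < cutoff)
            {
                File.Delete(path);
                deleted++;
            }
        }
        return deleted;
    }
}
```

In the web page this might be called with the mapped folder path and something like TimeSpan.FromMinutes(10) before each conversion; the ten-minute figure is arbitrary.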

When I showed this to the people at work, they instantly realized that we could use it for a variety of things related to our particular business. So a little project that started out solving a personal need during my lunch hour ended up being useful in the business as well.

Download the entire solution below, and be sure to mark the "TextToWavWeb" folder as an IIS Application. In addition, you may need to set <identity impersonate="true" userName="privilegedUser" password="pass" /> in web.config so that the app has the file access permissions it needs to save files and delete old ones.

You can do a lot of cool things with a library like this. Just let your imagination run. For example, you can greet visually impaired users with a customized greeting, or use it to provide auditory enhancement cues in a web application.

N.B. A recent poster asked how to play a sound from memory. I happened to have some code handy, so I'm pasting it below:

using System;
using System.IO;
using System.Reflection;
using System.Runtime.InteropServices;

namespace Win32
{
    public class Winmm
    {
        public const UInt32 SND_ASYNC = 1;
        public const UInt32 SND_MEMORY = 4;

        // With SND_MEMORY, the first argument points to a WAV image in memory.
        [DllImport("Winmm.dll")]
        public static extern bool PlaySound(IntPtr data, IntPtr hMod, UInt32 dwFlags);

        // With SND_ASYNC, PlaySound returns before playback ends, so the buffer
        // is kept pinned here to stop the garbage collector from moving it.
        private static GCHandle pinnedSound;

        public Winmm()
        {
        }

        // Plays an embedded WAV resource; wav is the full manifest resource name.
        public static void PlayWavResource(string wav)
        {
            byte[] data;
            using (Stream str = Assembly.GetExecutingAssembly().GetManifestResourceStream(wav))
            {
                if (str == null)
                    return;

                // Read may return fewer bytes than requested, so loop until done.
                data = new byte[str.Length];
                int offset = 0;
                while (offset < data.Length)
                {
                    int read = str.Read(data, offset, data.Length - offset);
                    if (read == 0)
                        break;
                    offset += read;
                }
            }
            PlayWavBytes(data);
        }

        // Plays a complete WAV image (header included) held in a byte array.
        public static void PlayWavBytes(byte[] data)
        {
            if (pinnedSound.IsAllocated)
            {
                // A null sound stops current playback, so the old buffer can be released.
                PlaySound(IntPtr.Zero, IntPtr.Zero, 0);
                pinnedSound.Free();
            }
            pinnedSound = GCHandle.Alloc(data, GCHandleType.Pinned);
            PlaySound(pinnedSound.AddrOfPinnedObject(), IntPtr.Zero, SND_ASYNC | SND_MEMORY);
        }
    }
}
	

Download the VS.NET 2003 Solution that accompanies this article.


Peter Bromberg is a C# MVP, MCP, and .NET consultant who has worked in the banking and financial industry for 20 years. He has architected and developed web-based corporate distributed application solutions since 1995, and focuses exclusively on the .NET Platform.
