Computer Speech

[Waveform of Cynthia saying "computer speech"]
   
One of the areas I am focusing on is computer speech technology. I am currently writing my own speech synthesis software, called Cynthia. The picture at the top of this page shows a waveform of Cynthia saying the words "computer speech." I am keeping a small log of how Cynthia says things at different stages of development. You can listen to some of her earliest sounds here.

The first step in converting a piece of text into computer speech is to represent the text phonetically. Once the text is represented as a string of phonemes, the computer generates the sound of each phoneme and uses statistical information to apply intonation and dynamics to the speech.
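
As a rough outline of that pipeline, the whole process might be sketched like this (the function names are placeholders I am using for illustration, not Cynthia's actual code; the three stages are stubbed out here and taken up one at a time below):

    def text_to_phonemes(text):
        """Step 1: look up each word in a phonetic dictionary."""
        raise NotImplementedError

    def phonemes_to_waveform(phonemes):
        """Step 2: generate an audio sample for each phoneme."""
        raise NotImplementedError

    def apply_prosody(waveform, phonemes):
        """Step 3: shape pitch, timing, and loudness using statistics."""
        raise NotImplementedError

    def synthesize(text):
        phonemes = text_to_phonemes(text)
        waveform = phonemes_to_waveform(phonemes)
        return apply_prosody(waveform, phonemes)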

Converting plain text into a phonetic representation is relatively straightforward: each word is looked up in a phonetic dictionary. Carnegie Mellon University maintains such a dictionary for American English, the CMU Pronouncing Dictionary.
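
To make the lookup concrete, here is a minimal sketch in Python, assuming a local copy of the CMU dictionary file; the comment marker and the WORD(2)-style alternate pronunciations reflect the plain-text format the dictionary is distributed in, and everything else here is my own simplification:

    def load_cmudict(path):
        """Parse the CMU Pronouncing Dictionary into {word: [phonemes]}."""
        pronunciations = {}
        with open(path, encoding="latin-1") as f:
            for line in f:
                if not line.strip() or line.startswith(";;;"):
                    continue                       # skip blank lines and comments
                word, *phones = line.split()
                word = word.split("(")[0].lower()  # fold WORD(2) variants together
                pronunciations.setdefault(word, phones)
        return pronunciations

    def text_to_phonemes(text, pronunciations):
        """Convert plain text into a flat list of ARPAbet phoneme symbols."""
        phonemes = []
        for word in text.lower().split():
            word = word.strip(".,;:!?\"'")
            if word in pronunciations:
                phonemes.extend(pronunciations[word])
            # Words missing from the dictionary would need letter-to-sound rules.
        return phonemes

With the dictionary loaded, text_to_phonemes("computer speech", pronunciations) comes out as something like K AH0 M P Y UW1 T ER0 S P IY1 CH, where the digits mark vowel stress.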

Generating sounds for the phonetic text is a much harder problem, and there are several approaches to it. Many speech synthesizers, such as the one available from Microsoft, try to digitally model the way the human mouth forms the sound of each phoneme.
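
I do not know the details of any commercial engine, but the general idea behind this kind of modeling can be sketched with a simple source-filter approach: a buzzing glottal source is passed through resonators tuned to a vowel's formant frequencies. The formant values below are rough textbook figures for an "ah"-like vowel, and the impulse train is a crude stand-in for the real glottal waveform:

    import numpy as np
    from scipy.io import wavfile
    from scipy.signal import lfilter

    fs = 16000                        # sample rate, Hz
    f0 = 110                          # pitch of the glottal source, Hz
    n = int(fs * 0.5)                 # half a second of sound

    # Glottal source: a bare impulse train at the pitch period.
    source = np.zeros(n)
    source[::fs // f0] = 1.0

    def resonator(signal, freq, bandwidth):
        """Two-pole resonator that boosts energy around one formant."""
        r = np.exp(-np.pi * bandwidth / fs)
        theta = 2 * np.pi * freq / fs
        return lfilter([1.0], [1.0, -2.0 * r * np.cos(theta), r * r], signal)

    # Pass the source through a cascade of formant resonators.
    vowel = source
    for freq, bw in [(730, 90), (1090, 110), (2440, 150)]:
        vowel = resonator(vowel, freq, bw)

    vowel /= np.max(np.abs(vowel))    # normalize amplitude
    wavfile.write("ah.wav", fs, (vowel * 32767).astype(np.int16))

Real formant and articulatory synthesizers go much further, moving the formant targets over time as the mouth travels from one phoneme to the next, but even this crude sketch produces a recognizably vowel-like buzz.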

Other systems use databases of phonemes recorded from actual people: each phoneme in the input text is matched with its corresponding recording. Because the recordings are static, it is very hard to join them together smoothly enough to form pleasing speech.
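
A naive sketch of the concatenative idea, assuming one pre-recorded mono clip per phoneme on disk; the file-naming scheme and the short linear crossfade are placeholder choices of mine, not how any particular synthesizer stores its data:

    import numpy as np
    from scipy.io import wavfile

    def concatenate_phonemes(phonemes, fs=16000, crossfade_ms=10):
        """Join pre-recorded phoneme clips with a short linear crossfade."""
        fade = int(fs * crossfade_ms / 1000)
        out = np.zeros(0)
        for p in phonemes:
            _, clip = wavfile.read(f"{p}.wav")   # e.g. "K.wav", "AH.wav", ...
            clip = clip.astype(np.float64)
            if len(out) >= fade and len(clip) >= fade:
                ramp = np.linspace(0.0, 1.0, fade)
                blend = out[-fade:] * (1.0 - ramp) + clip[:fade] * ramp
                out = np.concatenate([out[:-fade], blend, clip[fade:]])
            else:
                out = np.concatenate([out, clip])
        return out

Even with the crossfade, the joins tend to be audible, because each clip carries the pitch and context of the recording it came from; that mismatch is exactly the problem described above.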

A good place to find test material for a text-to-speech program is Project Gutenberg, an online library of books in the public domain.