How do Siri and Alexa sound so much like people and not robots? One way: formants!

Formants are used in all sorts of voice-related technology and analysis.

Formants are bands of high-energy sound that occur at certain frequencies. Does this sound complicated?
How about this:

Have you been in a small echoey room or stairwell and noticed the echoes were stronger when you spoke or sang a certain pitch? The room resonated at that pitch (frequency). Your throat and mouth work the same way. They have different spaces that resonate best at different frequencies. When you move your throat and mouth, you change the size of these spaces, which changes the frequency each one resonates with best.
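To put rough numbers on this idea, here is a minimal Python sketch. It assumes the throat and mouth behave like one uniform tube that is closed at one end (the vocal folds) and open at the other (the lips). That is a common textbook simplification, not a description of a real mouth. The function name `tube_resonances`, the tube lengths, and the 350 m/s speed of sound are all illustrative assumptions.

```python
def tube_resonances(length_m, count=3, speed_of_sound=350.0):
    """Resonant frequencies (Hz) of a uniform tube closed at one end.

    Quarter-wavelength model: F_n = (2n - 1) * c / (4 * L).
    """
    return [(2 * n - 1) * speed_of_sound / (4 * length_m)
            for n in range(1, count + 1)]


# Assumed lengths: roughly an adult vocal tract, and a shorter tube.
for length in (0.175, 0.14):
    freqs = tube_resonances(length)
    print(f"{length * 100:.1f} cm tube:", [round(f) for f in freqs], "Hz")
```

Under these assumptions, the 17.5 cm tube resonates at about 500, 1500, and 2500 Hz, and the shorter tube resonates at higher frequencies. Changing the size of the space changes the frequencies it favors, which is the same thing your mouth does when it moves.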

Check out the YouTube video below for a long-winded (no pun intended) but helpful explanation of how these spaces resonate at different frequencies.

Linguists normally pay attention to the three lowest resonances, because they are all that people need to tell vowel sounds apart. We can visualize formants on a chart that shows frequency as time passes. This kind of chart is called a spectrogram. Below is what the formants for the sound (ee) look like in Standard American English.

 

The higher frequencies are at the top of the graph, and time moves across the graph, from left to right. Notice how each sound has one or more dark bands doing something unique. These dark bands are formants.
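For readers curious how such a chart is built, here is a small Python sketch of the basic recipe: chop the sound into short overlapping slices and measure how strong each frequency is in every slice. It uses only the standard library, so it is slow and meant for illustration. The function names, the slice sizes, and the made-up test signal (a tone gliding upward) are assumptions for this example, not part of any particular tool.

```python
import cmath
import math


def spectrogram(samples, frame_size=256, hop=128):
    """Split samples into overlapping frames and measure the strength of
    each frequency bin in each frame (a plain short-time Fourier transform)."""
    # Hann window: fades each frame in and out to reduce edge artifacts.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / (frame_size - 1))
              for i in range(frame_size)]
    # Precomputed complex exponentials for the discrete Fourier transform.
    twiddle = [cmath.exp(-2j * math.pi * j / frame_size)
               for j in range(frame_size)]
    frames = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        chunk = [samples[start + i] * window[i] for i in range(frame_size)]
        magnitudes = [
            abs(sum(chunk[i] * twiddle[(k * i) % frame_size]
                    for i in range(frame_size)))
            for k in range(frame_size // 2)
        ]
        frames.append(magnitudes)
    return frames


def strongest_frequencies(frames, sample_rate, frame_size=256):
    """Frequency (Hz) of the strongest bin in each frame."""
    return [mags.index(max(mags)) * sample_rate / frame_size
            for mags in frames]


# Made-up demo signal: a tone gliding from 500 Hz up to 2000 Hz.
SAMPLE_RATE = 8000
DURATION_S = 0.25
total = int(SAMPLE_RATE * DURATION_S)
phase = 0.0
glide = []
for n in range(total):
    freq = 500 + 1500 * n / total
    phase += 2 * math.pi * freq / SAMPLE_RATE
    glide.append(math.sin(phase))

peaks = strongest_frequencies(spectrogram(glide), SAMPLE_RATE)
print([round(p) for p in peaks])
```

Each inner list returned by `spectrogram` is one vertical slice of the chart, and larger numbers would be drawn darker. The strongest frequency rises from the first slice to the last, which on a spectrogram would show up as a dark band sloping upward from left to right.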

As you change your mouth shape while speaking, you change the frequencies at which those areas resonate. This changes the height and angle of the formant bands.

What do formants in a sentence look like?

Below is a video of a computer saying “The boy played a bugle.” The formants shift as the mouth changes shape. For example, “boy” has two vowel sounds: (oh) and (ee). The formants slide from one to the other because the mouth moves from one shape to another.

 

 

Infant and adult brains use formants to process speech, as do many computer programs. The natural language processing in many digital assistants uses formants to create speech, matching each vowel or consonant with the unique set of formants that naturally occurs with that sound. People do this too!
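As a rough illustration of creating a vowel from its formants, here is a minimal Python sketch of classic formant synthesis. It is not a claim about how any particular assistant works. A buzzing source (standing in for the vocal folds) is passed through three simple resonators, one per formant. The formant frequencies for (ee) are approximate textbook values for an adult male speaker, and the bandwidths, pitch, and function names are assumptions chosen for this example.

```python
import math
import struct
import wave


def resonator(signal, center_hz, bandwidth_hz, sample_rate):
    """Two-pole resonator: boosts energy near center_hz."""
    r = math.exp(-math.pi * bandwidth_hz / sample_rate)
    a1 = 2 * r * math.cos(2 * math.pi * center_hz / sample_rate)
    a2 = -r * r
    out = []
    y1 = y2 = 0.0
    for x in signal:
        y = x + a1 * y1 + a2 * y2
        out.append(y)
        y2, y1 = y1, y
    return out


def synthesize_vowel(formants, pitch_hz=120, duration_s=0.5,
                     sample_rate=16000):
    """Filter a train of pulses through one resonator per formant."""
    count = int(duration_s * sample_rate)
    period = int(sample_rate / pitch_hz)
    # Source: one click per pitch period, a crude stand-in for the vocal folds.
    signal = [1.0 if i % period == 0 else 0.0 for i in range(count)]
    for center_hz, bandwidth_hz in formants:
        signal = resonator(signal, center_hz, bandwidth_hz, sample_rate)
    peak = max(abs(s) for s in signal)
    return [s / peak for s in signal]  # scale into the range -1..1


def write_wav(path, samples, sample_rate=16000):
    """Save samples (floats in -1..1) as a 16-bit mono WAV file."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(sample_rate)
        wav.writeframes(b"".join(struct.pack("<h", int(s * 32767))
                                 for s in samples))


# Assumed (frequency, bandwidth) pairs in Hz for the vowel (ee).
EE_FORMANTS = [(270, 60), (2290, 90), (3010, 150)]
ee = synthesize_vowel(EE_FORMANTS)
print(len(ee), "samples")
```

Calling `write_wav("ee.wav", ee)` saves the result so you can listen to it. The sound is buzzy and robotic, but swapping in a different set of formant frequencies changes which vowel it resembles.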

Aahnix poses for a picture on a chair, facing the sunset.

Aahnix Bathurst

Editor/publisher

Aahnix is a Project Coordinator in the Bergelson lab at Duke University.

Elika Bergelson

Principal Investigator