Before babies can learn what a word means, they have to know what it sounds like.

What is wordform recognition?

Before infants can start learning what words mean, they need to be able to recognize the consistent way that a word sounds, even though the same word can sound super different each time it’s said! Take a listen to this audio recording of all the times a baby heard other people saying “baby” in one single day:

Bulgarelli, Mielke, & Bergelson (2021)

To be able to recognize consistency, kids need to maintain adequate phonetic representations of familiar words. That is, they need to remember which sound components go together and have an expectation of what those words are going to sound like. For example, to learn the word “dog,” they need to know it consists of the English phonemes, or speech sounds, /d/ + /ɔ/ (the vowel that makes the “aw” sound) + /g/. However, the pitch of those phonemes, the volume of the word, or the length of that middle sound might vary a lot depending on who’s saying it, and none of that changes the meaning!

But they don’t just have to remember individual words by themselves. Another part of wordform recognition is the ability to slice words out of fluent speech as distinct, meaningful sounds. In real life, there’s no physical space or time between words when someone is speaking to you: it’s one continuous flow of sound. Segmentation is simply the grouping together of certain sounds in speech as one word, and separating them from the rest of the speech stream. Take this sentence for example:


To someone who doesn’t speak or read English, that will look like a long string of nonsense. But to experienced English readers, the letters group together into familiar words and form a coherent sentence. While we use different sensory processes to understand written versus spoken language versus signed language, we do a similar thing when we read, hear speech sounds, or see language signs, and babies need to get good at this very early on in their language journey.
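The reading analogy above can be sketched in a few lines of Python. This is only a toy illustration, not a model of how infants actually segment speech: it assumes the reader already has a list of familiar words (the hypothetical `lexicon` below), and the example string and the `segment` function are made up for this sketch.

```python
def segment(stream, lexicon):
    """Split an unbroken string into known words, or return None if it can't be done."""
    # best[i] holds one way of splitting stream[:i] into known words (None if no way was found)
    best = [None] * (len(stream) + 1)
    best[0] = []
    for end in range(1, len(stream) + 1):
        for start in range(end):
            word = stream[start:end]
            if best[start] is not None and word in lexicon:
                best[end] = best[start] + [word]
                break
    return best[-1]


# Hypothetical set of familiar words, chosen only for this example
lexicon = {"the", "dog", "is", "on", "table"}
print(segment("thedogisonthetable", lexicon))
```

A reader with no familiar words to lean on (an empty `lexicon`) gets `None` back, which is the "long string of nonsense" case.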

A white baby with a blonde bowl cut holds on to a table and looks puzzled while a thought bubble spells out the phonemic transcription for the word "table."

“Toddler on Trump” by quinn.anya is licensed under CC BY-SA 2.0.


Both phonetic representation and word segmentation build infants’ wordform recognition ability. As with most things, practice makes perfect: the more times we hear a word, the stronger our expectations of how it should sound, and the faster we are able to recognize it (Zevin, 2009). Later on in development, as infants learn the meanings of words, they match meanings to the recognized collection of sounds stored in their lexicon (Carbajal, Peperkamp, & Tsuji, 2021). This process becomes faster and easier as their word comprehension gets better.

How do we know if babies can do this?

To know if an adult knows a word, we can ask. With babies, not so much. To test whether or not an infant can recognize a wordform like “dog” (remember our phonemes /dɔg/), researchers can use discrimination studies to see if the infant can tell the difference between familiar and unfamiliar words. The idea here is that, if the infant already has a phonetic representation of familiar words, they should treat them differently from unfamiliar words.

One study design used for this is the Head-Turn Preference Procedure. As the name suggests, this procedure uses the direction in which the infant turns their head to detect a difference in preference between two types of stimuli (familiar vs. unfamiliar words) (Nelson et al., 1995).

One nice thing about babies is that we can count on them to do certain things. For instance, if there’s a blinking light in a dark room, they’re likely to look towards it, just like grownups are. Taking advantage of this, the method flashes lights and plays sounds for as long as the baby looks at the flashing light on either side of them. This very basic method lets us know whether babies can tell certain types of sounds apart (e.g. trumpets vs. saxophones, foreign speech vs. native language speech, or known vs. unknown wordforms). If babies show a significant overall difference in the time they spend looking at the light for the two types of sounds, it tells us that they have spotted the difference, and thus that they recognize the wordforms of the familiar words.
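To make "a significant difference in looking time" a little more concrete, here is a minimal sketch of one way such a comparison could be computed. It assumes each infant contributes one average looking time per sound type and uses a paired t statistic. That choice of test, the `paired_t` function, and the numbers are all assumptions made for illustration, not the analysis or data from any of the studies cited here.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(familiar, unfamiliar):
    """Return (mean difference, t statistic) for per-infant looking times."""
    if len(familiar) != len(unfamiliar) or len(familiar) < 2:
        raise ValueError("need two equal-length lists with at least two infants")
    # One difference score per infant: familiar minus unfamiliar
    diffs = [f - u for f, u in zip(familiar, unfamiliar)]
    mean_diff = mean(diffs)
    # Standard error of the mean difference
    std_err = stdev(diffs) / sqrt(len(diffs))
    return mean_diff, mean_diff / std_err


# Made-up average looking times in seconds, one pair per infant
familiar = [8.0, 7.0, 9.0, 6.0]
unfamiliar = [6.0, 6.0, 6.0, 6.0]

mean_diff, t = paired_t(familiar, unfamiliar)
print(f"mean difference: {mean_diff:.2f} s, t = {t:.2f}")
```

The larger the t statistic is relative to its critical value for the number of infants tested, the less likely it is that the looking-time difference arose by chance.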

a black and white cartoon of a woman wearing headphones, sitting with an infant in her lap in a booth with lights left, right and center.

Image from Gervain & Werker (2013)



Bulgarelli, F., Mielke, J., & Bergelson, E. (2021). Quantifying Talker Variability in North-American Infants’ Daily Input. Cognitive Science. doi: 10.1111/cogs.13075

Carbajal, M. J., Peperkamp, S., & Tsuji, S. (2021). A meta-analysis of infants’ word-form recognition. Infancy, 26, 369–387.

Gervain, J., & Werker, J. (2013). Prosody cues word order in 7-month-old bilingual infants. Nature Communications, 4, 1490. doi: 10.1038/ncomms2430

Nelson, D. G. K., et al. (1995). The head-turn preference procedure for testing auditory perception. Infant Behavior and Development, 18, 111–116.

Zevin, J. (2009). Word Recognition. Encyclopedia of Neuroscience, 517–522. doi: 10.1016/b978-008045046-9.01881-7

A sunny selfie of Sam smiling. She wears a white tshirt and has curly red hair.

Samantha Chaney


Samantha is a junior at Duke double-majoring in Neuroscience and Linguistics with a minor in Psychology. She is specifically interested in how music and language impact someone’s perception of their surroundings. Having grown up speaking multiple languages and playing various instruments, she would love to know how that has impacted her cognitive development and helped her navigate the world. Most of her free time is spent drawing and playing Minecraft. The rest is spent napping with her cats and listening to music.

Elika Bergelson

Principal Investigator