How do we measure young children’s vocabulary size?

When studying early childhood language, researchers often want to know children’s vocabulary size, but it can be difficult to get a truly accurate idea of which words children know. We can’t tell what’s in their usual vocabulary just from interacting with them during a research visit, since we are only around them for a short time, and the child might act differently in the lab than they do at home (it’s normal to be shy in unfamiliar surroundings!). Instead, our lab uses the MacArthur-Bates Communicative Development Inventories (CDI), so that parents can tell us about their child’s language, since they know their child and the things they talk about much better than we do! Researchers can administer the CDI on paper or use the WebCDI (an online version of the survey), like we do in our lab. This has been really helpful during COVID-19 for conducting research online, and it can also save time during an in-person lab visit by allowing parents to fill out the form ahead of time.

So what is on a CDI form? In general, all CDIs contain a demographics section and a checklist section asking about the child’s vocabulary. However, there are multiple versions. For younger children from eight to eighteen months old, we use the Words and Gestures form of the WebCDI. Its vocabulary section has a checklist of words with boxes for the parent to indicate whether their child “understands” or “understands and says” each one, plus a similar section about which communicative gestures the child might use, like pointing and waving. We use the Words and Sentences version of the WebCDI for children between sixteen and thirty months old. Its vocabulary checklist only asks whether the child says the given word or not, and there is also a section that asks the parent about the types of sentences and word combinations that their child uses.

The CDI is not a perfect measure. Trying to remember all the words your child says out of a list of possibilities is a really difficult task, and parents might have different definitions of what it means for their child to “understand” a word (after all, we can’t read babies’ minds!), so some people might report more or fewer words than their child actually knows. Nevertheless, the CDI is a useful and accurate enough measure for approximating vocabulary. Importantly, these surveys are only an assessment of what the parents report, and cannot be used as a diagnostic tool for developmental delays or any other psychological conditions. These are also normed lists, which means that they might not contain all the words your child knows. A very high or low number of words reported doesn’t mean that there is anything atypical about your child’s development!

Having a standard form like the CDI makes it easy for us to collect and organize large amounts of data across multiple studies, and even across labs. With it, we gain a broader idea of each child’s linguistic background, and we can see if vocabulary size is related to different stages of learning. Each time a parent fills out a CDI, we ask if we can share the (anonymized) list of words with other language researchers. By sharing with other labs across the world, we can even see broader trends across thousands of children with projects such as Wordbank. Wordbank is an online database with CDI data not only from English-speaking children, but from speakers of 28 other languages as well. We can find out which words are the most common in children’s early vocabulary, and we can look at month-by-month averages.

The WebCDI is not the only widely used metric for assessing children’s vocabulary size. Some labs use the Peabody Picture Vocabulary Test, or PPVT, to test children directly. Unlike the CDI, the PPVT is not meant exclusively for children and can be used for anybody over two and a half years old. It tests the participant’s receptive vocabulary (in other words, the vocabulary that they can understand from speech). Participants are given a flipbook with four pictures on each page, and for each page the researcher says a word and asks the participant to point out the picture that matches that word. The PPVT also generally tests fewer words than the CDI, and is limited to words that are easy to show in a picture, rather than abstract words like “of” and “when.”

Although both the WebCDI and the PPVT are widely used in child-development labs, each has its own distinct uses: the WebCDI is better suited for wide and efficient distribution to participants, while the PPVT specializes in direct assessment of individual children’s vocabularies.



Frank, M. C., Braginsky, M., Yurovsky, D., & Marchman, V. A. (2016). Wordbank: An open repository for developmental vocabulary data. Journal of Child Language. doi: 10.1017/S0305000916000209.

Fenson, L., Marchman, V. A., Thal, D., Dale, P., Reznick, J. S., & Bates, E. (2007). MacArthur-Bates Communicative Development Inventories: User’s Guide and Technical Manual. 2nd Edition. Baltimore, MD: Brookes Publishing Co.

Community-University Partnership for the Study of Children, Youth, and Families (2011). Review of the Peabody Picture Vocabulary Test, Fourth Edition (PPVT-4). Edmonton, Alberta, Canada.

Olivia takes a selfie in a softly lit room. She has sandy hair and freckles and wears a denim jacket.

Olivia Leggio


Olivia is a senior majoring in linguistics and computer science. Outside of school she plays in the marching band, fences, daydreams about making bread, and highkey wishes she could go to culinary school at the same time as regular college.

Elika Bergelson

Principal Investigator