In a visual world experiment, participants look at a picture on the screen while they listen to a word or sentence. While they are
listening, an eye-tracker records which part of the picture they are looking at.
The picture below is an example from a famous/classic visual world eye-tracking paper.
While a person is looking at this picture, they will hear the sentence "The boy will move the cake."
How do you think people's eyes will move around the picture as they listen to this
sentence? At each word or phrase, where do you think they will look?
It turns out that when people hear a word, they tend to look at the thing it refers to. So, for example, people look at the boy when they hear
the boy..., and they tend to look at the cake when they hear ...the cake.
Here is how we can look at the results from a visual world experiment. We can divide the duration of the whole sentence into brief
chunks (e.g., 50 milliseconds each). For each brief chunk, we can calculate what percentage of the time people look at a certain
part of the picture. (Keep in mind that in a real experiment there will be many sentences, not just one. So for example, if a person
hears 100 sentences with 100 pictures, and during a particular 50-ms time window they are looking at the cake in 90 of those 100
sentences, we would say they look at the cake 90% of the time.) Then we can make a graph like this:
The black squares show how much people look at the cake as the sentence unfolds. The white squares show how much people look at
the other objects. We can see that, at the beginning of the sentence, people don't look much at the cake or at the other objects
(probably they're mostly looking at the boy). As the sentence goes on, they look at the cake more and more. By the end of the
sentence, they are almost always looking at the cake, and they are looking at the cake much more than they look at the other
objects.
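To make that binning calculation concrete, here is a minimal sketch in Python using made-up data (the trial records and labels are placeholders, not the actual results of the experiment). Each trial records which object was being fixated during each successive 50-ms window, and we compute, for each window, the percentage of trials in which people were looking at the cake:

```python
# Hypothetical fixation records: for each trial, which object was being
# fixated during each successive 50-ms window (made-up data, not real results).
trials = [
    ["boy", "boy", "boy", "cake", "cake", "cake"],
    ["boy", "other", "boy", "other", "cake", "cake"],
    ["boy", "boy", "other", "cake", "cake", "cake"],
    ["other", "boy", "boy", "boy", "cake", "cake"],
]

n_windows = len(trials[0])
for i in range(n_windows):
    # Percentage of trials in which the person was looking at the cake
    # during this particular 50-ms window.
    looks_to_cake = sum(trial[i] == "cake" for trial in trials)
    percentage = 100 * looks_to_cake / len(trials)
    print(f"{i * 50}-{(i + 1) * 50} ms: {percentage:.0f}% of trials looking at the cake")
```

Plotting those percentages across the time windows (one line for the cake, one for the other objects) gives a graph like the one just described.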
Weirdly, it seems like in this experiment, people looked more at the cake than at the other objects even before they heard the word
"cake". Can you guess why? (It might help to look at the picture again.)
You can notice from the picture that the boy seems to be staring at the cake; that probably made people want to look at the cake too,
even before they heard the sentence. Plus, maybe a cake is just more interesting to look at than the other things. So this is definitely
not a perfect experiment; but it was one of the earliest experiments to use the visual world paradigm, so it helped us learn about all these
issues.
But I haven't told you about the cool part of this experiment yet. Remember that people saw a picture (shown again below), and they
heard "The boy will move the cake".
The authors also tried this another way. They showed the same kind of picture, but had people hear a sentence like "The boy
will eat the cake".
Do you think the results might be different when people hear "The boy will eat the cake"? If so, what do you think will happen
differently, compared to the previous results we saw?
Here's what was found in that experiment.
The graph below is arranged the same as what you saw before; it's the same graph, with a few new lines added.
Recall that when people hear "the boy will move the cake", they eventually look at the cake. In fact they are already looking
at the cake very early (because of the problem with the picture). But they don't really start looking at the cake a lot until around
the time they hear the word "cake"; that's when they start to look at the cake much more than at the other objects.
On the other hand, the black circles in this graph show how much people look at the cake when they hear "the boy will eat the
cake", and the white circles show how much they look at other objects when they hear this sentence.
You should notice that they start looking at the cake a lot (i.e., the black circles get much higher than the white circles) much
earlier. Before they even hear the word "cake", they are already looking at the cake much more. Why does this happen?
It happens because the cake is the only thing in the picture that is edible! Therefore, as soon as they hear "The boy will
eat...", they can already predict that the next word is going to be cake. This experiment was one of the first ever to give evidence
that people predict upcoming words before they hear them.
Another kind of visual world experiment
That is a classic visual world study showing how people predict upcoming words. But the visual world paradigm doesn't have to be used only with
full sentences. It can even show us how we figure out single words as we are hearing them.
Keep in mind that it takes some time to hear a word. You don't hear a word all at once; it takes time for the word to unfold. (You can
try this out yourself and see. If you have completed the "Phonetic analysis" module, then you've already downloaded Praat on your
computer; or even if you haven't, you can use any simple speech recording app on your phone. Record yourself saying some short word,
then open it in Praat and see how long it takes. In Chinese, even a one-syllable word often takes around 300 milliseconds.)
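(If you have your recording saved as a WAV file and would rather check its length with a short script than read it off Praat, here is a minimal Python sketch; the filename my_word.wav is just a placeholder for whatever you recorded.)

```python
import wave

# Open a short recording (a WAV file; the filename is a placeholder)
# and report its duration in milliseconds.
with wave.open("my_word.wav", "rb") as recording:
    duration_ms = recording.getnframes() / recording.getframerate() * 1000
    print(f"Word duration: {duration_ms:.0f} ms")
```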
That means that when you are hearing a word, there is a moment when you have only heard part of it and you might still be guessing
what you are hearing.
Consider, for example, the word beaker (實驗室烧瓶).
When you are hearing someone say this word, first you just hear [b]. Then you hear [bi]... maybe you think you're hearing the word
bee! Then you hear [bik]... maybe you think you're hearing the word beak!
Obviously, a lot of stuff might happen in your mind before you finish hearing the complete word. Can we use visual world eye-tracking
to track what's happening in someone's mind over time as they are hearing a word unfold?
This is one of the most popular uses of the visual world paradigm. A famous early experiment that did this is one by Allopenna and
colleagues (1998); we'll discuss that experiment below.
In that experiment, people would hear a word (such as beaker) while looking at a picture like this:
The real version of the picture wouldn't have words written on it; I just put those here for our reference. From the top, going
clockwise, the picture has four key objects: a speaker (揚聲器), a beetle (甲蟲), a carriage (BB車),
and a beaker (實驗室烧瓶—actually if you are a chemistry expert you might say the thing shown in
this picture is a "flask" rather than a "beaker", but in colloquial English, people who aren't chemistry experts call all of these
things "beakers").
For this question, I want you to predict what the visual world eye-tracking results will look like when people hear beaker while
looking at this picture. Using the graph below as a template, draw four lines: how much people will look at the beetle over time,
how much they will look at the beaker over time, how much they will look at the speaker over time, and how much they will look at
the carriage over time. For each one, you should think about what will happen as the word unfolds. In other words, when people hear
just [b] how much will they look at the beaker, how much will they look at the carriage, etc.; when people hear [bi] how much will
they look at each of these; etc.
You can draw your results on the blank template below.