Beyond Replication: Misleading Reports of a Provocative Experiment

Jonathan Ellis is an Associate Professor in the Department of Philosophy at the University of California, Santa Cruz, where he has taught since 2002.  He received his PhD in Philosophy from the University of California, Berkeley (2002).  Since 2005, he has been Co-Director of the Santa Cruz Linguistics and Philosophy Group at UC Santa Cruz.  Ellis’s primary areas of research are the philosophy of psychology, the philosophy of mind and epistemology.  Recently he co-edited *Wittgenstein and the Philosophy of Mind* (published by Oxford University Press in 2012).  He is currently writing a book on the philosophical implications of motivated reasoning and other forms of compromised cognition. You can see more on his website at

Anyone familiar with the exploding genre of books from the cognitive sciences (Predictably Irrational; Thinking, Fast and Slow; Stumbling on Happiness; Blink) will know of John Bargh’s striking experiment on priming. The study, published in 1996, has recently been the subject of heated discussion among cognitive scientists, and reflection on the experiment continues to generate new lessons for the scientific community. In the middle of it all is Nobel laureate Daniel Kahneman.

The experiment involved the venerable Scrambled-Sentence task, where subjects are given five scrambled words (e.g., it, there, Matt, right, hold) and asked to make a sentence out of exactly four of them (Hold it right there). Every subject completed thirty of these, yet half the subjects received many words associated with the elderly (forgetful, bald, gray, wrinkle, Florida) whereas the other half received entirely “age-non-specific” words. The result: the subjects whose words connoted old age later walked down the hall more slowly, on average, than the other subjects.

These words, Bargh and his colleagues explain, primed nonconscious ideas of the elderly, which in turn affected walking speed. The experiment is widely cited in support of the general notion that people are often unaware of the reasons why they act as they do.

Last year, however, a major brouhaha erupted concerning Bargh’s experiment, after a careful attempt to replicate it failed to yield a significant difference between the two groups. Bargh fiercely replied to suggestions that his group had unintentionally done something that skewed the results. The controversy proved especially feverish on account of the defensive and dismissive tone some readers heard in Bargh’s response. Ed Yong’s post about it, on his former blog, was influential.

More recently, matters came to a head with the entrance of psychologist Daniel Kahneman, who in September sent a stern email to psychologists who study social priming, warning them of “the storm of doubts” that now plagues their field and the harmful “train wreck” he sees looming. In the email, Kahneman counsels, “you should collectively do something about this mess,” and then outlines in detail the sort of rigorous, laborious protocol for replicating studies that he advises be implemented.

Not surprisingly, the email has garnered the close attention of psychologists of all stripes, no doubt due to Kahneman’s stature in the discipline. Kahneman’s research with Amos Tversky in the 1970s and 80s spawned the exceedingly fertile and widely influential “heuristics and biases” tradition, making “Kahneman & Tversky” a household name in economics and psychology and Kahneman a recipient of the Nobel Prize in Economic Sciences.

When the surprising priming results were first published, Kahneman writes, researchers in related areas accepted them as facts, but now they attach a question mark to the entire field of priming.

“[R]ight or wrong, your field is now the poster child for doubts about the integrity of psychological research.” 

Even scientists far beyond psychology—political scientists, economists, and so forth, whose own research depends on psychological findings—are now grappling with Kahneman’s concerns.

The plot thickens yet again though, as Kahneman’s email brings inevitable scrutiny to his own treatment of Bargh’s study in his recent book, Thinking, Fast and Slow. Exposed is an unexpected variety of missteps including, most importantly, an instance of a growing and detrimental trend of unwarranted generalization in the interpretation of scientific studies on human cognition—an inference that threatens consequences very similar, we will see, to those of which Kahneman warns in his letter.

If, in the face of Kahneman’s email, psychologists do ultimately undertake the serious methodological self-reflection Kahneman rightly prescribes, two further topics should be added to the docket for review: the criteria for warranted generalization, and the standards to which reporting in the new genre is accountable.

The problem is not that Kahneman relies in his book on priming results like Bargh’s. He acknowledges as much in his email, which is not intended to discredit the research behind these results but to facilitate its rehabilitation. Kahneman makes clear that he remains a “general believer” in the field.

What then is Kahneman’s flawed reasoning, and why is it important?

One of the principal themes of Thinking, Fast and Slow is that “the confidence we have in our beliefs is preposterous.” This goes for everyone (you, me, Kahneman himself): we are all prone to biased, overconfident, and otherwise problematic thinking. “You have no choice,” he says, “but to accept that the major conclusions of these studies are true. More important, you must accept that they are true about you.”

In his sustained effort, though, to emphasize the applicability to everyone of the studies he discusses, Kahneman infers too much from the data. After explaining Bargh’s experiment, he addresses his reader:

“Although you surely were not aware of it, reading this paragraph [which contained many words relating to the elderly] primed you as well. If you had needed to stand up to get a glass of water, you would have been slightly slower than usual to rise from your chair—unless you happen to dislike the elderly, in which case research suggests that you might have been slightly faster than usual!”

Bargh’s study, however, did not show that the words had an effect on everyone in the first group (who didn’t dislike the elderly). Nor did Bargh ever claim as much. What it showed (worries about replicability aside) is that the mean (or average) speed of the subjects in that group was slower than the mean (or average) speed of the subjects in the other group, and that the difference was unlikely to be due to chance. The data Bargh presents could be the result of fewer than half of the subjects in the former group having been affected by the words connoting old age. The study provides no justification at all for supposing that all, or even most, readers of Kahneman’s paragraph would be slightly slower than usual to rise from their chairs (even excluding those who dislike the elderly).
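
To make the point concrete, here is a minimal simulation sketch in Python. Every number in it (the group size, the baseline walking time, the 40% share of affected subjects, the size of the slowdown) is a hypothetical chosen purely for illustration, not a figure from Bargh’s study. It shows how a clear difference in group means can arise even though fewer than half of the primed subjects are affected at all.

```python
import random
import statistics


def simulate_groups(n_per_group=1000, affected_fraction=0.4,
                    slowdown_s=2.5, baseline_mean_s=7.3,
                    baseline_sd_s=1.0, seed=0):
    """Simulate walking times (in seconds) for a control and a primed group.

    All parameter values are hypothetical. Only a minority of primed
    subjects (affected_fraction) are slowed; the rest walk exactly as
    they would have in the control group.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable

    # Control group: everyone walks at their ordinary pace.
    control = [rng.gauss(baseline_mean_s, baseline_sd_s)
               for _ in range(n_per_group)]

    # Primed group: each subject has an ordinary pace, and only some
    # of them are slowed by the prime.
    primed = []
    n_affected = 0
    for _ in range(n_per_group):
        walk_time = rng.gauss(baseline_mean_s, baseline_sd_s)
        if rng.random() < affected_fraction:
            walk_time += slowdown_s
            n_affected += 1
        primed.append(walk_time)

    return control, primed, n_affected


control, primed, n_affected = simulate_groups()
mean_diff = statistics.mean(primed) - statistics.mean(control)
share_affected = n_affected / len(primed)

print(f"Difference in mean walking time: {mean_diff:.2f} s")
print(f"Share of primed subjects actually affected: {share_affected:.0%}")
```

In this toy setup the primed group is slower on average, yet most of its members walked exactly as they otherwise would have. The group means alone cannot tell us which kind of subject any particular reader would be.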

If a new cold medicine helps half the population to recover more quickly from a cold than they otherwise would have, but has no effect on the other half of the population, it would be misleading for the manufacturer to say that research shows that if you take this medicine, you will recover from your cold more quickly.

Drawing generalizations from mean differences is a dangerous game. And it is all too common in contemporary culture. Malcolm Gladwell makes the same mistake in Blink, and about the same experiment. After describing Bargh’s study, he writes:

“After you finished that test—believe it or not—you would have walked out of my office and back down the hall more slowly than you walked in.”

No! I mean, maybe I would have, but that’s not what the data showed. Bargh himself is much more careful when interpreting the data. He never says or even intimates that all or most subjects were affected. Nor does he claim of his reader (or of anyone else) that she herself would have walked more slowly. If we need to be vigilant about replicating particular studies, we need to be no less vigilant in extrapolating from them.

But wait, am I being too critical? After all, Kahneman’s book is written for a broad audience, and not primarily academics, and in order to sustain his readers’ interest (and patience), some issues must inevitably be glossed. This shouldn’t be one of them though. It is true that claims about some people are less simple, and often less exciting, than claims about all people. So are claims to the effect that all people are affected but only on some occasions. This is one reason advertisers typically phrase matters ambiguously:

“Clinical Studies Show Our Medicine Cures Colds.”

Everyone’s, or some people’s? Always, or sometimes? Ambiguity protects advertisers from overt misrepresentation without preempting the more exciting reading. Too many readers, though, are insensitive to this maneuver. As a result, they often form beliefs that simply aren’t true. Our culture needs to be more fluent with these distinctions, not less, and popular books on reasoning are well-placed to facilitate the improvement. At the very least, Kahneman shouldn’t be modeling the illicit inference himself.

Kahneman might argue that we can infer from the fact that some people showed a priming effect that all people are at least susceptible to the effect, on account of the fact that the mind is a noisy, dynamic system involving a multitude of variables that might inhibit the priming effect on any particular occasion. Such a reply would surely be contentious, but more importantly, it would not help Kahneman here. An appeal to a noisy system would undercut Kahneman’s claim that if, after reading his paragraph, you had needed to stand up to get a glass of water, you “would have been slightly slower than usual to rise from your chair.” A noisy system would suggest that maybe you would have, maybe you wouldn’t have.

Unwarranted generalizations are not the only errors made in books like Thinking, Fast and Slow, and that is important here. A look at discussions of Bargh’s study alone is disheartening. When Daniel Gilbert cites the experiment in Stumbling on Happiness, he writes: “When the word elderly is flashed, volunteers walk slowly.” But when we read the article Gilbert refers to (the same article Kahneman refers to), we learn that the word elderly was not even used in the experiment. And it certainly was never flashed. Nor was any other word. What is going on?

The word slow was also not among the words presented to either group of subjects. But the word old was. Strangely, this is precisely the opposite of how Kahneman reports matters in Thinking, Fast and Slow, where he explains “the word old is never mentioned,” yet says nothing about the words elderly and slow. Kahneman also writes, “When they had completed that task, the young participants were sent out to do another experiment in an office down the hall.” But they weren’t. That’s what happened to subjects of a different experiment Bargh conducted, an experiment intended to prime the constructs rude and polite. In fact, Bargh explains why it’s important that, as the subjects walked down the hall, they believed they were done there for the day.

Of the four books I mention at the top, only Dan Ariely’s discussion of Bargh’s experiment (in Predictably Irrational) does not involve infelicitous reporting. And that’s just looking at discussions of this one experiment.

What does all this have to do with the issue of replicability and Kahneman’s email to colleagues in psychology? My concerns mirror Kahneman’s. Flawed extrapolations like the one above are precisely what lead some readers (many academic philosophers, in fact) to approach with suspicion, if not automatically dismiss, alleged scientific findings about human rationality.

But this is unfortunate. Like Kahneman, I am a general believer in the area of research about which I am raising questions. I am also an ardent proponent of the value of books that relay findings in the cognitive sciences to broad audiences. That is why it is frustrating, if understandable, that the sorts of errors I’ve underscored engender insidious doubts in careful readers: “Is this the sort of reasoning the whole book’s based on? How can I be sure he’s not making similar errors elsewhere? Do I need to look up every study now to be sure he’s getting it right?” The mistakes generate skepticism.

The regrettable upshot: unsound reasoning such as Kahneman’s serves to obscure the legitimate reasons for which the science does in fact support many of the ideas in Kahneman’s book, even if not the idea that every reader (who does not dislike the elderly) would be slightly slower than usual to rise from her chair.

As I argue elsewhere, our culture has not assimilated the significance of Kahneman’s ideas enough, especially those in Thinking, Fast and Slow. But his discussion of Bargh’s experiment there exhibits unhealthy trends that, like the pernicious doubts about priming, call for acute collective attention.


