Scientific Data

Author’s Corner: Is fame fair?

Guest post by Amy Yu and César A. Hidalgo, MIT Media Lab, Cambridge, Massachusetts, USA

Is fame superficial? Or can it be a signal of accomplishment?

In a world where many media outlets seem dominated by characters of inexplicable fame (such as the Kardashians), asking ourselves if our social reward systems are misfiring is both a fair question and a relevant one. The relevance of this question stems from the fact that humans are social learners – we are a species whose success depends on the ability of individuals to learn from others. But choosing whom to learn from, in a world populated by more people than we can meet, is not easy. To facilitate those choices, humans have evolved cognitive biases that nudge us to learn from those who demonstrate skill, accomplishments, and also, fame or prestige[1].

So the question of whether social recognition is fairly attributed is relevant, because a world that attributes popularity unfairly is also a world where people are nudged to learn from inadequate models.

But the conspicuous fame of teen icons and reality show celebrities is not enough evidence to conclude that our social reward systems have gone berserk. To test that conclusion we need statistical evidence rather than anecdotes, since these examples could well be outliers in a world where the correlation between fame and accomplishment is otherwise strong. But how can we test that alignment? Do we even have data that we can use to create proxy measures for fame and accomplishment?

In our paper published today in Scientific Data, we introduce the Pantheon 1.0 dataset, which measures the historical fame of every individual whose biography appears in more than 25 language editions[2] of Wikipedia. The Pantheon 1.0 dataset annotates each individual with their occupation, demographics (year and country of birth), and several metrics of popularity (derived from the number of language editions in Wikipedia and the pageviews received across different languages). What makes the Pantheon dataset special is that it draws on a multilingual corpus (more than 200 language editions of Wikipedia) and introduces a detailed taxonomy of occupations that classifies biographies into 88 distinct categories. The multilingual nature of the dataset allows us to focus on globally famous individuals while discarding those who are only locally famous (for instance, most American football players, who are popular in the United States but unknown to the rest of the world, do not make the cut). Our taxonomy of occupations, in turn, allows us to identify individuals who have made similar contributions, so that we can test the alignment between fame and accomplishment for narrowly defined groups of individuals.

The Pantheon 1.0 dataset provides a few alternative measures of fame. One of these is L: the number of languages in which a biography is present in Wikipedia. Another is the Historical Popularity Index, or HPI, an aggregate measure that combines L with other measures of popularity and historical relevance, such as the number of pageviews received by the biography in non-English editions of Wikipedia, the distribution of pageviews across languages, and, more importantly, the date of birth of the historical character, since biographies that have sustained popularity over long periods of time deserve a boost. In our data descriptor paper we use L and HPI to test the relationship between fame and accomplishment, but since the data is now open, readers are free to create their own measures of popularity as well.
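As a rough illustration of the simplest of these measures, the sketch below counts L for a few biographies and keeps those present in more than 25 language editions, the inclusion threshold mentioned above. The record layout, the function names, and the sample entries are all hypothetical; this is not the pipeline used to build Pantheon, and it does not attempt to reproduce HPI.

```python
# Minimal sketch, assuming each biography is stored as a list of the
# Wikipedia language editions that contain it. All names and data here
# are made up for illustration; this is not the Pantheon pipeline.

def language_count(editions):
    """L: the number of distinct language editions containing a biography."""
    return len(set(editions))


def globally_famous(biographies, min_languages=25):
    """Keep biographies present in MORE than `min_languages` editions."""
    counts = {name: language_count(eds) for name, eds in biographies.items()}
    return {name: L for name, L in counts.items() if L > min_languages}


# Hypothetical sample records (placeholder edition codes).
biographies = {
    "person_a": [f"lang{i}" for i in range(40)],  # 40 editions: kept
    "person_b": [f"lang{i}" for i in range(25)],  # exactly 25: dropped
    "person_c": ["en", "es", "en"],               # duplicate entry counted once
}

famous = globally_famous(biographies)
print(famous)
```

Readers building their own popularity measures from the open data could start from a count like this and add pageview-based terms of their own choosing.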

So how strongly does fame correlate with accomplishment? To answer this question we need objective measures of accomplishment, which are not easy to find for all occupations. Yet, since Pantheon 1.0 also classifies individuals by occupation, we can focus on testing this relationship for the few occupations where accomplishments are clearly measurable. Individual sports are one such case.

Quantifying the level of accomplishment of a Renaissance painter or an Enlightenment humanist is difficult, and we will not attempt to do it. Quantifying the level of accomplishment of a tennis player, Olympic swimmer, or racecar driver, however, is relatively straightforward. In tennis, there are clear accomplishments, like the number of weeks in the #1 slot of the ATP rankings, or the number of Grand Slams won. Similarly, racecar drivers who have won more championships, started more races, and stood on the podium more times are more accomplished than those who have been on the podium only a couple of times. So in these cases we can measure accomplishment and, more importantly, we can test whether the correlation between accomplishment and fame is strong and significant.

The figure below compares one of our measures of fame (HPI) with measures of accomplishment using a multivariate regression (since accomplishment is not univariate, even in individual sports). In all cases we find a positive correlation between accomplishment and fame, meaning that more accomplished athletes also tend to be more famous. Of course, there are clear outliers, like Anna Kournikova in tennis, who is more popular than her accomplishments can explain, or Paul Morphy in chess, who is relatively obscure given his outstanding track record (probably because he was a chess champion at a time when chess was not as organized as it is now). Still, the figure shows that, at least for these four individual sports, ours is a society in which fame and accomplishment are decently aligned.

Comparing a measure of fame (HPI) with measures of accomplishment for Swimmers, Chess Players, Formula 1 Drivers, and Tennis Players. Derived from Figures 5b, 6b, 7b and 8b in our related Data Descriptor.
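For readers who want to try a comparison of this kind on the open data, here is a minimal sketch of a multivariate regression of a fame score on two accomplishment measures, fitted by ordinary least squares. It is an illustration under stated assumptions, not the specification used in the Data Descriptor: the predictor names, the linear form, and the numbers are made up, and the fame values are generated from a known rule so that the fit can be checked.

```python
# Minimal sketch: ordinary least squares with several predictors,
# solved through the normal equations (X'X) b = X'y in pure Python.
# The data below are synthetic and the variable names are hypothetical.

def fit_ols(rows, targets):
    """Return [intercept, coef_1, ..., coef_k] for the least-squares fit."""
    X = [[1.0] + [float(v) for v in row] for row in rows]  # add intercept column
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(x[i] * x[j] for x in X) for j in range(k)]
         + [sum(x[i] * y for x, y in zip(X, targets))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


def r_squared(rows, targets, coefs):
    """Share of the variance in `targets` explained by the fitted model."""
    preds = [coefs[0] + sum(c * v for c, v in zip(coefs[1:], row)) for row in rows]
    mean_y = sum(targets) / len(targets)
    ss_res = sum((y - p) ** 2 for y, p in zip(targets, preds))
    ss_tot = sum((y - mean_y) ** 2 for y in targets)
    return 1.0 - ss_res / ss_tot


# Synthetic athletes: (titles won, weeks ranked #1). Not real players.
accomplishments = [(0, 0), (1, 10), (2, 50), (5, 100), (10, 300), (17, 310)]
# Fame generated from a known linear rule, so the fit should recover it.
fame = [15 + 0.6 * titles + 0.02 * weeks for titles, weeks in accomplishments]

coefs = fit_ols(accomplishments, fame)
fit_quality = r_squared(accomplishments, fame, coefs)
print(coefs, fit_quality)
```

With real data the fit will not be perfect, and the athletes with the largest residuals are the candidates for "more famous than accomplished" or "more accomplished than famous".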

But what would explain this alignment? One possibility is that fame is an aggregation of individual choices. This would mean that individuals, on average, avoid liking those who do not truly deserve attention, since attending to them would run against the evolutionary biases that nudge our choices of whom to learn from. Another possibility is that fame rooted in accomplishment is more sustainable over long periods of time than fame rooted in simple rich-get-richer processes. An explanation for this would be that individuals who have produced true accomplishments are more frequently recalled as benchmarks for evaluating the accomplishments of similar individuals. Of course, our data descriptor paper cannot distinguish between these two alternatives, but it provides a novel dataset that scientists studying the dynamics of human attention and popularity can use to expand this line of research.

Ultimately, the goal of our paper is to provide quantitative and reproducible measures of global attention that we can use to improve our scientific understanding of questions about our collective memory and about how humans select what information to listen to and record.

Pantheon 1.0, a manually verified dataset of globally famous biographies, by Yu et al., is available online at Scientific Data.

[1] We are also biased to learn from those who are similar to us, in terms of age and other attributes. Henrich, J. “The Secret of Our Success: How learning from others drove human evolution, domesticated our species, and made us smart.” (2015).

[2] At the time of data collection (May 2013).

