Mortality research leads to spin-off technology that mines Facebook and Twitter for opinions

The interactive multi-touch ‘Magic Wall’ first used by CNN in the 2008 presidential elections will be back for the channel’s election night coverage of the US midterm Congressional elections on 2 November – but with a difference. As well as using the wall to explore election results as they come in, CNN will use software developed by Cambridge, Massachusetts, start-up Crimson Hexagon to mine and visualize millions of tweets and Facebook posts, to judge the mood of online citizens across the country and their reaction to the results. CNN first used the software earlier this year to monitor reactions to President Obama’s State of the Union address (see video here). What’s less well known is that Crimson Hexagon’s technology, licensed from Harvard University, had its genesis in epidemiological research.


Gary King, a statistician at Harvard University in Cambridge, Massachusetts, tells me that it was his research on mathematical models for verbal autopsy (see my Nature article this week on verbal autopsies for more on the latter) that led him to develop the social-network analysis software. “I was working on verbal autopsies, and also a project where we downloaded every English language blog post on the US presidential elections, where we wanted to know what people were thinking.” But their efforts to analyse the unstructured blog text were proving a “disaster,” he says, until the verbal autopsy research provided the solution.

Verbal autopsy is a technique that assigns a probable cause of death based on interviews with families about the deceased’s prior symptoms. The results are then used to estimate the distribution of causes of death, known as cause-specific mortality fractions (CSMFs). These are crucial to setting health-system and research priorities in poorer countries that lack medical certification systems, where more than two-thirds of the world’s population lives.

King was researching a promising approach involving probabilistic models: rather than trying to assign a single cause of death to each case, he calculated the probabilities that various disease symptoms are associated with a death, and then aggregated those probabilities across an entire set of cases. “One day I came in, and I realized that the verbal autopsy problem was mathematically the same problem as the presidential one,” recalls King. “I realized I could apply the verbal autopsy method to text, where the symptoms in verbal autopsy are words in text, and death categories are the categories into which you want to put blog posts.” The result is an estimate of the distribution of blog and tweet opinions on a given topic.
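
One way to read King’s description is as a mixture problem: the share of documents showing each word profile in the whole collection is a weighted blend of the shares seen within each category, and the weights are the category proportions you want. The Python sketch below illustrates that idea for just two categories, using ordinary least squares. It is not King’s or Crimson Hexagon’s code; the function names, the toy vocabulary and the two-category restriction are all assumptions made for illustration.

```python
from collections import Counter


def profile_distribution(docs, vocab):
    """Share of documents showing each presence/absence word profile.

    Each document is a set of words; a profile is a tuple of booleans,
    one per vocabulary word (the text analogue of a symptom checklist).
    """
    counts = Counter(tuple(word in doc for word in vocab) for doc in docs)
    total = len(docs)
    return {profile: n / total for profile, n in counts.items()}


def estimate_share(labeled_a, labeled_b, unlabeled, vocab):
    """Least-squares estimate of category A's share of the unlabeled set.

    Assumes P(profile) = p * P(profile | A) + (1 - p) * P(profile | B)
    and solves for p directly, without classifying any single document.
    """
    p_a = profile_distribution(labeled_a, vocab)
    p_b = profile_distribution(labeled_b, vocab)
    p_u = profile_distribution(unlabeled, vocab)

    numerator = 0.0
    denominator = 0.0
    for profile in set(p_a) | set(p_b) | set(p_u):
        diff = p_a.get(profile, 0.0) - p_b.get(profile, 0.0)
        numerator += (p_u.get(profile, 0.0) - p_b.get(profile, 0.0)) * diff
        denominator += diff * diff

    if denominator == 0.0:
        raise ValueError("the two categories have identical word profiles")

    # Clamp to [0, 1] because sampling noise can push the raw estimate outside.
    return min(1.0, max(0.0, numerator / denominator))


# Hypothetical hand-labeled examples and an unlabeled stream of posts.
vocab = ["great", "awful"]
positive = [{"great"}, {"great", "win"}]
negative = [{"awful"}, {"awful", "lose"}]
stream = [{"great"}, {"great", "night"}, {"great"}, {"awful"}]

share_positive = estimate_share(positive, negative, stream, vocab)
```

In this toy case three of the four unlabeled posts match the positive profile, so the estimated positive share is 0.75. The point of the design is that only the aggregate proportion is estimated, which mirrors the shift from per-case diagnosis to population-level fractions described above.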

It worked. King went on to become co-founder and chief scientist of Crimson Hexagon, whose customers, besides CNN, include Microsoft, Hewlett-Packard, Thomson Reuters and Dow Jones, which use the technology to gauge public opinions and perceptions. King has made the software available free (installation instructions here) for academic research on social networks.
