Sentiment analysis on science blogs

We've been thinking about new features for Nature.com Blogs recently, after spending a lot of time on the back end doing boring yet vital things like enabling trackbacks for journal articles on nature.com.

One particularly cool new feature (potentially) is sentiment analysis. Nature.com Blogs already performs entity extraction, pulling out all of the names, places and things mentioned in each blog post. We use this to cluster posts about the same topic together in the "stories" section.

Sentiment analysis tries to give emotional context to entities. For example, if I blog:

"I love Biology. It rules, Physics drools"

and Nature.com Blogs processes my post, then it might store the following metadata alongside it:

<entities>
   <entity name="Biology" emotion="Positive" score="0.6" /> 
   <entity name="Physics" emotion="Negative" score="0.3" />
</entities>

... here "Biology" and "Physics" are the entities, and each has an emotion associated with it in the text. The score says how strong that association is: the positive emotion attached to "Biology" (0.6) is stronger than the negative emotion attached to "Physics" (0.3).
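
As a rough illustration, metadata in this shape could be read back with a few lines of Python. This is only a sketch: it assumes the XML looks exactly like the example above, and the function name `parse_entities` is made up for illustration rather than taken from Nature.com Blogs.

```python
import xml.etree.ElementTree as ET

# Example metadata copied from the post above.
METADATA = """
<entities>
   <entity name="Biology" emotion="Positive" score="0.6" />
   <entity name="Physics" emotion="Negative" score="0.3" />
</entities>
"""


def parse_entities(xml_text):
    """Return a list of (name, emotion, score) tuples from an <entities> block.

    Hypothetical helper: assumes every <entity> carries the three
    attributes shown in the example (name, emotion, score).
    """
    root = ET.fromstring(xml_text.strip())
    return [
        (e.get("name"), e.get("emotion"), float(e.get("score")))
        for e in root.findall("entity")
    ]


if __name__ == "__main__":
    for name, emotion, score in parse_entities(METADATA):
        print(f"{name}: {emotion} ({score})")
```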

Sentiment analysis is still a young field and frankly it gets things wrong a lot of the time. It's also difficult to find a system that can do both entity extraction and sentiment analysis properly - to build a proof of concept I had to use a combination of Yahoo! Term Extraction and OpenAmplify.

Having said that, I think results over large datasets are promising. I've run a couple of thousand posts through the proof of concept system and compiled lists of the entities most strongly associated with positive and negative emotions in science blogs this week (published in the next couple of posts). Is this information useful? Interesting? Fun? Misleading? Any suggestions for how it might be presented are welcome!
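
One way such lists might be compiled is sketched below. To be clear, this is an assumption and not a description of the proof of concept: it treats each post as a list of (name, emotion, score) tuples, signs the score by emotion, averages per entity, and ranks the results. The names `rank_entities` and `min_mentions` are hypothetical.

```python
from collections import defaultdict

# Assumed mapping from emotion label to sign; any other label counts as 0.
SIGN = {"Positive": 1.0, "Negative": -1.0}


def rank_entities(posts, min_mentions=1):
    """Rank entities by mean signed sentiment score across posts.

    posts: iterable of lists of (name, emotion, score) tuples, one list per post.
    Returns (most_positive, most_negative), each a list of (name, mean_score),
    with the strongest association first.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for entities in posts:
        for name, emotion, score in entities:
            totals[name] += SIGN.get(emotion, 0.0) * score
            counts[name] += 1

    # Ignore entities mentioned fewer than min_mentions times.
    means = {n: totals[n] / counts[n] for n in totals if counts[n] >= min_mentions}
    ranked = sorted(means.items(), key=lambda kv: kv[1], reverse=True)

    most_positive = [(n, s) for n, s in ranked if s > 0]
    most_negative = [(n, s) for n, s in reversed(ranked) if s < 0]
    return most_positive, most_negative


if __name__ == "__main__":
    example_posts = [
        [("Biology", "Positive", 0.6), ("Physics", "Negative", 0.3)],
        [("Biology", "Positive", 0.4)],
    ]
    positive, negative = rank_entities(example_posts)
    print("Most positive:", positive)
    print("Most negative:", negative)
```

Raising `min_mentions` would be one way to stop a single over-enthusiastic post from dominating the list, which matters given how often individual results are wrong.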

Comments

Hi Euan,
This is a nice (and fun) idea :-)

I would release the metadata as RDF, using URIs (e.g. "http://en.wikipedia.org/wiki/Physics", "http://en.wikipedia.org/wiki/Happiness") instead of literals.

FYI: There is also an "Emotion Markup Language Incubator Group" at W3C: http://www.w3.org/2005/Incubator/emotion/.
