We’ve been thinking about new features for Nature.com Blogs recently, after spending a lot of time on the back end doing boring yet vital things like enabling trackbacks for journal articles on nature.com.
One potentially very cool new feature is sentiment analysis. Nature.com Blogs already performs entity extraction, pulling out all of the names, places and things mentioned in each blog post. We use this to cluster posts about the same topic together in the “stories” section.
Sentiment analysis tries to give emotional context to entities. For example, if I blog:
“I love Biology. It rules, Physics drools”
and Nature.com Blogs processes my post, then it might store the following metadata alongside it:
<entities>
  <entity name="Biology" emotion="Positive" score="0.6" />
  <entity name="Physics" emotion="Negative" score="0.3" />
</entities>
Here “Biology” and “Physics” are the entities, and each has an emotion associated with it in the text. The score part says how strong that emotion is: the positive emotion associated with “Biology” (0.6) is stronger than the negative emotion associated with “Physics” (0.3).
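To make the metadata a bit more concrete, here is a rough sketch of how it could be read back with Python's standard library. The `parse_entities` helper and the tuple layout are my own illustration, not how Nature.com Blogs actually stores or reads this data.

```python
import xml.etree.ElementTree as ET

# The example metadata from above, copied verbatim.
METADATA = """
<entities>
  <entity name="Biology" emotion="Positive" score="0.6" />
  <entity name="Physics" emotion="Negative" score="0.3" />
</entities>
"""

def parse_entities(xml_text):
    """Return a list of (name, emotion, score) tuples.

    Hypothetical helper for illustration only.
    """
    root = ET.fromstring(xml_text.strip())
    return [
        (e.get("name"), e.get("emotion"), float(e.get("score")))
        for e in root.findall("entity")
    ]

entities = parse_entities(METADATA)

# The entity with the strongest emotion attached, whatever its direction.
strongest = max(entities, key=lambda item: item[2])
```

With the example post, `strongest` comes out as the “Biology” entry, since 0.6 is larger than 0.3.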
Sentiment analysis is still a young field and frankly it gets things wrong a lot of the time. It’s also difficult to find a system that can do both entity extraction and sentiment analysis properly – to build a proof of concept I had to use a combination of Yahoo! Term Extraction and OpenAmplify.
Having said that, I think results over large datasets are promising. I’ve run a couple of thousand posts through the proof of concept system and compiled lists of the entities most strongly associated with positive and negative emotions in science blogs this week (published in the next couple of posts). Is this information useful? Interesting? Fun? Misleading? Any suggestions for how it might be presented are welcome!
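For anyone wondering what compiling those lists might involve, here is one possible approach, sketched in Python. Everything in it is an assumption on my part: the `rank_entities` name, the idea of adding positive scores and subtracting negative ones, and the made-up sample data. The proof of concept may well aggregate things differently.

```python
from collections import defaultdict

def rank_entities(posts):
    """Rank entities by their summed, signed sentiment scores.

    posts: one list per blog post, each holding
           (name, emotion, score) tuples.
    Returns (positive, negative), each a list of (name, total)
    with the strongest totals first.
    Assumption: positive scores add, negative scores subtract.
    """
    totals = defaultdict(float)
    for post_entities in posts:
        for name, emotion, score in post_entities:
            if emotion == "Positive":
                totals[name] += score
            elif emotion == "Negative":
                totals[name] -= score
            # Any other emotion label is ignored in this sketch.

    ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
    positive = [(name, total) for name, total in ranked if total > 0]
    negative = [(name, total) for name, total in reversed(ranked) if total < 0]
    return positive, negative

# Invented sample data, two posts' worth of entities.
sample_posts = [
    [("Biology", "Positive", 0.5), ("Physics", "Negative", 0.25)],
    [("Biology", "Positive", 0.25), ("Physics", "Negative", 0.5),
     ("Chemistry", "Positive", 0.5)],
]

positive, negative = rank_entities(sample_posts)
```

Summing like this favours entities that are mentioned often; averaging per mention would be an equally reasonable choice and would give different lists.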