Who leaves comments on scientific papers - and why?
I drafted a long paragraph for here about how science publishers are generally rubbish at commenting implementations, but if you read journals online you already know this. Can you name three science publishers that allow online commenting? When last did you leave a comment on a paper? Have you ever?
BioMedCentral are doing their best to buck the lack of comments trend. BMC are quietly innovative in lots of ways - they've allowed online commenting on all of their papers for the past five years, for a start. Over time they've built up a reasonably large database of user comments. Matt Cockerill at BMC kindly sent us over a dump of all comments submitted since their system's inception (all of BMC's data, including the comments, are open access) and I figured it'd be interesting to do a bit of analysis on the dataset.
So... who leaves comments? How often? What are they about?
The basics
From November 2002 to July 2008 BMC journals accumulated 945 comments from 753 different users on 732 different papers.
How widespread is commenting at BMC?
BMC have published 37,916 papers since they launched, which means that ~2% of BMC papers have attracted comments. Most BMC journals publish low to medium impact papers (this is not a reflection of the quality of those papers, I hasten to point out). It'd be interesting to see if the percentage is any higher when you only look at their higher impact journals, but I didn't have time to do that.
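For anyone who wants to check the arithmetic, here's a minimal sketch in Python using only the totals quoted in this post. The variable and function names are mine, purely for illustration; nothing here comes from the BMC dump itself.

```python
# Totals quoted in the post (November 2002 to July 2008).
papers_published = 37_916      # all BMC papers since launch
papers_with_comments = 732     # papers that attracted at least one comment
total_comments = 945           # all comments in the dump


def percent(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


# Share of BMC papers with at least one comment.
coverage = percent(papers_with_comments, papers_published)

# Average number of comments on a paper that has any comments at all.
comments_per_commented_paper = total_comments / papers_with_comments

print(f"{coverage:.2f}% of papers have at least one comment")
print(f"{comments_per_commented_paper:.2f} comments per commented paper")
```

The coverage works out just under 2%, and a commented paper has about 1.3 comments on average, so most papers that get commented on get exactly one.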
The graph below shows the number of comments attached to papers, grouped by the year of publication. Perhaps unsurprisingly, there's an upwards trend. I guess this is partly because more people are becoming familiar with commenting and partly because there are more papers to comment on.

Who leaves comments?
~1/3rd of comments on papers are left by the authors themselves: to present supplemental information, to make readers aware of errors or typos in the manuscripts, or to reply to an earlier comment by somebody else (authors added their comment to an existing thread 37% of the time).
64 people (~8%) have commented on more than one paper. Only two dozen people have commented on more than two different papers. In other words, it's not the same people commenting all the time (as you find with blogs).
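The repeat-commenter numbers can be sanity-checked the same way. This is a rough sketch using the figures quoted above; I'm treating "two dozen" as exactly 24, which is an assumption, and the variable names are just illustrative.

```python
# Figures quoted in the post.
commenters = 753               # distinct users who left a comment
multi_paper_commenters = 64    # commented on more than one paper
heavy_commenters = 24          # "two dozen", assumed to mean exactly 24

# Everyone else commented on exactly one paper.
single_paper_commenters = commenters - multi_paper_commenters

share_multi = 100.0 * multi_paper_commenters / commenters
share_heavy = 100.0 * heavy_commenters / commenters
share_single = 100.0 * single_paper_commenters / commenters

print(f"{share_single:.1f}% commented on one paper only")
print(f"{share_multi:.1f}% commented on more than one paper")
print(f"{share_heavy:.1f}% commented on more than two papers")
```

Put the other way round, more than nine in ten commenters have only ever commented on a single paper.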
What are the comments about?
Broadly speaking the comments fall into seven categories (there is some overlap):

Updates from authors 25%
Authors providing corrections, updates and replies.
"As of 4/13/06, correspondence for Peter K. Rogan should be sent to:"
"Please DO NOT utilize the version of the software as currently available. It will not function as specified ... [we're working on a fix]"
"Please note that there is a typo in the phosphopeptide sequence in the METHODS section"
One reason this percentage is so high is that it seems authors are encouraged to comment on their own papers if they spot typos or other errors in their published manuscripts, as it can take BMC a wee while to sort out corrections in the PDF and HTML. This way readers can be informed of errors immediately.
Requests for clarification 8%
Readers asking for more information from the authors.
"Were the PAO2 values directly estimated by sampling end tidal alveolar gas?"
Interpretation and see also... 22%
Readers suggesting how the results of a paper might be interpreted, often pointing others towards further relevant research.
"Recent research data suggests that the lethality of the H5N1 strain..."
"We would like to inform readers of this paper of our prior efforts in the area of PCR primer design"
"Researchers planning to follow the 'plan A' approach discussed in this paper may benefit from first checking out..."
Direct criticism 17%
Readers pointing out possible flaws and errors.
"We think, however, that the authors have overlooked an important confounder when adjusting the relationship between exposure and allergic disease.."
"The fact that the network is trained on only 20 samples and validated on the rest 20 at the end of the flow schematic does not mean that a correct validation has been done"
Bonus material 17%
Things that could plausibly have been included in the paper. Quite common in the bioinformatics journals. Links to datasets, implementations and software downloads.
"The described method is available as an R script and can be found at..."
"The software described in this article is available online at http://dulci.org/sage/."
"I have made the test data we used for this paper available from http://biotext.org.uk/ on the Downloads page."
Other comments 8%
Appreciative comments.
"I feel that this is a timely commentary, addressing the issue of semantic enrichment of our scientific literature"
"This paper is, in my opinion, the by far most clear and up to the point paper I ever read on the analysis of microarray data"
Crazies 2%
Self-explanatory.
Takeaway message?
The quality of comments at BMC is high and the vast majority add value to the paper, though the numbers involved are relatively low (would a larger audience reading higher impact papers be different?).
Perhaps unsurprisingly, comments on papers are not like comments on blogs: they're far more formal (only 8% of comments were of the chatty, supportive variety) and it's not the same people coming back each time (with the exception of the crazy 2%).
