
Lies, damned lies and download counts

Shirley Wu posted on FriendFeed earlier about some of the things she'd overheard people saying about PLoS ONE papers. PLoS ONE Managing Editor Peter Binfield weighed in early to point out that the best way of combating misconceptions about the journal is to push out positive info, and mentioned the journal's article-level metrics program.

Near the end of the (long) thread was this exchange:

"You could try asking them exactly how many downloads their last paper in a 'high impact' journal got... - Peter Binfield

Fair enough, but you know, I really don't think they think about that. They think "what will be in my CV?" and they think any journal that is somewhat competitive [includes other PLoS journals, BMC journals, etc] looks better than one that accepts anything that's methodologically sound. Again, not my view, but perhaps one that is held by many. Do people list # of downloads on their CV for publications? - Shirley Wu

They don't, because they don't have the data. However, people do list if their paper was rated by F1000; or if BMC designated it a 'highly accessed' article. So I think they will start to say "this paper was downloaded 5000 times in the first 3 months which put it in the top x% of all PLoS ONE articles, the top y% of all PLoS articles, and the top z% of ALL articles" (when the rest of the world starts quoting this data) - Peter Binfield"

Do people here think that article downloads stats should be put on academic CVs? (serious question)

It feels wrong to me. IMHO encouraging anybody to take download statistics seriously as a measure of success / quality would be a mistake. Taken on their own they're meaningless, surely - nice to know for the author, but meaningless. For them to be at all useful you'd have to supply a lot of context - as Peter suggests - though I don't think the journal level "top 10% of papers in first three months" context he outlined would be enough either.

(just to be clear I don't think Peter was necessarily saying that people should put only the download count on their CV - am using his comment above simply as a jumping off point for discussion)

A download counter can't tell if the person visiting your paper is a grad student looking for a journal club paper, a researcher interested in your field or... somebody who typed in an obscure porn-related search that turned up unconnected words in the abstract. A search bot. Somebody on Google Images looking for free clipart. Got a blog? Check your traffic stats. Journals get those crazy queries too, lots of them. Mainstream search engines are a major source of traffic for journals, but not always for the reasons publishers might want.

As a publisher do you account for this and only record 'good' traffic? What if your competition don't?
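To make that concrete, here's a rough sketch of the same five hits counted under two different policies. Everything in it is made up for illustration: the user-agent strings, the `BOT_MARKERS` list and the crude substring match are my assumptions, not how any real publisher's stats package works.

```python
# Hypothetical markers; real stats packages use far longer, curated lists.
BOT_MARKERS = ("bot", "crawler", "spider")


def is_probably_bot(user_agent):
    """Crude guess: treat a hit as automated if its user agent names a bot."""
    ua = user_agent.lower()
    return any(marker in ua for marker in BOT_MARKERS)


def count_raw(user_agents):
    """Policy A: every request counts as a download."""
    return len(user_agents)


def count_filtered(user_agents):
    """Policy B: drop anything that looks automated."""
    return sum(1 for ua in user_agents if not is_probably_bot(ua))


# Invented log of user agents for one article page.
log = [
    "Mozilla/5.0 (Windows NT 6.0)",
    "Googlebot/2.1",
    "ExampleCrawler/1.0",
    "Mozilla/5.0 (Macintosh)",
    "some-spider/0.3",
]

raw = count_raw(log)
filtered = count_filtered(log)
print(raw, filtered)
```

Same paper, same traffic, and one publisher would report more than twice what the other does. Neither number says anything about why the two human-looking visitors turned up.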

Institutions and ISPs transparently cache pages. If my lab mate and I both download your paper then, depending on the publisher's stats package, it might register as only one hit (from the university proxy server). Do you compensate for that somehow?
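A minimal sketch of the proxy problem, assuming a stats package that counts distinct source IP addresses (that counting rule, the `Request` record and the addresses are all hypothetical). With a caching proxy the second request might never reach the publisher at all, which would make things worse than shown here.

```python
from collections import namedtuple

# Hypothetical log record: the publisher only ever sees source_ip and path;
# "reader" is the ground truth it has no access to.
Request = namedtuple("Request", ["source_ip", "reader", "path"])


def count_by_source_ip(requests, path):
    """Count downloads of `path` as the number of distinct source IPs."""
    return len({r.source_ip for r in requests if r.path == path})


# Two lab mates behind one university proxy, plus one reader at home.
log = [
    Request("192.0.2.10", "me", "/paper.pdf"),
    Request("192.0.2.10", "lab mate", "/paper.pdf"),
    Request("198.51.100.7", "home reader", "/paper.pdf"),
]

hits = count_by_source_ip(log, "/paper.pdf")
actual_readers = len({r.reader for r in log if r.path == "/paper.pdf"})
print(hits, actual_readers)
```

Three people read the paper; the counter reports two.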

Am I going to be penalized if I host my papers on my homepage? In my institutional repository? Should I add all those counts up for my CV? Do I need to cite my sources?

Should I tell my mum to set my paper as her homepage (and to be sure to delete her cookies each morning)?

If Science spends $50m on SEO next year and hits on their article pages double, will the articles in 2010 be twice as good as those in 2009?

As an author should I be repeating keywords in my title to get more Google traffic? Should I try to include a figure of Britney Spears?

If we stick to giving 'top x percentage' context then do we make concessions for smaller disciplines publishing in multidisciplinary journals? More people work and publish in genetics than in quantum physics. Even if every important person in your field downloads your paper they might be outnumbered by grad students from the three dozen groups working on Rab4A effectors that download the genetics paper next to yours in the TOC.
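Here's a toy version of that, with invented download counts for a multidisciplinary journal (the numbers and the `percentile_rank` helper are illustrative assumptions only). The most downloaded quantum physics paper looks very different depending on which pool you rank it against.

```python
def percentile_rank(value, population):
    """Percentage of items in `population` with a count strictly below `value`."""
    below = sum(1 for x in population if x < value)
    return 100.0 * below / len(population)


# Made-up download counts per paper, grouped by field.
downloads = {
    "genetics": [900, 1200, 1500, 2000, 2600, 3100, 4000, 5200],
    "quantum physics": [150, 220, 300, 400],
}

# The paper of interest: the most downloaded quantum physics paper.
paper_downloads = 400

all_counts = [n for counts in downloads.values() for n in counts]
journal_wide = percentile_rank(paper_downloads, all_counts)
within_field = percentile_rank(paper_downloads, downloads["quantum physics"])
print(journal_wide, within_field)
```

Against the whole journal the paper beats only a quarter of the others; within its own field it beats three quarters of them. Which number goes on the CV?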

I'm not saying that download stats aren't useful in aggregate, or that authors don't have a right to know how many hits their papers received, but they're so potentially misleading (and open to misinterpretation) that they don't seem to me the type of metric we want to be bandying about as an impact factor replacement.


Comments

All good points Euan, and I am very pleased to see this debate start up. The transparent provision of usage data at the article level is uncharted waters for almost all publishers (exceptions are the Journal of Vision and the Frontiers series) and so an open debate will help to frame the data and the issues for people.

PLoS (not just PLoS ONE) will be providing usage data at the article level in the next few weeks.

The alternative, which you seem to be advocating, is to keep this data hidden and proprietary for fear it may be abused (i.e. the status quo - as practiced by almost all other publishers). If we were in a world in which citation metrics had never been provided then I think that many of the points you raised could equally apply to citation figures (or any other metric you care to name) - "Goodness - if we make citation data public, people will start to self cite; they will publish more review articles; they will load their articles with language relating to novelty; they will try to publish in journals which they feel might confer high citation rates on them" etc. The truth is that all such measurement systems can be gamed, and none are ideal. However, we feel that open provision of the data will allow people to make their own decisions. Then how people use (or abuse) it is at least up to them (and not denied them for lack of data).

We will also be providing a wide range of metrics (e.g. usage, citations, social bookmarks, blog coverage etc) which will make the data more valuable, and harder to consistently game. Once again, is it preferable to hide this data, or to make it open to scrutiny, analysis and debate? One final aside - I didn't really have the space in the FriendFeed discussion to go into this, but we will be providing averages (and other summary data) down to sub-discipline levels, and also providing the data for anyone else to do their own analysis on.

Hi Peter,

I'm actually in favour of exposing download stats - and as many other different article level metrics as possible. As you say, a wider range of metrics makes the data more valuable and the system harder to game.

That wasn't my point though - I was asking if putting download stats without the proper context (and you'd need a lot of context) on your CV was a good idea. I don't think it is. Yes the existing citation counts system sucks - imho download stats suck harder.

Fair question. I may be a bit less concerned if I get a CV that lists downloads/month (or whatever) as it is just one more piece of information about the candidate. Most will realize that a download does not equate to an "impact index" for scientific pubs. Let's face it, a CV is an ad and often activates the same defensive mechanisms as a car salesman does. You take the content with a grain of salt. Good post - thanks.
