
July 30, 2009

Lies, damned lies and download counts

Shirley Wu posted on Friendfeed earlier about some of the things she'd overheard people saying about PLoS ONE papers. PLoS ONE Managing Editor Peter Binfield weighed in early to point out that the best way to combat misconceptions about the journal is to push out positive information, and mentioned the journal's article-level metrics program.

Near the end of the (long) thread was this exchange:

"You could try asking them exactly how many downloads their last paper in a 'high impact' journal got... - Peter Binfield

Fair enough, but you know, I really don't think they think about that. They think "what will be in my CV?" and they think any journal that is somewhat competitive [includes other PLoS journals, BMC journals, etc] looks better than one that accepts anything that's methodologically sound. Again, not my view, but perhaps one that is held by many. Do people list # of downloads on their CV for publications? - Shirley Wu

They don't, because they don't have the data. However, people do list if their paper was rated by F1000; or if BMC designated it a 'highly accessed' article. So I think they will start to say "this paper was downloaded 5000 times in the first 3 months which put it in the top x% of all PLoS ONE articles, the top y% of all PLoS articles, and the top z% of ALL articles" (when the rest of the world starts quoting this data) - Peter Binfield"

Do people here think that article downloads stats should be put on academic CVs? (serious question)

It feels wrong to me. IMHO encouraging anybody to take download statistics seriously as a measure of success / quality would be a mistake. Taken on their own they're meaningless, surely - nice to know for the author, but meaningless. For them to be at all useful you'd have to supply a lot of context - as Peter suggests - though I don't think the journal level "top 10% of papers in first three months" context he outlined would be enough either.

(just to be clear I don't think Peter was necessarily saying that people should put only the download count on their CV - am using his comment above simply as a jumping off point for discussion)

A download counter can't tell if the person visiting your paper is a grad student looking for a journal club paper, a researcher interested in your field or... somebody who typed in an obscure porn-related search that turned up unconnected words in the abstract. A search bot. Somebody on Google Images looking for free clipart. Got a blog? Check your traffic stats. Journals get those crazy queries too, lots of them. Mainstream search engines are a major source of traffic for journals, but not always for the reasons publishers might want.

As a publisher do you account for this and only record 'good' traffic? What if your competition don't?

Institutions and ISPs transparently cache pages. If my lab mate and I both download your paper then, depending on the publisher's stats package, it might register as only one hit (from the university proxy server). Do you compensate for that somehow?
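To make the proxy and bot problems concrete, here's a toy counter in Python. The log record, the bot markers and both counting policies are made up for illustration, not taken from any publisher's stats package. Note too that with a transparent cache the second request may never reach the publisher at all, which de-duplicating by IP only approximates.

```python
from collections import namedtuple

# Hypothetical access-log record; real publisher logs carry much more.
Hit = namedtuple("Hit", ["client_ip", "user_agent", "article_id"])

# Illustrative substrings only; real bot filters use long curated lists.
BOT_MARKERS = ("bot", "crawler", "spider")


def is_bot(user_agent):
    """Crude check: does the user agent string look like a crawler?"""
    ua = user_agent.lower()
    return any(marker in ua for marker in BOT_MARKERS)


def count_downloads(hits, article_id, dedupe_by_ip=True, drop_bots=True):
    """Count hits on one article under a chosen (hypothetical) policy."""
    relevant = [h for h in hits if h.article_id == article_id]
    if drop_bots:
        relevant = [h for h in relevant if not is_bot(h.user_agent)]
    if dedupe_by_ip:
        # Everyone behind one caching proxy shares an IP, so they collapse to one hit.
        return len({h.client_ip for h in relevant})
    return len(relevant)


hits = [
    Hit("10.0.0.1", "Mozilla/5.0", "paper1"),     # me, via the university proxy
    Hit("10.0.0.1", "Mozilla/5.0", "paper1"),     # my lab mate, same proxy
    Hit("192.0.2.7", "Googlebot/2.1", "paper1"),  # a search engine crawler
]

print(count_downloads(hits, "paper1"))                                       # 1
print(count_downloads(hits, "paper1", dedupe_by_ip=False))                   # 2
print(count_downloads(hits, "paper1", dedupe_by_ip=False, drop_bots=False))  # 3
```

Same three requests, three different download counts, depending on choices the author never gets to see.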

Am I going to be penalized if I host my papers on my homepage? In my institutional repository? Should I add all those counts up for my CV? Do I need to cite my sources?

Should I tell my mum to set my paper as her homepage (and to be sure to delete her cookies each morning)?

If Science spends $50m on SEO next year and hits on their article pages double, will the articles in 2010 be twice as good as those in 2009?

As an author should I be repeating keywords in my title to get more Google traffic? Should I try to include a figure of Britney Spears?

If we stick to giving 'top x percentage' context, do we make concessions for smaller disciplines publishing in multidisciplinary journals? More people work and publish in genetics than in quantum physics. Even if every important person in your field downloads your paper, they might be outnumbered by grad students from the three dozen groups working on Rab4A effectors who download the genetics paper next to yours in the TOC.
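A worked example of the 'top x percentage' problem, using invented numbers for a hypothetical multidisciplinary journal. Percentile rank here means the share of papers with strictly fewer downloads (other definitions exist), computed once across the whole journal and once within the paper's own field.

```python
# Made-up download counts for ten papers in one multidisciplinary journal.
papers = [
    ("genetics", 900), ("genetics", 1200), ("genetics", 1500),
    ("genetics", 2000), ("genetics", 2500), ("genetics", 3000),
    ("quantum physics", 150), ("quantum physics", 200),
    ("quantum physics", 300), ("quantum physics", 400),
]


def percentile_rank(value, population):
    """Percentage of the population with strictly fewer downloads than value."""
    below = sum(1 for v in population if v < value)
    return 100.0 * below / len(population)


def journal_percentile(downloads):
    """Rank against every paper in the journal, regardless of field."""
    return percentile_rank(downloads, [d for _, d in papers])


def field_percentile(downloads, field):
    """Rank only against papers from the same field."""
    return percentile_rank(downloads, [d for f, d in papers if f == field])


# The most-downloaded quantum physics paper in this toy journal:
print(journal_percentile(400))                   # 30.0
print(field_percentile(400, "quantum physics"))  # 75.0
```

In this made-up journal the most-downloaded quantum physics paper beats only 30% of the journal, while within its own field it beats every other paper (75% under this strict definition).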

I'm not saying that download stats aren't useful in aggregate, or that authors don't have a right to know how many hits their papers received, but they're so potentially misleading (and open to misinterpretation) that they don't seem to me the type of metric we want to be bandying about as an impact factor replacement.

July 26, 2009

Igor - a Google Wave robot to manage your references

(Google Wave hasn't been released yet, but if you're interested in working with the preview you can request a developer account on the sandbox.)

Google Wave is a new open source project from Google that holds a lot of promise as a platform for scholarly communication. It's a little bit like email but allows for collaborative document editing, versioning and real time conversation within groups - check out Cameron and Martin's archives for more.

Igor is a proof-of-concept Wave robot that allows Wave users to pull in citations from PubMed or from their Connotea and CiteULike libraries as they type.

To use it, invite helpmeigor@appspot.com to join a wave.

Say you'd like to cite 'Chaperonin overexpression promotes genetic variation and enzyme evolution' by Nobuhiko Tokuriki and Dan Tawfik from last month's Nature.

In the Wave you'd write:

... as shown by Tokuriki et al. (cite chaperonin tokuriki)

Igor will notice the (cite x), connect to PubMed, search for articles where the title, authors or journal contain "chaperonin" and "tokuriki" and then pull in the relevant citation. The (cite x) will be replaced with a number and the citation will be appended to the end of the document.

... as shown by Tokuriki et al. [1]

References

1. Chaperonin overexpression promotes genetic variation and enzyme evolution. Tokuriki et al 2009 Nature
(http://www.ncbi.nlm.nih.gov/pubmed/19494908)
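For the curious, here's a rough sketch of the first step in Python: spotting (cite x) commands and turning the words into a PubMed query that requires each word to appear in the title, author list or journal name. Igor itself is written in Java and its exact query isn't described here; the use of the NCBI E-utilities esearch endpoint and of the [ti], [au] and [ta] field tags is an assumption, and the URL is only built, not fetched.

```python
import re
from urllib.parse import urlencode

# Matches "(cite some terms)" but not the "(cite from ...)" source-switch command.
CITE = re.compile(r"\(cite (?!from )([^)]+)\)")

# NCBI E-utilities search endpoint (assumed backend; Igor's may differ).
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def find_cite_queries(text):
    """Return the search terms of every (cite ...) command in the text."""
    return [m.group(1).strip() for m in CITE.finditer(text)]


def build_pubmed_term(query):
    """Each word must appear in the title, author list or journal name."""
    clauses = [f"({w}[ti] OR {w}[au] OR {w}[ta])" for w in query.split()]
    return " AND ".join(clauses)


def esearch_url(query):
    """URL that would ask PubMed for matching IDs (not fetched here)."""
    params = {"db": "pubmed", "term": build_pubmed_term(query)}
    return ESEARCH + "?" + urlencode(params)


text = "... as shown by Tokuriki et al. (cite chaperonin tokuriki)"
for q in find_cite_queries(text):
    print(build_pubmed_term(q))
    print(esearch_url(q))
```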

If Igor comes up empty-handed, or if multiple articles match the cite query, it'll tell you by dropping in a message after the relevant part of the document.
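One hypothetical way to handle the three possible outcomes of a search (no match, one match, several) is sketched below in Python. The message wording and the function name are illustrative, not Igor's actual behaviour; the PubMed URL pattern is the one used in the reference list in this post.

```python
PUBMED_URL = "http://www.ncbi.nlm.nih.gov/pubmed/"


def resolve_citation(query, pmids):
    """Decide what to insert for one (cite ...) command.

    pmids is the list of PubMed IDs the search returned. Returns a
    ("citation", url) pair for a unique hit, otherwise a ("message", text)
    pair to drop into the document after the command.
    """
    if not pmids:
        return ("message", f"No articles found for '{query}'.")
    if len(pmids) > 1:
        return ("message", f"'{query}' matched {len(pmids)} articles; try adding more terms.")
    return ("citation", PUBMED_URL + pmids[0])


print(resolve_citation("chaperonin tokuriki", ["19494908"]))
print(resolve_citation("chaperonin", ["1", "2", "3"]))
```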

To cite a web page, just type:

Google Wave (cite http://wave.google.com)

To switch to your Connotea or CiteULike library, use the (cite from x) command.

e.g.

(cite from citeulike dullhunk)
(cite from connotea euanadie)
(cite from pubmed)

You can switch between citation libraries in the same session:

candidate genes include NRG1 (cite from connotea euanadie)(cite schizophrenia neuregulin) and (cite from pubmed)DISC1 (cite PDE4B evans schizophrenia)

Commands are processed in order of appearance so Igor will search Connotea for "schizophrenia neuregulin" and PubMed for "PDE4B evans schizophrenia".
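A minimal sketch in Python, assuming commands are scanned left to right, of how (cite from x) can simply update a "current library" that later (cite ...) commands use. Function and variable names are hypothetical; this isn't Igor's code.

```python
import re

# "(cite from X)" switches library; "(cite X)" is a search in the current library.
COMMAND = re.compile(r"\(cite (from )?([^)]*)\)")


def plan_searches(text, default_source="pubmed"):
    """Return (library, query) pairs in order of appearance."""
    source = default_source
    searches = []
    for match in COMMAND.finditer(text):
        is_switch, arg = match.group(1), match.group(2).strip()
        if is_switch:
            # e.g. "(cite from connotea euanadie)" -> "connotea euanadie"
            source = arg
        else:
            searches.append((source, arg))
    return searches


text = ("candidate genes include NRG1 (cite from connotea euanadie)"
        "(cite schizophrenia neuregulin) and (cite from pubmed)DISC1 "
        "(cite PDE4B evans schizophrenia)")
for library, query in plan_searches(text):
    print(library, "->", query)
```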

References are always numbered in order of their first appearance in the text. If you move a reference from the bottom of the article to the top, the reference numbers will change accordingly.
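Numbering by first appearance is easy to recompute from scratch every time the document changes, which is one way (assumed here, not necessarily Igor's) to get the renumbering behaviour described above. The keys below stand in for whatever identifies a reference, such as a PubMed ID or URL; the function name is hypothetical.

```python
def number_references(cited_keys):
    """Number references by first appearance.

    cited_keys lists the reference behind each citation, in text order.
    Returns the marker for each citation and the order of the reference list.
    """
    numbers = {}
    for key in cited_keys:
        if key not in numbers:
            numbers[key] = len(numbers) + 1
    markers = [numbers[key] for key in cited_keys]
    reference_list = list(numbers)  # dicts keep insertion order (Python 3.7+)
    return markers, reference_list


print(number_references(["19494908", "http://wave.google.com", "19494908"]))
# Move the web citation to the top and the numbers follow it:
print(number_references(["http://wave.google.com", "19494908", "19494908"]))
```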

Igor is written in Java and runs on App Engine. It's almost inevitable that you'll experience some turbulence, especially when introducing him to a new Connotea or CiteULike account for the first time - Wave robots are very unforgiving of sites timing out. If something looks broken try leaving the wave and coming back to it later or reloading the page. Let us know how you get on!

July 07, 2009

Streamosphere update

This month's iteration of Streamosphere is now up. It's still more a preview than a product, but imho it's approaching usefulness!


The main changes are:

  • a new way of exploring the site - the list view shows you the most popular items within a given time frame. It's sort of like Digg but to vote an item up you need to have commented on it or shared it on a social media site.
  • simplified sidebar, visual cues on the grid / timeline view and a help link will hopefully help new users work out what they're seeing
  • the aggregation logic now uses Friendfeed's SUP feed and connects directly to Twitter, so messages are picked up much faster.
  • trending topics - this is a list of topics that are appearing more frequently than you might expect. Bear in mind that it's generated algorithmically so items are sometimes grouped together in odd (but technically correct ;)) ways...
  • clicking on "see details" in the list view or on an item in the grid view brings up a breakdown of comments and tweets which you can use to jump straight into a conversation on, for example, Friendfeed.
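Streamosphere's own trending algorithm isn't described in this post. As an illustration only, here's one simple way in Python to find topics "appearing more frequently than you might expect": compare each term's count in a recent window with the count its rate in a longer baseline window would predict, adding a little smoothing so brand-new terms don't score infinity. The thresholds, the function name and the sample data are all invented.

```python
from collections import Counter


def trending(recent_terms, baseline_terms, min_count=2, smoothing=1.0, top=5):
    """Rank terms by observed vs. expected frequency in the recent window."""
    recent = Counter(recent_terms)
    baseline = Counter(baseline_terms)  # missing terms count as 0
    # Scale baseline counts to the size of the recent window.
    scale = len(recent_terms) / max(len(baseline_terms), 1)
    scores = {}
    for term, count in recent.items():
        if count < min_count:
            continue  # ignore one-off mentions
        expected = baseline[term] * scale
        scores[term] = (count + smoothing) / (expected + smoothing)
    return sorted(scores, key=lambda t: (-scores[t], t))[:top]


# Invented sample: a long baseline window and a short recent one.
baseline = ["science"] * 40 + ["genome"] * 10
recent = ["science"] * 6 + ["swine flu"] * 3 + ["genome"]
print(trending(recent, baseline))
```

Smoothing stops a term that appears a couple of times from nowhere from swamping everything else; raising min_count trades sensitivity for fewer odd groupings.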

There are still lots of little niggles. On smaller timescales (anything under four hours) there are lots of items that aren't, strictly speaking, about science, too. Still not sure if that's a bug or a feature.

The next version will focus on people - both the people being followed by Streamosphere and visitors to the site - and grouping items by topic.

July 01, 2009

"I am not a scientist, I am a number"

On Monday I was at the BioLINK Special Interest Group at the Intelligent Systems for Molecular Biology meeting in Stockholm. Amongst the many thought-provoking talks was one by Phil Bourne, he of the Protein Data Bank, SciVee and other goodies. Phil made a cogent plea for a system of unique identifiers for scientists.

Giving unique IDs to scientists is an idea that has recently moved quickly from being deep down wish-lists to being bang in the middle of many people's navigation screens. It would, to give just one example, make it much easier for a researcher to aggregate their contributions to science through many channels: journal publications, publicly deposited datasets, blog posts, whatever. What surprised me is how comfortable Phil is, or at least would be, with being assigned a number. He's right: being a number can be liberating.

"Nascent Web publishing efforts have their genesis in a burning need to say something, but their ultimate success comes from people wanting to listen, needing to hear each other’s voices, and answering in kind."
Rick Levine
The Cluetrain Manifesto
