WikiProteins – a more critical look



Knewco is a life sciences software company whose mission is to help scientists find information on the web. Recently they launched WikiProteins, which is a sort of database / community annotation mashup that pulls together information about, well, proteins and makes it editable by the public. There’s a paper describing the system in Genome Biology, the author list of which includes Michael Ashburner (of Gene Ontology fame), Matt Cockerill (of BioMedCentral, which publishes Genome Biology) and Amos Bairoch (of Swiss-Prot), three people whose opinions about this sort of stuff I respect and admire very much.

Jimmy Wales is also on the author list. That’s pretty cool, although I have to admit that after reading the actual paper I wondered briefly how strictly the ICMJE guidelines for author attribution had been adhered to (Matt Hodgkinson has a good post about this). It’s not because I’ve any reason to doubt Knewco – in any case Genome Biology is very good about listing author contributions – it’s just that the temptation to give honorary authorship to celebrity researchers must be pretty strong, especially when it’s such a good hook for a press release. I know that sounds cynical; blame Merck.

Anyway, here’s the idea in a nutshell: WikiProteins has sucked in content from a bunch of existing scientific databases and identified the unique concepts (proteins, GO terms etc.) across all of that data. For each concept there’s something called a Knowlet, which records the relationships between that concept and all the other concepts in different ways. Using the data from the Knowlets they’ve linked different concepts together into a ‘concept web’. Each concept also has an associated wiki page on the system which is publicly editable. By default each wiki page contains relevant information from the existing scientific databases mentioned previously.
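To make the Knowlet idea a bit more concrete, here is a minimal sketch of how per-concept relationship records could be merged into a concept web. This is my own illustration, not Knewco’s implementation: the Knowlet class, the relation names (annotated_with, interacts_with) and the build_concept_web function are all hypothetical, and the sample data is only there to show the shape of the structure.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Knowlet:
    """Hypothetical stand-in: one concept plus its typed links to other concepts."""
    concept: str
    # relation type -> set of related concept names
    relations: dict = field(default_factory=lambda: defaultdict(set))

    def relate(self, relation: str, other: str) -> None:
        """Record that this concept is linked to `other` via `relation`."""
        self.relations[relation].add(other)


def build_concept_web(knowlets):
    """Merge per-concept Knowlets into one undirected adjacency map."""
    web = defaultdict(set)
    for k in knowlets:
        web[k.concept]  # touch the key so isolated concepts still appear
        for others in k.relations.values():
            for other in others:
                web[k.concept].add(other)
                web[other].add(k.concept)
    return dict(web)


# Illustrative data only; relation names are assumptions.
p53 = Knowlet("TP53")
p53.relate("annotated_with", "GO:0006915")
p53.relate("interacts_with", "MDM2")

mdm2 = Knowlet("MDM2")
mdm2.relate("interacts_with", "TP53")

web = build_concept_web([p53, mdm2])
for concept, neighbours in sorted(web.items()):
    print(concept, "->", sorted(neighbours))
```

The sketch throws away the relation types when it builds the web; a real system would presumably keep them (and weight them) so that the different kinds of relationship stay distinguishable.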

As concepts I like both the semantic web and community annotation of databases. Both ideas have been around for years and it’s great that we’re finally reaching the point where we can start putting them into practice.

Unfortunately WikiProteins is a bit rubbish.

It seems almost churlish to say that about a site that only launched two weeks ago, and there’s been some debate about this within Web Publishing (you can take the above as individual rather than collective opinion), so I’m willing to stand corrected. But as it is, the site just doesn’t do it for me.

My objection is primarily to the implementation, not the vision. Having said that I probably wouldn’t have written this post if I hadn’t been irked by all the ‘we call on a million minds’ rhetoric and associated hype coming from Knewco. If you talk the talk…

… make sure that your site is at least fit for purpose, which WikiProteins isn’t. Considering that scientists have only just gotten to grips with Wikipedia, wouldn’t it have been better to spend a little bit more time on the user interface, which is cluttered and confusing?

Speaking of Wikipedia and without singling out any Knewco executives – dudes, there are notability criteria to meet before you can create a page for yourself there (edit histories are public, remember). No biggie but considering the ‘wikipedia for professionals’ tag you probably want to read the help pages more.

There’s a very high crap to content ratio and the actual information about concepts is condensed into a tiny sidebar. I consider myself fairly au fait with scientific databases and the web, yet it still took me a couple of hours to properly grok how the site works. I find it hard to believe that a thousand bench scientists – never mind a million – are going to get past the initial barrier to entry, browse, register and then actively contribute.

Once you do start using the site it becomes pretty obvious that for all of the cleverness of the RDF backend the actual content is nothing special. I guess that’s why there’s a call for community annotations but there’s not much of a lure in sub-GeneCard level stats on proteins.

I’m concerned that the site has no mirrors, no export facility for the wiki data and a one-way data suck – content from public databases and wiki users goes in, nothing comes out. That well-respected OA proponent Jan Velterop is CEO of Knewco is some reassurance on this front, but it also makes it doubly disappointing that they’re currently all talk and no trousers. “Eventually we may have [the community annotations] available in a suitable form for downloading”? Seriously? For proper community collaboration that’s really not good enough. What if Knewco goes bust? What if they can’t support themselves through advertising (dozen staff, ads on sparse wiki content, you do the math) and decide to switch to a subscription model?

What do you get out of participating, in any case? A system like this needs to keep track of provenance and provide recognition for good contributors. There’s nothing like this in WikiProteins, which strikes me as a missed opportunity.

I’m disappointed by other aspects of Knewco’s corporate approach to science on the web: firstly Knowlets is a stupid name, secondly the trademark sign makes it more so, thirdly the actual technology is patent pending. I have great difficulty reconciling software patents with the spirit of open scientific progress – which is what I feel Knewco is implying that you’ll be working towards if you contribute to WikiProteins. Very uncool, Knewco.

The vision underlying WikiProteins is a worthy one and the fact that Knewco has actually implemented something while everybody else just talks about it deserves credit. If this were an academic project I’d chalk my criticisms up to the site’s beta status and be cheerleading with BoingBoing, but the fact is that Knewco is a venture-funded, for-profit company whose business is trafficking the work of scientists. There’s nothing wrong with that – Nature is also profit-making, after all – but it means we need to hold them to higher standards.

