
WikiProteins - a more critical look

[Image: WookieProfessional]
Knewco is a life sciences software company whose mission is to help scientists find information on the web. Recently they launched WikiProteins, a sort of database/community-annotation mashup that pulls together information about, well, proteins and makes it editable by the public. There's a paper describing the system in Genome Biology, the author list of which includes Michael Ashburner (of Gene Ontology fame), Matt Cockerill (of BioMedCentral, which publishes Genome Biology) and Amos Bairoch (of Swiss-Prot), three people whose opinions about this sort of stuff I respect and admire very much.

Jimmy Wales is also on the author list. That's pretty cool, although I have to admit that after reading the actual paper I wondered briefly how strictly the ICMJE guidelines for author attribution had been adhered to (Matt Hodgkinson has a good post about this). It's not that I have any reason to doubt Knewco (in any case Genome Biology is very good about listing author contributions); it's just that the temptation to give honorary authorship to celebrity researchers must be pretty strong, especially when it's such a good hook for a press release. I know that sounds cynical; blame Merck.

Anyway, here's the idea in a nutshell: WikiProteins has sucked in content from a bunch of existing scientific databases and identified the unique concepts (proteins, GO terms etc.) across all of that data. For each concept there's something called a Knowlet, which records, in several different ways, the relationships between that concept and all the other concepts. Using the data from the Knowlets they've linked different concepts together into a 'concept web'. Each concept also has an associated wiki page on the system which is publicly editable. By default each wiki page contains relevant information from the scientific databases mentioned above.
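To make the Knowlet/concept web idea a little more concrete, here's a toy sketch in Python. It is NOT Knewco's method (the real Knowlet algorithm is proprietary and patent pending); it just assumes a per-concept record holding a score for each type of relationship to every other concept, and links two concepts in the "web" when their combined score passes a threshold. The class and function names, the relationship types, the concept names and the scores are all made up for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Knowlet:
    """Hypothetical per-concept record: for each related concept, one score per relationship type."""
    concept: str
    # related concept -> {relationship type -> score}
    relations: dict = field(default_factory=lambda: defaultdict(dict))

    def add(self, other: str, kind: str, score: float) -> None:
        self.relations[other][kind] = score

    def strength(self, other: str) -> float:
        """Combine all relationship types into one link weight (a plain sum, chosen for simplicity)."""
        return sum(self.relations.get(other, {}).values())


def build_concept_web(knowlets, threshold=0.5):
    """Link two concepts (in both directions) when their combined relationship strength meets a threshold."""
    web = defaultdict(set)
    for k in knowlets:
        for other in k.relations:
            if k.strength(other) >= threshold:
                web[k.concept].add(other)
                web[other].add(k.concept)
    return web


# Usage with invented concepts and scores:
a = Knowlet("protein_A")
a.add("GO_term_X", "database_fact", 1.0)
a.add("protein_B", "co_occurrence", 0.2)
web = build_concept_web([a])
print(sorted(web["protein_A"]))  # ['GO_term_X']
```

The plain sum and the fixed threshold are arbitrary choices; a real system would presumably weight relationship types differently.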

As concepts I like both the semantic web and community annotation of databases. Both ideas have been around for years and it's great that we're finally reaching the point where we can start putting them into practice.

Unfortunately WikiProteins is a bit rubbish.

It seems almost churlish to say that about a site that launched only two weeks ago, and there's been some debate about this within Web Publishing (take the above as individual rather than collective opinion), so I'm willing to stand corrected. As it stands, though, the site just doesn't do it for me.

My objection is primarily to the implementation, not the vision. Having said that, I probably wouldn't have written this post if I hadn't been irked by all the 'we call on a million minds' rhetoric and associated hype coming from Knewco. If you talk the talk...

...make sure that your site is at least fit for purpose, which WikiProteins isn't. Considering that scientists have only just gotten to grips with Wikipedia, wouldn't it have been better to spend a little more time on the user interface, which is cluttered and confusing?

Speaking of Wikipedia, and without singling out any Knewco executives: dudes, there are notability criteria to meet before you can create a page for yourself there (edit histories are public, remember). No biggie, but considering the 'wikipedia for professionals' tag you probably want to read the help pages more.

There's a very high crap-to-content ratio, and the actual information about concepts is condensed into a tiny sidebar. I consider myself fairly au fait with scientific databases and the web, yet it still took me a couple of hours to properly grok how the site works. I find it hard to believe that a thousand bench scientists, never mind a million, are going to get past the initial barrier to entry, browse, register and then actively contribute.

Once you do start using the site, it becomes pretty obvious that for all the cleverness of the RDF backend the actual content is nothing special. I guess that's why there's a call for community annotations, but there's not much of a lure in sub-GeneCards-level stats on proteins.

I'm concerned that the site has no mirrors, no export facility for the wiki data and a one-way data suck: content from public databases and wiki users goes in, nothing comes out. That Jan Velterop, a well-respected OA proponent, is CEO of Knewco is some reassurance on this front, but it also makes it doubly disappointing that they're currently all talk and no trousers. "Eventually we may have [the community annotations] available in a suitable form for downloading"? Seriously? For proper community collaboration that's really not good enough. What if Knewco goes bust? What if they can't support themselves through advertising (a dozen staff, ads on sparse wiki content, you do the math) and decide to switch to a subscription model?

What do you get out of participating, in any case? A system like this needs to keep track of provenance and provide recognition for good contributors. There's nothing like this in WikiProteins, which strikes me as a missed opportunity.

I'm disappointed by other aspects of Knewco's corporate approach to science on the web: firstly, Knowlets is a stupid name; secondly, the trademark sign makes it more so; thirdly, the actual technology is patent pending. I have great difficulty reconciling software patents with the spirit of open scientific progress, which is what I feel Knewco is implying you'll be working towards if you contribute to WikiProteins. Very uncool, Knewco.

The vision underlying WikiProteins is a worthy one, and the fact that Knewco has actually implemented something while everybody else just talks about it deserves credit. If this were an academic project I'd chalk my criticisms up to the site's beta status and be cheerleading with BoingBoing, but the fact is that Knewco is a venture-funded, for-profit company whose business is trafficking the work of scientists. There's nothing wrong with that (Nature is also profit-making, after all), but it means we need to hold them to higher standards.


Comments

Note that Knewco was a major sponsor of the Swiss-Prot 20 conference [http://www.swissprot20.org/exhibit]. There's more I could say but I'll leave it at that :-)

I have a different viewpoint. I applaud these guys for putting together something that offers people the chance to make edits to their favorite proteins or GO concepts. The fact that they have pre-filled the database goes a long way to helping the public get started. This also satisfies the database owners, because their data is read-only, keeping it as an archive. This first beta version may not have the easiest interface, but I imagine there will be incremental improvements. All the other scientific wikis I have looked at were just blank slates, making the activation energy an impossible mountain. I don’t know why you are so opposed to the company aspect, as it seems your blog is hosted by Nature. Are you sure this isn’t just sour grapes on your part because this wiki will be a competitor to the Nature molecular pages?

You said "I'm concerned that the site has no mirrors, no export facility for the wiki data and a one way data suck"

Actually, this is one of my biggest worries, too. I also fear that an unclear intellectual property model will hold people back from contributing real knowledge.

I personally would not mind contributing, but there should be some rewarding system, e.g. the ability to access things I have contributed to or getting some appreciation from the community. I hope this will not turn out to be a knowledge sink.

Edits to Swiss-Prot entries in the ProteinWiki should trigger notifications to the Swiss-Prot curators (or at least that is what was supposed to be set up). The curators will then review the changes and, if warranted, integrate them into Swiss-Prot. So at least this data won't be lost or locked up somewhere (not that I think that is Knewco's intention). I imagine other source databases have similar arrangements. Bigger concerns IMHO are conceptual issues, such as the way changes are propagated back (editing an entry creates a "fork" that is no longer updated when the original entry changes...), and all the quite obvious usability issues.

@ J Andries:

True, Euan was a tad more provocative than my previous post about WikiProteins, but there are a couple of things I think most people here at Nature Publishing Group would agree with. One is that database curation is an intensive process that isn't easy to scale up to genome-scale levels. The second is that if anybody manages to get scientists to contribute en masse to a wiki, we'd be clapping as loudly as anybody. Really!

I am Peter-Jan Roes, the lead developer of the relational wiki part of the WikiProfessional portal. I read the blog post with great attention; however, I am a bit lost when trying to relate its comments to specific parts of the WikiProfessional application. Unfortunately the post does not link to any pages that would support or clarify the author's problems with the system. I would be happy to know which pages the author visited that triggered his commentary.

Having said that, I would like to comment on some frequent remarks found here as far as the relational wiki is concerned. People seem to have two major problems with the system, which I might be able to alleviate a bit.

The first is that people are afraid there is no way to attribute edits to the users who contributed them. However, the relational wiki stores a full history of incremental changes to a concept, along with information about the user who made each change. This information is available in two ways. First, every concept page has an associated history page which shows the versioning information. Here is an example. Second, a full log of all changes to the database is available through the recent changes page. This page can be filtered on user name to find the recent changes made by individual users.

The second is that data cannot be retrieved from the system once it has been added. Although export facilities are currently quite limited, we are certainly planning on making the data available through a web service API. We already have one such API call in place; see, for instance, metacaspase 9, Arabidopsis. This functionality is currently used in the WikiProfessional portal to show the community concept information from the relational wiki in the right-hand sidebar of the navigator. However, anyone could use this call to fetch the same information from the system. We are planning on making the information available in a friendlier and more complete format.

Finally, I want to add that of course we are very interested in comments from users on how to improve on the system's usability. You can leave your comments on the Support page.

Thanks for your comment, Peter-Jan. I apologise if the post seemed somewhat troll-y. ;)

When I was complaining about usability I was thinking of the main ConceptNavigator page with the Knowlet in the middle, and the associated wiki pages that look like this (I'm not objecting to the layout of the latter, just that to a casual viewer it looks like gibberish and certainly doesn't invite you to edit anything).

To an extent, though, my usability gripes are subjective. My main issue with WikiProteins is with the openness - or rather lack thereof - of the project: it pulls in public data and strongly encourages people to contribute but doesn't give anything back in return. You've addressed some of those concerns, and it's good to know that you're working on an API; I look forward to using it. ;) Do you know roughly what the timeframe for implementation is? Will there also be any bulk download options?

Although the rather sour blog post by Euan is quite an exception among the overall positive reactions we have received to the WikiProteins beta site, I feel that a matter-of-fact reaction from the lead author of the Genome Biology article that announced it is warranted. So here it is.

First of all, on authorship: Jimmy was instrumental in making the initial contacts between me and Gerard Meijssen, who was then working on WiktionaryZ, now Omegawiki. He also gave invaluable advice on several aspects of the system, and he therefore deserves as much of an authorship acknowledgement as the average senior author/professor who ‘conceived of the study’.
See also Gerard Meijssen’s blog about that.

On the interface etc., we all know this is beta, and we struggled for a long time to make it as ‘good’ as it is. Obviously a flat file is easier to manage than a relational database, and therefore the interface can never be ‘really easy’. I agree with Peter-Jan that constructive criticism would have been more useful.

Criticism of the commercial nature (as it were) of a company, on a blog made available by another commercial company – one that has made money on others’ scientific contributions for as long as we have been studying nature – is a bit peculiar as well. With the involvement of Amos Bairoch, Michael Ashburner, Mark Musen, Abel Packer, Roberto Pacheco, Matt Cockerill and many others in this process, not to mention Jan Velterop’s reputation, it seems to me that the OA nature of the projects is sufficiently safeguarded.
With my personal background in malaria, working for 15 years with colleagues in developing countries, I also built a public track record in pushing free access to information for developing countries.

The content in WikiProfessional applications is completely freely available under the Creative Commons Attribution license (we are working on making author credits more clearly visible). The Knowlets are indeed proprietary, as we create added value and apply algorithms that have by now taken several million dollars to develop. It has proven exceedingly difficult to get sufficient public funding for this project, which has been carefully discussed internationally and prepared for several years. Bill Melton and Al Berkeley are to be highly commended for taking the risk to fund the vision.
Also the Knowlet space is in Open Access for non-commercial use. I sincerely hope that seasoned investors like Bill and Al would be more imaginative than trying to monetize this site and the others to come by ads only.

On potential fear of competition: let me tell everyone up-front that the authors on the paper have every intention to connect all information on important concepts via WikiProfessional, not trying to put it behind any barrier or to compete with anyone. Some may see us as a competitor to IHOP or Wikipedia pages on biomedical concepts for instance, which is not true, as you will soon see.
We are planning to add locally maintained databases on genes such as www.dmd.nl to the appropriate concept page in WikiProteins, much more prominently placed than today (now an indirect link via SwissProt data), but also locally maintained databases on single-gene mutations such as the growing number of Leiden Open Variation Databases (LOVDs). We have a project starting to map all concepts in WikiProfessional, including all biomedical concept pages, to the corresponding pages in Wikipedia and other emerging wikis. People who find the WikiProfessional interface too difficult will soon be able to contribute to their own wiki of choice, and their contributions will be seen in WikiProfessional anyway.

We collectively ‘own’ that basic data and anyone is free to ‘add value’ to these and make that ‘added value’ freely available to all or just for public not-for-profit use. Knewco is just one of the companies that derives value from the data and has decided to make the added value available to the scientific community for free.
I cannot wait until Nature will be Open Access as well, at least as far as the scientific articles are concerned. Then it will be easier to make full use of Nature content for the benefit of the scientific community.

One more point on equity and access: The collaboration with our Brazilian colleagues, with whom I co-developed and signed the Salvador Declaration on Open Access, referred to in the supplementary data of the Genome Biology paper, will soon result in crossing the language barrier to Spanish and Portuguese.
The record for my beloved ‘malaria’ in Omegawiki will show you in how many languages we would like to support indexing on the fly. For free.

I hope these further explanations take away at least the worst of Euan's fears. I see in today’s version of the blog that he not only changed the original title of the post but also gave a more balanced reaction to Peter-Jan Roes.
However, Euan, if you still feel that some of your comments were justified and not yet properly addressed, please substantiate your claims, and in the process it would be highly appreciated if you gave some constructive criticism. You would really help the community – and us – by doing that. Let’s keep discussing this project to make it better.

Euan, couldn't agree more. The site is very, very underwhelming!
I loved reading the paper, but when I checked out the site it was very, very disappointing.
Although I like to think I am an early adopter, I wouldn't recommend that any serious scientist waste their professional time on this site. Their Knowlets seemed very sparse, and most were based on the most obscure papers. If this is what the semantic science web has as its first offering... the next one had better be way better.




"Nascent Web publishing efforts have their genesis in a burning need to say something, but their ultimate success comes from people wanting to listen, needing to hear each other’s voices, and answering in kind."
Rick Levine
The Cluetrain Manifesto
