In response to the ‘computing in science’ post a couple of weeks ago, Anna Winterbottom asked about distributed computing and peer-to-peer networks, and whether we’d be covering them in this blog. I must admit to being pretty ignorant of these areas, especially in their applications to science, so I invited Anna to send me something that I could post. Here it is:
Controversy continues to surround peer-to-peer (P2P) networks because of ongoing court battles over music and film copyright. However, file sharing protocols such as BitTorrent are clearly effective at speeding up transfers of large data sets, like those generated by genome and phenome projects. In recent years, scientific uses of P2P have progressed from running large programs on distributed computing power to developing analytical tools and facilitating multi-institutional collaboration.
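For readers wondering why BitTorrent helps with big data sets: the protocol splits a file into fixed-size pieces and publishes a hash of each piece in advance, so different pieces can be fetched from different peers in parallel and each one checked on arrival. The snippet below is a simplified sketch of just that idea, assuming SHA-1 digests and a fixed piece length as in the original protocol; the piece size and the `ACGT` stand-in data are illustrative, and a real client also deals with trackers, bencoded metadata and the peer wire protocol.

```python
import hashlib

# Simplified sketch of BitTorrent-style piece hashing (assumption: fixed
# piece length and SHA-1 digests, as in the original protocol). Trackers,
# bencoding and peer messaging are deliberately left out.

PIECE_LENGTH = 256 * 1024  # bytes per piece; real torrents pick this per file


def split_into_pieces(data: bytes, piece_length: int = PIECE_LENGTH) -> list[bytes]:
    """Cut the payload into consecutive pieces; the last one may be shorter."""
    return [data[i:i + piece_length] for i in range(0, len(data), piece_length)]


def piece_hashes(data: bytes, piece_length: int = PIECE_LENGTH) -> list[bytes]:
    """SHA-1 digest of each piece, published ahead of time in the metadata."""
    return [hashlib.sha1(p).digest() for p in split_into_pieces(data, piece_length)]


def verify_piece(index: int, piece: bytes, expected: list[bytes]) -> bool:
    """Accept a piece from any peer only if it matches the published hash."""
    return hashlib.sha1(piece).digest() == expected[index]


def reassemble(received: dict[int, bytes], expected: list[bytes]) -> bytes:
    """Join verified pieces in index order once every piece has arrived."""
    missing = [i for i in range(len(expected)) if i not in received]
    if missing:
        raise ValueError(f"still missing pieces: {missing}")
    for i, piece in received.items():
        if not verify_piece(i, piece, expected):
            raise ValueError(f"piece {i} failed hash check")
    return b"".join(received[i] for i in range(len(expected)))


if __name__ == "__main__":
    data = b"ACGT" * 200_000            # 800,000-byte stand-in for a large data set
    expected = piece_hashes(data)
    pieces = split_into_pieces(data)
    # Pretend the pieces arrived out of order from different peers.
    received = {i: pieces[i] for i in reversed(range(len(pieces)))}
    assert reassemble(received, expected) == data
    print(len(expected), "pieces verified")  # prints "4 pieces verified"
```

Because every piece is verified independently, it does not matter which peer supplied it or in what order it arrived, which is what lets many peers share the load.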
Think, a collaborative P2P project begun in 2001, runs as a screen saver on thousands of personal computers, testing small-molecule drug candidates for binding interactions with proteins. Grid.Org now coordinates several similar projects.
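The screen-saver approach works because a large screening job can be cut into independent work units. Below is a minimal sketch of that coordinator/volunteer pattern; everything in it is hypothetical (the class names, the unit size, the protein and molecule identifiers), and `toy_binding_score` is a deterministic placeholder standing in for real docking software, not a model of binding.

```python
from dataclasses import dataclass, field
from hashlib import sha256

# Minimal sketch of the volunteer-computing pattern. Assumptions: the
# coordinator/worker split, unit size and toy scoring are hypothetical;
# real projects run genuine docking calculations on each volunteer PC.


@dataclass
class WorkUnit:
    unit_id: int
    target: str              # hypothetical protein identifier
    candidates: list[str]    # hypothetical small-molecule identifiers


@dataclass
class Coordinator:
    target: str
    library: list[str]
    unit_size: int = 3
    results: dict[str, float] = field(default_factory=dict)

    def work_units(self) -> list[WorkUnit]:
        """Split the candidate library into independent units for volunteers."""
        return [
            WorkUnit(n, self.target, self.library[i:i + self.unit_size])
            for n, i in enumerate(range(0, len(self.library), self.unit_size))
        ]

    def submit(self, scores: dict[str, float]) -> None:
        """Merge the scores returned by one volunteer machine."""
        self.results.update(scores)

    def top_hits(self, k: int) -> list[str]:
        """Best-scoring candidates (higher is better in this toy model)."""
        return sorted(self.results, key=self.results.get, reverse=True)[:k]


def toy_binding_score(target: str, molecule: str) -> float:
    """Placeholder for a docking run: a deterministic pseudo-score in [0, 1]."""
    digest = sha256(f"{target}:{molecule}".encode()).digest()
    return digest[0] / 255.0


def run_on_volunteer(unit: WorkUnit) -> dict[str, float]:
    """What each idle PC does: score every candidate in its unit."""
    return {m: toy_binding_score(unit.target, m) for m in unit.candidates}


if __name__ == "__main__":
    coord = Coordinator("PROTEIN_X", [f"mol{i}" for i in range(10)])
    for unit in coord.work_units():          # in reality, sent over the network
        coord.submit(run_on_volunteer(unit))
    print(coord.top_hits(3))
```

Since no unit depends on any other, the coordinator can hand them out to whichever machines happen to be idle and simply merge results as they come back.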
The proliferation of bioinformatics software means researchers often struggle to keep up to date when selecting appropriate tools. Chinook, designed for self-administered P2P communities, uses Java or Perl to unify access to alignment software and to facilitate comparisons between programs, which are submitted using XML. Chinook’s effectiveness was demonstrated in an assessment of programs for discovering transcription factor binding sites.
A UN Working Group on science recently highlighted the utility of P2P networks for academic collaboration (PDF, 370K). An attempt to put this into practice is LionShare, a project of the Pennsylvania State University, with collaborators including the open source authentication project, Shibboleth, MIT’s Open Knowledge Project, and the P2P Working Group.
LionShare includes personal file servers and networking to support file sharing. To avoid an obvious disadvantage of P2P networks, namely that file availability depends on which users are connected at a given time, ‘peer servers’ aggregate documents and provide persistent mirrors for files. Advanced search functions are being developed on top of the Gnutella protocol.
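To see why always-on ‘peer servers’ help, consider how a classic Gnutella-style search behaves: a query is flooded outwards to neighbours with a hop limit (TTL), duplicate visits are suppressed, and peers that are offline neither answer nor pass the query on. The sketch below models a peer server simply as a node that is always online and holds a mirrored copy. It is a toy illustration under those assumptions, not LionShare’s code; the node names and file name are made up.

```python
from dataclasses import dataclass, field

# Toy model (assumptions, not LionShare code): Gnutella-style flooded search
# with a TTL hop limit and duplicate suppression. Offline peers are skipped;
# a "peer server" is just a node that stays online and mirrors files.


@dataclass
class Node:
    name: str
    files: set[str] = field(default_factory=set)
    online: bool = True
    neighbours: list["Node"] = field(default_factory=list)


def flood_search(start: Node, filename: str, ttl: int) -> list[str]:
    """Return names of reachable online nodes within `ttl` hops holding `filename`."""
    hits: list[str] = []
    seen = {start.name}          # suppress duplicate visits, as Gnutella does
    frontier = [start]
    while frontier and ttl >= 0:
        next_frontier: list[Node] = []
        for node in frontier:
            if filename in node.files:
                hits.append(node.name)
            for nb in node.neighbours:
                if nb.online and nb.name not in seen:
                    seen.add(nb.name)
                    next_frontier.append(nb)
        frontier = next_frontier
        ttl -= 1                 # each hop uses up one unit of TTL
    return hits


if __name__ == "__main__":
    alice = Node("alice", {"paper.pdf"}, online=False)   # original owner, offline
    bob = Node("bob")
    mirror = Node("peer-server", {"paper.pdf"})           # persistent copy
    bob.neighbours = [alice, mirror]
    print(flood_search(bob, "paper.pdf", ttl=2))          # ['peer-server']
```

With Alice offline, the file is still found because the peer server’s mirror answers the query, which is exactly the availability gap the peer servers are meant to close.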
The project’s most ambitious aim, however, is to facilitate collaboration across academic institutions, using Shibboleth and Internet2’s eduPerson to overcome traditional administrative barriers. LionShare is free for any member of a higher education institution to download, but unlike most P2P networks, it requires users to identify themselves, which makes copyright violations less likely.
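The identity requirement rests on attributes that a user’s home institution releases through Shibboleth, described using the eduPerson schema. The sketch below shows the kind of decision a service could make from two real eduPerson attributes, `eduPersonPrincipalName` and `eduPersonScopedAffiliation`. The access policy, the trusted-institution list and the plain-dictionary input are all hypothetical; this is not the Shibboleth API or LionShare’s actual rule.

```python
from typing import Optional

# Hypothetical access policy over released eduPerson attributes. Attribute
# names come from the eduPerson schema; the policy, trusted scopes and the
# dict representation are illustrative assumptions, not Shibboleth's API.

TRUSTED_SCOPES = {"psu.edu", "mit.edu"}                  # hypothetical members
ALLOWED_AFFILIATIONS = {"member", "faculty", "staff", "student"}


def may_join(attributes: dict[str, list[str]]) -> tuple[bool, Optional[str]]:
    """Return (allowed, identity) for a user's released attributes."""
    principal = attributes.get("eduPersonPrincipalName", [])
    if len(principal) != 1:
        return False, None              # no identifiable user, so no access
    identity = principal[0]
    for value in attributes.get("eduPersonScopedAffiliation", []):
        affiliation, _, scope = value.partition("@")   # e.g. "student@psu.edu"
        if affiliation in ALLOWED_AFFILIATIONS and scope in TRUSTED_SCOPES:
            return True, identity       # identity can be recorded with shares
    return False, identity


if __name__ == "__main__":
    print(may_join({
        "eduPersonPrincipalName": ["abc123@psu.edu"],
        "eduPersonScopedAffiliation": ["student@psu.edu"],
    }))
```

Because access is granted only when a named identity comes back alongside an affiliation, every shared file can be tied to an accountable person, which is the property the paragraph above credits with discouraging copyright violations.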
A few words about Anna: she is setting up a free website for scientists, www.firstauthor.org, and also does some editing work for WitH Ltd., an Oxford-based science communications company.