P2P in science
In response to the 'computing in science' post a couple of weeks ago, Anna Winterbottom asked about distributed computing and peer-to-peer networks, and whether we'd be covering them in this blog. I must admit to being pretty ignorant of these areas, especially in their applications to science, so I invited Anna to send me something that I could post. Here it is:
Controversy continues to surround peer-to-peer (P2P) networks as a result of ongoing court battles over music and film copyright. However, the efficacy of file sharing protocols such as BitTorrent in speeding up transfers of large data sets, like those involved in genome and phenome projects, is evident. In recent years, scientific uses of P2P have progressed from running large programs using distributed computing power, to developing analytical tools and facilitating multi-institutional collaboration.
Think, a collaborative P2P project begun in 2001, runs as a screen saver on thousands of personal computers, testing binding interactions of proteins against small-molecule drug candidates. Grid.Org now coordinates various similar projects.
The proliferation of bioinformatics software often means researchers struggle to keep up to date when selecting appropriate tools. Chinook, designed for self-administered P2P communities, uses Java or Perl to unify access to alignment software and facilitate comparisons of programs submitted using XML. Chinook's effectiveness was demonstrated by an assessment of transcription factor binding site discovery programs.
A UN Working Group on science recently highlighted the utility of P2P networks for academic collaboration (PDF, 370K). An attempt to put this into practice is LionShare, a project of the Pennsylvania State University, with collaborators including the open source authentication project, Shibboleth, MIT's Open Knowledge Project, and the P2P Working Group.
LionShare includes personal file servers and networking to support file sharing. To avoid the obvious disadvantage of P2P networks, that file availability depends on the users connected at a given time, 'peer servers' aggregate documents and provide persistent mirrors for files. Advanced search functions are being developed by building on Gnutella's protocol.
The project's most ambitious aim, however, is facilitating collaboration across academic institutions, using Shibboleth and Internet2's EduPerson to overcome traditional administrative barriers. While LionShare is free for any member of a higher education institute to download, unlike most P2P networks, users are required to identify themselves, rendering copyright violations less likely.
A few words about Anna: She is setting up a free website for scientists, www.firstauthor.org, and also does some editing work for WitH Ltd., an Oxford-based science communications company.

Comments
Another example of P2P use in science is the sharing of bibliographic data as enabled by Bibster: http://bibster.semanticweb.org/
Posted by: Enro | April 11, 2006 02:50 PM
As for development platforms that offer general purpose p2p functionality, see Sun's Project JXTA http://www.jxta.org. All manner of p2p applications can be developed with JXTA, which is available free of charge and is open source.
Mark
Posted by: Mark Petrovic | April 14, 2006 03:42 PM
I doubt that p2p file sharing would be an efficient means of distributing large, entire genomic sequences, or genetic data in general. Protocols like BitTorrent can be very fast, but only if there are a lot of users downloading the same data at the same time.
The download rate you get with BitTorrent and the like depends mainly on the number of seeders (users who are distributing without downloading) and leechers (users who are downloading and at the same time distributing what they have already downloaded), and also on the upload and download bandwidth of leechers and seeders. All p2p file sharing methods are based on the idea of using the otherwise unused upload bandwidth of users to speed up the download for other users. With data like movies, pop music, popular applications, computer games etc. this works really well; but there are only very few people who want to download large amounts of genetic data.
For example, if you download the entire data from the Human Genome Project, the FTP servers at NCBI will provide you with a download speed of around 300 KB/s. To achieve the same download rate with BitTorrent, you would need a swarm (seeders and leechers) which can provide 300 KB/s upload bandwidth for each leecher. I guess there are only a few people who will ever download the human genome (compared to people who will download a pirated copy of, say, Adobe Photoshop), and it is very unlikely that they will download it at the same time.
But it would be interesting if NCBI and other sites that distribute the HGP and other genome projects could set up a BitTorrent tracker (it's not difficult and would cost them nothing as the tracker needs almost no computing resources) in addition to their FTP servers; then one could see if p2p filesharing is really useful for providing genetic data.
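The bandwidth arithmetic in this comment can be sketched roughly as follows. This is only a back-of-the-envelope model, assuming that the swarm's total upload bandwidth is shared equally among leechers; it is not how any real BitTorrent client schedules transfers. The function names and every figure except the 300 KB/s FTP rate quoted above are hypothetical, chosen purely for illustration.

```python
import math


def per_leecher_rate(seeders, leechers, seeder_up, leecher_up, down_cap):
    """Estimate the download rate (KB/s) each leecher sees.

    Assumption: total upload bandwidth of the swarm is split evenly
    among leechers, and no leecher can exceed its own download cap.
    """
    if leechers == 0:
        return 0.0
    total_upload = seeders * seeder_up + leechers * leecher_up
    return min(down_cap, total_upload / leechers)


def seeders_needed(leechers, target_rate, seeder_up, leecher_up):
    """Smallest number of seeders so each leecher reaches target_rate (KB/s)."""
    shortfall = leechers * (target_rate - leecher_up)
    return max(0, math.ceil(shortfall / seeder_up))


# Hypothetical small swarm: 1 seeder uploading at 50 KB/s,
# 2 leechers each uploading at 30 KB/s, download cap of 1000 KB/s.
small_swarm = per_leecher_rate(1, 2, 50, 30, 1000)

# Seeders required for the same 2 leechers to match a 300 KB/s FTP download.
needed = seeders_needed(2, 300, 50, 30)

print(small_swarm, needed)
```

Under these made-up numbers, a swarm of this size falls well short of the FTP rate, which is the commenter's point: the protocol only pays off when many peers contribute upload bandwidth at the same time.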
Posted by: Anton Kratz | April 17, 2006 01:19 AM
Probably the most highly-developed distributed computing platform for scientific applications, leaving aside GRIDS, is the Berkeley Open Infrastructure for Network Computing (BOINC). I wrote about this recently in news@nature.com; article here.
Posted by: Declan Butler | April 19, 2006 04:07 AM
The download speed in BitLord is too slow. For this URL the tracker shows that it returned 50 peers, and I have a BSNL broadband connection, but the download speed is only 15 to 17 KB/s. Which method can I use to increase the download speed? If it can be done through the IP address, which method should I use?
Posted by: rajeev bhagchandani | September 2, 2006 01:23 AM