A new website that allows people to share scientific data using the principles that underlie hugely successful peer-to-peer systems has been unveiled by University of California, Davis scientists, who present the tool this week in the open access journal PLoS ONE.
“Before BioTorrents, there wasn’t a really good way for any researcher to share and easily distribute their large dataset or results instantly,” says lead author of the new paper, biologist Morgan Langille. As datasets get larger and demand for open access increases, current methods that don’t scale well for large files cause long transfer waits.
For example, in a single month, users of the National Center for Biotechnology Information download datasets from the 1000 Genomes Project (8,981 GB) around 100,000 times.
Co-author and biologist Jonathan Eisen’s lab sequences genomes of microorganisms and gets constant requests for raw and processed data. “It would be more convenient for us and probably for other people if we made that available through a site like this,” he says.
BioTorrents works on the same peer-to-peer principle as Napster at the turn of the millennium and the file-sharing systems in use today, which have at times drawn the ire of music and film industry associations because they are sometimes used to exchange copyright-infringing files.
“Certainly people use convenient file sharing systems to share things they’re not supposed to share, like large pirated movies or mp3 files,” says Eisen. The site will be semi-curated, though “there’s no doubt that this tool could be used to do some nefarious deed, like any other tool”.
The BitTorrent protocol splits large files into small pieces (as little as 512 KB), so datasets can be transferred between computers holding full or partial copies. Built-in error checking means users receive an exact copy of the original. Bandwidth is shared among all computers in the transaction, instead of a single source providing all the required bandwidth, which makes exchange easier than with large repositories or personal and institutional servers.
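The piece-and-check idea can be sketched in a few lines of Python. This is a minimal illustration, not BioTorrents' or any client's actual code: the function names and the piece size are assumptions chosen for the example, and SHA-1 is used because the original BitTorrent protocol hashes each piece with it.

```python
import hashlib

# Assumed piece size for illustration only; real torrents choose their own.
PIECE_SIZE = 512 * 1024  # bytes


def split_into_pieces(data: bytes, piece_size: int = PIECE_SIZE) -> list[bytes]:
    """Cut the data into fixed-size pieces; the last one may be shorter."""
    return [data[i:i + piece_size] for i in range(0, len(data), piece_size)]


def piece_hashes(pieces: list[bytes]) -> list[bytes]:
    """One SHA-1 digest per piece, published alongside the torrent."""
    return [hashlib.sha1(piece).digest() for piece in pieces]


def reassemble(received: dict[int, bytes], expected: list[bytes]) -> bytes:
    """Rebuild the file, rejecting any piece whose digest does not match."""
    ordered = []
    for index, digest in enumerate(expected):
        piece = received.get(index)
        if piece is None or hashlib.sha1(piece).digest() != digest:
            raise ValueError(f"piece {index} is missing or corrupt")
        ordered.append(piece)
    return b"".join(ordered)
```

Because every piece is verified on its own, pieces can arrive from different peers in any order, and a damaged piece can be fetched again without restarting the whole download.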
But for a torrent system to work, lots of people need to participate. Its speed and effectiveness depend on the number of peers, especially those with complete copies who can act as seeds. The total available bandwidth grows as the number of transfers increases, so capacity keeps scaling as more peers join.
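A back-of-the-envelope model shows why more participants help. The sketch below is an idealised assumption, not a measurement: it supposes every downloader also uploads at a fixed rate and that pieces are always available where they are needed. All function names and figures are hypothetical.

```python
MEGABITS_PER_GB = 8000  # decimal units: 1 GB = 8,000 megabits


def server_time_hours(size_gb: float, downloaders: int, server_up_mbps: float) -> float:
    """Single source: its upload bandwidth is divided among all downloaders."""
    per_client_mbps = server_up_mbps / downloaders
    return size_gb * MEGABITS_PER_GB / per_client_mbps / 3600


def swarm_time_hours(size_gb: float, downloaders: int, seed_up_mbps: float,
                     peer_up_mbps: float, peer_down_mbps: float) -> float:
    """Swarm: every downloader adds its own upload bandwidth to the pool."""
    total_up_mbps = seed_up_mbps + downloaders * peer_up_mbps
    # Each peer is limited by its own download link or its share of the pool.
    per_client_mbps = min(peer_down_mbps, total_up_mbps / downloaders)
    return size_gb * MEGABITS_PER_GB / per_client_mbps / 3600


if __name__ == "__main__":
    # Hypothetical 10 GB dataset, 1 Gbps source, peers with 20 Mbps up / 100 Mbps down.
    for n in (10, 100, 1000):
        single = server_time_hours(10, n, 1000)
        swarm = swarm_time_hours(10, n, 1000, 20, 100)
        print(f"{n:>5} downloaders: single source {single:.1f} h, swarm {swarm:.1f} h")
```

Under these assumptions the single-source wait grows in step with the number of downloaders, while the swarm's wait levels off, because each new peer brings bandwidth as well as demand.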
The end result: faster transfer times, lower bandwidth requirements for any single supplier, and decentralization of data. “So why not have a BitTorrent page dedicated to biological datasets?” says Eisen.
BioTorrents doesn’t require an official submission process or accompanying manuscripts, and it could expedite science for time-sensitive events (such as H1N1 or SARS outbreaks) and between large international groups of collaborators. “Anyone can post their most recent data, software, or results immediately,” Langille says.
“Someone could download all the Nature papers and post them there, but we’re not encouraging that,” Eisen jokes. All PLoS papers are already on BioTorrents.