News blog

Sequence read archive won’t be lost

A doomed database of data from next-generation genome sequencing projects will live on at a privately hosted mirror site.

In February, the US National Center for Biotechnology Information said that it would close the Sequence Read Archive, a storage space for raw data from next-generation sequencing projects, due to budget cuts. Today, DNAnexus, based in Mountain View, Calif., said it will host a mirror site of the archive and has developed better tools for researchers who want to use it.

The company said today that it has raised $15 million from private sources, led by Google Ventures and TPG Biotech, and that Google Cloud Storage is hosting the mirror site.


Researchers can freely search publicly available sequencing datasets on the archive. DNAnexus says it has built improved search and browse portals for the archive. Researchers had criticized the old archive for being difficult to use.

One of those researchers, Rob Knight of the University of Colorado at Boulder, said that the ease of uploading and accessing data, and the ability to store more information about the source of the individuals sequenced, would make or break the new service. He also said it should adopt community standards for sequence data that have been developed by groups such as the Genomic Standards Consortium.

Knight also said that the migration of the SRA into the cloud is a taste of things to come: “The failure of NCBI to host SRA points towards the necessity of cloud computing for large datasets, and NIH and other agencies will increasingly need to move in this direction,” he wrote in an email.

DNAnexus will also allow users of the archive to import data into its analysis and visualization tools for free for the next 30 days, after which time they must pay $10 per gigabase of data to use the tools.

Andreas Sundquist, CEO and co-founder of the company, said that bioinformatics firms have traditionally not achieved great commercial success, but that this will change because data management accounts for a rapidly growing share of sequencing project costs.

“The cost of data management is beginning to overwhelm the cost of these projects,” Sundquist said. “We think there is a huge opportunity as a company to build technologies that will allow you to unlock the promise of DNA-based medicine without having to think about building a datacenter.”
