CORRECTION: A previous version of this blog quoted OSTP Director John Holdren as saying, “The world is now producing dadabytes of information,” when Dr. Holdren in fact said “zettabytes,” which is the correct term.
Scientists love data; there’s no denying it. But the ballooning quantities of data gathered today in virtually every field of research, from climate science to genomics, present a dilemma of unprecedented magnitude: how best to use it all?
A broad spectrum of answers by six US science-related agencies and departments was on offer on 29 March as the administration’s Office of Science and Technology Policy (OSTP) unveiled a US$200-million initiative to improve the management, analysis and sharing of the vast data sets now accumulating as a consequence of federal research.
“The world is now generating zettabytes — that’s ten to the twenty-first power or a billion trillion bytes — of information every year, and that number is growing with extraordinary speed,” OSTP’s director and presidential science adviser John Holdren told reporters at a news briefing today in Washington DC. “It became clear that there really was a strong case for a national big data initiative that multiple agencies could contribute to.”
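For readers counting zeros, Holdren’s figure is easy to verify. The quick sketch below (illustrative only; the variable names are my own) confirms that a zettabyte, 10^21 bytes, is indeed “a billion trillion” bytes:

```python
# One zettabyte is 10**21 bytes -- "a billion trillion".
zettabyte = 10**21
billion = 10**9
trillion = 10**12

assert zettabyte == billion * trillion

# For a sense of scale: storing one zettabyte on
# 1-terabyte (10**12-byte) drives would take a billion of them.
drives_needed = zettabyte // trillion
print(drives_needed)  # 1000000000
```
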
The impetus for the effort comes from a December 2010 report released by the President’s Council of Advisors on Science and Technology (PCAST). (Download the report here.) The report stresses both the scientific and economic benefits of leveraging big data sets and makes a host of recommendations on data research and cyberinfrastructure.
Elements of the report were echoed today as the heads of various agencies described how they would implement their share of the initiative. “The goal is to move from data to knowledge to action,” said Holdren. Among the projects announced:
– a $10-million award from the National Science Foundation (NSF) to researchers at the University of California, Berkeley, to explore ways of integrating algorithms, machines and people;

– the release by the National Institutes of Health (NIH) of data from the 1000 Genomes Project, the world’s largest data set on human genetic information, on the Amazon Web Services cloud;

– XDATA, a new Defense Advanced Research Projects Agency (DARPA) programme that will put $25 million annually towards developing new software and computational tools to bring data analysis up to speed.
(An extensive fact sheet with a complete list of projects can be found here.)
Several projects are aimed at providing a new generation of data-driven researchers with appropriate training to work in a data-rich landscape. “We are very determined to try to provide those training pathways to increase the population of such skilled individuals and also to provide in some instances training programs for individuals who need to acquire those skills mid career,” said NIH director Francis Collins.
Although not represented at the briefing, other agencies such as NASA and the National Oceanic and Atmospheric Administration will be engaged in the effort as well, Holdren said.
“Many different individuals in and out of science and government have been thinking about big data for a long time and I’m sure many of those individuals had the thought that we weren’t doing enough,” said Holdren.