Hoarding research data is a “serious impediment” to the scientific process, the UK Royal Society says in a report out today, ‘Science as an open enterprise’. It urges researchers — and their funders and institutions — to “shift away from a research culture where data is viewed as a private preserve”.
The study, more than a year in the making, also emphasizes that merely making data openly accessible is not good enough. What’s needed is an “intelligent openness”, the report says: “mere disclosure of data has very little value per se.” That’s true when scientists talk to scientists, but it is especially relevant when non-specialists are trying to access and understand data (on climate science or the effects of nuclear radiation, to take two recent examples).
Many scientists already appreciate the value of sharing data sets in organized public databases, simply because it provides more efficient and creative ways to do research, says Geoffrey Boulton, the chairman of the study’s working group, and a professor emeritus of geology at the University of Edinburgh, UK. But some still cling possessively to their data, and top-down constraints, such as the lack of recognition for generating and communicating data, also blunt the urge to share.
Many academic journals have explicit policies that require authors to make their data available, but rates of compliance are low, the report notes. Last year, a team led by John Ioannidis, an epidemiologist at Stanford University in California, examined 500 papers published in 2009 in 50 top biomedical journals. Of the 351 papers covered by a data-availability policy, 59% did not adhere to that policy, and only 47 of the 500 papers deposited full raw primary data online (Alsheikh-Ali A., et al. PLoS ONE 6: e24357; 2011).
Some scientists also haven’t adjusted to the huge data volumes of modern research, says Boulton. “Research groups are getting petabytes of data but dealing with it as if it were the small volumes of yesteryear,” he says. Good data management doesn’t come cheap — perhaps 1% of the cost of research for large repositories, and up to 10% for small research groups. (The cost comes from employing specialist data scientists, not the storage and backup of the data itself.) But like a sophisticated piece of equipment, data collection should be viewed as part and parcel of doing better science, and data-management techniques should be pre-specified in grant funding, as the US National Science Foundation now requires.
As for how best to make data available to non-specialists, “too many politicians have the illusion that scientific data can be made readily available through an Excel spreadsheet,” says Boulton. In the framing of the report, data need to be not just accessible, but also “intelligible”, so they might need to be cast in multiple forms to meet the needs of specialist and lay audiences; “assessable”, so that disclosure of sources, funding, methods and other influences allows audiences to judge the trustworthiness of claims; and “usable”, meaning that data are accompanied by explanatory metadata. Freedom of Information (FOI or FOIA) requests that don’t produce data meeting these requirements aren’t good ways of opening up access to research, the report says: “we all know that FOI requests can just produce swamps of data that no one can make head or tail of,” Boulton adds.
Sometimes there are commercial, personal privacy, safety or security reasons for research data to be kept confidential. Even so, these concerns can be overblown. For example, “the economic rationale for tighter control of intellectual property by universities is dubious,” the report says, quoting evidence that early intellectual-property protection may work against longer-term economic benefit.
The report’s headline recommendations to learned societies, journals and funders seem rather generic, although the UK government is urged to “consider a major investment” in areas such as training data scientists and providing software tools for research. But the Royal Society will follow up its report with a programme of meetings, discussing the extent to which scientific data should be considered as an international public good and made available freely or at affordable cost.