With the emergence of web-based lab notebooks, digital image “enhancement”, and the quick and easy (and possibly dirty) generation and dissemination of colossal amounts of data, it’s becoming increasingly clear that technology poses new challenges to maintaining scientific integrity. In an attempt to tame the beast while it still has its baby teeth, the US National Academy of Sciences released a report today that provides a framework for dealing with these challenges: “Ensuring the Integrity, Accessibility, and Stewardship of Research Data in the Digital Age”.
The NAS commissioned the report back in May 2006, when the editors of Nature, Cell, Science, and Nature Cell Biology wrote a letter to NAS president Ralph Cicerone regarding the recent rise in cases of data manipulation.
But that was merely “the straw that broke the camel’s back”, said Daniel Kleppner of MIT, who co-chaired the committee that produced the report. “It was one of the many problems that had been brought up about the challenges due to digital technology, and the way it’s changing the culture of the practice of science.”
On 21 July Kleppner and co-chair Phillip Sharp, also of MIT, held a briefing for their sponsors, including Nature. They focused on three core issues of research data — integrity, accessibility, and stewardship — but kept the recommendations general because practices vary so widely from field to field.
“Over the last several years the academy has dealt with the issue of culture change — how researchers view data, how central the data is to researchers, and what we’ve used historically to maintain integrity, access, and stewardship of data,” said Sharp. “Data integrity has to be integral to the field or it won’t exist.”
One of the key issues is peer review, which, though vital, is limited in its ability to deal with the massive amounts of data often produced in, for example, simulation experiments. The best way to deal with this is the wisdom of the crowd: correcting errors or fraud through collective scrutiny, ideally before others have tried to build upon the flawed data or mine them for meta-analyses.
Accessibility, Kleppner maintains, is both inevitable and “seminal to the advance of science”. “Research data, methods, and other information integral to understanding the data should be publicly reported and publicly accessible.”
But while accessibility is all well and good, it brings up at least one thorny issue: no one wants to make their data public before they’ve had a chance to review, validate, and, most importantly, publish.
Kleppner and Sharp also raised the issue of stewardship — preserving data for future purposes — and the pressing questions of who should decide which data to keep, how long it should be kept for, and, most importantly, how it should be stored.
Kleppner hopes the report sparks discussion in the scientific community, at the level of individual researchers, their institutions, sponsors, professional societies and scientific journals.
“We all know it’s happening,” he says. “But it’s very difficult to assimilate it all. Now there’s a new generation coming in, and this is the world they grew up in. We need to start this now.”