Data Matters: preservation of data

Data Matters presents a series of interviews with scientists, funders and librarians on topics related to data sharing and standards.


Mark Thorley

Head of Science Information and Data Management Co-ordinator at the Natural Environment Research Council, UK

This month’s theme is preservation. Mark Thorley identifies two issues: preservation of the data itself, and preservation of the knowledge needed to use it. That knowledge tends to be lost as people move on and retire, and with it the ability to reuse the data effectively.

Data Matters: interview with Mark Thorley

Mark Thorley is the Head of Science Information and Data Management Co-ordinator at the Natural Environment Research Council, UK.

What are the current preservation practices in your field?

Preservation practices in environmental science are varied; some are quite good, some are less good. We’re probably better off than some other areas, as it has long been recognised that in certain areas of environmental research your research depends on building up long-term time series of data to look for change, so people have had to undertake good digital preservation or data preservation for a long time now. From my perspective, we know we’re reasonably good at managing long-term, large-scale datasets, the community resource datasets, but probably more at risk are smaller-scale individual experimental datasets, which are managed more by the researcher or the institution than by national or international facilities.

Data Matters: interview with Robert Cook

Robert Cook is a scientist at the Distributed Active Archive Center (DAAC), Oak Ridge National Laboratory, USA.

What are the current preservation practices in your field?

There is a range of practice. Some agencies are very good about requiring that data from their funded projects be preserved. And some investigators are very good about preserving their data, while others are not quite so good. Some researchers see the benefit of sharing their data, and so they prepare the data well, making sure it’s available for others to use. Others perhaps don’t have the background in data management to prepare data to share, or perhaps they’re just using it themselves and don’t really need to prepare it for others to understand and use. We see this range in our archive here at Oak Ridge National Lab, a programmatic archive for the NASA terrestrial ecology programme: there are some that prepare data really well, and for others it’s very difficult.

Data Matters: interview with Russell Poldrack

Russell Poldrack is Professor of Psychology and Neurobiology and Director of the Imaging Research Center at the University of Texas in Austin.

What are the current data preservation practices within your field?

Data preservation practices are really non-existent. If people do anything it’s usually saving something to DVDs or tapes, and then sticking it somewhere to rot. I’ve spoken to my colleagues, trying to find some of the early landmark datasets of fMRI papers to put into OpenfMRI (openfMRI.org), but most of them either say, we can’t find the data anymore, or it’s on a tape but we don’t have the drive that can read it anymore. I have data from 10 years ago on various tape formats that I couldn’t get to if I wanted to, though it seems that the technology has stabilised a bit. The other worry is that you put it on a DVD or a hard drive, but those things decay; people often have the assumption that once you put the data onto physical media it will be there as long as you want it, and that is definitely not the case. I think the best strategy is to replicate data geographically across as many different systems as possible so that there’s no single point of failure.