Storing data forever


From Nature Geoscience 3, 219 (May 2010)

Unlike accountants, scientists need to store their data forever. This expanding task requires dedication, expertise and substantial funds.

Data are at the heart of scientific research. Therefore, all data and metadata should be stored — forever, and accessibly. But it would be naïve to think that such a ‘gold standard’ of preservation could be achieved. In one spectacular example of the failure of science to save its treasures, some of NASA’s early satellite data were erased from the high-resolution master tapes in the 1980s. The lost data could now help extend truly global climate observations back to the 1960s — had they not been taped over. At the time, the storage capacity of the tapes seemed more valuable than the data they contained.

Until the introduction of full-scale supplementary information, ensuring that accessible records were kept was down to the authors. Of course, the loss of important information is unacceptable from a scientific point of view. But it is hardly surprising and probably widespread: scientists are not well placed to guarantee continuity of data storage, especially while they are still in their vagabond years of PhD and post-doc work.

Nature Geoscience, in common with all the Nature journals, requires that authors make their data available on publication. The easiest way of ensuring that all the relevant information is accessible, and will remain so in the long term, is to use professionally run databases, which are now available for all sorts of Earth science data.

The creative push in science will always be for the production of better-resolved, more complicated data sets. Ingenious ways of storing and releasing these data are invariably developed with considerable lag. But this is not an excuse to neglect the issue. The preservation of valuable data sets and their distribution on demand is of utmost importance for the progress of science. The continuous attention of dedicated professionals — and substantial funds — is needed for database development to keep up with the science.

