Climate data mavens alert: A £120,000 year-long research project at the University of East Anglia’s Climatic Research Unit (CRU) started this month, with the task of ‘improving ways of exposing climate data for re-use’.
Many of the investigations into the e-mails leaked from CRU concluded that climate scientists should make available all the data and methods that support their work, including raw data and computer code.
This project, however, does not directly answer that challenge. Instead, it addresses a related point: dumping data and methodologies on the internet is all very well, but that in itself may not be very informative to an outside user. Researchers may also be reluctant to share their material unless they feel they will be credited for the data and computer code they provide, which could be argued to be as worthy of citation as a research paper. There are, as yet, no standard ways to present raw data and methods, or to cite them easily in a research paper.
One answer to this comes from ‘DataCite’, an international consortium that aims to establish easier access to scientific research data on the internet, and to increase acceptance of research data as legitimate, citable contributions to the scientific record. Their idea is to label individual datasets with DOIs (digital object identifiers), the computer-readable handles by which research papers are currently identified and cited.
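For the technically curious, here is a rough sketch of what citing a dataset by DOI might look like in practice. Everything in it is illustrative: the `DatasetRecord` fields, the citation layout (creator, year, title, publisher, identifier) and the DOI itself are assumptions made up for this example, not details of the UEA project or of any real dataset. The only real-world convention relied on is that a DOI can be turned into a web link by prefixing it with `https://doi.org/`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetRecord:
    """Minimal, hypothetical description of a citable dataset."""
    creator: str
    year: int
    title: str
    publisher: str
    doi: str  # made-up identifier, for illustration only


def doi_url(doi: str) -> str:
    """Turn a bare DOI into a resolvable web link."""
    return "https://doi.org/" + doi.strip()


def cite(record: DatasetRecord) -> str:
    """Format a citation line; the layout is an assumed, simplified style."""
    return (
        f"{record.creator} ({record.year}): {record.title}. "
        f"{record.publisher}. {doi_url(record.doi)}"
    )


# A placeholder record; none of these values refer to a real dataset.
example = DatasetRecord(
    creator="Example Research Unit",
    year=2010,
    title="Hypothetical station temperature dataset",
    publisher="Example University",
    doi="10.1234/example.dataset.v1",
)

print(cite(example))
```

The point of the exercise is that the citation ends in a stable, machine-readable link, so a dataset can be referenced in a paper in much the same way as a journal article.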
UEA will be testing the technicalities of this approach on four climate datasets. It will also work on linking its data, so that an outside observer can understand the relationships between different datasets. (For experts: it will be testing out the ‘Resource Description Framework’, a model for data interchange on the web.)
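To give a flavour of what linking datasets means, the sketch below records relationships between datasets as subject, predicate, object statements, which is the basic shape of data in the Resource Description Framework, and prints them in the plain-text N-Triples notation. The dataset addresses under `http://example.org/` are hypothetical placeholders, and the choice of the Dublin Core terms `source` and `isVersionOf` to describe the relationships is an assumption for illustration; the source does not say which vocabulary the project will use.

```python
from typing import List, NamedTuple

DCTERMS = "http://purl.org/dc/terms/"   # Dublin Core terms namespace
BASE = "http://example.org/dataset/"    # hypothetical dataset addresses


class Triple(NamedTuple):
    """One RDF-style statement: subject, predicate, object (all URIs here)."""
    subject: str
    predicate: str
    obj: str


def to_ntriples(triples: List[Triple]) -> str:
    """Serialise URI-only triples in N-Triples notation, one per line."""
    return "\n".join(
        f"<{t.subject}> <{t.predicate}> <{t.obj}> ." for t in triples
    )


def sources_of(triples: List[Triple], dataset: str) -> List[str]:
    """List the datasets that the given dataset is recorded as derived from."""
    return [
        t.obj
        for t in triples
        if t.subject == dataset and t.predicate == DCTERMS + "source"
    ]


# Made-up relationships: a gridded product derived from raw station data,
# and a later version of that gridded product.
triples = [
    Triple(BASE + "gridded", DCTERMS + "source", BASE + "stations-raw"),
    Triple(BASE + "gridded-v2", DCTERMS + "isVersionOf", BASE + "gridded"),
]

print(to_ntriples(triples))
print(sources_of(triples, BASE + "gridded"))
```

Because each statement is machine-readable, an outside user (or their software) could in principle trace a processed dataset back to the raw data it came from.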
The project is one of eight funded through a £660,000 investment call from JISC. “Climate science is by no means unique in the need for researchers to analyse complex data from a number of different sources. By showing how research data can be made more open, this programme will help achieve proper recognition for the essential place of data creation and management in the research process,” says Simon Hodson, a JISC programme manager (press release).
Perhaps the world leader in this area is the Australian National Data Service, a set of web pages describing data collections produced by or relevant to Australian researchers. The US National Science Foundation is also starting to address data-sharing policies: from October, it will ask researchers to prepare data management plans when they receive NSF funding.