Primary research papers are the currency of academics, but they’re also part of a much wider body of knowledge that is restricted by a lack of transparency.
Guest contributor Lakshini Mendis
Historically, a great deal of trust has been placed in statements made in research papers for which the underlying data have not been shared. The invention of the laser, for instance, was described in a paper containing just three data points, and Watson and Crick first described the structure of DNA in a paper without any data at all. But with about 1,500 papers retracted since 2012 – 26.6% of them due to misconduct – scientific papers are now firmly under the microscope.
Improving the availability and readability of original research data would go a long way towards improving matters. And as scientific publishers largely determine how research data is disseminated, their involvement will be central to any change. Speaking at Publishing Better Science Through Better Data in late October 2015, Dr Joerg Heber and Dr Andrew Hufton, editors at Nature Communications and Scientific Data respectively, emphasised that to make the most of research data, it must be more open.
Overcoming the data-sharing challenge
According to Hufton, the status quo is for researchers to share data only directly with other researchers. As well as being inefficient, this means that data associated with published work disappears at a rate of about 17% a year, as researchers fail to properly catalogue their findings. There is now, therefore, a move among scientific publishers to make data findable, accessible, interoperable and re-usable – or, to use an acronym as those of a scientific persuasion are so often inclined to do, FAIR.
Making data accessible is about a lot more than just providing access. "Dumping data is not enough," Heber argued; it needs to be presented in a form that others can make use of. Supplementary information tables have proved adequate in the past, but as researchers acquire larger, more complex datasets this is no longer a viable option. Data repositories like OpenFMRI and PRIDE provide an online space to efficiently store and share data, but with so many different data types being generated there's no one-size-fits-all solution. However, quality curation, a commitment to long-term preservation, and tools that support collaborative analysis are common to all good data repositories. And for the data to be interoperable – in a form that other people and their tools can understand and use – its context, known as metadata, must also be shared.
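To make the idea of metadata concrete, here is a minimal sketch of the kind of record a researcher might deposit alongside a dataset. The field names and values are purely illustrative – loosely modelled on fields common to many repositories, not the schema of any particular one – and the validation helper is a hypothetical example, not a real repository tool.

```python
# A hypothetical metadata record: the context that makes a shared dataset
# findable and interoperable. All names and identifiers are illustrative.
dataset_metadata = {
    "title": "Resting-state fMRI scans, healthy adults",
    "creators": ["A. Researcher", "B. Collaborator"],
    "description": "BOLD time series; acquisition details in the methods file.",
    "identifier": "doi:10.xxxx/example",  # a persistent ID makes data findable
    "licence": "CC-BY-4.0",               # tells re-users what they may do
    "file_format": "NIfTI-1",             # lets others' tools read the files
    "keywords": ["fMRI", "resting state"],
    "related_publication": "doi:10.xxxx/paper",
}

def missing_fields(metadata, required=("title", "creators", "identifier",
                                       "licence", "file_format")):
    """Return the required fields absent or empty in a metadata record."""
    return [field for field in required if not metadata.get(field)]

print(missing_fields(dataset_metadata))   # an empty list: nothing is missing
print(missing_fields({"title": "Scans"}))  # everything except the title
```

Even a simple check like this captures the point Heber and Hufton make: data dumped without a title, creator, licence, persistent identifier and declared file format is far harder for anyone else to find, trust or re-use.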
The benefits of sharing data openly. Despite the challenges, openly sharing data in a useful format can bring great benefits. Scientific publishers have been driving change, spurred by retractions that cost taxpayers roughly $400,000 each. The onus is now on researchers to realise the full potential of this changing publishing landscape, and both Heber and Hufton believe that researchers stand to benefit from sharing their data and metadata in a number of ways.

Dr Andrew Hufton, Managing Editor, Scientific Data, presenting a workshop entitled "Beyond supplementary material: sharing data effectively through repositories and data journals". Image credit: Lakshini Mendis
Overcoming reproducibility challenges. There is growing concern about the number of scientific findings that cannot be reproduced. The true extent of this problem – and whether it can be attributed primarily to methods that are poorly shared and lack detail – remains to be seen. But Heber believes that openly sharing data will help to rectify it and improve reproducibility across groups. Additionally, Hufton said that this approach can support the reproducibility of work across an individual's career, as it encourages better note-taking and cataloguing.
Delivering impact, visibility and credit. Sharing data via journals such as Scientific Data gives you an indexed, citable publication that guarantees credit for your work. Hufton highlighted that every dataset – from decades-old research to computationally processed data – is citable, so credit is given each time it is re-used. And because these citations are linked to the usual indices, sharing increases the impact of your work.
Meeting funder mandates. Thirty-four research funding bodies around the world now require data to be archived; another 16 encourage it. Earlier this year, the UK's Engineering and Physical Sciences Research Council launched an open research data policy that requires publicly funded research data to be shared as widely as possible without damaging the research process. In the US, spurred by a 2013 statement from the Obama administration, the National Science Foundation and the National Institutes of Health now expect the timely sharing of data. Such mandates are likely to become the norm, and meeting them should be considered part of experimental design.
Problem-solving through open data. Research supported with public funds must be held to account, and making data available is helpful in that respect. But the usefulness of data sharing goes further. A lack of sharing of species distribution data, for instance, could "hamper prospects of safeguarding biodiversity." And during the recent Ebola outbreak, Heber said, data needed to be shared with relevant authorities as soon as possible – even before peer review, as was encouraged by Nature. That same rapid dissemination of data is now accelerating research into other neglected diseases, such as drug-resistant tuberculosis and malaria.
Heber and Hufton believe that the usefulness of data ultimately relies on the ability to integrate and re-use that information. While the changing research publishing landscape is driven by publishers, it’s up to researchers to embrace this change and realise its potential and theirs.
Further reading
Open Access publishing at Nature
BioSharing Information Resources: information about content standards, databases, and (progressively) data policies in the life sciences, broadly covering the biological, natural and biomedical sciences.
