Guest Post from Nathan Westgarth of Digital Science, the younger sibling business of Nature Publishing Group.
We’ve been hearing a common theme from the academic community – researchers are having difficulty managing and accessing their data. It seems to be an ongoing problem for research scientists at every stage of their careers.
So we decided to do some investigating and look at the stats from recent research studies into data management. What we found was a rather concerning picture.
The amount of research data being generated is currently increasing by 30% every year. Worryingly, one study has found that a massive 80% of scientific data is then lost within two decades and the odds of sourcing datasets decline by 17% each year (Vines T.H. et al. 2013).
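To put those rates in perspective, here is a rough back-of-the-envelope sketch in Python. It assumes that both rates compound steadily from one year to the next, which is a simplification, and the function names are our own, chosen purely for illustration. Note that the 17% figure describes the odds of obtaining a dataset, not the probability, so the last function gives a multiplier on the odds only.

```python
import math

# Illustrative helpers only; the names and the steady-compounding
# assumption are ours, not taken from the studies quoted above.

def growth_factor(annual_rate, years):
    """Multiplier on data volume after `years` of compound growth."""
    return (1 + annual_rate) ** years

def doubling_time(annual_rate):
    """Years needed for the volume to double at a constant annual rate."""
    return math.log(2) / math.log(1 + annual_rate)

def odds_multiplier(annual_decline, years):
    """Multiplier on the odds of obtaining a dataset after `years`."""
    return (1 - annual_decline) ** years

# Rates quoted in the post: 30% annual growth, 17% annual decline in odds.
print(f"Growth over 10 years: x{growth_factor(0.30, 10):.1f}")
print(f"Doubling time: {doubling_time(0.30):.1f} years")
print(f"Odds multiplier after 20 years: {odds_multiplier(0.17, 20):.3f}")
```

Under these assumptions, data output doubles in a little over two and a half years, while the odds of obtaining a 20-year-old dataset shrink to roughly 2% of their starting value.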
As data output grows, effective data organisation is only going to get more difficult. If data continues to be managed poorly then science will ultimately suffer: experiments will be hard to replicate, findings will be called into question, papers will be retracted and careers will be damaged.
Our infographic tells the story of the impact of poor scientific data management in 9 worrying stats:
– Data output is growing rapidly
1. 90% of all the data in the world has been generated over the last 2 years.
2. Scientific data output is currently increasing at an annual rate of 30%.
– Despite significant investment, data is not being managed effectively
3. Total global spending on research and development is currently estimated at $1.5 trillion, an investment that could be at risk if the resulting data is poorly managed.
4. Much of the data generated is lost – in one study, the odds of sourcing datasets declined by 17% each year.
5. The same study found that 80% of datasets over 20 years old were not available.
– Much of the data remains unverifiable
6. 54% of the resources used across 238 published studies could not be identified, making verification impossible.
– Time and money are wasted, with an impact on science and society
7. Since 2000, over 80,000 patients have taken part in clinical trials based on research that was later retracted because of error or fraud.
8. The number of retractions due to error has grown over fivefold since 1990.
– Funders now require data management and sharing policies
9. 34 countries have signed up to the “Declaration on Access to Research Data from Public Funding”. Key funding bodies such as the NIH, MRC and Wellcome Trust now ask for data management plans to be included in applications.
We think it’s time to start practising safe science and protecting your data! But how?
One option is to use one of the various electronic laboratory notebooks available, which help researchers collect notes and metadata about their research and protocols. Another option is to fit generic tools into existing research workflows. Popular examples include Evernote, cloud storage services such as Google Drive and Dropbox, and code hosting sites such as GitHub.
To help resolve this problem we’ve also started to develop some tools of our own at Digital Science. Two tools proving popular are Projects, a simple desktop app that lets researchers safely manage and organise their research data, and figshare, a cloud-based repository where researchers can store their data, share it with colleagues, or make it publicly available and citable with a permanent DOI.
Have you been affected by issues with data availability? How do you manage your own research data?
The ‘Love Your Data’ infographic was produced by Projects, a Digital Science product.
Vines T.H. et al. (2013), The availability of research data declines rapidly with article age. Current Biology 24(1): 94-97.
About the Author: Nathan Westgarth joined Digital Science, a technology company that serves the needs of scientific research, as a Product Manager for Research Tools. Nathan manages the Projects product at Digital Science, a new research data management tool that helps researchers organise their research outputs in a safe, simple and structured way.