The advent of big data has caused scientists to rethink data sharing, but several obstacles are still holding it back, says Nina Divorty.
Guest contributor Nina Divorty
“If I have seen further, it is by standing on the shoulders of giants.” – Isaac Newton.
This classic quote sums up the nature of scientific collaboration: only by building on the work of our predecessors can we make scientific advances, and only by sharing our own discoveries can they be built upon by others. Most researchers understand this, but only since the recent surge in technologies that generate very large datasets have we begun to recognise the value of sharing raw data, in addition to publishing results in their processed and polished form. The advantages are clear: raw data offers complete transparency, so that other scientists can compare their own results and analyses when attempting to replicate findings, and it also allows others to ask novel questions of existing datasets. Indeed, the majority of researchers across a variety of scientific disciplines report that lack of access to data hinders progress in their field, yet 64% admit to not making their own data easily accessible. So what’s stopping them?
Publish or perish
Every scientist knows the personal and professional investment required to do good, thorough science – huge amounts of time, energy and sometimes emotional attachment are devoted to generating high-quality research data. Researchers understandably want a return on this investment, and published papers are the currency of choice, as determined by the powers that be (funding bodies, review panels, etc.) – hence the ‘publish or perish’ culture that pervades life science academia. This attitude is detrimental to sharing because many researchers worry about losing the race to publish, or about being ‘scooped’ by others using their own data. It will take big changes to alter this mentality, but first and foremost, impact metrics and research institutions must start recognising the merit of data sharing and collaborative working.
Money, money, money
In addition to publications, commercial activity is becoming an increasingly high priority for universities. While this undoubtedly has benefits in terms of generating high-quality research with real-world applications, concerns about intellectual property, including the terms of material transfer and nondisclosure agreements, are another major reason researchers choose not to share. However, many organisations now encourage open innovation, the value of which has been demonstrated by successes such as the MRC’s work on therapeutic monoclonal antibodies. Early sharing of data in this area encouraged rapid development of the science, after which selective commercialisation generated large profits that were reinvested in research. The sustained profitability and continuing innovation in this field show that data sharing can have commercial, as well as scientific, benefits.
What’s mine is (not) yours
Is it OK to share your data when it’s not all ‘yours’ in the first place? One consideration specific to the life sciences is the privacy of the human patients or donors who contribute to the data. Researchers have a responsibility to protect the rights of individuals and to ensure informed consent – a 2006 survey found that many people are open to their medical data being used in research, but only when they are informed and confidentiality is guaranteed. Guaranteeing confidentiality becomes very difficult once data are widely shared, an emerging problem that will require purpose-built governance frameworks to solve.
Old habits die hard
Probably one of the greatest barriers to data sharing is a simple lack of knowledge and infrastructure. Although we’ve come a long way since the days of seeking out dusty journals on library shelves, the routine generation of large digital datasets is still a relatively new development in biological science. Many excellent online repositories already exist, but more needs to be done to educate researchers about both the scientific value and the practical aspects of data sharing.
In short, there are many reasons scientists don’t share data, even though most appreciate the value of the practice. The digital age gives us a wonderful opportunity to foster a new model for scientific collaboration. As more technology becomes available and data sharing is better incentivised, a ‘sharing culture’ will hopefully develop in which data are routinely made available for re-use, so that they can continue to benefit researchers for years to come.
Nina Divorty is a runner-up in the 2015 Scientific Data writing competition. She is a PhD student at the University of Glasgow, where she is using molecular pharmacology approaches to find new drug targets for cardiovascular disease. She also writes about biology and biotechnology for the student science magazine theGIST, and enjoys sharing her love of science by taking part in outreach activities. She hopes to continue with science writing and communication after her studies.