There’ll always be reasons not to share data. It’s time we stop making excuses and start making plans, says Atma Ivancevic.
On the morning of October 26, 2016, a group of scientists convened in London to discuss the state of open data. The third Publishing Better Science through Better Data conference kicked off with morning tea, international introductions, and furious scribing from @roystoncartoons. The premise was simple: “Today is all about being open”, said conference chair Iain Hrynaszkiewicz. We settled in to learn the advantages of data sharing at both the individual level and for the scientific community at large.
“Open data should be easy,” said Dr Jenny Molloy from the University of Cambridge as she explained the importance of building a data management plan. She pulled up a poster of a missing black backpack: “CASH REWARD” it read, “contains 5 years of research data which are crucial for my PhD thesis!” I laughed along with everyone else, privately reflecting on how similar my own life had been before I discovered version control.
She went on. As a student or early career researcher, “organization is important!” Having a plan helps you stay on top of your data. Having a back-up ensures you avoid disaster. By starting early, you can develop techniques that will help you make the most of your early years in science. Understand your data, and document it well, before you consider sharing it with others.
This brought us to the next question: why should we share our data?
Publishing data online can lead to collaborations, as it presents you as an active participant in the field. Your citations may increase as other people expand on your research, and some institutions have even started asking for evidence of open science practices. Most importantly, your online presence is a way of marketing yourself. If your research is so robust and accessible that it can be used anytime, that presents you as a strong candidate for jobs in many fields.
Dr Molloy’s talk got me thinking. Everything she said made sense — but if it’s really that easy, why aren’t we already sharing our data? What’s holding us back from a worldwide hub of publicly available research?
There are, of course, real problems associated with big data management. For example, who maintains the open data? Students leave and professors change topics; who’s left to ensure that the data is kept and updated as needed?
The sheer volume of data generated in some fields presents another challenge. Every hour of every day, NASA collects gigabytes of data streaming from spacecraft to Earth. The Large Hadron Collider in particle physics produces petabytes of collision data each year. And the collection rates continue to grow, meaning that our storage methods have to evolve correspondingly. Dr Kevin Ashley compared data curation to Dwight Eisenhower’s immortal line on war: “Plans are useless, but planning is essential.”
By the end of the day, I’d come to a realisation: we’re not yet at the stage where we can begin to address the big issues. We regularly see experiments that can’t be reproduced because the methods are insufficiently explained, authors have the right to refuse to share their data, and journals do not enforce open data. The current problem is not the data itself; it’s the inherent laziness of human beings. In essence, this conference was about the need to openly communicate our findings to the best of our abilities, and to encourage others to do the same until it becomes common scientific practice.
Share your data. It may get you recognition, citations, and job offers. If not, at least you’ve made it easier for yourself and your lab. Rejoice in the knowledge that by selfishly motivating your data management, you’ve selflessly helped other researchers floundering in this thing we call science.
Atma Ivancevic is a mathematician-turned-bioinformatician with a passion for writing. She is about to submit her PhD thesis at the University of Adelaide, Australia, describing the transfer of jumping genes between eukaryotic species. In her spare time, Atma can be found playing tennis, reading, or swimming at the beach. You can follow her on Twitter, ResearchGate, or GitHub.
You can access all the slides and videos from Publishing Better Science through Better Data 2016, as well as the great visual summary of the day, on the event website.