Speakers at #SciData15 advocated for a wider degree of awareness of the field of data science and the implementation of data sharing technologies.
Guest contributor Caroline Weight
“We must engage in the idea of sharing,” said conference chair Iain Hrynaszkiewicz as the 2015 Publishing Better Science through Better Data meeting kicked off at the headquarters of Nature Publishing Group (NPG) in London on 23rd October.
Hrynaszkiewicz, who develops new areas of open research publishing and data policy within NPG/Macmillan, noted that 30 funding bodies, including the Engineering and Physical Sciences Research Council and The Royal Society, have written policies that outline requirements for data-sharing. Examples of material that can be shared include detailed methods and protocols, microscopy images and mathematical workings, as well as meta-datasets of, for example, genotypes and microarrays.
The meeting’s aims were to increase awareness of ways to share data effectively and to discuss how to improve the efficiency, implementation and overall impact of sharing among the scientific community. A recurring issue throughout the day was how to enforce sharing and make it part of standard, everyday scientific practice: one that seeps into the lives and habits of working researchers.
The answer repeatedly offered was relatively simple: start with the students. Today’s PhD programmes stress the importance and necessity of accurate and detailed data recording, and credit such practices as a formal part of coursework. In the fullness of time, this ethos should trickle into working laboratories. Coupled with persuasion of principal investigators, funding bodies, journals’ manuscript requirements and general peer pressure, acceptance of this inevitable change will spread, said conference speaker Stephen Eglen, a reader in computational neuroscience at the University of Cambridge, UK.
Of course, any new policy is likely to be met with resistance: in the case of data-sharing, researchers cite concerns such as lack of time, money and clear instructions. These obstacles must be surmounted, Eglen said. “You can either drown with the data science wave, or get a surf board, jump on and ride with it.”
The first session of the day covered ‘Research data for Discovery’: tools and technologies for data sharing. Jeremy Frey, professor of physical chemistry at the University of Southampton (UoS), UK, noted that both data-sharing and the tools and technologies that support it are changing constantly, and at a fast pace. He added that knowledge of how experiments were conducted is key to understanding and interpreting results, and important for moving science forward. It is where all the useful information actually resides. “Data is the new spice.”
Subsequent speakers throughout the day reiterated that message, and lamented the fact that most data collected from experiments are lost or sit unused. The revolution lies in the maximum extraction of data from each study, they said. Indeed, data sharing was defined by Matt Sydes, a senior scientist and medical statistician at University College London, UK, as using data for purposes beyond those for which they were originally collected.
Frey, for his part, referred to the so-called DIKW (data, information, knowledge, wisdom) hierarchy to drive home the importance of data-sharing: data is information, information is knowledge, and knowledge is wisdom. He turns to LabTrove, a digital research notebook developed at the UoS, to encourage his staff and students to submit regular entries that accurately detail their experiments. These entries are visible to specified users, including collaborators. Frey also pointed to programmes such as OneNote and MATLAB as further examples of interactive digital data exchange.
In Sydes’ talk, the topic turned to sharing clinical trial data. Independent verification of results ensures the reproducibility of data and trust in a trial’s scientific analyses, he noted, and also offers potential for new trials or the impetus to develop improved methods. He cautioned that controlled access and regulated re-use will be required for clinical data-sharing if its true potential is to be realized. And he stressed that both active engagement between scientists and formal agreements are necessary to protect the subjects, a responsibility that falls to everyone involved.
In summary, the conference opened my eyes to the development and potential of data sharing. A wider degree of awareness of the field of data science and the implementation of data sharing technologies will inevitably lead to adaptation and change. It is truly an exciting time to be involved in the world of science.