Open data is the new normal, says Anastasia Greenberg.
The 2017 Better Science through Better Data event in London, UK, hosted by Springer Nature and Wellcome, was a full-day showcase of emerging open data practices, tools, strategies, and policies. Among the potential benefits of open data are replicability, reproducibility, and reusability. While open data is a relatively recent movement, some evidence suggests that it does indeed increase reproducibility.
The 2017 State of Open Data Report, handed out to every attendee at the event, states: “Open data is like a renewable energy source: it can be reused without diminishing its original value, and reuse creates new value”. In the opening remarks of the event program, Iain Hrynaszkiewicz, head of Data Publishing at Springer Nature, wrote that the event is meant to showcase the experiences of researchers and technologists who are “walking the walk of open, reproducible research”.
Among those walking the walk was Thomas Lecocq, a seismologist who is involved with the Gräfenberg array: a series of seismological stations in Germany that record continuous seismic data and make it available in real time through an open-source platform. Using the “noise” data — the part of the continuous signal free of recorded earthquakes — Lecocq showed that the signal can be used to monitor changes in groundwater storage on a very fine spatial scale. This is an ideal example of how data collected for one purpose, such as earthquake detection, can be reused in highly creative ways to answer entirely new questions.
Neuroscientist Cyril Pernet discussed his decision to share painstakingly obtained clinical neuroimaging data of patients with tumours through a Data Descriptor publication in the journal Scientific Data. In addition to the expected citations, this open data publication led to several collaborations that brought tangible rewards, including a joint grant and the founding of the European Network for Brain Imaging in Tumours, which he says will launch soon. Eager young researchers with analytical prowess have also begun applying new algorithms to the data, bringing skills that lie beyond Pernet’s own expertise. Personal success aside, Pernet believes that scientists should share their data for a more altruistic reason: open data allows researchers in countries that may not be able to afford regular experimental work to nevertheless apply their analytical talents.
A more urgent call for a shift to open data culture came from Arul George Scaria, who presented some striking numbers: India has high rates of corruption in publishing, including high paper-retraction rates and a common practice of purchasing paper authorship for a fee. Scaria’s recent survey showed that this phenomenon is coupled with low rates of data sharing and negative attitudes towards data sharing in India. A shift toward open data practice could improve accountability and reproducibility.
A major theme throughout the day was the need for open data tools. To reap the benefits of open data, the FAIR guiding principles establish that data must be findable, accessible, interoperable, and reusable. The idea is to turn open data theory into practical reality. That laudable goal requires managing data so that it is easily downloadable, in a standardized format, and accompanied by all the necessary metadata. Metadata is key to making a data set both discoverable by machines running search algorithms and understandable to the humans who will eventually analyze it.
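To make the idea of machine-readable metadata concrete, here is a minimal sketch of what such a record might look like. The field names loosely follow common repository schemas such as DataCite, and all the values are illustrative, not taken from any real data set discussed at the event:

```python
import json

# Hypothetical metadata record for an open data deposit.
# Field names loosely follow common repository schemas (e.g. DataCite);
# every value below is illustrative only.
metadata = {
    "identifier": "doi:10.0000/example.dataset",  # persistent ID makes the data findable
    "title": "Continuous seismic noise recordings (illustrative example)",
    "creators": ["A. Researcher", "B. Collaborator"],
    "publicationYear": 2017,
    "license": "CC-BY-4.0",   # reuse terms stated up front support reusability
    "format": "CSV",          # a standardized format aids interoperability
    "description": "Example record showing the metadata a repository might require.",
}

# Serializing to JSON produces a record that a repository can index,
# a search engine can crawl, and a human analyst can read.
record = json.dumps(metadata, indent=2)
print(record)
```

The point is not the specific schema but the principle: a structured, standardized record like this is what allows a search algorithm to find the data and a downstream researcher to understand it.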
A host of data repositories exist that attempt to address these concerns, including Dataverse and FAIRDOM. A paper’s webpage can then link the reader to the data and source code associated with the discovery: what data manager Jez Cope called the “Holy Trinity” of paper-to-code-to-data.
The event wrapped up with a panel discussion on what the future of open data might look like from the perspectives of the major stakeholders in the science game: the researcher, the publisher, and the funder. For the researcher, establishing common ground on how to manage and share data appropriately, especially sensitive data, can shift attitudes towards open data. For the funder, the hope is that as strong data management plans and sharing track records are increasingly taken into consideration by funding bodies, open data practices will improve. As for the publisher, Magdalena Skipper, Editor-in-Chief of Nature Communications, thinks that the future of open data will mean offering a larger variety of publication types, in line with what Scientific Data is already offering with its Data Descriptor articles. This will allow researchers to be rewarded for output types beyond the traditional scientific paper.
A final question from the moderator asked the panelists whether a move towards transparency and openness in science can counter the science denial and post-truth thinking of today’s political landscape. The panelists agreed that scientists have a responsibility to society not only to publish papers and share their data, but to communicate effectively with the public about their work. It appears that when it comes to the future of open data, scientists must not only walk the walk — they must also talk the talk.
Anastasia Greenberg is a neuroscientist turned law student, currently working on her international law degree (B.C.L./LL.B.) at McGill University in Montreal, Canada. Anastasia obtained her PhD in Neuroscience from the University of Alberta, where she studied the effects of rhythmic electrical stimulation on neocortical activity in the context of sleep-dependent memory consolidation. She then decided to pursue an unconventional path with her current legal education, in the hope of bringing scientific thinking and evidence-based practices to legal and policy decision making. You can find Anastasia on her website, LinkedIn and Twitter.