At the 2015 Publishing Better Science Through Better Data conference, researchers presented tools that help scientists use and share data more effectively.
Guest contributor Rehma Chandaria
The session of lightning talks at the 2015 Publishing Better Science Through Better Data conference was strategically scheduled to combat the post-lunch lull. Five speakers had seven minutes each to tell the audience about their tools for helping scientists use and share data more effectively.
Dr Sam Payne and Dr Balint Antal have both written programs that allow researchers to collaboratively analyse and visualise large amounts of data. Payne, of the Pacific Northwest National Laboratory in Washington State, developed Active Data Biology, a tool for interactively exploring and analysing ‘big data’. He demonstrated how the program can be used to assess proteomics data in the form of a heatmap: you can click on various proteins, conduct real-time analytics, save the proteins you find interesting and look at what your collaborators have saved. Rather than having the information hidden away in your notebooks or in your head, everything is stored on GitHub, so it’s transparent and available to everyone involved. Mineotaur, developed by Antal of the University of Cambridge, UK, is based on a similar idea. It is an open-source tool designed for biologists to explore high-throughput microscopy data. Mineotaur can also be used to share research findings and allow others to analyse them further. It can even be embedded in publications to allow readers to explore the data for themselves.
Building on this idea of using computer programming to analyse and share data, Dr Stephen Eglen spoke about writing a ‘data paper’. Eglen, a computational neuroscientist at the University of Cambridge, published a paper that included the code used to generate its figures, allowing readers to conduct further analysis on the data themselves. “Do it for selfish reasons,” he said, suggesting that the approach benefits your future self when you look back at previous work. Embedding code can also come in handy at the reviewing stage of publication: one of his reviewers wanted a figure presented in a different way, so the reviewer simply used the code embedded in the paper to redo the figure himself!
This idea seemed fantastic, but my concern was that, like many other biologists, I have no idea how to write code. When I brought this up with Eglen in the coffee break, he suggested I look up Software Carpentry and the Software Sustainability Institute, two groups that run boot camps to teach researchers the basics of programming. This is definitely something I will be looking into. As Eglen added, “a little bit of code goes a long way.”
The remaining two speakers, Dr Rufus Pollock and Dr Gary Saunders, talked about open-access databases. Pollock, from the non-profit network Open Knowledge, gave us an overview of the Open Trials project, which aims to form a comprehensive collection of every clinical trial conducted around the world. Not all clinical trial results are disclosed, especially those of trials that produce negative findings, so we don’t get a complete picture of how safe and effective medicines are. By filling in that picture, the project may help to improve patient safety. Saunders is the curator of the European Variation Archive (EVA), an open-access collection of data on genetic variation. Users can submit their data together with metadata (a description of the data that makes it easier to search and use) and information about the samples and methods. A permanent accession number is then assigned to the data set. The database facilitates the open sharing of data in a structured manner that makes it simpler for researchers to explore detailed information about genetic variation.
My take-home message from this session of lightning talks was that we can all use digital platforms to collaborate, share and analyse data more efficiently. From learning to write a bit of code, to using openly accessible online tools and databases, plenty of resources are available to help researchers get the most out of their data.