TechBlog: Git: The reproducibility tool scientists love to hate

{credit}PLOS Comput Biol, 12, e1004668 (2016){/credit}

Early in his graduate career, John Blischak found himself creating figures for his advisor’s grant application.

Blischak was using the programming language R to generate the figures, and as he iterated and optimized his code, he ran into a familiar problem: determined not to lose his work, he gave each new version a different filename (analysis_1, analysis_2 and so on) but failed to document how the versions had evolved.

“I had no idea what had changed between them,” says Blischak, who is now a postdoctoral scholar at the University of Chicago. “If the professor were to come back and say, ‘which version did you use to create this figure?’ I would have had no idea.”

Later, while attending a workshop on basic research computing skills, he discovered a better approach: Git.


#SciData15: Get more out of your research data

Researchers shared their tools to help scientists use and share data more effectively at the 2015 Publishing Better Science Through Better Data conference.

Guest contributor Rehma Chandaria


{credit}Image credit: SCIENTIFIC DATA/LUDIC GROUP{/credit}

The session of lightning talks at the 2015 Publishing Better Science Through Better Data conference was strategically scheduled to combat the post-lunch lull that often occurs. Five speakers had seven minutes each to tell the audience about their tools for helping scientists to use and share data more effectively.

Dr Sam Payne and Dr Balint Antal have both written programs that allow researchers to collaboratively analyze and visualize large amounts of data. Payne, of the Pacific Northwest National Laboratory in Washington State, developed Active Data Biology, a tool for interactively exploring and analyzing ‘big data’. He demonstrated how the program can be used to assess proteomics data in the form of a heatmap: you can click on various proteins, conduct real-time analytics, save the proteins you find interesting and look at what your collaborators have saved. Rather than having the information hidden away in your notebooks or in your head, everything is stored on GitHub so it’s transparent and available to everyone involved. Mineotaur, developed by Antal of the University of Cambridge, UK, is based on a similar idea. It is an open-source tool designed for biologists to explore high-throughput microscopy data. Mineotaur can also be used to share research findings and allow others to analyze them further. It can even be embedded in publications to allow readers to explore the data for themselves.