Lorena Barba, a mechanical and aerospace engineer at George Washington University in Washington, DC, has long championed research reproducibility. In January, she traveled to Chile to run a weeklong course on reproducible research computing; the month before, she was awarded a 2016 Leamer-Rosenthal Prize, which celebrates those “working to forward the values of openness and transparency in research.” Here, she talks about flying snakes, “repro-packs,” and copyright.
1. Please tell us about your research and the key computational tools you use.
My lab has two foci. We study the aerodynamics of animal flyers and gliders, writing software to model those problems in fluid dynamics. In another project, we compute the physics of biological molecules interacting with biosensors, a task that requires solving differential equations. We use Python to develop ideas, Jupyter notebooks to analyze data and organize results, and C++ to write heavy-duty code. For more computationally demanding problems, we use the CUDA language, which lets us tackle them on Nvidia graphics processors. For collaboration, we use tools from the world of open-source software, such as GitHub and Slack.
2. What is computational reproducibility, and how did you decide to focus on it?
I’ve always believed that the open-source model is ideal for science, as it exposes the complete sequence of steps that produces a given result. About five years ago, I made a pledge (the “Reproducibility PI Manifesto”) to always follow basic steps for transparency and rapid communication (for instance, posting articles on preprint servers). Since then, my students and I have honed our lab practices for reproducibility. But rather than reproducibility being our focus, it is the core of our methods, and it permeates everything we do. Computational reproducibility is a principle of conducting research so that any competent reader can re-create the results completely, to verify or build from them. It demands full transparency and the open sharing of code and data, but that’s not all: Reproducible research stands out for its high standards of process and practice.
3. You’ve written that the key to ensuring reliability and transparency in science is “automating every step”. What does that mean, in practice?
Automating every step means turning protocols into code: writing scripts to make plots of data rather than pointing and clicking in a GUI, running simulations with parameters fed from a configuration file rather than an interactive prompt, and so on. That creates a complete “recipe” in code—one that can be run, or shared, over and over again. Say you run a simulation of the flow around a flying snake (as we do!), and you want to test different scenarios to see what happens. If you try tweaking parameters “by hand,” you’ll have a hard time repeating the process later. And it will be impossible to give someone else a precise description of what you did, precluding reproducibility.
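A minimal sketch of the config-file pattern described above, in Python (the lab's prototyping language). The file name, parameter names, and the toy "simulation" are hypothetical illustrations, not Barba's actual lab code; the point is that the configuration file, not an interactive session, records exactly what was run.

```python
# Sketch: parameters come from a configuration file, not an interactive
# prompt, so every run leaves a complete, re-runnable record.
import json


def load_config(path):
    """Read simulation parameters from a JSON configuration file."""
    with open(path) as f:
        return json.load(f)


def run_simulation(params):
    """Stand-in for a real solver: computes a Reynolds number from
    velocity, length scale, and kinematic viscosity (all hypothetical)."""
    return params["velocity"] * params["length"] / params["viscosity"]


if __name__ == "__main__":
    # Writing the config to disk, then reading it back before running,
    # means the same scenario can be repeated (or shared) exactly.
    with open("config.json", "w") as f:
        json.dump({"velocity": 2.0, "length": 0.05, "viscosity": 1.5e-5}, f)
    params = load_config("config.json")
    print(run_simulation(params))
```

To test a new scenario, you edit (or version-control a copy of) `config.json` rather than retyping values at a prompt, so the history of what was tried is never lost.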
4. You told us about a strategy your lab uses in preparing publications called a “reproducibility package.” What exactly is that?
For every figure that presents some result, we bundle the files needed to reproduce it — input or configuration files used to run the simulation(s) behind the result; code to process raw data into derived data; and scripts to create output graphs — and deposit them together with the figure into an open-data repository, such as Figshare. Figshare assigns the bundle a DOI, which we then include in the figure caption so readers can easily find the data and re-create the result. Our lab uses these packages as test beds for our in-house software, to verify that the results haven’t been compromised by software modifications. And because we maintain a public history of all changes, we achieve what one of my students calls “unimpeachable provenance”.
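The bundling step could be sketched as a small script like the one below, which zips a figure together with its configuration files and scripts ready for deposit. All file names here are illustrative placeholders, and this is an assumed workflow, not the lab's actual tooling (nor the Figshare upload itself, which happens separately via their web interface or API).

```python
# Sketch: assemble a "reproducibility package" for one figure by zipping
# the figure with everything needed to re-create it.
import zipfile
from pathlib import Path


def make_repro_pack(figure, inputs, scripts, out="repro_pack.zip"):
    """Bundle a figure with its input/config files and processing/plotting
    scripts into a single archive ready to deposit in a data repository."""
    with zipfile.ZipFile(out, "w") as z:
        for path in (figure, *inputs, *scripts):
            z.write(path, arcname=Path(path).name)
    return out


if __name__ == "__main__":
    # Placeholder files standing in for a real figure, its simulation
    # configuration, and its plotting script.
    for name in ("figure1.png", "config.yaml", "plot_figure1.py"):
        Path(name).write_text("placeholder\n")
    pack = make_repro_pack("figure1.png", ["config.yaml"], ["plot_figure1.py"])
    print(sorted(zipfile.ZipFile(pack).namelist()))
```

Once the archive is deposited and assigned a DOI, that DOI goes into the figure caption, closing the loop between the published figure and the files behind it.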
5. You publish your figures from the repro-packs under a CC-BY license. What has journals’ response been to that practice?
To be perfectly honest, I was quite concerned that we’d run into trouble, but nothing happened. The first few times, I was very quiet about it. After several papers, I dared to talk about it at conferences, and on Twitter. The final step, after several years, was to present the whole idea to an associate general counsel of the university for her assessment. Four months and a dozen emails later came her conclusion: journal copyright-assignment rules allow for the possibility of a submission including excerpts of previously copyrighted works, and the author is responsible for obtaining permission to use these (in this case, from him or herself). She actually ended her note with, “Good for you for coming up with the idea.”
Jeffrey Perkel is Technology Editor, Nature.