C. Titus Brown, a bioinformatician at the University of California, Davis, participated in a January workshop at Caltech on “The Paper of the Future,” and wrote about the experience on his blog. Here, he expands on how academic publishing may change in the years to come.
1. Why do we need to change the way papers are written or communicated?
Though our digital world is dynamic, scientific papers remain static. There’s no easy way for researchers to engage the authors on an article’s home page, for instance, to extract data for reanalysis, or to re-execute the computational steps. Several themes along these lines emerged from the Caltech workshop. A common touchpoint was support for detailed and repeatable computation, where a paper’s entire computational workflow can be automated and easily re-executed. Victoria Stodden made the point that scientists are perhaps too wedded to the notion that published work is both “done” and “correct”, which has implications for how we view data availability, reanalysis, and meta-analysis. Yolanda Gil focused on automated extraction and meta-analysis of information from publications. I used my time to outline specific technological and conceptual advances that could move publishing forward.
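To make “repeatable computation” concrete, here is a minimal, hypothetical sketch (the file name, column names, and figure are invented for illustration) of the kind of driver script that could regenerate a paper’s numbers and figures directly from its archived data:

    # repeatable_analysis.py -- hypothetical driver script for a paper's results.
    # Assumes the archived dataset "measurements.csv" is deposited with the paper;
    # the file name and column names here are illustrative only.
    import pandas as pd
    import matplotlib.pyplot as plt

    def main():
        # Load the archived data exactly as deposited with the paper.
        data = pd.read_csv("measurements.csv")

        # Recompute the summary statistic reported in the text.
        mean_response = data["response"].mean()
        print(f"Mean response: {mean_response:.3f}")

        # Regenerate Figure 1 from scratch, so readers can confirm that the
        # published figure matches the published data.
        data.plot(x="dose", y="response", kind="scatter")
        plt.savefig("figure1.png", dpi=300)

    if __name__ == "__main__":
        main()

Being able to re-run a single script like this in a clean environment and recover every number and figure — rather than any particular tool — is the property the workshop participants were after.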
2. What do you think the paper of the future will look like?
It’s already here, if you know where to look. “A decision underlies phototaxis in an insect”, by E. Axel Gorostiza et al., is a beautiful example of how to “package” an experimental paper together with all its materials. Even better, senior author Björn Brembs wrote a blog post about it that explains the meaning and context of the paper to non-experts like me. Of course, this is the paper of the near future, since Brembs and others are already using these approaches; we just need to figure out how to drive adoption. As for the far future, who knows?
In any event, I’m not convinced we know how papers “should” be written or communicated; it’s easier to talk about important goals. First, primary research papers must contain the details (data, source code, models, statistics) necessary to replicate any results. They should contain context for the study, and at least some guarded interpretation of the results. And they should be archivable, so that we can revisit the paper in 5, 15, or 50 years and be able to read and understand it in some detail.
“Data implies software” is how I put it: in order to make any practical use of data, we must have software that can read and work with it. Other issues, like peer review, openness, executability, and reusability, are critical and necessary, but ultimately secondary.
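A small sketch of what “data implies software” means in practice (the file and its columns are made up for illustration): a deposited data file is only as useful as the code that documents how to read it, including its format, units, and missing-value conventions.

    # read_deposit.py -- hypothetical reader distributed alongside a data deposit.
    # "flies.csv" and its columns are invented; the point is that the reader
    # encodes the file's format, units, and missing-value rules in executable form.
    import csv

    def load_trials(path="flies.csv"):
        """Return a list of trials; 'latency' is in seconds, 'NA' means no response."""
        trials = []
        with open(path, newline="") as handle:
            for row in csv.DictReader(handle):
                latency = None if row["latency"] == "NA" else float(row["latency"])
                trials.append({"fly_id": row["fly_id"], "latency": latency})
        return trials

    if __name__ == "__main__":
        print(f"Loaded {len(load_trials())} trials")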
3. What efforts now underway are making that future a reality?
The single most concrete effort underway in this regard is the Jupyter project, which allows researchers to combine code, documentation, and results in an electronic notebook. Their recent grant proposal, “Project Jupyter: Computational Narratives as the Engine of Collaborative Data Science”, is breathtaking in its blend of practical capabilities and long-term vision. The project is building a system that takes advantage of many of the new capabilities that the Internet and the Web enable. And it is already widely used in academia, industry, journalism, and teaching.
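As a hedged illustration of that “computational narrative” idea (the cell contents below are invented), a notebook that interleaves prose and executable code can even be assembled and re-executed programmatically, using the nbformat library and nbconvert:

    # build_notebook.py -- sketch of assembling a minimal computational narrative.
    import nbformat
    from nbformat.v4 import new_notebook, new_markdown_cell, new_code_cell

    nb = new_notebook()
    nb.cells = [
        # Prose: the context and interpretation a reader needs.
        new_markdown_cell("## Phototaxis analysis\n"
                          "Loads the trial data and reports the mean latency."),
        # Code: the computation itself, stored alongside the prose.
        new_code_cell("import statistics\n"
                      "latencies = [1.2, 0.8, 1.5]  # placeholder data\n"
                      "statistics.mean(latencies)"),
    ]
    nbformat.write(nb, "narrative.ipynb")

Re-running the whole narrative end to end is then a single command, e.g. jupyter nbconvert --to notebook --execute narrative.ipynb, which is roughly what “easily re-executed” means in practice.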
4. What hurdles stand in the way of making the future paper a reality?
Scientists are notable for being incredibly conservative, and have no strong incentives to explore new practices. Publishers have a vested interest in stagnation as well, because the current system supports their profits. But perhaps the biggest obstacle comes from institutional compute clusters. These centers are typically slow to adopt new technologies like virtualization and containerization, and many scientists are addicted to the cheap compute and free support they offer. However, federal infrastructure programs like XSEDE JetStream, the NIH Cloud Commons, and the DOE computing systems now support cloud computing and containerization (e.g. via software like Singularity), so I’m optimistic.
5. What can researchers do today to ensure their data are reproducible and future-ready?
For non-technophiles, the best investment of their time is probably to attend training workshops like those offered by Software Carpentry and Data Carpentry, and to integrate the concepts and practices they learn there into their own research. Over time this will naturally lead to greater repeatability and better practice, and have the side benefit of building “communities of practice” specific to each research field.
(Note: Brown is an instructor and member of the Advisory Committee for Software Carpentry, and his spouse, Tracy Teal, is cofounder and executive director of Data Carpentry.)
Jeffrey Perkel is Technology Editor, Nature