Julia Stewart Lowndes, a marine data scientist at the National Center for Ecological Analysis and Synthesis (NCEAS) at the University of California at Santa Barbara, published a paper this week laying out the challenges her team faces as they try to share and reuse data on the world’s oceans. Here, some key lessons.
You are part of the Ocean Health Index. What is the OHI, and why is data reproducibility so important for its work?
The Ocean Health Index (OHI) is a scientific framework to measure and track the health of the world’s oceans. OHI assessments have transparent methods and reproducible workflows that are available for other researchers to use and build upon. Our team at NCEAS and Conservation International assess oceans globally every year. Additionally, we support independent groups that incorporate their own data into the OHI framework to assess and manage their own regions. Currently about 20 groups are leading OHI assessments, including the governments of Mexico and Indonesia, and scientists in Sweden who are assessing eight countries bordering the Baltic Sea.
What was wrong with the practices you were using prior to 2012?
For the first global OHI assessment, we processed data from 100 different sources, with lots of copy-pasting and emailing files with names like ‘data_final_v2b.xls’. It took detective work and a lot of time to re-wrangle another year’s data to rerun the global models. This is actually pretty common; environmental scientists are rarely trained to think about data in a deliberate way.
You found the transformation from your traditional reproducibility practices to a more open paradigm was “extraordinarily intimidating”. How so?
Our biggest challenge was changing our mindset: We had to stop thinking that our homegrown systems were good enough. It was intimidating to think of changing the way we had always worked and embracing practices like coding and version control. But we needed to get on board with practices and tools that have been developed specifically for the data-rich and collaborative world that we work in today.
Your article describes an OHI “Toolbox” for ensuring data transparency and reproducibility. What are the key components of that Toolbox?
We built the OHI Toolbox with R, RStudio, Git, and GitHub, which enable our work to be more reproducible and our collaboration more streamlined. For example, I can write code in R and then sync the files to GitHub directly from RStudio. Git and GitHub work similarly to Dropbox but they attribute my contributions line-by-line so anyone building off of my code knows who to contact with questions. And what is extremely powerful is that we also use these same tools and workflow for communication, creating static and interactive documents, presentations, and our website, ohi-science.org.
What difference has the Toolbox made in your lab?
These tools let us push the boundaries for how our science is done, shared, and used. For example, folks in Stockholm can go online and use the code we developed for global assessments, tailoring it so that it better represents the priorities and characteristics of the Baltic. They’ll use the same workflow so that other groups can then build off their code and experiences too. It’s like a positive feedback loop from all around the world.
What disciplines or kinds of research will find your Toolbox approach easiest to adopt?
These tools help you do better science now, and also help ‘future you’, your most important collaborator, do better science later. So they’re advantageous for individuals as well as for labs. They can be key for preserving lab memory so that when a grad student or postdoc leaves, their contributions don’t leave too. PIs can encourage this kind of culture even if they don’t code themselves.
How can other labs get started using these tools?
The biggest hurdles to getting started are exposure and confidence: you’ve got to see these tools used in ways that are relevant to you, and then build up the confidence to use them. I think community is the most important thing, and groups like RStudio, GitHub, Software Carpentry, rOpenSci, and #rstats on Twitter have been incredibly supportive. We’ve been teaching open and reproducible practices through the OHI project, but we’re sharing our story here because we want to help others embrace these practices and tools too. See more at ohi-science.org/betterscienceinlesstime.
Jeffrey Perkel is Technology Editor, Nature.
