Data management is a crucial component of scientific research and one that should be tackled by early career researchers before they become swamped with data, says Erica Brockmeier.
PhD students and early career researchers have a lot on their to-do lists: everything from writing papers and applying for grants to staying on top of the latest findings in their field. The third keynote of the #scidata16 conference highlighted yet another important facet of a research career: data management. Kevin Ashley, based at the University of Edinburgh, gave a thought-provoking presentation on this topic. As director of the Digital Curation Centre in Edinburgh, Scotland, Mr. Ashley and his team provide advice, guidance and training for researchers, alongside consultancy services on all aspects of data management and data reuse.
Why should researchers care about data management? Mr. Ashley presented a number of the benefits of good data management, including improved research data quality, better value for money for funders, and the promise of faster research progress. Re-using data from other research groups is cost-efficient, since it allows researchers to invest in projects whose data will be used more than once. In addition, knowing how to re-use other people’s data means that researchers don’t have to invest additional time in generating new datasets for every new publication.
As an early career researcher, you might be wondering if you can avoid worrying about data management until you have your own lab or are involved with larger projects and datasets. But Mr. Ashley’s keynote highlighted the importance of data management in the early stages of any project and any research career. Data management is not something that happens at the end of a project, or as an afterthought once you’ve started on something new. Good data management practices should begin when you conceptualize the research: plan what data you’ll collect or create as part of your project, and consider how other researchers could use the data outside of your own objectives.
Mr. Ashley provided great examples of the power of good data curation from the field of astronomy. Astronomers have made observations of the stars for thousands of years, and one might think that, with all of the state-of-the-art equipment we have today, the observations from early astronomers would no longer be of use. However, the field has progressed as far as it has because astronomers have been able to build on what previous researchers recorded. From Chinese and Babylonian observations of the earth’s rotation to 8th-century astronomers developing star catalogs and solar transit measurements, data collected over thousands of years are still useful in modern-day astronomy for purposes such as measuring changes in the earth’s rotation. As Mr. Ashley said, “we should all hope that our data has value over time.”
Modern astronomers are also savvy at using archived observations to further their own research questions. Large collaborative observation tools such as the Hubble telescope are competitive places to collect data. Astronomers are adept not only at building on historical findings, but also at re-using data and re-analyzing results from archived files, data that they themselves didn’t collect. In an era of science where new data is difficult or costly to obtain while archived data is prevalent, a crucial skill for young scientists and researchers is being able to use and effectively analyze experiments and observations from others.
It might be difficult to think of our own data and projects being as impactful as the work of the great astronomers of the past, but any high-quality dataset can readily withstand the test of time. If data is well-managed, accessible, and interpretable by others, then the impacts of our one-time projects could last well beyond our own publications and careers. This supports the idea that early career researchers should focus on developing the data management skills they need before they become swamped with data.
If you’re not sure where to start, you can check out the DCC website for information on training events, guidelines for project management, and additional resources to start developing your own research data management skillset. As Mr. Ashley commented, it’s more than just another item on your ever-growing to-do list: “If you’re not working with well-curated data, you’re not doing good science.”
Erica Brockmeier is a post-doc in computational toxicology at the University of Liverpool. She is also the lead writer of Science with Style, a weekly professional development blog for PhD students and early career researchers. You can follow her story of transitioning towards a career in science journalism at @EKBrockmeier.
You can access all the slides and videos from Publishing Better Science through Better Data 2016, as well as the great visual summary of the day, on the event website.