As Euan Adie and Alex Hodgson discussed in this month’s Naturejobs podcast on scientific publishing and a digital future, the amount of data being created in science is phenomenal. It is being created faster than the technology needed to store it. But as volumes increase, are scientists getting any better at managing their data? As it turns out, there are still a few kinks in the system.
An article in Research Information, “Better management reduces data loss risk”, highlights some of the problems that scientists might have.
“After moving all of his data home to write up, biologist Billy Hinchen returned one afternoon to find that his laptop and all his backup hard drives had been stolen.”
The founder of figshare also shares his experiences with data management:
“Mark Hahnel, typified a common challenge: ‘During my PhD I was never good at managing my research data. I had so many different file names for my data that I always struggled to find the correct file quickly and easily when it was requested. My former PI was so horrified upon seeing the state of my data organisation that she held an emergency lab book meeting with the rest of my group when I was leaving’.”
These aren’t ideal scenarios, so the article suggests that data management needs to improve. It points to two tools that Digital Science have put together specifically for scientists:
1) figshare: “a cloud-based repository where researchers can store their data outputs privately, share them with colleagues, or make them publicly available and citable with a permanent DOI.”
2) Projects: “an application that lets researchers safely manage and organise their research data on the desktop. It provides a visual timeline to make finding files easy, backup functionality to help seamlessly recover previous versions of files, annotation features and a structured hierarchy to encourage users to organise their files.”
One other quick tip: settle on a consistent naming convention for your files that you and your colleagues can all understand and use.
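To make the naming tip concrete, here is a minimal Python sketch of one possible convention: date, project, short description and version number, joined with underscores. The pattern itself, the function names `slugify` and `make_filename`, and the example project and date are all assumptions made for illustration. None of them come from the article or from the tools listed above, so adapt the pattern to whatever your group agrees on.

```python
import re
from datetime import date


def slugify(text: str) -> str:
    """Lowercase the text and replace runs of other characters with hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def make_filename(project: str, description: str, version: int,
                  extension: str, on: date) -> str:
    """Build a name of the form YYYY-MM-DD_project_description_vNN.ext.

    This layout is a hypothetical convention, not a standard.
    """
    if version < 1:
        raise ValueError("version must be 1 or greater")
    parts = [
        on.isoformat(),          # ISO dates sort chronologically by name
        slugify(project),
        slugify(description),
        f"v{version:02d}",       # zero-padded so v02 sorts before v10
    ]
    return "_".join(parts) + "." + extension.lower().lstrip(".")


# Hypothetical example values
print(make_filename("Zebrafish", "Cell counts", 2, "CSV", date(2014, 3, 5)))
# prints: 2014-03-05_zebrafish_cell-counts_v02.csv
```

Putting the date first and zero-padding the version number means that an ordinary alphabetical listing of the folder also shows the files in chronological order, which makes it easier to find the right file when someone asks for it.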
Does anyone else who works with big data have any handy tips for managing their data?