Taking the time to plan how raw data will be recorded and shared can make all the difference when new research directions appear, says Matthew Edmonds.
In many research projects, there tend to be three major interested parties. The first is the researcher who actually performs the experiment and collects the data. The second is the scientist overseeing the research project, who may be collating related data from several researchers. Finally, there is the institution, which supports the research financially and provides a space in which to do it.
To justify their position to the institution, researchers and group leaders need to accumulate data, the currency of science. Once enough data is in the bank, it can be cashed out in the form of publications, promotions, funding and ultimately a lasting contribution to human knowledge. Like any asset, data needs careful management in order to get the greatest return.
In the current age of big data, the sheer volume of information produced makes good data management essential, and the same principles are also of benefit to smaller datasets. It’s very easy to fall into the trap of quickly labelling spreadsheet columns, files and folders, for example, with something trivial that makes sense at the time. But it’s surprising how quickly the ‘obvious’ meaning of those labels becomes indecipherable, so it is worth taking an extra few minutes to organise your work more fully, logically and systematically, especially if the data comes from frequently performed experiments.
Preferably, everyone in a research group doing a particular experiment will use the same template. The dividends come later, when the numbers need to be combined in some unforeseen way, or when another person needs to interpret them. If the data are easy to follow, it benefits everyone. Ultimately this leads to higher quality publications and a larger impact, a desirable outcome for researchers, project leaders and institutions alike.
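As a minimal sketch of what a shared template could look like in practice, the Python snippet below checks the header row of a CSV file against an agreed list of column names. The column names, the `TEMPLATE_COLUMNS` list and the `check_header` helper are all hypothetical, chosen only for illustration; a real group would substitute its own conventions.

```python
import csv
import io

# Hypothetical group template: descriptive column names that include units.
TEMPLATE_COLUMNS = [
    "sample_id",
    "date_collected",
    "treatment",
    "concentration_uM",
    "absorbance_450nm",
]


def check_header(csv_text, template=TEMPLATE_COLUMNS):
    """Return (missing, unexpected) column names relative to the template."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])  # first row is assumed to be the header
    missing = [name for name in template if name not in header]
    unexpected = [name for name in header if name not in template]
    return missing, unexpected


# Illustrative file in which one column was labelled in a hurry ("conc").
example = (
    "sample_id,date_collected,treatment,conc,absorbance_450nm\n"
    "S1,2016-10-01,control,0,0.12\n"
)

missing, unexpected = check_header(example)
print("missing:", missing)
print("unexpected:", unexpected)
```

Running a check like this before files are shared flags hastily chosen labels while the person who wrote them still remembers what they meant.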
Research groups can sometimes be blinkered by their own research questions, but the data can be useful for others too. Recognition of this, coupled with the cost involved in performing some experiments, has led to a number of curated online data repositories. These are sometimes related to specific techniques, such as mass spectrometry, or specific discoveries, like proteins found to be chemically modified in particular ways in cells. Analysis of these massive combined data sets is becoming an area of research in itself, so a further contribution to the scientific community can be made by sharing data in this way. This renewed visibility opens up new opportunities for collaboration and career options which may not have been previously obvious. As a rule of thumb, funders and institutions strongly approve of these collaborative and interdisciplinary projects.
Forward planning for the management of data is a key aspect of this process. Before starting your experiments, it’s a good idea to familiarise yourself with pre-existing conventions to ensure what you produce will fit in with everyone else’s research. Many institutions now have in-house facilities (such as the Technology Directorate at the University of Liverpool, UK) with dedicated staff for streamlining particular techniques, especially those which fall into the category of big data, and these facilities should be able to offer advice. By collecting the data in a format suitable for sharing via these resources, you make a measurable contribution which can be used as further evidence of research output when applying for higher positions or funding.
Scientists depend on the data they produce. Careful management and appropriate sharing with the wider scientific community can greatly increase the value of these data and contribute to a successful career in science.
Matthew Edmonds is a postdoc at the University of Birmingham, UK, researching how cells which have defective mechanisms to repair damaged DNA can lead to cancer. He finds the pace of change in technology available to researchers astonishing, and tries his best to keep up. You can keep up with him at @benchmatt.
This piece was selected as one of the winning entries for the Publishing Better Science through Better Data writing competition. Publishing Better Science through Better Data is a free, full-day conference focussing on how early career researchers can best utilise and manage research data. The conference will run on October 26th at Wellcome Collection Building, London.