How to bring your hard-earned data from the lab to the world.
Contributor Anthea Lacchia
Amongst the scientific community, there is increasing awareness of the value of data transparency and reproducibility. But how can we achieve transparency in practical terms? Catherine Goodman, Senior Editor at Nature Chemical Biology, delivered a workshop on handling scientific data during the Boston NatureJobs Career Expo 2015.
“Careful experimental design is the foundation of data transparency and will also avoid wasting time with referees later on,” said Goodman. Different scientific fields have different requirements as to how much data are needed to make meaningful interpretations and how they should be collected, so it is important to be aware of your community’s standards.
Keeping good records can help clarify why a given experiment didn’t work. “If you are collecting data in a field new to you, it is useful to consult the experts in the field, follow protocols and collect all the data you can,” Goodman said.
Proper training in the lab and good communication among team members are fundamental to achieving high standards of data collection and interpretation. In fact, many of the papers that end up on Retraction Watch, a blog that reports on retractions of scientific papers, are born out of a disconnect between PI and postdoc or trainee. “Getting the killer paper is not as important as doing science properly and rigorously, because you want to contribute positively to the scientific community, not find yourself on Retraction Watch,” Goodman said.
Once you have gathered and analysed your data, how should you prepare them for publication? Goodman stressed the importance of avoiding clutter and making sure all the essentials are provided. Things referees need to know include: the number of samples, the type of statistics, what is being measured, the reagents used, how the experiment was performed, what errors are shown and how they were calculated. “Not including this information can often waste an entire round of review,” said Goodman. “I see this all the time.”
Often the amount of data is too great to fit into the confines of a single research paper. With the growing emphasis on public sharing of data comes a need to find a repository that will curate data properly. “A first port of call is to check the data archiving options at your institution,” Goodman said.
When choosing a repository for data, it is important to make sure it is committed to long-term preservation. For instance, “if you are issued a DOI for your data, that is a sign that the repository is here to stay,” Goodman said. It is a good idea, she added, to check that the chosen repository will keep the data private until you are ready to publish. Repositories may be specific to a given field or more general, such as figshare or the Dryad Digital Repository.
The need for reproducibility has recently led to the birth of a new kind of scientific publication: the data journal. These journals, such as the Nature Publishing Group’s Scientific Data, publish data articles as opposed to traditional, hypothesis-driven papers. Whilst traditional papers contain goals, synthesis, analysis and conclusions, data papers are all about data. They typically consist of standalone datasets that don’t fit into other publications, and they describe the steps involved in generating the data and how they were processed. Publishing a paper in a data journal serves science by allowing other researchers to use the data, Goodman said. She also cited other reasons that might appeal to a career-minded young researcher: the data journal paper may increase the citation index of the original, hypothesis-driven paper, and in any case it is another line to add to the C.V., she said.
The future of data sharing is by no means fully defined, Goodman said. Many questions remain about how data sharing will affect intellectual property, for example.
Despite the uncertainty, all researchers can do is strive for high standards of data collection and presentation, which will in turn strengthen their scientific arguments. Data can convey a message, prove a point, or detract from an opponent’s hypothesis. But above all, data are used to tell a story: as Goodman advised, “just tell your story as clearly and simply as you can.”