A big picture view for researchers on data repositories and data journals from Andrew Hufton at Publishing Better Science through Better Data.
More and more, funding bodies are requiring scientists to make their data open-access and available to the research community and others. So, why not wrap a publication around it at the same time, build your CV and get credit for all the hard work you’ve done?
Andrew Hufton, Managing Editor of Scientific Data, gave a great talk on the big picture view of publishing your data. I’ve reproduced some of it below for your reading pleasure along with further advice from Hufton. Have a read and tell us what you think – do you agree with the list? Do any ring particularly true for you? Are there any others you would add? Share your views in the comments section below.
Before even thinking about finding a repository to store your data, you need to make sure that your data is fit for storing. Hufton suggests a three-point checklist where you should:
– Make sure that your data is well structured
– Have lots of metadata to aid others’ understanding of your data
– If dealing with human data, have the appropriate consent forms organised
When you’ve prepared your data, it’s time to think about finding a repository for it. From Hufton’s point of view, there are five things you should think about:
– Quality curation. If there is a specialist repository for your particular field or data type, “value that…it will really increase the value of your research and the reuse of your data.” If there isn’t one, consider Figshare or Dryad. Scientific Data also has a list of subject-specific recommended data repositories for authors on their website, which is curated based on a standard set of criteria.
– Long-term accessibility. Like research teams, data repositories depend on funding. Find out how long a repository’s funding is secured: it will give you an indication of how long your data will be safe there. Another good indicator is whether a repository or journal offers data DOIs: “it’s a good sign that a repository is serious about long term commitment.”
– Collaborative analysis features. If your research project involves multiple teams from all over the world, consider a repository that will allow you to analyse your data collaboratively.
– Privacy. Some repositories allow research data to be kept private until researchers are ready to publish their work. This might suit you if you want to continue to build on your data set before publishing it.
– Data archiving options at your institution. Many universities already have repositories on site, and others are working towards developing them for their researchers. “In many cases maybe your institutions will want a copy of your data, regardless of where you put it.”
Once you’ve put your data in a repository, how do you get credit? “In the scientific community, credit is intimately related with peer review and quality assessment,” says Hufton. This means you might want to consider publishing your data in a data journal.
A data journal, put simply, is a journal that publishes articles about your data and only that. Data journals don’t look for findings or interpretations, and they don’t want your opinions: only descriptions of data. Over the last few years, there has been a lot of activity in the data journal space. Scientific Data is one of a growing number of data journals and journals publishing data articles. Other examples include GigaScience, F1000Research, Biodiversity Data Journal, Earth System Science Data, and more.
Scientific Data’s approach focuses on helping scientists reuse data. Hufton encourages scientists who are unfamiliar with data articles to read a few Data Descriptors, as they are known in Scientific Data. Here’s a list of what Scientific Data considers to be important for data journals (informed by a survey of researchers’ opinions on data publication conducted by NPG in 2011):
– Researchers should get credit for their work
– Data should be open access (CC-BY in the case of Scientific Data)
– Focused on data reuse
– Peer reviewed
– Promoting community data repositories – data journals like Scientific Data are not repositories.
Finally, it’s time to consider when you should submit your data to a data journal. Hufton offers four different options:
– Early. If you’re excited about your data or you’re a fan of sharing and open-access, it’s possible to publish your data in a journal so that many other researchers can benefit from it and use it in their own research.
– Alongside your research publication. (There are clear policies between Scientific Data and Nature about this, and Nature journals now encourage publication of Data Descriptors for many studies.)
– Describe data sets that don’t fit with other publications. What happens to the mountain of data that doesn’t make it into your research papers? Technically sound data sets could be beneficial for others.
– Release data used in previous research articles. If your old data can add to the scientific literature, why not?
Hufton’s final words to the audience were encouraging. “Get the most from your data. You’re a young researcher: you want to make an impact with your research. Preserve it, encourage its reuse… and get credit for the work that you’re doing.”
If you have any questions about data repositories, feel free to get in touch with Andrew Hufton via email at scientificdata [at] nature.com
You can also watch Hufton’s talk, all the other talks, and download the speakers’ presentations from the event here.