Scientific Data

Repository Highlight: figshare and the crucial service of generalist repositories

Scientific Data and Figshare

Scientific Data works with over 60 public data repositories and helps authors find the best place to store their data (see our list here). We require authors to store their data in community-recognized data-type-specific repositories when they exist. But for many data-types, specific repositories do not exist. In these cases, broad repositories, such as figshare, that can host a wide range of data-types provide a crucial service to the scientific community.

We work closely with both Dryad and figshare, two well-established “generalist” data repositories. About half of our authors choose to archive some or all of their data at either figshare or Dryad (see the chart below), demonstrating the importance of generalist repositories. Indeed, figshare is currently our most popular data repository, hosting data associated with about a third of our published articles.

Number of Data Descriptors with data archived at each repository. Numbers were collected from the ISA-Tab metadata records associated with all Data Descriptors published since our launch in May 2014. The ISA-Tab metadata lists the primary repositories for the datasets centrally described in each manuscript, and may not include all repositories mentioned in the “Data Citation” section of the Data Descriptor manuscript. Data Descriptors may be associated with data at more than one repository. Correction: Graph updated on March 9th at 13:40 GMT, to reflect latest publication numbers.

Thousands of researchers already use figshare to deposit and share datasets. A publication at Scientific Data can help them gain further credit for their transparency, in addition to realizing the benefits that arise from our constructive peer-review and curation processes. Scientific Data is glad to consider manuscripts describing datasets of all sizes and complexities, from a wide range of fields, making us natural partners to generalist data repositories like figshare.

Some of the features that we think make figshare appealing to so many of Scientific Data’s authors include:

A screenshot of the Dataset Upload tab in our manuscript submission system. A guide to uploading datasets to figshare in association with a Data Descriptor manuscript can be downloaded here.

Integrated submission

Authors can upload data directly to figshare while submitting a Data Descriptor manuscript, without leaving our manuscript submission system. Data are kept private during peer review – shared only with our Editorial Board and referees – and released to the public upon publication of the Data Descriptor.

With this system, authors can upload their data files and get their manuscript into peer review quickly, even when a community repository does not exist for their data-type.

A screenshot of the in-article data browser, showing a preview of one of the data tables provided with Mazzoldi et al. (doi:10.1038/sdata.2014.18).

Data browsing and previewing

Figshare provides in-browser previews of many common file types, including images, spreadsheets, text files, and zip archives. In addition, when data files are uploaded through our integrated system, they can be viewed directly within the article, using a custom “lightbox” developed by figshare for Scientific Data. This makes it easy for readers to explore data files associated with our articles, without fully downloading the files or opening additional software.

For an example of this feature, see the article by Mazzoldi et al., which describes more than fifty years of fishery data from the Adriatic Sea. Readers can browse and preview each of the data tables, and then download them for further analysis – click the figshare link in the article’s Data Citations section to open the data viewer. Or check out this article by Maclaren et al., which includes a video version of Figure 4 that is playable within the figshare data browser.

A screenshot of the Data Citation section in Mazzoldi et al. (doi:10.1038/sdata.2014.18). Clicking the figshare link opens the in-article data browser.

Stable, citable data archiving

Figshare assigns a DataCite DOI to every uploaded dataset. DataCite DOIs are stable identifiers that make it easy to link to and cite data objects.

This feature is shared by Dryad and a growing number of data repositories on our recommended repository list.
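
To illustrate why a DOI makes linking easy, here is a minimal sketch, assuming the standard https://doi.org/ resolver, of turning a DOI string into a stable link. The function name doi_to_url is hypothetical and is not part of any figshare or DataCite tool. The example uses the article DOI quoted in this post rather than a figshare dataset DOI; dataset DOIs are handled the same way.

```python
from urllib.parse import quote

# Assumption: the public DOI resolver at doi.org is used to build links.
DOI_RESOLVER = "https://doi.org/"


def doi_to_url(doi: str) -> str:
    """Turn a DOI such as 'doi:10.1038/sdata.2014.18' into a resolver URL."""
    doi = doi.strip()
    # Strip the prefixes that commonly appear in citations.
    for prefix in ("doi:", "https://doi.org/", "http://dx.doi.org/"):
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    # Every DOI starts with "10." and has a registrant prefix and a suffix.
    if not doi.startswith("10.") or "/" not in doi:
        raise ValueError(f"not a DOI: {doi!r}")
    # Percent-encode unusual characters but keep the slash separators.
    return DOI_RESOLVER + quote(doi, safe="/")


print(doi_to_url("doi:10.1038/sdata.2014.18"))
```

The resulting link keeps working even if the repository later reorganizes its own web pages, which is what makes a DOI suitable for citation.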

Complementing more specialized repositories

Figshare helps authors share more of their data, by making it easy to share data that cannot fit into more specialized repositories. One of our first published Data Descriptors, by Perkins et al., used figshare to share unprocessed data from an RNAi screen for genes involved in muscle maintenance, complementing a more structured version of the data that they shared through GenomeRNAi. Similarly, Baud et al. used figshare to archive phenotypic data, haplotype maps, and R objects – data-types that have no well-defined repositories – in addition to the gene expression profile data that they archived at ArrayExpress.


Because scientists are constantly inventing new technologies and methods, cutting-edge research datasets, by their nature, often don’t have established repositories or well-defined standards. In these areas, generalist data repositories such as figshare are truly essential.

Disclosure

Macmillan Ltd, which owns Nature Publishing Group, the publisher of Scientific Data, is an investor in figshare.
