Scientific Data will be a forum for publications about datasets, but will not be a repository for primary datasets. Primary data associated with Data Descriptors will be stored in one or more external data repositories. Why this distinction?
This strategy helps us draw some clear lines around the goals of Scientific Data. By ensuring that the primary datasets are stored in external systems, we make it crystal clear that our goal is to help authors publish content that promotes the scientific value and reusability of their datasets, not to control access to data. We feel that this is a progressive strategy that will help promote collaboration and data consolidation, rather than fragmentation.

Publicly available scientific data is scattered across many different repositories, making it hard to find relevant datasets (the “data silo” problem). Scientific Data will provide a searchable publication platform where researchers can find high-quality datasets across many different data repositories. Data Descriptor publications will be linked to related research publications in Nature Publishing Group journals and in journals from external publishers, allowing scientists to navigate easily between research findings, rich data descriptions, and the actual data. We are working with two generalist repositories, Dryad and figshare, so that all datasets will have a home, and we plan to develop metadata transfer pipelines with other repositories using the ISA framework.