Each published Data Descriptor will be accompanied by machine-readable metadata designed to help advanced users mine and search our content. These metadata will include basic information about the Data Descriptor article, as well as terms that describe key aspects of the experiments or procedures in the study. See Box 1 for a brief outline of this information.
After discussions with the community and our Advisory Panel, we have decided to share this information under the Creative Commons Zero (CC0) waiver, which is designed to free information from copyright restrictions. By applying the CC0 waiver to Data Descriptor metadata, we allow others to reuse it without legal limitation. Much of the content in these metadata files could be considered collections of “facts” and may not be copyrightable in the first place; however, there can be substantial legal grey areas, and the CC0 waiver helps to remove that ambiguity. Simply put, we don’t want data miners to have to hire a lawyer before using our metadata.
The main human-readable content of Data Descriptors (the body text of the main article, figures, etc.) will remain covered by one of three open-access Creative Commons licences selected by the authors, all of which explicitly require attribution for any reuse: CC BY, CC BY-NC or CC BY-NC-SA. The primary data files associated with Data Descriptors will be stored in one or more external repositories, which have their own terms of use or licensing policies.
The CC0 licence – designed to maximize reuse of data and metadata
The Nature Publishing Group already releases article and journal metadata under the CC0 waiver via its Linked Data Platform (see the related press release), including annotations from NPG’s own subject term ontology. The metadata released by Scientific Data will extend this policy and will be downloadable in formats designed to aid research data miners, starting with the ISA-Tab format.
Open science advocates have argued that data associated with published works should be explicitly released into the public domain, a view laid out in the Panton Principles, which strongly encourage the use of the CC0 waiver for published research data and data collections. In line with these principles, two of our partner data repositories, figshare and Dryad, apply the CC0 waiver to datasets deposited with them, and BioMed Central recently announced its intention to use CC0 for datasets associated with its published articles.
Peggy Schaeffer of Dryad, in explaining their use of the CC0 waiver, writes, “CC0 was crafted specifically to reduce any legal and technical impediments, be they intentional and unintentional, to the reuse of data. In most cases, CC0 does not actually affect the legal status of the data, since facts in and of themselves are not eligible for copyright in most countries … But where they are, CC0 waives copyright and related rights to the extent permitted by law.”
What about attribution?
Some have raised concerns that the CC0 waiver could open the door to reuse of published content or data without proper attribution. One of the main aims of Scientific Data is to help scientists get proper credit and recognition for sharing their data, so we take these concerns very seriously.
Proper citation of scientific data is currently enforced by the scientific community through editorial standards and peer review, not through legal copyright restrictions. The CC0 waiver removes legal ambiguity but does not change these community-based protections. As noted above, data and metadata are often not copyrightable, and research data in public repositories are already covered by CC0 or other terms-of-use agreements that do not legally require attribution.
Scientific Data believes that appropriate attribution is required by the standards of scientific ethics, irrespective of the legal licence applied to any particular piece of content. In the same spirit, figshare states, “Although CC0 doesn’t legally require users of the data to cite the source, it does not take away the moral responsibility to give attribution, as is common in scientific research”.
Ultimately, we believe that good metadata linking research articles, data publications and datasets will help scientists cite data, ensuring that those who share their data get the credit they deserve.
Box 1. Overview of information in Data Descriptor metadata
Metadata files will be released in the ISA-Tab format, and potentially in other formats in the future, such as Linked Data. An example metadata file is available here, associated with one of our sample Data Descriptors. The information in these files is designed to be a machine-readable supplement to the main Data Descriptor article.
- Article citation information: manuscript title, author list, DOI, publication date, etc.
- Subject terms: according to NPG’s own subject categorization system
- Annotation of the experimental design and main technologies used: Annotation terms will be derived from community-based ontologies wherever possible. Fields are derived from the ISA framework and include: Design Type, Measurement Type, Factors, Technology Type, and Technology Platform.
- Information about external data records: names of the data repositories, data record accession numbers or DOIs, and links to the externally stored data records
- Sample and assay tables: structured tables that provide a detailed accounting of the experimental samples and data-producing assays, including characteristics of the samples or study subjects (such as species name and tissue type) described using standard terminologies.
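To give a flavour of how a data miner might work with these files, here is a minimal Python sketch that reads the label-and-value rows of an ISA-Tab investigation file and pulls out the design and technology annotations listed above. It assumes the ISA-Tab convention of tab-delimited rows with a label in the first column (for example "Study Design Type" or "Study Assay Technology Type") and upper-case section headers; the sample text and its values, the `BOX1_FIELDS` mapping and the `parse_investigation` and `box1_summary` helpers are all hypothetical, and real metadata files contain many more sections than shown here.

```python
import csv
import io

# Hypothetical excerpt of an ISA-Tab investigation file. Row labels follow the
# ISA-Tab convention (an assumption here); the values are invented for illustration.
SAMPLE = "\n".join([
    "STUDY DESIGN DESCRIPTORS",
    "Study Design Type\ttime series design",
    "STUDY FACTORS",
    "Study Factor Name\ttime\tdose",
    "STUDY ASSAYS",
    "Study Assay Measurement Type\ttranscription profiling",
    "Study Assay Technology Type\tDNA microarray",
    "Study Assay Technology Platform\tExample Array v1",
])

# Hypothetical mapping from the Box 1 field names to the ISA-Tab row labels
# assumed to carry them.
BOX1_FIELDS = {
    "Design Type": "Study Design Type",
    "Factors": "Study Factor Name",
    "Measurement Type": "Study Assay Measurement Type",
    "Technology Type": "Study Assay Technology Type",
    "Technology Platform": "Study Assay Technology Platform",
}


def parse_investigation(text):
    """Map each row label to its list of non-empty values, skipping section headers."""
    fields = {}
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        if not row:
            continue
        label = row[0].strip()
        values = [v.strip() for v in row[1:] if v.strip()]
        # Section headers such as "STUDY FACTORS" are upper-case rows with no values.
        if label.isupper() and not values:
            continue
        fields.setdefault(label, []).extend(values)
    return fields


def box1_summary(fields):
    """Collect the design and technology annotations described in Box 1."""
    return {name: fields.get(label, []) for name, label in BOX1_FIELDS.items()}


summary = box1_summary(parse_investigation(SAMPLE))
for name, values in summary.items():
    print(f"{name}: {', '.join(values)}")
```

Because the annotations are plain tab-separated text, even a few lines of standard-library code can index them; a production pipeline would more likely rely on a dedicated ISA-Tab parser that also handles the linked study and assay tables.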
For more information on the value of this structured content and how it relates to the narrative, article-like content, see this earlier blog post by our Honorary Academic Editor, Susanna-Assunta Sansone.