An Editorial published this week at Nature Genetics endorsed the goals of the DivSeek initiative and issued a call for wider sharing of plant phenotype data – particularly phenotypic data associated with important genetic and genomic studies. Scientific Data supports this call, and we invite researchers to submit manuscripts to our journal describing and releasing such datasets.
Category Archives: Editor Posts
Trialling a pragmatic approach to clinical data disclosure
At Scientific Data we have been considering how we might develop our scope and editorial policies to better accommodate clinical research data. Publication of clinical research presents a number of challenges, which we will not be the first to have attempted to solve. In particular, we might need to support linking of our primary article type, the data descriptor, to non-public datasets – datasets that cannot be open access due to patient privacy or other legitimate constraints. While we advocate setting the default for research data to open, we are also conscious that full anonymisation of clinical data is often impossible to achieve with certainty.
Scientific Data expands cooperation with the Nature journals
This week, Nature and the Nature research journals made some important updates to their data availability policies: updates that strengthen the editorial links between Nature journals and Scientific Data; updates that provide better resources and support for authors wishing to better support reproducible research; and updates that leverage the work of Scientific Data to curate datasets and identify suitable data repositories for more authors. See the related editorial published at Nature.
Size doesn’t matter
Big data are, it seems, everywhere and attracting much attention, but they are hard to define in terms of size. Scientific research generates a lot of “small data” too – the average file size for all datasets deposited in our partner repository figshare, for example, is just 1.35 MB. Ironically, however, big data are somewhat agnostic of file size, and are instead more about complexity: of the processing techniques used and of the sources from which the data are derived. Scientific Data is, for the data underlying our publications, also size agnostic. We welcome data big and small and, in response to feedback from our Editorial Board, have updated our frequently asked questions and scope statement to reflect this.
Scientific Data conferences & events calendar
The Scientific Data team is traveling heavily over the next two months in the run-up to our formal launch in late May 2014. If you are attending one of these meetings, we would be delighted to meet you and learn about your data. And don’t miss the special events we will be hosting at AACR and EGU this year!
Endorsing the Joint Declaration of Data Citation Principles
We are very pleased to share that NPG has endorsed the Joint Declaration of Data Citation Principles. These principles are a synthesis of previous guidelines and have been released by the Data Citation Synthesis Group, a collaboration involving CODATA, the Research Data Alliance, members of the Force11 community, publishers and others.
The guiding principles stress the importance of data resources in scientific communication and the need for citation to facilitate credit and attribution to those who contribute to data generation – summed up in the preamble to the Joint Declaration of Data Citation Principles, quoted below.
Sound, reproducible scholarship rests upon a foundation of robust, accessible data. For this to be so in practice as well as theory, data must be accorded due importance in the practice of scholarship and in the enduring scholarly record. In other words, data should be considered legitimate, citable products of research. Data citation, like the citation of other evidence and sources, is good research practice and is part of the scholarly ecosystem supporting data reuse.
Scientific Data’s first publications
Earlier this month, Scientific Data published its first two Data Descriptors. These pre-launch articles recently cleared peer review, and we have decided to publish them before our formal launch in May 2014. They were published using a simplified article template, but they will be transferred to our more feature-rich publication platform in May, and will retain the same citation information and DOIs.
Both of these works present valuable, previously unpublished datasets. We are also actively considering Data Descriptor manuscripts that expand on previous publications (e.g. releasing important datasets in more detail, or making them more reusable), and we expect to have some excellent examples of these types of follow-up works for our launch in May.
Scientific Data’s metadata specification
Today, we released Scientific Data’s ISA-Tab metadata specification, a document describing in detail the format we use to capture and distribute machine-readable metadata content with our Data Descriptor articles.
Most authors will not need to understand our metadata specification in detail. Metadata records will be created with the help of our in-house curation support, after manuscripts have been peer-reviewed and accepted for publication, and authors will not need to have any special knowledge regarding metadata creation.
Advanced users, however, will be able to submit machine-readable metadata directly with their Data Descriptor manuscripts with the help of this metadata specification. This specification document will also be invaluable to scientists who wish to mine the metadata associated with our publications.
Metadata associated with Data Descriptor articles to be released under CC0 waiver
Each published Data Descriptor will be accompanied by machine-readable metadata designed to help advanced users mine and search our content. These metadata will include basic information about the Data Descriptor article, as well as terms that describe key aspects of the experiments or procedures in the study. See Box 1 for a brief outline of this information.
After discussions with the community and our Advisory Panel, we have decided to share this information under the Creative Commons Zero waiver (CC0), which is designed to free information of copyright restrictions. By applying the CC0 waiver to Data Descriptor metadata, we allow others to reuse it without legal limitation. Indeed, much of the content in these metadata files could be considered collections of “facts”, and may not be copyrightable in the first place, but there can be substantial legal grey areas. The CC0 waiver helps to remove ambiguity. Simply put, we don’t want data miners to have to hire a lawyer before using our metadata.
The main human-readable content of Data Descriptors (the body text of the main article, figures, etc.) will remain covered by one of three open-access Creative Commons licences selected by the authors, all of which explicitly require attribution for any reuse (CC BY, CC BY-NC and CC BY-NC-SA). The actual primary data files associated with Data Descriptors will be stored in one or more external repositories, which will have their own terms of use or licensing policies.
The Data Descriptor – making your data reusable
Scientific Data’s call for submissions is fast approaching, and accepted manuscripts will be featured at our platform’s launch in Spring 2014. We hope you’ll want to take part!
If you’re interested in publishing your datasets, and getting credit for them, all the information you need to draft and submit a manuscript is now available on our website (see Box 1, “Getting ready to submit”).