Today, we released Scientific Data’s ISA-Tab metadata specification, a document describing in detail the format we use to capture and distribute machine-readable metadata content with our Data Descriptor articles.
Most authors will not need to understand our metadata specification in detail. Metadata records will be created with the help of our in-house curation support, after manuscripts have been peer-reviewed and accepted for publication, and authors will not need to have any special knowledge regarding metadata creation.
Advanced users, however, will be able to submit machine-readable metadata directly with their Data Descriptor manuscripts with the help of this metadata specification. This specification document will also be invaluable to scientists who wish to mine the metadata associated with our publications.
This metadata specification has been developed and written in collaboration with Susanna-Assunta Sansone and Philippe Rocca-Serra, two ISA community leaders, and our Honorary Academic Editor and a member of our Editorial Board, respectively. Please see their comments below on the growing use of the ISA framework (Box 1).
These metadata files form the structured component associated with our publications. For more information on this content and its relationship to Data Descriptor articles, please see our earlier blog post, “The Data Descriptor — making your data reusable”. Structured metadata will be released with each Data Descriptor publication to ensure that our content, and the associated datasets, are maximally discoverable and reusable. In the future, we plan to release this information in additional formats to help serve different user groups, including as Linked Data.
For authors who choose to supply ISA-Tab metadata files directly with their submitted Data Descriptor manuscripts, there will be some additional benefits. Author-supplied metadata files can help us provide richer structured information and can reduce curation time, helping to speed up publication of Data Descriptors. For groups submitting multiple Data Descriptor manuscripts, providing metadata files directly can help ensure that related Data Descriptor manuscripts share common, standardized metadata descriptions.
For groups that are already collecting structured metadata at the source — e.g. consortia, project-specific data repositories — conversion to our format can be relatively straightforward. ISA-Tab has been designed to facilitate inter-conversion between different metadata formats in use at various life sciences data repositories.
Our metadata format is compatible with other ISA framework tools, including ISAcreator, an application for viewing and writing ISA-Tab files. Scientists can download an ISAcreator configuration file supporting the Scientific Data specification from the ISA-tools website.
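For readers who want to mine these files programmatically, here is a minimal sketch in Python of reading a study-style table. It relies only on the fact that ISA-Tab files are tab-delimited text. The sample rows, the function names and the choice of columns are illustrative assumptions, not taken from the Scientific Data specification.

```python
import csv
import io

# Hypothetical study-table content, written in the style of an ISA-Tab
# study file. The rows are invented for illustration only.
SAMPLE_STUDY_TABLE = (
    "Source Name\tCharacteristics[organism]\tProtocol REF\tSample Name\n"
    "source1\tDanio rerio\tsample collection\tsample1\n"
    "source2\tHomo sapiens\tsample collection\tsample2\n"
)


def read_study_table(text):
    """Parse tab-delimited text into a list of dicts keyed by column header."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return list(reader)


def samples_by_characteristic(rows, characteristic):
    """Group sample names by the value of one Characteristics[...] column."""
    column = f"Characteristics[{characteristic}]"
    grouped = {}
    for row in rows:
        grouped.setdefault(row[column], []).append(row["Sample Name"])
    return grouped


rows = read_study_table(SAMPLE_STUDY_TABLE)
print(samples_by_characteristic(rows, "organism"))
# prints {'Danio rerio': ['sample1'], 'Homo sapiens': ['sample2']}
```

One caveat with this sketch: real ISA-Tab tables can repeat a column header (for example, several Protocol REF columns), and csv.DictReader keeps only the last value for a repeated header. For such files, csv.reader with positional access, or a dedicated ISA tool, would be the safer choice.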
Box 1. The growing adoption and use of the ISA framework
ISA-Tab and associated tools have been designed and implemented to help researchers tackle substantial data management challenges and to fit into the existing ecosystem of open source tools and de facto community standards. Our goal has always been to help researchers share the experimental (or contextual) information that is necessary to interpret and reuse data files, and to reformat and submit their datasets to relevant biological databases, each of which uses and requires its own format and terminology to describe the experimental steps.
To this end, we brought together like-minded service providers and data managers in the life, natural and biomedical sciences and worked collaboratively to deliver a simple and implementable solution. ISA-Tab was born as a general-purpose format to track provenance and annotate the experimental information, and a growing number of tools have been released to help researchers in the reporting, sharing and conversion processes.
Two years ago, with more than 50 collaborators at over 30 scientific organizations around the globe, we published a Commentary in Nature Genetics outlining the use of ISA-Tab in diverse domains to encourage and facilitate data sharing (see also the Nature Biotechnology News Feature on data sharing).
To date, several communities are working to implement it, and we are now starting to see publications describing its use. An example of how this works at an institutional level is provided by the Harvard Stem Cell Institute, where they can now use ISA-Tab based metadata to find relationships between experiments involving normal blood stem cells in fish and cancers in children (Ho Sui et al, 2011). This may seem a trivial achievement, but it is not possible unless you organize, record and report experiments with the same breadth and depth of information and in a common format.
An example of how this works for a global effort is provided by MetaboLights, a public data repository at the EBI, powered by ISA-Tab and ISA tools (Haug et al, 2012); MetaboLights is also at the centre of a European consortium of metabolomics data providers working to set and promote community standards (COSMOS).
The NIH NCI nanotechnology working group shows how ISA-Tab can be extended to fulfill the specific needs associated with reporting biomedical nanotechnology datasets. Here, ISA-Tab-Nano has also been formalized as a proper, de facto standard (Baker et al, 2013).
Finally, an interesting case is the use of ISA-Tab for annotating and serving experimental information at GigaDB, a data repository associated with the BioMed Central journal GigaScience (“Open Data For The Win”); it seems ISA-Tab also has the potential to become a de facto standard for data publications.
Susanna-Assunta Sansone & Philippe Rocca-Serra