Guest post by Dr Alejandra Gonzalez-Beltran, Research Lecturer, Oxford e-Research Centre, University of Oxford
Since its inception, Scientific Data has been working to encourage a “show me the data” culture, with the aim of publishing Data Descriptors on fully reusable datasets. Datasets can only be reused if they have been rigorously described; yet, as recently shown, even when journal policies on data archiving are strong, data are not always shared in a reusable manner¹.
However, even well-documented data will only be reused if they are discoverable. Published Data Descriptors are indexed in all major bibliographic indexing services, such as PubMed, so datasets can be discovered via these article indexes. In addition, every Data Descriptor article is accompanied by metadata files created specifically to aid discovery and understanding of the data themselves. Using the ISA (Investigation, Study, Assay) model, these metadata files provide a machine-readable overview of the study that generated the data. The ISA model records the data’s provenance, how they were generated and where they are located, all of which are prerequisites for publication in Scientific Data. The ISA-Tab metadata file is used to generate the Structured Summary table that appears after the abstract in every Data Descriptor article (see Figure 1). The ISA-Tab tabular metadata files available through Scientific Data are provided under the CC0 waiver to encourage their maximum reuse.
Figure 1. Summary Table for Hangartner et al. https://dx.doi.org/10.1038/sdata.2015.67
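To give a feel for what “machine-readable” means here: the investigation file of an ISA-Tab archive is tab-delimited text in which each row begins with a field label (such as `Study Title`) followed by its values, with rows grouped under upper-case section headers such as `STUDY`. The Python sketch below is a simplification rather than a full ISA-Tab reader: it groups those rows by section, skips comment lines, and merges repeated sections. The function name `parse_investigation` is hypothetical and not part of any ISA tool.

```python
import csv
import io
from collections import defaultdict


def parse_investigation(text: str) -> dict[str, dict[str, list[str]]]:
    """Group the rows of an ISA-Tab investigation file by section.

    Returns {section header: {field label: [values]}}. Simplifications:
    lines starting with '#' are skipped, and repeated sections (e.g. several
    STUDY blocks) are merged, with later rows overwriting earlier ones.
    """
    sections: dict[str, dict[str, list[str]]] = defaultdict(dict)
    current = None
    # Tab-delimited; csv also handles the double-quoted values ISA-Tab often uses.
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        if not row or not row[0].strip() or row[0].startswith("#"):
            continue
        label = row[0].strip()
        values = [v.strip() for v in row[1:]]
        while values and values[-1] == "":
            values.pop()  # drop trailing empty cells
        if label.isupper() and not values:
            current = label  # section header, e.g. STUDY or STUDY DESIGN DESCRIPTORS
        elif current is not None:
            sections[current][label] = values
    return dict(sections)


sample = (
    "INVESTIGATION\n"
    'Investigation Identifier\t"inv-1"\n'
    "STUDY\n"
    'Study Title\t"Example study"\n'
    "STUDY DESIGN DESCRIPTORS\n"
    'Study Design Type\t"time series design"\n'
)
print(parse_investigation(sample)["STUDY"]["Study Title"])
```

The example input is invented; real investigation files carry many more sections and fields.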
Currently, ISA-Tab metadata files can be downloaded directly from the HTML page of each Data Descriptor. Scientific Data also provides a GitHub repository containing copies of these ISA-Tab files at https://github.com/ScientificDataLabs/ISA-tab.
However, we can do more! Given that datasets and their descriptions are made available, the next step is to enable a “find me the data” culture.
To increase the usefulness of the ISA-Tab metadata files, the ISA team at the University of Oxford has produced a demo tool for discovering and exploring the datasets. Aptly named the ISA-explorer, it uses the information in the ISA-Tab metadata files to facilitate dataset discovery (see Figure 2). Published Scientific Data ISA-Tab files can now be easily read and explored with the customised Scientific Data ISA-explorer at: https://scientificdata.isa-explorer.org/
The Scientific Data ISA-explorer allows users to filter datasets by data repository and by the metadata in the Structured Summary table (see Figure 1). The filters can be combined in a Boolean search, allowing users to discover specific types of data easily and quickly; for example, datasets with a specific design type that are stored at a particular repository.
Figure 2. ISA-explorer for Scientific Data screenshot
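Combining filters in this way can be pictured as a simple Boolean query. The sketch below assumes a common faceted-search convention (values selected within one facet are OR-ed, different facets are AND-ed); whether the ISA-explorer applies exactly this rule is an assumption, and the `DatasetRecord` class, facet names and example records are invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetRecord:
    """Hypothetical, simplified view of one dataset's summary metadata."""
    title: str
    # facet name -> values, e.g. {"repository": {"GEO"}, "design type": {...}}
    facets: dict[str, set[str]] = field(default_factory=dict)


def matches(record: DatasetRecord, criteria: dict[str, set[str]]) -> bool:
    """AND across facets, OR within a facet: every selected facet must
    share at least one value with the record."""
    return all(record.facets.get(name, set()) & accepted
               for name, accepted in criteria.items())


def filter_records(records: list[DatasetRecord],
                   criteria: dict[str, set[str]]) -> list[DatasetRecord]:
    """Keep only the records that satisfy all selected facets."""
    return [r for r in records if matches(r, criteria)]


records = [
    DatasetRecord("Dataset A", {"repository": {"GEO"}, "design type": {"time series design"}}),
    DatasetRecord("Dataset B", {"repository": {"Dryad"}, "design type": {"time series design"}}),
    DatasetRecord("Dataset C", {"repository": {"GEO"}, "design type": {"case control design"}}),
]
query = {"repository": {"GEO"}, "design type": {"time series design"}}
print([r.title for r in filter_records(records, query)])
```

An empty set of criteria matches every record, which corresponds to the unfiltered view.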
The ISA-explorer also offers a generic search box for keyword-based searches of the ISA-Tab metadata files. Additionally, the tool allows browsing of dataset-specific information such as related publications. It also visualises the distribution of sample characteristics: for example, Figure 3 shows the number of samples of each organism type. To see the full sample information, the original metadata tables can also be displayed.
Figure 3. Visualisation of the ISA-Tab metadata for Hangartner et al. https://dx.doi.org/10.1038/sdata.2015.67
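A per-organism count like the one in Figure 3 can be derived from an ISA-Tab study sample table, which is tab-delimited with roughly one row per sample and columns such as `Characteristics[organism]`. The sketch below is an assumption about how such counts could be computed, not the ISA-explorer's own code; `count_characteristic` is a hypothetical helper and the example table is invented.

```python
import csv
import io
from collections import Counter


def count_characteristic(sample_table: str,
                         column: str = "Characteristics[organism]") -> Counter:
    """Count samples per value of one characteristic column in a
    tab-delimited ISA-Tab study sample table; blank cells are skipped.

    Caveat: csv.DictReader keeps only the last of any duplicated header,
    so this assumes the chosen column name appears once.
    """
    reader = csv.DictReader(io.StringIO(sample_table), delimiter="\t")
    return Counter(row[column].strip() for row in reader
                   if (row.get(column) or "").strip())


table = (
    "Source Name\tCharacteristics[organism]\tSample Name\n"
    "src1\tDrosophila melanogaster\ts1\n"
    "src2\tDrosophila melanogaster\ts2\n"
    "src3\tHomo sapiens\ts3\n"
)
for organism, n in count_characteristic(table).most_common():
    print(f"{organism}: {n}")
```

The resulting counts are exactly what a bar chart of sample characteristics needs as input.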
We are pleased to be launching this beta version of ISA-explorer for Scientific Data published datasets. The datasets available from the Scientific Data ISA-explorer will automatically be updated with new publications; the screenshots above show datasets published between May 2014 and December 2015. We have some improvements already planned: for example, showing overall statistics, supporting richer query searches that take into account the semantics of the data (e.g. query expansion based on synonyms) and relying on richer knowledge representations (such as linkedISA²).
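To make the idea of synonym-based query expansion concrete, here is a minimal sketch; it is not the planned ISA-explorer implementation. A user's keyword is expanded with a small hand-written synonym table (hypothetical; in practice such mappings would more likely come from ontologies) before being matched against dataset text. The names `SYNONYMS`, `expand_query` and `keyword_search`, and the example documents, are illustrative only.

```python
# Hypothetical synonym table; a real system might derive these from an ontology.
SYNONYMS: dict[str, set[str]] = {
    "mouse": {"mus musculus"},
    "human": {"homo sapiens"},
}


def expand_query(term: str, synonyms: dict[str, set[str]] = SYNONYMS) -> set[str]:
    """Return the lower-cased term plus its synonyms, looked up in both directions."""
    term = term.strip().lower()
    expanded = {term} | synonyms.get(term, set())
    for key, values in synonyms.items():
        if term in values:  # user typed a synonym: add the key and its siblings
            expanded |= {key} | values
    return expanded


def keyword_search(docs: dict[str, str], term: str) -> list[str]:
    """Naive case-insensitive substring match of any expanded term against each document."""
    terms = expand_query(term)
    return sorted(doc_id for doc_id, text in docs.items()
                  if any(t in text.lower() for t in terms))


docs = {
    "ds1": "Gene expression in Mus musculus liver",
    "ds2": "Human cohort study",
    "ds3": "Yeast growth curves",
}
print(keyword_search(docs, "mouse"))
```

Without expansion, a search for "mouse" would miss `ds1`, whose text only uses the species name.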
However, we want to hear about other enhancements you would like to see! What features would you find most useful? What types of searches do you wish you could perform? We welcome your feedback on the ISA-explorer; you can contact the ISA tools team directly by emailing isatools@googlegroups.com.
Happy exploring!
References:
1 Roche DG, Kruuk LEB, Lanfear R, Binning SA (2015) Public Data Archiving in Ecology and Evolution: How Well Are We Doing? PLoS Biol 13(11): e1002295. https://dx.doi.org/10.1371/journal.pbio.1002295
2 González-Beltrán A, Maguire E, Sansone SA, Rocca-Serra P (2014) linkedISA: semantic representation of ISA-Tab experimental metadata. BMC Bioinformatics 15(Suppl 14): S4. https://dx.doi.org/10.1186/1471-2105-15-S14-S4