Data worthy of integration with the results of other researchers need to be prepared to explicit export standards, linked to appropriate metadata and offered with field-specific caveats for use. The Editorial in the January edition of Nature Genetics (42, 1; 2010) explores how far data sharing, to be useful for generating new analyses and hypotheses, depends on standardized formats as much as on the data simply being made ‘available’. For example, the Editorial states, “Sample sizes, selection criteria, statistical significance, number of hypotheses tested, normalization and scaling procedures, read depth and sequence quality scores are all important considerations that can be misunderstood or missed in combining and reanalyzing data. Whether integrative approaches are useful may depend upon whether integration preserves or destroys essential information….
Integration is of most value in two areas: bioinformatic modeling, to predict the effects of genetic and environmental perturbation, and clinical utility, to increase the speed and accuracy of the transfer of preclinical knowledge to clinical trial. Funding bodies hope that encouraging researchers to integrate their results will reduce duplication of effort. Trivially, researchers can agree to work on the same systems and samples or to use agreed standard control materials, but this can be problematic in practice….
Researchers can enable integrative studies by publishing their quality metrics and exchange standards in a timely way in regularly versioned, citable preprints; and by holding integration workshops between data producers and data users from different fields. These exchanges should focus on honest assessment of what data are ready for use and explain the quality metrics used and where the pitfalls lie in using the data. In return, data producers can increase the citability of their datasets by better understanding the metadata needed by users. Requirements for open data deposition and integration that do not include mechanisms to agree on, publish and use data standards risk inflating inconsequential ‘integrative’ bubbles.”