Nature Biotechnology calls for better data-sharing practices

A universal tagging system that links data sets with the author(s) who generated them is essential to promote data sharing within the proteomics and other research communities. The July Editorial in Nature Biotechnology (27, 579; 2009) reports the results of the journal’s survey of author compliance in depositing proteomics and molecular-interaction data underlying the papers they published. The editors found that even authors who are proponents of data deposition are not making data available in all of the papers they publish. Inhibitory factors include data quality and the user-unfriendliness of some databases. The Editorial concludes:

“One option would be to provide researchers who release data to public repositories with a means of accreditation. This would take the form of a universally standardized tag for data that could be searched and recognized by both funding agencies and employers. An ability to search the literature for all online papers that used a particular data set would enable appropriate attribution for those who share. In essence, the tag would be a digital object identifier (DOI), currently best known for its use in unambiguously identifying papers online.

Similar to citation information about publications, citation information about a researcher’s data DOIs could be gathered by funders assessing future support and used by institutions in performance evaluation. Researchers who disclose data sets that subsequently prove particularly useful to the community would end up with highly cited data DOIs, and could thereby be rewarded accordingly.

Such a system would not solve all the problems slowing data disclosure in proteomics and elsewhere. But it would provide greater incentive than the present system of evaluation, which is skewed almost exclusively to publications in high-profile journals and citation metrics. Data DOIs would not only enhance a researcher’s reputation but also establish priority of data generation. Most important of all, they would provide a way to acknowledge the time and effort individuals must invest in sharing data, which ultimately benefits the scientific community as a whole.”

See also a Correspondence in the same issue of Nature Biotechnology (27, 597-598; 2009): PRIDE Converter: making proteomics data-sharing easy, by Harald Barsnes, Juan Antonio Vizcaíno, Ingvar Eidhammer and Lennart Martens, a collaboration between the University of Bergen and the European Bioinformatics Institute.

Nature journal policies on data and materials availability.


    Gudmundur 'Mummi' Thorisson said:

    I want to point out that there is something like the ‘universal tagging system’ as you call it already in operation. The STDDOI initiative is devoted to publication and citation of data using the DOI infrastructure. They operate their own DOI registration agency and several organizations act as publication agents for specific types of data, such as geophysics and climate observations.

    Would it be feasible to extend this framework to other kinds of data (microarrays, sequences, proteomics)?

    Here’s a publication in Data Science Journal:

    Brase and Schindler. The publication of scientific data by World Data Centers and the National Library of Science and Technology in Germany. Data Science Journal (2006) vol. 5 pp. 205-208

