Five things you can do today to make tomorrow’s research open

Early career researchers have an essential role to play in the move towards open research, says #SciData17 writing competition winner Sarah Lemprière.



Enabling the effective sharing of clinical data

This blog was written by Mathias Astell & Iain Hrynaszkiewicz and was originally published on the DNAdigest blog.

The value to science of sharing the data generated by researchers has long been recognized (as exemplified by this British Medical Journal piece from 1994). And over recent years the ability to share and access research data has grown rapidly – as can be seen in the rise of data journals (such as Scientific Data and GigaScience), the increase in research data repositories (both general and subject-specific), and the establishment of data-sharing policies around the world.

Author’s corner: Sharing proteomics data to build community-based resources


{credit}Ruedi Aebersold & George Rosenberger{/credit}

Guest post by Ruedi Aebersold, Professor of Systems Biology with a joint appointment at ETH Zurich and the University of Zurich, & George Rosenberger, PhD student in the Aebersold group at the Institute of Molecular Systems Biology, ETH Zurich.

Mass spectrometry-based proteomics is a data-intensive research discipline that primarily aims at identifying and quantifying the proteins that constitute the proteome1. This is achieved by generating large numbers (10⁴ to 10⁶) of fragment ion spectra that represent peptides generated by proteolysis of the respective proteome. Mass spectrometers can operate in different data acquisition modes, referred to as data-dependent acquisition (DDA), targeted acquisition exemplified by selected reaction monitoring (SRM), or data-independent acquisition (DIA)2 exemplified by SWATH-MS3,4. Specific software tools then generate processed mass spectra from these raw data, from which sets of identified peptides, proteins and their abundances are inferred and annotated with metadata. Both the generation and the processing of such raw data sets are resource- and time-intensive. Further, if unique, irreplaceable samples are being analyzed, as is often the case with clinical cohorts, the data cannot be re-generated. Therefore, the proteomics community has started to embrace data sharing by means of different specialized public repositories, for example GPMDB5, PRIDE6, PeptideAtlas7 or ProteomicsDB8. For the last few years, the ProteomeXchange9 consortium has provided centralized deposition of raw data and their meta-annotation.

Author’s Corner: Advancing the sharing and standardization of metabolomics data


{credit}Mark Viant{/credit}

Guest post by Mark Viant, Professor of Metabolomics in the School of Biosciences at the University of Birmingham, UK, and Director of both the national NERC Biomolecular Analysis Facility – Metabolomics and the Phenome Centre Birmingham

In 2014, my research team published the first Scientific Data Data Descriptor for metabolomics measurements, Direct infusion mass spectrometry metabolomics dataset: a benchmark for data processing and quality control. This article described in great detail the many steps that are critical for ensuring the production of high-quality direct infusion mass spectrometry (DIMS) data. It was our intention that this publication would help to establish the benchmark for DIMS metabolomics, derived using best-practice workflows and rigorous quality assessment. The data were also made freely available in the MetaboLights public database for metabolomics data (dataset MTBLS79).1


#Scidata15: Make the most of your research: Publish better data

Primary research papers are the currency of academics, but they’re also part of a much wider body of knowledge that is restricted by a lack of transparency.

Guest contributor Lakshini Mendis


{credit}Image credit: SCIENTIFIC DATA/LUDIC GROUP{/credit}

Historically, a great deal of trust has been placed in statements made in research papers for which the underlying data have not been shared. The invention of the laser was described in a paper containing just three data-points, for instance, and Watson and Crick first described the structure of DNA in a paper without any data at all. But with about 1,500 papers retracted since 2012, and 26.6% due to misconduct, scientific papers are now firmly under the microscope.

Improving the availability and readability of original research data would go a long way to improving matters. And as scientific publishers largely determine how research data is disseminated, their involvement will be central to any change. Speaking at Publishing Better Science Through Better Data in late October 2015, Dr Joerg Heber and Dr Andrew Hufton, editors at Nature Communications and Scientific Data respectively, emphasised that to make the most of research data it must be more open.

Overcoming the data-sharing challenge

According to Hufton, the status quo is for researchers to share data only with others directly. As well as being inefficient, this means that data associated with published work disappears at a rate of about 17% a year, as researchers fail to properly catalogue their findings. Scientific publishers are therefore now moving to make data findable, accessible, interoperable and reusable – or, to use an acronym as those of a scientific persuasion are so often inclined to do, FAIR.
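As a rough, purely illustrative sketch of what a 17% annual loss rate implies if it compounds year on year (the function and figures below are an assumption for illustration, not from Hufton's talk):

```python
# Illustrative only: compound the ~17%/year disappearance rate cited
# above to estimate how quickly the data behind a paper could vanish.
def fraction_still_available(years, annual_loss=0.17):
    """Fraction of datasets expected to remain accessible after `years`."""
    return (1 - annual_loss) ** years

for y in (1, 5, 10):
    print(f"after {y:2d} years: {fraction_still_available(y):.0%} still available")
```

On this simple model, well under half of the data behind a paper would still be available five years after publication, which underlines why publishers are pushing for FAIR, repository-backed sharing instead.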

#SciData15: Research Data for Discovery: Prepare to Share

Speakers at #SciData15 advocated for wider awareness of the field of data science and for the adoption of data-sharing technologies.

Guest contributor Caroline Weight


{credit}Image credit: SCIENTIFIC DATA/LUDIC GROUP{/credit}

“We must engage in the idea of sharing,” said conference chair Iain Hrynaszkiewicz as the 2015 Publishing Better Science through Better Data meeting kicked off at the headquarters of Nature Publishing Group (NPG) in London on 23rd October.

Hrynaszkiewicz, who develops new areas of open research publishing and data policy within NPG/Macmillan, noted that 30 funding bodies — including the Engineering and Physical Sciences Research Council and The Royal Society — have written policies that outline requirements for data-sharing. Examples include detailed methods and protocols, microscopy images and mathematical workings, as well as meta-datasets of, for example, genotypes and microarrays.

The meeting’s aims were to increase awareness of ways to share data effectively and to discuss how to improve the efficiency, implementation and overall impact of sharing among the scientific community. A recurring issue throughout the day was how to enforce sharing and make it part of standard, everyday scientific practice – one that seeps into the lives and habits of working researchers.

Sharing data: Why it should be done

As data continues to be produced at staggering rates, scientists need to become more aware of the benefits of data sharing, says Eleni Liapi.

Guest contributor Eleni Liapi


{credit}PhotoDisc/Getty Images{/credit}

The scientific community is currently experiencing an explosion in data generation. At CERN (the European Organization for Nuclear Research), the Large Hadron Collider (LHC) produces data at a rate of 1 petabyte (10¹⁵ bytes) per day, comparable to about 210,000 DVDs. At the European Bioinformatics Institute, 20 petabytes of biological data were stored between 2004 and 2012. In the US alone, the volume of data produced by the healthcare industry in 2011 was estimated at 150 exabytes (10¹⁸ bytes). Undoubtedly, this volume of information brings with it several problems, including data storage and sharing.
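A quick back-of-the-envelope check of the DVD comparison above (assuming the capacity of a standard single-layer DVD, 4.7 GB; the capacity figure is an assumption, not stated in the post):

```python
# Sanity-check the "1 petabyte per day is about 210,000 DVDs" comparison,
# assuming a standard single-layer DVD holds 4.7 GB.
PETABYTE = 10**15            # bytes
DVD_CAPACITY = 4.7 * 10**9   # bytes per single-layer DVD (assumed)

dvds_per_day = PETABYTE / DVD_CAPACITY
print(f"1 PB/day is roughly {dvds_per_day:,.0f} DVDs per day")
```

The result, roughly 213,000 DVDs, agrees with the figure quoted in the text once rounded to two significant digits.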

Access to data is a topic that sparks much discussion among scientists and other communities, for a plethora of reasons: concerns about inappropriate use, for instance, or restrictive institutional or industrial policies where gigabytes of genomic data are destined for pharmaceutical research. There have already been attempts to estimate the extent of the problem: in one survey, 67% of participants expressed the view that inaccessible data hinder scientific progress.