Enabling the effective sharing of clinical data

This post was written by Mathias Astell & Iain Hrynaszkiewicz and was originally published on the DNAdigest blog.

Sharing the data generated by researchers has long been understood to be of great value to science (as exemplified by this British Medical Journal piece from 1994). Over recent years there has been a rapid increase in the ability to share and access research data, as can be seen in the rise of data journals (such as Scientific Data and GigaScience), the increase in research data repositories (both general and subject-specific), and the establishment of data sharing policies around the world.

Author’s Corner: Sharing proteomics data to build community-based resources

Guest post by Ruedi Aebersold, Professor of Systems Biology with a joint appointment at ETH Zurich and the University of Zurich, & George Rosenberger, PhD student in the Aebersold group at the Institute of Molecular Systems Biology, ETH Zurich.

Mass spectrometry-based proteomics is a data-intensive research discipline that primarily aims to identify and quantify the proteins that constitute the proteome [1]. This is achieved by generating large numbers (10^4 to 10^6) of fragment ion spectra that represent peptides generated by proteolysis of the respective proteome. Mass spectrometers can operate in different data acquisition modes, referred to as data-dependent acquisition (DDA), targeted acquisition, exemplified by selected reaction monitoring (SRM), or data-independent acquisition (DIA) [2], exemplified by SWATH-MS [3,4]. Specific software tools then generate processed mass spectra from these raw data, from which sets of identified peptides, proteins and their abundances are inferred and annotated with metadata. Both the generation and the processing of such raw data sets are resource- and time-intensive. Further, if unique, irreplaceable samples are being analyzed, as is often the case with clinical cohorts, the data cannot be regenerated. Therefore, the proteomics community has started to embrace data sharing by means of different specialized public repositories, for example GPMDB [5], PRIDE [6], PeptideAtlas [7] or ProteomicsDB [8]. For the last few years, the ProteomeXchange [9] consortium has provided centralized deposition of raw data and their meta-annotation.

Author’s Corner: Advancing the sharing and standardization of metabolomics data

Guest post by Mark Viant, Professor of Metabolomics in the School of Biosciences at the University of Birmingham, UK, and Director of both the national NERC Biomolecular Analysis Facility – Metabolomics and the Phenome Centre Birmingham.

In 2014, my research team published the first Scientific Data Data Descriptor for metabolomics measurements, “Direct infusion mass spectrometry metabolomics dataset: a benchmark for data processing and quality control”. This article described in detail the many steps that are critical for producing high-quality direct infusion mass spectrometry (DIMS) data. It was our intention that this publication would help to establish the benchmark for DIMS metabolomics, derived using best-practice workflows and rigorous quality assessment. The data were also made freely available in the MetaboLights public database for metabolomics data (dataset MTBLS79) [1].