Author’s corner: Sharing proteomics data to build community-based resources

Posted on April 26, 2016 by Mat Astell

Photo credit: Ruedi Aebersold & George Rosenberger

Guest post by Ruedi Aebersold, Professor of Systems Biology with a joint appointment at ETH Zurich and the University of Zurich, & George Rosenberger, PhD student in the Aebersold group at the Institute of Molecular Systems Biology, ETH Zurich.

Mass spectrometry-based proteomics is a data-intensive research discipline that primarily aims to identify and quantify the proteins that constitute the proteome¹. This is achieved by generating large numbers (10⁴ to 10⁶) of fragment ion spectra that represent peptides generated by proteolysis of the respective proteome. Mass spectrometers can operate in different data acquisition modes, referred to as data-dependent acquisition (DDA), targeted acquisition, exemplified by selected reaction monitoring (SRM), or data-independent acquisition (DIA)², exemplified by SWATH-MS³,⁴. Specific software tools then convert these raw data into processed mass spectra, from which sets of identified peptides, proteins and their abundances are inferred and annotated with metadata. Both the generation and the processing of such raw data sets are resource- and time-intensive. Further, if unique, irreplaceable samples are being analyzed, as is often the case with clinical cohorts, the data cannot be regenerated. Therefore, the proteomics community has started to embrace data sharing by means of different specialized public repositories, for example GPMDB⁵, PRIDE⁶, PeptideAtlas⁷ or ProteomicsDB⁸. For the last few years, the ProteomeXchange⁹ consortium has provided centralized deposition of raw data and their meta-annotation.