Mass spectrometry-based proteomics at Nature Methods

A look back at highlights in proteomics technology developments published in Nature Methods.

The last decade has seen amazing advances in mass spectrometry-based proteomics technology as well as ever-expanding use of the technology for varied biological applications. Here we take a look back at some proteomics technology development highlights published in Nature Methods over the last 10 years. (A second entry covering biological applications of mass spectrometry-based proteomics is planned for the near future; stay tuned.)

Sample preparation

The first step in a successful proteomics experiment is sample preparation. In 2009 Matthias Mann’s lab published a filter-aided sample preparation (FASP) method that is widely used by the proteomics community. In 2014 the same lab published an optimized approach that performs all sample processing tasks in a single enclosed tube.

Proteins are digested into peptides for ‘shotgun’ proteomics analysis. While trypsin is most widely used, it also comes with known limitations. Albert Heck and colleagues and Neil Kelleher and colleagues described useful alternatives to trypsin.

Proteomics researchers are always striving for higher sensitivity. The DigDeAPr method from John Yates’s lab and the use of DMSO to enhance electrospray response, described by Bernhard Kuster’s lab, allow researchers to perform deeper proteomic analyses.

Quantitative methods

Proteomics researchers want to quantify, as well as identify, peptides and proteins. Stable isotope labeling, either through metabolic incorporation or chemical labeling during sample preparation, enables researchers to quantitatively compare multiple samples. Spiking labeled concatenated signature peptides into samples enables absolute quantification, as shown by Robert Beynon and colleagues.
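For readers curious about the arithmetic behind such spike-in strategies, here is a minimal sketch, assuming a known amount of labeled standard and made-up signal intensities (the function name is ours, for illustration only):

```python
# Minimal sketch of spike-in absolute quantification.
# All intensities below are hypothetical illustrations, not real data.

def absolute_amount(light_intensity: float, heavy_intensity: float,
                    spiked_fmol: float) -> float:
    """Estimate the endogenous peptide amount from the light/heavy ratio.

    light_intensity -- MS signal of the endogenous (light) peptide
    heavy_intensity -- MS signal of the spiked, isotope-labeled standard
    spiked_fmol     -- known amount of labeled standard added to the sample
    """
    return (light_intensity / heavy_intensity) * spiked_fmol

# If the endogenous peptide gives half the signal of a 100 fmol standard,
# we infer roughly 50 fmol of endogenous peptide.
print(absolute_amount(5.0e5, 1.0e6, 100.0))  # -> 50.0
```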

The SILAC metabolic method has proved to be extremely popular, and we have published applications of SILAC for quantifying proteins and phosphorylation sites in human tissues, and in nematodes (Larance et al. and Fredens et al.).

A limitation of SILAC is that it cannot be used to compare more than three samples at a time. Joshua Coon and colleagues provided a clever way around this with their NeuCode SILAC approach, which in theory could enable up to 39-plex experiments.

Chemical labeling approaches (such as iTRAQ and TMT) currently offer higher multiplexing capability than SILAC, but can suffer from problems of quantitative accuracy. Coon’s lab and Steven Gygi’s lab each provided methods to obtain accurate quantitative data in multiplexed experiments.

Shotgun data analysis

In a typical ‘shotgun’ proteomics (discovery-based) experiment, MS/MS fragmentation spectra are generated for all peptides that can be detected by the mass spectrometer. The proteins are identified by matching these experimental spectra to theoretical or actual MS/MS peptide spectra found in databases. Well-performing tools to do this and methods to control for false discoveries are therefore crucial.
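To make this concrete, here is a minimal sketch of the core computation a search engine performs: predict fragment ion m/z values for a candidate peptide and count how many observed peaks they explain. The residue table is truncated, and the tolerance and peak list are made up for illustration:

```python
# Sketch of naive peptide-spectrum matching: compare theoretical b/y ion
# m/z values for a candidate peptide against an observed peak list.

PROTON, WATER = 1.00728, 18.01056
RESIDUE = {  # monoisotopic residue masses (subset shown for brevity)
    'G': 57.02146, 'A': 71.03711, 'S': 87.03203, 'P': 97.05276,
    'V': 99.06841, 'T': 101.04768, 'L': 113.08406, 'D': 115.02694,
    'K': 128.09496, 'E': 129.04259, 'F': 147.06841, 'R': 156.10111,
}

def fragment_mzs(peptide: str) -> list[float]:
    """Singly charged b- and y-ion m/z values for an unmodified peptide."""
    masses = [RESIDUE[aa] for aa in peptide]
    b = [sum(masses[:i]) + PROTON for i in range(1, len(masses))]
    y = [sum(masses[i:]) + WATER + PROTON for i in range(1, len(masses))]
    return b + y

def count_matches(peptide: str, peaks: list[float], tol: float = 0.02) -> int:
    """Count observed peaks lying within `tol` Th of any theoretical ion."""
    theo = fragment_mzs(peptide)
    return sum(any(abs(p - t) <= tol for t in theo) for p in peaks)

observed = [175.119, 227.103, 375.235, 504.278]  # hypothetical peak list
print(count_matches("PEPTLDER", observed))  # -> 2 of 4 peaks explained
```

Real search engines refine this simple counting into probabilistic or cross-correlation scores, but the matching step is the same in spirit.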

To generate good proteomics data, one must get the best performance out of the mass spectrometer. The HCD method from Stevan Horning, Matthias Mann and colleagues and a decision tree algorithm from the Coon lab enable researchers to obtain improved MS/MS data for protein identification.
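As a rough illustration of the decision-tree idea, the sketch below selects a fragmentation method per precursor based on its charge and m/z; the thresholds are placeholders of our own, not the values published by the Coon lab:

```python
# Illustrative decision tree that picks a fragmentation method on the fly.
# The charge/m-z cutoffs are invented for this sketch.

def choose_fragmentation(charge: int, mz: float) -> str:
    if charge <= 2:
        return "CAD"  # low charge states fragment well by collisional activation
    if charge == 3:
        return "ETD" if mz < 650 else "CAD"  # placeholder cutoff
    return "ETD"      # high charge states generally favor electron transfer

for z, mz in [(2, 800.0), (3, 500.0), (3, 900.0), (5, 700.0)]:
    print(f"z={z}, m/z={mz} -> {choose_fragmentation(z, mz)}")
```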

We have published tools for peptide identification – Percolator, SpectraST and MS-Cluster – and quantitative data analysis (Census). Lennart Martens’ group showed that combining various data processing workflows leads to greater proteome coverage. Proteogenomics-type approaches, which use custom databases generated from genomic data, are becoming popular as they allow identification of novel peptides not found in standard protein databases (see Evans et al. and Branca et al.).

Researchers must be careful to not overinterpret their proteomics data. Gygi’s lab wrote a useful Perspective on the target-decoy approach for determining false discovery rate, a metric that has become broadly adopted by the field.
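The target-decoy logic itself fits in a few lines: search a database that also contains reversed (decoy) sequences, sort the resulting matches by score, and estimate the FDR at each score threshold from the running decoy count. A minimal sketch with hypothetical scores:

```python
# Minimal sketch of target-decoy FDR estimation. Each peptide-spectrum
# match carries a score and a flag marking whether it hit a decoy sequence.

def running_fdr(psms: list[tuple[float, bool]]) -> list[float]:
    """Estimate FDR at each threshold after sorting best score first.

    The decoy count approximates the number of false target matches,
    so FDR at a threshold is roughly decoys / targets accepted so far.
    """
    fdrs, decoys, targets = [], 0, 0
    for _, is_decoy in sorted(psms, key=lambda p: -p[0]):
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        fdrs.append(decoys / max(targets, 1))
    return fdrs

psms = [(9.1, False), (8.7, False), (8.2, True), (7.9, False), (7.5, True)]
print(running_fdr(psms))  # [0.0, 0.0, 0.5, 0.333..., 0.666...]
```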

In order to keep tools sharp and highlight areas for development, it is important to systematically put them to the test. In 2005, Gygi’s lab performed a comparison of three platforms. In 2009, a large group of researchers tested their ability to identify proteins in a small test sample. This analysis highlighted common problems that occur especially during data analysis in proteomics investigations.

Targeted proteomics

Targeted proteomics, which we chose as our Method of the Year in 2012, offers a fundamentally different way of analyzing data compared to discovery-based proteomics. Targeted approaches, most commonly selected reaction monitoring (SRM), utilize mass spectrometry assays to identify and quantify peptides selected to represent proteins of interest, akin to Western blotting, but in a multiplexed fashion.
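In practice, an SRM assay boils down to a transition list: pairs of precursor (Q1) and fragment (Q3) m/z values for the triple quadrupole to monitor. A toy example using the classic test peptide ELVISLIVESK; the m/z values are approximate and for illustration only:

```python
# Toy SRM transition list: each entry pairs a precursor m/z with one
# fragment ion m/z the instrument will monitor. Values are approximate.

transitions = [
    # (peptide, precursor m/z [2+], fragment, fragment m/z [1+])
    ("ELVISLIVESK", 615.37, "y8", 888.54),
    ("ELVISLIVESK", 615.37, "y6", 688.42),
]

for peptide, q1, frag, q3 in transitions:
    print(f"monitor {peptide}: Q1={q1:.2f} -> Q3={q3:.2f} ({frag})")
```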

These SRM assays can be laborious to generate, however. Methods for high-throughput SRM assay generation are therefore important (see Picotti et al., Stergachis et al. and Kennedy et al.). In 2008 Ruedi Aebersold’s group set up a database of assays for the yeast proteome, called SRMAtlas, which has since grown to include assays for M. tuberculosis and human. Just this year, Amanda Paulovich and colleagues presented the CPTAC Assay Portal, a new repository of analytically validated targeted proteomics assays.

Statistical validation is just as important in targeted proteomics as in discovery-based proteomics. Aebersold’s lab developed the mProphet tool and also provided a useful guide to SRM in their 2012 Review.

Biological applications of targeted proteomics are growing. Bart Deplancke and colleagues showed that transcription factors could be followed during cellular differentiation using SRM. Olga Vitek’s group showed that targeted proteins could be quantified using sparse reference labeling. In this 10th Anniversary issue, Claus Jørgensen’s group reports a quantitative method for monitoring human kinases, and Paola Picotti’s lab describes a panel of assays to quantify ‘sentinel’ proteins reporting on 188 different yeast processes.

Data-independent analysis

Our very first issue in October 2004 featured an interesting paper from Yates and colleagues describing a data-independent mass spectrometry scanning approach for acquiring MS/MS spectra. In contrast to the common data-dependent approach, where the most prominent peptide ions are selected for MS/MS, the data-independent approach can enable more reproducible results as it overcomes issues of peptide ion sampling stochasticity. It took nearly a decade for this clever idea to really catch on, but within the last year or so, we have published practical data-independent analysis implementations from Michael MacCoss’s and Stefan Tenzer’s labs.
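For intuition about the sampling difference, the toy simulation below contrasts the two modes: data-dependent acquisition fragments only the top-N most intense precursors per cycle, whereas data-independent acquisition steps fixed isolation windows across the full m/z range. The precursor list is randomly generated and the window width is an arbitrary choice:

```python
# Toy contrast of DDA versus DIA precursor sampling (synthetic data).
import random

random.seed(0)
precursors = [(random.uniform(400, 1200), random.lognormvariate(10, 2))
              for _ in range(200)]  # (m/z, intensity) pairs

# DDA: fragment only the 10 most intense precursors in this cycle.
dda_selected = sorted(precursors, key=lambda p: -p[1])[:10]

# DIA: step 25-Th isolation windows across the range; everything inside
# a window is co-isolated and fragmented, regardless of intensity.
dia_windows = [(lo, lo + 25) for lo in range(400, 1200, 25)]
dia_covered = [p for p in precursors
               if any(lo <= p[0] < hi for lo, hi in dia_windows)]

print(f"DDA fragmented {len(dda_selected)} of {len(precursors)} precursors")
print(f"DIA windows covered {len(dia_covered)} of {len(precursors)} precursors")
```

The cost of the data-independent approach is highly multiplexed spectra, and deconvolving them is exactly what makes the practical implementations noted above valuable.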

Anne-Claude Gingras and Stephen Tate and colleagues, along with Aebersold and colleagues, showed how a quantitative targeted data-independent analysis method called SWATH provides advantages for analyzing protein interactomes by affinity purification-mass spectrometry.

We look forward to many more strong advances in mass spectrometry-based proteomics in the decade to come!

An all-encompassing term to describe protein complexity

Neil Kelleher and Lloyd Smith propose that the scientific community adopt the term ‘proteoform’ to refer to all the different forms that a protein can take. Will the community adopt it?

The field of top-down proteomics, in which intact proteins are analyzed by a mass spectrometer, provides rich information about the genetic variations, alternative splicing and post-translational modifications that can be lost in a bottom-up proteomics approach (where proteins are digested into peptides prior to analysis). An unsolved problem in the top-down field, however, has been what exactly to call these various protein forms. Besides ‘protein forms’, a handful of other terms have been batted around in the literature, including ‘protein variants’, ‘protein isoforms’ and ‘protein species’.

In a Correspondence in the March issue of Nature Methods, Neil Kelleher and Lloyd Smith lay out the reasons why none of these terms are satisfactory. What is needed, they argue, is a novel, unique, intuitive, single-word term with a precise definition that is all-encompassing in describing protein complexity, and is also compatible with a gene-centric approach to protein naming. They believe that they have the perfect term: proteoform.

“It’s not just a term, it’s a movement,” says Kelleher, who has been one of the key drivers of top-down methodology development and argues that using a controlled vocabulary to describe proteins will serve a catalytic role in moving the field forward. “The implicit thing about this term is that it puts a focal point on the fact that [the proteoforms] are the functional players, insofar as protein primary structure is concerned,” he says. Especially in clinical research, he notes, different proteoforms are tied strongly to function and phenotype.

Kelleher and Smith have been gathering support for their term over the last several months by introducing it at conferences and inviting researchers to comment on a LinkedIn forum. The term also has the full support of the Consortium for Top Down Proteomics. At their latest conference in Florida, about a month ago, Kelleher says that “everyone” was using “proteoform” in their talks. “It just catches on…it fills a void, rolls right off the tongue at conferences and sits well in the gut while digesting text,” he says. The consortium website maintains a repository of proteoforms, which they hope will grow. Kelleher also notes that the term is being embraced by key protein informatics players at UniProt and the Protein Information Resource, both of which have adopted a gene-centric approach to protein naming.

What do you think about the term “proteoform”? Will you adopt it? We’d love to hear from you!

A different kind of Method of the Year for 2012

Our choices of Method of the Year in prior years have tended to be methods that barely existed a few years earlier but had quickly bounded onto the scientific stage and attracted the attention of a large portion of the scientific community. Targeted proteomics, our choice for 2012, has on the other hand existed for years in scaled-down forms based on antibodies. Western blotting, immunofluorescence, antibody arrays and the like can all be used to detect and measure targeted subsets of the proteins expressed in cells and tissues.

During this time the workhorse of proteomics, the mass spectrometer, has been used mostly for shotgun proteomics experiments in which the goal was to analyze all the proteins in a sample. But the means to use these machines for targeted detection of defined subsets of proteins, with more reproducible measurements than shotgun experiments can typically provide, have been around for decades.

Shotgun methods have been mostly confined to specialist laboratories as many biologists have been intimidated by the complexity of implementing and analyzing these experiments properly. Targeted proteomics on the other hand offers a tantalizing opportunity to bring a sampling of the power of mass spectrometry to the wider community of biologists. The assays are simpler, easier to run and well suited to the hypothesis-driven experiments that are the mainstay of biological research.

The ubiquitous Western blot has long filled a central role, or functioned as a crucial control, in many research studies. Unfortunately, performing a high-quality Western blot can feel a bit like roulette. Sometimes you get a fantastic-looking blot with an accurate antibody, but other times the blot is blank, the bands look like they ran through some carnival ride, or the blot suffers from any number of other problems. This might prompt people either to look for a goat to appease the Western blot gods or to take unscientific liberties with the presentation of the data in order to make it look the way they believe it should. It also lessens the likelihood that important replicates are performed or reported.

Targeted mass spectrometry offers the possibility for thousands of labs to move away from, or supplement, Western blots and improve the quality and quantity of their protein measurements. This is not as sexy as next-generation sequencing, super-resolution imaging or optogenetics, some of our prior choices for Method of the Year, but the potential for revolutionizing an arguably mundane but indispensable technique was compelling enough to play no small role in our decision. Only time will tell what impact the method has, and we eagerly look forward to the answer.

To share or not to share

Many in the mass spectrometry community agree that MS data should be made publicly available for everybody’s benefit: all data, including the raw files generated by the mass spectrometers.

In the May editorial we support this call and introduce a new raw data repository run by the EBI that stands to replace the declining Tranche, until very recently the only repository for such data.

Several good arguments can be made for making raw data available; one of them is the re-analysis of published data to validate claims. Consider, for example, the controversy that arose in the wake of the analysis of fossilized Tyrannosaurus rex bones by Asara and colleagues, which led them to suggest that T. rex is more closely related to birds than to other reptiles (Asara et al., Science, 2007). Their findings were finally corroborated in 2009 (Bern et al., J. Proteome Res.), but they could have been examined much more quickly if access to the raw data had been given at the time of publication.

Re-analysis aside, raw data present a treasure trove of information that can be examined from different angles and, over time, with new tools that bring to light aspects the original experimenters did not think of. To create such new analysis tools, software developers also rely on raw data to benchmark against established techniques.

Having access to raw files does not mean that they are easy to use. We realize that the diversity of file formats, and the difficulty of converting one file type to another, makes their analysis less straightforward than it would be with a single community-supported format. We also realize that these files are large, and that uploading them to the new EBI repository, or any other repository, will take time and some effort, particularly if important metadata about the experiment are included.
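To give a feel for what re-use looks like once a file is in an open format, here is a hedged sketch: assuming the raw file has been converted to mzML (for example with ProteoWizard’s msconvert) and the third-party pyteomics package is installed, iterating over its spectra takes only a few lines (‘example.mzML’ is a placeholder path):

```python
# Sketch of reading a shared dataset converted to the open mzML format.
# Requires the pyteomics package; 'example.mzML' is a placeholder path.

from pyteomics import mzml

with mzml.read("example.mzML") as reader:
    for spectrum in reader:
        if spectrum.get("ms level") == 2:
            # each spectrum dict carries peak arrays plus acquisition metadata
            print(spectrum["id"], len(spectrum["m/z array"]))
```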

Still, we think the effort is worth it to ensure the field can move forward. We’d love to hear your views, particularly if you disagree.

Wishing and hoping for mass spectrometry

In this month’s Technology Feature several key developers of mass spectrometry technology share their wishes for the future of this technology. Did your hope for the next innovation in mass spectrometry make the list? If not, here is your opportunity to add wishes, hopes or comments – and let the developers know what advances you would like to see in the coming years.