Learning every way to break a gene

From Fig. 1 in Majithia et al. Nature Genetics 2016

Finding the genetic cause of a disease—a mutation or genetic variant—is a lot like looking for a needle in a haystack. Except in the case of exome sequencing, it’s not always clear what a needle even looks like.

When a clinician finds a protein-altering variant in a gene known to cause disease, it could be the cause of the patient’s disease…or it could be nothing. Such a variant is called a variant of uncertain significance (VUS). A VUS often stays unresolved unless someone puts in the time and resources to functionally characterize it. For obvious reasons, functional characterization of each individual VUS of every gene implicated in human disease is not practical.

One way to determine which variants cause disease and which don’t is to look at the DNA of many healthy individuals. If a healthy person carries a variant, it is unlikely to cause disease. The Exome Aggregation Consortium (ExAC) is the largest such effort—over 60,000 people have contributed their exome sequences to ExAC and thus provided a unique resource for clinicians to de-prioritize specific VUS’s as causes of disease.

But what if your variant is too rare or isn’t in ExAC?
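To see why a rare variant can easily be missing from a reference panel, here is a rough back-of-the-envelope sketch. It assumes exactly 60,000 diploid individuals (120,000 sampled chromosomes) and independent sampling of alleles; the function name `prob_observed` and the example frequencies are ours, chosen only for illustration.

```python
def prob_observed(allele_freq, n_individuals=60_000):
    """Probability that an allele with the given population frequency is
    seen at least once among n diploid individuals (2n chromosomes).

    Assumes each chromosome is an independent draw from the population.
    """
    n_chromosomes = 2 * n_individuals
    return 1.0 - (1.0 - allele_freq) ** n_chromosomes


# Illustrative frequencies only; these are not values from the study.
for freq in (1e-4, 1e-5, 1e-6):
    p = prob_observed(freq)
    print(f"allele frequency {freq:.0e}: P(seen at least once) = {p:.3f}")
```

Under these assumptions, a variant carried on one in a million chromosomes would show up in a panel of this size only about 11% of the time, so its absence says little about whether it is harmful.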

Amit Majithia, David Altshuler and colleagues from the Broad Institute of Harvard and MIT developed a different strategy, published last week in Nature Genetics. The authors chose a gene, PPARG, that can cause Mendelian lipodystrophy when mutated. Some variants in this gene can also increase the risk of developing type 2 diabetes. PPARG also has many VUS’s in the population.

To find out which variants are likely to be pathogenic, the authors constructed a library of all 9,595 possible protein-altering variants of PPARG and tested them in pools using a functional assay in human macrophages. Cell pools that showed a positive result were sequenced and the numbers of each variant were counted, allowing the authors to assign a function score to each variant. These scores were then used, along with known benign and pathogenic variants from the prior literature, to train a machine-learning algorithm that could then classify each variant as pathogenic or not. The classifier found 6 new likely pathogenic variants, which were then validated through additional tests.
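The counting and classification steps can be sketched in a few lines of Python. This is not the authors' pipeline: the score here is a simple log2 enrichment of each variant in the assay-positive pool relative to the input library, and the "classifier" is just a midpoint threshold between known benign and known pathogenic variants, swapped in for the machine-learning model used in the paper. All variant names and read counts are made up, and we assume that loss-of-function variants are depleted from the positive pool.

```python
import math


def function_scores(input_counts, positive_counts, pseudocount=0.5):
    """Log2 enrichment of each variant in the assay-positive pool
    relative to the input library (pseudocounts avoid log of zero)."""
    n = len(input_counts)
    total_in = sum(input_counts.values()) + pseudocount * n
    total_pos = sum(positive_counts.get(v, 0) for v in input_counts) + pseudocount * n
    scores = {}
    for variant, c_in in input_counts.items():
        f_in = (c_in + pseudocount) / total_in
        f_pos = (positive_counts.get(variant, 0) + pseudocount) / total_pos
        scores[variant] = math.log2(f_pos / f_in)
    return scores


def fit_threshold(scores, benign, pathogenic):
    """Midpoint between the mean scores of the two labeled groups."""
    mean_benign = sum(scores[v] for v in benign) / len(benign)
    mean_pathogenic = sum(scores[v] for v in pathogenic) / len(pathogenic)
    return (mean_benign + mean_pathogenic) / 2


def classify(scores, threshold):
    """Assumption: low scores (depleted variants) indicate loss of function."""
    return {
        v: "likely pathogenic" if s < threshold else "likely benign"
        for v, s in scores.items()
    }


# Hypothetical read counts: placeholders, not data from the study.
input_counts = {"benign_1": 100, "benign_2": 100, "path_1": 100,
                "path_2": 100, "vus_1": 100, "vus_2": 100}
positive_counts = {"benign_1": 150, "benign_2": 140, "path_1": 10,
                   "path_2": 20, "vus_1": 130, "vus_2": 15}

scores = function_scores(input_counts, positive_counts)
threshold = fit_threshold(scores, ["benign_1", "benign_2"], ["path_1", "path_2"])
calls = classify(scores, threshold)
for variant in ("vus_1", "vus_2"):
    print(variant, round(scores[variant], 2), calls[variant])
```

The point of the sketch is the shape of the workflow: counts become scores, labeled variants calibrate the scores, and the calibrated scores are then applied to variants of uncertain significance.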

Summary of strategy used by Majithia et al.

The strategy used by Majithia et al. could potentially be used by other researchers to study other proteins implicated in disease. Although it is somewhat of a brute-force approach to test every possible variant, the use of a pooled functional assay and computational classifier increases both the efficiency and accuracy of the result. We asked Dr. Majithia to tell us a little more about this study.

Author Q&A:

Why did you choose to focus your study on PPARG?

In the diabetes community, PPARG is a very important and storied gene. It has been linked to both common type 2 diabetes and rare familial forms. PPARG is also the target of multiple FDA-approved drugs to treat diabetes. So on one hand, PPARG has been studied in humans and in the lab for two decades. On the other hand, we showed in 2014 (Majithia et al. PNAS) that even though PPARG had been sequenced in humans for so many years, we had only scratched the surface of the possible missense mutations that people in the general population carry. Most of these mutations were benign, some strongly increased diabetes risk, and it took laboratory experiments with each and every mutation to sort them out. So PPARG was the perfect test case for our prospective experimental approach to test all possible missense mutations: it is relevant to common and rare genetic disease, has mutations of known function we could use for validation of our method, and has many unknown mutations, i.e. variants of uncertain significance that need to be functionally characterized.

Do you think that a similar strategy could be employed for other genes with many variants of unknown significance in the population? What would be the major challenges for applying this strategy to other genes?

Absolutely. A major purpose of this study was to demonstrate proof-of-concept that other investigators and clinicians could utilize for VUS in other genes/diseases. In principle there are three challenges in applying a prospective experimental approach to generate a “lookup table” for missense VUS: 1) making every possible missense mutation, 2) building an assay with the scale and throughput to study every missense mutation, and 3) connecting the lab experiments to what happens in people (i.e. phenotypes).

The mutation synthesis technique we use, which was pioneered by Tarjei Mikkelsen, applies to any gene. Other groups like Jay Shendure’s in Seattle have independently developed methods to mutate genes at scale and now companies like Twist Bioscience offer high-quality mutation libraries for any gene. This is no longer a barrier to entry.

Genes have myriad functions and so building an appropriately scaled, high-throughput readout is a gene-by-gene process. No single assay can test the function of every gene, but there are partially generalizable strategies that can be used to study certain classes of genes. The strategy we used for PPARG, combining reporter gene expression and FACS, could be deployed for any gene that activates transcription. In fact we are taking this approach with another diabetes-relevant transcription factor, HNF1A.

Establishing clinical relevance of saturation mutagenesis data is a critical step. In our study we set a criterion for our assay, that in order for us to be able to discriminate variants of “unknown” significance, we should be able to accurately discriminate variants of “known” significance. For PPARG we benefitted from decades of research that had resulted in a series of missense variants with known function and disease effect. Many genes do not have such “allelic series” but with the increasingly widespread use of exome/genome sequencing our knowledge of allelic series for genes is rapidly growing.
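For readers curious what "every possible missense mutation" amounts to in practice, the sketch below enumerates all single amino-acid substitutions of a protein sequence and sets up an empty lookup table keyed by variant name. The toy sequence is hypothetical and is not the PPARG sequence; the function name is ours. As a sanity check on the numbers, each position allows 19 alternative amino acids, and the 9,595 variants quoted above equal 19 × 505, which is what one would expect for a 505-residue protein (the length is inferred here from the arithmetic, not stated in the post).

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def all_single_substitutions(protein_seq):
    """Yield every single amino-acid substitution, named like 'M1A'
    (reference residue, 1-based position, alternative residue)."""
    for position, ref in enumerate(protein_seq, start=1):
        for alt in AMINO_ACIDS:
            if alt != ref:
                yield f"{ref}{position}{alt}"


# Hypothetical 4-residue toy sequence, for illustration only.
toy_sequence = "MGET"
variants = list(all_single_substitutions(toy_sequence))

# Empty "lookup table": one slot per variant, to be filled with a
# function score once the pooled assay has been run.
lookup = {variant: None for variant in variants}

print(len(variants), variants[:3])
print(19 * 505)  # library size for a 505-residue protein
```

The library grows only linearly with protein length, which is part of why synthesizing it is no longer the limiting step; building an assay that can read out all of those variants at once is the harder problem.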

From your perspective, what was the most surprising aspect of the study?

To independently prove the findings from our “lookup table”, our collaborators in Cambridge took a series of VUS from patients referred to their clinic and tested them in single-variant assays that are the standard for PPARG functional testing. In one case, our “lookup table” had a major discrepancy from the single-variant assay. Careful follow-up work by our Cambridge colleagues resolved that the single-variant assay that has been used for decades was actually incorrect and the “lookup table” made the correct call. We were surprised that, in this case, our new, high-throughput assay had higher fidelity with human biology. It introduces a degree of nuance into what we consider the “gold standard” when discrepancies inevitably arise.

How do you see your findings impacting on future research?

We hope this study will have its largest impact as a proof-of-concept for other genetic diseases and VUS interpretation. Our method is capable of improvement. For example, it does not assess missense effects on gene splicing, but others in the community are developing complementary methods that will overcome this (for example, see this paper from Gregory Findlay et al. published in Nature). We are particularly excited that our lookup table approach opens the door to systematically characterizing drug-by-genotype interactions that could be useful to guide treatment.

Free Association

a blog from Nature Genetics
