Joint calling of the ExAC publications

We report this week in Nature and Nature Genetics the first publications from the Exome Aggregation Consortium (ExAC), a project that has generated the largest catalogue to date of variation in the protein-coding regions of the genome (known collectively as the exome), aggregating sequence data from over 60,000 individuals across 21 research studies. Most importantly, the authors have provided a publicly accessible database (https://exac.broadinstitute.org), which has already become a critical resource for research and clinical studies. While an estimated 1 million or more individuals have had their exome or whole genome sequenced, only a small fraction of these data has been made publicly available, as there are many challenges to sharing and providing open access to these datasets. We applaud the authors for recognizing this need and meeting these challenges.

This work comes 15 years after we published the Human Genome Project, and follows a series of community resources that catalogue variation in human genomes within and across populations. We continue to support these efforts, recognizing that such resources are necessary for further studies of the information encoded in our genome, genetic variation and the genetic basis of disease.

Very rare genetic variation: a first look

The scale of the ExAC sequencing dataset has provided some of our first glimpses into very rare genetic variation across populations, with several important early insights. First, the authors identify more than 7.4 million high-confidence genetic variants, on average one every 8 bases, the majority of them entirely novel (not present in any existing database) and extremely rare (more than half of the variants are seen only once across all 60,706 samples). Second, they are able to document recurrent rare mutations emerging independently, providing an estimate of the frequency of recurrence, which had never been observed systematically before because such large sample sizes are needed. Third, they are able to examine the level of selective constraint against protein-truncating variation, identifying 3,230 genes that appear highly loss-of-function-intolerant. Reassuringly, this includes most known human haploinsufficient disease genes; however, 72% of these genes do not yet have an established human disease phenotype. While some of these genes may be associated with weaker phenotypes or embryonic lethality, this points to how much more we have yet to understand about the phenotypic consequences of loss of function in human genes.

Copy number variation in ExAC

In a companion paper in Nature Genetics, Douglas Ruderfer, Shaun Purcell and colleagues examine rare copy number variation (CNV) in the ExAC dataset, specifically the rates and properties of genic CNVs with <0.5% frequency. They use their previously published method, XHMM, to characterize CNV calls from this exome sequencing dataset. They find that ~70% of individuals carry at least one rare genic CNV, with an average of 0.81 deleted and 1.75 duplicated genes per individual. The authors also estimate relative intolerance to CNVs for each gene. This CNV dataset is incorporated into ExAC and will be useful for continuing population and disease association studies, together with other measures of genic intolerance. The authors provide an example of this in their analysis of a schizophrenia case-control study.

Clinical genetics: classifying pathogenic variation

The current work also brings an important message for clinical genetics: the literature on classifying pathogenic variation for rare disorders needs to be reexamined. The average ExAC participant harbors ~54 variants that have previously been classified as causal for a disease, and considering the ascertainment of the study, it is likely that most of these are misclassified.

Using ExAC as a reference panel for classifying disease-relevant variation, Lek et al. review the evidence for pathogenicity of 192 previously reported pathogenic variants for rare Mendelian disorders. Only 9 of these variants had sufficient support for disease association, with a high proportion of these variants present at an implausibly high frequency in the ExAC dataset. This suggests that many of these were false positive associations that were incorrectly classified as pathogenic. The implications are not merely academic, as these findings are often used in clinical diagnoses and treatment.
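To make the idea of an "implausibly high frequency" concrete, here is a minimal sketch of a frequency check. It is not the method used by Lek et al. It assumes a simple dominant model in which a rare allele at frequency q has about 2q carriers, so a causal variant cannot be more common than prevalence / (2 × penetrance). The class, the function names, the allele counts and the disease prevalence are all hypothetical and chosen for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """A hypothetical reference-panel record: allele count and allele number."""
    name: str
    allele_count: int   # chromosomes carrying the variant
    allele_number: int  # chromosomes successfully genotyped at the site

    @property
    def allele_frequency(self) -> float:
        return self.allele_count / self.allele_number


def max_plausible_frequency(prevalence: float, penetrance: float = 1.0) -> float:
    """Upper bound on a causal allele's frequency under a simple dominant model.

    Assumption: a rare allele at frequency q has ~2q carriers, and the
    affected carriers (2q * penetrance) cannot outnumber all cases.
    """
    if not 0 < penetrance <= 1:
        raise ValueError("penetrance must be in (0, 1]")
    if not 0 < prevalence <= 1:
        raise ValueError("prevalence must be in (0, 1]")
    return prevalence / (2 * penetrance)


def is_implausibly_common(variant: Variant, prevalence: float,
                          penetrance: float = 1.0) -> bool:
    """True if the variant is too frequent in the panel to cause the disease."""
    return variant.allele_frequency > max_plausible_frequency(prevalence, penetrance)


# Hypothetical example: 60,706 diploid samples give 121,412 chromosomes
# (assuming an autosomal site covered in every sample).
PANEL_CHROMOSOMES = 2 * 60_706
common = Variant("hypothetical_variant_A", allele_count=120,
                 allele_number=PANEL_CHROMOSOMES)
singleton = Variant("hypothetical_variant_B", allele_count=1,
                    allele_number=PANEL_CHROMOSOMES)

# Hypothetical disorder affecting 1 in 50,000 people, fully penetrant.
PREVALENCE = 1 / 50_000

print(common.name, is_implausibly_common(common, PREVALENCE))        # True
print(singleton.name, is_implausibly_common(singleton, PREVALENCE))  # False
```

A check like this can only argue against pathogenicity. A variant that passes it is not thereby shown to be causal, and real assessments also need to account for recessive inheritance, reduced penetrance and sampling error in the observed counts.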

In two additional companion publications, the authors take this a step further and demonstrate what is needed to resolve the nature of these prior associations, by combining large case series with ExAC. Walsh et al. (Genetics in Medicine, 10.1038/GIM.2016.90, published online August 17, 2016) systematically reexamine evidence for genes implicated in cardiomyopathy, one of the most common and severe rare disorders, and find that many well-known purported cardiomyopathy genes do not show support for pathogenicity, including some that are included in routine clinical genetic testing. Similarly, Minikel et al. collect 16,025 prion disease cases, the largest case series ever available for prion disease, for which ~10-15% of cases are estimated to be caused by mutations in the PRNP gene. They find that a number of PRNP variants thought to be pathogenic with high penetrance appear likely to be benign (Minikel et al. Science Translational Medicine 10.1126/scitranslmed.aad5169). This led to a corrected patient diagnosis soon after this report, as Robert Green explained in his Perspective accompanying this publication (Lebo et al. Science Translational Medicine 10.1126/scitranslmed.aad9460).

These findings highlight the need to evaluate the literature for rare genetic disorders carefully. They also reinforce the value of large reference panels such as ExAC for filtering variants seen in patient exomes, a practice most of the genomics community has adopted in establishing standards for assessing sequence variants in human disease (MacArthur et al. Nature 508, 469–476 (2014), 10.1038/nature13127). The ExAC project continues to expand, with the aim of reaching more than 120,000 exome sequences over the next year, as well as 20,000 whole genome sequences, bringing additional sample size, diversity and exploration of non-coding regions that will aid these efforts.

ClinVar and contributing to variant interpretation databases

This project, which relied on the willingness of many large research consortia to provide their raw data, demonstrates the extreme value of promoting the sharing, aggregation and harmonization of genomic data. This is true also for patient genetic variants, as there is a need for databases that provide greater confidence in variant interpretation. NCBI’s ClinVar database, which accepts contributions of clinically annotated genetic variation from clinical labs, clinicians and researchers, has become a key resource for clinical variant interpretation.

Improvements to the landscape of clinical genetics will require continued investment in such variant databases, continued expansion of human genetic reference panels, as well as efforts to link these to phenotype data. Recontacting to obtain phenotype data will be trialed on a subset of the ExAC dataset where consents allow, while new initiatives such as the UK 100,000 Genomes Project and the US Precision Medicine Initiative will also provide linked genome and phenotype information. Finally, enabling the ethical sharing of linked genetic and clinical data without violating participant privacy will require fundamental innovation in regulation and ethics policy, work that has been started by bodies such as the NIH and the Global Alliance for Genomics and Health, but around which considerable uncertainty remains.

Discovery of a gene for heart and gut rhythms

What do your heart and gut have in common? More than you might think. A new study by Gregor Andelfinger and colleagues has found that a single gene, SGOL1 (Shugoshin-like 1), is required for the normal rhythms of both the heart and intestine.

The study’s co-authors found 17 patients with dysrhythmias of both the heart and intestine, termed sick sinus syndrome (SSS) and chronic intestinal pseudo-obstruction (CIPO), respectively. SSS is a type of cardiac arrhythmia. Though it is very rare in children and young adults, it is more common in the elderly and generally requires the patient to have a pacemaker implanted. CIPO occurs when the intestines stop their usual rhythmic pulses, and food can no longer pass through the digestive tract on its own. Both conditions are extremely rare as inherited disorders, so finding both disorders in these 17 patients was a truly remarkable discovery.

All affected patients in the study shared the same homozygous variant, which resulted in changing a lysine to a glutamic acid at a conserved residue. The new syndrome was named Chronic Atrial and Intestinal Dysrhythmia (CAID).

We asked one of the study’s lead authors, Gregor Andelfinger at Sainte-Justine University Hospital Research Center in Montreal, to tell us a little more about the work:

How did you become involved in studying CAID?

Map of Canada (New France) in North America, 1703. Credit: Wikipedia

We have an excellent collaboration across our provincial biobank for congenital heart disease in Québec and exchange regularly among colleagues. We now have more than 3,000 deeply phenotyped participants in our biobank (both affected and unaffected family members), and when my colleagues told me about an unusual co-occurrence of SSS and CIPO in a couple of cases, we quickly fanned out and a side project suddenly got to center stage in the lab. We were surprised to see how many patients we found in a relatively short time for a previously undescribed disease. Obviously, we would be very eager to learn from other groups whether they have encountered similar rare patients, and would love to cooperate! Let’s not forget that this type of research always has a human face, and this is what motivates our group in the first place.

What would you say was the most unexpected aspect of this research? 

Everything in this project was unexpected! On the clinical side, the emergence of a generalized automaticity disorder in humans was totally unanticipated. On the molecular side, one of the biggest surprises certainly was how wrong we all were with our thoughts on what could be the causal gene. Virtually all members in the lab placed their bets on ion channels, a priori the most likely suspects. As you know, we were all proven wrong and had to go back to rethink how this disease arises. We were again surprised how a completely new picture emerged when we finally put all the pieces of the puzzle together—from genetics, populations and cell biology to disease.

How does the finding of SGOL1 mutations in these rare cases help inform the biology of CIPO and SSS more generally?

When doing my literature search, I was very surprised that one of the discoverers of the sinus node [the heart’s pacemaker tissue], Arthur Keith, had already drawn parallels between cardiac and gut pacemaking in an article in 1915. The recent literature suggests a role for TGF-β signaling as a driver for fibrosis in channelopathies and arrhythmias, and obviously this could very well be an important pathway through which a progressive destruction of pacemaking tissues takes place. Remember that we can clearly show that all patients in our series were normal at birth and developed disease only at later stages. On the other hand, we also have evidence that some ‘developmental anomalies’ are present in CAID patients, since the malformed gut pacemaking system probably was present from birth on, with initially normal function. I think that we are dealing with an overlap of developmental and acquired phenotypes, and that a similar process takes place in isolated SSS and CIPO, even if we could not detect SGOL1 mutations in the isolated forms of disease. Beyond this, I think the monogenic nature of the CAID phenotype tells us that all pacemaker cells need the cohesin complex. I would not be surprised if we found at least two non-canonical roles for SGOL1 in the future, one driving the developmental, and the other one driving the acquired part of disease, and that these disease pathways are at least partially shared in isolated SSS and CIPO. ‘Shugoshin’ means ‘guardian spirit’ in Japanese, so this is a very apt name for functions of this gene beyond its known function of protecting sister chromatids.

What do cardiac and intestinal pacemakers have in common, and what could make them particularly vulnerable to mutations in a cohesin complex member?

First, they are both relatively small organs. An adult sinus node measures approximately 15 x 5 x 1.5 mm, probably not more than 50,000 cells. Second, both organs are non-uniform and comprise different cellular subtypes, and third, they have to be in a very particular place to efficiently perform their function. Fourth, and very importantly, cells in both organs are capable of automaticity. What could the cohesin complex have to do with these commonalities of different pacemakers in the human body? For the known functions of cohesin, in particular cell division, I speculate that a defect could directly influence how many cells will be available to form a certain organ. However, apart from the smaller myenteric plexuses we found in CAID patients, we do not have direct experimental evidence for this. Of course, this could also affect subpopulations within these organs, the second organ property I alluded to above. Ageing and loss of cells over time may also come into play in this intricate balance.

I am at a loss to come up with a valid hypothesis for how a dysfunction of the cohesin complex would lead to the misplaced myenteric plexuses we found in CAID patients. As far as the fourth commonality between cardiac and intestinal pacemakers is concerned, we know that automaticity is mainly generated by spontaneous depolarizations. The channels responsible for this phenomenon are mainly the HCN channels and SCN5A, but calcium transients also participate in this. Given that cohesin plays an important role in transcriptional regulation, it is conceivable that some target genes are not correctly expressed when SGOL1 is mutated, either in time, space or quantity. Several recent studies on cohesinopathies point out that higher-order chromatin architecture has to be tightly regulated for normal gene expression, and I speculate that a dysfunction of SGOL1 could lead to problems with ion channel expression and thus be one of the key factors why we see this exquisite target organ specificity.

Can you say a little about the FORGE Canada consortium and how your research relates to its mission?

The FORGE Canada (Finding of Rare Disease Genes) consortium was launched on April 1, 2011 and brought together clinicians from all 21 Clinical Genetics Centres representing every province, as well as clinicians from 17 countries. From nation-wide requests for proposals, 264 disorders were selected for study from the 371 submitted. Over a 2-year period, disease-causing variants were identified for 146 disorders, including variants in 67 genes not previously associated with human disease; 41 of these genes have been genetically or functionally validated, and 26 are currently under study. The outcome of this project was recently published in an article in AJHG. This project has a successor, Care4Rare, which is a pan-Canadian collaborative team building upon the infrastructure and discoveries of FORGE Canada. The goal of Care4Rare is to improve clinical care for patients and families affected by rare diseases. I think the great success of these projects also stems from their openness to collaborators like our group. This is the way it should be, and since my lab is working on several rare disease traits, we have benefited greatly from their help.

You can read the full paper on the Nature Genetics website.