From the archives (2004): Large-scale structural variation in the human genome

{credit}Iafrate et al. Nature Genetics 2004{/credit}

During the past 25 years, Nature Genetics has been lucky to publish many exciting papers, more than a few of which can be described as “landmark” papers: publications that have had a dramatic and long-lasting impact on a field. In 2004, the journal published one such study by Stephen Scherer, Charles Lee and colleagues (Iafrate et al.), in which they reported 255 loci across the human genome containing large structural variants.

In 2017, the idea that the genome contains large numbers of structural variants (such as rearrangements, deletions and insertions) that differ from person to person is an established fact. But in 2004, this was not the prevailing wisdom. Prof. Scherer has already written an excellent essay at The Winnower about the study and its importance to the field, so I won’t recap it in detail here; I will simply encourage you to read the piece.

Charles Lee wrote to us about the study by email. “I saw a talk by Dr. Dan Pinkel at the 2002 ASHG meeting where he presented his latest array CGH findings,” he remembers. “In his talk, one of the slides showed the array CGH results of a trisomy 18 patient and Dan remarked how cleanly his array platform performed, especially for the other chromosomes. But in fact, I (and others, I’m sure) could see that there were actually occasional clones that deviated from the expected log2 ratio of 0. During the question period, I sheepishly asked him about these clones. I really didn’t mean to criticize his platform, but I think that he took it that way. Those ‘blips’ bothered me and when I returned to Boston, John Iafrate (who was a postdoc with me at the time) began our own array CGH experiments. Ironically, there were several other groups that were way ahead of us with respect to technical expertise and experience with array CGH, but it could be that they considered these ‘blips’ as technical artifacts – without biological implications.”

Prof. Lee added, “In late 2003, I gave a talk at the University of Toronto and met Stephen Scherer in person for the first time. In a casual conversation, we realized that we were both using the same 1 Mb chromosome microarray platform from Spectral Genomics and that we were both seeing these recurrent ‘blips’ in our data.”

Stephen Scherer also corresponded with us by email about the study and the mutual decision to collaborate with the Lee lab. “We were both fresh enough to look beyond what others were calling ‘noise’ to realize these aberrations represented intermediate and gene-level copy number variation.”

“Many of us suspected it was there,” he said of the large-scale variation they uncovered, “based on the fact there were lots of smaller indels and that 0.6% of the population carried cytogenetic alterations. We kind of predicted it in our chromosome 7 mapping and sequence paper, but only at the chromosomal level.”

Circles to the right of each chromosome ideogram show the number of individuals with copy gains (blue) and losses (red) for each clone among 39 unrelated, healthy control individuals. Green circles to the left indicate known genome sequence gaps within 100 kb of the clone, or segmental duplications known to overlap the clone, as compared to the Human Recent Segmental Duplication Browser. Cytogenetic band positions are shown to the left. {credit}Fig. 1 from Iafrate et al. 2004{/credit}

The study by Iafrate et al. was published on August 1, 2004. Exactly one week prior, a very similar study by Michael Wigler and colleagues (Sebat et al.) was published in Science. The two groups used different methods, but their findings and implications were consistent with each other. “Charles and I were happy to see the Wigler paper,” said Prof. Scherer, “because nobody believed our results.”

Prof. Lee added, “This was one of the most difficult papers for me to publish. The reviewers were very skeptical. We had to keep providing more and more validation data, and one of the reviewers even commented that s/he did not believe that the paper was worthy of being an article, and we had to shorten the paper into a Brief Communication. In the end, Reviewer #2, who was persistently negative, wrote: ‘… I still feel hesitant about publication of this work in Nature Genetics… and I still doubt the importance and novelty of their work.’”

Prof. Scherer remembers similar levels of skepticism in the community. “Prior to publication I was showing the data at talks, including one at Michigan where they were trying to recruit me, and I remember getting trashed. People in my own department were mostly the same.”

[I looked up the referee reports and internal notes from the review process, and Prof. Lee is correct that at least one of the reviewers was very skeptical about the impact of the study. However, I do want to note the very unusual fact, at least by today’s standards, that the study was published a little more than 2 months after initial submission, according to our records. I wish this were more common!]

After publication, however, the importance of the studies was immediately clear, at least to those working most closely in the field. Nigel Carter contributed a News and Views article in Nature Genetics about the studies. He wrote, “This unexpected level of LCV [large-scale copy-number variation] forces us to re-evaluate our view of the structure of the normal human genome.”

However, Prof. Lee remembers some ongoing skepticism about the work. “For more than 18 months after the paper was published, I had trouble getting grant funding for continuing my work in human copy number variation. Some comments that I received included, ‘If this was real, the Human Genome Project would have found it.’ I am embarrassed to say that I was forced to write for smaller grants on other topics and when funded, did everything I could to complete the projects using less money and use the ‘extra’ funds for my human copy number variation interests. It was very, very frustrating.”

In 2007, Science announced Human Genetic Variation as the Breakthrough of the Year. “When I saw this article in Science,” Prof. Lee said, “I felt like there was finally some widespread acceptance of our findings in the general scientific community.”

“However, this came with different issues.” For example, he often received the response from the GWAS community that structural variation is interesting, but it is too difficult to incorporate into GWAS. “So, most association studies continued to focus on SNPs, which is a problem that persists to this very day.”

The findings in Iafrate et al. were based on what is, by today’s standards, a fairly small sample of 55 individuals, profiled by array comparative genomic hybridization (array CGH) on an array covering ~12% of the genome (the study in Science reported results from 20 individuals using representational oligonucleotide microarray analysis). However, the impact on the field was anything but small. Part of the studies’ legacy was the establishment of the Database of Genomic Variants (originally the Genome Variation Database), which has now collected over 550,000 CNVs. The discovery that so many structural variants are present in our genomes, even in healthy individuals, opened up an entire field of study aimed at understanding the function of these variants, and much remains to be discovered (see, for example, a recent study on the impact of structural variation on human gene expression).

Prof. Scherer summed up the impact of the studies this way: “If you remember the fights between the public Human Genome Project and Celera Genomics, and them finger-pointing to the errors in each other’s assemblies, in many cases these were due to CNV and other structural variations. They had no idea these CNV variants existed. It was really the 2004 Nature Genetics and Science papers, coincident, pure discovery, that opened the eyes of the community and it took some longer than others to believe it.”

The genetic syntax of febrile seizures

The genetics of seizure disorders, including epilepsy, has recently come into the spotlight (see the Nature Outlook on epilepsy). Epilepsy is a complex disease with many different subtypes, both sporadic and familial. While epilepsy is one of the most common neurological disorders, and it has been studied for a very long time, the underlying mechanisms of seizure disorders remain largely elusive. Identifying the genetic causes of different subtypes of the disorder can help to illuminate the gene networks involved and lead to a deeper understanding overall. Importantly, the genetic tools now exist to identify causal mutations for the many different subtypes of seizure disorders.

Febrile seizures, which are induced by fever, affect approximately 2–4% of children worldwide. This type of seizure is often triggered by infectious disease, but there is strong evidence that it has a genetic basis. A paper recently published in Nature Genetics by Bjarke Feenstra and colleagues identified two genes associated with vaccine-induced febrile seizures (vaccines, such as MMR, are an extremely rare cause of febrile seizure).

Protein model for STX1B{credit}Wikipedia{/credit}

Now, a study by Holger Lerche, Camila Esguerra and colleagues identifies variants in the gene STX1B as causing a familial form of febrile seizure disorder. STX1B encodes a protein called syntaxin-1B. Syntaxin-1 is a key component of the SNARE complex, a protein complex necessary for the release of neurotransmitters from the presynaptic membrane.

The authors first identified two families in Germany with a history of febrile seizures. They used a combination of whole-exome and whole-genome sequencing to identify the gene most likely to harbor pathogenic mutations causing the disorder. Targeted sequencing in an extended cohort identified further variants in STX1B in patients who had experienced febrile seizures.

To validate these findings, the authors tested the function of stx1b in zebrafish, and showed that a reduction in syntaxin-1B led to behavioral defects in the fish, such as lack of touch response, fin fluttering and jerking movements. Recordings of brain activity confirmed that the fish were experiencing epilepsy-like symptoms. You can read a more in-depth summary of the paper in a blog post at Beyond the Ion Channel by one of the study’s co-authors. 

We asked one of the study’s senior authors, Holger Lerche, to tell us a little more about the background of this study:

How did you initially become interested in studying seizure disorders?

During my thesis, I was working with mutated ion channels in rare muscle diseases. When I started my neurology training, epilepsy emerged as a highly interesting topic in that field as well, and I also became very interested in epilepsy clinically.

How did the two families in this study first come to your attention?

The index case of the first family was referred to me through a cooperation with the Children’s Hospital (at that time at the University of Ulm), when I was looking for familial cases of epilepsy for genetic studies. When I called his grandmother, it turned out to be a large pedigree, which grew further as we contacted and visited the different branches of the family. The second family was referred to my colleague Yvonne Weber for similar reasons by another Children’s Hospital in Germany.

STX1B mutations have been associated with other forms of epilepsy. How does the association with febrile seizures further the understanding of this gene’s function?

The function of this gene has already been explored very well by Nobel Laureate Thomas Südhof and his group. The mutations we detected may teach us more about the functional role of different protein domains and their interaction with other proteins in the vesicle release machinery. It is not surprising that mutations in STX1B cause epilepsy, but how febrile seizures develop is still an enigma. Follow-up studies of our discovery may shed light on the unknown temperature-sensitive mechanisms leading to febrile seizures.

Do you think there is the potential for developing drugs targeting STX1B in these patients?

The question is how the loss of function of one allele of STX1B could be compensated. Whether it is possible to target STX1B to enhance its production or activity, and whether this would help these patients, is difficult to predict. However, the zebrafish model can also help us to find therapies that work in a completely different way to compensate for STX1B failure (see answer to next question).

Can you say a little about why you chose zebrafish as a model, and what you learned from this model organism that you wouldn’t have been able to learn otherwise?

We only recently started to collaborate with Camila Esguerra and Alex Crawford, who have the zebrafish facilities and expertise. The zebrafish is a vertebrate that is easy to study and very quick to manipulate (much quicker and easier than mice).

Behavioral assays (left) and electrographic recordings of zebrafish brain (right){credit}Courtesy of Alex Crawford and Camila Esguerra{/credit}

Establishing a cellular model for functional proof of these mutations would have been more difficult in our case. And because the zebrafish is an in vivo model, we can study behavior and EEG, which is not possible in a cellular assay. The temperature effect could also be studied very nicely through its effect on the EEG in an in vivo system. Last but not least, and most important when thinking of the impact of our work: zebrafish models can be used to find new drugs in medium- to high-throughput screens using seizure-like behavior or EEG as read-outs. This allows us to find different kinds of drugs that are able to antagonize the consequences of the STX1B defect on a system-wide level.

Read the full study by Lerche and colleagues here. You can also read more about this work here [press release]. 

Discovery of a gene for heart and gut rhythms

What do your heart and gut have in common? More than you might think. A new study by Gregor Andelfinger and colleagues has found that a single gene, SGOL1 (Shugoshin-like 1), is required for the normal rhythms of both the heart and the intestine.

The study’s authors identified 17 patients with dysrhythmias of both the heart and the intestine, termed sick sinus syndrome (SSS) and chronic intestinal pseudo-obstruction (CIPO), respectively. SSS is a type of cardiac arrhythmia. Though very rare in children and young adults, it is more common in the elderly and generally requires implantation of a pacemaker. CIPO occurs when the intestines stop their usual rhythmic contractions, so that food can no longer pass through the digestive tract on its own. Both conditions are extremely rare as inherited disorders, so finding both disorders together in these 17 patients was a truly remarkable discovery.

All affected patients in the study shared the same homozygous variant, which changes a lysine to a glutamic acid at a conserved residue. The authors named the new syndrome Chronic Atrial and Intestinal Dysrhythmia (CAID).

We asked one of the study’s lead authors, Gregor Andelfinger at Sainte-Justine University Hospital Research Center in Montreal, to tell us a little more about the work:

How did you become involved in studying CAID?

Map of Canada (New France) in North America 1703{credit}Wikipedia{/credit}

We have an excellent collaboration across our provincial biobank for congenital heart disease in Québec, and we exchange regularly among colleagues. We now have more than 3,000 deeply phenotyped participants in our biobank (both affected and unaffected family members), and when my colleagues told me about an unusual co-occurrence of SSS and CIPO in a couple of cases, we quickly fanned out and a side project suddenly took center stage in the lab. We were surprised at how many patients we found in a relatively short time for a previously undescribed disease. Obviously, we would be very eager to learn from other groups whether they have encountered similar rare patients, and would love to cooperate! Let’s not forget that this type of research always has a human face, and this is what motivates our group in the first place.

What would you say was the most unexpected aspect of this research? 

Everything in this project was unexpected! On the clinical side, the emergence of a generalized automaticity disorder in humans was totally unanticipated. On the molecular side, one of the biggest surprises certainly was how wrong we all were in our thoughts on what the causal gene could be. Virtually all members of the lab placed their bets on ion channels, a priori the most likely suspects. As you know, we were all proven wrong and had to go back and rethink how this disease arises. We were again surprised at how a completely new picture emerged when we finally put all the pieces of the puzzle together, from genetics, populations and cell biology to disease.

How does the finding of SGOL1 mutations in these rare cases help inform the biology of CIPO and SSS more generally?

When doing my literature search, I was very surprised to find that one of the discoverers of the sinus node [the heart’s pacemaker tissue], Arthur Keith, had already drawn parallels between cardiac and gut pacemaking in an article in 1915 [PDF]. The recent literature suggests a role for TGF-β signaling as a driver of fibrosis in channelopathies and arrhythmias, and this could very well be an important pathway through which progressive destruction of pacemaking tissues takes place (for example, see papers here and here). Remember that we can clearly show that all patients in our series were normal at birth and developed disease only at later stages. On the other hand, we also have evidence that some ‘developmental anomalies’ are present in CAID patients, since the malformed gut pacemaking system was probably present from birth, with initially normal function. I think that we are dealing with an overlap of developmental and acquired phenotypes, and that a similar process takes place in isolated SSS and CIPO, even though we could not detect SGOL1 mutations in the isolated forms of disease. Beyond this, I think the monogenic nature of the CAID phenotype tells us that all pacemaker cells need the cohesin complex. I would not be surprised if we found at least two non-canonical roles for SGOL1 in the future, one driving the developmental and the other driving the acquired part of the disease, and that these disease pathways are at least partially shared with isolated SSS and CIPO. ‘Shugoshin’ means ‘guardian spirit’ in Japanese, so this is a very apt name for functions of this gene beyond its known role in protecting sister chromatids.

What do cardiac and intestinal pacemakers have in common, and what could make them particularly vulnerable to mutations in a cohesin complex member?

First, they are both relatively small organs. An adult sinus node measures approximately 15 × 5 × 1.5 mm and probably contains no more than 50,000 cells. Second, both organs are non-uniform and comprise different cellular subtypes, and third, they have to be in a very particular place to perform their function efficiently. Fourth, and very importantly, cells in both organs are capable of automaticity. What could the cohesin complex have to do with these commonalities of different pacemakers in the human body? Given the known functions of cohesin, in particular in cell division, I speculate that a defect could directly influence how many cells are available to form a given organ. However, apart from the smaller myenteric plexuses we found in CAID patients, we do not have direct experimental evidence for this. Of course, such a defect could also affect subpopulations within these organs, the second organ property I alluded to above. Ageing and loss of cells over time may also come into play in this intricate balance.

I am at a loss to come up with a valid hypothesis for how dysfunction of the cohesin complex would lead to the misplaced myenteric plexuses we found in CAID patients. As far as the fourth commonality between cardiac and intestinal pacemakers is concerned, we know that automaticity is mainly generated by spontaneous depolarizations. The channels mainly responsible for this phenomenon are HCN channels and SCN5A, but calcium transients also participate. Given that cohesin plays an important role in transcriptional regulation, it is conceivable that some target genes are not correctly expressed, in time, space or quantity, when SGOL1 is mutated. Several recent studies on cohesinopathies point out that higher-order chromatin architecture has to be tightly regulated for normal gene expression, and I speculate that SGOL1 dysfunction could lead to problems with ion channel expression and thus be one of the key reasons why we see this exquisite target-organ specificity.

Can you say a little about the FORGE Canada consortium and how your research relates to its mission?

FORGE Canada (Finding of Rare Disease Genes) was launched on April 1, 2011, and brought together clinicians from all 21 Clinical Genetics Centres, representing every province, as well as clinicians from 17 countries. From nation-wide requests for proposals, 264 disorders were selected for study out of the 371 submitted; over a 2-year period, disease-causing variants were identified for 146 disorders, including variants in 67 genes not previously associated with human disease (41 of these have been genetically or functionally validated, and 26 are currently under study). The outcome of this project was recently published in an article in AJHG. The project has a successor, Care4Rare, a pan-Canadian collaborative team building on the infrastructure and discoveries of FORGE Canada. The goal of Care4Rare is to improve clinical care for patients and families affected by rare diseases. I think the great success of these projects also stems from their openness to collaborators like our group; this is the way it should be, and since my lab is working on several rare disease traits, we have benefited greatly from their help.


You can read the full paper here on the Nature Genetics website.