From the archives (1995): Guidelines for interpreting and reporting linkage results

In 1995, Nature Genetics published a report by Eric Lander and Leonid Kruglyak, recommending clear statistical guidelines for reporting linkage results for complex traits. The paper had an immediate impact, setting the bar for what could or could not be called “significant” in the literature. Although originally focused on human genetic linkage studies, the guidelines set forth by Lander & Kruglyak influenced fields from model organism genetics to plant genetics, and eventually genome-wide association studies (GWAS).

The mid-1990s was a very exciting time in genetics. The human genome project had recently been announced and advances like microsatellite linkage maps of the human genome and multiplex sequencing technology were now available. Mapping genes underlying complex phenotypes was now a real possibility, and human geneticists were busy prospecting for genetic gold. However, as Lander & Kruglyak cautioned in their paper, the lack of clear guidelines could foster a spate of false-positive reports that would, if left unchecked, discredit the nascent field (for example, see this 1993 paper in Nature Genetics finding no evidence for a previously reported linkage region for manic depressive illness).

On the other hand, setting too high a bar for reporting significance would mean missing many true signals where they exist, an equally dangerous proposition for a new field. As explained in the paper, “striking the right balance requires both a mathematical understanding of how positive results will occur just by chance and a value judgment about the relative costs of false positives and false negatives.” The paper then outlines the mathematical and statistical arguments in favor of the standards we now all know and love.


{credit}Lander & Kruglyak, Nature Genetics 1995{/credit}

I spoke with Leonid Kruglyak, co-author of this landmark paper, to get a sense of the context in which this paper came about, and the impact it had on the field at the time of publication. He first explained that it was finally possible to conduct genome-wide linkage studies with hundreds of individuals, allowing linkage mapping methods to be applied to complex traits (for example, this genome-wide screen for schizophrenia susceptibility genes published in the same issue). However, unlike Mendelian genes, there was no clue as to “how many signals there should be, or what their expected sizes were.” Thus, the need for a statistical framework.

The journal recognized this need as well. As Prof Kruglyak recalls, Kevin Davies (founding editor of Nature Genetics) originally commissioned this work as a News & Views article, but it then evolved into a more extensive piece as its implications became clear. However, as he remembers, there was still a very strict deadline for the paper as it had to make the next issue (and these were still the days of hard-copy submissions). At the time, Prof Kruglyak was a young postdoc, so it fell to him to rush to the main FedEx office in downtown Boston before closing time, to make sure the manuscript got to the printer on time.

Prior to submitting the final text, Lander & Kruglyak produced some of the “original preprints”, sending a copy of the paper by snail mail or email to “everyone we knew in statistical genetics”, for comments and suggestions. After all, these guidelines would affect quite a lot of people and “signals that people would like to be results might not be real results anymore”.


{credit}Curtis, Nature Genetics 1996{/credit}

Following publication, “the reactions came in essentially two flavors,” Prof Kruglyak recalls. There were those who thanked the authors, saying that someone really needed to do this. Others were less enthused. “They said, ‘you’re standing in the way of progress and making it harder to publish.’” In fact, Nature Genetics published two letters to the editor arguing that the proposed genome-wide significance threshold was too strict, or that at the very least additional discussion was warranted before these guidelines were adopted (see the letters here and here, and the authors’ reply here). Personally, I agree with the overall sentiment of Lander & Kruglyak as summed up in this portion of their reply: “The correspondents (all trained statisticians) argue that there is no need for guidelines because everyone should be able to interpret the genomewide significance of pointwise P values on their own. In our view, this is naïve. Most geneticists are not statisticians, and rules of thumb can be extremely helpful in promoting sensible discussion.”

The legacy of this paper is clear to anyone familiar with GWAS. “The GWAS community learned a lot from that whole experience [of false positive linkage reports],” says Prof Kruglyak. “There were many serious statistical geneticists involved [in the GWAS field] from the beginning, with a lot of carryover from the linkage era to the GWAS era.”

“Guidelines are not just ‘external gatekeepers’”, he noted. They are not just there to tell you what you can and can’t publish. “You know what they say, the easiest person to fool is yourself.” These guidelines were developed to help researchers understand their own findings better and decide which are worth following up. “You can often make up a plausible story, but how strong is the evidence?”

The genetic syntax of febrile seizures

The genetics of seizure disorders, including epilepsy, has recently come into the spotlight (see the Nature Outlook on epilepsy). Epilepsy is a complex disease with many different subtypes, both sporadic and familial. While epilepsy is one of the most common neurological disorders, and it has been studied for a very long time, the underlying mechanisms of seizure disorders remain largely elusive. Identifying the genetic causes of different subtypes of the disorder can help to illuminate the gene networks involved and lead to a deeper understanding overall. Importantly, the genetic tools now exist to identify causal mutations for the many different subtypes of seizure disorders.

Febrile seizures, which are induced by fever, affect approximately 2-4% of children worldwide. This type of epileptic seizure is often triggered by infectious disease, but there is strong evidence that it has a genetic basis. A paper recently published in Nature Genetics by Bjarke Feenstra identified two genes associated with vaccine-induced febrile seizures (vaccines, such as MMR, are an extremely rare cause of febrile seizure).


Protein model for STX1B{credit}Wikipedia{/credit}

Now, a study by Holger Lerche, Camila Esguerra and colleagues identifies variants in the gene STX1B as causing a familial form of febrile seizure disorder. STX1B encodes a protein called syntaxin-1B. Syntaxin-1 is a key component of a protein complex necessary for the release of neurotransmitters from the presynaptic membrane.

The authors first identified two families in Germany with a history of febrile seizures. They used a combination of whole-exome and whole-genome sequencing to identify the gene most likely to harbor pathogenic mutations causing the disorder. Targeted sequencing in an extended cohort identified further variants in STX1B in patients who had experienced febrile seizures.

To validate these findings, the authors tested the function of stx1b in zebrafish, and showed that a reduction in syntaxin-1B led to behavioral defects in the fish, such as lack of touch response, fin fluttering and jerking movements. Recordings of brain activity confirmed that the fish were experiencing epilepsy-like symptoms. You can read a more in-depth summary of the paper in a blog post at Beyond the Ion Channel by one of the study’s co-authors. 

We asked one of the study’s senior authors, Holger Lerche, to tell us a little more about the background of this study:

How did you initially become interested in studying seizure disorders?

I was working during my thesis with mutated ion channels in rare muscle diseases. When I started with my Neurology training, epilepsy emerged as a highly interesting topic in that field as well, and also clinically I became very interested in epilepsy.

How did the two families in this study first come to your attention?

The index case of the first family was referred to me during a cooperation with the Children’s Hospital (at that time at the University of Ulm), when I was looking for familial cases with epilepsy for genetic studies. When I called his grandmother, it turned out to be a large pedigree, which grew further as we contacted and visited the different branches of the family. The second family was referred to my colleague Yvonne Weber for similar reasons from another Children’s Hospital in Germany.

STX1B mutations have been associated with other forms of epilepsy. How does the association with febrile seizures further the understanding of this gene’s function?

The function of this gene has been explored very well already by Nobel Laureate Thomas Südhof and his group. The mutations we detected may teach us more about the functional role of different protein domains and their interaction with other proteins in the vesicle release machinery. It is not surprising that mutations in STX1B cause epilepsy, but how febrile seizures develop is still an enigma. Follow-up studies of our discovery may shed light on the unknown temperature-sensitive mechanisms leading to febrile seizures.

Do you think there is the potential for developing drugs targeting STX1B in these patients?

The question is how the loss of function of one allele of STX1B could be compensated. Whether targeting STX1B to enhance its production or activity is possible, and whether this would help these patients, is difficult to predict. However, the zebrafish model can also help us to find therapies which work in a completely different way to compensate for STX1B failure (see answer to next question).

Can you say a little about why you chose zebrafish as a model, and what you learned from this model organism that you wouldn’t have been able to learn otherwise?

We started only recently to collaborate with Camila Esguerra and Alex Crawford who have the zebrafish facilities and expertise. It is a vertebrate, easy to study and very quick to manipulate (much quicker and easier than mice).


Behavioral assays (left) and electrographic recordings of zebrafish brain (right){credit}Courtesy of Alex Crawford and Camila Esguerra{/credit}

To establish a cellular model for functional proof of these mutations would have been more difficult in our case. And the zebrafish is an in vivo model, so we can study behavior and EEG, which is not possible in a cellular assay. Also the temperature effect could be studied very nicely with an effect on EEG in an in vivo system. Last but not least, and most important when thinking of the impact of our work: zebrafish models can be used to find new drugs in medium to high throughput screens using seizure-like behaviour or EEG as read-outs. This allows us to find different kinds of drugs that are able to antagonize the consequences of the STX1B defect on a system-wide level.

Read the full study by Lerche and colleagues here. You can also read more about this work here [press release]. 

Discovery of a gene for heart and gut rhythms

What do your heart and gut have in common? More than you might think. A new study by Gregor Andelfinger and colleagues has found that a single gene, SGOL1 (Shugoshin-like 1), is required for the normal rhythms of both the heart and intestine.

The study’s co-authors found 17 patients with dysrhythmias of both the heart and intestine, termed sick sinus syndrome (SSS) and chronic intestinal pseudo-obstruction (CIPO), respectively. SSS is a type of cardiac arrhythmia. Though it’s very rare in children or young adults, it is more common in the elderly and generally requires the patient to have a pacemaker implanted. CIPO occurs when the intestines stop their usual rhythmic pulses, and food can no longer pass through the digestive tract on its own. Both conditions are extremely rare as inherited disorders, so finding both disorders in these 17 patients was a truly remarkable discovery.

All affected patients in the study shared the same homozygous variant, which resulted in changing a lysine to a glutamic acid at a conserved residue. The new syndrome was named Chronic Atrial and Intestinal Dysrhythmia (CAID).

We asked one of the study’s lead authors, Gregor Andelfinger at Sainte-Justine University Hospital Research Center in Montreal, to tell us a little more about the work:

How did you become involved in studying CAID?


Map of Canada (New France) in North America 1703{credit}Wikipedia{/credit}

We have an excellent collaboration across our provincial biobank for congenital heart disease in Québec and exchange regularly among colleagues. We now have more than 3,000 deeply phenotyped participants in our biobank—both affected and unaffected family members—and when my colleagues told me about an unusual co-occurrence of SSS and CIPO in a couple of cases, we quickly fanned out and a side project suddenly got to center stage in the lab. We were surprised to see how many patients we found in relatively short time for a previously undescribed disease. Obviously, we would be very eager to learn from other groups whether they have encountered similar rare patients, and would love to cooperate! Let’s not forget that this type of research always has a human face, and this is what motivates our group in the first place.

What would you say was the most unexpected aspect of this research? 

Everything in this project was unexpected! On the clinical side, the emergence of a generalized automaticity disorder in humans was totally unanticipated. On the molecular side, one of the biggest surprises certainly was how wrong we all were with our thoughts on what could be the causal gene. Virtually all members in the lab placed their bets on ion channels, a priori the most likely suspects. As you know, we were all proven wrong and had to go back to rethink how this disease arises. We were again surprised how a completely new picture emerged when we finally put all the pieces of the puzzle together—from genetics, populations and cell biology to disease.

How does the finding of SGOL1 mutations in these rare cases help inform the biology of CIPO and SSS more generally?

When doing my literature search, I was very surprised that one of the discoverers of the sinus node [the heart’s pacemaker tissue], Arthur Keith, had already drawn parallels between cardiac and gut pacemaking in an article in 1915 [PDF]. The recent literature suggests a role for TGF-β signaling as a driver for fibrosis in channelopathies and arrhythmias, and obviously this could very well be an important pathway through which a progressive destruction of pacemaking tissues takes place (for example, see papers here and here). Remember that we can clearly show that all patients in our series were normal at birth and developed disease only at later stages. On the other hand, we also have evidence that some ‘developmental anomalies’ are present in CAID patients, since the malformed gut pacemaking system probably was present from birth on, with initially normal function. I think that we are dealing with an overlap of developmental and acquired phenotypes, and that a similar process takes place in isolated SSS and CIPO, even if we could not detect SGOL1 mutations in the isolated forms of disease. Beyond this, I think the monogenic nature of the CAID phenotype tells us that all pacemaker cells need the cohesin complex. I would not be surprised if we found at least two non-canonical roles for SGOL1 in the future, one driving the developmental, and the other one driving the acquired part of disease, and that these disease pathways are at least partially shared in isolated SSS and CIPO. ‘Shugoshin’ means ‘guardian spirit’ in Japanese, so this is a very apt name for functions of this gene beyond its known function of protecting sister chromatids.

What do cardiac and intestinal pacemakers have in common, and what could make them particularly vulnerable to mutations in a cohesin complex member?

First, they are both relatively small organs. An adult sinus node is approximately 15 x 5 x 1.5 mm in size, probably not more than 50,000 cells. Second, both organs are non-uniform and comprise different cellular subtypes, and third, they have to be in a very particular place to efficiently perform their function. Fourth, and very importantly, cells in both organs are capable of automaticity. What could the cohesin complex have to do with these commonalities of different pacemakers in the human body? For the known functions of cohesin, in particular cell division, I speculate that a defect could directly influence how many cells will be available to form a certain organ. However, apart from the smaller myenteric plexuses we found in CAID patients, we do not have direct experimental evidence for this. Of course, this could also affect subpopulations within these organs, the second organ property I alluded to above. Ageing and loss of cells over time may also come into play in this intricate balance.

I am at a loss to come up with a valid hypothesis how a dysfunction of the cohesin complex would lead to the misplaced myenteric plexuses we found in CAID patients. As far as the fourth commonality between cardiac and intestinal pacemakers is concerned, we know that automaticity is mainly generated due to spontaneous depolarizations. The channels responsible for this phenomenon are mainly the HCN-channels and SCN5A, but calcium transients also participate in this. Given that cohesin plays an important role in transcriptional regulation, it is conceivable that some target genes are not correctly expressed when SGOL1 is mutated, either in time, space or quantity. Several recent studies on cohesinopathies point out that higher-order chromatin architecture organization has to be tightly regulated for normal gene expression, and I speculate that a dysfunction of SGOL1 could lead to problems with ion channel expression and thus be one of the key factors why we see this exquisite target organ specificity.

Can you say a little about the FORGE Canada consortium and how your research relates to its mission?

FORGE Canada (Finding of Rare Disease Genes) was launched on April 1, 2011 and brought together clinicians from all 21 Clinical Genetics Centres representing every province, as well as clinicians from 17 countries. From nation-wide requests for proposals, 264 disorders were selected for study from the 371 submitted. Over a 2-year period, disease-causing variants were identified for 146 disorders, including variants in 67 genes not previously associated with human disease; 41 of these have been genetically or functionally validated, and 26 are currently under study. The outcome of this project was recently published in an article in AJHG. This project has a successor, Care4Rare, which is a pan-Canadian collaborative team building upon the infrastructure and discoveries of FORGE Canada. The goal of CARE for RARE is to improve clinical care for patients and families affected by rare diseases. I think the great success of these projects also stems from their openness to collaborators like our group – this is the way it should be, and since my lab is working on several rare disease traits, we have benefited greatly from their help.

 

You can read the full paper here on the Nature Genetics website.