When 23andMe offered a few select clients the opportunity to have the protein-encoding portion of their genome sequenced, Gabe Rudy jumped at the chance. On Wednesday, he walked strangers through the results. His conclusion: most detected genetic “variants of interest” are either not variants or not interesting. “Clinics beware,” he writes in a blog post detailing the analysis.
The standard service offered by 23andMe (based in Mountain View, California) does not sequence people’s DNA but instead probes for common variants, then lists these variants with an analysis of health, ancestry and other information, such as whether you carry a variant more often found in people who find that cilantro tastes soapy.
The exome sequence contained no such information, says Rudy; it was simply a list of ‘variant calls’ or differences that had been found between the sequenced individual and the reference genome. There are several research software pipelines available to call variants. 23andMe used what is probably the most popular one, which is available from the Broad Institute in Cambridge, Massachusetts.
An executive at DNA analysis company Golden Helix, Rudy was much better prepared than most to tackle this list.
He took the files he received for himself (as well as for his wife and son) and poured them into his own company’s software: the SNP and Variation Suite (SVS) and a freely available visualization and inspection tool called GenomeBrowse. Next, he began to assess the evidence behind his 151,000 variant calls and put them in their biological context.
Whereas whole-genome sequences cover all the DNA on all the chromosomes, exomes focus on the 2% or so of the genome that contains genes. Exome sequencing aims to provide data for all protein-encoding genes, but only about three-quarters of those regions are profiled with enough accuracy for variants to be called confidently in a “research grade” (30×) exome. Even with “clinical grade” exomes, in which each DNA fragment is sampled 80 times or more, 5–15% of variants will still go uncalled. And those ‘low-coverage’ regions vary with each exome. As a result, some of the variants called in Rudy’s exome fell in regions that were poorly covered in his wife’s and son’s exomes, so he couldn’t compare them across the family.
The raw data contained about 151,000 variants, but that number dropped to 80,000 when he pulled out problematic variants. These were generally variants with too few reads (meaning not enough data had been collected for certainty) or variants with far too many reads, indicating that they came from highly duplicated regions and so could not be reliably tied to particular genes or even chromosomes.
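The read-depth filter described above can be sketched in a few lines of Python. This is only an illustration: the article does not give Rudy's cut-offs, so `MIN_DEPTH` and `MAX_DEPTH` are assumed values, and the `VariantCall` record is a hypothetical stand-in for whatever a real variant file provides.

```python
from dataclasses import dataclass

# Assumed thresholds for illustration only; the article does not state the
# cut-offs Rudy actually used.
MIN_DEPTH = 10    # fewer reads than this: not enough data for certainty
MAX_DEPTH = 500   # more reads than this: probably a highly duplicated region


@dataclass
class VariantCall:
    """Hypothetical minimal record for one variant call."""
    chrom: str
    pos: int
    depth: int  # number of sequencing reads covering the site


def has_trustworthy_depth(call, min_depth=MIN_DEPTH, max_depth=MAX_DEPTH):
    """Return True if the call has neither too few nor far too many reads."""
    return min_depth <= call.depth <= max_depth


def filter_by_depth(calls, min_depth=MIN_DEPTH, max_depth=MAX_DEPTH):
    """Keep only the calls whose read depth falls inside the trusted range."""
    return [c for c in calls if has_trustworthy_depth(c, min_depth, max_depth)]


if __name__ == "__main__":
    calls = [
        VariantCall("1", 1000, 3),     # too few reads
        VariantCall("2", 2000, 45),    # reasonable coverage
        VariantCall("3", 3000, 2500),  # suspiciously deep coverage
    ]
    for call in filter_by_depth(calls):
        print(call)
```

In practice the upper limit would probably be set relative to the average coverage of the exome rather than as a fixed number.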
Then he excluded variants that were quite common. (Not only are these unlikely to have catastrophic impacts on health, but about 17,000 are already interrogated by 23andMe’s publicly available, more interpretive services designed to detect common variation.) And then he excluded variants not expected to change the protein sequence, such as variants in introns (regions within genes that do not encode proteins) and ‘synonymous’ variants, which change the DNA sequence but still encode the same protein sequence. (Such variants might have an effect on cells, but we don’t know enough to figure out what it is.)
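As a rough sketch of these two exclusions, the snippet below keeps a variant only if it is rare and predicted to alter the protein. Everything specific here is an assumption made for illustration: the 1% frequency cut-off, the dictionary keys and the consequence labels are not taken from Rudy's analysis or from any particular annotation tool.

```python
# Assumed cut-off: treat anything seen in 1% or more of people as "common".
MAX_POPULATION_FREQUENCY = 0.01

# Assumed labels for consequences that change the protein sequence.
# Intronic and synonymous variants are deliberately absent from this set.
PROTEIN_CHANGING = {"missense", "nonsense", "frameshift", "splice_site"}


def is_rare_and_protein_changing(variant):
    """Return True if a variant survives both exclusions.

    `variant` is a hypothetical dict with two keys:
      - "population_frequency": a float, or None if no catalogue lists it
      - "consequence": a label such as "missense" or "intronic"
    """
    freq = variant.get("population_frequency")
    is_rare = freq is None or freq < MAX_POPULATION_FREQUENCY
    return is_rare and variant.get("consequence") in PROTEIN_CHANGING


if __name__ == "__main__":
    variants = [
        {"population_frequency": 0.30, "consequence": "missense"},     # common
        {"population_frequency": 0.001, "consequence": "synonymous"},  # silent
        {"population_frequency": 0.001, "consequence": "intronic"},    # intron
        {"population_frequency": None, "consequence": "frameshift"},   # kept
    ]
    kept = [v for v in variants if is_rare_and_protein_changing(v)]
    print(len(kept), "of", len(variants), "variants kept")
```

Treating a variant that appears in no catalogue as rare is itself a judgment call, since an uncatalogued variant may simply be a common one that has not yet been recorded.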
That left him with nearly 1,700 variants that were likely to affect the protein. Of these, about 197 were predicted to be of the ‘loss-of-function’ type, meaning that the protein would either not be made at all or would be made in a radically different way.
With this list in hand, he turned to a database called OMIM (Online Mendelian Inheritance in Man), which catalogues genes associated with diseases. That left him with 40 variants. Then he focused on genes in which both copies contained a variant, because most genetic diseases require problems in both copies of a gene to have a real impact. Now his list had been whittled down to 16, a mere 0.01% of the total.
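The arithmetic behind that figure is easy to check. The short script below uses only the counts reported in this post (with "nearly 1,700" rounded to 1,700) and prints each stage of the funnel as a share of the raw calls; the stage labels are paraphrases, and stages for which the post gives no count are left out.

```python
# Counts as reported in the post; "nearly 1,700" is rounded to 1,700.
FUNNEL = [
    ("raw variant calls", 151_000),
    ("after removing problematic read depths", 80_000),
    ("rare and likely to affect the protein", 1_700),
    ("in genes catalogued by OMIM", 40),
    ("variant in both copies of the gene", 16),
]

RAW_TOTAL = FUNNEL[0][1]


def percent_of_raw(count, raw=RAW_TOTAL):
    """Express a variant count as a percentage of the raw variant calls."""
    return 100.0 * count / raw


if __name__ == "__main__":
    for label, count in FUNNEL:
        print(f"{label:<42} {count:>8,} {percent_of_raw(count):8.2f}%")
```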
Now was the time to re-assess the evidence that each variant existed. For some variants, no reads at all supported the variant call; those bad calls came down to a bug in the variant-calling software. All in all, five variants turned out to be bad calls when the evidence was examined. Four variants were actually common, but in a way that was hard to detect. And three were also found in his wife, a healthy, unrelated individual. Those three variants are probably common and will show up in population catalogues, such as the 1000 Genomes data, as catalogues improve.
That left three variants that seemed to be real and also seemed to be in genes. Rudy has no idea what two of them mean, but one variant has been classified as pathogenic. Diseases associated with this gene include a fatal neonatal form and a late-onset form linked to a toxic build-up of metabolites from dietary protein.
To be sure, the variants that were whittled away hold medically relevant clues, but scientists have yet to piece them together. And Rudy’s detective work with the 16 variants he deemed most interesting showed how quickly intriguing signals can evaporate.
Although very similar approaches have helped to find the causes of rare genetic diseases, we are still some way from helping healthy people routinely make medical sense of their genome, says Rudy. What a clinician needs to advise patients is very different from what the most-available tools offer. Finding and assessing variants is “still largely a research field, and tools are built by researchers who are trying to push the limits of their data,” he warns.
But even if few people are able to make sense of their genome sequences, many more people will soon be able to obtain them, whether or not 23andMe rolls out its exome service to more people. Late last week, a company called Gene By Gene announced a direct-to-consumer sequencing service. It will sequence exomes for US$695 and whole genomes for $5,495.
The company will explicitly not provide any interpretation services, says president Bennett Greenspan. The reasoning is that providing medical advice could run afoul of the US Food and Drug Administration’s regulations. Although genomicists say that few physicians know how to deal with genetic information, Greenspan bets that there are some hospitals and physicians without access to sequencing who are willing to give it a try. “We’re not going to be the guys who will turn up our nose because a doctor wants to look at something and has three samples.”
But if experience is any guide, the quest to figure out what all the letters mean will be more difficult than finding the letters in the first place.
[Note: This post has been edited to clarify definition of research-grade exomes and low-coverage regions. It also replaces a generic image with one from Rudy’s analysis.]