Tiny bubbles

With at least the provisional success of genome-wide association studies to identify common disease-related variants now apparent, and highly significant P values floating up out of figures looking like nothing so much as a series of champagne flutes full to the brim, it’s interesting to take a look back at a paper that arguably provided the key impetus for the field. In their 1996 Science paper “The Future of Genetic Studies of Complex Human Diseases” (cited 2,231 times as of this writing), Neil Risch and Kathleen Merikangas asked:

“Has the genetic study of complex disorders reached its limits?”

Remarkably, this was the key question only 11 years ago. Of course, Risch & Merikangas were referring to the specific question of whether linkage studies would be adequate to detect variants of modest effect in the realm of complex disease. In his address upon receiving the Curt Stern award at the 2004 meeting of the American Society of Human Genetics, Risch described how the article came about:

“One colleague I worked with extensively, both in teaching and research, was Kathleen Merikangas, a psychiatric epidemiologist with interests in genetics. We spoke frequently about the state of the field of genetic epidemiology and where it was and should be going. We continued these discussions even after my move to Stanford in 1995. We began to develop an awareness that the linkage approach, although having some modest success in complex diseases, was unlikely to identify the large majority of genes. We were influenced by a news item that appeared in Science on July 14, 1995, entitled “Epidemiology Faces Its Limits”….Although the article did not discuss human genetics or genetic epidemiology, we realized that many of the comments could apply to the developing situation in human genetics as well”.

Interestingly, the author of that Science news article, Gary Taubes, is back in The New York Times Magazine with another piece pouring cold water on the field of epidemiology.

In any case, after penning a draft entitled “Human Genetics Facing Its Limits”, they realized that a more optimistic slant would be required, preferably one that offered an alternative approach. Again, from the Stern address:

“If we could have any tool to use for mapping disease genes, we wondered what would it be? Again, on the basis of my experience with HLA-associated diseases and my knowledge about disease associations with other blood-group systems, I knew that many of these associations, although highly significant statistically, would not produce substantial or robust linkage signals. Therefore, why not reverse the process of positional cloning? Instead of searching randomly through the genome by location, why not start with genetic variants and test them directly as candidates? The problem with candidate-gene association studies had been the limited number of candidates and, therefore, the low prior probability of a ‘hit’. But what if we could compile a list of all polymorphisms in the human genome?”

You know the rest. What’s remarkable about the Risch & Merikangas paper, beyond the power calculations showing the relative gain in power for association studies as opposed to linkage, is the authors’ prescience in outlining the key issues. They noted the stringent genome-wide significance level that would be required for testing on the order of 1 million variants, while also pointing out the likelihood that linkage disequilibrium would allow this number to be reduced substantially. They also implored investigators to preserve all of their samples for future large-scale testing, and it could be argued that the collection of samples is now the rate-limiting step in association studies. Concluding, they wrote:

“Thus, the primary limitation of genome-wide association tests is not a statistical one but a technological one. A large number of genes (up to 100,000) and polymorphisms…must first be identified, and an extremely large number of polymorphisms will need to be tested”.

And finally:

“The human genome project can have more than one reward. In addition to sequencing the entire human genome, it can lead to identification of polymorphisms for all the genes in the human genome and the diseases to which they contribute”.

This is a reminder to those of us who, in the wake of so many robust associations, assumed this had been seen as the reward from the very beginning. It wasn’t always obvious, it seems.

Cover puzzle

Anthony Edwards has produced an elegant representation of the genetic code in all its degenerate complexity for this month’s cover, explained in Touching Base. Now you have a chance to use his device to solve a puzzle. Please post your solutions to the Nature Precedings website, or send them to me and I’ll add them to this blog.

Problem: Find a ‘Gray code’ order for the codons.

When the numbers 0 to 2^n − 1 are written as n-digit binary numbers in their natural order, the number of digits (0 or 1) that change on proceeding from one number to the next varies. There exist, however, orderings in which only a single digit changes each time, and in which the last number also differs from the first in only a single digit. These are known as Gray codes, the numbers forming a complete cycle.
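
For readers who would like to see one such ordering, here is a minimal sketch in Python of the standard binary-reflected construction. The function names are my own, and this is only one of many possible Gray codes for a given n.

```python
def binary_gray_cycle(n):
    """Return the 2**n n-digit binary strings in binary-reflected Gray order."""
    # i ^ (i >> 1) maps the ordinary binary number i to its Gray-code counterpart.
    return [format(i ^ (i >> 1), f"0{n}b") for i in range(2 ** n)]


def digits_changed(a, b):
    """Count the positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


cycle = binary_gray_cycle(3)
print(cycle)
# Check every step, including the wrap-around from the last entry to the first.
print(all(digits_changed(cycle[i], cycle[(i + 1) % len(cycle)]) == 1
          for i in range(len(cycle))))
```

For n = 3 the ordering is 000, 001, 011, 010, 110, 111, 101, 100, and the check confirms that each step, including the return to the start, changes a single digit.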

The same principle can be applied to the codon triplets, ordering all 64 in a cycle such that each differs from its predecessor in exactly one position. There are many such orders, each forming a Gray code. They differ in the extent to which they group together triplets that code for the same amino-acid.

One measure of success in forming such groups would be the number of times in the cycle that neighbouring triplets code for different amino-acids (or a stop signal). Since there are 21 of these (20 amino-acids plus the stop signal), the absolute minimum number of such changes between neighbours is simply 21, but this may not be attainable.

It is easy to find a Gray code ordering with 25 changes by threading a regular route through the standard table of the genetic code. But can you find one with fewer? There’s a route through the Edwards–Venn diagram given in Figure 3 of ‘Picturing the genetic code’ (Nature Precedings doi:10.1038/npre.2007.682.1) with only 23 changes of amino-acid.
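
If you want to score your own candidate orderings, the following Python sketch may help. It assumes the standard genetic code written in the usual UCAG table order, and it builds one regular route by reversing every other block of the table (a 'reflected' ordering). The names reflected_cycle, is_gray_cycle and amino_acid_changes are hypothetical helpers of my own, not anything taken from Edwards' paper.

```python
from itertools import product

BASES = "UCAG"  # row/column order of the standard codon table
# Standard genetic code, one letter per codon in UCAG order; "*" marks a stop.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def reflected_cycle(length=3):
    """List all strings over BASES of this length, reversing every other block."""
    if length == 0:
        return [""]
    shorter = reflected_cycle(length - 1)
    cycle = []
    for i, base in enumerate(BASES):
        # Reversing alternate blocks keeps neighbours one position apart.
        block = shorter if i % 2 == 0 else shorter[::-1]
        cycle.extend(base + rest for rest in block)
    return cycle


def is_gray_cycle(cycle):
    """True if all 64 codons appear once and cyclic neighbours differ in one position."""
    if sorted(cycle) != sorted(CODE):
        return False
    return all(
        sum(x != y for x, y in zip(cycle[i], cycle[(i + 1) % len(cycle)])) == 1
        for i in range(len(cycle))
    )


def amino_acid_changes(cycle):
    """Count cyclic neighbours that code for different amino-acids (or stop)."""
    return sum(
        CODE[cycle[i]] != CODE[cycle[(i + 1) % len(cycle)]]
        for i in range(len(cycle))
    )


cycle = reflected_cycle()
print(is_gray_cycle(cycle), amino_acid_changes(cycle))
```

Run as written, this reports a valid cycle with 25 changes, the same count as the regular route mentioned above. To test a candidate of your own, pass your list of 64 codons to is_gray_cycle and amino_acid_changes.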

P.S. Why ARE there two groups of serine codons, anyway?