(Not-quite-free) associations

Earlier today, the National Human Genome Research Institute and the National Institute of Environmental Health Sciences announced two new efforts aimed at identifying variants that underlie common disease. We will no doubt hear much more about them in coming months.

The two initiatives are the Genes and Environment Initiative (GEI) and the Genetic Association Information Network (GAIN). The GEI has a genotyping component and a technology development program aimed at generating new ways to monitor personal environmental exposures and their interactions with different genetic backgrounds. Several dozen common diseases are to be studied. Its proposed budget for fiscal year 2007 is $68 million.

GAIN will be managed by the non-profit Foundation for the NIH as a public-private partnership to carry out whole genome association studies using at least 375,000 SNPs for each of five common diseases. SNPs will be chosen from phase II HapMap data to include as many as possible from the Affymetrix 500K chip with r² > 0.8, with a minor allele frequency > 0.05. The particular diseases and study populations will be determined through a peer review process. The private partners include Pfizer, which will contribute $15 million toward the cost of initial genotyping, to be carried out by Perlegen Sciences beginning in late summer 2006 (Pfizer and Perlegen are already collaborating on such studies). Affymetrix will apparently be making a similar contribution. There is no indication that non-SNP-related variation (copy number or structural variation) will be assayed.
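For readers who want the two SNP thresholds made concrete, here is a minimal sketch in Python. It assumes one common reading of the selection rule: a HapMap SNP with minor allele frequency above 0.05 counts as "covered" when at least one chip SNP is correlated with it at r² above 0.8. That reading, and the function names and inputs below, are illustrative assumptions and not GAIN's actual selection procedure.

```python
from typing import Iterable


def minor_allele_frequency(count_a: int, count_b: int) -> float:
    """Frequency of the rarer of two alleles, given their observed counts."""
    total = count_a + count_b
    if total == 0:
        raise ValueError("no alleles observed")
    return min(count_a, count_b) / total


def r_squared(p_ab: float, p_a: float, p_b: float) -> float:
    """Standard pairwise LD measure r^2 for two biallelic SNPs.

    p_ab is the frequency of the haplotype carrying allele A at the first
    SNP and allele B at the second; p_a and p_b are the allele frequencies.
    """
    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both SNPs must be polymorphic")
    return d * d / denom


def is_covered(target_maf: float, r2_with_chip_snps: Iterable[float],
               maf_min: float = 0.05, r2_min: float = 0.8) -> bool:
    """Hypothetical helper: True if the target SNP is common enough and
    tagged by at least one chip SNP (thresholds taken from the announcement)."""
    return target_maf > maf_min and any(r2 > r2_min for r2 in r2_with_chip_snps)
```

For instance, `is_covered(0.3, [0.2, 0.85])` is true, while a SNP with a minor allele frequency of 0.03 fails the first test no matter how well it is tagged.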

Hyperbolic language aside (“We stand on the threshold of creating a future that will revolutionize the practice of medicine…”), the promising structure of the program, and the fact that dedicated funds will be allocated during a time of a stagnant NIH budget overall, are both grounds for cautious optimism that robust associations will be found. It is a concern that no mention is made of potential overlap with whole genome association (WGA) studies that are already underway all over the map (that low, distant rumble you hear if you listen carefully enough). On the other hand, a certain amount of overlap in the form of replicated associations in different populations is no doubt ideal.

Of particular interest are the data analysis procedures in place, given that assessing statistical robustness in WGA studies remains almost as much an art as a science. Each PI, and an analyst they designate, will be required to attend a workshop on analysis of WGA data, which will be open to other members of the community. As stated on the GAIN website:

The goals of the workshop will be to review current experience with WGA studies completed or in progress to: 1) identify particularly powerful designs and methods of analysis for WGA studies; 2) identify potential pitfalls in design and analysis of WGA studies and ways of avoiding them; 3) propose new approaches to WGA data analysis, and design or analysis modifications needed to implement them; and 4) identify needs for methodological development in design and analysis of WGA data.

And, equally important:

A key role for the Analysis Working Group will be to suggest analyses to be conducted across multiple projects, as appropriate, and common approaches to analyses within each project as needed to enhance comparability.

We, and others (and many others), have been grappling with this very issue. Having seen more than a few disagreements between experts over the appropriate statistical standards to apply, one suspects this meeting of the minds may be painfully difficult.

But, alas, no pain, no GAIN.
