Recently, a series of publications by members of the modENCODE consortium was released online in Science, Nature, and Genome Research. These works collectively describe a massive effort to functionally characterize and annotate the Drosophila melanogaster and Caenorhabditis elegans genomes, including in-depth analyses of genes and transcripts, epigenetic marks, transcription factor binding, and replication timing, across a range of developmental and tissue sources.
Integrated analyses of these data are described in two articles published in Science (Gerstein et al, 2010; modENCODE Consortium et al, 2010). These works provide compelling support for the existence of highly occupied target (HOT) regions, which are regions of the genome that bind a complex mix of many transcription factors but whose connection with gene regulation is still largely unclear. They also show that the dense epigenetic datasets can be used to segment the genomes into “chromatin states” that have distinct functional properties (see also the recent work by Filion et al, 2010).
In a related Perspective, Mark Blaxter declares that these works provide an important step toward the ability “to compute an organism from its genome” (Blaxter 2010). A prime example of progress toward this goal is the particularly comprehensive genomic regulatory network built by the Drosophila modENCODE team, which is inferred from a combination of ChIP-based transcription factor binding, sequence motifs, epigenetic marks, and coexpression (modENCODE Consortium et al, 2010). A relatively simple linear combination of predicted regulatory inputs can predict the expression of about one quarter of the transcriptome with some accuracy. In addition, the authors find that the remaining unpredictable genes tend to have noisier expression levels, suggesting that they may be intrinsically more weakly regulated.
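To make the idea of "a linear combination of regulatory inputs" concrete, here is a minimal sketch in Python. It fits one weight per input by ordinary least squares and then scores the predictions with a Pearson correlation. The three input names, the toy values, and the weights are invented for illustration and are not taken from the modENCODE data. The consortium's actual features, model, and fitting procedure may well differ.

```python
# Toy sketch: predict expression as a weighted sum of regulatory inputs.
# All numbers below are made up for illustration (not modENCODE data).

def fit_linear_model(features, expression):
    """Ordinary least squares via the normal equations (X^T X) w = X^T y."""
    # Prepend a constant 1.0 so the first weight acts as an intercept.
    rows = [[1.0] + list(r) for r in features]
    n = len(rows[0])
    # Build the augmented matrix [X^T X | X^T y].
    aug = []
    for i in range(n):
        row = [sum(r[i] * r[j] for r in rows) for j in range(n)]
        row.append(sum(r[i] * y for r, y in zip(rows, expression)))
        aug.append(row)
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda k: abs(aug[k][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for k in range(n):
            if k != col:
                factor = aug[k][col] / aug[col][col]
                aug[k] = [a - factor * b for a, b in zip(aug[k], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]

def predict(weights, row):
    """Intercept plus the weighted sum of the regulatory inputs."""
    return weights[0] + sum(w * x for w, x in zip(weights[1:], row))

def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5

# Hypothetical per-gene inputs: (TF binding score, motif score, repressive mark score).
features = [
    (0.9, 0.8, 0.1),
    (0.2, 0.1, 0.7),
    (0.5, 0.9, 0.3),
    (0.8, 0.2, 0.9),
    (0.1, 0.5, 0.2),
    (0.6, 0.4, 0.6),
]
# Hypothetical "observed" expression, generated here from known weights
# (intercept 0.5, then 2.0, 1.0, -1.5) so the fit can be checked.
true_weights = [0.5, 2.0, 1.0, -1.5]
expression = [predict(true_weights, row) for row in features]

weights = fit_linear_model(features, expression)
predicted = [predict(weights, row) for row in features]
accuracy = pearson(predicted, expression)
```

Because the toy expression values are exactly linear in the inputs, the fit recovers the generating weights and the correlation is essentially 1. Real expression data would be noisier, and a model of this kind would be expected to capture only part of the transcriptome, as the text above describes.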
Blaxter M (2010) Genetics. Revealing the dark matter of the genome. Science 330:1758-1759
Filion GJ, van Bemmel JG, Braunschweig U, Talhout W, Kind J, Ward LD, Brugman W, de Castro IJ, Kerkhoven RM, Bussemaker HJ, van Steensel B (2010) Systematic protein location mapping reveals five principal chromatin types in Drosophila cells. Cell 143:212-224
Gerstein MB, Lu ZJ, Van Nostrand EL, Cheng C, Arshinoff BI, Liu T, Yip KY, Robilotto R, Rechtsteiner A, Ikegami K, Alves P, Chateigner A, Perry M, Morris M, Auerbach RK, Feng X, Leng J, Vielle A, Niu W, Rhrissorrakrai K et al (2010) Integrative Analysis of the Caenorhabditis elegans Genome by the modENCODE Project. Science 330:1775-1787
modENCODE Consortium, Roy S, Ernst J, Kharchenko PV, Kheradpour P, Negre N, Eaton ML, Landolin JM, Bristow CA, Ma L, Lin MF, Washietl S, Arshinoff BI, Ay F, Meyer PE, Robine N, Washington NL, Di Stefano L, Berezikov E, Brown CD et al (2010) Identification of Functional Elements and Regulatory Circuits by Drosophila modENCODE. Science 330:1787-1797