On Wednesday, a handful of journals, including this one, released more than 30 papers describing results from the second phase of ENCODE: a consortium-driven project tasked with building the ‘ENCyclopedia Of DNA Elements’, a manual of sorts that defines and describes all the functional bits of the genome.
Many reactions to the slew of papers, their web and iPad app presentations and the news coverage that accompanied the release were favourable. But several critics have challenged some of the most prominently reported claims in the papers, the way their publication was handled and the indelicate use of the word ‘junk’ in some material promoting the research.
First up was a scientific critique that the authors had engaged in hyperbole. In the main ENCODE summary paper, published in Nature, the authors prominently claim that the ENCODE project has thus far assigned “biochemical functions for 80% of the genome”. I had long and thorough discussions with Ewan Birney about this figure and what it actually meant, and it was clear that he was conflicted about reporting it in the paper’s abstract.
It’s a big number, to be sure. The protein-encoding portion of the genome, which has historically been considered the most important part, represents a little more than 1%, and to imply that they found similarly important and interesting functions for another 79% is an extraordinary claim. Birney told me, and reiterates in a Q&A-style blog post, that the figure also rests on a loose interpretation of the word ‘functional’, one that encompasses many categories of biochemical activity, from the very broad (such as actively producing or ‘transcribing’ RNA) to being attached to some sort of transcription-factor protein, all the way down to that narrow range of protein-encoding DNA within the 1%.
But hold on, said a number of genome experts: most of that activity isn’t particularly specific or interesting and may not have an impact on what makes a human a human (or what makes one human different from another). A blog post by Ed Yong discusses some of these critiques. It was already known, for example, that vast portions of the genome are transcribed into RNA. A small amount of that RNA encodes protein, and some serves a regulatory role, but the rest of it is chock-full of seemingly nonsensical repeats, remnants of past viruses and other weird little bits that shouldn’t serve a purpose.
The paper does drill down somewhat into what the authors mean by functional elements. And Birney does the same in his blog. Excluding all but the sites where there is very probable active binding by a regulatory protein, “we see a cumulative occupation of 8% of the genome,” he writes. Add to that the 1% of protein-encoding DNA and you get 9%.
Birney and his colleagues have estimated how complete their sampling is, and suspect that they will find another 11% of the genome with this kind of regulatory activity. That gets them to 20%. So, perhaps the main conclusion should have been that 20% of the genome in some situation can directly influence gene expression and phenotype of at least one human cell type. It’s a far cry from 80%, but a substantial increase from 1%.
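For readers who want to check the arithmetic, here is a minimal sketch of that tally. The percentages are the rounded figures quoted above (protein-encoding DNA is "a little more than 1%", taken here as 1), and the variable names and the derived gap to the headline number are illustrative only, not something the ENCODE authors report.

```python
# Percentages of the genome, rounded as quoted in the text above.
PROTEIN_CODING = 1          # protein-encoding DNA ("a little more than 1%")
REGULATORY_OBSERVED = 8     # very probable binding by a regulatory protein
REGULATORY_PROJECTED = 11   # further regulatory DNA the authors expect to find
HEADLINE = 80               # "biochemical functions for 80% of the genome"

# Observed so far: coding plus confirmed regulatory binding.
observed_total = PROTEIN_CODING + REGULATORY_OBSERVED

# Adding the projected share gives the more conservative headline.
projected_total = observed_total + REGULATORY_PROJECTED

# Illustrative only: the share covered by the broad definition of
# 'functional' but not by the narrower one.
gap_to_headline = HEADLINE - projected_total

print(f"observed: {observed_total}%")
print(f"with projection: {projected_total}%")
print(f"headline minus projection: {gap_to_headline} percentage points")
```

On these rounded figures, the broad definition of ‘functional’ accounts for roughly three-quarters of the 80% headline.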
Some suggest that a majority of the genome does have an active role in biological functions. John Mattick, director of the Garvan Institute of Medical Research in Sydney, Australia, whom I spoke to in the run-up to the publication of these papers, argued that the ENCODE authors were being far too conservative in their claims about the significance of all that transcription. “We have misunderstood the nature of genetic programming for the past 50 years,” he told me. Mattick has long argued that non-coding RNA has a crucial role in cell regulatory functions, and his gentle criticism is that “they’ve reported the elephant in the room then chosen to otherwise ignore it”.
The 80% number may not have been ideal, but it did provide a headline figure that was impressive to the mainstream media. This is at the core of a related critique against the ENCODE researchers and the journals that published their papers. By bandying about this big number, press releases on the project touted the idea that ENCODE had demolished some long-standing notion that much of the genome is ‘junk’. Michael Eisen, an evolutionary biologist at the University of California, Berkeley, said in a blog post that this pushed “a narrative about their results that is, at best, misleading.”
That narrative goes something like this: scientists long thought the genome was littered with junk, evolutionary remnants that serve no purpose, but ENCODE has shown that 80% of the genome (and possibly more to come) does serve a purpose. That narrative appeared in many media reports on the publication. Many on Twitter and in online conversations bemoaned the rehashing of a junk-DNA debate that they considered imaginary or at least long-settled. Eisen, perhaps rightly, puts the blame on press releases that touted the supposed paradigm shift. The one from Nature Publishing Group started thus: “Far from being junk, the vast majority of our DNA participates in at least one biochemical event in at least one cell type.” Eisen writes: “[As] the authors undoubtedly know, nobody actually thinks that non-coding DNA is ‘junk’ anymore. It’s an idea that pretty much only appears in the popular press, and then only when someone announces that they have debunked it.”
It is an old argument, but it’s not clear that it is a dead argument. Several researchers took issue with ENCODE’s suggestion that its wobbly 80% number in any way disproves that some DNA is junk. Larry Moran, a biochemist at the University of Toronto in Ontario, argued on his blog that claims about disproving the existence of junk give ammunition to creationists who like a tidy view of every letter in the genome having some sort of divine purpose. “This is going to make my life very complicated,” he writes.
Indeed, the papers have caught the attention of at least some creationists, and of just about everyone else. This was partly by design: the project leaders and editors organized a simultaneous release of the publications to maximize their impact. It was a major undertaking that took up a great deal of time for the scientists involved and for the editors at their respective journals. And the delay that this coordination caused has led to another complaint. Casey Bergman, a genome biologist at the University of Manchester, UK, tried to tally the cost of this delay to the scientific community.
Each paper sat for an average of 3.7 months after being accepted before it was published. He estimates a maximum total of 112 months — nearly 10 years — during which the scientific community was deprived of insights from these papers. “To the extent that these papers are crucial for understanding the human genome, and the consequences this knowledge has for human health, this decade lost to humanity is clearly unacceptable,” writes Bergman. Granted, the ENCODE data have been released regularly and consistently throughout the project, and anyone can access and use the data to publish, but some observers noted that not everyone was aware of ENCODE’s progress. It would have been far better, Bergman and others argue, for the papers to be released as they were accepted. A review article, perhaps along with some of the other web and mobile bells and whistles, could have rounded them up at some set point, but reserving all the papers for one big publication push was detrimental, he claims.
ENCODE was conceived of and practised as a resource-building exercise. In general, such projects have a huge potential impact on the scientific community, but they don’t get much attention in the media. The journal editors and authors at ENCODE collaborated over many months to make the biggest splash possible and capture the attention of not only the research community but also of the public at large. Similar efforts went into the coordinated publication of the first drafts of the human genome, another resource-building project, more than a decade ago. Although complaints and quibbles will probably linger for some time, the real test is whether scientists will use the data and prove ENCODE’s worth.
