News blog

Fighting about ENCODE and junk

[Image: a red junk at Tsim Sha Tsui. Credit: Alfonso Jimenez/Flickr]

On Wednesday, a handful of journals, including this one, released more than 30 papers describing results from the second phase of ENCODE: a consortium-driven project tasked with building the ‘ENCyclopedia Of DNA Elements’, a manual of sorts that defines and describes all the functional bits of the genome.

Many reactions to the slew of papers, their web and iPad app presentations and the news coverage that accompanied the release were favourable. But several critics have challenged some of the most prominently reported claims in the papers, the way their publication was handled and the indelicate use of the word ‘junk’ on some material promoting the research.

First up was a scientific critique that the authors had engaged in hyperbole. In the main ENCODE summary paper, published in Nature, the authors prominently claim that the ENCODE project has thus far assigned “biochemical functions for 80% of the genome”. I had long and thorough discussions with Ewan Birney about this figure and what it actually meant, and it was clear that he was conflicted about reporting it in the paper’s abstract.

It’s a big number, to be sure. The protein-encoding portion of the genome — the part that has historically been considered the most important — represents a little more than 1%, and to imply that the project found similarly important and interesting functions for another 79% is an extraordinary claim. Birney said to me, and reiterates in a Q&A-style blog post, that the figure rests on a loose interpretation of the word ‘functional’, one that encompasses many categories of biochemical activity: from the very broad — such as actively producing, or ‘transcribing’, RNA — to being attached to some sort of transcription-factor protein, all the way down to the narrow range of protein-encoding DNA within the 1%.

But hold on, said a number of genome experts: most of that activity isn’t particularly specific or interesting, and may have no bearing on what makes a human a human (or what makes one human different from another). A blog post by Ed Yong discusses some of these critiques. It was already known, for example, that vast portions of the genome are transcribed into RNA. A small amount of that RNA encodes protein, and some serves a regulatory role, but the rest is chock-full of seemingly nonsensical repeats, remnants of past viruses and other odd little bits that do not seem to serve any purpose.

The paper does drill down somewhat into what the authors mean by functional elements. And Birney does the same in his blog. Excluding all but the sites where there is very probable active binding by a regulatory protein, “we see a cumulative occupation of 8% of the genome,” he writes. Add to that the 1% of protein-encoding DNA and you get 9%.

Birney and his colleagues have estimated how complete their sampling is, and suspect that they will find another 11% of the genome with this kind of regulatory activity. That gets them to 20%. So perhaps the main conclusion should have been that 20% of the genome can, in some situation, directly influence gene expression and the phenotype of at least one human cell type. It’s a far cry from 80%, but a substantial increase from 1%.
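The arithmetic behind that revised estimate is simple enough to tally directly. This short Python sketch just adds up the round percentages quoted above (for illustration only; the figures are the ones from the text, not precise measurements):

```python
# Tallying Birney's narrower "functional" estimate from the
# percentages quoted above (round figures, for illustration only).
protein_coding = 1      # % of genome that encodes protein
confident_binding = 8   # % with very probable regulatory-protein binding
projected_binding = 11  # % more expected once sampling is complete

confirmed = protein_coding + confident_binding
projected_total = confirmed + projected_binding

print(f"Confirmed so far: {confirmed}%")       # 9%
print(f"Projected total: {projected_total}%")  # 20%
```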

Some suggest that a majority of the genome does have an active role in biological functions. John Mattick, director of the Garvan Institute of Medical Research in Sydney, Australia, whom I spoke to in the run-up to the publication of these papers, argued that the ENCODE authors were being far too conservative in their claims about the significance of all that transcription. “We have misunderstood the nature of genetic programming for the past 50 years,” he told me. He has long argued that non-coding RNA has a crucial role in cell regulatory functions, and his gentle criticism is that “they’ve reported the elephant in the room then chosen to otherwise ignore it”.

The 80% number may not have been ideal, but it did provide a headline figure that was impressive to the mainstream media. This is at the core of a related critique against the ENCODE researchers and the journals that published their papers. By bandying about this big number, press releases on the project touted the idea that ENCODE had demolished some long-standing notion that much of the genome is ‘junk’. Michael Eisen, an evolutionary biologist at the University of California, Berkeley, said in a blog post that this pushed “a narrative about their results that is, at best, misleading.”

That narrative goes something like this: scientists long thought the genome was littered with junk, evolutionary remnants that serve no purpose, but ENCODE has shown that 80% of the genome (and possibly more to come) does serve a purpose. That narrative appeared in many media reports on the publication, and many on Twitter and in online conversations bemoaned the rehashing of a junk-DNA debate that they considered imaginary, or at least long settled. Eisen, perhaps rightfully, puts the blame on press releases that touted the supposed paradigm shift: the one from Nature Publishing Group began, “Far from being junk, the vast majority of our DNA participates in at least one biochemical event in at least one cell type.” Eisen says that the authors “undoubtedly know” that “nobody actually thinks that non-coding DNA is ‘junk’ anymore. It’s an idea that pretty much only appears in the popular press, and then only when someone announces that they have debunked it.”

It is an old argument, but it’s not clear that it is a dead argument. Several researchers took issue with ENCODE’s suggestion that its wobbly 80% number in any way disproves that some DNA is junk. Larry Moran, a biochemist at the University of Toronto in Ontario, argued on his blog that claims about disproving the existence of junk give ammunition to creationists, who like a tidy view in which every letter in the genome has some sort of divine purpose. “This is going to make my life very complicated,” he writes.

Indeed, the papers have caught the attention of at least some creationists, and of just about everyone else. This was partly by design: the project leaders and editors organized a simultaneous release of the publications to maximize their impact, a major coordination effort that consumed a great deal of time from the scientists involved and from the editors at their respective journals. And the delay that this coordination caused has led to another complaint. Casey Bergman, a genome biologist at the University of Manchester, UK, tried to tally the cost of that delay to the scientific community.

Each paper sat for an average of 3.7 months between acceptance and publication. Summing those delays across all the papers, he estimates a maximum total of 112 months — nearly 10 years — during which the scientific community was deprived of the insights they contain. “To the extent that these papers are crucial for understanding the human genome, and the consequences this knowledge has for human health, this decade lost to humanity is clearly unacceptable,” writes Bergman. Granted, the ENCODE data have been released regularly and consistently throughout the project, and anyone can access and use them to publish, but some observers noted that not everyone was aware of ENCODE’s progress. It would have been far better, Bergman and others argue, for the papers to be released as they were accepted. A review article, perhaps along with some of the web and mobile bells and whistles, could have rounded them up at some set point; reserving all the papers for one big publication push, he claims, was detrimental.
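As a back-of-the-envelope check, Bergman’s tally can be reconstructed in a few lines of Python. The paper count of roughly 30 and the 3.7-month average delay come from the text; the exact per-paper delays are not given, so the sum here is only an approximation of his figure:

```python
# Approximate reconstruction of Bergman's embargo-cost tally.
# Figures from the text: ~30 papers, 3.7-month average delay per paper.
n_papers = 30
avg_delay_months = 3.7

# Bergman sums the individual delays, which is equivalent to
# multiplying the paper count by the average delay.
summed_delay = n_papers * avg_delay_months
print(f"Summed delay: {summed_delay:.0f} months "
      f"(~{summed_delay / 12:.1f} years)")

# Note that the sum scales with the number of papers: spreading the
# same results across twice as many papers would double the figure,
# even though no individual paper waited longer than a few months.
```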

ENCODE was conceived of and practised as a resource-building exercise. In general, such projects have a huge potential impact on the scientific community, but they don’t get much attention in the media. The journal editors and authors at ENCODE collaborated over many months to make the biggest splash possible and capture the attention of not only the research community but also of the public at large. Similar efforts went into the coordinated publication of the first drafts of the human genome, another resource-building project, more than a decade ago. Although complaints and quibbles will probably linger for some time, the real test is whether scientists will use the data and prove ENCODE’s worth.



  1. Report this comment

    Jim Woodgett said:

    The splurge of ENCODE papers has certainly got a lot of people questioning the role of PR in scientific releases. On the one hand, the embargo created a greater gush of data (and served the authors and the journals hosting the papers); on the other, would a trickle of papers have achieved similar impact? Clearly not, although digestion would have been a lot easier and more considered. What is less forgivable, though, is the oversimplification or distillation of the message of 30 significant papers into “debunking junk DNA”. There’s a lot more to the datasets (and to the visualization tools) than that rather vacuous bottom line. It may have opened the conversation, but how many in the media probed deeper than that catchphrase? There are so many other stories to tell in this collection.

  2. Report this comment

    Michael White said:

    John Mattick’s view is an extreme minority view, a fringe view. You’ll find very few genome scientists who, in print or in person at conferences, would support his claim that ‘we’ve misunderstood the nature of genetic programming for 50 years.’ Ewan Birney, in his blog post, provides a viewpoint closer to the consensus.

    And yet, if you read the front page of the NY Times, or the relevant stories in just about every other major media outlet, you got the fringe view. That’s why many of us were so upset.

  3. Report this comment

    Martin Hafner said:

    Larry Moran’s complaint that ENCODE’s and Ewan Birney’s statement that 80% may be functional will feed all types of creationists was a minor concern. His foremost argument is that ENCODE ignores well-established biological knowledge. For example, the presence of junk DNA is compatible with the observed mutation rates and with the pedigrees of our species and its relatives. It fits the well-characterized genetics and evolution of repetitive elements and doesn’t contradict the C-value paradox. Unfortunately, the above post doesn’t refer to the latter. The big differences in genome size among closely related species stand in stark contrast to ENCODE’s 80% value for functional DNA in the human genome. Hopefully, Ewan Birney’s Twitter reply to the challenge of T. Ryan Gregory’s “onion test” was tongue-in-cheek, or does he really believe that C-value differences are due to polyploidy? Dr Moran also criticizes the fact that ENCODE’s frivolously communicated 80% number is based on a very loose definition of functional DNA. He further points out that it was known decades before ENCODE published this notion that non-coding DNA is not the same as junk DNA. Isn’t checking the methodology and discussing the presented conclusions in the light of older research and current knowledge exactly what the ENCODE authors would expect from their peers?

  4. Report this comment

    Robert Mullen said:

    Casey Bergman’s claim that the embargo deprived science of the papers for nearly 10 years is not sound. He arrives at the number by adding the separate delays of each paper. This amounts to multiplying the number of papers by the average delay. By his metric, if the same information had been spread over twice as many papers the delay would have been 20 years, if spread over half as many papers it would have been five years. This is an untrustworthy method. In fact no paper was delayed for 10 years nor was a single researcher deprived of a result for 10 years. Let’s speak of the average or maximum delay but let’s not pretend we have lost 10 years of progress.

  5. Report this comment

    Hilary Butler said:

    The issue is much simpler than just “creationism”. A close read of medical history shows that science is always based on assumption, not knowledge. The problem is that scientists assume that their knowledge is the sum total of their assumptions.

    There are still surgeons out there who believe that tonsils, adenoids and the appendix are vestigial organs with no practical use, and that we can get by just as well without them.

    Likewise, immunologists have an extremely limited view of the immune system, and knowledge of the innate immune system is about as limited as Cook’s knowledge about New Zealand when he first set foot on the shore.

    The average person who has read medical history extensively can see clearly that the knowledge base is not only highly limited but fragmentary. Further, they can also see that what is assumed to be fact can be overturned in the future, when what wasn’t known in the past makes a monkey of what is known now.

    It was very interesting reading all the linked blogs, because the same flaw came through in most of them – the assumption that what we know now is accurate and won’t be overturned by what is found in ten years’ time, just assimilated into or added to. The problem is that this doesn’t take into account that what isn’t known might actually radically alter the function of, and implications to the host of, what was previously assumed.

    Just look at the microbiome, and what is now being discovered about the havoc being wreaked on it by antibiotics, whereas in the past doctors told people that antibiotics would just nuke the bad bug and leave everything else alone. Yet again, it’s plainly obvious to the ordinary person with accurate observation habits that that assumption was never correct. Perhaps that’s why it’s so hard for scientists to come to terms with the lay explosion of a multitude of probiotic foods over the last 25 years (by people without degrees) to try to rectify what science has denied for so long – until recently.

    Commonsense (at least to thinkers) leads to the observation that nothing in the body is redundant, or without function.

    Any problem with creationists is solely the fault of scientists, who in their writing project the sense that they don’t quite understand that their own evolutionary limits – the black holes they don’t understand – could end up completely altering the few stars they describe from viewing in their telescopes.

    The problems faced in that regard are solely of their own making, as David Horrobin very accurately described in his book “Science is God”. Rather than blaming creationists, look to the fact that scientific denial has created consequences.

    Scientists have made science into their god, with constant pontifical utterances, even if they are hedged with qualification words like “seems” or “appears”. Yet anyone who reads the medical literature knows that the conclusions stated are often anything but equivocal, and can easily trace that behavioural pattern in medical journals from about 1850. It’s often telling that anything that isn’t accorded the status of paradigm has to be published as “opinion” – if the author is lucky. A scientist I once knew described it as “the arrogance of ignorance”, which can be seen in many grandiose pronouncements that are more often fallacy than fact.

    One day, scientists might also have to admit that what was announced recently by the Encode Project, was also a fallacy.

    Or perhaps they have a more delicate word for it, one that escapes the average reader who isn’t trying to contort information to fit a current paradigm while avoiding upsetting the apple cart… so perhaps David Bohm and F. David Peat’s old book, “Science, Order and Creativity”, should be put on the compulsory reading list for all scientists – to be read every year.

  6. Report this comment

    Katerina Fomicheva said:

    Cracking the Maya code was not trivial, and it started to move forward only when Yuri Knorozov made a suggestion about the nature of the decipherment: generally speaking, the same phonetic sounds may be encoded by different hieroglyphs. For me, the only way to understand what is encoded by “junk” is to accept a similar principle of multicomponent regulation of gene status. It would lead to heritable epimutations of neighbouring genes through influencing their transcriptional states. A good example comes from the knowledge that some drugs or treatments will activate different genes owing to the diversity of ENCODED “junk” in the genomes of two strains of mice, BL6 vs CD1.

  7. Report this comment

    David Ewing said:

    Was there a similar clamor when the term “Junk DNA” was coined? That assumption is looking more and more like Junk Science. It’s amusing to see the threat to such an obviously dubious assumption causing outrage among those so comfortable in their hubris.

Comments are closed.