Nature Chemistry | The Sceptical Chymist

Chemiotics: The death of the synonymous codon

Posted on behalf of Retread

For years, stretches of DNA not coding for protein were called noncoding DNA. As we came to know more about DNA, the sites specifying just where the transcription of DNA to messenger RNA (mRNA) should begin, along with the DNA coding for the RNA in ribosomes, were grandfathered in. Then, about 30 years ago, we found that most genes coding for proteins contain large stretches of DNA not coding for amino acids at all.

Dystrophin, the protein that is defective in Duchenne muscular dystrophy, contains 3685 amino acids, but its gene stretches over 2.2 million contiguous positions in DNA. It takes only 11,055 positions (three per amino acid) to code for 3685 amino acids. However, the 11,055 occur in 79 stretches (called exons), separated by 78 much larger stretches of DNA (called introns). The whole 2.2 megabases is transcribed into a precursor mRNA, and then the introns are lopped (spliced) out by the spliceosome, a gigantic protein and RNA machine even larger and more complicated than the ribosome (300 proteins, 5 RNAs [see: Science vol. 307 pp. 863-864, 2005]).
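The arithmetic behind those figures can be checked in a few lines. This is only a sketch using the round numbers quoted above; the variable names are mine, and the 2.2 million figure is approximate, so the resulting fraction is an estimate.

```python
# Figures quoted in the post (gene length is approximate)
amino_acids = 3685
gene_length = 2_200_000   # DNA positions spanned by the gene
exons = 79

coding_positions = amino_acids * 3              # three DNA positions per amino acid
introns = exons - 1                             # one intron between each pair of adjacent exons
coding_fraction = coding_positions / gene_length

print(coding_positions)             # 11055
print(introns)                      # 78
print(f"{coding_fraction:.2%}")     # 0.50%
```

In other words, only about half a percent of the gene's length actually specifies amino acids.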

Ever since the human genome project ended, people have wondered why we have so few protein coding genes (around 20,000 at last count). The humble E. coli contains 4300 [see: Nature vol. 385 p. 472, 1997]. Not to worry, we make lots of different proteins from the same gene by using different combinations of exons – some exons are skipped by the spliceosome when it removes introns. Different tissues (or different states of the same tissue) skip different exons depending on (as yet obscure) conditions, so lots of different variants of the same protein are made. The process is called alternative splicing and is quite common – it happens in 92-94% of human protein genes according to a recent paper [see: Nature vol. 456 pp. 470-476, 2008].
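To see how quickly exon skipping multiplies the number of possible products, here is a toy sketch. The five-exon gene and the choice of which exons are skippable are invented for illustration, and real splicing is regulated rather than a free choice among all combinations.

```python
from itertools import product

def isoforms(exons, optional):
    """Yield every exon combination: constitutive exons are always kept,
    each optional exon is either included or skipped."""
    choices = [(e, None) if e in optional else (e,) for e in exons]
    for combo in product(*choices):
        yield tuple(e for e in combo if e is not None)

toy_exons = ["E1", "E2", "E3", "E4", "E5"]   # hypothetical gene
skippable = {"E2", "E4"}                      # hypothetical skippable exons

variants = list(isoforms(toy_exons, skippable))
for v in variants:
    print("-".join(v))
print(len(variants))   # 4, i.e. 2 ** len(skippable)
```

Each additional skippable exon doubles the upper bound on the number of variants, which is how a modest gene count can still yield a large protein repertoire.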

What determines which exons are left in the final product and which are skipped? This is where it gets really interesting. There exist stretches of DNA called exonic splicing enhancers (ESEs) and other stretches inhibiting the splicing in of a particular exon – the exonic splicing inhibitors (ESIs). Where are the ESEs and ESIs found? In the exons themselves.

So what? And what does this have to do with synonymous codons? The commonest genetic disease of Caucasians is cystic fibrosis (CF). In experiments on the 12th exon of CFTR (the gene mutated in CF), switching one synonymous codon for another resulted, 25% of the time, in skipping of exon 12 and a defective protein [see: Proc. Natl Acad. Sci. USA vol. 102 pp. 6368-6372, 2005]. So synonymous codons aren’t synonymous at all. A completely different cellular use of synonymous codons will follow in the next post, but why should chemists be interested in any of this?
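A small sketch makes the point concrete: two DNA sequences can translate to the identical peptide while differing in the nucleotide sequence that the splicing machinery reads. The codon table below is the standard genetic code; the nine-base sequences and the motif GAAGAA are hypothetical stand-ins chosen for illustration, not taken from CFTR or from the cited paper.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ('*' = stop)
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

def translate(dna):
    """Translate a DNA string, read in frame from position 0, into one-letter amino acids."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))

MOTIF = "GAAGAA"            # hypothetical enhancer-like motif (assumption)
original = "GAAGAAGCT"      # hypothetical exon fragment: Glu-Glu-Ala
variant = "GAGGAAGCT"       # first codon GAA -> GAG, still Glu

print(translate(original), translate(variant))   # EEA EEA
print(MOTIF in original, MOTIF in variant)       # True False
```

The protein-level reading is unchanged, but the nucleotide-level signal is gone, which is the sense in which a "synonymous" change need not be silent.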

Because DNA isn’t sitting there passively waiting to be read in just one way. All sorts of new chemistry is involved. There is not enough space in this post for the next two examples, but their chemistry does not involve protein-DNA interaction.

So even if we had 15 amino acids and a stop codon to begin with (as per the last post) we could never give up that extra position and all that redundancy now. We need the coding overkill because it is being used for other things. This work also has profound implications for our understanding of protein evolution. That’s also for next time.
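The counting behind "that extra position" is simple enough to spell out. This is just the arithmetic, assuming four bases and today's 20 standard amino acids plus a stop signal.

```python
bases = 4

two_position_codons = bases ** 2     # 16: exactly 15 amino acids plus a stop
three_position_codons = bases ** 3   # 64 codons in the present-day code

meanings_today = 20 + 1              # 20 amino acids plus stop
spare = three_position_codons - meanings_today

print(two_position_codons)    # 16
print(three_position_codons)  # 64
print(spare)                  # 43
```

Those 43 codons beyond the bare minimum are the redundancy the post is talking about.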


  1. Yggdrasil said:

    Another mechanism by which synonymous mutations may affect protein function is by altering the kinetics of protein folding. Researchers have proposed that synonymous substitutions that change a codon read by an abundant tRNA to a codon read by a rare tRNA could cause the ribosome to pause at the site of the mutation, and that this pause could cause the protein to misfold and become trapped in local minima (Kimchi-Sarfaty et al., Science 2007). Although this theory is still fairly speculative, it does suggest that evolution may have had to tailor the codon usage of proteins to optimize the folding of the proteins.

    With all the new findings about splicing, non-coding RNAs, nucleosome codes, etc., the notion of our genome being filled with “junk” DNA is becoming more and more outdated. It’s amazing to think of the number of layers of information that are encoded into a simple heteropolymer consisting of just four different monomers.

  2. Scott said:

    This is an article written on a web page. It’s only taking up virtual space. Why is there not “…not enough space in this post for the next two examples…”?

    [Editor’s note: because we try not to have posts that we consider to be too long]

  3. retread said:

    If you found this post and those preceding and following it a bit cryptic and condensed, a very nice leisurely article with lots of good illustrations concerning synonymous codons and their different usages is to be found in the June ‘09 Scientific American. The authors don’t discuss microRNAs, but they do come up with yet another wrinkle — folding of the mRNA made from DNA and how synonymous codons affect this.