Nature Chemistry | The Sceptical Chymist

They’d none of them be missed – why 20 amino acids and not 15?

Posted on behalf of Retread

Making DNA is metabolically expensive. 4 ATPs are consumed making adenine (and that’s even when you start with 5-phosphoribosyl alpha pyrophosphate – PRPP). This is why parasites living inside cells have such small genomes. As soon as they figure out a way to get the host to do their metabolic work, they jettison the (now redundant) DNA. The leprosy organism, which lives inside the cells sheathing nerve processes, has only two-thirds the DNA of its cousin, the tuberculosis organism. There are many similar examples and not all are bacterial.

As you know, ‘the’ genetic code is made of nucleotides which come in four varieties (abbreviated A, T, G, C). There are 16 possible combinations when nucleotides are taken 2 at a time, and 64 when taken 3 at a time. 64 combinations is clearly overkill for just 20 amino acids. So most amino acids have multiple combinations of 3 nucleotides (called codons) which code for them – these are the synonymous codons. Three amino acids (leucine, arginine, serine) have 6 synonymous codons, 2 have none (i.e., just one codon – methionine and tryptophan), and the rest fall in between.
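The counting above can be checked with a short sketch. The table string is the standard genetic code written in the conventional TCAG ordering; the variable names are my own. By this count serine, as well as leucine and arginine, has six codons.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order; '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# All ways of taking the four nucleotides 2 and 3 at a time.
doublets = ["".join(p) for p in product(BASES, repeat=2)]
codons = ["".join(p) for p in product(BASES, repeat=3)]

# Map each triplet to the amino acid (or stop) it encodes.
codon_table = dict(zip(codons, AMINO_ACIDS))

# Number of synonymous codons per amino acid, with stops counted separately.
degeneracy = Counter(codon_table.values())
stops = degeneracy.pop("*")

print(len(doublets), "doublets,", len(codons), "triplets")
print(len(degeneracy), "amino acids,", stops, "stop codons")
for n in sorted(set(degeneracy.values())):
    group = "".join(sorted(aa for aa, k in degeneracy.items() if k == n))
    print(n, "codon(s):", group)
```

The amino acids are printed as one-letter codes, grouped by how many codons each one has.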

If proteins contained only 15 amino acids, two nucleotides per codon would suffice and you could cut genome size by one-third – that’s 4 billion or so ATPs/cell if the 3 other nucleotides are as expensive to make as adenine. As the late Senator Dirksen used to say – a billion here, a billion there, pretty soon you’re talking real money (this was in an older, happier pre-bailout time).

Why 15 and not 16 amino acids? Because you need a codon to tell the machinery when to stop – such codons were known as ‘nonsense’, back in the day when all the genome was thought to do was code for protein.
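A back-of-envelope sketch of this arithmetic follows. It assumes a genome of about 3 billion nucleotides (the post does not state a genome size; this value is my guess at what yields the quoted 4 billion) and takes the figure of 4 ATP per nucleotide at face value. It also treats the whole genome as protein-coding and counts a single strand, which is a simplification. The function name is hypothetical.

```python
from fractions import Fraction


def two_letter_code_budget(genome_nt=3_000_000_000, atp_per_nt=4, n_bases=4):
    """Estimate what a 2-nucleotide codon would encode and save.

    genome_nt and atp_per_nt are illustrative assumptions, not measured values.
    """
    doublet_codons = n_bases ** 2          # 4 * 4 = 16 two-letter codons
    amino_acids = doublet_codons - 1       # reserve one codon as a stop signal
    fraction_saved = Fraction(3 - 2, 3)    # 2 nucleotides per residue instead of 3
    nt_saved = int(genome_nt * fraction_saved)
    return {
        "doublet_codons": doublet_codons,
        "amino_acids": amino_acids,
        "fraction_saved": fraction_saved,
        "nt_saved": nt_saved,
        "atp_saved": nt_saved * atp_per_nt,
    }


budget = two_letter_code_budget()
for key, value in budget.items():
    print(key, value)
```

Doubling `genome_nt` to account for both strands, or for a diploid cell, scales the ATP figure proportionally; the one-third saving itself does not change.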

Look at the side chains of the 20 amino acids with your chemist’s eye. Some are so similar as to be redundant. Glutamic acid and aspartic acid are chemically the same, differing only by a methylene group – get rid of one. Glutamine and asparagine are just the amides of the two acids (why they aren’t called glutamide and asparamide is beyond me). Get rid of one of them. Similarly, threonine and serine differ only by an extra methyl group. Not only that, but the several hundred different enzymes which add phosphate to them (inappropriately called kinases) don’t bother to tell them apart – get rid of one. Do we really need 4 different hydrocarbon side chains (methyl, isopropyl, sec-butyl, isobutyl)? Maddeningly, sec-butyl belongs to isoleucine, and isobutyl belongs to leucine. Get rid of two of them – probably a long one and a short one. Other chemists might choose different amino acids to let go.

Removing these 5 amino acids from the total cuts the DNA required to code for proteins by one-third, saving all that synthetic ATP. Of course, synonymous codons disappear in the process. Nonetheless, we should be able to build pretty decent proteins from the 15 amino acids we have left. No chemical functionality present in the original 20 has been lost.
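One way to make the proposal concrete is to recode an existing sequence into a 15-letter alphabet. The particular merges below (which member of each pair is kept, and which two hydrocarbon side chains are dropped) are an arbitrary illustrative choice, not something the post specifies. The helper name and the example peptide are hypothetical.

```python
# One-letter codes for the 20 standard amino acids.
STANDARD_20 = "ACDEFGHIKLMNPQRSTVWY"

# Hypothetical merges: dropped residue -> retained stand-in.
MERGES = {
    "D": "E",  # aspartate -> glutamate
    "N": "Q",  # asparagine -> glutamine
    "T": "S",  # threonine -> serine
    "A": "V",  # alanine (methyl) -> valine (isopropyl): drops a short chain
    "L": "I",  # leucine (isobutyl) -> isoleucine (sec-butyl): drops a long chain
}


def reduce_alphabet(protein):
    """Rewrite a one-letter protein sequence using only the retained residues."""
    return "".join(MERGES.get(aa, aa) for aa in protein.upper())


# The alphabet that survives once every merge has been applied.
reduced_alphabet = sorted(set(reduce_alphabet(STANDARD_20)))
print(len(reduced_alphabet), "".join(reduced_alphabet))

# A made-up peptide, before and after recoding.
example = "MADLTNKSW"
print(example, "->", reduce_alphabet(example))
```

Swapping the direction of any merge (keeping aspartate rather than glutamate, say) still leaves 15 residues, which is the point that other chemists might choose differently.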

Clearly this hasn’t happened in the real world. Just why not is probably a matter of history, and an endless source of armchair speculation (like this post). Could there be a reason for all this coding redundancy, or at least could there be mechanisms to keep it in place?

I think such mechanisms exist, but you’ll have to give up the protein-centric notion that all DNA does is code for protein. Even better, there is excellent recent hard experimental data to back this up. But that’s the subject of the next post.


  1. JZ said:

    Well, I guess it depends on how you view evolution. Most modern biologists I know think RNA→Protein→DNA (RNA world theory). If I were to pick the two most important amino acids, they would be arginine and leucine, the ones which have 6 codons each. Arginine is one of the major nucleic-acid-binding amino acids, if not the major one, and leucine is hydrophobic, allowing proteins to pack and fold properly; leucine has also been known to have some cool functions in DNA base-flipping proteins such as UDG, if I remember correctly. Evolution seems to encourage duplication of beneficial things, especially in genomic space. If an organism is evolving in the RNA world, one of the first types of proteins it will create are nucleic-acid-interacting proteins.

    RNA can catalyze self-replicating (ligation, cleavage) reactions, but they are slow, mostly on the order of 1-5/hour, although the hammerhead ribozyme can perform ~60 cleavages/hour under optimal conditions. This is in contrast to restriction enzymes. According to NEB, EcoRI cuts 1 µg of DNA in 1 hour with 1 unit (20,000 units/mL; 20 units/µl) at 37 °C. I’ll skip the part where I try to calculate the number of EcoRIs in a unit, but 0.05 µl is a very small amount, especially because the EcoRI is stored in glycerol, which has a higher density than water. If we say that our organism has a 10 kbp RNA genome (we will assume base pairs at this point because A-form RNA helps protect the 2’OH from basic attack) at ~310 Da/base, that is ~6.2 MDa per genome. Converting 1 µg to daltons using Avogadro’s number and allowing one cut per genome copy, we achieve about 96774193548 cleavages per hour. So even if there were 1 million EcoRIs in our unit, that would still be 96774 cleavages per hour per enzyme, still 3 orders of magnitude faster than the ribozyme at replication-centric functions. So I can understand why having more arginine and leucine would be beneficial to replication and thus evolution.
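The arithmetic in this comment can be reproduced with the sketch below. All inputs are the commenter's own figures (a 10 kbp genome, ~310 Da per base, 1 µg cut per unit-hour, a guessed 1 million enzyme molecules per unit, ~60 cleavages/hour for the hammerhead ribozyme). The variable names are mine, and Avogadro's number is rounded to 6e23 on the assumption that this is what produced the quoted 96774193548.

```python
import math

AVOGADRO = 6e23               # daltons per gram, rounded (assumed to match the comment)
DA_PER_BASE = 310             # approximate mass of one nucleotide, in daltons
GENOME_BP = 10_000            # hypothetical 10 kbp genome
DNA_CUT_G = 1e-6              # 1 microgram digested per unit per hour
ENZYMES_PER_UNIT = 1_000_000  # the commenter's deliberately generous guess
RIBOZYME_PER_HOUR = 60        # hammerhead ribozyme under optimal conditions

# Two strands per base pair: 10,000 * 2 * 310 Da = 6.2 MDa per genome.
genome_da = GENOME_BP * 2 * DA_PER_BASE

# Genome copies in 1 microgram; one cut per copy gives cleavages per hour.
genomes_cut_per_hour = DNA_CUT_G * AVOGADRO / genome_da

# Share of that work done by each enzyme molecule.
per_enzyme_per_hour = genomes_cut_per_hour / ENZYMES_PER_UNIT

# How many orders of magnitude faster than the ribozyme.
orders_of_magnitude = math.log10(per_enzyme_per_hour / RIBOZYME_PER_HOUR)

print(f"genome mass: {genome_da / 1e6:.1f} MDa")
print(f"cleavages per hour per unit: {genomes_cut_per_hour:.0f}")
print(f"cleavages per hour per enzyme: {per_enzyme_per_hour:.0f}")
print(f"orders of magnitude over ribozyme: {orders_of_magnitude:.1f}")
```

Because the number of enzyme molecules per unit is a guess, the per-enzyme rate is only as good as that guess; a smaller true count would make the gap over the ribozyme larger.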

    Next, I know you were just being off-the-cuff, but some proteins, such as KaiC, which is phosphorylated on adjacent serine and threonine residues, function as circadian clock regulators through differential phosphorylation of those residues. KaiA acts to prevent serine-only phosphorylation while encouraging phosphorylation of threonine, or of serine after threonine. Maybe cyanobacteria could have evolved a different mechanism if there were no serine or threonine?

    Finally, in your final sentence you assume DNA evolved before proteins, which is most likely not the case if you subscribe to the RNA world theory. Perhaps codons have evolved over time, but there does not seem to be much evolutionary data to support this based on genome sequences. Further, (non-junk junk) promoters, enhancers and regulatory elements are mostly inserted in the introns of eukaryotes and can be combinatorially made from the four-base alphabet ACTG, which means the number of codons would theoretically never matter. Though some proteins do prefer an AT-rich minor groove (histones).

  2. retread said:

    JZ and J — thanks for the comments. The next post is relevant to some of the points you raised, but it won’t appear until at least 12 Jan ’09, so consider the following:

    Amino acids #21 (selenocysteine) and #22 (pyrrolysine) both use one of the stop codons, but not all the time, or proteins would never end. What’s quite interesting is that structure (determined as always by the sequence of nucleotides) in the 3’ end of the gene (i.e. past the stop codon) is important in determining when stop means stop and when it doesn’t. If you think about it, this means that the DNA sequence is being read in a rather different language. More about this later.

    “Most modern biologist I know think RNA→Protein→DNA (RNA world theory).” This is probably correct, but I see no way to prove or disprove this so ‘hypotheses non fingo’.

    “you assume DNA evolved before proteins” — no I don’t, but the next post will give one reason why, regardless of how things came to be the way they are, we could never go back to the two nucleotide per codon world.

    I’d like you to think about why we have so few (22) amino acids when the 3 nucleotide/codon world gives us the possibility of so many more. The next post will contain one reason, the one after that another.

  3. Ambika said:

    Just to give an example for the comment about getting rid of threonine or serine since they seem almost the same: here is an example of how both amino acids have their own biological importance. The GFP-S65T mutant, with the serine at position 65 mutated to a threonine, has a single red-shifted excitation peak, fluoresces more intensely than wild type, and has the added advantage of acquiring fluorescence approximately four times faster than wild-type GFP.