Cover puzzle
Anthony Edwards has produced an elegant representation of the genetic code in all its degenerate complexity for this month's cover, which is explained in Touching Base. Now you have a chance to use his device to solve a puzzle. Please post your solutions to the Nature Precedings website, or send them to me and I'll add them to this blog.
Problem: Find a ‘Gray code’ order for the codons.
When the numbers 0 to 2^n - 1 are written in binary form (as n-digit strings) in their natural order, the number of digits, 0 or 1, that change on proceeding from one number to the next varies. There exist, however, orderings in which only a single digit changes at each step, and in which the last number also differs from the first in a single digit, so that the numbers form a complete cycle. These are known as Gray codes.
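For readers who want to try the idea in code, here is a minimal Python sketch of the best-known example, the binary reflected Gray code, in which the i-th code word is i XOR (i >> 1). The function names are illustrative only, not taken from the post.

```python
def reflected_gray(n: int) -> list[str]:
    """Return the n-bit binary reflected Gray code as bit strings, in cycle order."""
    return [format(i ^ (i >> 1), f"0{n}b") for i in range(2 ** n)]


def hamming(a: str, b: str) -> int:
    """Number of positions in which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def is_cyclic_gray(seq: list[str]) -> bool:
    """True if all items are distinct and each differs from the next
    (including last -> first) in exactly one position."""
    if len(set(seq)) != len(seq):
        return False
    return all(hamming(seq[i], seq[(i + 1) % len(seq)]) == 1 for i in range(len(seq)))


codes = reflected_gray(3)
print(codes)  # ['000', '001', '011', '010', '110', '111', '101', '100']
print(is_cyclic_gray(codes))  # True

# The natural order fails: 001 -> 010 changes two digits.
natural = [format(i, "03b") for i in range(8)]
print(is_cyclic_gray(natural))  # False
```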
The same principle can be applied to the codon triplets: all 64 can be ordered in a cycle such that each differs from its predecessor in exactly one of its three positions. There are many such orders, each forming a Gray code, and they differ in the extent to which they group together triplets that code for the same amino-acid.
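One way to build such a codon cycle is the reflected construction generalised from two symbols to four. The Python sketch below assumes the bases are taken in the U, C, A, G order of the standard table; the helper names are hypothetical. Alternate blocks run the shorter code forwards and backwards, so neighbours differ in one position, and because four is an even number the last codon differs from the first only in the first position, closing the cycle. It is just one of the many valid orders, not one chosen to group synonymous codons.

```python
BASES = "UCAG"  # assumed base order, as in the standard codon table


def reflected_gray(alphabet: str, n: int) -> list[str]:
    """Reflected Gray code over `alphabet` for strings of length n.

    Block j uses the (n-1)-length code forwards when j is even and reversed
    when j is odd, so consecutive words differ in exactly one position.
    """
    if n == 0:
        return [""]
    shorter = reflected_gray(alphabet, n - 1)
    out = []
    for j, base in enumerate(alphabet):
        block = shorter if j % 2 == 0 else shorter[::-1]
        out.extend(base + word for word in block)
    return out


def is_cyclic_gray(seq: list[str]) -> bool:
    """All items distinct, and each differs from the next (last -> first too) in one position."""
    if len(set(seq)) != len(seq):
        return False
    return all(
        sum(x != y for x, y in zip(seq[i], seq[(i + 1) % len(seq)])) == 1
        for i in range(len(seq))
    )


codons = reflected_gray(BASES, 3)
print(len(codons), codons[:6])  # 64 ['UUU', 'UUC', 'UUA', 'UUG', 'UCG', 'UCA']
print(is_cyclic_gray(codons))  # True
```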
One measure of success in forming such groups is the number of times around the cycle that neighbouring triplets code for different amino-acids (or one of them is a stop signal). Since there are 21 of these (20 amino-acids plus stop), and each must occupy at least one unbroken run of the cycle, the absolute minimum number of such changes between neighbours is 21, but this may not be attainable.
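The measure can be written as a simple count over the cycle. A minimal sketch, assuming the standard genetic code (NCBI translation table 1) written as a 64-letter string with '*' for stop; `count_changes` is a hypothetical helper. The bound of 21 follows because, in a cycle containing at least two different labels, every change ends one run of identical labels, so the number of changes equals the number of runs, and each of the 21 labels needs at least one run.

```python
# Standard genetic code (NCBI translation table 1), one letter per codon.
# Codons are listed with each position in U, C, A, G order, first position
# varying slowest; '*' marks a stop codon.
STANDARD_CODE = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)


def count_changes(labels: list[str]) -> int:
    """Number of neighbouring pairs in a cycle (last -> first included) whose labels differ."""
    n = len(labels)
    return sum(labels[i] != labels[(i + 1) % n] for i in range(n))


print(len(set(STANDARD_CODE)))  # 21: 20 amino-acids plus stop
print(count_changes(["F", "F", "L", "L", "S"]))  # 3 (F->L, L->S, and S->F on wrapping round)
```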
It is easy to find a Gray code ordering with 25 changes by threading a regular route through the standard table of the genetic code. But can you find one with fewer? There’s a route through the Edwards–Venn diagram given in Figure 3 of ‘Picturing the genetic code’ (Nature Precedings doi:10.1038/npre.2007.682.1) with only 23 changes of amino-acid.
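Combining the reflected construction with the change count gives one concrete check of the figure of 25. The sketch below assumes that a "regular route" means the snake-like reflected walk through the standard table (third base running back and forth fastest, then the second, then the first); under that assumption the count comes out at 25. It does not search for the 23-change route through the Edwards–Venn diagram, and all names in it are hypothetical.

```python
BASES = "UCAG"  # base order of the standard table (assumption for this sketch)

# Standard genetic code (NCBI translation table 1); '*' marks stop codons.
STANDARD_CODE = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
# Map each codon string to its one-letter amino-acid (or '*').
TABLE = {
    a + b + c: STANDARD_CODE[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def reflected_gray(alphabet: str, n: int) -> list[str]:
    """Reflected Gray code: alternate blocks run the shorter code forwards and backwards."""
    if n == 0:
        return [""]
    shorter = reflected_gray(alphabet, n - 1)
    return [
        base + word
        for j, base in enumerate(alphabet)
        for word in (shorter if j % 2 == 0 else shorter[::-1])
    ]


def count_changes(labels: list[str]) -> int:
    """Neighbouring pairs in the cycle (last -> first included) with different labels."""
    n = len(labels)
    return sum(labels[i] != labels[(i + 1) % n] for i in range(n))


route = reflected_gray(BASES, 3)
print(count_changes([TABLE[c] for c in route]))  # 25
```

Beating 25 means finding a different single-position-change cycle whose runs of synonymous codons are longer; the counting function stays the same.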
P.S. Why ARE there two groups of serine codons, anyway?

Comments
Oops, this is an old post. Anyway, the reason there are two groups of serine codons is probably the same as the reason there are two groups of leucine (UU{A,G} and CU*) and arginine (CG* and AG{A,G}) codons. We only notice the serine split at first glance because the two groups are so far apart on the codon table, but if you change the order in which the bases are listed along the sides and top of the table, you can split the six leucine codons pretty far apart as well.
Position 1 for leucine (U or C) has two bases that are not very different (deaminating C gives U), but for serine the first-position bases, U and A, are not similar, so the current setup probably arose differently. There was a theory floating around about 10 years ago that the genetic code resulted from the merging of two genetic codes at some point in the far past. I don't know whether any large-scale genomic analysis has shown that to be the case, but it could explain why there are two serine/leucine/arginine clusters.
Posted by: Hanspeter | November 19, 2007 10:23 PM