Nature Chemistry | The Sceptical Chymist

Chemiotics: Sherlock Holmes and the Green Fluorescent Protein

Posted on behalf of Retread

Gregory (Scotland Yard): “Is there any other point to which you would wish to draw my attention?”

Holmes: “To the curious incident of the dog in the night-time.”

Gregory: “The dog did nothing in the night-time.”

Holmes: “That was the curious incident.”

The chromophore of green fluorescent protein (GFP) is para-hydroxybenzylidene imidazolinone. It is formed by cyclization of three consecutive residues: serine (#65), tyrosine (#66) and glycine (#67). It sits in the center of a beta barrel formed by the 238 amino acids of GFP.

What is so curious about this?

Simply put, why don’t things like this happen all the time? Perhaps nothing quite this fancy, but on a more plebeian level consider this: of the twenty amino acids, two are carboxylic acids, two are amides, one is an amine, three are alcohols and one is a thiol. One might expect esters, amides, thioesters and sulfides to be formed deep inside proteins. Why deep inside? On the surface of the protein, water is present at 55 molar, ready to hydrolyze them purely by the law of mass action (releasing about 10 kJ/Avogadro’s number per bond in the process). Some water is present in the X-ray crystallographic structure of proteins, but nothing this concentrated.

The presence of 55 M water bathing the protein surface leads to an even more curious incident, namely why proteins exist at all given that amide hydrolysis is exothermic (as well as entropically favorable). Perhaps this is why proteins contain so many alpha helices and beta sheets — as well as functioning as structural elements they may also serve to hide the amides from water by hydrogen bonding them to each other. Along this line, could this be why the hydrophilic side chains of proteins (arginine, lysine, the acids and the amides) are rather bulky? Perhaps they also function to sterically shield the adjacent amides. After all, why should lysine have 4 methylene groups rather than just one or two?

Now the serine-tyrosine-glycine tripeptide should occur by chance once in every 8000 tripeptides (20 × 20 × 20, if all twenty amino acids are equally likely at each position). The SwissProt database of proteins contains 144,041,553 amino acids in 399,749 proteins as of 14 October. Does this tripeptide occur about 18,005 times (144,041,553 / 8000) in the database as it should? If it doesn’t, is negative selection preventing it? If it does occur this often, have we missed other chromophores? Are there other tripeptides missing from SwissProt? If there are, does this tell us how to build other chromophores? Or does it tell us something important about protein structure?
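The chance expectation above is easy to check by machine. The sketch below is only an illustration, and it rests on the same assumptions as the estimate in the text: all twenty amino acids are equally likely and positions are independent. Neither is true of real proteins. The function names are made up for this example. It also shows a slightly stricter estimate that counts only three-residue windows lying within a single protein, and a small counter for overlapping occurrences of a motif such as SYG.

```python
def expected_kmer_count(total_residues, n_proteins, k=3, alphabet_size=20):
    """Expected occurrences of one specific k-mer under a uniform, independent model.

    Assumes every protein has at least k residues, so a protein of
    length n contributes n - (k - 1) windows.
    """
    windows = total_residues - (k - 1) * n_proteins
    return windows / alphabet_size ** k


def count_motif(sequences, motif="SYG"):
    """Count occurrences of motif (overlaps allowed) in one-letter sequences."""
    total = 0
    for seq in sequences:
        start = seq.find(motif)
        while start != -1:
            total += 1
            start = seq.find(motif, start + 1)
    return total


# Figures quoted in the post for SwissProt.
TOTAL_RESIDUES = 144_041_553
N_PROTEINS = 399_749

naive = TOTAL_RESIDUES / 20 ** 3  # ignores protein boundaries
windowed = expected_kmer_count(TOTAL_RESIDUES, N_PROTEINS)

print(f"naive estimate:    {naive:.0f}")
print(f"windowed estimate: {windowed:.0f}")

# Toy sequences, not real proteins, just to exercise the counter.
toy_count = count_motif(["MSYGSYG", "AAAA", "SYSYG"])
print(f"SYG in toy data:   {toy_count}")
```

Comparing the observed count against either estimate would still need real amino acid frequencies to be meaningful, since serine, tyrosine and glycine are not each present at exactly one in twenty.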

I don’t have the skills to properly interrogate SwissProt or the Protein Data Bank, but I imagine that some of the readership does. Go to it. These are curious incidents indeed.
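For anyone who wants to try, here is a minimal sketch of how such an interrogation might start, assuming the database has been downloaded as a FASTA text file. The file name in the usage note is a placeholder, and the helper names are invented for this example. It tallies every tripeptide and lists those built from the twenty standard amino acids that never appear.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the twenty standard one-letter codes


def read_fasta(lines):
    """Yield one sequence string per FASTA record from an iterable of lines."""
    chunks = []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            # A header line starts a new record; emit the previous one.
            if chunks:
                yield "".join(chunks)
            chunks = []
        elif line:
            chunks.append(line.upper())
    if chunks:
        yield "".join(chunks)


def tripeptide_counts(sequences):
    """Tally every overlapping three-residue window in the sequences."""
    counts = Counter()
    for seq in sequences:
        for i in range(len(seq) - 2):
            counts[seq[i:i + 3]] += 1
    return counts


def missing_tripeptides(counts):
    """Return standard tripeptides that never occur in the tally."""
    all_tripeptides = ("".join(t) for t in product(AMINO_ACIDS, repeat=3))
    return [t for t in all_tripeptides if counts[t] == 0]


# Tiny in-memory example standing in for a real database file.
demo_lines = [">p1", "MSYG", "SYG", ">p2", "AAAA"]
demo_counts = tripeptide_counts(read_fasta(demo_lines))
print(demo_counts["SYG"], len(missing_tripeptides(demo_counts)))
```

On a real download one would pass an open file instead of the demo list, for example `with open("swissprot.fasta") as handle: counts = tripeptide_counts(read_fasta(handle))`, where the path is hypothetical.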

Comments

    Adrian Vazquez said:

    Well, negative selection does occur once you take protein folding into account. Even though the combinatorial prowess of proteins is exhilarating, there are just a few (a dozen or so) families of supersecondary structures or folding patterns, and it seems that natural selection simply adjusted the functionality of these superfamilies to whatever it needed. Because evolution is restricted to these folding patterns, the odds of finding the tripeptide within them could be drastically reduced. However, the design of novel folds (i.e., ones never found in nature; check out David Baker’s papers, they’re worth it) will bring new, never-seen-in-nature chromophores that could have even higher quantum yields, or even different emission wavelengths. Now that’s what I call exciting!