Nature Chemistry | The Sceptical Chymist

Chemiotics: How many proteins can we make?

Posted on behalf of Retread

The mass of the earth is given by my physics book (Halliday 6th Ed.) as 6 × 10^27 grams. If we made just one molecule of each protein containing n amino acids linked together, when would we run out of material? Make a guess. I found the results surprising.

Assume the earth is made of nothing but hydrogen, oxygen, nitrogen, carbon and sulfur. Clearly not true, but we’re going for what mathematicians call an upper bound. If mathematicians can get away with things like “consider a spherical cow” I can get away with this. (The cognoscenti may wish to go for a least upper bound.) Proteins are linear chains of 20 different amino acids ranging in mass from glycine at 75 Daltons to tryptophan at 204. When two are linked together by an amide (peptide) bond, 18 Daltons of mass is lost (water is split out). So figure the average amino acid in a chain at 100 Daltons (roughly).

So there are 20 × 20 = 400 distinct proteins of 2 amino acids, 8000 with 3, 160,000 with 4, 3,200,000 with just 5. Shorties like this are called peptides (or polypeptides) and just when you start calling them proteins seems to be a matter of taste.

We’re figuring the mass of the typical amino acid at 100 Daltons, but a Dalton doesn’t have much mass. It is 1/12 the mass of a single atom of carbon-12, Avogadro’s number (about 6 × 10^23) of which have a mass of 12 grams. So one Dalton has a mass of 10^-24 grams (roughly).
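For anyone who wants to see how much that rounding costs, here is a minimal Python sketch. It assumes the usual four-figure value of Avogadro's number; the variable names are purely illustrative.

```python
# Avogadro's number to four figures: atoms in 12 grams of carbon-12.
AVOGADRO = 6.022e23

# One Dalton is 1/12 of the mass of a single carbon-12 atom.
carbon12_atom_grams = 12.0 / AVOGADRO
dalton_grams = carbon12_atom_grams / 12.0

print(f"1 Dalton = {dalton_grams:.3e} g")

# The text rounds this to 10^-24 g, which understates the mass a little.
ROUNDED_DALTON_GRAMS = 1e-24
print(f"ratio to the rounded value: {dalton_grams / ROUNDED_DALTON_GRAMS:.2f}")
```

The rounded figure is low by less than a factor of two, which does not matter for an order-of-magnitude argument like this one.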

The number of distinct proteins containing n amino acids is 20^n. The mass of each protein (in Daltons) is (roughly) 100 × n, depending on the amino acids chosen. The mass of the collection of distinct proteins of length n in grams is (20^n) × (100 × n) × (10^-24). It’s clear that we’re over 1 gram for the collection at only 24 amino acids (as 20^24 is much larger than 10^24). How far over? Since 20^24 × 10^-24 = 2^24, the mass is 2^24 × 100 × 24 = 40,265,318,400 = 4 × 10^10 grams.
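The formula is easy to check by machine. The following is a rough Python sketch using the same rounded figures as the text (100 Daltons per amino acid, 10^-24 grams per Dalton); the function names are my own.

```python
AMINO_ACID_KINDS = 20        # distinct amino acids
AVG_RESIDUE_DALTONS = 100    # rough average mass per amino acid in a chain
DALTON_GRAMS = 1e-24         # rounded, as in the text


def sequence_count(n: int) -> int:
    """Number of distinct chains of n amino acids (exact integer)."""
    return AMINO_ACID_KINDS ** n


def collection_mass_grams(n: int) -> float:
    """Mass of one molecule of every distinct chain of length n."""
    return sequence_count(n) * (AVG_RESIDUE_DALTONS * n) * DALTON_GRAMS


for n in (2, 3, 4, 5):
    print(n, sequence_count(n))

print(f"n = 24: {collection_mass_grams(24):.3e} g")
```

Python integers have arbitrary precision, so 20^n is computed exactly; only the final multiplication by 10^-24 is floating point.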

As noted, the mass of the earth is 6 × 10^27 grams, which is about 10^17 times the 4 × 10^10 grams we reached at 24 amino acids. So we’re not too far away at 24 amino acids. Certainly no farther away than another 17 amino acids, as 20^17 is much greater than 10^17.

So, the mass of the earth (which isn’t all carbon, hydrogen, etc… ) isn’t enough to make just one molecule of each of the possible proteins 41 amino acids long. 41 amino acids is a very small protein (some would call it a polypeptide). Just about every protein of biological interest is much larger. The champ is a muscle protein called titin which has 27,000+ amino acids.
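The figure of 41 is a safe bound rather than the exact crossover. A short Python sketch, again under the rounded assumptions used above (100 Daltons per amino acid, 10^-24 grams per Dalton, an earth of 6 × 10^27 grams), finds the first chain length at which the collection outweighs the earth. The function names are illustrative only.

```python
EARTH_MASS_GRAMS = 6e27


def collection_mass_grams(n: int) -> float:
    """Mass of one molecule of every distinct chain of n amino acids."""
    return (20 ** n) * (100 * n) * 1e-24


def first_length_exceeding(limit_grams: float) -> int:
    """Smallest chain length whose full collection outweighs limit_grams."""
    n = 1
    while collection_mass_grams(n) <= limit_grams:
        n += 1
    return n


n_cross = first_length_exceeding(EARTH_MASS_GRAMS)
print("first length that outweighs the earth:", n_cross)
print(f"mass at 41 amino acids: {collection_mass_grams(41):.2e} g")
```

With these inputs the loop stops at 38, so by 41 amino acids the collection is well past the mass of the earth.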

So what? It means that chemists will never be able to explore more than a tiny morsel of the space of possible proteins. Perhaps computationally we will (I doubt it), but that’s the subject of a future post.


  1. Retread said:

    20^17/10^17 = 2^17 = 131,072, which is about 1/3 of the mass of the sun/mass of the earth ratio.

  2. Wavefunction said:

    Neat calculation and thought. But how many of all the potential protein folds that are out there would actually be functional? Computationally we may be able to explore novel folds, but determining their function would still be a real challenge. Interestingly even this can be potentially tackled computationally. For example:

    “Structure-based activity prediction for an enzyme of unknown function” by Shoichet et al.

    Nature 448, 775 – 779 (16 Aug 2007), doi: 10.1038/nature05981

  3. retread said:

    Biologic functionality is exactly what I’m driving at. To be functional biochemically, a protein would need to have one or just a few shapes (only a few potential energy minima and well separated from the rest in energy) and it should reach these shapes quickly after synthesis. Clearly such proteins exist or we wouldn’t be here. How often this happens in the space of all proteins is the subject of the next post.

  4. Param Priya Singh said:

    Really good! However, this may not be true, because the situation discussed only holds if all possible polypeptides are made at once. In biological reality that may not be the case. What if the sequence space has been explored (by nature) gradually over millions of years? In that case, at any particular instant not all of it but a limited (though still very large) subset is being explored and is evolving under selective pressure.

  5. Retread said:

    Param — thanks for your comments.

    Consider the following: Let us suppose there is a super-industrious post-doc who can make a new protein every nanosecond (reusing the atoms). There are 60 * 60 * 24 * 365 = 31,536,000 ~ 10^7 seconds in a year and 10^10 years (more or less) since the big bang. This is 10^9 * 10^7 * 10^10 = 10^26 different 41 amino acid proteins he could make since the dawn of time. But there are 20^41 = 2^41 * 10^41 proteins of length 41 amino acids. 2^41 = 2,199,023,255,552 ~ 10^12. So he has only tested 10^26 of 10^53 possible 41 amino acid proteins in all this time.

    As per your suggestion, this is making one protein at a time. However, even if the hapless post-doc were able to use the entire mass of the earth (6 × 10^27 grams) every nanosecond to make a different set of proteins (one molecule of each), he would never have made all the possibilities for a protein the length of one chain of hemoglobin (141 or 146 amino acids) since time began. Hemoglobin just isn’t that big as proteins go (the protein encoded by the gene mutated in cystic fibrosis has well over 1000 amino acids).
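The post-doc arithmetic in the last comment can be checked with a few lines of Python. This is only a sketch under that comment's own assumptions (one synthesis per nanosecond, 10^10 years since the big bang) plus the rounded masses from the post (100 Daltons per amino acid, 10^-24 grams per Dalton). All names are hypothetical.

```python
SECONDS_PER_YEAR = 60 * 60 * 24 * 365   # 31,536,000
UNIVERSE_AGE_YEARS = 1e10               # rough figure used in the comment
STEPS_PER_SECOND = 10 ** 9              # one synthesis step per nanosecond

steps = STEPS_PER_SECOND * SECONDS_PER_YEAR * UNIVERSE_AGE_YEARS

# Case 1: one 41 amino acid protein per step.
fraction_41 = steps / 20 ** 41
print(f"fraction of 41-mers made: {fraction_41:.1e}")

# Case 2: a whole earth's mass of distinct 141 amino acid chains per step.
EARTH_MASS_GRAMS = 6e27
chain_grams = 141 * 100 * 1e-24         # one hemoglobin-length chain
chains_per_step = EARTH_MASS_GRAMS / chain_grams
fraction_141 = steps * chains_per_step / 20 ** 141
print(f"fraction of 141-mers made: {fraction_141:.1e}")
```

Using the unrounded 31,536,000 seconds per year gives about 3 × 10^26 steps rather than 10^26, which changes nothing of substance: both fractions remain vanishingly small.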