Chemiotics: The death of the synonymous codon

Posted on behalf of Retread

For years, stretches of DNA not coding for protein were called noncoding DNA. As we came to know more about DNA, sites coding for just where the transcription of DNA to messenger RNA (mRNA) should begin, along with the DNA coding for the RNA in ribosomes, were grandfathered in. Then about 30 years ago, we found that most genes coding for proteins contained large stretches of DNA not coding for amino acids at all.

Dystrophin, the protein whose gene is defective in Duchenne muscular dystrophy, contains 3685 amino acids, but the gene stretches over 2.2 million contiguous positions in DNA. It only takes 11,055 positions to code 3685 amino acids. However, the 11,055 occur in 79 stretches (called exons), separated by 78 much larger stretches of DNA (called introns). The whole 2.2 megabases is transcribed into mRNA and then the introns are lopped (spliced) out by the spliceosome, a gigantic protein and RNA machine even larger and more complicated than the ribosome (300 proteins, 5 RNAs [see: Science vol. 307 pp. 863-864, 2005]).
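The arithmetic in that paragraph is easy to check. The sketch below is only an illustration using the figures quoted in the post; the variable names are made up for the example.

```python
# Figures quoted in the post; the variable names are illustrative only.
AMINO_ACIDS = 3685           # length of the dystrophin protein
GENE_LENGTH_BP = 2_200_000   # approximate span of the gene in DNA
EXONS = 79

coding_positions = AMINO_ACIDS * 3   # three nucleotides per codon
introns = EXONS - 1                  # one intron between each pair of neighbouring exons
coding_fraction = coding_positions / GENE_LENGTH_BP

print(coding_positions)              # positions needed to code the protein
print(introns)                       # number of introns separating the exons
print(f"{coding_fraction:.2%}")      # share of the gene that actually codes
```

On these numbers only about half a percent of the gene codes for amino acids.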

Ever since the human genome project ended, people have wondered why we have so few protein coding genes (around 20,000 at last count). The humble E. coli contains 4300 [see: Nature vol. 385 p. 472, 1997]. Not to worry, we make lots of different proteins from the same gene, by using different combinations of exons – some exons are skipped by the spliceosome when it removes introns. Different tissues (or different states of the same tissue) skip different exons depending on (as yet obscure) conditions, so lots of different variants of the same protein are made. The process is called alternative splicing and is quite common – it happens in 92-94% of human protein genes according to a recent paper [see: Nature vol. 456 pp. 470-476, 2008 and here].

What determines which exons are left in the final product and which are skipped? This is where it gets really interesting. There exist stretches of DNA called exonic splicing enhancers (ESEs) and other stretches inhibiting the splicing in of a particular exon – the exonic splicing inhibitors (ESIs). Where are the ESEs and ESIs found? In the exons themselves.

So what? And what does this have to do with synonymous codons? The commonest genetic disease of Caucasians is cystic fibrosis (CF). Using the 12th exon of CFTR (the gene mutated in CF), when one synonymous codon was switched to another, 25% of the time it resulted in skipping of exon 12 and a defective protein [see: Proc. Natl Acad. Sci. USA vol. 102 pp. 6368-6372, 2005]. So synonymous codons aren’t synonymous at all. A completely different cellular use of synonymous codons will follow in the next post, but why should chemists be interested in any of this?

Because DNA isn’t sitting there passively waiting to be read in just one way. All sorts of new chemistry is involved. There is not enough space in this post for the next two examples, but their chemistry does not involve protein-DNA interaction.

So even if we had 15 amino acids and a stop codon to begin with (as per the last post) we could never give up that extra position and all that redundancy now. We need the coding overkill because it is being used for other things. This work also has profound implications for our understanding of protein evolution. That’s also for next time.

They’d none of them be missed – why 20 amino acids and not 15?

Posted on behalf of Retread

Making DNA is metabolically expensive. 4 ATPs are consumed making adenine (and that’s even when you start with 5 phosphoribosyl alpha pyrophosphate – PRPP). This is why parasites living inside cells have such small genomes. As soon as they figure out a way to get the host to do their metabolic work, they jettison the (now redundant) DNA. The leprosy organism which lives inside cells sheathing nerve processes has only two-thirds the DNA of its cousin, the tuberculosis organism. There are many similar examples and not all are bacterial.

As you know, ‘the’ genetic code is made of nucleotides which come in four varieties (abbreviated A, T, G, C). There are 16 possible combinations when nucleotides are taken 2 at a time, 64 combinations taken 3 at a time. 64 combinations is clearly overkill for just 20 amino acids. So most amino acids have multiple combinations of 3 nucleotides (called codons) which code for them – these are the synonymous codons. Three amino acids (leucine, arginine, serine) have 6 synonymous codons, 2 have none (i.e., just one codon – methionine and tryptophan), the rest fall in between.
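For readers who like to see the counting done, here is a minimal sketch that enumerates the combinations and tallies how many codons each amino acid gets. It assumes the standard genetic code written in the conventional T, C, A, G ordering, with one-letter amino acid codes and '*' for stop; the names are illustrative.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# Standard genetic code, first base varying slowest, third base fastest.
# One-letter amino acid codes; '*' marks a stop codon.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

pairs = len(list(product(BASES, repeat=2)))   # nucleotides taken 2 at a time
triplets = len(CODON_TABLE)                   # nucleotides taken 3 at a time
degeneracy = Counter(CODON_TABLE.values())    # codons per amino acid (and per stop)

print(pairs, triplets)
for aa, n in sorted(degeneracy.items(), key=lambda kv: (-kv[1], kv[0])):
    print(aa, n)
```

The printed tally shows which amino acids sit at the six-codon end of the range and which have a single codon.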

If proteins contained only 15 amino acids, you could cut genome size by one-third – that’s 4 billion or so ATPs/cell if the 3 other nucleotides are as expensive to make as adenine. As the late senator Dirksen used to say – a billion here, a billion there, pretty soon you’re talking real money (this was in an older, happier pre-bailout time).

Why 15 and not 16 amino acids? Because you need a codon to tell the machinery when to stop – such codons were known as ‘nonsense’, back in the day when all the genome was thought to do was code for protein.

Look at the side chains of the 20 amino acids with your chemist’s eye. Some are so similar as to be redundant. Glutamic acid and aspartic acid are chemically the same, differing only by a methylene group – get rid of one. Glutamine and asparagine are just the amides of the two acids (why they aren’t called glutamide and asparamide is beyond me). Get rid of one of them. Similarly threonine and serine differ only by an extra methyl group. Not only that but the several hundred different enzymes which add phosphate to them (inappropriately called kinases) don’t bother to tell them apart – get rid of one. Do we really need 4 different hydrocarbon side chains (methyl, isopropyl, sec-butyl, isobutyl)? Maddeningly sec-butyl belongs to isoleucine, and isobutyl belongs to leucine. Get rid of two of them – probably a long one and a short one. Other chemists might choose different amino acids to let go.

Removing these 5 amino acids from the total cuts the DNA required to code for them down by one-third, saving all that synthetic ATP. Of course, synonymous codons disappear in the process. Nonetheless, we should be able to build pretty decent proteins from the 15 amino acids we have left. No chemical functionality present in the original 20 has been lost.

Clearly this hasn’t happened in the real world. Just why not is probably a matter of history, and an endless source of armchair speculation (like this post). Could there be a reason for all this coding redundancy, or at least could there be mechanisms to keep it in place?

I think such mechanisms exist, but you’ll have to give up the protein-centric notion that all DNA does is code for protein. Even better, there is excellent recent hard experimental data to back this up. But that’s the subject of the next post.

Sugar Daddy: Like sleeping with an elephant

Posted on behalf of Sugar Daddy, with a nod to Andy’s recent post

As a fifth-year chemical biology graduate student, I sometimes wonder if I’ll know when I’ve been in grad school too long. Maybe I’ll want to finish that last project, or start something anew to pass along to a new student. Maybe a personal life decision is playing a factor in my wanting to leave now or stay longer. Maybe I look around at group meeting and realize that free pizza once per week isn’t as great as it used to be, partially because I know what everyone in the group is working on and am slightly less interested in it than I used to be. Maybe I read every paper in my field with such a critical eye that it all seems boring now when it was so exciting only a few years ago.

But sometimes you need something more direct, like a kick in the face, a surefire sign that it’s time to pack up the pipets, file away the round bottom flasks, and start looking for greener pastures in some other field of science. Last week, I think I got that sign: VWR and Fisher simultaneously told me that the world is out of acetonitrile. Yup, that’s right. If I ever need a sign to graduate, it’s that the world has run out of one of the two solvents that I use on the HPLC. (Given that the other solvent is water, I guess if I had to pick which one I’d rather run dry… I guess the situation could be worse.)

The story there is somewhat interesting. I’ll write what I’ve heard, and please comment if what I’m saying is rubbish. Basically, the most economically viable way (and currently the only way) that acetonitrile is produced is as a by-product of acrylonitrile production. Acrylonitrile is a monomer that finds its way into nylon, acrylic, plastics, and all sorts of products; it is a much more important product in the global marketplace than pitiful little acetonitrile, the by-product of acrylonitrile production.

So, acetonitrile supplies are tied to the laws of supply and demand in the acrylonitrile market. Given the global economic situation, building construction projects and the general production of goods — that is, things that rely on products made ultimately from acrylonitrile — are all way down. Therefore, demand for acrylonitrile is down; the price of acrylonitrile has plummeted in the last few months, and production is drying up. Unfortunately for us chemists, the demand for acetonitrile, the bastard step-child of acrylonitrile production, has remained relatively constant, because HPLCs still need to run even if Lehman Brothers has closed up shop and GM isn’t far behind.

Wikipedia claims that the situation was caused by a shutdown of acrylonitrile production in China last summer because of the Olympics and damage to a plant in Galveston, Texas due to Hurricane Ike, but as I understand it, these are medium-sized blips that are only exacerbating a larger market situation.

Acetonitrile shortages have happened before and will happen again. Like the famously flamboyant former Canadian prime minister, Pierre Trudeau, once said about Canada’s proximity to and geopolitical entanglement with the United States, “Living next to you is in some ways like sleeping with an elephant. No matter how friendly and even-tempered is the beast, if I can call it that, one is affected by every twitch and grunt.”

So, basically, we’re not in a very good place until someone can figure out how to make acetonitrile independent of acrylonitrile production in a way that is economically viable. Then we can finally kick the elephant out of our bed, allow the acetonitrile market to be regulated by its very own market forces, and maybe keep me from interviewing for postdocs and writing up my thesis.

Hmm… perhaps a good puzzle for a lazy Thanksgiving afternoon? Thanks for the idea, SD! Catherine

Chemiotics: Sherlock Holmes and the Green Fluorescent Protein

Posted on behalf of Retread

Gregory (Scotland Yard): “Is there any other point to which you would wish to draw my attention?”

Holmes: “To the curious incident of the dog in the night-time.”

Gregory: “The dog did nothing in the night-time.”

Holmes: “That was the curious incident.”

The chromophore of green fluorescent protein (GFP) is para-hydroxybenzylidene imidazolinone. It is formed by cyclization of a serine (#65) tyrosine (#66) glycine (#67) sequential tripeptide. It is found in the center of a beta barrel formed by the 238 amino acids of GFP.

What is so curious about this?

Simply put, why don’t things like this happen all the time? Perhaps nothing quite this fancy, but on a more plebeian level consider this: of the twenty amino acids, 2 are carboxylic acids, 2 are amides, 1 is an amine, 3 are alcohols and one is a thiol. One might expect esters, amides, thioesters and sulfides to be formed deep inside proteins. Why deep inside? On the surface of the protein, there is water at 55 molar around to hydrolyze them purely by the law of mass action (releasing about 10 kJ/Avogadro’s number per bond in the process). Some water is present in the X-ray crystallographic structure of proteins, but nothing this concentrated.

The presence of 55 M water bathing the protein surface leads to an even more curious incident, namely why proteins exist at all given that amide hydrolysis is exothermic (as well as entropically favorable). Perhaps this is why proteins contain so many alpha helices and beta sheets — as well as functioning as structural elements they may also serve to hide the amides from water by hydrogen bonding them to each other. Along this line, could this be why the hydrophilic side chains of proteins (arginine, lysine, the acids and the amides) are rather bulky? Perhaps they also function to sterically shield the adjacent amides. After all, why should lysine have 4 methylene groups rather than just one or two?

Now the serine-tyrosine-glycine tripeptide should occur by chance once in every 8000 tripeptides. The SwissProt database of proteins contains 144,041,553 amino acids in 399,749 proteins as of 14 October. Does this tripeptide occur about 18,000 times in the database (144,041,553/8000) as it should? If it doesn’t, is negative selection preventing it? If it does occur this often, have we missed other chromophores? Are there other tripeptides missing from SwissProt? If there are, does this tell us how to build other chromophores? Or does it tell us something important about protein structure?
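The expected count can be sketched in a few lines. This assumes, as the post does, that all 20 amino acids are equally likely at every position (they are not, in real proteins), and it uses the database figures quoted above. The second estimate counts only the tripeptide windows that fit inside a protein, on the further assumption that every entry is at least three residues long.

```python
# Database figures quoted in the post; equal amino acid frequencies are assumed.
TOTAL_RESIDUES = 144_041_553
PROTEINS = 399_749
ALPHABET = 20
K = 3                                            # tripeptide

motif_space = ALPHABET ** K                      # distinct tripeptides
windows = TOTAL_RESIDUES - (K - 1) * PROTEINS    # overlapping windows, none spanning two proteins
expected_naive = TOTAL_RESIDUES / motif_space    # ignores protein ends
expected_windows = windows / motif_space         # accounts for protein ends

print(motif_space)
print(round(expected_naive), round(expected_windows))
```

Either way the answer is close to 18,000, so the real question is how far the observed count departs from that.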

I don’t have the skills to properly interrogate SwissProt or the Protein Data Bank, but I imagine that some of the readership does. Go to it. These are curious incidents indeed.
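For anyone tempted to take up the challenge, the counting itself is simple once the sequences are in hand. The sketch below is not a SwissProt client: it assumes the sequences have already been read into a list of one-letter strings, and the toy sequences are invented for illustration.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"   # the 20 standard one-letter codes

def tripeptide_counts(sequences):
    """Count every overlapping tripeptide in a collection of protein sequences."""
    counts = Counter()
    for seq in sequences:
        for i in range(len(seq) - 2):
            counts[seq[i:i + 3]] += 1
    return counts

def missing_tripeptides(counts):
    """Return the tripeptides (out of 8000) that never occur."""
    every = ("".join(t) for t in product(AMINO_ACIDS, repeat=3))
    return [t for t in every if counts[t] == 0]

# Hypothetical toy data standing in for a real database dump.
toy_sequences = ["MSYGLSYG", "AAAA", "SY", "GSYGSYGS"]
counts = tripeptide_counts(toy_sequences)
missing = missing_tripeptides(counts)

print(counts["SYG"])    # occurrences of serine-tyrosine-glycine
print(len(missing))     # tripeptides absent from the toy data
```

Run against the real database, `counts["SYG"]` would answer the first question and `missing` the question of absent tripeptides.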

Chemiotics: Auditing P-Chem

Posted on behalf of Retread

Why would an ex-organic chemist, retired MD do that? The P-chem you need for organic chemistry is pretty simple. You can look at most reactions and figure the overall entropy and enthalpy, and we get pretty good at figuring out delta-deltaG and manipulating it to get reactions to go the way we want.

Well, the answer is because nearly all the really interesting questions in cellular biology involve physical chemistry. Look back at the post of 20 March where throwing a growth factor at a cell resulted in a twofold change in phosphorylation in 924 of 6,600 phosphorylatable sites in 2,244 different proteins. We have some 478 enzymes (called kinases) to accomplish this reaction. Why so many? Because most kinases have a limited number of substrates. Studying the phosphorylation reaction itself (i.e., the classic chemistry) tells you very little. What determines which kinase associates with which substrate? That’s exactly where physical chemistry comes in. The association of one protein with another doesn’t involve covalent (or even ionic) bonds. It’s mostly van der Waals and hydrogen bonding, along with solvent effects. Pure P-Chem.

Non(classical chemical) bonding protein association is crucial in the normal life of the cell (and sometimes in its death). Consider the mediator complex. It is required for the molecular machines which transcribe DNA into RNA (the three RNA polymerases) to actually do their work. Depending on the organism, the mediator complex has between 20 and 30 proteins and a mass of 1-2 megaDaltons. Also, RNA polymerase II itself isn’t just one protein, but 12 (in yeast) with a mass of 500 kiloDaltons — again held together by noncovalent interactions.

A personal reason for studying P-Chem is the protein folding problem, where nary a covalent bond is formed. I’d certainly like to get up to speed to read the literature and find out if the ‘potential energy funnel’ is more than a fancy way to say that (biologic) proteins fold into their final shape quickly. As docs, we do this all the time. Consider the diagnosis of idiopathic thrombocytopenic purpura. Impressive, n’est-ce pas? However, all it means is that you are bleeding because you don’t have enough platelets (a type of blood component) and we don’t know why.

We’ve already been through the 3 laws of thermodynamics, the second introduced by Carnot’s brilliant analysis of the changes in state of an ideal gas as it went around his cycle, and his discovery (better construction) of the concept of entropy. Even after nearly 200 years, the power of his thought is impressive. I doubt that most of you have the time, but you will be similarly impressed with the stunning power of Darwin’s mind if you read “The Origin of Species”. All of you have more background (just by inhaling the zeitgeist) than he did. If you really have a lot of time, read “Darwin’s Ghost” by Steve Jones along with Darwin. Jones updates "The Origin .. " to 2000 chapter by chapter. Although Jones is an excellent writer, Darwin wins each chapter hands down.

Finally, the course is being given at the local state university. It’s very gratifying to see that state universities continue to function as the giant engines of social mobility that they were for my parents’ generation, educating immigrants and the children of immigrants. The present crop of students isn’t predominantly from eastern and southern Europe as my father’s class was at Rutgers 80 years ago. But immigrants they are, and 3 of the students I’ve spoken to were born in Nigeria, Haiti and Poland.

Chemiotics: Apologies to Borodin

Posted on behalf of Retread

Can you picture yourself spending a week with a group of people who can’t tell an Angstrom from arugula, some of whom are wary of all “chemicals”? Many highly analytic types (mathematicians, computer scientists, physicists, electrical engineers and even chemists) do just that and enjoy it immensely. I speak of adult amateur chamber music festivals (or ‘band camp for adults’ as one of my friend’s grandkids calls them). After 35 years of them, I only met the 5th chemist this year. They are vastly outnumbered by the other analytics, particularly mathematicians and physicists.

Participants are highly educated for the most part, but the most talented cellist this year was a moving-company man who hauls furniture around for a living, and I still remember playing with a marvellous 300-pound violist years ago who was a jail matron.

If you were an aspiring organic chemist in the early 60s, the bible was “Mechanism and Structure in Organic Chemistry” by Edwin S. Gould, a physical chemist amazingly enough. He also happens to be an excellent violinist and I had the pleasure of playing with him a few years ago. He’s still active in research although he received his PhD from UCLA in 1950. Who says chemicals are toxic!?

Occasionally the two cultures do clash, and a polymer chemist friend is driven to distraction by a gentle soul who is quite certain that “chemicals” are a very bad thing. For the most part, everyone gets along. Despite the very different mindsets, all of us became very interested in music early on, long before any academic or life choices were made.

So, are the analytic types soulless automatons producing mechanically perfect music which is emotionally dead? Are the touchy-feely types sloppy technically and histrionic musically? A double-blind study would be possible, but I think both groups play pretty much the same (less well than we’d all like, but with the same spirit and love of music).

I wonder why chemists are so outnumbered in this group? It’s been downhill ever since Alexander Borodin. Perhaps a larger sample is needed. Any thoughts?

Chemiotics: Unrequired reading

Posted on behalf of Retread

If you look back at your notes on thermodynamics, you are likely to find a blizzard of partial derivatives, state functions and total differentials. As an organic chemist, I had an intuitive understanding of the thermodynamics I needed at the molecular level (actually it’s pretty simple), but the math and the big ideas were not friends. Should you be in the same boat and wish to get the big picture, have a look at “Four Laws that Drive the Universe” by Peter Atkins. It’s 124 small pages, written extremely well and bounces back and forth between the macroscopic and the microscopic illuminating each by the other. If there is a derivative to be found, I missed it.

The book may produce in you physics envy (with apologies to Freud). On p. 45 you will find a discussion of Noether’s theorem, which states that under all the conservation laws of physics lies a symmetry. The first law (conservation of energy) is really about the symmetry of time flow — e.g., “time flows steadily, it does not bunch up and run faster then spread out and run slowly.” Chemistry just doesn’t have statements of such majesty (or strangeness).

If you liked Atkins you’ll love “Boltzmann’s Atom” by David Lindley. It concerns Boltzmann’s trials and tribulations as he developed statistical mechanics. As a neurologist I doubt that they drove him to suicide at 62 (he sounds pretty loosely wrapped throughout his life). Boltzmann’s big opponent was Ernst Mach, who didn’t see the need for atoms as an explanatory device. Mach’s view was that physics should establish laws tying observable phenomena together — e.g., the ideal gas law etc, etc… Postulating something you couldn’t see to explain something you could, was not considered science (by Mach and his followers). Pretty strange to our way of thinking today, but these were the events of just over 100 years ago.

However, vestiges of Mach’s thinking linger on in the Copenhagen interpretation of quantum mechanics. As junior chemistry majors in the 50s we had to read “The Logic of Modern Physics” written by P. W. Bridgman in 1927. It was our introduction to quantum mechanics (as none of us had the math to tackle it). All you could hope to predict by a theory were ‘numbers on a dial’. Going deeper by hoping for a trajectory to explain things was a no-no (the nodes in atomic and molecular orbitals pretty much rule out trajectories, don’t they?). The book drove us nuts at the time, as chemistry back then was firmly on the macroscopic side of the quantum mechanical divide.

Gibbs and Maxwell make their appearance in Lindley’s book, as does the culture and politics of Austria-Hungary before WWI, so there is some breathing room for the reader. One of the founders of physical chemistry, Wilhelm Ostwald, also appears. He doesn’t come off too well — he was enamored of something called energetics, which to Boltzmann (and to Lindley who is a physicist) meant that he really didn’t understand physics very well.

Atoms were finally accepted after Einstein’s work on Brownian motion in 1905 (also described). Parenthetically, there was a similar controversy ending about the same time, as to whether the brain was made of cells, and whether individual neurons existed, or whether the whole brain was a big gemish of nuclei and fibers.

Chemiotics: A chemical double entendre

Posted on behalf of Retread

A few chemists who were both literary and literal recently looked fairly silly on the pages of the New York Times and in the blogosphere. You can read all about it on Michelle Francl’s blog “Culture of Chemistry” (https://cultureofchemistry.blogspot.com/). See her post of 1 May ‘08, “How to tell if you’re really a chemist” (https://cultureofchemistry.blogspot.com/2008/05/you-pronounce-unionized-as-un-ionized.html). To make a long story short, 3 chemically impossible organic molecules (5 bonds to carbon etc., etc…) spelled out the word SEX in a review of a book about (what else?) sex. The chemists missed the semantic forest while closely inspecting the chemical trees.

Is there anything inside the cell being read chemically two different ways? Yes there is, and it has implications for how we determine what in the genome is being worked on by natural selection and what is being left alone. If intron, exon, neutral selection and synonymous and nonsynonymous codons aren’t old friends, have a look at the first comment which will give you all the background you need (which is quite a bit).

People attempt to measure the rate of natural selection acting on proteins using synonymous and nonsynonymous codons in the same protein in different organisms (say hemoglobin for example). Positive selection is measured as the rate of nonsynonymous nucleotide substitution per nonsynonymous site (Ka), relative to the underlying ‘neutral mutation rate’, which is given by the rate of synonymous substitution per synonymous site (Ks). Usually Ka is much less than Ks (as most new mutations aren’t helpful or are actually harmful; this is negative selection). Positive selection is implied by Ka/Ks greater than 1. The rates are counted per site because, strictly by chance, nonsynonymous nucleotide substitutions outnumber synonymous ones by roughly 2:1 to 3:1, depending on the mutation model assumed.
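The chance ratio can be worked out by brute force. The sketch below is a simplification, not how Ka and Ks are estimated in practice: it assumes the standard genetic code and that every single-nucleotide change is equally likely (no transition/transversion bias, no codon usage bias). It walks through all 61 sense codons, makes each of the 9 possible one-base changes, and classifies the result.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code in T, C, A, G order; '*' marks a stop codon.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def classify_single_base_changes(table):
    """Classify every one-nucleotide change starting from a sense codon."""
    counts = {"synonymous": 0, "missense": 0, "nonsense": 0}
    for codon, aa in table.items():
        if aa == "*":
            continue                      # start only from codons that code for an amino acid
        for pos in range(3):
            for base in BASES:
                if base == codon[pos]:
                    continue
                mutant = codon[:pos] + base + codon[pos + 1:]
                new_aa = table[mutant]
                if new_aa == aa:
                    counts["synonymous"] += 1
                elif new_aa == "*":
                    counts["nonsense"] += 1
                else:
                    counts["missense"] += 1
    return counts

changes = classify_single_base_changes(CODON_TABLE)
ratio = changes["missense"] / changes["synonymous"]
print(changes)
print(f"missense : synonymous = {ratio:.2f} : 1")
```

Under this uniform model the amino-acid-changing substitutions outnumber the silent ones by a little under 3:1; a model that favours transitions pulls the ratio down, which is why the per-site normalisation in Ka/Ks matters.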

All very nice, but exonic splicing silencers (ESSs) and ESEs are found in exons, and mutations of them will change alternative splicing (something a functioning cell has a great interest in). It’s easy to see how changing one nucleotide in an ESS or an ESE could render it more or less effective, while leaving the amino acid sequence of the underlying protein unchanged. In short, the ‘neutral mutation rate’ may in fact not be neutral at all (if it is in an ESE or an ESS). Or possibly switching one amino acid for another has nothing whatever to do with the protein and everything to do with controlling alternative splicing.

Now, chemists are adept at doing all sorts of different things with the same structure. Think what organic chemists can do with a carbonyl group. But whatever they do is over and done with. In protein-coding genes, the same sequence can mean two different things without being chemically changed at all.

We are far from understanding all the things DNA can do in a cell. Less than 2% of our 3.2 gigabases of DNA codes for exons. Calling the 98% of the genome not doing so ‘junk’ is a vestige of the protein-centric era of molecular biology, just as calling the change of one synonymous codon for another ‘neutral’ is. Both assume that the only thing that DNA does is code for protein.

The expressive power of language lies in its ambiguity not its precision. DNA may be similar as we uncover the languages it speaks. My guess is that there are more to be found.

Sugar Daddy: The importance of being… there

Posted on behalf of Sugar Daddy

There seems to be this mindset among scientists, particularly chemists, that what we do is noble and somehow above the fray. Perhaps it comes from our training as graduate students. We live lives often completely removed from the world around us. We have friends who go home at 5 pm, cook dinner every night, watch TV programs, write books, do crossword puzzles, or other “normal” things. These people don’t “take” the whole weekend off; it is naturally given to them, an unalienable right of living in the “real world”. We are in a research lab and when we leave for brief periods, we don’t leave our work behind. Now that’s not necessarily a bad thing, but every now and then — going to get your driver’s license renewed, or “taking” a day off to go to some tourist site because a family member or friend is in town — we cross paths with the real world around us.

Bubble, meet daylight.

Long hours and a physico-emotional connection to our work are probably two of the most hackneyed topics amongst graduate students in science. But why is it that way? The obvious answer is ambition, a state of mind that isn’t unique to aspiring young scientists but can be applied to aspiring people in any walk of life: lawyers, politicians, chefs, artists, sports players, etc. But there is something unique about science. Many of us have a sense of elitism, that what we do really is that important, so noble, and there’s this sense of urgency that we can never put it down for fear of being overtaken. And that feeling, I think, contributes to a sense that we really shouldn’t be doing much else at all with our time. Do you think that? I know some graduate students do, and I’m curious as to where this feeling comes from: ourselves, our advisors, who are typically the ones who have risen to the top (one particularly cynical comment to a previous post comes to mind), our work environment, or other influences entirely?

Chemiotics: A chemical gedanken experiment

Posted on behalf of Retread

In the early days of quantum mechanics Einstein and Bohr threw thought experiments (gedanken experiments) at each other like teenagers throwing firecrackers. None were thought possible at the time, although thanks to Bell and Aspect, quantum nonlocality and entanglement now have a solid experimental basis.

Two Chemiotics posts ago there appeared the following: “I doubt that most strings of amino acids have a dominant shape (e.g., biological meaning), and even if they did, they couldn’t find it quickly enough (the Levinthal paradox again).”

How would you prove me wrong? The same way you’d prove a pair of dice was loaded. Just make (using solid-phase protein synthesis) a bunch of random strings of amino acids (say 41 amino acids long) and see how many have a dominant shape. If one crystallizes it does, if not, use NMR to look at them in solution. You can’t make all of them, because the earth doesn’t have enough mass to do so (see “How many proteins can we make?” a few posts back). That’s why this is a gedanken experiment — it can’t possibly be performed in toto.

Even so, the experiment is over (and I’m wrong) if even 1% of the proteins you make have a dominant shape.

However, choosing a random string of amino acids is far from trivial. Some amino acids appear more frequently than others depending on the protein. Proteins are definitely not a random collection of amino acids. Consider collagen. In its various forms (there are over 20, coded for by at least 30 distinct genes) collagen accounts for 25% of body protein. Statistically, each of the 20 amino acids should account for 5% of the protein, yet one amino acid (glycine) accounts for 30% and proline another 15%. Even knowing this, the statistical chance of producing 300 copies in a row of glycine–any amino acid–any amino acid by random distribution of the glycines is less than zilch. But one type of bovine collagen protein has >300 such copies in its 1042 amino acids.
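To put a number on 'less than zilch', here is a rough sketch. It assumes each position is filled independently, that the reading frame is fixed, and that the two 'any amino acid' positions place no constraint at all, so only the 300 glycine positions count. Both the collagen-like 30% glycine frequency and the uniform 5% are tried.

```python
import math

REPEATS = 300   # consecutive glycine-X-X triplets

def log10_prob_repeat(p_glycine, repeats=REPEATS):
    """log10 of the chance that glycine lands on every third position for `repeats` triplets."""
    return repeats * math.log10(p_glycine)

log_p_collagen_like = log10_prob_repeat(0.30)   # glycine at its collagen frequency
log_p_uniform = log10_prob_repeat(0.05)         # glycine at 1 in 20

print(f"30% glycine: about 10^{log_p_collagen_like:.0f}")
print(f" 5% glycine: about 10^{log_p_uniform:.0f}")
```

Even with glycine at 30%, the chance comes out more than 150 orders of magnitude below one.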

One further example. If you were picking out a series of letters randomly hoping to form a word, you would not expect a series of 10 ‘a’s to show up. But we normally contain many such proteins, and for some reason too many copies of the repeated amino acid produce some of the neurological diseases I (ineffectually) battled as a physician. Normal people have 11 to 34 glutamines in a row in a huge (molecular mass 384 kiloDaltons — that’s over 3000 amino acids) protein known as huntingtin. In those unfortunate individuals with Huntington’s chorea, the number of repeats expands to over 40. One of Max Perutz’s last papers [Proc. Natl. Acad. Sci. USA 99, 5591–5595 (2002)] tried to figure out why this was so harmful.

On to the actual experiment. Suppose you had made 1,000,000 distinct random sequence proteins containing 41 amino acids and none of them had a dominant shape. This proves/disproves nothing. 10^6 is fewer than the possibilities inherent in a string of 5 amino acids, and you’ve only explored 10^6/(20^41) of the possibilities.
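The sizes involved can be checked directly, since Python integers do not overflow. This is just the arithmetic from the paragraph above, with illustrative variable names.

```python
ALPHABET = 20        # amino acids
LENGTH = 41          # residues per random protein
SAMPLED = 10**6      # proteins actually made in the thought experiment

pentapeptides = ALPHABET ** 5            # possibilities in a string of 5 amino acids
sequence_space = ALPHABET ** LENGTH      # possibilities in a string of 41
fraction_explored = SAMPLED / sequence_space

print(pentapeptides)                     # already larger than a million
print(f"{sequence_space:.3e}")           # size of the full sequence space
print(f"{fraction_explored:.3e}")        # share of the space sampled
```

A million sequences does not even exhaust the pentapeptides, and it covers a vanishingly small share of the 41-residue space.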

Would Karl Popper, philosopher of science, even allow the question of how commonly proteins have a dominant shape to be called scientific? Much of what I know about Popper comes from a fascinating book “Wittgenstein’s Poker” and it isn’t pleasant. Questions not resolvable by experiment fall outside Popper’s canon of questions scientific. The gedanken experiment described can resolve the question one way, but not the other. In this respect it’s like the halting problem in computer science (there is no general rule to tell if a program will terminate).

Would Ludwig Wittgenstein, uberphilosopher, think the question philosophical? Probably not. His major work “Tractatus Logico-Philosophicus” concludes with “What we cannot speak of we must pass over in silence”. While he’s the uberphilosopher he’s also the antiscientist. It’s exactly what we don’t know which leads to the juiciest speculation and most creative experiments in any field of science. That’s what I loved about organic chemistry years ago (and now). It is nearly always possible to design a molecule from scratch to test an idea. There was no reason to make [7]paracyclophane, other than to get up close and personal with the ring current.

If the probability or improbability of our existence, to which the gedanken experiment speaks, isn’t a philosophical question, what is?