I was looking through recent literature this past week and found a few things in JACS that I thought were particularly interesting.
The first comes from Kelly Damm and Heather Carlson, and substantiates my feeling that NMR is the coolest technique ever invented. In this case, they were trying to figure out the best way to incorporate protein flexibility into structure-based drug design (SBDD). The authors previously established an MD method to generate multiple conformations of a single protein; the resultant ensemble worked better than a static structure in correctly classifying known inhibitors and non-inhibitors. But all those calculations take a lot of time, and so Damm and Carlson went to the PDB, pulling out 90 static structures of HIV-1 protease (bound to a variety of ligands) and one NMR structure, which is actually an ensemble of 28 structures. What they discovered is that the two ensembles were quite similar in their success at identifying inhibitors, but that the NMR structure was less specific to a given ligand and so was better able to identify the essential features of the ligand and extrapolate to new classes of compounds. So, they suggest NMR structures as useful tools for SBDD. Go NMR!
Two communications also caught my eye: one, from Scott Miller’s group, extends his work on small, peptide-based catalysts to an Asp-catalyzed asymmetric epoxidation. In this case, putting the Asp carboxylate into a protected tripeptide known to form beta-turns resulted in a catalyst that could turn over nearly 20 times, with 97% yield and 92% ee under optimized conditions. He wrote a nice review on the rationale for this work three years ago now, but I would still recommend it. The second is work from John Klassen’s lab: amidst the ongoing controversy over what gas-phase analysis of proteins really means, they seem to have put together a nice method for monitoring ligand binding sites, and determining whether the sites are identical (linear slope of ligand released over time and temperature) or not (non-linear slope).
Well, that’s my weekend reading. Now back to watching Wimbledon…
(ed’s note: Dr. Carlson alerted me to the fact that the study was actually about HIV protease, which I fixed 07/05)
Catherine (associate editor, Nature Chemical Biology)
You’ve flagged up an interesting paper. NMR always seems more elegant and cultured (surely spin choreography trumps reciprocal space) than crystallography. Some of the techniques used to probe protein-ligand interactions are quite exquisite.
It is worth taking a look at some of the methodological detail in this paper. The protein structures are used to generate pharmacophore models against which molecules in the database are then matched. The pharmacophores generated from the NMR structures were built from a smaller number (6-8) of elements than those from the crystal structures. I tend to be wary of attempts to compare performance of pharmacophores built from different numbers of features. In general, using larger numbers of pharmacophoric elements results in less permissive pharmacophores.
The choice of inactive compounds (the decoys) is important in studies like this. I would be curious to know, for each pharmacophore, how many of the decoys have the full complement of pharmacophoric elements. In less abstract terms, suppose I’m evaluating a pharmacophore built from an acid, a basic amine and an aromatic ring. If my decoys only have aromatic rings and basic amines, they’ll all be an acid short of a full complement of pharmacophoric elements and geometry becomes irrelevant.
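The decoy check described in the comment above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: each molecule is reduced to a plain list of feature labels, and the labels and function names are hypothetical, not taken from the paper or from any modelling package. Geometry is ignored on purpose, since the point is that a decoy lacking a required feature type can never match, whatever its shape.

```python
from collections import Counter


def has_full_complement(molecule_features, pharmacophore_features):
    """Return True if the molecule has at least as many features of each
    type as the pharmacophore requires (geometry is not considered)."""
    have = Counter(molecule_features)
    need = Counter(pharmacophore_features)
    return all(have[feature] >= count for feature, count in need.items())


def fraction_with_full_complement(decoys, pharmacophore_features):
    """Fraction of decoys that could match the pharmacophore in principle."""
    if not decoys:
        return 0.0
    hits = sum(has_full_complement(d, pharmacophore_features) for d in decoys)
    return hits / len(decoys)


# Hypothetical feature labels, following the acid / amine / ring example.
pharmacophore = ["acid", "basic_amine", "aromatic_ring"]
decoys = [
    ["aromatic_ring", "basic_amine"],                      # an acid short
    ["aromatic_ring", "aromatic_ring", "basic_amine"],     # still no acid
    ["acid", "basic_amine", "aromatic_ring", "hydroxyl"],  # full complement
]

fraction = fraction_with_full_complement(decoys, pharmacophore)
print(f"{fraction:.2f} of the decoys have the full complement")
```

A low fraction here would suggest that the decoy set is being rejected on composition alone, so it says little about the geometric quality of the pharmacophore.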
“Great Molecular Crapshoot” has made excellent points. We whole-heartedly agree and accounted for these concerns in our study.
It is easiest for me to first address the second concern about our test sets of compounds. GMC is correct that many papers screen against large counter sets that represent much of the available chemical space. This can lead to “inappropriately enhanced” performance of a pharmacophore model. We have a set of 89 known inhibitors and two inactive decoy sets. The first is 85 decoys of very similar molecular weight, chemical composition, and geometric size. Our first paper on HIV-1 protease outlined their creation. People interested in appropriate test sets should look at the 2D structures of all 89 inhibitors and 85 decoys in the supplemental information of that first paper. The 85 decoys were specifically created to test the same chemical space, and our original paper notes that the most common hits from the decoy set are renin inhibitors and small, hydrophobic peptides (compounds that should be identified as potential inhibitors). Our second set of 2322 decoys is a broad set which we added to this paper for comparison to other works in the literature.
With respect to the differing number of sites between NMR and crystallographic models, this was a definite concern which we discuss in the paragraph that runs from pages 8232-8233 of the Damm and Carlson paper. There are three “extra” sites in the crystal-based models that are not present in the NMR-based model. (Those extra features are present in some — but not all — inhibitors, and we argue that they should not be required elements in a model.) In order for the crystal-based model to identify the most inhibitors and reflect all of the diversity possible, it was necessary to drop several features by requiring hits to match only 10 or 9 of the features of the 11-site model (this is done on-the-fly in MOE). When we examined the detailed output, we found that known inhibitors were being identified by skipping the extra sites. When we manually created a crystal-based model without the three extra sites, the performance was nearly unchanged, verifying that the extra sites are not essential features.
Lastly, there is an unseen bias against the NMR model that it still overcomes. Using all 8 sites of the NMR model provided excellent recovery of known inhibitors (90%) without identifying a significant number of the 85 chemically similar decoys (4%). The best crystal-based model used 9 of 11 sites to identify 89% of the inhibitors and 3% of the decoys. Comparing an 8-of-8-sites model to a 9-of-11-sites model is biased because the latter is technically the joint performance of 55 individual combinations (11×10/2). Even the 10-of-11-site models have 11 ways for a molecule to count as a hit. To see the single 8-site model from NMR perform equivalently is impressive.
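The counting argument above is easy to verify. The short Python sketch below enumerates the site subsets that a partial-match search effectively tries; the function name is hypothetical and the code only illustrates the combinatorics, not how MOE implements partial matching.

```python
from itertools import combinations
from math import comb


def partial_match_subsets(n_sites, n_required):
    """List every subset of n_required sites drawn from an n_sites model.
    A molecule counts as a hit if it matches any one of these subsets."""
    return list(combinations(range(n_sites), n_required))


# 9-of-11 crystal-based model: 11 x 10 / 2 = 55 distinct sub-models.
print(len(partial_match_subsets(11, 9)))  # 55

# 10-of-11: one sub-model for each site that may be skipped.
print(comb(11, 10))  # 11

# 8-of-8 NMR-based model: a single way to match.
print(comb(8, 8))  # 1
```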
I would like to thank Prof Carlson for taking the time to respond to my previous comments. I should also point out to Sceptical Chymist readers who may be following this exchange that selection of decoy sets is not a trivial problem.
If pushed to select decoy sets for pharmacophore evaluation, I would start by identifying compounds that have the full set of pharmacophoric elements from which the pharmacophore is built. I would then use a 2D similarity search (most likely based on fingerprints) over these compounds to find the closest inactive analogs of each active matching the pharmacophore. The less similar these inactive analogs were to the actives, the more I would worry. This approach would hardly be news to Prof Carlson and Dr (or soon to be Dr) Damm, and I have noted it primarily to place the rest of this post in context.
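As a rough illustration of that search, the sketch below finds the closest inactive analog of one active using the Tanimoto coefficient. Everything in it is an assumption made for the example: fingerprints are represented as plain Python sets of "on" bit positions, and the compound names and bit values are invented. In practice the fingerprints would come from a cheminformatics toolkit.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient between two fingerprints given as sets of on-bits."""
    union = len(fp_a | fp_b)
    return len(fp_a & fp_b) / union if union else 0.0


def closest_inactive(active_fp, inactive_fps):
    """Return (name, similarity) for the inactive most similar to the active.
    inactive_fps maps a compound name to its fingerprint set."""
    return max(
        ((name, tanimoto(active_fp, fp)) for name, fp in inactive_fps.items()),
        key=lambda pair: pair[1],
    )


# Invented fingerprints, for illustration only.
active = {1, 2, 3, 4}
inactives = {
    "analog_A": {1, 2, 3, 5},  # shares 3 of 5 distinct bits with the active
    "analog_B": {7, 8},        # shares nothing with the active
}

name, similarity = closest_inactive(active, inactives)
print(name, similarity)  # analog_A 0.6
```

If the best similarity found for an active is low, the decoy set contains no close inactive analog of it, which is the situation the comment flags as worrying.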
These searches for inactive compounds can be done easily in the data-rich environment of a large pharmaceutical company. In an academic setting, things are much more difficult. One might think that you could just go to the company that invented the active compound and ask for structures of some inactive analogs. They are inactive after all. However, there are reasons that a company may not want to share this information. For example, the inactive compounds might turn up as active when screened against another target, which justifies a certain degree of paranoia.
However, there is another, less obvious reason that a company may be wary of revealing structures of inactive compounds. If the inactive compound falls within the scope of a patent, it can weaken (sometimes fatally) the patent. This can be used offensively in patent-busting campaigns. Back to decoys: it is well known that companies synthesise and evaluate each other’s compounds, and it is not without precedent for actual samples of competitor compounds to be shared with academic collaborators. It would be interesting to see how willing pharmaceutical companies would be to share assay data for competitor compounds synthesised in house. Could be some interesting discussions!
Before dishing out glory for NMR’s value in structure determination, as a person who actually has determined structures with both techniques I can only say that:
1) more times than not, “structural flexibility” seen in NMR ensembles is an artefact of the absence of observable geometric restraints and not the actual reflection of dynamics;
2) crystallography brings you a plethora of information if one “reads between the lines” and considers what is disordered and not visible in the electron density map;
3) those who think that NMR reflects the real in-solution state of the molecule and that crystallography is done in a highly fictional environment need to inspect the solution conditions used in NMR (extremely low salt concentrations, certain pH requirements, etc.) and recall that protein crystals are not solid state and that they consist of 50% solvent or so (one can argue that this is actually a good model of the crowded environment of the cell).