About this Author

Derek Lowe, an Arkansan by birth, got his BA from Hendrix College and his PhD in organic chemistry from Duke before spending time in Germany on a Humboldt Fellowship on his post-doc. He's worked for several major pharmaceutical companies since 1989 on drug discovery projects against schizophrenia, Alzheimer's, diabetes, osteoporosis and other diseases. To contact Derek email him directly: Twitter: Dereklowe

In the Pipeline

November 1, 2005

Molecular Modeling Cage Match

Posted by Derek

I mentioned an interesting paper that's coming out in the Journal of Medicinal Chemistry on molecular modeling. It's a long one from a large group of people scattered across GlaxoSmithKline's worldwide research facilities, entitled "A Critical Assessment of Docking Programs and Scoring Functions." And that's what it is, all right.

For the non-med-chem readers, those are two of the key techniques in computational molecular modeling. Docking refers to taking a modeled version of your small molecule and trying to fit it into a similarly modeled version of the binding site of your protein target. The program tries to take into account the size and shape of the molecule and the binding site, of course, as well as more subtle interactions between the various functional groups. Scoring functions are what the programs use to try to rate how well the docking procedure went for a given compound, and to compare it to others in a given data set.
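For anyone who wants to see the docking/scoring split in miniature, here is a toy sketch in Python. Everything in it is hypothetical: atoms are bare 3-D points, the ligand is rigid, and `toy_score` is an invented contact-counting function, not the scoring function of any real program. What it does show is the division of labor: a search step proposes poses, and the scoring function alone decides which one wins.

```python
import math
import random

Point = tuple[float, float, float]


def toy_score(site: list[Point], ligand: list[Point]) -> float:
    """Hypothetical contact score (lower is better): -1 per ligand/site atom pair
    between 2.5 and 4.5 units apart, +10 per clash closer than 2.5 units."""
    score = 0.0
    for a in site:
        for b in ligand:
            d = math.dist(a, b)
            if d < 2.5:
                score += 10.0  # steric clash penalty
            elif d < 4.5:
                score -= 1.0   # favorable contact reward
    return score


def random_rotation(rng: random.Random) -> list[list[float]]:
    """3x3 rotation matrix from a uniformly random unit quaternion (Shoemake's method)."""
    u1, u2, u3 = rng.random(), rng.random(), rng.random()
    w = math.sqrt(1 - u1) * math.sin(2 * math.pi * u2)
    x = math.sqrt(1 - u1) * math.cos(2 * math.pi * u2)
    y = math.sqrt(u1) * math.sin(2 * math.pi * u3)
    z = math.sqrt(u1) * math.cos(2 * math.pi * u3)
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ]


def place(ligand: list[Point], rot: list[list[float]], center) -> list[Point]:
    """Rotate the rigid ligand about its centroid, then move its centroid to `center`."""
    n = len(ligand)
    c = [sum(p[i] for p in ligand) / n for i in range(3)]
    placed = []
    for p in ligand:
        v = [p[i] - c[i] for i in range(3)]
        placed.append(tuple(sum(rot[r][k] * v[k] for k in range(3)) + center[r]
                            for r in range(3)))
    return placed


def dock(site, ligand, site_center, n_poses=2000, box=3.0, seed=0):
    """Random-search 'docking': sample rigid poses near the site, keep the best-scoring one."""
    rng = random.Random(seed)
    best_pose, best_score = None, math.inf
    for _ in range(n_poses):
        center = [site_center[i] + rng.uniform(-box, box) for i in range(3)]
        pose = place(ligand, random_rotation(rng), center)
        s = toy_score(site, pose)
        if s < best_score:
            best_pose, best_score = pose, s
    return best_pose, best_score
```

Note that the search can only find whatever the score rewards. If the scoring function is wrong about what good binding looks like, a more thorough search just finds the wrong answer more reliably.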

The GSK team did a very thorough job, evaluating ten different docking programs. They started with seven protein targets of varying types, mostly different classes of enzymes, all of which are known drug targets. An expert computational chemist took each one and polished up the model of the binding site. At the same time, lists of between one hundred and two hundred potential binding compounds were put together for each target, including several series of related compounds. Another modeling chemist took these structures and got them ready for docking. They made sure that a crystal structure was known for each structural class in each case (to check the accuracy of the modeling later on), and also that the binding affinities of the compounds ranged over at least four orders of magnitude (from pretty darn good, in other words, to pretty darn awful). The goal was to make the whole exercise as real-world as possible. Then each binding-site model and its associated list of potential ligands was turned over to separate chemists with experience in the various docking programs, who were told to have at it. As the paper puts it:

"To optimize the performance of each docking program, computational chemists with expertise in a particular program were identified from the worldwide GSK computational chemistry community. Each program expert was given complete freedom and sufficient time to maximize the performance of the docking program. . .No time deadlines were imposed so that even low-throughput docking programs could be evaluated. Indeed, no constraints whatsoever were placed on the level of agonizing over details of how each docking program was applied."
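"Four orders of magnitude" is easiest to think about in log units. Here's a small Python sketch, assuming affinities are given as molar values (IC50, Kd or Ki); the function names are made up for illustration. A 1 nM compound (pIC50 9) and a 10 µM compound (pIC50 5) are exactly four log units apart.

```python
import math


def p_affinity(conc_molar: float) -> float:
    """Negative log10 of a molar affinity value (e.g. IC50 -> pIC50, Kd -> pKd)."""
    return -math.log10(conc_molar)


def affinity_span(values_molar) -> float:
    """Spread of a compound set's affinities in log units (orders of magnitude)."""
    p = [p_affinity(v) for v in values_molar]
    return max(p) - min(p)


def spans_enough(values_molar, min_log_units: float = 4.0) -> bool:
    """True if the set runs from potent to weak over at least `min_log_units`."""
    return affinity_span(values_molar) >= min_log_units
```

A wide spread matters because a test set whose compounds all sit within one log unit of each other can't tell you much about whether a program ranks good binders above bad ones.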

It's important to remember that the results of this paper come from experienced users who had a great deal of knowledge about the targets and all the time they needed to mess with them. The aforementioned agonizing was devoted to three typical kinds of question that such software is designed to answer. The first: what is the conformation (the 3-D physical "pose") of a small molecule once it's in a binding site? This is why they picked compounds with known crystal structures, since those provide a check against real data. Results of this test were OK, and in some cases fairly good. Some of the target proteins seemed to have binding sites better suited to the capabilities of the programs, which could take the majority of the compounds on their lists and place them pretty close (within two angstroms) to the known crystal structures.

And every target had at least one program that could take at least a third or so of the test compounds and dock them fairly well. The problem, though, was that no single program could do that for more than 35% of the binding modes. The best performances were scattered among the different software packages, and there seems to be absolutely no way to know in advance whether a given program is going to perform well on a new target. The other problem, and it's a big one, was that the scoring functions couldn't reliably identify when the program had hit on one of the good answers. There wasn't much correlation between what a program rated as a well-docked conformation and how closely that conformation resembled the known crystal structure.
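Those two failure modes, not generating a good pose versus generating one and then not recognizing it, can be separated with a few lines of Python. This is a sketch under stated assumptions, not the paper's protocol: "within two angstroms" is taken to mean the usual root-mean-square deviation (RMSD) over matched atoms, atom order is assumed to correspond, there's no symmetry correction, and lower docking scores are assumed to be better.

```python
import math

Point = tuple[float, float, float]


def pose_rmsd(docked: list[Point], crystal: list[Point]) -> float:
    """RMSD over matched atoms, both already in the protein's frame.

    No superposition (the protein fixes the frame) and no symmetry correction;
    atom i in `docked` is assumed to correspond to atom i in `crystal`.
    """
    if not docked or len(docked) != len(crystal):
        raise ValueError("need two non-empty, equal-length atom lists")
    total = sum(math.dist(a, b) ** 2 for a, b in zip(docked, crystal))
    return math.sqrt(total / len(docked))


def sampling_vs_scoring(poses_by_compound, crystal_by_compound, cutoff=2.0):
    """Return (sampling success, scoring success) as fractions of compounds.

    poses_by_compound maps a compound ID to a list of (score, pose) pairs,
    where a lower score means "better" (a convention assumed here).
    Sampling success: ANY generated pose is within `cutoff` of the crystal pose.
    Scoring success: the single TOP-SCORED pose is within `cutoff`.
    """
    sampled = scored = 0
    for cid, poses in poses_by_compound.items():
        crystal = crystal_by_compound[cid]
        rmsds = [pose_rmsd(pose, crystal) for _, pose in poses]
        if min(rmsds) <= cutoff:
            sampled += 1
        top = min(range(len(poses)), key=lambda i: poses[i][0])
        if rmsds[top] <= cutoff:
            scored += 1
    n = len(poses_by_compound)
    return sampled / n, scored / n
```

A big gap between the first number and the second is exactly the situation described above: the right answer was in the pile, but the scoring function didn't put it on top.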

The second question they looked at was: given a list of molecules (some active, some inactive), how well can the software pick out some active ones? This process is often known as "virtual screening". Again, the results were fairly good, but with some significant problems. For all but one of the targets, at least one of the programs could find at least half of the top 10% of the active compounds. (I know, that sounds like a lot of defensive hedging compared to what some people think these programs can do, but that's the real world for you). The programs also did pretty well at pulling a variety of structures out, and not just making their total by grabbing only the members of one particular class.

But that fairly decent performance is for the programs as a group. As before, the best performances were scattered through all the software packages, with no real standout. Most of the programs, at one point or another, also had to grind through a significant fraction of the compound lists to do the job, which is something you really don't want in real-world use. Another disturbing result was that some of the scoring functions seemed to be picking the right compounds for the wrong reasons: that is, based on incorrect binding modes.
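Here's a rough Python sketch of how virtual-screening results like these are commonly tallied: what fraction of the known actives show up near the top of the score-ranked list, how many different chemical series they come from, and how far down the list you have to go to collect a given number of actives. The metrics and function names are illustrative assumptions, not necessarily the exact statistics the GSK group used.

```python
def top_slice(ranked_ids, top_percent):
    """Number of compounds in the top `top_percent`% of a list (integer ceiling, at least 1)."""
    return max(1, (len(ranked_ids) * top_percent + 99) // 100)


def actives_in_top(ranked_ids, actives, top_percent=10):
    """Fraction of known actives that appear in the top `top_percent`% of the ranked list."""
    top = set(ranked_ids[: top_slice(ranked_ids, top_percent)])
    return len(top & actives) / len(actives)


def series_in_top(ranked_ids, actives, series_of, top_percent=10):
    """Distinct chemical series among the actives recovered in the top slice."""
    top = ranked_ids[: top_slice(ranked_ids, top_percent)]
    return {series_of[cid] for cid in top if cid in actives}


def fraction_screened_to_find(ranked_ids, actives, n_wanted):
    """How far down the ranked list (as a fraction) you must go to collect `n_wanted` actives."""
    found = 0
    for depth, cid in enumerate(ranked_ids, start=1):
        if cid in actives:
            found += 1
            if found == n_wanted:
                return depth / len(ranked_ids)
    return None  # the list never yields that many actives
```

In the made-up case where the top 10% contains two actives out of four, but both come from the same series, `actives_in_top` reports 0.5 while `series_in_top` shows that only one structural class was found. That is why it's worth checking both numbers.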

Now we're ready for the third question, a hard one, and (in my experience) one of the questions medicinal chemists would most like molecular modeling software to answer: given a list of compounds, can the program rank-order them according to their expected affinity for the target? Unfortunately, the answer is "absolutely not." No scoring function in any of the software packages could even come close. The compounds that the programs ranked as winners were just as likely to stink, and the ones they put on the discard heap were just as likely to be fine.
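One standard way to put a number on "can it rank-order them" is Spearman's rank correlation between docking scores and measured affinities. The Python sketch below is a generic implementation, not the statistic the paper necessarily used. It assumes lower docking scores mean predicted tighter binding and that affinities are given as pIC50/pKd-style values (higher is tighter).

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the two rank vectors (handles ties)."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


def rank_agreement(docking_scores, p_affinities):
    """+1: scores order compounds exactly like measured affinity; near 0: no better than chance.

    Assumes lower docking score = predicted tighter binding, so scores are negated
    before being compared with pIC50/pKd-style values (higher = tighter).
    """
    return spearman([-s for s in docking_scores], p_affinities)
```

A value near zero means the compounds ranked as winners are no more likely to be potent than the ones on the discard heap. (The function divides by zero if every score or every affinity is identical, which is a degenerate data set anyway.)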

My way of looking at the first two tests is to say that if you have just one molecular modeling package, it is guaranteed to mislead you a fair amount of the time, and you have no way of knowing when it's doing that. If you have more than one program to work with, though, they are guaranteed to disagree with each other a fair amount of the time, and you have no way of knowing which of them is right, if any. I'll let the authors have the last word on the third test, and on the software in general:

". . .in the area of rank-ordering or affinity prediction, reliance on a scoring function alone will not provide broadly reliable or useful information. . .This study demonstrates unequivocally that significant improvements are needed before compound scoring by docking algorithms will routinely have a consistent and major impact on lead optimization. . .it is not completely obvious by what means these improvements will arise. . ."

Comments (5) + TrackBacks (0) | Category: In Silico


1. milo on November 2, 2005 11:33 AM writes...

Great post Derek! It is nice to see that a comparison such as this has finally made it into the literature, even if it is JMC :-) I have often wondered how accurate a crystal structure is at showing the "true" nature of a drug/protein interaction. I get the impression that a lot of people look at an X-ray structure and go "Well, there you go! That is how our new drug works!" Folks in my lab have often asked if they could correlate IC50 with docking score. I have often told them that it is possible, but the results would not be very reliable. It is nice to see that I was not blowin' smoke! I still think that in silico is good for generating possible explanations of things; it just is not great at predictions. Yet.

2. DRogers on November 2, 2005 6:40 PM writes...

I would be interested in knowing how many companies are using docking for lead optimization (as this paper studies) versus lead generation (for example, screening an entire corporate collection through a docking program to identify possible new leads). My impression is that the latter task is more commonly done, and even if a method (or methods!) cannot rank-order compounds well in IC50 order, a docking method may still have value in the more rough-and-tumble world of identifying potential new leads in a yes-or-no fashion (or at least of rejecting vast numbers of useless compounds!).

(Anyone else have any idea of the balance of docking-for-virtual-screening vs. docking-for-QSAR that is happening in the wider (i.e., real) world?)

3. MeToo on December 16, 2006 4:21 PM writes...

Great post Derek!

In this land of declining pharma there is a proliferation of managerial command-and-control types who think shrink-wrapped in silico intelligence will save the high-cost research footprint.

Amusingly, the flaw in their thinking is entropy, which is poorly modeled by available software.

Oh the irony.

4. sujitrambhade on March 5, 2009 1:29 AM writes...

I'm an M.Pharm. student (project student).
Sir, we have Schrodinger software.
I want to know about the docking procedure:
1. ligand preparation
2. protein preparation
