There's a new paper out in Nature Chemistry called "Quantifying the Chemical Beauty of Drugs". The authors are proposing a new "desirability score" for chemical structures in drug discovery, one that's an amalgam of physical and structural scores. To their credit, they didn't decide up front which of these things should be the most important. Rather, they took eight properties over 770 well-known oral drugs, and set about figuring out how much to weight each of them. (This was done, for the info-geeks among the crowd, by calculating the Shannon entropy for each possibility to maximize the information contained in the final model.) Interestingly, this approach tended to give zero weight to the number of hydrogen-bond acceptors and to the polar surface area, which suggests that those two measurements are already subsumed in the other factors.
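For anyone who wants to see the shape of such a score, here's a minimal sketch, assuming the per-property desirabilities (each between 0 and 1) are combined as a weighted geometric mean. The property names, desirability values, and weights below are made up for illustration; they are not the paper's fitted numbers.

```python
import math


def combined_desirability(desirabilities, weights):
    """Weighted geometric mean of per-property desirabilities in (0, 1].

    Both arguments are dicts keyed by property name. A property with
    zero weight has no influence on the result.
    """
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("at least one weight must be positive")
    log_sum = sum(
        w * math.log(desirabilities[name])
        for name, w in weights.items()
        if w > 0
    )
    return math.exp(log_sum / total_weight)


# Hypothetical desirabilities for one compound (illustrative values only).
d = {
    "mol_weight": 0.9,
    "logp": 0.8,
    "h_bond_acceptors": 0.2,
    "polar_surface_area": 0.3,
}
# Hypothetical weights, with two properties zeroed out.
w = {
    "mol_weight": 1.0,
    "logp": 1.0,
    "h_bond_acceptors": 0.0,
    "polar_surface_area": 0.0,
}

score = combined_desirability(d, w)
print(f"combined score: {score:.3f}")
```

With the acceptor count and polar surface area weighted at zero, their poor individual values don't drag the score down at all, which is what "zero weight" means in practice.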
And that's all fine, but what does the result give us? Or, more accurately, what does it give us that we haven't had before? After all, there have been a number of such compound-rating schemes proposed before (and the authors, again to their credit, compare their new proposal with the others head-to-head). But I don't see any great advantage. The Lipinski "Rule of 5" is a pretty simple metric - too simple for many tastes - and what this gives you is a Rule of 5 with both categories smeared out towards each other to give some continuous overlap. (See the figure below, which is taken from the paper). That's certainly more in line with the real world, but in that real world, will people be willing to make decisions based on this method, or not?
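To make the "smeared out" idea concrete, here's a rough sketch comparing a hard Rule of 5 check with a smooth cutoff. The Lipinski thresholds are the standard ones; the logistic curve, its midpoint, and its width are my own stand-ins for illustration and are not the desirability functions fitted in the paper.

```python
import math


def ro5_violations(mol_weight, logp, h_donors, h_acceptors):
    """Count violations of the four Lipinski Rule of 5 thresholds."""
    return sum([
        mol_weight > 500,
        logp > 5,
        h_donors > 5,
        h_acceptors > 10,
    ])


def soft_cutoff(value, midpoint, width):
    """Logistic curve: near 1 well below midpoint, 0.5 at it, near 0 well above.

    An illustrative smooth replacement for a hard threshold.
    """
    return 1.0 / (1.0 + math.exp((value - midpoint) / width))


# Two hypothetical compounds that differ only slightly in molecular weight.
for mw in (490, 510):
    violations = ro5_violations(mw, logp=3.0, h_donors=2, h_acceptors=6)
    soft = soft_cutoff(mw, midpoint=500, width=50)
    print(f"MW {mw}: Ro5 violations = {violations}, soft score = {soft:.2f}")
```

The hard rule flips from zero violations to one across that 20-dalton gap, while the smooth version only drifts a little, which is the behavior a continuous score is after.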

The authors go for a bigger splash with the title of the paper, which refers to an experiment they tried. They had chemists across AstraZeneca's organization assess some 17,000 compounds (200 or so for each) with a "Yes/No" answer to "Would you undertake chemistry on this compound if it were a hit?" Only about 30% of the list got a "Yes" vote, and the reasons for rejecting the others were mostly "Too complex", followed closely by "Too simple". (That last one really makes me wonder - doesn't AZ have a big fragment-based drug design effort?) Note also that this sort of experiment has been done before.
Applying their model, the mean score for the "Yes" compounds was 0.67 (s.d. 0.16), and the mean score for the "No" compounds was 0.49 (s.d. 0.23), which they say was statistically significant, although that must have been a close call. Overall, I wouldn't say that this test has an especially strong correlation with medicinal chemists' ideas of structural attractiveness, but then, I'm not so sure of the usefulness of those ideas to start with. I think that the two ends of the scale are hard to argue with, but there's a great mass of compounds in the middle that people decide that they like or don't like, without being able to back up those statements with much data. (I'm as guilty as anyone here).
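One way to put a number on how far apart those two groups sit is a standardized mean difference. Here's a rough sketch, assuming the two standard deviations are weighted equally in the pooled value, since I'm not working from the exact group sizes. It says nothing about significance, which depends on how many compounds were in each group.

```python
import math


def standardized_mean_difference(mean_a, sd_a, mean_b, sd_b):
    """Cohen's d, pooling the two variances with equal weight (an assumption)."""
    pooled_sd = math.sqrt((sd_a ** 2 + sd_b ** 2) / 2.0)
    return (mean_a - mean_b) / pooled_sd


# Means and standard deviations quoted above for the "Yes" and "No" groups.
effect = standardized_mean_difference(0.67, 0.16, 0.49, 0.23)
print(f"standardized mean difference: {effect:.2f}")
```

The gap in means comes out a bit under one pooled standard deviation, so the two distributions overlap a fair amount.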
The last part of the paper tries to extend the model from hit compounds to the targets that they bind to - a druggability assessment. The authors looked through the ChEMBL database, and ranked the various targets by the scores of the ligands that are associated with them. They found that their mean ligand score for all the targets in there is 0.478. For the targets of approved drugs, it's 0.492, and for the orally active ones it's 0.539 - so there seems to be a trend, although if those differences reached statistical significance, it isn't stated in the paper.
So overall, I find nothing really wrong with this paper, but nothing spectacularly right with it, either. I'd be interested in hearing other calls on it as it gets out into the community. . .
1. Curious Wavefunction on January 26, 2012 11:47 AM writes...
I had similar sentiments. Plus, even with this kind of metric drug discovery is going to remain quite subjective. For instance, depending on the precise target and project, an average desirability score of 0.7 may still consign a compound to the molecular dustbin if it suffers from a single extreme property like PSA or lipophilicity. The one thing that this study does is to suggest retaining at least a few compounds that would earlier have failed the standard filters. As medicinal chemists we already knew this, but it might be nice to have a more quantitative measure.
2. anonymous on January 26, 2012 12:08 PM writes...
It would be interesting to see if this type of methodology (or related) could be used to triage HTS libraries. You probably wouldn't want to eliminate more than ~25% using this method; however, it might be useful to enrich libraries in "nice" compounds.
3. Virgil on January 26, 2012 12:16 PM writes...
So biologists have been struggling to deal with such issues of subjectivity for a long time... You do a gene chip, a bunch of things change, half of them have obscure names like "riken cDNA34561290", so you ignore those and go after the ones with familiar sounding names like "alcohol dehydrogenase". There's nothing more or less important about the two classes of hits, it's just that people gravitate toward the familiar.
As a biologist, it's interesting to hear that such biases exist in the chemistry field too. Apparently lots of chemists have lists of "things I won't work with", based on criteria a lot more benign than Derek's ;-)
4. smurf on January 26, 2012 2:11 PM writes...
This is an example for 80-20, and not untypical for the computational chemistry community: a lot of work for, well, for what really? For just another probabilistic descriptor, a descriptor that cannot really be acted on?
Yes, it is desirable to improve computational models, but we have many other, more urgent issues in early stage drug discovery, so perhaps it’s time to use our resources more wisely?
5. DrSnowboard on January 26, 2012 4:31 PM writes...
And the institutional bias at AZ is....? Quinazolines, pyrazoles..kinase motifs?
6. CYP3A4 on January 26, 2012 4:51 PM writes...
Favorable review of it by Leeson in Nature's News and Views this week. Must be good then.
7. Pete on January 26, 2012 5:21 PM writes...
Something that I learned about lead identification over the years was that those presenting a lead for optimisation usually have a different view of its quality to those who will be charged with optimising it. Beauty really is in the eye of the beholder!
8. Pete on January 26, 2012 5:38 PM writes...
CYP3A4, Thanks for your most excellent advice!
9. DrJEKyll on January 26, 2012 7:44 PM writes...
I always feel like these are studies in self-fulfilling prophecy. Ro5 (or an analogue) heavily influenced which molecules were made, so Ro5 compounds were made and pushed into the clinic.
10. pharmadude on January 26, 2012 8:42 PM writes...
I find these papers depressing. Is this the best we've got?
11. Anonymous on January 26, 2012 11:58 PM writes...
Andrew Hopkins overselling something with weak numbers and calling it beautiful? Like we haven't heard that before.
12. Despairing modeller on January 27, 2012 2:41 AM writes...
When I read stuff like this, as a modeller I get that same queasy feeling as one might get standing between riot police and an angry mob. There are several methodological errors that make the paper of little value:
1) they rely on the argument X is a Greek; all Greeks are liars => X is a liar
2) they assume that all descriptors can be computed accurately; the error on any ClogP is at least 0.25. I don't see how QED can be accurate to 3SF
3) the choice of negative data set is inadequate. The use of dissimilarity to pick it guarantees an effect. The set is probably not representative of the space of non-drugs
4) We are not told how many times and by how many chemists each compound was assessed (the whole point of the Lajiness paper)
We are building generalisations that miss the whole individual context of a target. I wonder how many new drugs would have been missed by applying these 'rules'
13. smurf on January 27, 2012 3:09 AM writes...
ad 12: thanks.
14. processchemist on January 27, 2012 4:30 AM writes...
A more interesting exercise would be to pick from the Merck Index all the known drugs and see how many meet these criteria.
Since we're talking about rule of five and recently we talked about cooperations between academia and industry, it's interesting that pregabalin would probably be seen as "ugly"...
15. Morten G on January 27, 2012 5:26 AM writes...
Fig 4 is really pretty.
Derek could you please break the figure b) c) into two? You're screwing with your layout.
Has anyone ever been in a meeting where they had to point out that while the current lead violated Ro5 it was meant for intravenous dosing?
16. DCRogers on January 27, 2012 10:52 AM writes...
There's a fundamental point they make that is worth hearing: the strict cutoffs of Ro5 have led to a bias where lots of compounds are made near the cutoff, leading to Hann's 'molecular obesity' phenomenon. From the paper:
"Paradoxically, since the publication of the seminal paper by Lipinski et al. there appears to be a growing epidemic, which Hann has termed 'molecular obesity', among new pharmacological compounds. Compounds with higher relative Mr and lipophilicity have a higher probability of attrition at each stage of clinical development. Thus, the inflation of physicochemical properties that increases the risks associated with clinical development may explain, in part, the decline in productivity of small-molecule drug discovery over the past two decades. However, the mean molecular properties of new pharmacological compounds are still considered Lipinski compliant, even though their property distributions are far from historical norms."
17. XChemistTurnedCompSci on January 27, 2012 11:38 AM writes...
I think druglikeness should be redefined as the similarity of a molecule with the set of marketed drugs that have made it through regulatory approval. After all, this is the end game of the pharma business. Instead, we seem to use the Ro5 to define druglikeness despite the fact that it was never intended to be used this way.
I have always been a little wary of using Ro5 to discriminate compounds for HTS. There have been a few papers that have shown that the Ro5 is no better than random in predicting the druglikeness (new definition) of a molecule. A more appropriate approach would be to use machine learning techniques to discriminate your HTS compounds. Is there any particular reason as to why machine learning hasn't been widely adopted for HTS discrimination?
18. NJBiologist on January 27, 2012 1:17 PM writes...
@17 XChemist: "There have been a few papers that have shown that the Ro5 is no better than random in predicting the druglikeness (new definition) of a molecule."
These papers would be very useful to me at work--could you pass along a citation or two?
19. TX Raven on January 27, 2012 3:30 PM writes...
@ 12: Amen to that.
I think the problem of Ro5 is one of domain of application.
Before you start making compounds, and in the absence of actual experimental information, it is ok to bias your chances based on previous knowledge.
However, once you start making compounds and can get (rather inexpensively these days) actual data on biological properties, IMHO you should bias your decision based on these data.
I see often that chemists 'look' at a table with a compound characterized by good data, and still don't "like" the way it looks...
Is that a scientific way to make decisions?
20. XChemistTurnedCompSci on January 27, 2012 4:33 PM writes...
@NJBiologist
Frimurer, Thomas et al, “Improving the Odds in Discriminating “Drug-Like” from “Non Drug-Like” Compounds,” J Chem Inf Comput Sci, 2000, 40:1315-1324.
I think this is the one.
XChemist
21. NJBiologist on January 30, 2012 12:34 PM writes...
@20 XChemist: Thanks, that looks like an interesting read.
22. Geneticist on January 31, 2012 11:13 AM writes...
Thought this might be of interest to some people:
http://www.nature.com/nchembio/journal/v6/n7/abs/nchembio.380.html
Basically a machine-learning-guided approach to predicting small-molecule bioavailability in a model organism, after screening ~1000 molecules for accumulation and metabolism in the animals.