Corante

About this Author

Derek Lowe, an Arkansan by birth, got his BA from Hendrix College and his PhD in organic chemistry from Duke before spending time in Germany on a Humboldt Fellowship for his post-doc. He's worked for several major pharmaceutical companies since 1989 on drug discovery projects against schizophrenia, Alzheimer's, diabetes, osteoporosis and other diseases. To contact Derek, email him directly: derekb.lowe@gmail.com. Twitter: Dereklowe


In the Pipeline


February 8, 2013

All Those Drug-Likeness Papers: A Bit Too Neat to be True?


Posted by Derek

There's a fascinating paper out on the concept of "drug-likeness" that I think every medicinal chemist should have a look at. It would be hard to count the number of publications on this topic over the last ten years or so, but what if we've been kidding ourselves about some of the main points?

The big concept in this area is, of course, the Lipinski criteria, or the Rule of Five. Here's what the authors, Peter Kenny and Carlos Montanari of the University of São Paulo, have to say:

No discussion of drug-likeness would be complete without reference to the influential Rule of 5 (Ro5) which is essentially a statement of property distributions for compounds taken into Phase II clinical trials. The focus of Ro5 is oral absorption and the rule neither quantifies the risks of failure associated with non-compliance nor provides guidance as to how sub-optimal characteristics of compliant compounds might be improved. It also raises a number of questions. What is the physicochemical basis of Ro5's asymmetry with respect to hydrogen bond donors and acceptors? Why is calculated octanol/water partition coefficient (ClogP) used to specify Ro5's low polarity limit when the high polarity cut off is defined in terms of numbers of hydrogen bond donors and acceptors? It is possible that these characteristics reflect the relative inability of the octanol/water partitioning system to ‘see’ donors (Fig. 1) and the likelihood that acceptors (especially as defined for Ro5) are more common than donors in pharmaceutically-relevant compounds. The importance of Ro5 is that it raised awareness across the pharmaceutical industry about the relevance of physicochemical properties. The wide acceptance of Ro5 provided other researchers with an incentive to publish analyses of their own data and those who have followed the drug discovery literature over the last decade or so will have become aware of a publication genre that can be described as ‘retrospective data analysis of large proprietary data sets’ or, more succinctly, as ‘Ro5 envy’.

There, fellow med-chemists, doesn't this already sound like something you want to read? Thought so. Here, have some more:

Despite widespread belief that control of fundamental physicochemical properties is important in pharmaceutical design, the correlations between these and ADMET properties may not actually be as strong as is often assumed. The mere existence of a trend is of no interest in drug discovery and strengths of trends must be known if decisions are to be accurately described as data-driven. Although data analysts frequently tout the statistical significance of the trends that their analysis has revealed, weak trends can be statistically significant without being remotely interesting. We might be confident that the coin that lands heads up for 51 % of a billion throws is biased but this knowledge provides little comfort for the person charged with predicting the result of the next throw. Weak trends can be beaten and when powered by enough data, even the feeblest of trends acquires statistical significance.
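The coin example in that passage can be put into numbers. What follows is a minimal sketch, not anything from the paper: it assumes a simple normal-approximation z-test against a fair coin, and the function name `z_score_for_heads` is made up for illustration.

```python
import math


def z_score_for_heads(n_throws: int, frac_heads: float) -> float:
    """z-score of an observed heads fraction against a fair coin (p = 0.5)."""
    # Standard error of a proportion under the null hypothesis p = 0.5.
    standard_error = math.sqrt(0.25 / n_throws)
    return (frac_heads - 0.5) / standard_error


# 51% heads over a billion throws: overwhelming statistical significance.
z_billion = z_score_for_heads(1_000_000_000, 0.51)

# The same 51% over a hundred throws: nowhere near significant.
z_hundred = z_score_for_heads(100, 0.51)

# Either way, the best single-throw strategy (always call heads)
# is right only 51% of the time.
best_accuracy = 0.51

print(f"z for 1e9 throws: {z_billion:.1f}")
print(f"z for 100 throws: {z_hundred:.2f}")
print(f"best prediction accuracy for the next throw: {best_accuracy:.0%}")
```

The significance grows with the square root of the sample size while the usefulness for predicting the next throw does not move at all, which is the distinction between the existence of a trend and its strength.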

So, where are the authors going with all this entertaining invective? (Not that there's anything wrong with that; I'm the last person to complain). They're worried that the transformations that primary drug property data have undergone in the literature have tended to exaggerate the correlations between these properties and the endpoints that we care about. The end result is pernicious:

Correlation inflation becomes an issue when the results of data analysis are used to make real decisions. To restrict values of properties such as lipophilicity more stringently than is justified by trends in the data is to deny one’s own drug-hunting teams room to maneuver while yielding the initiative to hungrier, more agile competitors.

They illustrate this by reference to synthetic data sets, showing how one can get rather different impressions depending on how the numbers are handled along the way. Representing sets of empirical points by using their average values, for example, can cause the final correlations to appear more robust than they really are. That, the authors say, is just what happened in this study from 2006 ("Can we rationally design promiscuous drugs?") and in this one from 2007 ("The influence of drug-like concepts on decision-making in medicinal chemistry"). The complaint is that showing a correlation between cLogP and median compound promiscuity does not imply that there is one between cLogP and compound promiscuity per se. And the authors note that the two papers manage to come to opposite conclusions about the effect of molecular weight, which does make one wonder. The "Escape from flatland" paper from 2009 and the "ADMET rules of thumb" paper from 2008 (mentioned here) also come in for criticism on this point: binning averaged data from a large continuous set and then treating those bins as real objects for statistical analysis. One's conclusions depend strongly on how many bins one uses. Here's a specific take on that last paper:

The end point of the G2008 analysis is “a set of simple interpretable ADMET rules of thumb” and it is instructive to examine these more closely. Two classifications (ClogP<4 and MW<400 Da; ClogP>4 or MW>400 Da) were created and these were combined with the four ionization state classifications to define eight classes of compound. Each combination of ADMET property and compound class was labeled according to whether the mean value of the ADMET property was lower than, higher than or not significantly different from the average for all compounds. Although the rules of thumb are indeed simple, it is not clear how useful they are in drug discovery. Firstly, the rules only say whether or not differences are significant and not how large they are. Secondly, the rules are irrelevant if the compounds of interest are all in the same class. Thirdly, the rules predict abrupt changes in ADMET properties going from one class to another. For example, the rules predict significantly different aqueous solubility for two neutral compounds with MW of 399 and 401 Da, provided that their ClogP values do not exceed 4. It is instructive to consider how the rules might have differed had values of logP and MW of 5 and 500 Da (or 3 and 300 Da) had been used to define them instead of 4 and 400 Da.
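The averaging effect the authors complain about, where a correlation between bin averages looks far stronger than the one between individual compounds, is easy to reproduce with made-up numbers. This is a rough sketch under stated assumptions rather than the paper's own procedure: the data are synthetic (a weak linear trend plus Gaussian noise), the bins hold equal counts of points rather than spanning fixed property ranges, and the names `pearson` and `binned_mean_correlation` are invented here.

```python
import math
import random
import statistics


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def binned_mean_correlation(xs, ys, n_bins):
    """Sort points by x, cut into equal-count bins, correlate the bin means."""
    pairs = sorted(zip(xs, ys))
    size = len(pairs) // n_bins
    bin_x, bin_y = [], []
    for i in range(n_bins):
        chunk = pairs[i * size:(i + 1) * size]
        bin_x.append(statistics.fmean([p[0] for p in chunk]))
        bin_y.append(statistics.fmean([p[1] for p in chunk]))
    return pearson(bin_x, bin_y)


# Synthetic "property vs. endpoint" data: a weak trend buried in noise.
rng = random.Random(0)
n_points = 10_000
xs = [rng.gauss(0, 1) for _ in range(n_points)]
ys = [0.2 * x + rng.gauss(0, 1) for x in xs]

raw_r = pearson(xs, ys)
binned_r = {k: binned_mean_correlation(xs, ys, k) for k in (5, 100, 1000)}

print(f"individual points: r = {raw_r:.2f}")
for k, r in binned_r.items():
    print(f"{k:>4} bin means:    r = {r:.2f}")
```

Averaging within a bin cancels most of the scatter but leaves the trend intact, so fewer, larger bins give a more impressive-looking correlation from exactly the same primary data. The individual points are no more predictable than they were before.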

These problems also occur in graphical representations of all these data, as you'd imagine, and the authors show several of these that they object to. A particular example is this paper from 2010 ("Getting physical in drug discovery"). Three data sets, whose correlations in their primary data do not vary significantly, generate very different looking bar charts. And that leads to this comment:

Both the MR2009 and HY2010 studies note the simplicity of the relationships that the analysis has revealed. Given that drug discovery would appear to be anything but simple, the simplicity of a drug-likeness model could actually be taken as evidence for its irrelevance to drug discovery. The number of aromatic rings in a molecule can be reduced by eliminating rings or by eliminating aromaticity and the two cases appear to be treated as equivalent in both the MR2009 and HY2010 studies. Using the mnemonic suggested in MR2009 one might expect to make a compound more developable by replacing a benzene ring with cyclohexadiene or benzoquinone.

The authors wind up by emphasizing that they're not saying that things like lipophilicity, aromaticity, molecular weight and so on are unimportant - far from it. What they're saying, though, is that we need to be aware of how strong these correlations really are so that we don't fool ourselves into thinking that we're addressing our problems, when we really aren't. We might want to stop looking for huge, universally applicable sets of rules and take what we can get in smaller, local data sets within a given series of compounds. The paper ends with a set of recommendations for authors and editors - among them, always making primary data sets part of the supplementary material, not relying on purely graphical representations to make statistical points, and a number of more stringent criteria for evaluating data that have been partitioned into bins. They say that they hope that their paper "stimulates debate", and I think it should do just that. It's certainly given me a lot of things to think about!

Category: Drug Assays | Drug Development | In Silico | The Scientific Literature


COMMENTS

1. Curious Wavefunction on February 8, 2013 11:11 AM writes...

Yes, it's a great paper and I am glad you highlighted it. For some reason it reminded me of Stephen Jay Gould's famous essay "The Median Isn't the Message".

Gould's take-home message was that the true representation of a distribution is the distribution itself, not a metric like mean or median. Both Gould and the authors are making a similar point; only the raw, untransformed data can give us an accurate picture of reality.


2. weirdo on February 8, 2013 12:20 PM writes...

Yeah, definitely a paper whose time has come. Reminds me very much of a blog I read regularly about 4-5 years ago lamenting the very concepts related here -- it went dormant years ago. Makes me think the blogmaster was Peter Kenny or a close associate.

If you have read Nick Silver's book you will certainly recognize the primary issue here. Too much data, too little understanding.


3. Ed on February 8, 2013 12:26 PM writes...

Great molecular crapshoot?


4. weirdo on February 8, 2013 12:31 PM writes...

Thanks Ed!

I was thinking of the precursor to what is apparently the current permutation. Definitely something for me to catch up on.


5. jd on February 8, 2013 1:20 PM writes...

weirdo-

do you mean nate silver?


6. weirdo on February 8, 2013 1:25 PM writes...

jd-- well, if I understand the statistics well enough, Nick and Nate are pretty much the same.

But then maybe I'm a little rusty on my math.


7. Anonymous on February 8, 2013 1:29 PM writes...

The problem is not Lipinski's or any of the subsequent points-to-consider rubrics, but their rapid evolution into the rule of 3/4/5 commandments that placed limits on medchemists’ imaginations and their ability to pursue interesting biological activity not fitting the rules. Now after a decade of stupidity, we are recognizing the obvious: a rule of thumb can be useful only when the dogmatism is left on the sidelines.


8. Pete on February 8, 2013 3:01 PM writes...

Data is a good servant but a poor master.


9. on-ice-in-new-england on February 8, 2013 3:37 PM writes...

Reader Exercise: Compare the quality and testability of data used to generate these papers with the quality and testability of data used for climate modeling. Recommend appropriate social policies. Be sure to explain your error bars.


10. Post-Newtonian on February 8, 2013 6:16 PM writes...

Great Bulletin, Peter (and Carlos).

Responders to the Universe - Unite!

Controllers of the Universe - your game is up...


11. Pete on February 8, 2013 6:49 PM writes...

Will the Controllers of the Universe be easily prised from their bunker?


12. Rock on February 8, 2013 10:06 PM writes...

I am still one to believe that following the suggestions in all of those papers will lead to higher quality compounds. Some people argue it is stochastic in nature; I say so be it. Why would you choose not to work in higher probability space if the target allows? If you have been around long enough, you have witnessed the perils of working in less desirable space for yourself. At the same time, you would be a fool not to determine the property boundaries of your series using experimental data. That includes LogP.


13. InSilicoConsulting on February 9, 2013 9:57 AM writes...

Agree with Rock: why work in less desirable chemical space when a lot of chemical space is unexplored, even within the boundaries of such rules of thumb?

The idea is to minimize downstream risks as early as possible. No one claims that these simple rules help with lead optimization. If they did, what would a medchemist do?

For some more trends http://www.ukqsar.org/slides/Tony_Wood_design%20challenges_09.ppt

