Corante

About this Author
Derek Lowe, an Arkansan by birth, got his BA from Hendrix College and his PhD in organic chemistry from Duke before spending time in Germany on a Humboldt Fellowship on his post-doc. He's worked for several major pharmaceutical companies since 1989 on drug discovery projects against schizophrenia, Alzheimer's, diabetes, osteoporosis and other diseases. To contact Derek email him directly: derekb.lowe@gmail.com Twitter: Dereklowe


In the Pipeline


April 1, 2009

Mexican Lemons To the Rescue


Posted by Derek

Thanks to a comment on this post, I’ve had a chance to read this interesting article from Stephen Johnson of Bristol-Myers Squibb, entitled “The Trouble with QSAR (Or How I Learned to Stop Worrying And Embrace Fallacy)”. (As a side note, it’s interesting to see that people still make references to the titling of Dr. Strangelove. I’ve never met Johnson, but I’d gather from the title that he can’t be much younger than I am.)

[Figure: US highway fatality rate plotted against tonnage of fresh lemons imported from Mexico]

The most arresting part of the article is the graph found in its abstract. No mention is made of it in the text, but none has to be. It’s a plot of the US highway fatality rate versus the tonnage of fresh lemons imported from Mexico, and I have to say, it’s a pretty darn straight line. I’ve seen a lot shakier plots used to justify some sweeping conclusions, and if those were justified, well, then I’m forced to conclude that Mexican lemons have improved highway safety a great deal. The vitamin C, maybe? The fragrance? Bioflavonoids?

None of the above, of course. Correlation, tiresomely, once again refuses to imply causation, even when you ask it nicely. And that’s the whole point of the article. QSAR, for those outside the business, stands for Quantitative Structure-Activity Relationship(s), an attempt to rationalize the behavior of a series of drug candidate compounds through computational means. The problem is, there are plenty of possible variables (size, surface area, molecular weight, polarity, solubility, charge, hydrogen bond donors and acceptors, and as many structural representation parameters as you can stand). As Johnson notes dryly:

“With such an infinite array of descriptions possible, each of which can be coupled with any of a myriad of statistical methods, the number of equivalent solutions is typically fairly substantial.”

That it is. And (as he rightly mentions) one of the other problems is that all these variables are discontinuous. Some region of the molecule can get larger, but only up to a point. When it’s too large to fit into the binding site any more, activity drops off steeply. Similarly, the difference between forming a crucial hydrogen bond and not forming one is a big difference, and it can be realized by a very small change in structure and properties. (Thus the “magic methyl” effect).

But that’s not the whole problem. Johnson takes many of his fellow computational chemists to task for what he sees as sloppy work. Too many models are advanced just because they’ve shown some (limited) correlations, and they’re not tested hard enough afterwards. Finding a model with a good “fitness score” becomes an end in itself:

“We can generate so many hypotheses, relating convoluted molecular factors to activity in such complicated ways, that the process of careful hypothesis testing so critical to scientific understanding has been circumvented in favor of blind validation tests with low resulting information content. QSAR disappoints so often, not only because the response surface is not smooth but because we have embraced the fallacy that correlation begets causation.”
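Johnson's point about the sheer number of available descriptors can be made concrete with a toy simulation. The sketch below is not from his article: it assumes five "compounds" with a random "activity" and 1,000 purely random "descriptors" (all names and sizes here are illustrative), and simply reports the best R^2 that turns up by chance.

```python
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def best_chance_r2(n_compounds, n_descriptors, rng):
    """Best R^2 between a random 'activity' and purely random 'descriptors'."""
    # Hypothetical data: nothing here is related to anything else by construction.
    activity = [rng.gauss(0, 1) for _ in range(n_compounds)]
    best = 0.0
    for _ in range(n_descriptors):
        descriptor = [rng.gauss(0, 1) for _ in range(n_compounds)]
        best = max(best, pearson_r(descriptor, activity) ** 2)
    return best


rng = random.Random(0)  # fixed seed so the run is repeatable
best_r2 = best_chance_r2(n_compounds=5, n_descriptors=1000, rng=rng)
print(f"best R^2 among 1000 random descriptors: {best_r2:.3f}")
```

With only five points, an impressive-looking fit is close to guaranteed if you try enough descriptors, which is why a good fitness score by itself says very little.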

Comments (33) + TrackBacks (0) | Category: In Silico


COMMENTS

1. Retread on April 1, 2009 8:17 AM writes...

A similar declining straight-line plot can be made between ambient lead levels and time. A nearly identical plot can be made for the decline in college board scores over time (before they were normalized upward to improve educators' and students' self-esteem). Clearly then, lead makes us smarter.


2. HelicalZz on April 1, 2009 8:20 AM writes...

And here I thought I was unusual for finding this cartoon so funny.

http://xkcd.com/552/

Zz


3. Wavefunction on April 1, 2009 8:45 AM writes...

I have seen that one. There are indeed many problems with QSAR, including overfitting and mistaking correlation for causation. Here are two similar but a little more detailed and engaging articles from Arthur Doweyko, also from BMS:

1. QSAR: Dead or Alive?
J Comput Aided Mol Des (2008) 22:81–89
DOI 10.1007/s10822-007-9162-7

2. Is QSAR relevant to Drug Discovery?
IDrugs 2008 11(12):894-899

In one of these Doweyko cites the correlation between number of breeding storks and number of new births in Germany. Another more subtle but still obvious graph is a correlation between number of executions in the US vs decline in US population rate. You need to know the physical basis of the correlation in order to distinguish correlation from causation.

A reasonable and sane computational chemist will usually know the problems with QSAR well and will judiciously interpret models.


4. NJW on April 1, 2009 9:04 AM writes...

Reminds me of
http://en.wikipedia.org/wiki/Flying_Spaghetti_Monster#Pirates_and_global_warming
Hit the graph at the right.


5. schinderhannes on April 1, 2009 9:23 AM writes...

@ Wavefunction

The example with storks and babies in Germany is a really poor example: since everybody knows that in Germany babies are delivered by storks, there is not only a correlation but also a causation in this case.
Fewer storks fewer babies, simple!


6. Hap on April 1, 2009 9:31 AM writes...

Mexican lemons ---> cheaper car air fresheners ---> happier drivers.

See? It works! I are a genius.


7. anon on April 1, 2009 9:50 AM writes...

Mexican lemon truck drivers in the US drive slowly to comprehend highway signs because English is their second language. Everyone slows down as a result, leading to fewer fatalities.


8. Anonymous on April 1, 2009 9:54 AM writes...

But what about pirates? Any correlation to the number of pirates?

As long as it's testable, I'm happy to use any metric that I can vary predictably by changing the structure to try to understand the activity of my compounds. It's when you start getting into numbers based on some black-box multicomponent analysis that isn't readily derivable in the real world, then you've lost me.


9. Hap on April 1, 2009 10:21 AM writes...

What is this "slowing down to read signs because English is your second language" bit? One, no one where I live actually bothers to read signs - if their exit is nearby, they simply cut over how ever many lanes are needed to get to their exit and ignore what might be in their way (without any of this pesky foresight stuff), while speed limit, nearby lane closing, and no U-turn signs are roundly ignored. (And turn signals are only for when police are around or for when you cut someone off.) Two, truck drivers where I live don't slow down - they seem to figure it's everyone else's job to stay out of their way, and if they need to go somewhere they (usually) just signal and move, with what is in their way being irrelevant. (I would also assume Mexican trucking companies are less likely than American ones to have satellite/GPS truck monitors, so speeding to achieve their (probably optimistic) schedules is likely.)


10. tyrosine on April 1, 2009 11:10 AM writes...

For this sort of exercise, I think it's important to realize what the p-value or R^2 truly means. Correct me if I'm wrong, but for the lemon chart, R^2=0.97 means there is a 3% chance the trend is due to pure random chance (or ~1 in 30). Sounds pretty good? Not if you have a modeler who spends his entire day looking for trends. This would mean if he looked at 30 possible trends, he would've found at least one with an R^2 of 0.97 by pure chance. If he spends all week, he could probably find one with an R^2 of 0.995.


11. emjeff on April 1, 2009 11:51 AM writes...

This sloppy way of thinking is (sadly) not confined to QSAR. It seems that the entire field of modern epidemiology is dedicated to finding correlations: coffee consumption vs. cancer, meat intake vs. impotence, etc. No biological plausibility enters into these papers; the correlations are presented along with the rubber-stamp sentence "More research into the possible reasons for this relationship is needed" (read: Give me more grant money). These types of stories contribute to the poor public perception of science.


12. FormerMolecModeler on April 1, 2009 11:59 AM writes...

tyrosine:

Incorrect. You're thinking of a P-value. R^2 is simply the square of the correlation coefficient, R, which can be positive or negative. It's a measurement of a model's predictive power.


13. dWj on April 1, 2009 12:09 PM writes...

tyrosine, in particular:

There's a data-mining element to this (we aren't given a p-value here, but I bet it's not terrible), but the bigger thing that jumps out from this particular case is cointegration, where two variables that are independently following trends over a period of time will therefore be correlated. Note the year labels on the points; data-mining becomes a better concern if, reading from upper left to lower right, they went 1999, 1996, 1998, 2000, 1997 or something. The other examples given in these comments -- lead versus test scores, pirates versus anything -- are also cointegration issues rather than people simply having dug through a bunch of data to find out that ambient lead correlates with higher test scores.
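The trend effect described in this comment is easy to reproduce. The sketch below is only an illustration under assumed numbers: two series that are generated independently, one drifting up and one drifting down over 200 time points, correlate almost perfectly in their levels, while their period-to-period changes show essentially nothing. The slopes, noise level, and series names are made up for the example.

```python
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def diffs(series):
    """First differences: the change from each point to the next."""
    return [b - a for a, b in zip(series, series[1:])]


rng = random.Random(1)  # fixed seed so the run is repeatable
n_points = 200

# Hypothetical series that share nothing except a steady drift with time.
rising = [0.5 * t + rng.gauss(0, 1) for t in range(n_points)]
falling = [-0.3 * t + rng.gauss(0, 1) for t in range(n_points)]

r_levels = pearson_r(rising, falling)
r_changes = pearson_r(diffs(rising), diffs(falling))

print(f"correlation of levels:  {r_levels:.3f}")
print(f"correlation of changes: {r_changes:.3f}")
```

Differencing is one common way to check whether a correlation survives once the shared trend is removed; it is a quick sanity check, not a full time-series analysis.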


14. Aspirin on April 1, 2009 1:59 PM writes...

"3% chance the trend is due to pure random chance"

That's a p-value. Calculating that for a particular model may be non-trivial.


15. RB Woodweird on April 1, 2009 2:24 PM writes...

Aren't there a crapload of pirates around? Try sailing a luxury yacht by Somalia or in some waters around Malaysia.


16. KwadGuy on April 1, 2009 3:39 PM writes...

You would be surprised at how many scientists believe in QSAR models, and yet how few examples there are of QSAR models that have actually been of real use in a PROspective fashion.


17. Great Molecular Crapshoot on April 1, 2009 5:27 PM writes...

Clustering in the training set is a real problem in QSAR because it can trick you into thinking that you are interpolating when in fact you are extrapolating.


18. tpah on April 1, 2009 6:50 PM writes...

What is it about QSAR modelers and their humorous paper titles? I was at a cheminformatics conference in the Netherlands last year and one of the QSAR talks was entitled 'QSAR modeler seeks meaningful relationship' - one of the best titles I've seen.

They may not convince us they're right, but at least they can be funny about it


19. TX Raven on April 2, 2009 2:26 AM writes...

How about the opposite situation?
Take a look at a currently approved manuscript in BMCL from AZ (Defining optimum lipophilicity and molecular weight ranges for drug candidates—Molecular weight dependent lower log D limits based on permeability).
There is data all over the Papp vs LogD graph, with a R2=0.12... that does not stop the author from drawing conclusions...

I am thinking... what is the real physical meaning of this?

Anyone can help me understand this?

Lost in NJ,
TX Raven


20. damien bove on April 2, 2009 2:29 AM writes...

It's always good to see the statistics shown up, but the big question is what would the power calculation look like for that: just how many lemon trucks / accidents would be needed to judge the relevance?


21. drug_hunter on April 2, 2009 6:24 AM writes...

FWIW, The modelers here are pretty careful at checking out QSAR methods reported in the literature -- they've set up a streamlined way to do this -- and it is nearly always the case that the correlations don't hold up to more rigorous testing. Them molecules is sneaky.


22. Lee on April 2, 2009 1:51 PM writes...

I think that people are misinterpreting Stephen and Arthur's messages. The data they present in their papers is exactly why creating linear models, especially in the presence of a limited number of cases, is a fool's errand.

The probability of finding a chance correlation, especially in situations which are not linear by nature and where there are few cases (i.e. short wide data tables), is so great that it's not even worth trying. It's also why non-linear methods like Forest of Trees, SVM, Bayes, and kNN methods, paired with descriptor-selecting methods, have become state of the art.

The lack of interpretability is the weakness of these methods, and perhaps part of the motivation for building traditional QSAR models, but I'd rather have a correct model that's a black box than an incorrect one that I think I understand. Unfortunately, convincing the medchemists to trust the models is harder when there's no visual or intuitive component.

Perhaps the term QSAR should be laid to rest, and replaced with MLSAR (Machine Learning SAR). This would change the mindset.


23. TFox on April 2, 2009 4:00 PM writes...

@19: r^2 is the strength of a relationship, p is how well you've established it. Even a very weak relationship can be established with high probability if you have enough data points, and that's what Fig 2 shows us. The weakness of the correlation means that predictability is poor. As a direct consequence of this poor predictive power, the author suggests not bothering with most of Lipinski rule type criteria, and just looking at MW and lipophilicity (Fig 4). (At least, that's what I get from a quick skim.)
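The distinction drawn here between strength (r^2) and how well a relationship is established (p) can be sketched numerically. The function below is a rough illustration rather than the paper's analysis: it holds r^2 fixed at the 0.12 mentioned in comment 19 and uses the Fisher z-transform with a Gaussian approximation to estimate a one-sided p-value at a few sample sizes, which are assumptions chosen for the example.

```python
import math


def approx_one_sided_p(r_squared, n):
    """Approximate one-sided p-value for a positive correlation against rho = 0.

    Uses the Fisher z-transform with a Gaussian approximation, which is
    rough for small n and is used here only for illustration.
    """
    r = math.sqrt(r_squared)
    z = math.atanh(r) * math.sqrt(n - 3)  # standard error of atanh(r) is 1/sqrt(n-3)
    return 0.5 * math.erfc(z / math.sqrt(2))  # upper tail of the standard normal


for n in (10, 100, 1000):
    print(f"n = {n:4d}  r^2 = 0.12  approx p = {approx_one_sided_p(0.12, n):.2e}")
```

The same weak correlation goes from unconvincing at ten points to overwhelmingly "significant" at a thousand, while its predictive power stays exactly as poor.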


24. InfMP on April 2, 2009 8:06 PM writes...

Saw that photo on facebook. I was wondering where it came from. totally awesome


25. Cyan on April 2, 2009 10:13 PM writes...

FWIW, setting R^2 = 0.965 to be as generous as possible with the rounding, the p-value for a one-sided test against zero correlation is 0.0004 (Fisher rho-to-z transform and Gaussian approximation). If you generated 1,644 bivariate Gaussian data sets with N = 5 and rho = 0 you'd have a 50% chance of getting one with a stronger correlation.
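For anyone who wants to retrace the arithmetic in this comment, here is a minimal sketch using only the numbers stated above (R^2 = 0.965, N = 5, the Fisher transform and a Gaussian approximation). The variable names are mine, and the approximation is crude at N = 5, so treat the output as a check on the comment's figures rather than an exact test.

```python
import math

r_squared = 0.965  # value quoted in the comment
n = 5              # number of plotted points

r = math.sqrt(r_squared)

# Fisher rho-to-z transform; under rho = 0 the statistic is roughly
# standard normal once scaled by sqrt(n - 3).
z = math.atanh(r) * math.sqrt(n - 3)

# One-sided p-value from the upper tail of the standard normal.
p_one_sided = 0.5 * math.erfc(z / math.sqrt(2))

# Number of independent null data sets needed for a 50% chance that at
# least one beats this correlation: solve (1 - p)^k = 0.5 for k.
datasets_for_even_odds = math.log(0.5) / math.log(1 - p_one_sided)

print(f"z = {z:.2f}")
print(f"one-sided p = {p_one_sided:.4f}")
print(f"data sets for even odds = {datasets_for_even_odds:.0f}")
```

The result lands on a p-value of about 0.0004 and a data-set count in the mid-1,600s, in line with the figures quoted.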

In addition to the cointegration explanation, another obvious cheat is the fact that only 5 data points are plotted -- many more years of data on both variables are likely available.

(P-values are lousy as measures of statistical evidence -- see the work of Richard Royall for more.)


26. Still Scared of Dinosuars on April 3, 2009 7:46 AM writes...

The notion of correlation and causation can be used to highlight this graph a bit. If someone says that reliable predictions state that Mexican lemon imports will increase 12% you may argue that automotive insurance stocks are a good buy because their claims will go down. Tell people the gov't should subsidize Mexican lemon imports to improve road safety and you'll...well...I guess you'll still find people who will agree. They're probably the only people still owning insurance stocks.


27. MDW on April 9, 2009 10:28 AM writes...

Steve and I went to school together at PSU, he knows what he is talking about. But I'm not ready to throw the baby out with the bathwater just yet. Countless times at my old workplace, we had fast synthesis cycle times. This allowed us to really "validate" our QSAR models. For the most part, we performed quite well on "new" compounds not seen by the models, and our med chemists were not reluctant to use our models. We just didn't get around to publishing the models in scientific journals.


28. Got_QSAR? on April 9, 2009 4:01 PM writes...

I definitely wouldn't throw the baby out with the bathwater either. IMHO if you are not looking for trends between chemical descriptors and biological activity you are not doing your job as a medicinal chemist. If a QSAR model fails to be predictive, it is not due to misinterpreting causation, it is because you are not using the correct descriptor or combination of descriptors (PLS regression).

I have noticed that chemists will often apply QSAR (without physically plotting the relationship) without realizing they are using it.
I find it amusing when chemists notice a trend involving the property of a functional group (lipophilicity, etc.), make the appropriate compound, and mock QSAR at the same time.

Like protein crystal structures, a QSAR relationship is a model. And much like structure based drug design, QSAR models are not always predictive. If I had a dollar for every compound a modeler "designed" that was a dud, well...


29. Myself on June 2, 2009 12:59 AM writes...

Coming from far outside the field as I do, I made a few guesses about QSAR's acronymic expansion before the term was defined.

I'd come up with "quodlibet search at random", and the more I read, the more I think it fits!


30. Got_QSAR on June 23, 2009 3:21 PM writes...

Formulating trends between biological activity and structure (defined by physicochemical descriptors) and designing new compounds based on the trend is random?

"Myself", keep reading.


31. Carolus on July 19, 2011 9:53 PM writes...

OK, I'm a little late to this game, but I've got the answer. The original graph is inverted. The x-axis should be fatalities, the y-axis imported lemons. The causation is then obvious: the lower the fatality rate, the more lemons will be imported. More people, more margaritas consumed, more lemons imported. QED.


32. Allegra Brodnicki on July 20, 2012 1:41 AM writes...

Do you have any video of that? I'd care to find out more details.


33. telecharger video youtube on May 29, 2013 2:39 PM writes...

Wow! This could be one particular of the most helpful blogs We have ever arrive across on this subject. Actually Wonderful. I am also an expert in this topic so I can understand your effort.
