About this Author
College chemistry, 1983

Derek Lowe The 2002 Model

After 10 years of blogging...

Derek Lowe, an Arkansan by birth, got his BA from Hendrix College and his PhD in organic chemistry from Duke before spending time in Germany on a Humboldt Fellowship for his post-doc. He's worked for several major pharmaceutical companies since 1989 on drug discovery projects against schizophrenia, Alzheimer's, diabetes, osteoporosis and other diseases. To contact Derek, email him directly. Twitter: Dereklowe

Chemistry and Drug Data:
Drugbank
Chempedia Lab
Synthetic Pages
Organic Chemistry Portal
Not Voodoo

Chemistry and Pharma Blogs:
Org Prep Daily
The Haystack
A New Merck, Reviewed
Liberal Arts Chemistry
Electron Pusher
All Things Metathesis
C&E News Blogs
Chemiotics II
Chemical Space
Noel O'Blog
In Vivo Blog
Terra Sigillata
BBSRC/Douglas Kell
Realizations in Biostatistics
ChemSpider Blog
Organic Chem - Education & Industry
Pharma Strategy Blog
No Name No Slogan
Practical Fragments
The Curious Wavefunction
Natural Product Man
Fragment Literature
Chemistry World Blog
Synthetic Nature
Chemistry Blog
Synthesizing Ideas
Eye on FDA
Chemical Forums
Symyx Blog
Sceptical Chymist
Lamentations on Chemistry
Computational Organic Chemistry
Mining Drugs
Henry Rzepa

Science Blogs and News:
Bad Science
The Loom
Uncertain Principles
Fierce Biotech
Blogs for Industry
Omics! Omics!
Young Female Scientist
Notional Slurry
Nobel Intent
SciTech Daily
Science Blog
Gene Expression (I)
Gene Expression (II)
Adventures in Ethics and Science
Transterrestrial Musings
Slashdot Science
Cosmic Variance
Biology News Net

Medical Blogs
DB's Medical Rants
Science-Based Medicine
Respectful Insolence
Diabetes Mine

Economics and Business
Marginal Revolution
The Volokh Conspiracy
Knowledge Problem

Politics / Current Events
Virginia Postrel
Belmont Club
Mickey Kaus

Belles Lettres
Uncouth Reflections
Arts and Letters Daily

In the Pipeline


November 7, 2011

Rating A Massive Pile of Compounds


Posted by Derek

Here's an interesting exercise carried out in the medicinal chemistry departments at J&J. The computational folks took all the molecules in the company's files, and then all the commercially available ones (over five million compounds), minus natural products (saved for another effort) and minus the obviously non-druglike stuff (multiple nitro groups, solid hydrocarbons with no functionality, acid chlorides, etc.). They then clustered things down into (merely!) about 20,000 similarity clusters and asked the chemists to rate them with up, down, or neutral votes.
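The post doesn't say how J&J got from millions of structures down to ~20,000 clusters, so what follows is only a sketch of one common way to do that kind of grouping, not their method. It assumes each compound has already been reduced to a set of "on" fingerprint bits, and it uses greedy leader clustering on Tanimoto similarity. The bit sets, compound IDs and the 0.7 cutoff are all made up for illustration.

```python
from typing import Dict, FrozenSet, List


def tanimoto(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Tanimoto (Jaccard) similarity between two sets of 'on' fingerprint bits."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def leader_cluster(fps: Dict[str, FrozenSet[int]], threshold: float = 0.7) -> List[List[str]]:
    """Greedy leader clustering: each compound joins the first cluster whose
    leader is at least `threshold` similar; otherwise it becomes a new leader."""
    leaders: List[str] = []
    clusters: List[List[str]] = []
    for cid, fp in fps.items():
        for i, leader in enumerate(leaders):
            if tanimoto(fp, fps[leader]) >= threshold:
                clusters[i].append(cid)
                break
        else:  # no leader was similar enough
            leaders.append(cid)
            clusters.append([cid])
    return clusters


if __name__ == "__main__":
    # Hypothetical fingerprints: A and B share 4 of 5 bits, C shares none.
    fps = {
        "A": frozenset({1, 2, 3, 4}),
        "B": frozenset({1, 2, 3, 4, 5}),
        "C": frozenset({10, 11, 12}),
    }
    print(leader_cluster(fps))  # [['A', 'B'], ['C']]
```

Pure Python like this scales as compounds times clusters, so a real run over millions of structures would call for a proper cheminformatics toolkit and a faster algorithm; the point is only what "similarity clusters" means mechanically.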

What they found was that the opinions of the med-chem staff seemed to match known drug-like properties very closely. Molecular weights in the 300 to 400 range were most favorably received, while the likelihood of a downvote increased below 250 or above 425 or so. Similar trends held for rotatable bonds, hydrogen bond donors and acceptors, clogP, and other classic physical property descriptors. Even the ones that are hard to eyeball, like polar surface area, fell into line.
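To make the "likelihood of a downvote" idea concrete, here is a small sketch of how one might tabulate it: bin the rated clusters by molecular weight and compute the fraction of downvotes in each bin. It assumes each vote is tagged with the molecular weight of the cluster's representative; the votes below are invented toy data (+1 up, 0 neutral, -1 down), the 50 Da bin width is arbitrary, and nothing here reproduces the paper's numbers.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def downvote_rate_by_bin(votes: Iterable[Tuple[float, int]], width: float = 50.0) -> Dict[int, float]:
    """Fraction of downvotes (-1) in each molecular-weight bin.

    Bins are keyed by their lower edge, so 300 covers [300, 350).
    """
    totals: Dict[int, int] = defaultdict(int)
    downs: Dict[int, int] = defaultdict(int)
    for mw, vote in votes:
        lower = int(mw // width * width)
        totals[lower] += 1
        if vote == -1:
            downs[lower] += 1
    return {b: downs[b] / totals[b] for b in sorted(totals)}


if __name__ == "__main__":
    # Hypothetical (molecular weight, vote) pairs, one per rated cluster.
    votes = [(210.0, -1), (230.0, -1), (240.0, 0),
             (320.0, 1), (330.0, 1), (340.0, -1),
             (460.0, -1), (470.0, -1)]
    print(downvote_rate_by_bin(votes))
    # {200: 0.6666666666666666, 300: 0.3333333333333333, 450: 1.0}
```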

It's worth asking if that's a good thing, a bad thing, or nothing surprising at all. The authors themselves waffle a bit on that point:

The results of our experiment are fully consistent with prior literature on what confers drug- or lead-like characteristics to a chemical substance. Whether the strategy will yield the desired results in the long term with respect to quality, novelty, and number of hits/leads remains to be seen. It is also unclear whether this strategy can lead to sufficient differentiation from a competitive stand-point. In the meantime, the only undeniable benefits we can point to is that we harnessed our chemists’ opinions to select lead-like molecules that are totally within reasonable property ranges, that fill diversity holes, and that have been purchased for screening, and that we did so in a way that promoted greater transparency, greater awareness, greater collaboration, and a renewed sense of involvement and engagement of our employees.

I'll certainly give them the diversity-of-the-screening-deck point. But I'm not so sure about that renewed sense of involvement stuff. Apparently 145 chemists participated in total (this effort was open to everyone), but no mention is made of what fraction of the total staff that might be. People were advised to try to vote on at least 2,000 clusters (!), but fewer than half the participants even made it that far. Ten people made it halfway through the lot, and 6 lunatics actually voted on every single one of the 22,015 clusters, which makes me think that they had way too much time on their hands and/or have interesting and unusual personality features. A colleague's reaction to that figure was "Wow, they'll have to track those people down", to which my uncharitable reply was "Yeah, with a net".

So while this paper is interesting to read, I can't say that I would have been all that happy participating in it (although I've certainly had smaller-scale experiences of this type). And I'd like to know what the authors thought when they finally assembled all the votes and realized that they'd recapitulated a set of filters that they could have run in a few seconds, since they're surely already built into their software. And we all should reflect on how thoroughly we seem to have incorporated Lipinski's rules into our own software, between our ears. On balance, it's probably a good thing, but it's not without a price.
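For comparison, here is roughly what "a set of filters that they could have run in a few seconds" looks like in its best-known form, Lipinski's rule of five: flag molecular weight over 500, cLogP over 5, more than 5 hydrogen-bond donors, or more than 10 acceptors. This sketch assumes the descriptors have already been calculated elsewhere; the `Descriptors` class and the one-violation allowance are illustrative choices, not anything from J&J's software.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Descriptors:
    mw: float     # molecular weight, Da (assumed precomputed)
    clogp: float  # calculated logP
    hbd: int      # hydrogen-bond donors
    hba: int      # hydrogen-bond acceptors


def lipinski_violations(d: Descriptors) -> int:
    """Count how many of the four rule-of-five criteria are exceeded."""
    return sum([d.mw > 500, d.clogp > 5, d.hbd > 5, d.hba > 10])


def passes_rule_of_five(d: Descriptors, max_violations: int = 1) -> bool:
    """Common usage: reject a compound only when more than one criterion fails."""
    return lipinski_violations(d) <= max_violations


if __name__ == "__main__":
    print(passes_rule_of_five(Descriptors(350.0, 2.5, 1, 4)))  # True
    print(passes_rule_of_five(Descriptors(620.0, 6.1, 2, 8)))  # False
```

Note how blunt this is next to 145 chemists' votes: a compound at 501 Da is exactly as much of a violation as one at 900 Da.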

Comments (16) + TrackBacks (0) | Category: Drug Assays | Life in the Drug Labs


1. BCP on November 7, 2011 10:25 AM writes...

"undeniable benefits"? Admittedly, I can only see the abstract, but this does sound like an exercise in telling us what we already know.


2. Biotechtranslated on November 7, 2011 10:53 AM writes...

I wonder if all this groupthink as to what a drug should look like is holding back R&D.

Based on the paper, nobody would have thought metformin, Fampyra or BG-12 would be viable drugs.

The thing with R&D is, there are trends, but there are no hard and fast rules. So rather than tossing out a compound because it doesn't fit the rules, maybe one should first see if the rules apply in this particular situation?



3. CMCguy on November 7, 2011 11:03 AM writes...

The message I often see management distill from this type of effort is: "Since our computer models get us the same answers, we don't need so many chemists, particularly the experienced ones, and can outsource all that."


4. RM on November 7, 2011 11:48 AM writes...

Call me a cynic, but how much novel insight was there actually to be gained from this exercise?

Do we(*) really understand drugs and druggability well enough that one would expect a novel and consistent ranking of 20,000+ compounds? Dollars to donuts, the people who ranked 2,000+ compounds weren't using any sort of uncaptured intuition to do so, but instead, consciously or unconsciously, were mechanically applying the "rules" they learned. Of course the analysis came up with Lipinski's rules etc., as that's what was used to make the rankings. You'd be hard pressed to honestly rank 20,000 compounds in any other fashion.

(*) This is taken as an aggregate of many people over a very broad range of compounds. I'll admit there are people out there who have a good eye for what makes a good SSRI, or ER (ant)agonist, etc., but I don't think anyone has intuition across the whole pharmaceutical landscape, and certainly not when lumped as a group.


5. My 0.02 on November 7, 2011 11:59 AM writes...


How much novel insight from the exercise? I would say "close to zero". Until you actually run the assay, you just don't know. I wonder how high dimethyl fumarate (a tiny molecule for MS, posted by Derek the other day) would rank.


6. Mike on November 7, 2011 12:09 PM writes...

Getting input from as many chemists as possible when deciding which compounds to purchase to augment a library is of course a good idea. Once you've done that, comparing the results of that to what you would have gotten from some standard algorithm is of course also a good idea. You really can't fault J&J for doing either of these.

Publishing the result of this comparison, which is neither groundbreaking nor surprising, might be questioned, but it's good to know that the computational chemists and the lab chemists are "on the same page".


7. Vader on November 7, 2011 12:24 PM writes...

"solid hydrocarbons with no functionality"

They ruled out my lunch?


8. Innovorich on November 7, 2011 12:42 PM writes...



9. marcello on November 7, 2011 2:17 PM writes...

Of course, it is open to discussion whether what we have between our ears is software or hardware...
I would call "remembering Lipinski's rules" software...


10. Vince on November 7, 2011 4:06 PM writes...

RM: Call me a cynic, but how much novel insight was there actually to be gained from this exercise?

Perhaps none. Perhaps the insight is the circular way in which the same generalized rules we teach end up holding back progress.

Basically, you had what we ideally believe to be 145 separate 'filters' looking at the same data set, hoping to extract some new heuristic from the output. Instead, it seems like most of the 145 filters, at least when summed over, use the same friggin' algorithm! :)

That said, these types of papers aren't useless. Somewhat analogously, not too long ago Hod Lipson's group had a genetic algorithm sort through a huge raw data set, motion-captured from oscillators (SHO, chaotic, etc.), and it re-derived Hamiltonians, the conservation of momentum, etc., as well as a few new relations that seem unrelated, but who knows... J&J's data set is a bit more complex, but the same principles that motivated their paper hold.


11. DCRogers on November 7, 2011 6:52 PM writes...

They reference work from 2004 (Lajiness et al., J. Med. Chem. 2004, 47, 4891-4896) that did a related study, letting a set of chemists reject hits from a number of studies. As the authors state:

"The results were striking: not only did the chemists disagree with each other, but they often contradicted their own choices. Indeed, reviewers agreed to reject the same compounds only 28% of the time, and when a reviewer looked at the same set of compounds repeatedly, they rejected the same compounds just 50% of the time."

Doesn't sound like these newer authors confirmed that result, however. Curious!
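The quoted figures don't spell out the exact agreement statistic, so as a stand-in (an assumption, not necessarily what Lajiness et al. computed) one simple way to score how consistently reviewers reject compounds is the average pairwise Jaccard overlap of their reject lists. The same `jaccard` function can compare one reviewer's picks from two sessions, which is the self-consistency side of the question. Reviewer names and compound IDs are hypothetical.

```python
from itertools import combinations
from typing import Dict, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Overlap of two reject lists: shared rejections / all rejections."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def mean_pairwise_agreement(rejections: Dict[str, Set[str]]) -> float:
    """Average Jaccard overlap over every pair of reviewers."""
    if len(rejections) < 2:
        raise ValueError("need at least two reviewers to compare")
    pairs = list(combinations(rejections.values(), 2))
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    # Hypothetical reject lists from three reviewers looking at the same hits.
    rejections = {
        "reviewer_1": {"c1", "c2", "c3"},
        "reviewer_2": {"c2", "c3", "c4"},
        "reviewer_3": {"c5"},
    }
    print(round(mean_pairwise_agreement(rejections), 3))  # 0.167
```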


12. Anononymous BMS Researcher on November 7, 2011 7:00 PM writes...

I'm a biologist myself, and my eyes tend to glaze over when the chemists are showing their latest SAR tables in working group meetings. If you tried to make me look at 22 thousand structure diagrams, long before I finished you'd probably need a net to catch me!

Of course some of my PowerPoint slides may be just as inscrutable to the chemists as theirs are to me...


13. RD on November 7, 2011 8:56 PM writes...

I think we're looking at this the wrong way. The problem is not finding the compounds that have the greatest chance of becoming leads. The problem is that resources are extremely limited and we are forced to prioritize. It could be that only a narrow subset of compounds is amenable to optimization into a drug. But ideally, wouldn't we want to test all of them to make sure we didn't miss anything? I mean, other than the compounds with undesirable features.
Let's face it: to really get a fix on the scope of the drug-like compounds out there, there need to be more people, not fewer, making and testing more compounds, and that's not going to happen in this lifetime.


14. EdM on November 8, 2011 1:47 AM writes...

I'm not a chemist, so maybe this is a deeply ignorant question, but what proportion of known effective drugs are outside these parameters?

Thinking naively, that would give a rough indication of the odds that these mental filters are rejecting useful compounds.


15. Tim on November 8, 2011 9:23 AM writes...

To EdM,
I don't know specific numbers, but the vast majority of useful/successful drugs will fall within these filters. The filters were devised for the specific purpose of encompassing features that exist in successful drugs and excluding those that have led to past failures of drug candidates.


16. Dimitris Agrafiotis on November 9, 2011 9:06 AM writes...

Hi everyone,

First, I would like to thank you for taking the time to read our paper and comment on this blog. I agree that the results of this work are neither groundbreaking nor unexpected. It is true that metformin would probably not have passed this test, but that’s not the real question. The real question is if you have X dollars that can buy you Y compounds, how do you choose which compounds to buy, and who should make that decision? Nowhere in our paper do we state that the observed ranges should be used as hard filters; all we are trying to do is capture intuition and diversity of opinion, which plays a big role in decision-making. Library design is a probabilistic game and one needs to play the odds. We have previously published a number of papers showing how one can bias a library towards “drug-like” matter using probabilistic weights and optimization algorithms as opposed to hard filters.

But, as many of you pointed out and as we state ourselves in our conclusions, the real value of this approach is yet to be proven. We obviously intend to publish a follow-up study in a couple of years to see if this approach resulted in more and better quality hits and leads. If that turns out to be true, then this strategy will be worth replicating in other pharmas and the public sector. If not, then at least we’ll know the limits of collective decision-making, at least as it pertains to this problem.

As for the benefits of democratizing decision making, anyone who works in pharma these days knows how motivating this can be to most medicinal chemists (and most employees, for that matter).
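Dimitris Agrafiotis's point about biasing a library with probabilistic weights rather than hard filters can be illustrated with a toy example. This is not the method from their earlier papers, just a generic sketch under stated assumptions: a made-up desirability curve that is 1.0 between 300 and 400 Da and fades toward a small floor outside it, fed into weighted sampling without replacement (the Efraimidis-Spirakis key trick) to spend a fixed compound budget. Compound IDs, weights and the budget are hypothetical.

```python
import random
from typing import Dict, List


def mw_desirability(mw: float, floor: float = 0.05) -> float:
    """Illustrative soft preference on molecular weight (Da): 1.0 from 300 to 400,
    falling linearly over the next 100 Da on either side, never below `floor`."""
    if 300.0 <= mw <= 400.0:
        return 1.0
    distance = 300.0 - mw if mw < 300.0 else mw - 400.0
    return max(floor, 1.0 - distance / 100.0)


def weighted_pick(mws: Dict[str, float], budget: int, seed: int = 0) -> List[str]:
    """Choose `budget` compounds without replacement, favoring higher weights.

    Efraimidis-Spirakis: draw u ~ U(0, 1) per item, key = u ** (1 / w),
    keep the `budget` largest keys. Every item with w > 0 can still be chosen.
    """
    rng = random.Random(seed)
    keyed = [(rng.random() ** (1.0 / mw_desirability(mw)), cid) for cid, mw in mws.items()]
    keyed.sort(reverse=True)
    return [cid for _, cid in keyed[:budget]]


if __name__ == "__main__":
    # Hypothetical purchasable compounds and their molecular weights.
    library = {"cpd_A": 150.0, "cpd_B": 260.0, "cpd_C": 350.0, "cpd_D": 380.0, "cpd_E": 470.0}
    print(weighted_pick(library, budget=3, seed=42))
```

Unlike a hard cutoff, the 150 Da compound here keeps a small but nonzero chance of being bought, which is the "play the odds" idea rather than drawing a line.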


