Derek Lowe, an Arkansan by birth, got his BA from Hendrix College and his PhD in organic chemistry from Duke before spending time in Germany on a Humboldt Fellowship for his post-doc. He's worked for several major pharmaceutical companies since 1989 on drug discovery projects against schizophrenia, Alzheimer's, diabetes, osteoporosis and other diseases. To contact Derek, email him directly. Twitter: Dereklowe

In the Pipeline

November 11, 2010

And One Was Just Right?

Posted by Derek

I've been reading an interesting new paper from Stuart Schreiber's research group(s) in PNAS. But I'm not sure if the authors and I would agree on the reasons that it's interesting.

This is another in the series that Schreiber has been writing on high-throughput screening and diversity-oriented synthesis (DOS). As mentioned here before, I have trouble getting my head around the whole DOS concept, so perhaps that's the root of my problems with this latest paper. In many ways, it's a companion to one that was published earlier this year in JACS. In that paper, he made the case that natural products aren't quite the right fit for drug screening, which fit with an earlier paper that made a similar claim for small-molecule collections. Natural products, the JACS paper said, were too optimized by evolution to hit targets that we don't want, while small molecules are too simple to hit a lot of the targets that we do. Now comes the latest pitch.

In this PNAS paper, Schreiber's crew takes three compound collections: 6,152 small commercial molecules, 2,477 natural products, and 6,623 from academic synthetic chemistry (with a preponderance of DOS compounds), for a total of 15,252. They run all of these past a set of 100 proteins using their small-molecule microarray screening method, and look for trends in coverage and specificity. What they found, after getting rid of various artifacts, was that about 3,400 compounds hit at least one protein (and if you're screening 100 proteins, that's a perfectly reasonable result). But, naturally, these hits weren't distributed evenly among the three compound collections. 26% of the academic compounds were hits, and 23% of the commercial set, but only 13% of the natural products.
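As a quick sanity check on those figures, here is a minimal Python sketch that totals the three collections and converts the quoted hit rates back into approximate compound counts. The percentages are the rounded values quoted above, so the result is only an estimate, and the variable names are mine rather than anything from the paper.

```python
# Collection sizes and rounded hit rates as quoted in the post.
collections = {
    "commercial": (6152, 0.23),
    "natural_products": (2477, 0.13),
    "academic": (6623, 0.26),
}

total_compounds = sum(size for size, _ in collections.values())

# Approximate hits per collection, reconstructed from the rounded percentages.
estimated_hits = {
    name: round(size * rate) for name, (size, rate) in collections.items()
}
total_hits = sum(estimated_hits.values())

print("total compounds:", total_compounds)
print("estimated hits per collection:", estimated_hits)
print("estimated total hits:", total_hits)
```

The estimated total lands close to the "about 3,400" figure, so the quoted numbers are at least internally consistent.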

Looking at specificity, it appears that the commercial compounds were the most likely, when they hit, to hit six or more different proteins in the set, and the natural products the least. Looking at it in terms of compounds that hit only one or two targets gave a similar distribution - in each case, the DOS compounds were intermediate, and that turns out to be a theme of the whole paper. They analyzed the three compound collections for structural features, specifically their stereochemical complexity (chiral carbons as a percent of all carbons) and shape complexity (sp3 carbons as a percent of the whole). And that showed that the commercial set was biased towards the flat, achiral side of things, while the natural products were the other way around, tilted toward the complex, multiple-chiral-center end. The DOS-centric screening set was right in the middle.
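To make the two metrics concrete, here is a rough Python sketch of how they could be computed from per-compound carbon tallies. It assumes "the whole" means all carbons in the molecule, which may not match the paper's exact definition; the `CarbonCounts` class and the two example tallies are my own illustration, counted by hand, and are not taken from the paper's data.

```python
from dataclasses import dataclass


@dataclass
class CarbonCounts:
    """Hypothetical per-compound carbon tallies (illustration only)."""
    total_carbons: int
    sp3_carbons: int
    stereogenic_carbons: int


def stereochemical_complexity(c: CarbonCounts) -> float:
    """Stereogenic carbons as a fraction of all carbons."""
    return c.stereogenic_carbons / c.total_carbons if c.total_carbons else 0.0


def shape_complexity(c: CarbonCounts) -> float:
    """sp3-hybridized carbons as a fraction of all carbons (assumed denominator)."""
    return c.sp3_carbons / c.total_carbons if c.total_carbons else 0.0


# Flat and achiral: biphenyl, twelve aromatic carbons.
biphenyl = CarbonCounts(total_carbons=12, sp3_carbons=0, stereogenic_carbons=0)
# Sugar-like: glucopyranose, six sp3 carbons, five of them stereogenic.
glucopyranose = CarbonCounts(total_carbons=6, sp3_carbons=6, stereogenic_carbons=5)

for name, counts in [("biphenyl", biphenyl), ("glucopyranose", glucopyranose)]:
    print(name, stereochemical_complexity(counts), shape_complexity(counts))
```

The two examples sit at opposite corners of the plot, which is roughly where the commercial and natural product sets are said to cluster.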

The take-home, then, is similar to the other papers mentioned above: small molecule collections are inadequate, natural product collections are inadequate: therefore, you need diversity-oriented synthesis compounds, which are just right. I'll let Schreiber sum up his own case:

. . .Both protein-binding frequencies and selectivities are increased among compounds having: (i) increased content of sp3-hybridized atoms relative to commercial compounds, and (ii) intermediate frequency of stereogenic elements relative to commercial (low frequency) and natural (high frequency) compounds. Encouragingly, these favorable structural features are increasingly accessible using modern advances in the methods of organic synthesis and commonly targeted by academic organic chemists as judged by the compounds used in this study that were contributed by members of this community. On the other hand, these features are notably deficient in members of compound collections currently widely used in probe- and drug-discovery efforts.

But something struck me while reading all this. The two metrics used to characterize these compound collections are fine, but they're also two that would be expected to distinguish them thoroughly - after all, natural products do indeed have a lot of chiral carbons, and run-of-the-mill commercial screening sets do indeed have a lot of aryl rings in them. There were several other properties that weren't mentioned at all, so I downloaded the compound set from the paper's supporting information and ran it through some in-house software that we use to break down such things.

I can't imagine, for example, evaluating a compound collection without taking a look at the molecular weights. Here's that graph - the X axis is the compound number, Y-axis is weight in Daltons:
The three different collections show up very well this way, too. The commercial compounds (almost every one under 500 MW) are on the left. Then you have that break of natural products in the middle, with some real whoppers. And after that, you have the various DOS libraries, which were apparently entered in batches, which makes things convenient.

Notice, for example, that block of them standing up around 15,000 - that turns out to be the compounds from this 2004 Schreiber paper, which are a bunch of gigantic spirooxindole derivatives. In this paper, they found that this particular set was an outlier in the academic collection, with a lot more binding promiscuity than the rest of the set (and they went so far as to analyze the set with and without it included). The earlier paper, though, makes the case for these compounds as new probes of cellular pathways, but if they hit across so many proteins at the same time, you have to wonder how such assays can be interpreted. The experiments behind these two papers seem to have been run in the wrong order.

Note, also, that the commercial set includes a lot of small compounds, even many below 250 MW. This is down in the fragment screening range, for sure, and the whole point of looking at compounds of that molecular weight is that you'll always find something that binds to some degree. Downgrading the commercial set for promiscuous binding when you set the cutoffs that low isn't a fair complaint, especially when you consider that the DOS compounds have a much lower proportion of compounds in that range. Run a commercial/natural product/DOS comparison controlled for molecular weight, and we can talk.
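For what a molecular-weight-controlled comparison might look like, here is a minimal Python sketch: bin every compound by weight, then compare hit rates between collections only within the same bin. The record layout, the function name `hit_rates_by_mw_bin`, the 100 Da bin width, and the toy records are all assumptions of mine for illustration; none of it comes from the paper or its supporting information.

```python
from collections import defaultdict

BIN_WIDTH = 100  # Daltons; an arbitrary choice for this sketch


def hit_rates_by_mw_bin(compounds, bin_width=BIN_WIDTH):
    """Return {(mw_bin_start, collection): (hits, total, hit_rate)}.

    `compounds` is an iterable of (collection, mol_weight, is_hit) tuples.
    """
    tallies = defaultdict(lambda: [0, 0])  # [hits, total] per (bin, collection)
    for collection, mol_weight, is_hit in compounds:
        bin_start = int(mol_weight // bin_width) * bin_width
        key = (bin_start, collection)
        tallies[key][1] += 1
        if is_hit:
            tallies[key][0] += 1
    return {
        key: (hits, total, hits / total)
        for key, (hits, total) in sorted(tallies.items())
    }


# Made-up records purely for illustration; not data from the paper.
toy_compounds = [
    ("commercial", 212.3, True),
    ("commercial", 243.1, False),
    ("commercial", 431.5, False),
    ("natural", 447.9, True),
    ("natural", 405.0, False),
    ("dos", 412.2, True),
    ("dos", 498.7, True),
]

rates = hit_rates_by_mw_bin(toy_compounds)
for (bin_start, collection), (hits, total, rate) in rates.items():
    print(f"{bin_start}-{bin_start + BIN_WIDTH} Da  {collection:<10}  {hits}/{total}  {rate:.0%}")
```

Only bins that contain compounds from all three collections support a fair comparison; the fragment-sized bin in the toy data contains commercial compounds alone, which is exactly the imbalance described above.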

I also can't imagine looking over a collection and not checking logP, but that's not in the paper, either. But here you are:
In this case, the natural products (around compound ID 7500) are much less obvious, but you can certainly see the different chemical classes standing out in the DOS set. Note, though, that those compounds explore high-logP regions that the other sets don't really touch.

How about polar surface area? Now the natural products really show their true character - looking over the structures, that's because there are an awful lot of polysaccharide-containing things in there, which will run your PSA up faster than anything:
And again, you can see the different libraries in the DOS set very clearly.

So there are a lot of other ways to distinguish these compounds, ways that (to be frank) are probably much more relevant to their biological activity. Just the molecular-weight one is a deal-breaker for me, I'm afraid. And that's before I start looking at the structures in the three collections at all. Now, that's another story.

I have to say, from my own biased viewpoint, I wouldn't pay money for any of the three collections. The natural product one, as mentioned, goes too high in molecular weight and is too polar for my tastes. I'd consider it for antibiotic drug discovery, but with gritted teeth. The commercial set can't make up its mind if it's a fragment collection or not. There are a bunch of compounds that are too small even for my tastes in fragments - 4-methylpyridine, for example. And there are a lot of ugly functional groups: imines of beta-naphthylamine, which should not even get near the front door (unstable fluorescent compounds that break down to a known carcinogen? Return to sender). There are hydroxylamines, peroxides, thioureas, all kinds of things that I would just rather not spend my time on.

And what of the DOS collection? Well, to be fair, not all of it is DOS - there are a few compounds in there that I can't figure out, like isoquinoline, which you can buy from the catalog. But the great majority are indeed diversity-oriented, and (to my mind), diversity-oriented to a fault. The spirooxindole library is probably the worst - you should see the number of aryl rings decorating some of those things; it's like a fever dream - but they're not the only offenders in the "Let's just hang as many big things as we can off this sucker" category. Now, there are some interesting and reasonable DOS compounds in there, too, but there are also more endoperoxides and such. (And yes, I know that there are drug structures with endoperoxides in them, but damned few of them, and art is long while life is short). So no, I wouldn't have bought this set for screening, either; I'd have cherry-picked about 15 or 20% of it.

Summary of this long-winded post? I hate to say it, but I think this paper has its thumb on the scale. I'm just around the corner from the Broad Institute, though, so maybe a rock will come through my window this afternoon. . .

Comments (36) + TrackBacks (0) | Category: Academia (vs. Industry) | Drug Assays | Drug Development | Natural Products


1. Anonymous on November 11, 2010 10:20 AM writes...

DOS is the optimal synthesis method for perturbagen discovery.

Derek is talking about small molecule discovery, which is completely different.

2. ronathan richardson on November 11, 2010 11:10 AM writes...

I remember seeing Schreiber last year claiming that his DOS compounds hit transcription factors well (which nothing else can). The evidence for this is still...lacking.

But remember, using a blog post to provide a rational argument that takes down an article in the PROCEEDINGS OF THE NATIONAL ACADEMY doesn't count according to Royce Murray.

3. Pete on November 11, 2010 11:25 AM writes...

Hi Derek,

Thanks for doing the additional analysis using LogP, Molecular weight and PSA.

Appreciate if you could post the raw data for LogP, Molecular weight and PSA as an excel file.

Thank you

4. JMB on November 11, 2010 11:34 AM writes...

This kind of analysis is a perfect example of why so many of us keep coming back here. As for Murray...what if SS was actually participating in the discussion here? And if the other synthetic profs at Big Name U joined in? What would happen?

I'd argue that Derek holds more sway than all of them on this post and plenty of other topics.

5. anon on November 11, 2010 11:41 AM writes...

Nice. This is just what the science blogosphere needs - tough, smart, totally business-like criticism of papers that appear in high profile journals. Unfortunately, no academic will ever do this in a non-anonymous way because grants and paper acceptances depend on not criticizing your peers. Derek - if you can figure out a way for academics to do this in a reasonable and anonymous way that matters, it would be awesome.

6. ronathan richardson on November 11, 2010 11:41 AM writes...

It appears to me that the analysis that really needs to be done (and maybe somebody with some spare time can) is to take at least a few hundred DOS, natural product, and commercial compounds that are all between, say, 400 and 450 in MW and see if they still have binding/specificity differences once the size factor is removed, and if, within this mass range, sp3 carbons and such determine compound activity.

7. exPharma on November 11, 2010 12:20 PM writes...

So, Derek does a nice deconstruction of the library characteristics. That begs the question as to where the editors and reviewers of these papers are??? AWOL....

8. Anonymous on November 11, 2010 12:27 PM writes...

Bravo! Bravo!

9. lynn on November 11, 2010 12:40 PM writes...

Excellent post, Derek. According to Wikipedia, the PNAS peer review process for contributions by NAS members is that the reviewers are selected by the author/contributor and the review is via open communication between author and reviewers. So, gee, not much of a surprise that these points weren't raised.

10. Anon on November 11, 2010 12:42 PM writes...

Remember this is in PNAS. Members of the national academy (of which Stuart is one) are allowed one free submission (without actual reviews, no questions asked) per year in PNAS. I am not sure if this was the case here but it may justify the limited data.

11. You're Pfizered on November 11, 2010 1:21 PM writes...

I'd look both ways before crossing the street, Derek. His minions may be out for revenge...

12. barry on November 11, 2010 1:49 PM writes...

By focussing on binding events, the Schreiber paper blinds us to the difficult business of getting from "hit" to "drug". How many man-years have been spent trying to carve a drug out of e.g. Staurosporine? LogP, MW and Polar Surface Area are easily calculated flags that (often) can point to problems in getting from "hit" to "drug".
But maybe Professor Schreiber is onto something. The marketers who now run Pharma don't have the attention span to do Drug Discovery. Maybe they can be persuaded to put up the money for the much easier/shorter game called "Hit Discovery".

13. retread on November 11, 2010 2:56 PM writes...

My late friend Nick (who edited PNAS for 10 years) must be rolling in his grave at all the snidery about PNAS. He did great work on the topoisomerases.

So a question for the readership: Has it always been this way with PNAS (or is this a new turn of events since Nick passed in 2006)? Most of the papers I've read there seem pretty solid.

I had to read the medical literature the way Derek read this one (back when I had to do it when in practice). For two horrible examples see

14. Chemoptoplex on November 11, 2010 3:35 PM writes...

Thanks, Derek, for this post. I'm going to get off the direct topic, but I would like to say this has brought to the surface some thoughts I've had rattling around the ol' peanut about just how big a deal the internet is for the presentation of scientific discovery. This paper was put online in October. Before November is up you've got a decent criticism of it presented in a forum where knowledgeable individuals are reading and commenting upon it.

Could you have sent this off to a journal and got it published? If so, then in such a short period of time? What does this mean for the journals themselves? Finally, do you think there could ever be a world where research groups essentially blog their results and let the public have at them as a form of peer review? The system might be a nightmare to manage but I feel it would be nice to have comment sections on papers where readers could give their thoughts and inputs. Unfortunately, to get this, someone is going to have to stick their neck way out there.

Sorry for the long post filled with questions.

15. ronathan richardson on November 11, 2010 4:54 PM writes...

Off topic, but the famous "reactome" paper that this site contributed to "questioning", at the least, was retracted from Science today, without much of an admission of guilt from the authors.

16. Aspirin on November 11, 2010 5:17 PM writes...

I remember a recent Schreiber talk which was full of Michael acceptors as potential druglike compounds. Stuart needs to brush up on some basic chemistry.

17. Luke on November 11, 2010 5:34 PM writes...

And he's bald!

18. Jose on November 11, 2010 6:43 PM writes...

I wonder - is it more a matter of the reviewers assuming that such a dead-basic analysis was done *before* convoluted binding metrics entered the picture?

19. Anon on November 11, 2010 7:46 PM writes...

Good analysis and discussion. I also wonder how much structural bias there is in the datasets and how much the choice of protein targets affects the outcome. It is really impossible to conclude anything from such studies because it is impossible to define structural diversity and to uncouple this from bulk properties. In a world where there are between 10 to the 60 and 10 to the 200 possible druglike structures, it is hard to see how a set of "dos" libraries of 15000 compounds could be diverse, especially when many of them are not druglike.

20. barry on November 11, 2010 8:19 PM writes...

re: #19
If the set didn't include non-druglike cmpds, it would fail most definitions of "diverse". We mustn't preclude cmpds from our screening sets just because they aren't just like things that have gone to market in the past.

21. dudeinpharma on November 11, 2010 11:28 PM writes...

What a stupid self-serving paper. I think they need to pass a law that professors can't come up with fancy names for obvious concepts. Do people read this crap and think it has any real meaning?

22. Spiny Norman on November 11, 2010 11:44 PM writes...

Retread asks: "Has it always been this way with PNAS (or is this a new turn of events since Nick passed in 2006)?"

No. It used to be much, much worse.

NAS members were allowed more submissions, and they were allowed to submit papers for friends/former students, for which they bore no real responsibility. Nick C. began the clampdown on these practices, and Schekman has continued with it. The quality is (at least in my field) steadily increasing. Again, speaking only for my field, the average PNAS paper is considerably better than the average Science or Nature paper.

23. weirdo on November 12, 2010 1:23 AM writes...

"Do people read this crap and think it has any real meaning?"

Yes, yes, they do. We call them the "Nobel Committee".

24. Anonymous on November 12, 2010 1:48 AM writes...

"Do people read this crap and think it has any real meaning?"

Mumbo jumbo lipstick on the combichem pig.

25. Jose on November 12, 2010 1:48 AM writes...

"Do people read this crap and think it has any real meaning?"

Mumbo jumbo lipstick on the combichem pig.

26. Dodo on November 12, 2010 2:15 AM writes...

Is anyone surprised by this analysis? Great Job Derek. I'm sure we could do this for all of his nonsense papers.

Schreiber is worse than any slimy used car salesman. He is selling the world horse poop and because of his ivy league home people give him a stage and an audience.

How does he keep getting endless NIH and donation $$ when every single level headed scientist has come to the same conclusion about him and his brand of pseudoscience?

27. Wagonwheel on November 12, 2010 3:55 AM writes...

The point here is that any analysis and comparison of datasets should start with a rigorous discussion of their calculated physicochemical properties. Binding and particularly promiscuity are known to be affected by MW and logP. If the argument is that DOS allows access to more diverse and biologically relevant structures then the case should be made fairly with compounds of similar phys-chem property distributions.
The in-silico world has been grappling with this for many years and now standard datasets such as DUD have been adopted. Anyone using in-house or non-standard datasets needs to at least describe their properties in detail.

28. Chris on November 12, 2010 5:22 AM writes...

"so I downloaded the compound set from the paper's supporting information". Derek's post would not have been possible without access to the compound sdf file. So I give credit to PNAS and those other journals like J Med Chem that have good supporting information. Not every journal does this. I think this also says something about the future of journal articles. This type of analysis only works when you have access to the web version. The hard copy doesn't cut it.

29. KC Nicolaou on November 12, 2010 8:54 AM writes...

There's a bit of the 'circle of life' with someone at Vertex publicly questioning Schreiber's work.

Joshua Boger is chuckling somewhere.

30. J Boger on November 12, 2010 1:28 PM writes...

Yes I'm laughing my ass off

31. A postdoc on November 13, 2010 1:07 AM writes...

PNAS has a letters option that allows comments on a paper to be written. I highly suggest you submit this analysis. The letters can (and usually have to) be responded to by the original authors. It is good for science for you to submit this.

32. GSDeK on November 13, 2010 6:19 AM writes...

Have to agree with all the comments on the pure ugliness of the DOS cpds (and the others for that matter) but to be fair to Stu, if you use the cpds to probe biology only (notwithstanding the promiscuity associated with some of the DOS stuff) then perhaps they have a place as academic tools.
The real problem comes here if you try to use a biological tool as a start point to do drug discovery with - that just doesn't work and you'll just waste a lot of time.
As an aside, there are plenty of commercial libraries out there which actually start to look pretty decent these days (just look over some Russian & Ukrainian vendors) so there is no need to focus on the flat earth rubbish highlighted here any more.

33. Alchemist on November 16, 2010 12:44 PM writes...

The fascinating thing is not that Schreiber always claims to have it right, but his definition of right keeps shifting.
Back in the old days, bead-based screening of gazillion peptides was his big thing, and he was going to find all these great compounds faster than industry... Then he realized peptide space was limited and it was gazillions of synthetic organics. Folk like Novartis certainly funded him handsomely but it's unclear if that phase ever produced a significant compound. Then, it was DOS although I never understood how this was different than what all good combinatorial chemists were doing anyway. Then, pairwise library synthesis, again another concept that was obvious and already practiced by others. Again we wonder how many interesting new compounds have emerged from the Broad after all these years. Excellent analysis by Derek and it shows up the dangers of overinterpreting large database collections based on just one or two parameters that give you the answer you want, and how poor academics are at understanding druglike space.
Only point I'd disagree with is Derek's dislike of natural products. You can crudely divide chemical space into four quadrants by MW and log P. The low log P/ low MW is what we aim for in medchem and in fact many NPs fit this space too. In addition, nature is very good at occupying the low log P/high MW quadrant that is poorly populated by synthetic compounds.

34. Ben Gadoua on November 17, 2010 9:39 AM writes...

I haven't worked in a lab or read papers in awhile, but this seems to fall into the class of papers that I came across a lot in our molecular cardiac biology lab. They talked a lot about how this subset of cells or that subset of cells did something with a specific transcription factor, whoopty crap, when you inject an animal model with cells, you aren't just looking at a transcription factor, you're looking at a lot of them. I realize that has absolutely nothing to do with this particular paper, but a lot of the papers that I read were from PNAS, they just had some useless information.

35. Anonymous on November 19, 2010 1:49 PM writes...

I don't have time to read the full blog and all of the comments but to this point:

"And what of the DOS collection? Well, to be fair, not all of it is DOS - there are a few compounds in there that I can't figure out, like isoquinoline, which you can buy from the catalog. But the great majority are indeed diversity-oriented, and (to my mind), diversity-oriented to a fault. The spirooxindole library is probably the worst - you should see the number of aryl rings decorating some of those things; it's like a fever dream - but they're not the only offenders in the "Let's just hang as many big things as we can off this sucker" category. Now, there are some interesting and reasonable DOS compounds in there, too, but there are also more endoperoxides and such. (And yes, I know that there are drug structures with endoperoxides in them, but damned few of them, and art is long while life is short). So no, I wouldn't have bought this set for screening, either; I'd have cherry-picked about 15 or 20% of it."

I agree. The spirooxindole library is the worst. The majority of the 6000 "DOS" compounds analyzed in this paper are no longer in the Broad Screening collection. It is an old set and not representative of the current collection. We apply all of the filters used by most pharmaceutical companies.

DOS compounds are not inherently greasy and high MW. It depends on how you design the library. You can control for these things. And that's all I have time to contribute on this topic...

36. Tyrosine on November 22, 2010 9:37 PM writes...

The conclusions drawn out of the data by the authors of this paper seem like a clear case of confirmation bias. There is always a tendency for such problems in scientific research and the bigger the ego the worse the problem. Regardless of the outcome of comparing these libraries, cognitive dissonance by Schreiber would have made it impossible for DOS to not come out on top.

Side note... Derek wrote: "There are hydroxylamines, peroxides, thioureas, all kinds of things that I would just rather not spend my time on."

I agree up to a point, but I worry that we ignore these things through dogma (or, if you prefer, "experience") and perhaps bias our discovery tools in other directions. I long assumed that peroxides were too reactive to ever be considered as viable drug molecules. But peroxides turn up in natural products all over the place (e.g. Juvenile hormones). So are they really so unstable that they could never be viable drugs or is that a confirmation bias of our own?
