I've been reading an interesting new paper from Stuart Schreiber's research group(s) in PNAS. But I'm not sure if the authors and I would agree on the reasons that it's interesting.
This is another in the series that Schreiber has been writing on high-throughput screening and diversity-oriented synthesis (DOS). As mentioned here before, I have trouble getting my head around the whole DOS concept, so perhaps that's the root of my problems with this latest paper. In many ways, it's a companion to one that was published earlier this year in JACS. In that paper, he made the case that natural products aren't quite the right fit for drug screening, which fit with an earlier paper that made a similar claim for small-molecule collections. Natural products, the JACS paper said, were too optimized by evolution to hit targets that we don't want, while small molecules are too simple to hit a lot of the targets that we do. Now comes the latest pitch.
In this PNAS paper, Schreiber's crew takes three compound collections: 6,152 small commercial molecules, 2,477 natural products, and 6,623 from academic synthetic chemistry (with a preponderance of DOS compounds), for a total of 15,252. They run all of these past a set of 100 proteins using their small-molecule microarray screening method, and look for trends in coverage and specificity. What they found, after getting rid of various artifacts, was that about 3,400 compounds hit at least one protein (and if you're screening 100 proteins, that's a perfectly reasonable result). But, naturally, these hits weren't distributed evenly among the three compound collections. 26% of the academic compounds were hits, and 23% of the commercial set, but only 13% of the natural products.
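For anyone who wants to check that arithmetic, here is a quick Python sketch. The collection sizes and percentages are the ones quoted above; the per-collection hit counts are back-calculated from those rounded percentages, so treat them as estimates and not as figures from the paper.

```python
# Collection sizes and hit percentages as quoted in the text above.
collections = {
    "commercial": (6152, 0.23),
    "natural products": (2477, 0.13),
    "academic": (6623, 0.26),
}

# Total number of compounds screened.
total = sum(size for size, _ in collections.values())

# Estimated hits per collection. The percentages are rounded, so these
# are back-of-the-envelope numbers, not counts reported by the authors.
estimated_hits = {
    name: round(size * rate) for name, (size, rate) in collections.items()
}
total_hits = sum(estimated_hits.values())

print(total)           # 15252
print(estimated_hits)  # {'commercial': 1415, 'natural products': 322, 'academic': 1722}
print(total_hits)      # 3459
```

The estimated total lands in the same ballpark as the "about 3,400" figure, which is as close as rounded percentages will get you.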
Looking at specificity, it appears that the commercial compounds were the most likely, when they hit, to hit six or more different proteins in the set, and the natural products the least likely. Looking at it in terms of compounds that hit only one or two targets gave a similar distribution - in each case, the DOS compounds were intermediate, and that turns out to be a theme of the whole paper. They analyzed the three compound collections for structural features, specifically their stereochemical complexity (chiral carbons as a percent of all carbons) and shape complexity (sp3 carbons as a percent of the whole). And that showed that the commercial set was biased towards the flat, achiral side of things, while the natural products were the other way around, tilted toward the complex, multiple-chiral-center end. The DOS-centric screening set was right in the middle.
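To make those two metrics concrete, here is a minimal Python sketch. It assumes "the whole" means all carbons in the molecule, which is my reading and may not be exactly the paper's denominator, and it takes hand-counted atom totals instead of parsing structures. The function names are made up for illustration.

```python
def stereochemical_complexity(chiral_carbons: int, total_carbons: int) -> float:
    """Chiral carbons as a fraction of all carbons."""
    if total_carbons == 0:
        return 0.0
    return chiral_carbons / total_carbons


def shape_complexity(sp3_carbons: int, total_carbons: int) -> float:
    """sp3 carbons as a fraction of all carbons (assumed denominator)."""
    if total_carbons == 0:
        return 0.0
    return sp3_carbons / total_carbons


# Hand-counted examples, chosen only to show the two extremes:
# biphenyl: 12 carbons, all sp2, no stereocenters (flat and achiral)
# menthol:  10 carbons, all sp3, 3 stereocenters
examples = {
    "biphenyl": {"total": 12, "sp3": 0, "chiral": 0},
    "menthol": {"total": 10, "sp3": 10, "chiral": 3},
}

for name, c in examples.items():
    stereo = stereochemical_complexity(c["chiral"], c["total"])
    shape = shape_complexity(c["sp3"], c["total"])
    print(f"{name}: stereo={stereo:.2f}, shape={shape:.2f}")
```

A typical aryl-heavy commercial compound sits near the biphenyl end of both scales, and a terpene-like natural product near the menthol end.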
The take-home, then, is similar to the other papers mentioned above: small molecule collections are inadequate, and natural product collections are inadequate; therefore, you need diversity-oriented synthesis compounds, which are just right. I'll let Schreiber sum up his own case:
. . .Both protein-binding frequencies and selectivities are increased among compounds having: (i) increased content of sp3-hybridized atoms relative to commercial compounds, and (ii) intermediate frequency of stereogenic elements relative to commercial (low frequency) and natural (high frequency) compounds. Encouragingly, these favorable structural features are increasingly accessible using modern advances in the methods of organic synthesis and commonly targeted by academic organic chemists as judged by the compounds used in this study that were contributed by members of this community. On the other hand, these features are notably deficient in members of compound collections currently widely used in probe- and drug-discovery efforts.
But something struck me while reading all this. The two metrics used to characterize these compound collections are fine, but they're also two that would be expected to distinguish them thoroughly - after all, natural products do indeed have a lot of chiral carbons, and run-of-the-mill commercial screening sets do indeed have a lot of aryl rings in them. There were several other properties that weren't mentioned at all, so I downloaded the compound set from the paper's supporting information and ran it through some in-house software that we use to break down such things.
I can't imagine, for example, evaluating a compound collection without taking a look at the molecular weights. Here's that graph - the X axis is the compound number, Y-axis is weight in Daltons:
The three different collections show up very well this way, too. The commercial compounds (almost every one under 500 MW) are on the left. Then you have that break of natural products in the middle, with some real whoppers. And after that, you have the various DOS libraries, which were apparently entered in batches, which makes things convenient.
Notice, for example, that block of them standing up around 15,000 - that turns out to be the compounds from this 2004 Schreiber paper, which are a bunch of gigantic spirooxindole derivatives. In this paper, they found that this particular set was an outlier in the academic collection, with a lot more binding promiscuity than the rest of the set (and they went so far as to analyze the set with and without it included). The earlier paper, though, makes the case for these compounds as new probes of cellular pathways, but if they hit across so many proteins at the same time, you have to wonder how such assays can be interpreted. The experiments behind these two papers seem to have been run in the wrong order.
Note, also, that the commercial set includes a lot of small compounds, even many below 250 MW. This is down in the fragment screening range, for sure, and the whole point of looking at compounds of that molecular weight is that you'll always find something that binds to some degree. Downgrading the commercial set for promiscuous binding when you set the cutoffs that low isn't a fair complaint, especially when you consider that the DOS compounds have a much lower proportion of compounds in that range. Run a commercial/natural product/DOS comparison controlled for molecular weight, and we can talk.
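A weight-controlled comparison of that sort could be sketched roughly as follows in Python: bin every compound by molecular weight, then compare hit rates between collections only within the same bin. The bin edges, the record layout, and the toy records are all assumptions made up for illustration; none of the numbers come from the paper's data.

```python
from bisect import bisect_right
from collections import defaultdict

# Hypothetical bin edges in Daltons. 250 is the rough fragment range and
# 500 the rough upper limit mentioned in the text.
MW_EDGES = [250, 500]
BIN_LABELS = ["<250", "250-500", ">=500"]


def mw_bin(mw: float) -> str:
    """Map a molecular weight to its bin label."""
    return BIN_LABELS[bisect_right(MW_EDGES, mw)]


def hit_rates_by_bin(records):
    """Return {(collection, mw_bin): fraction of compounds with >= 1 hit}.

    Each record is (collection, molecular_weight, proteins_hit).
    """
    counts = defaultdict(lambda: [0, 0])  # key -> [hits, total]
    for collection, mw, proteins_hit in records:
        key = (collection, mw_bin(mw))
        counts[key][1] += 1
        if proteins_hit >= 1:
            counts[key][0] += 1
    return {key: hits / total for key, (hits, total) in counts.items()}


# Toy records, invented purely to exercise the code.
toy_records = [
    ("commercial", 180.2, 3),
    ("commercial", 210.0, 0),
    ("commercial", 320.4, 1),
    ("commercial", 410.5, 0),
    ("academic", 330.1, 1),
    ("academic", 480.9, 0),
    ("academic", 650.7, 2),
    ("natural", 820.3, 0),
]

rates = hit_rates_by_bin(toy_records)
for (collection, label), rate in sorted(rates.items()):
    print(f"{collection:10s} {label:8s} {rate:.2f}")
```

The same binning would work for promiscuity (say, six or more proteins hit) by changing the threshold in the `proteins_hit >= 1` test. The point is only that the collections get compared like-for-like within each weight range.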
I also can't imagine looking over a collection and not checking logP, and that's not in the paper, either. But here you are:
In this case, the natural products (around compound ID 7500) are much less obvious, but you can certainly see the different chemical classes standing out in the DOS set. Note, though, that those compounds explore high-logP regions that the other sets don't really touch.
How about polar surface area? Now the natural products really show their true character - looking over the structures, that's because there are an awful lot of polysaccharide-containing things in there, which will run your PSA up faster than anything:
And again, you can see the different libraries in the DOS set very clearly.
So there are a lot of other ways to distinguish these compounds, ways that (to be frank) are probably much more relevant to their biological activity. Just the molecular-weight one is a deal-breaker for me, I'm afraid. And that's before I start looking at the structures in the three collections at all. Now, that's another story.
I have to say, from my own biased viewpoint, I wouldn't pay money for any of the three collections. The natural product one, as mentioned, goes too high in molecular weight and is too polar for my tastes. I'd consider it for antibiotic drug discovery, but with gritted teeth. The commercial set can't make up its mind if it's a fragment collection or not. There are a bunch of compounds that are too small even for my tastes in fragments - 4-methylpyridine, for example. And there are a lot of ugly functional groups: imines of beta-naphthylamine, which should not even get near the front door (unstable fluorescent compounds that break down to a known carcinogen? Return to sender). There are hydroxylamines, peroxides, thioureas, all kinds of things that I would just rather not spend my time on.
And what of the DOS collection? Well, to be fair, not all of it is DOS - there are a few compounds in there that I can't figure out, like isoquinoline, which you can buy from the catalog. But the great majority are indeed diversity-oriented, and (to my mind), diversity-oriented to a fault. The spirooxindole library is probably the worst - you should see the number of aryl rings decorating some of those things; it's like a fever dream - but they're not the only offenders in the "Let's just hang as many big things as we can off this sucker" category. Now, there are some interesting and reasonable DOS compounds in there, too, but there are also more endoperoxides and such. (And yes, I know that there are drug structures with endoperoxides in them, but damned few of them, and art is long while life is short). So no, I wouldn't have bought this set for screening, either; I'd have cherry-picked about 15 or 20% of it.
Summary of this long-winded post? I hate to say it, but I think this paper has its thumb on the scale. I'm just around the corner from the Broad Institute, though, so maybe a rock will come through my window this afternoon. . .