About this Author
Derek Lowe, an Arkansan by birth, got his BA from Hendrix College and his PhD in organic chemistry from Duke before spending time in Germany on a Humboldt Fellowship for his post-doc. He's worked for several major pharmaceutical companies since 1989 on drug discovery projects against schizophrenia, Alzheimer's, diabetes, osteoporosis and other diseases. To contact Derek, email him directly. Twitter: Dereklowe

In the Pipeline

January 2, 2013

How Many Good Screening Compounds Are There?

Posted by Derek

So, how many good screening compounds are there to be had? We can now start to argue about the definition of "good"; that's the traditional next step in this process. But there's a new paper from Australia's Jonathan Baell on this very question that's worth a look.

He and his co-workers have already called attention to the number of compounds with possibly problematic functional groups for high-throughput screening. In this paper, he also quantifies the way that commercial compound collections tend to go wild on certain scaffolds - giving you, say, three hundred of one series and one of another. It doesn't take much to diagnose synthetic ease as the reason for this. And it's not always bad - if you get a hit from the series, then you have an SAR collection ready to go in the follow-up. But you wouldn't necessarily want all of them in there for the first go-round.

But there are many other criteria, and as anyone who's done the exercise can appreciate, large lists of compounds tend to be cut down to size rapidly. The paper shows this in action with a commercial set of 400,000 compounds. Apply some not-too-stringent criteria (between 1 and 4 rings, molecular weights between 150 and 450, cLogP less than 6, no more than 5 hydrogen bond donors and no more than 8 acceptors, up to three chiral centers, and up to 12 rotatable bonds), and you're down to 250K compounds right there. Clear out some functional groups and the PAINS list, and you're down to 170K. Want to cut the molecular weight down to 400, and rotatable bonds down to 10? 130,000 compounds remain. cLogP only up to 5, donors down to 3 or fewer, acceptors down to 6 or fewer? 110,000.
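For anyone who wants to try this sort of cut on their own list, here is a minimal sketch of the two property tiers quoted above. It assumes the properties have already been computed elsewhere; the `Compound` and `Limits` names are hypothetical, and treating every limit as inclusive is my assumption, not something taken from the paper. The functional-group and PAINS steps are not covered here.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Compound:
    # Hypothetical record; property values are assumed to be precomputed.
    name: str
    rings: int
    mol_weight: float
    clogp: float
    h_donors: int
    h_acceptors: int
    chiral_centers: int
    rotatable_bonds: int


@dataclass(frozen=True)
class Limits:
    # Defaults mirror the "not-too-stringent" first tier quoted in the post.
    min_rings: int = 1
    max_rings: int = 4
    min_mw: float = 150.0
    max_mw: float = 450.0
    max_clogp: float = 6.0
    max_donors: int = 5
    max_acceptors: int = 8
    max_chiral: int = 3
    max_rot_bonds: int = 12


FIRST_PASS = Limits()
# Second tier: MW to 400, rotatable bonds to 10, cLogP to 5,
# donors to 3 or fewer, acceptors to 6 or fewer.
STRICTER = replace(
    FIRST_PASS,
    max_mw=400.0,
    max_rot_bonds=10,
    max_clogp=5.0,
    max_donors=3,
    max_acceptors=6,
)


def passes(c: Compound, lim: Limits) -> bool:
    """True if the compound is inside every limit (boundaries inclusive by assumption)."""
    return (
        lim.min_rings <= c.rings <= lim.max_rings
        and lim.min_mw <= c.mol_weight <= lim.max_mw
        and c.clogp <= lim.max_clogp
        and c.h_donors <= lim.max_donors
        and c.h_acceptors <= lim.max_acceptors
        and c.chiral_centers <= lim.max_chiral
        and c.rotatable_bonds <= lim.max_rot_bonds
    )


def apply_filters(compounds, lim):
    """Keep only the compounds that pass, preserving input order."""
    return [c for c in compounds if passes(c, lim)]
```

Running `apply_filters` first with `FIRST_PASS` and then with `STRICTER` and comparing list lengths gives the same kind of attrition count the paper reports, on whatever set you feed it.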

At this point, the paper says, further inspection of the list led to the realization that there were still a lot of problematic functional groups present. (I had a similar experience myself recently, filtering down a less humongous list. Even after several rounds, I was surprised to find, on looking more closely, how many oximes, hydrazones, Schiff bases, hydrazines, and N-hydroxyls were left). In Baell's case, clearing out the not-so-great at this point cut things down to 50,000 compounds. Then a Tanimoto cutoff (to get rid of things that were at least 90% similar to the existing screening compounds) cleared out all but 10,000. Applying the same cutoff, but getting rid of compounds on the list that were more than 90% similar to each other, reduced it to 6,000. So, in other words, one could make a good case for getting rid of over 98% of the vendor's list for high-throughput screening purposes. Similar results were obtained for many other commercial sets of compounds; the paper has the exact numbers (although not, alas, the vendor names involved!).
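The two similarity steps are easy to sketch. What follows is only an illustration, under several assumptions: fingerprints are represented as plain sets of "on" bit positions (a stand-in for whatever a real cheminformatics toolkit would produce), the function names are hypothetical, and the second step is a simple greedy pass that is not necessarily what the paper's authors ran.

```python
def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto coefficient: shared bits divided by bits set in either fingerprint."""
    if not a and not b:
        return 0.0  # convention chosen here for two empty fingerprints
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def drop_near_existing(candidates: dict, existing: list, cutoff: float = 0.9) -> dict:
    """Step 1: drop candidates that are at least `cutoff` similar to anything
    already in the existing screening collection."""
    return {
        name: fp
        for name, fp in candidates.items()
        if all(tanimoto(fp, e) < cutoff for e in existing)
    }


def greedy_diverse(candidates: dict, cutoff: float = 0.9) -> dict:
    """Step 2: walk the list in order, keeping a compound only if it is not
    more than `cutoff` similar to anything already kept.
    The result depends on input order, and the cost grows quadratically."""
    kept = {}
    for name, fp in candidates.items():
        if all(tanimoto(fp, k) <= cutoff for k in kept.values()):
            kept[name] = fp
    return kept
```

Note that a pass like this does not reduce a series to a single representative: analogs that sit just under the cutoff relative to each other all survive, so a heavily populated class can still contribute many members.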

There were other vendor considerations as well. By the time Baell and his group had gone through all this compound-crunching and placed orders, significant numbers of compounds turned out to be unavailable. (I'm willing to bet that quite a few of them would have turned out to be unavailable even if they'd placed their orders that afternoon, but I'm of a cynical bent). That catalog turnover also brings up the problem of being able to re-order compounds if they turn out to be hits:

. . .there were only two vendors whose resupply philosophy we considered to be sound, this philosophy being that around 40 mg stock was set aside and accessible exclusively to prior buyers of that compound for the purposes of resupply of ca. 1−2 mg for secondary assay of a confirmed screening hit. We believe this issue of resupply is in urgent need of attention by vendors and will provide a competitive edge to those vendors willing to better guarantee resupply.

By the time they'd surveyed the various large-scale compound vendors, the group had looked over the majority of commercially available screening compounds. Given the attrition rates, how many actual compounds would cover the world's purchasable chemical space? The best guess is about 340,000, of the many millions of potentially purchasable items.

Of course, all these numbers are subject to dispute - you may not agree with some of the functional group or property cutoffs, or you might want things cut down even more. The paper addresses this question, and the general one of why any particular compound should be in a screening collection at all. My own criterion is "Would I be willing to follow up on this compound if it were a hit?" But different chemists, as has been proven many times, will answer such questions in different ways.

A big part of the discussion concerns those Tanimoto similarity scores, and the paper has a good deal to say about that. You wouldn't want to cut everything down to just singleton compounds (most likely), but you also don't need to have dozens and dozens of para-chloro/para-fluoro methyl-ethyl analogs in each series, either. The best guess is that most vendor catalogs are still rather unbalanced: they have far too many analogs for some compound classes, but too few for many more. Singleton compounds represent most of the chemical diversity for many collections, but you could make the case that there shouldn't be any singletons, ideally. Even two or three representatives from each structural class would be a real improvement. A vendor collection of 400,000 compounds that consisted of 40,000 fairly distinct structures with ten members of each class would be something to see - but no one's ever seen such a thing.
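If you already have a structural class or scaffold label for each compound (how those labels get assigned is a separate problem, and is simply assumed here), a quick tally shows how lopsided a collection is. This is a rough sketch with a hypothetical function name; the target of ten per class just echoes the idealized example above.

```python
from collections import Counter


def class_balance(class_labels, target_per_class=10):
    """Summarize how evenly compounds are spread across structural classes.

    `class_labels` holds one label per compound; the labels are assumed
    to come from some upstream scaffold or clustering step.
    """
    sizes = Counter(class_labels)
    return {
        "classes": len(sizes),
        # classes with exactly one representative
        "singletons": sum(1 for n in sizes.values() if n == 1),
        # compounds beyond the target in over-represented classes
        "excess_over_target": sum(
            n - target_per_class for n in sizes.values() if n > target_per_class
        ),
    }
```

A collection with three hundred members of one series and one of another would report one singleton and 290 compounds over target for those two classes, which is the imbalance in question.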

This new paper, by the way, is full of references to the screening-collection literature, as well as discussing many of the issues itself. I recommend it to anyone thinking about these issues; there are a lot of things that you don't want to have to rediscover!

Comments (17) + TrackBacks (0) | Category: Drug Assays


1. ngr on January 2, 2013 2:05 PM writes...

Having done this exercise myself a few times I'd agree. Maybe this is why the reported overlap between the Bayer and AZ collections is so low and on the order of a few hundred thousand compounds - there ain't that many really available quality commercial compounds.

2. TX Raven on January 2, 2013 6:42 PM writes...

Interesting analysis...
I just remain skeptical about how broad the novel target coverage is with a 100K compound library.

Based on my experience, I would say this is too low a number.

3. JB on January 2, 2013 10:27 PM writes...

Nice piece on our paper, Derek. As you note, we didn't list the names of vendors A-J, though we list the pool of vendors looked at in the footnotes of Table 4, without specifically assigning which was which. Actually, originally we didn't list any vendor names at all, but finally included these in response to referee requests, while declining to specify which was which. This is because we felt that some degree of vendor protection (from potential misinterpretations of our data by potential compound purchasers) was appropriate.

4. Chris Swain on January 3, 2013 3:14 AM writes...

JB I liked the sentence below ;-)

" We construct our arguments in a structurally focused manner to be most useful to medicinal chemists, the key players in drug discovery"

5. JB on January 3, 2013 4:43 AM writes...

Thanks #4 Chris Swain. I am guilty of the occasional educational message. It concerns me a little that drug candidates may be too strongly associated with their distant origin, e.g. the likes of Vemurafenib and Navitoclax as examples of successful FBDD whereas perhaps medicinal chemistry excellence should also be equally accredited. Same goes with numerous HTS-originating hits of course. This can become an issue if reviewers in the academic granting system don't appreciate the extent of importance of good medchem to turn relatively worthless screening hits into something useful. I regard myself first and foremost as a medicinal chemist (mostly in a public system) so admittedly have an interest in conveying this message - but I think it is important to do so.

6. Anonymous on January 3, 2013 7:04 AM writes...

I may have missed something, but is there any more precise description of how compound similarity was measured for this particular study? Was it a classic "Daylight fingerprint Tanimoto" or some other fingerprint? Does it even matter which fingerprint is used?

7. newnickname on January 3, 2013 7:44 AM writes...

Q to Capablanca: How many moves do you look ahead (in chess)?

A: I only look ahead ONE move. But it is always the BEST move.

(Same story attributed to Steinitz, Tarrasch and others.)

Q: How many compounds do you have to test to find a drug?
A: A lot!

8. oldhand on January 3, 2013 9:15 AM writes...

We went through this same exercise of finding compounds to add to the in-house collection for many years, with pretty much the same set of criteria. An important criterion up front was to only look at compounds for which there was 40 mg available, even though we only ordered 10 mg - we wanted compounds we could reorder if needed. Later in the process we got into ordering collections of compounds that were focused on meeting specific phenotypic goals (e.g. kinase inhibition) by applying some general 2D and 3D constraints based upon expected (suspected) interactions in the ATP binding site. It was modestly successful.
This last effort reminds me of a talk given by a major company about their screening efforts. At that time (ca. 10 years ago) they, like everyone else, were screening everything they could get their hands on. They concluded that this exercise was too expensive and decided to do focused libraries. The talk was about the criteria for inclusion in the focused library. At the end, I asked if the criteria would have encompassed their current phase 2 clinical compound - it would not. The lesson for me was to not be too clever in your screening selection - you might miss something.

9. Helical_Investor on January 3, 2013 11:38 AM writes...

I get a chuckle out of the resupply quote. If you don't want to carry an inventory, don't expect anyone else to either. If you want a supply set aside, pay for it.

10. TX Raven on January 3, 2013 12:57 PM writes...

@ #4 and JB:

I am looking forward to the day that comment will be made by someone other than a medicinal chemist.

11. TX Raven on January 3, 2013 1:06 PM writes...

@ #6,

I also kept thinking about the similarity criterion used...

Does the Tanimoto cutoff set mean only one representative compound per chemotype will be present in the whole library?

12. JB on January 3, 2013 7:24 PM writes...

@ #2. Agreed. Our combined stage 3-6 libraries total almost 300K. My feeling is 500K would be a good upper limit if drawn from available compounds only.

@ #10. While we should never underestimate the (increasing) value of other components (e.g. target validation biology etc etc), yes, I do think that in a relative sense, the importance of medchem is underestimated by those outside this field.

@ #11. Read the paper! It's all in there. (e.g. t=0.9 cutoff - Unity fingerprints - can still allow say 60 analogs in if the class is highly populated in the first place)

13. Anonymous on January 3, 2013 7:49 PM writes...

I'm new to this. What's the problem with oximes, hydrazones, Schiff bases, hydrazines, and N-hydroxyls??

15. Anonymous on January 3, 2013 7:50 PM writes...

I'm new to this. What's the problem with oximes, hydrazones, Schiff bases, hydrazines, and N-hydroxyls?? Is it just general stability?

16. Anonymous on January 3, 2013 7:51 PM writes...

general stability?

17. JB on January 4, 2013 4:24 AM writes...

To anonymous.....this gets to the heart of subjectivity and the difficulties in this area and how to judge. Oximes can be metabolically unstable (and indeed, in moxidectin and others, I recall are used as a pro-drug of a ketone). Hydrazones, too, can be unstable, but particularly as acylhydrazones (used as cleavable linkers in antibody conjugates), while hydrazines and N-hydroxyls are associated with metabolic toxicity. But then some would say in a screening hit, some of these aspects are fine as long as you medchem out the problems...if they are problems....early on. The main question is: is the chemistry sound enough that the hit is real? If so, include in your library but progress with eyes wide open. Read my paper but especially those that it cites on this matter and more will be answered than can be done here. It's all about minimizing risk in the starting point. For the record, I am comfortable with starting with oximes and (non acyl) hydrazones but cautiously...and data published in future pharma literature may show me to be not cautious enough for these two classes but much of the reasoning for this has yet to see the light of day.
