About this Author

Derek Lowe, an Arkansan by birth, got his BA from Hendrix College and his PhD in organic chemistry from Duke before spending time in Germany on a Humboldt Fellowship on his post-doc. He's worked for several major pharmaceutical companies since 1989 on drug discovery projects against schizophrenia, Alzheimer's, diabetes, osteoporosis and other diseases. To contact Derek email him directly: Twitter: Dereklowe


In the Pipeline


July 15, 2009

Why Does Screening Work At All? (Free Business Proposal Included!)


Posted by Derek

I've been meaning to get around to a very interesting paper from the Shoichet group that came out a month or so ago in Nature Chemical Biology. Today's the day! It examines the content of screening libraries and compares them to what natural products generally look like, and they turn up some surprising things along the way. The main question they're trying to answer is: given the huge numbers of possible compounds, and the relatively tiny fraction of those we can screen, why does high-throughput screening even work at all?

The first data set they consider is the Generated Database (GDB), a calculated set of all the reasonable structures with 11 or fewer nonhydrogen atoms, which grew out of this work. Neglecting stereochemistry, that gives you between 26 and 27 million compounds. Once you're past the assumptions of the enumeration (which certainly seem defensible - no multiheteroatom single-bond chains, no gem-diols, no acid chlorides, etc.), there is no human bias involved: that's the list.

The second list is everything from the Dictionary of Natural Products and all the metabolites and natural products from the Kyoto Encyclopedia of Genes and Genomes. That gives you 140,000+ compounds. And the final list is the ZINC database of over 9 million commercially available compounds, which (as they point out) is a pretty good proxy for a lot of screening collections as well.

One rather disturbing statistic comes out early when you start looking at overlaps between these data sets. For example, how many of the possible GDB structures are commercially available? The answer: 25,810 of them. In other words, you can buy only about 0.1% of the possible compounds with 11 or fewer heavy atoms, making the "purchasable GDB" a paltry list indeed.
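As a quick check on that percentage, the sketch below divides the quoted purchasable count by both ends of the quoted GDB size range ("between 26 and 27 million"). The constant names are illustrative; the numbers are the ones given in the post.

```python
# Share of the GDB that is purchasable, using only the figures quoted in the post.
PURCHASABLE_GDB = 25_810                        # GDB structures found for sale
GDB_LOW, GDB_HIGH = 26_000_000, 27_000_000      # "between 26 and 27 million"


def percent(part: int, whole: int) -> float:
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


# The larger GDB size gives the smaller share, and vice versa.
low_pct = percent(PURCHASABLE_GDB, GDB_HIGH)
high_pct = percent(PURCHASABLE_GDB, GDB_LOW)

print(f"purchasable share: {low_pct:.3f}% to {high_pct:.3f}%")
```

This prints a range of roughly 0.096% to 0.099%, i.e. about one compound in a thousand.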

Now, what happens when you compare that list of natural products to these other data sets? Well, for one thing, the purchasable part of the GDB turns out to be much more similar to the natural product list than the full GDB is. Every compound in the GDB has a Tanimoto similarity of at least 20% to at least one compound in the natural products set, not that 20% means much of anything in that scoring system. But only 1% of the GDB reaches 40% Tanimoto similarity, and less than 0.005% reaches 80%. That's a pretty steep dropoff!
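For readers who haven't met the metric: on binary fingerprints, the Tanimoto coefficient is the number of bits two molecules share divided by the number of bits set in either one. The sketch below assumes fingerprints are represented as sets of "on" bit indices (the post doesn't say which fingerprint type the paper used). It scores each library member against its nearest reference compound, then reports the fraction clearing a threshold, which is the shape of the 20%/40%/80% statistics above. The tiny fingerprints are hypothetical toy data, not GDB or natural-product records.

```python
from typing import FrozenSet, Iterable, List

Fingerprint = FrozenSet[int]  # indices of the "on" bits in a binary fingerprint


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto coefficient: shared bits / bits set in either fingerprint."""
    union = len(a | b)
    if union == 0:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / union


def max_similarity(query: Fingerprint, reference: Iterable[Fingerprint]) -> float:
    """Best Tanimoto score between the query and any reference fingerprint."""
    return max((tanimoto(query, r) for r in reference), default=0.0)


def fraction_at_least(library: List[Fingerprint],
                      reference: List[Fingerprint],
                      threshold: float) -> float:
    """Fraction of library members whose nearest reference scores >= threshold."""
    if not library:
        return 0.0
    hits = sum(1 for q in library if max_similarity(q, reference) >= threshold)
    return hits / len(library)


# Hypothetical toy fingerprints standing in for a natural-product set and a library.
natural_products = [frozenset({1, 2, 3, 4}), frozenset({5, 6, 7, 8})]
library = [
    frozenset({1, 2, 3, 4}),      # identical to the first reference
    frozenset({1, 2, 3, 9}),      # 3 shared bits of 5 -> 0.6
    frozenset({1, 2, 10, 11}),    # 2 shared bits of 6 -> about 0.33
    frozenset({12, 13, 14, 15}),  # nothing in common -> 0.0
]

for t in (0.2, 0.4, 0.8):
    print(f"Tanimoto >= {t:.1f}: {fraction_at_least(library, natural_products, t):.0%}")
```

With these toy inputs the loop prints 75%, 50% and 25% for the three thresholds. Comparing millions of compounds against 140,000 references would more likely use a cheminformatics toolkit's optimized bulk similarity routines than this pure-Python double loop.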

But the "purchasable GDB" holds up much better. 10% of that list has 100% Tanimoto similarity (that is, 10% of the purchasable compounds are natural products themselves). The authors also compare individual commercial screening collections. If you're interested, ChemBridge and Asinex are the least natural-product-rich (about 5% of their collections), whereas IBS and Otava are the most (about 10%).
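The "natural products themselves" figure is essentially a set intersection. This sketch simplifies it to exact matches on a canonical structure key (something like a canonical SMILES or InChIKey, assumed to be computed beforehand), which is slightly stricter than a Tanimoto score of 1.0, since distinct structures can occasionally share an identical fingerprint. The vendor names and keys are hypothetical placeholders, not the collections discussed above.

```python
from typing import Dict, Set


def natural_product_fraction(collection: Set[str], natural_products: Set[str]) -> float:
    """Share of a collection whose structure key also appears in the NP set."""
    if not collection:
        return 0.0
    return len(collection & natural_products) / len(collection)


# Hypothetical structure keys; real ones would come from canonicalized structures.
natural_products = {"np1", "np2", "np3", "np4"}
vendors: Dict[str, Set[str]] = {
    "vendor_A": {"np1"} | {f"c{i}" for i in range(1, 10)},       # 1 NP out of 10
    "vendor_B": {"np2", "np3"} | {f"c{i}" for i in range(10, 18)},  # 2 NPs out of 10
}

# Rank collections from least to most natural-product-rich.
ranking = sorted(vendors, key=lambda v: natural_product_fraction(vendors[v], natural_products))
for name in ranking:
    print(f"{name}: {natural_product_fraction(vendors[name], natural_products):.0%}")
```

This prints vendor_A at 10% and vendor_B at 20%.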

So one answer to "why does HTS ever work for anything" is that compound collections seem to be biased toward natural-product type structures, which we can reasonably assume have generally evolved to have some sort of biological activity. It would be most interesting to see the results of such an analysis run from inside several drug companies against their own compound collections. My guess is that the natural product similarities would be even higher than the "purchasable GDB" set's, because drug company collections have been deliberately stocked with structural series that have shown activity in one project or another.

That's certainly looking at things from a different perspective, because you can also hear a lot of talk about how our compound files are too ugly - too flat, too hydrophobic, not natural-product-like enough. These viewpoints aren't contradictory, though - if Shoichet is right, then improving those similarities would indeed lead to higher hit rates. Compared to everything else, we're already at the top of the similarity list, but in absolute terms there's still a lot of room for improvement.

So how would one go about changing this, assuming one buys into this line of reasoning? The authors searched the various databases for ring structures, taking those as a good proxy for structural scaffolds. As it turns out, 83% of the ring scaffolds among the natural products are unrepresented among the commercially available molecules, a result that I assume Asinex, ChemBridge, Life Chemicals, Otava, Bionet and their ilk are noting with great interest. In fact, the authors go even further in pointing out opportunities, with a table of rings from this group that closely resemble known drug-like ring systems.
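The 83% figure is, at heart, a set difference over distinct ring systems. This sketch assumes each molecule has already been reduced to a canonical ring-system key upstream; the post doesn't describe how the paper defined or canonicalized its rings, and doing that for real needs a cheminformatics toolkit. The keys here are hypothetical placeholders. Deduplicating first matters, because many molecules share the same ring system.

```python
from typing import Iterable, List, Tuple


def scaffold_gap(np_scaffolds: Iterable[str],
                 commercial_scaffolds: Iterable[str]) -> Tuple[float, List[str]]:
    """Return the share of distinct NP ring keys absent from the commercial set,
    plus the sorted list of those missing keys."""
    np_unique = set(np_scaffolds)          # collapse molecules sharing a ring system
    commercial_unique = set(commercial_scaffolds)
    if not np_unique:
        return 0.0, []
    missing = sorted(np_unique - commercial_unique)
    return len(missing) / len(np_unique), missing


# Hypothetical ring-system keys, one per molecule (duplicates are expected).
np_ring_keys = ["ring_A", "ring_A", "ring_B", "ring_C", "ring_D"]
commercial_ring_keys = ["ring_A", "ring_E", "ring_E"]

share, gaps = scaffold_gap(np_ring_keys, commercial_ring_keys)
print(f"{share:.0%} of NP ring systems unrepresented: {gaps}")
```

Here three of the four distinct natural-product ring keys are missing, so it prints 75% along with the missing keys. That sorted list is the raw material for a table of candidate rings, before any chemist's judgment about stability or reactivity is applied.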

But wait a minute. . .when you look at those scaffolds, a number of them turn out to be rather, well, homely. I'd be worried about elimination to form a Michael acceptor in compound 19, for example. I'm not crazy about the N,S-acetal in 21 or the overall stability of the acetals in 15, 17 and 31. The propiolactone in 23 is surely reactive, as is the quinone in 25, and I'd be very surprised if that's not what they owe their biological activities to. And so on.

All that said, there are still some structures in there that I'd be willing to check out, and there must be more of them in that 83%. No doubt a number of the rings that do sneak into the commercial list are not very well elaborated, either. I think that there is a real commercial opportunity here. A company could do quite well for itself by promoting its compound collection as more natural-product-like than the competition's, with tractable molecules, and a huge number of them unrepresented in any other catalog.

Now all you'd have to do is make these things. . .which would require hiring synthetic organic chemists, and plenty of them. These things aren't easy to make, or to work with. And as it so happens, there are quite a few good ones available these days. Anyone want to take this business model to heart?

Category: Drug Assays | Drug Industry History | In Silico


1. molecular architect on July 15, 2009 9:44 AM writes...

"which we can reasonably assume have generally evolved to have some sort of biological activity"

The real value of natural products as biologically active leads is due to a more fundamental property. There are a limited number of protein structural motifs (secondary structures). Natural products have evolved to BIND to these motifs, either in the proteins involved in their biosynthesis or in their biological targets. A NP which binds to one of these motifs represents a logical starting point for another protein which shares the motif, even if not part of the same enzyme class. For an excellent analysis of this property of natural products, see the recent series of papers about "Biology Oriented Synthesis" by Herbert Waldmann, doi 10.1007/s00018-007-7492-1 and references therein.

Based on your comments, Shoichet's analysis looks very interesting. Will have to set time aside to read it in detail this afternoon.


2. Retread on July 15, 2009 10:22 AM writes...

#1 "There are a limited number of protein structural motifs (secondary structures). Natural products have evolved to BIND to these motifs, either in the proteins involved in their biosynthesis or in their biological targets." True enough as far as it goes, but this is pretty protein-centric. Consider thiamine, B12, etc. Either they've evolved to bind to RNA (which they do in bacterial riboswitches) or RNA has evolved to bind them -- more likely, since both are enzyme cofactors. It is possible that some natural products have evolved to bind RNA (of all sorts, not just mRNAs), DNA or even the glycoproteins and mucopolysaccharides of the extracellular matrix.

P.S. I hope to have my own blog, probably called Chemiotics II, up soon.


3. EngelGW on July 15, 2009 11:44 AM writes...

Well... As you mention, "These things aren't easy to make, or to work with..." That's already two major drawbacks for a medicinal chemist in the pharmaceutical industry. If, in addition, the IP isn't owned by the pharmaceutical company that employs him, it will definitely be difficult to find customers for such a business model.


4. Rubiscoman on July 15, 2009 11:44 AM writes...

I like the double meaning:

"Now all you'd have to do is make these things. . .which would require hiring synthetic organic chemists, and plenty of them. These things aren't easy to make, or to work with. And as it so happens, there are quite a few good ones available these days."

Are you saying synthetic organic chemists are hard to work with, and that quite a few good synthetic chemists are available these days? ;-)


5. Sili on July 15, 2009 2:12 PM writes...

I don't know that these 'issues' would have presented themselves to me so readily when I was fresh out of organics class, but now I even have to think about what you mean by your evaluations. Disturbing how much can seep out of a brain in a few years.


6. molecular architect on July 15, 2009 2:15 PM writes...

#2 Point taken. My comment is protein-centric, but then all (to the best of my knowledge) NPs are the product of protein-catalyzed biosynthesis and thus are designed to bind the biosynthetic enzymes. They then are predisposed to bind other enzymes composed of similar 3D motifs. Likewise, enzymes bind to other macromolecules (DNA, RNA, polysaccharides, etc.). Thus, you could predict that NPs will likely bind to complementary motifs in these macromolecules too.

While the ability of modern medicinal and synthetic chemists to design and make molecules is impressive, Mother Nature is still an outstanding, if not the best, source of inspiration.


7. NP_chemist on July 15, 2009 5:59 PM writes...

If, in addition to Waldmann's BIOS analyses, one also looks at Quinn's biosynthetic schemas, which effectively state that the mirror image of the last biosynthetic enzyme in the cascade is a proxy for the binding domain of the biosynthesis product (read: its target enzyme), then the circle is closed for NP structures, and that suggests one potential reason for their activities.


8. bootsy on July 15, 2009 7:33 PM writes...

"Natural product likeness" is something that seems to keep popping up as a major topic every few years. I admit, I tend to get a bit turned off when someone says that a molecule is "natural product like" just because it has lots of sp3 carbons, stereodefined alcohols, and complicated rings. Natural products, being products of incomprehensibly long periods of optimization, are much more refined than that. A few minor changes and all of the special properties that let them be large and still bioavailable are gone. Move a single methyl group on cyclosporin and now it's NIM811 and doesn't act on the immune system anymore. Change one more methyl group and it's PSC833 and it doesn't touch anything but PGP pumps.

Also, it seems like a bad idea to make such specific molecules and ask them to be hits in an HTS. As this paper shows, any screening deck is a paltry amount of diversity, however you measure it. If you want hits, you need some molecules that are more general inhibitors but make good starting points for building in potency and specificity. In this regard, the rising tide of fragment-based screening seems a much better way to hedge one's bets.

Finally, it also seems that when a protein encounters a molecule, the core atom connectivity is not something that matters much. Rather, it is the shape of the surface and the relative distribution of charges and such that the protein (or RNA, or DNA) sees. I'm not sure how much the line drawings we use to represent a molecule matter. That's why scaffold hopping can work at all.

Still, I like reading papers like this for the thoughts and discussions they bring out. The overall idea sounds a bit like Infinity Redux though.


9. Morten G on July 16, 2009 3:03 AM writes...

There's a company that does natural product chemistry, but I don't think they employ that many synthetic chemists. I guess they use chemists for product purification and QC.
The idea is that they mix up gene cassettes from various organisms that produce something along the lines of what they want and put them in yeast. Then they select for the yeast that produce the products that inhibit their target best.
I think it's a pretty small company but at least they are hiring.

Whether natural-like or non-natural-like compounds are best... Well, intuitively I'd say that the natural-like ones are more likely to bind proteins, but on the other hand I've never seen any data to support that hypothesis.


10. retread on July 16, 2009 7:19 AM writes...

#1 && #6 -- Interesting way to look at natural products and what their analogues might bind to. One example of this sort of mechanism would be anti-idiotypic antibodies (if you regard proteins as natural products).

Forcing the idea to where it probably doesn't belong -- one might expect lectins (proteins which bind the sugar components of glycoproteins) to resemble the various glycosyl transferases, sulfotransferases etc. etc. which build and modify the sugar chains -- I don't know if they do.

Similarly, do the active sites of enzymes making the huge variety of neurotoxins which bind to the transmembrane segments of ion channels resemble these segments -- particularly if the enzymes are cytosolic? Again, I don't know but I doubt that they do.

Nonetheless, an interesting idea, and like all such, it makes you think and try to come up with ways to test it.


11. Curious Wavefunction on July 16, 2009 10:10 AM writes...

I blogged about this paper a couple of weeks ago here


12. kerri on July 19, 2009 11:40 AM writes...

I just wanted to thank you for this post! I am a 4th-year grad student, and literally the day you posted this, my PI asked if I would take a compound our group works with, use it as a template for screening in silico, and then take the results (well, as many of them as we can get our hands on) and do a cell study. I had nowhere to start learning how to do such a task... and you just gave me the best starting point ever!



13. Jane Yao on July 18, 2012 1:37 PM writes...

Screen our Newly Isolated compound library to generate new drug leads.

Please take a look at our unique sample library containing low hanging fruits, and consider screening it in your next drug lead discovery.

We provide over 12,000 non-commercially available compounds and fractions obtained by column separation of worldwide chemically untapped natural products.


Health Resource Pharmaceuticals LLC


