About this Author

Derek Lowe, an Arkansan by birth, got his BA from Hendrix College and his PhD in organic chemistry from Duke before spending time in Germany on a Humboldt Fellowship on his post-doc. He's worked for several major pharmaceutical companies since 1989 on drug discovery projects against schizophrenia, Alzheimer's, diabetes, osteoporosis and other diseases. To contact Derek email him directly: Twitter: Dereklowe

In the Pipeline

January 21, 2009

The Hideous Numbers of Compounds

Posted by Derek

I was blithely throwing around the term “chemical space” in yesterday’s post. So, what am I talking about, and how much room is in there, anyway?

Let's narrow it down to organic compounds, to start with, or at least compounds that are mostly organic. A working definition, as far as people interested in biology and medicine go, might then be “the domain of chemical compounds compatible with living systems”. That excludes the red-hot reactive stuff and the unstable exploders, but leaves most everything else. Let’s also ignore macromolecules of various kinds and cut back to “drug-like” sizes – say, molecular weight 500 or less. That way we don’t have infinite numbers of polymers going off in all directions; that should help. And that leaves us with. . .?

A ridiculously large set of compounds, still. You can see how things get out of control pretty quickly if you just consider a building-block problem. Imagine breaking compounds down into simple units - an aryl ring, an ether, a tertiary amine, and so on. What sorts of numbers do you get when you start mixing and matching them? Well, there are an awful lot of possible building blocks. You could quickly fill out a hundred different examples of each of those three subunits, so there's one hundred to the third, or a million possible compounds without even exerting yourself very much.
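The building-block arithmetic is easy to check. The following is a minimal Python sketch, assuming three slots with 100 interchangeable options each. The fragment labels are hypothetical placeholders rather than real structures, and nothing checks valence, stability, or whether the pieces can actually be joined.

```python
from itertools import product

# Hypothetical placeholder fragments: 100 made-up labels per slot.
aryl_rings = [f"aryl_{i}" for i in range(100)]
ethers = [f"ether_{i}" for i in range(100)]
amines = [f"amine_{i}" for i in range(100)]

def count_combinations(*slots):
    """Count every way to pick one building block from each slot."""
    total = 1
    for slot in slots:
        total *= len(slot)
    return total

# Closed-form count: 100 * 100 * 100
n_products = count_combinations(aryl_rings, ethers, amines)
print(n_products)  # 1000000

# Brute-force check by actually enumerating the combinations
assert sum(1 for _ in product(aryl_rings, ethers, amines)) == n_products
```

Each additional slot of 100 options multiplies the total by another factor of 100, so a fourth slot alone takes the count to 10^8.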

This sort of thought experiment has been done several times. One estimate, built up from fragments in this way and counting only stable structures, came in at between 10^20 and 10^24 compounds that could potentially be prepared using known synthetic methods. (See here for another "how many compounds are possible?" paper, from a different angle; the group that did that work has followed it up recently, which will be the subject of another post sometime.) Needless to say, that is considerably larger than the total number of organic compounds ever described in reality. There's not enough carbon, oxygen, and nitrogen on earth to prepare a vial of each of these, and where would you put the vials? The terrifying thing is that this is actually one of the lower estimates, and thus perhaps a reasonable and conservative one. You can find estimates of 10^60 out there, a figure that cannot be dealt with by human efforts.
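To put the vial remark in rough numbers, here is a back-of-the-envelope Python sketch. It assumes, purely for illustration, 1 mg of material per compound, and it only totals the mass; it makes no claim about how much carbon, oxygen, or nitrogen is actually available.

```python
# Assumption (hypothetical): 1 mg of sample per compound.
MG_PER_COMPOUND = 1.0
MG_PER_KG = 1e6

def total_mass_kg(n_compounds, mg_each=MG_PER_COMPOUND):
    """Total mass in kilograms for one sample of each compound."""
    return n_compounds * mg_each / MG_PER_KG

# The low estimate, the high estimate, and the 10^60 figure
for exponent in (20, 24, 60):
    n = 10 ** exponent
    print(f"10^{exponent} compounds -> {total_mass_kg(n):.1e} kg")
```

Under that assumption the output is 1.0e+14 kg for 10^20 compounds, 1.0e+18 kg for 10^24, and 1.0e+54 kg for 10^60; the last is many orders of magnitude more than the mass of the Earth itself (roughly 6 × 10^24 kg).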

These sorts of numbers are why some people doubt the utility of just cranking out neat structures. But looked at from the other direction, the number of compounds we have available isn't nearly so impressive, so making new ones, especially long lists of new ones, makes a difference in what we actually have in hand. But is it a difference akin to buying a thousand lottery tickets rather than buying one?
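The lottery question can be framed with the usual formula for independent trials, P(at least one hit) = 1 - (1 - p)^n. This Python sketch assumes every compound has the same, independent, hypothetical hit probability p (real libraries certainly do not behave this way), and it reuses the million-compound and 10^20 figures from above only as illustrative inputs.

```python
import math

def p_at_least_one_hit(p, n):
    """Probability that at least one of n independent tries succeeds,
    each with probability p. log1p/expm1 keep tiny p numerically stable."""
    return -math.expm1(n * math.log1p(-p))

def fraction_of_space(library_size, space_size):
    """Fraction of the enumerable space that a library actually covers."""
    return library_size / space_size

# Hypothetical per-compound hit probability, chosen only for illustration
p = 1e-7
one_ticket = p_at_least_one_hit(p, 1)
thousand_tickets = p_at_least_one_hit(p, 1000)
print(thousand_tickets / one_ticket)  # just under 1000 while n * p << 1

# A million-compound library against a 10^20-compound space
print(fraction_of_space(10**6, 10**20))  # 1e-14
```

As long as n times p stays small, a thousand tickets buy almost exactly a thousand times the odds of one, so whether that is worth much depends entirely on how small p really is.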

Comments (13) + TrackBacks (0) | Category: Drug Assays


1. Rich Apodaca on January 21, 2009 10:13 AM writes...

Just trying to enumerate all compounds of a given molecular formula leads to large numbers very quickly:

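For a concrete case of that growth, the constitutional isomers of acyclic alkanes, CnH2n+2, can be counted without drawing any structures. This Python sketch uses one classical approach (not necessarily what the commenter had in mind): count alkyl groups as rooted carbon trees in which each carbon carries at most three branches, then count each alkane exactly once by rooting it at its centroid. Stereoisomers are ignored.

```python
from math import comb

def count_multisets(counts, k, total, max_size):
    """Ways to choose an unordered collection of exactly k items with sizes
    summing to `total`, where counts[s] distinct item types have size s and
    only sizes 0..max_size are allowed. Repeats of a type are allowed."""
    # dp[j][w]: ways to pick j items of combined size w using sizes seen so far
    dp = [[0] * (total + 1) for _ in range(k + 1)]
    dp[0][0] = 1
    for s in range(max_size + 1):
        new = [[0] * (total + 1) for _ in range(k + 1)]
        for j in range(k + 1):
            for w in range(total + 1):
                if dp[j][w] == 0:
                    continue
                # take m copies (with repetition) from the counts[s] types of size s
                for m in range(k - j + 1):
                    w2 = w + m * s
                    if w2 > total:
                        break
                    new[j + m][w2] += dp[j][w] * comb(counts[s] + m - 1, m)
        dp = new
    return dp[k][total]

def alkyl_counts(n_max):
    """a[n] = number of alkyl groups (rooted carbon trees) with n carbons.
    a[0] = 1 stands for a bare hydrogen. Each carbon bonds to its parent
    plus up to three child branches."""
    a = [1]
    for n in range(1, n_max + 1):
        # the root carbon uses 1 atom; its three branch slots share n - 1
        a.append(count_multisets(a, 3, n - 1, n - 1))
    return a

def alkane_isomers(n):
    """Constitutional isomers of CnH2n+2 (n >= 1), counted by centroid."""
    a = alkyl_counts(n)
    # Unique centroid: a carbon with 4 branches, each smaller than n/2 carbons
    total = count_multisets(a, 4, n - 1, (n - 1) // 2)
    if n % 2 == 0:
        # Two centroids: a central C-C bond joining two n/2-carbon halves
        half = a[n // 2]
        total += comb(half + 1, 2)
    return total

for n in (4, 5, 6, 10, 20):
    print(n, alkane_isomers(n))
```

The counts run 2, 3, 5 and 75 for butane, pentane, hexane and decane, and reach 366,319 at C20, all for a single molecular formula with no rings, heteroatoms, or stereochemistry.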

2. Thomas E. McEntee on January 21, 2009 10:35 AM writes...

Last time I looked--10 years ago?--CAS registry numbers had been assigned to 25 million (2.5E7) substances, including polymers, mixtures, and anything biological that CAS will index. That's a long way from the range of 1E20 to 1E24...

3. Tot. Syn. on January 21, 2009 10:41 AM writes...

One of the things to consider is similarity of structures, which is the inverse of the diversity. And then you need to consider the parameters used to define the diversity, which more-or-less defines the axes of the chemical space. This is a tough task, as you need to specify what makes molecule A different to molecule B. And I'd guess that this needs to be done whilst considering the three-dimensional energy-minimised form of the structures, rather than 2D sketches. So does that boil down to a description of a 3D shape, with certain variables mapped onto it, like polarity? Surely we can make any shape we like, given enough steps and material? No, because other parameters like flexibility are important... This is why I find DOS so hard to get a handle on.

I guess these concepts are particularly important when considering the compound collections owned by big pharma, as they have to represent a 'therapeutically useful' array of chemical space. So how do you make sure that your library is as diverse as it needs to be, and keep it that diverse over time (as samples are exhausted)? Do you remake exhausted samples, or fill it with new ones?

The other term that is hard to get a handle on is 'drug-space'. Sure, you can use Lipinski to define some parameters, but that's going to leave a lot of current drugs outside your fence, like Taxotere.

Argh... thinking in more than three dimensions turns my brain inside-out...
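One widely used way to put a number on how different molecule A is from molecule B is the Tanimoto (Jaccard) coefficient between structural fingerprints: the features two molecules share, divided by the features present in either, with diversity taken as 1 minus similarity. This Python sketch assumes each molecule has already been reduced to a hypothetical set of integer feature IDs; generating real fingerprints, let alone the 3D shape and flexibility descriptors the comment raises, would need a cheminformatics toolkit and is not shown.

```python
def tanimoto(a: set, b: set) -> float:
    """Tanimoto (Jaccard) similarity of two feature sets: |A & B| / |A | B|."""
    union = a | b
    if not union:
        return 1.0  # convention: two empty fingerprints count as identical
    return len(a & b) / len(union)

def mean_pairwise_diversity(fingerprints):
    """Average (1 - similarity) over all unordered pairs in a library."""
    n = len(fingerprints)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    if not pairs:
        return 0.0
    return sum(1 - tanimoto(fingerprints[i], fingerprints[j]) for i, j in pairs) / len(pairs)

# Hypothetical feature IDs standing in for real fingerprint bits
mol_a = {1, 4, 7, 9}
mol_b = {1, 4, 8}
mol_c = {2, 3, 5}
print(tanimoto(mol_a, mol_b))  # 2 shared / 5 total = 0.4
print(mean_pairwise_diversity([mol_a, mol_b, mol_c]))
```

The choice of features decides what "different" means, which is exactly the axis-definition problem the comment describes: two molecules can look dissimilar on one fingerprint and close on another.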

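The Lipinski parameters the comment mentions are usually stated as molecular weight of 500 or less, calculated logP of 5 or less, no more than 5 hydrogen-bond donors, and no more than 10 hydrogen-bond acceptors, with poor oral absorption considered more likely when more than one of these is broken. This Python sketch assumes the descriptors were computed elsewhere; the two example compounds are hypothetical, not data for any real drug.

```python
from dataclasses import dataclass

@dataclass
class Descriptors:
    mol_weight: float  # daltons
    clogp: float       # calculated octanol/water logP
    h_donors: int
    h_acceptors: int

def lipinski_violations(d: Descriptors) -> int:
    """Count how many of the four rule-of-five limits are exceeded."""
    return sum([
        d.mol_weight > 500,
        d.clogp > 5,
        d.h_donors > 5,
        d.h_acceptors > 10,
    ])

def passes_rule_of_five(d: Descriptors, max_violations: int = 1) -> bool:
    """Lipinski's original wording tolerates a single violation."""
    return lipinski_violations(d) <= max_violations

# Hypothetical descriptor values, for illustration only
compact_lead = Descriptors(mol_weight=310.0, clogp=3.2, h_donors=1, h_acceptors=4)
large_lead = Descriptors(mol_weight=820.0, clogp=3.0, h_donors=6, h_acceptors=14)
print(passes_rule_of_five(compact_lead))  # True
print(passes_rule_of_five(large_lead))    # False (3 violations)
```

A filter like this describes where orally absorbed drugs tend to cluster rather than a hard boundary, which is why, as the comment notes, some marketed drugs fall outside it.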

4. TW Andrews on January 21, 2009 11:46 AM writes...

Having spent a few years looking at HTS data from a variety of screening labs, put me into the camp that's pretty skeptical of the crank-stuff-out and test it approach.

Chemical space is huge, and the biologically interesting molecules are likely to be only a very small fraction of it. I don't recall the combi-chem wave resulting in a significant uptick in the number of viable leads (though having worked far upstream from the med-chem end of things, this is more an impression than a careful observation).

That said, I'm not sure what the best way to go about finding novel, biologically interesting compounds is. I've always been a fan of natural products, since we at least know that they're somehow relevant to biological systems--but wow they're a pain.

5. anon on January 21, 2009 12:17 PM writes...

My impression of combi chem is that it speeds up a project by maybe 10%, about the same as outsourcing. From a cost/benefit POV, better than zero, but not impressive.

6. milkshake on January 21, 2009 3:57 PM writes...

Why would you want to exclude primary explosives? Nitroglycerin is a life-saving drug.

7. nitric oxide 99 on January 21, 2009 5:26 PM writes...

Hey Derek - enjoy the blog daily. A question that came up in a group meeting recently: are there any known (or believed) uses for organosilicon compounds in medicinal chemistry? I was hard-pressed to think of any, and a quick look doesn't turn up very much. Would appreciate your thoughts.

9. PhMe2SiLi on January 21, 2009 6:34 PM writes...

Nitric oxide - I know of one company that specialised in organosilicon compounds as med. chem. targets: Amedis Pharmaceuticals (see below), based in Cambridge, UK. They were taken over by Argenta in 2003. I guess the idea of replacing a carbon or two with a silicon could in theory bust a patent?

"About Amedis Pharmaceuticals Ltd

Amedis Pharmaceuticals is a chemistry driven drug discovery company developing innovative pharmaceutical technologies and products. The company has two broad technology platforms:

Exploitation of silicon based medicinal chemistry to develop both improved versions of existing drugs (Silicon Switches) and novel drug candidates (silicon based NCEs)

Artificial Intelligence technology applied to drug discovery to predict the properties of molecules before they are tested and to identify Therapeutic Switch drug candidates (new uses for existing drugs)"

10. Hap on January 21, 2009 6:39 PM writes...

Scott Sieburth (and one other professor) have written papers on silanediols as transition state analogs for proteases. They look like a pain to make, though.

11. Shane on January 21, 2009 9:53 PM writes...

One vial of each compound is an awful lot. There are roughly 6 × 10^23 molecules in a mole. How many actual molecules does a bacterium need to make to influence its survival? How many bacteria in an ocean? And how many generations in a few million years?

It seems to me the main problem isn't the scale of chemical diversity, but how crude and clumsy our screening methods are, and how impatient we are to "cure" things like "obesity", "ageing" and "depression" with billion-dollar drugs.

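Shane's point about scale can be checked against Avogadro's constant (6.02214076 × 10^23 per mole). This Python sketch assumes a hypothetical 1 mg sample of a compound with a molecular weight of 300 g/mol.

```python
AVOGADRO = 6.02214076e23  # molecules per mole (exact SI value)

def molecules_in_sample(mass_g, mol_weight_g_per_mol):
    """Number of molecules in a sample: moles times Avogadro's constant."""
    return mass_g / mol_weight_g_per_mol * AVOGADRO

# Hypothetical sample: 1 mg of a compound with MW 300 g/mol
print(f"{molecules_in_sample(1e-3, 300):.2e}")  # 2.01e+18
```

So even a milligram vial holds on the order of 10^18 copies of a single structure.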

12. sjb on January 22, 2009 6:22 AM writes...

re PhMe2SiLi (#9)

I don't think Amedis were taken over by Argenta, rather by Paradigm and what is now Takeda.


13. nitric oxide 99 on January 22, 2009 10:30 AM writes...

Thanks for the feedback guys/gals!
