Corante

About this Author
Derek Lowe
Derek Lowe, an Arkansan by birth, got his BA from Hendrix College and his PhD in organic chemistry from Duke before spending time in Germany on a Humboldt Fellowship for his post-doc. He's worked for several major pharmaceutical companies since 1989 on drug discovery projects against schizophrenia, Alzheimer's, diabetes, osteoporosis and other diseases. To contact Derek, email him directly: derekb.lowe@gmail.com. Twitter: Dereklowe


In the Pipeline


January 21, 2009

The Hideous Numbers of Compounds


Posted by Derek

I was blithely throwing around the term “chemical space” in yesterday’s post. So, what am I talking about, and how much room is in there, anyway?

Let's narrow it down to organic compounds, to start with, or at least compounds that are mostly organic. A working definition, as far as people interested in biology and medicine go, might then be “the domain of chemical compounds compatible with living systems”. That excludes the red-hot reactive stuff and the unstable exploders, but leaves most everything else. Let’s also ignore macromolecules of various kinds and cut back to “drug-like” sizes – say, molecular weight 500 or less. That way we don’t have infinite numbers of polymers going off in all directions; that should help. And that leaves us with. . .?

A ridiculously large set of compounds, still. You can see how things get out of control pretty quickly if you just consider a building-block problem. Imagine breaking compounds down into simple units - an aryl ring, an ether, a tertiary amine, and so on. What sorts of numbers do you get when you start mixing and matching them? Well, there are an awful lot of possible building blocks. You could quickly fill out a hundred different examples of each of those three subunits, so there's one hundred to the third, or a million possible compounds without even exerting yourself very much.
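The building-block arithmetic above can be sketched in a few lines. This is only a rough illustration, assuming one block is picked independently for each position and ignoring connectivity, ordering, and whether the product could actually be made; the function name `library_size` and the "how many positions until we reach ten to the twentieth" extension are hypothetical additions, not something from the post.

```python
from math import prod


def library_size(options_per_position):
    """Count products when one building block is chosen per position.

    Assumes the choices are independent, so the total is a simple product.
    """
    return prod(options_per_position)


# Three positions (aryl ring, ether, tertiary amine), 100 choices each,
# as in the thought experiment above.
three_blocks = library_size([100, 100, 100])
print(f"{three_blocks:,}")  # 1,000,000

# Hypothetical extension: with 100 choices per position, how many
# positions does it take for the count to reach 10**20?
positions = 1
while library_size([100] * positions) < 10**20:
    positions += 1
print(positions)  # 10
```

The point of the second loop is only that the count grows by a factor of one hundred with every extra position, so a handful of positions is enough to get into the ranges quoted for chemical space.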

This sort of thought experiment has been done several times. One estimate done by this fragment approach and considering only stable structures came in between ten to the twentieth and ten to the twenty-fourth compounds that could potentially be prepared using known synthetic methods. (See here for another "how many compounds are possible?" paper, from a different angle - the group that did that work has followed it up recently, which will be the subject of another post sometime). Needless to say, that is considerably larger than the total number of organic compounds ever described in reality. There's not enough carbon, oxygen, and nitrogen on earth to prepare a vial of each of these, and where would you put the vials? The terrifying thing is that this is actually one of the lower estimates, and thus perhaps a very reasonable and conservative one. You can find ten-to-the-sixtieth estimates out there, which is a figure that cannot be dealt with by human efforts.

These sorts of numbers are why some people doubt the utility of just cranking out neat structures. But looked at from the other direction, the number of compounds we have available isn't nearly so impressive, so making new ones, especially long lists of new ones, makes a difference in what we actually have in hand. But is it a difference akin to buying a thousand lottery tickets rather than buying one?

Comments (13) + TrackBacks (0) | Category: Drug Assays


COMMENTS

1. Rich Apodaca on January 21, 2009 10:13 AM writes...

Just trying to enumerate all compounds of a given molecular formula leads to large numbers very quickly:

http://depth-first.com/articles/2006/11/15/diversity-oriented-chemical-informatics


2. Thomas E. McEntee on January 21, 2009 10:35 AM writes...

Last time I looked--10 years ago?--CAS registry numbers had been assigned to 25 million (2.5E7) substances, including polymers, mixtures, and anything biological that CAS will index. That's a long way from the range of 1E20 to 1E24...


3. Tot. Syn. on January 21, 2009 10:41 AM writes...

One of the things to consider is similarity of structures, which is the inverse of the diversity. And then you need to consider the parameters used to define the diversity, which more or less define the axes of the chemical space. This is a tough task, as you need to specify what makes molecule A different to molecule B. And I'd guess that this needs to be done whilst considering the three-dimensional energy-minimised form of the structures, rather than 2D sketches. So does that boil down to a description of a 3D shape, with certain variables mapped onto it, like polarity? Surely we can make any shape we like, given enough steps and material? No, because other parameters like flexibility are important... This is why I find DOS so hard to get a handle on.

I guess these concepts are particularly important when considering the compound collections owned by big-pharma, as they have to represent a 'therapeutically useful' array of chemical space. So how do you make sure that your library is as diverse as it needs to be, and keep it that diverse over time (as samples are exhausted)? Do you remake exhausted samples, or fill it with new ones?

The other term that is hard to get a handle on is 'drug-space'. Sure, you can use Lipinski to define some parameters, but that's going to leave a lot of current drugs outside your fence, like taxotere.

Argh... thinking in more than three dimensions turns my brain inside-out...


4. TW Andrews on January 21, 2009 11:46 AM writes...

Having spent a few years looking at HTS data from a variety of screening labs, I'm in the camp that's pretty skeptical of the crank-stuff-out-and-test-it approach.

Chemical space is huge, and the biologically interesting molecules are likely to only be a very small fraction of it. I don't recall the Combi-chem wave resulting in a significant uptick in the number of viable leads (though having worked far upstream from the med-chem end of things, this is more an impression than a careful observation).

That said, I'm not sure what the best way to go about finding novel, biologically interesting compounds is. I've always been a fan of natural products, since we at least know that they're somehow relevant to biological systems--but wow they're a pain.


5. anon on January 21, 2009 12:17 PM writes...

My impression of combi chem is that it speeds up a project maybe 10%; about the same as outsourcing. From a cost/benefit POV, better than zero, but not impressive.


6. milkshake on January 21, 2009 3:57 PM writes...

Why would you want to exclude primary explosives? Nitroglycerin is a life-saving drug.


7. nitric oxide 99 on January 21, 2009 5:26 PM writes...

Hey Derek - enjoy the blog daily. A question that came up in a group meeting recently - are there any known (or believed) uses for organosilicon compounds in medicinal chemistry? I was hard pressed to think of any and a quick look doesn't turn up very much. Would appreciate your thoughts.


9. PhMe2SiLi on January 21, 2009 6:34 PM writes...


Nitric oxide - I know of one company that specialised in organosilicon compounds as med. chem. targets: Amedis Pharmaceuticals (see below), based in Cambridge, UK. They were taken over by Argenta in 2003. I guess the idea of replacing a carbon or two with a silicon could in theory bust a patent?

"About Amedis Pharmaceuticals Ltd

Amedis Pharmaceuticals is a chemistry driven drug discovery company developing innovative pharmaceutical technologies and products. The company has two broad technology platforms:

Exploitation of silicon based medicinal chemistry to develop both improved versions of existing drugs (Silicon Switches) and novel drug candidates (silicon based NCEs)

Artificial Intelligence technology applied to drug discovery to predict the properties of molecules before they are tested and to identify Therapeutic Switch drug candidates (new uses for existing drugs)"


10. Hap on January 21, 2009 6:39 PM writes...

Scott Sieburth (and one other professor) have written papers on silanediols as transition state analogs for proteases. They look like a pain to make, though.


11. Shane on January 21, 2009 9:53 PM writes...

One vial of each compound is an awful lot. There are about 6 x 10^23 molecules in a mole. How many actual molecules does a bacterium need to make to influence its survival? How many bacteria in an ocean? And how many generations in a few million years?

It seems to me the main problem isn't the scale of chemical diversity, but how crude and clumsy our screening methods are, and how impatient we are to "cure" things like "obesity", "ageing" and "depression" with billion dollar making drugs.


12. sjb on January 22, 2009 6:22 AM writes...

re PhMe2SiLi (#9)

I don't think Amedis were taken over by Argenta, rather by Paradigm and what is now Takeda.

S


13. nitric oxide 99 on January 22, 2009 10:30 AM writes...

Thanks for the feedback guys/gals!
