Corante

About this Author
Derek Lowe
Derek Lowe, an Arkansan by birth, got his BA from Hendrix College and his PhD in organic chemistry from Duke before spending time in Germany on a Humboldt Fellowship on his post-doc. He's worked for several major pharmaceutical companies since 1989 on drug discovery projects against schizophrenia, Alzheimer's, diabetes, osteoporosis and other diseases. To contact Derek email him directly: derekb.lowe@gmail.com Twitter: Dereklowe

In the Pipeline

March 6, 2005

Just How Many Compounds Are We Talking About?

Posted by Derek

Just how many chemicals are there? Asked that broadly, you can find estimates of anywhere from 10 to the eighteenth (pretty big, all right) all the way up to the gibbering, flee-in-terror order of ten to the two hundredth. A range like that makes it clear that no one knows what they're talking about, so the question needs to be cut down to size. "How many chemicals are there below a certain molecular weight?" is a good start, and once you set that, you might want to stipulate the list of elements you'll include and whether or not the compounds are stable enough to be isolated.

A group from the University of Berne has just published a paper in Angewandte Chemie (44, 1504 in the English edition) which claims to answer just such a question, namely: "How many reasonably stable compounds are there with up to eleven atoms of either carbon, nitrogen, oxygen, or fluorine?" Should this one come up during your next poker game, you can now answer, in your best Mr. Spock voice, "Approximately 13,892,436." But hold on. Does that number sound low to you? If not, it should - read on.

The Berne group came up with their estimate by computationally assembling graphs which corresponded to all the saturated hydrocarbon backbones up to eleven carbons. Then they systematically replaced all possible carbons with N or O, allowed for double and triple bonds, and substituted all carbons with H or F. So far, so good. These variations generated a low of 4 and a high of 79,236 compounds per carbon skeleton.
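To get a feel for how quickly the substitution step multiplies a single skeleton, here is a toy sketch in Python. It is not the Berne group's program: the valence table, the function names, and the two example skeletons are assumptions made for illustration, it handles single bonds only, and it makes no attempt to merge labelings that are the same molecule by symmetry.

```python
from itertools import product

# Assumed maximum number of skeleton bonds per element (single bonds only).
MAX_BONDS = {"C": 4, "N": 3, "O": 2}


def degrees(n_atoms, bonds):
    """Count how many skeleton bonds each atom position takes part in."""
    deg = [0] * n_atoms
    for a, b in bonds:
        deg[a] += 1
        deg[b] += 1
    return deg


def heteroatom_variants(n_atoms, bonds):
    """Yield every C/N/O labeling of the skeleton that respects valence.

    Symmetry-equivalent labelings are NOT merged, so this overcounts
    distinct molecules; it only illustrates the combinatorial growth.
    """
    deg = degrees(n_atoms, bonds)
    for labels in product("CNO", repeat=n_atoms):
        if all(deg[i] <= MAX_BONDS[el] for i, el in enumerate(labels)):
            yield labels


# Hypothetical example skeletons: (number of heavy atoms, list of bonds).
propane = (3, [(0, 1), (1, 2)])
isobutane = (4, [(0, 1), (0, 2), (0, 3)])

if __name__ == "__main__":
    for name, skeleton in [("propane", propane), ("isobutane", isobutane)]:
        count = sum(1 for _ in heteroatom_variants(*skeleton))
        print(name, count)
```

Even with three or four heavy atoms the raw labelings run into the dozens, before any double bonds or fluorines are considered, which makes the per-skeleton spread quoted above easier to believe.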

But they applied a set of mighty strict standards during these operations. Their algorithm rejected heteroatom-heteroatom bonds, except for those found in some aromatic heterocycles and in nitro groups, oximes and the like, so no peroxides (and no hydrazines, I suppose, although they're stable). They also rejected bridgehead double bonds and allenes, and (to my surprise) only allowed triple bonds for nitriles (so no acetylenes). They also rejected hydrolytically unstable groups - no enamines, no acyclic imines, no acyl halides, no enols and not even any orthoesters.
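A rule like the heteroatom-heteroatom one is easy to picture as a filter over a labeled graph. The following is a minimal sketch under my own assumptions, not the paper's code: molecules are a string of element symbols plus a list of bonded index pairs, only N and O count as heteroatoms, and none of the exceptions mentioned above are modeled.

```python
# Assumption: only N and O are treated as heteroatoms in this toy filter.
HETEROATOMS = {"N", "O"}


def has_heteroatom_bond(labels, bonds):
    """Return True if any bond joins two heteroatoms (e.g. O-O, N-N, N-O)."""
    return any(
        labels[a] in HETEROATOMS and labels[b] in HETEROATOMS
        for a, b in bonds
    )


def passes_filter(labels, bonds):
    """Keep a structure only if it has no heteroatom-heteroatom bond."""
    return not has_heteroatom_bond(labels, bonds)


if __name__ == "__main__":
    chain3 = [(0, 1), (1, 2)]
    print(passes_filter("CCO", chain3))    # ethanol-like skeleton: kept
    print(passes_filter("COC", chain3))    # ether-like skeleton: kept
    print(passes_filter("OO", [(0, 1)]))   # peroxide-like: rejected
    print(passes_filter("NN", [(0, 1)]))   # hydrazine-like: rejected
```

The hydrazine case shows why a blanket rule like this is strict: it throws out a perfectly isolable compound class along with the peroxides.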

What this means is that there are plenty of compounds you can order from a catalog that aren't even on the list. Heck, there are compounds that are shipped in tank cars that aren't on the list. Allowing some of these compound classes to gain a foothold would have swelled the ranks a great deal. Moving further past their criteria, you can imagine how out of control things would get if you started calculating in sulfur, phosphorus, and more than one type of halogen atom. I don't know if this team is contemplating that exercise or not; they'll probably have to wait for a fresh crop of grad students before they can even try.

But I've left out a key statistic of theirs, a startling one. Back at that first step, when they graphically assembled those carbon frameworks, it turned out that the huge majority, a full 99.8% of them, had three- and four-membered rings in them. In order not to have a list so skewed toward cyclopropanes and cyclobutanes, they threw all of these out at the very start, leaving them with 1,830 basic skeletons as opposed to 843,335 of them. Throwing out the likes of orthoesters and acetylenes, as it turns out, is nothing compared to the massive effect of shedding the small rings.
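The numbers quoted here hang together, as a quick back-of-the-envelope check in Python shows. The variable names are mine, and the per-skeleton average assumes that all 13,892,436 compounds in the final count descend from the retained skeletons, which the post implies but does not state outright.

```python
# Figures quoted in the post.
total_skeletons = 843_335     # all carbon frameworks generated
kept_skeletons = 1_830        # frameworks without 3- or 4-membered rings
final_compounds = 13_892_436  # compounds in the published count

# Fraction of frameworks discarded for containing small rings.
discarded_fraction = 1 - kept_skeletons / total_skeletons

# Assumed: every final compound comes from one of the kept skeletons.
mean_per_skeleton = final_compounds / kept_skeletons

if __name__ == "__main__":
    print(f"discarded: {discarded_fraction:.1%}")
    print(f"mean compounds per kept skeleton: {mean_per_skeleton:,.0f}")
```

The discarded fraction rounds to the 99.8% in the text, and the average of several thousand compounds per skeleton sits comfortably between the quoted extremes of 4 and 79,236.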

In this light, as the authors point out by an excellent astronomical analogy, their list of thirteen million stable compounds is actually surrounded and permeated by a huge unseen amount of "dark matter" - all those 3- and 4-membered rings. Many of them might be too strained to be stable, but many others would be fine. They just haven't been explored because they're too much of a pain to make. This, to me, was the single biggest surprise of the whole effort. I knew that there must be a lot of these compounds, but I never would have thought that their possible forms hugely outnumber all the other small molecules I've ever seen or thought of. What else don't we know?

COMMENTS

1. Daniel Newby on March 7, 2005 12:12 AM writes...

I once got to thinking about a slightly different question: in the vast sea of reasonably stable structures, what fraction are immune to conventional synthesis? I.e., stuff where the most fruitful synthesis would be to smash likely-looking precursors in a particle accelerator and sort the debris with single-molecule NMR. For more than a few dozen carbons, I bet the fraction is appallingly large.

Now that I think about it, carbon nanotubes and fullerenes fall squarely in the "impossible to synthesize" category. Trying to make them an atom at a time would be madness. It is sheer luck that they self-assemble so nicely.

2. David Govett on March 7, 2005 4:33 AM writes...

Will software ever be sophisticated enough and will hardware ever be fast and capacious enough to infer function from structure? If so, software would be able to model and characterize any number of compounds relatively quickly.

3. Derek Lowe on March 7, 2005 8:41 AM writes...

The authors did run some "virtual screening" software through their library, and broke it down into how many structures were potential receptor ligands and so on. But my trust in those methods just barely moves the meter off zero.

I think that de novo function-from-structure falls into the category of "probably not quite impossible." What, in other words, an engineer would call, with a pained expression, "nontrivial."

4. a Chemist on March 19, 2005 7:47 AM writes...

You might also want to have a look at Jonathan Goodman's article in the current Chem & Und (when it makes its way across the pond, or online), Issue 6, p 18 (or the cited article from it, J Goodman and K de Silva, J. Chem. Inf. and Modeling 2005, 1, 81)

Looks at this from a slightly different perspective
