Corante

About this Author

Derek Lowe, an Arkansan by birth, got his BA from Hendrix College and his PhD in organic chemistry from Duke before spending time in Germany on a Humboldt Fellowship for his post-doc. He's worked for several major pharmaceutical companies since 1989 on drug discovery projects against schizophrenia, Alzheimer's, diabetes, osteoporosis and other diseases. To contact Derek, email him directly: derekb.lowe@gmail.com. Twitter: Dereklowe

In the Pipeline

March 6, 2005

Just How Many Compounds Are We Talking About?

Posted by Derek

Just how many chemicals are there? Asked that broadly, the question gets you estimates of anywhere from ten to the eighteenth (pretty big, all right) all the way up to the gibbering, flee-in-terror order of ten to the two hundredth. A range like that makes it clear that no one knows what they're talking about, so the question needs to be cut down to size. "How many chemicals are there below a certain molecular weight?" is a good start, and once you set that, you might want to stipulate the list of elements you'll include and whether or not the compounds are stable enough to be isolated.

A group from the University of Berne has just published a paper in Angewandte Chemie (44, 1504 in the English edition) which claims to answer just such a question, namely: "How many reasonably stable compounds are there with up to eleven atoms of carbon, nitrogen, oxygen, or fluorine?" Should this one come up during your next poker game, you can now answer, in your best Mr. Spock voice, "Approximately 13,892,436." But hold on. Does that number sound low to you? If not, it should - read on.

The Berne group came up with their estimate by computationally assembling graphs which corresponded to all the saturated hydrocarbon backbones up to eleven carbons. Then they systematically replaced all possible carbons with N or O, allowed for double and triple bonds, and substituted all carbons with H or F. So far, so good. These variations generated a low of 4 and a high of 79,236 compounds per carbon skeleton.

But they applied a set of mighty strict standards during these operations. Their algorithm rejected heteroatom-heteroatom bonds, except for those found in some aromatic heterocycles and in nitro groups, oximes and the like, so no peroxides (and no hydrazines, I suppose, although they're stable). They also rejected bridgehead double bonds and allenes, and (to my surprise) only allowed triple bonds for nitriles (so no acetylenes). They also rejected hydrolytically unstable groups - no enamines, no acyclic imines, no acyl halides, no enols and not even any orthoesters.
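To make the substitute-then-filter idea concrete, here is a minimal Python sketch. It is not the Berne group's program: the function names, the valence table, and the brute-force symmetry check are all assumptions made for illustration. It covers only one slice of the procedure described above (swapping skeleton carbons for N or O and rejecting any heteroatom-heteroatom bond) and ignores multiple bonds, fluorine, and the aromatic, nitro and oxime exceptions.

```python
from itertools import permutations, product

# Assumed maximum number of skeleton bonds per element (single bonds only).
MAX_BONDS = {"C": 4, "N": 3, "O": 2}


def automorphisms(n, edges):
    """Brute-force every relabeling of the n atoms that maps the skeleton
    onto itself. Only practical for very small skeletons."""
    edge_set = {frozenset(e) for e in edges}
    return [
        p
        for p in permutations(range(n))
        if {frozenset((p[a], p[b])) for a, b in edges} == edge_set
    ]


def enumerate_substitutions(n, edges):
    """Return the distinct C/N/O labelings of a skeleton that respect the
    valence table and contain no heteroatom-heteroatom bond."""
    degree = [0] * n
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1

    autos = automorphisms(n, edges)
    seen = set()
    for labels in product("CNO", repeat=n):
        # Valence check: an atom cannot carry more skeleton bonds than allowed.
        if any(degree[i] > MAX_BONDS[labels[i]] for i in range(n)):
            continue
        # Illustrative version of the strict rule: no N-N, N-O or O-O bonds.
        if any(labels[a] != "C" and labels[b] != "C" for a, b in edges):
            continue
        # Collapse symmetry-equivalent labelings to one canonical tuple.
        canon = min(tuple(labels[p[i]] for i in range(n)) for p in autos)
        seen.add(canon)
    return sorted(seen)


# Two hypothetical toy skeletons: a three-atom chain and a four-atom star.
propane_like = enumerate_substitutions(3, [(0, 1), (1, 2)])
isobutane_like = enumerate_substitutions(4, [(0, 1), (0, 2), (0, 3)])
print(len(propane_like), len(isobutane_like))
```

Even on these toy skeletons, one rule cuts the raw combinations down sharply (27 labelings of the three-atom chain shrink to 8 distinct allowed ones), which gives a feel for how much each extra rejection rule trims the final count.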

What this means is that there are plenty of compounds you can order from a catalog that aren't even on the list. Heck, there are compounds that are shipped in tank cars that aren't on the list. Allowing some of these compound classes to gain a foothold would have swelled the ranks a great deal. Moving further past their criteria, you can imagine how out of control things would get if you started calculating in sulfur, phosphorus, and more than one type of halogen atom. I don't know if this team is contemplating that exercise or not; they'll probably have to wait for a fresh crop of grad students before they can even try.

But I've left out a key statistic of theirs, a startling one. Back at that first step, when they assembled those carbon frameworks as graphs, it turned out that the huge majority, a full 99.8% of them, had three- and four-membered rings in them. In order not to have a list so skewed toward cyclopropanes and cyclobutanes, they threw all of these out at the very start, leaving them with 1,830 basic skeletons as opposed to 843,335 of them. Throwing out the likes of orthoesters and acetylenes, as it turns out, is nothing compared to the massive effect of shedding the small rings.
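The 99.8% figure can be checked directly from the two skeleton counts quoted above. The short Python snippet below does that arithmetic, using only numbers given in this post; the variable names are illustrative, and the per-skeleton average is a derived value, not one reported by the authors.

```python
# Figures quoted in the post.
total_skeletons = 843_335      # all carbon frameworks generated
kept_skeletons = 1_830         # frameworks left after dropping 3- and 4-membered rings
listed_compounds = 13_892_436  # final compound count

# Share of skeletons discarded for containing small rings.
discarded_fraction = 1 - kept_skeletons / total_skeletons

# Derived average, for comparison with the quoted per-skeleton range of 4 to 79,236.
compounds_per_kept_skeleton = listed_compounds / kept_skeletons

print(f"discarded: {discarded_fraction:.1%}")
print(f"average compounds per kept skeleton: {compounds_per_kept_skeleton:.0f}")
```

The discarded share rounds to 99.8%, matching the figure in the text, and the derived average sits inside the quoted per-skeleton range.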

In this light, as the authors point out by an excellent astronomical analogy, their list of thirteen million stable compounds is actually surrounded and permeated by a huge unseen amount of "dark matter" - all those 3- and 4-membered rings. Many of them might be too strained to be stable, but many others would be fine. They just haven't been explored because they're too much of a pain to make. This, to me, was the single biggest surprise of the whole effort. I knew that there must be a lot of these compounds, but I never would have thought that their possible forms hugely outnumber all the other small molecules I've ever seen or thought of. What else don't we know?


COMMENTS

1. Daniel Newby on March 7, 2005 12:12 AM writes...

I once got to thinking about a slightly different question: in the vast sea of reasonably stable structures, what fraction are immune to conventional synthesis? I.e., stuff where the most fruitful synthesis would be to smash likely-looking precursors in a particle accelerator and sort the debris with single-molecule NMR. For more than a few dozen carbons, I bet the fraction is appallingly large.

Now that I think about it, carbon nanotubes and fullerenes fall squarely in the "impossible to synthesize" category. Trying to make them an atom at a time would be madness. It is sheer luck that they self-assemble so nicely.

2. David Govett on March 7, 2005 4:33 AM writes...

Will software ever be sophisticated enough and will hardware ever be fast and capacious enough to infer function from structure? If so, software would be able to model and characterize any number of compounds relatively quickly.

3. Derek Lowe on March 7, 2005 8:41 AM writes...

The authors did run some "virtual screening" software through their library, and broke it down into how many structures were potential receptor ligands and so on. But my trust in those methods just barely moves the meter off zero.

I think that de novo function-from-structure falls into the category of "probably not quite impossible." What, in other words, an engineer would call, with a pained expression, "nontrivial."

4. a Chemist on March 19, 2005 7:47 AM writes...

You might also want to have a look at Jonathan Goodman's article in the current Chem & Und (when it makes its way across the pond, or online), Issue 6, p 18 (or the cited article from it, J Goodman and K de Silva, J. Chem. Inf. and Modeling 2005, 1, 81).

It looks at this from a slightly different perspective.
