Corante

About this Author
Derek Lowe
Derek Lowe, an Arkansan by birth, got his BA from Hendrix College and his PhD in organic chemistry from Duke before spending time in Germany on a Humboldt Fellowship for his post-doc. He's worked for several major pharmaceutical companies since 1989 on drug discovery projects against schizophrenia, Alzheimer's, diabetes, osteoporosis and other diseases. To contact Derek, email him directly: derekb.lowe@gmail.com

In the Pipeline

March 6, 2005

Just How Many Compounds Are We Talking About?

Posted by Derek

Just how many chemicals are there? Asked that way, you can find estimates of anywhere from 10 to the eighteenth (pretty big, all right) all the way up to the gibbering, flee-in-terror order of ten to the two hundredth. A range like that makes it clear that no one knows what they're talking about, so the question needs to be cut down to size. "How many chemicals are there below a certain molecular weight?" is a good start, and once you set that, you might want to stipulate the list of elements you'll include and whether or not the compounds are stable enough to be isolated.

A group from the University of Berne has just published a paper in Angewandte Chemie (44, 1504 in the English edition) which claims to answer just such a question, namely: "How many reasonably stable compounds are there with up to eleven atoms of either carbon, nitrogen, oxygen, or fluorine?" Should this one come up during your next poker game, you can now answer, in your best Mr. Spock voice, "Approximately 13,892,436." But hold on. Does that number sound low to you? If not, it should - read on.

The Berne group came up with their estimate by computationally assembling graphs which corresponded to all the saturated hydrocarbon backbones up to eleven carbons. Then they systematically replaced all possible carbons with N or O, allowed for double and triple bonds, and substituted all carbons with H or F. So far, so good. These variations generated a low of 4 and a high of 79,236 compounds per carbon skeleton.
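To get a feel for the "replace carbons with N or O" step, here is a minimal Python sketch on a single toy skeleton, an unbranched four-atom chain. It is not the Berne group's code: the function name, the valence table, and the decision to ignore hydrogens, fluorines and multiple bonds are all assumptions made for illustration.

```python
from itertools import product

# Toy skeleton (an assumption for illustration): an unbranched four-atom chain.
# Each entry is the number of skeleton bonds at that position.
CHAIN_DEGREES = [1, 2, 2, 1]
MAX_BONDS = {"C": 4, "N": 3, "O": 2}  # usual neutral valences


def substitutions(degrees, forbid_adjacent_heteroatoms=True):
    """Return the distinct C/N/O labelings of an unbranched chain."""
    seen = set()
    for labels in product("CNO", repeat=len(degrees)):
        # Valence check: an atom cannot have more skeleton bonds than its valence.
        if any(deg > MAX_BONDS[el] for el, deg in zip(labels, degrees)):
            continue
        # Optional rule: no two neighbouring heteroatoms (no O-O, N-N, N-O).
        if forbid_adjacent_heteroatoms and any(
            a != "C" and b != "C" for a, b in zip(labels, labels[1:])
        ):
            continue
        # A chain read backwards is the same molecule, so keep one canonical form.
        seen.add(min(labels, labels[::-1]))
    return sorted("".join(s) for s in seen)


if __name__ == "__main__":
    print(len(substitutions(CHAIN_DEGREES)))         # with the heteroatom rule
    print(len(substitutions(CHAIN_DEGREES, False)))  # without it
```

Even on this tiny chain, the single rule against neighbouring heteroatoms removes most of the candidates, which hints at how much the filtering choices matter once the skeletons get bigger.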

But they applied a set of mighty strict standards during these operations. Their algorithm rejected heteroatom-heteroatom bonds, except for those found in some aromatic heterocycles and in nitro groups, oximes and the like, so no peroxides (and no hydrazines, I suppose, although they're stable). They also rejected bridgehead double bonds and allenes, and (to my surprise) only allowed triple bonds for nitriles (so no acetylenes). They also rejected hydrolytically unstable groups: no enamines, no acyclic imines, no acyl halides, no enols and not even any orthoesters.
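A few of these rules are easy to sketch as a filter over a molecule stored as a list of atoms and a list of bonds. The Python below is a rough illustration only, not the published algorithm. The data layout and the name `passes_filters` are hypothetical, and it leaves out the aromatic-heterocycle, nitro and oxime exceptions, the bridgehead rule, and the hydrolytic-stability rules entirely.

```python
HETERO = {"N", "O"}


def passes_filters(atoms, bonds):
    """atoms: list of element symbols; bonds: list of (i, j, order) tuples.

    Toy version of three rejection rules. The exceptions for aromatic
    heterocycles, nitro groups and oximes are NOT modeled here.
    """
    double_bonds_at = [0] * len(atoms)
    for i, j, order in bonds:
        a, b = atoms[i], atoms[j]
        # No heteroatom-heteroatom bonds (peroxides, hydrazines and so on).
        if a in HETERO and b in HETERO:
            return False
        # Triple bonds only as nitriles, so C#C is out.
        if order == 3 and {a, b} != {"C", "N"}:
            return False
        if order == 2:
            double_bonds_at[i] += 1
            double_bonds_at[j] += 1
    # A carbon with two double bonds is a cumulated center. That covers
    # allenes, but this crude test also throws out ketenes and CO2.
    return not any(
        el == "C" and n >= 2 for el, n in zip(atoms, double_bonds_at)
    )


if __name__ == "__main__":
    # Acetonitrile skeleton C-C#N passes; propyne skeleton C-C#C does not.
    print(passes_filters(["C", "C", "N"], [(0, 1, 1), (1, 2, 3)]))
    print(passes_filters(["C", "C", "C"], [(0, 1, 1), (1, 2, 3)]))
```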

What this means is that there are plenty of compounds you can order from a catalog that aren't even on the list. Heck, there are compounds that are shipped in tank cars that aren't on the list. Allowing some of these compound classes to gain a foothold would have swelled the ranks a great deal. Moving further past their criteria, you can imagine how out of control things would get if you started calculating in sulfur, phosphorus, and more than one type of halogen atom. I don't know if this team is contemplating that exercise or not; they'll probably have to wait for a fresh crop of grad students before they can even try.

But I've left out a key statistic of theirs, a startling one. Back at that first step, when they graphically assembled those carbon frameworks, it turned out that the huge majority, a full 99.8% of them, had three- and four-membered rings in them. In order not to have a list so skewed toward cyclopropanes and cyclobutanes, they threw all of these out at the very start, leaving them with 1,830 basic skeletons as opposed to 843,335 of them. Throwing out the likes of orthoesters and acetylenes, as it turns out, is nothing compared to the massive effect of shedding the small rings.
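The arithmetic behind that 99.8% is quick to check. The Python lines below use only the figures quoted in this post. The variable names are made up, and the per-skeleton average assumes that all of the roughly 13.9 million compounds came from the skeletons that were kept.

```python
total_skeletons = 843_335   # all carbon frameworks generated
kept_skeletons = 1_830      # those without three- or four-membered rings
compounds = 13_892_436      # final compound count quoted in the paper

discarded_fraction = 1 - kept_skeletons / total_skeletons
mean_per_skeleton = compounds / kept_skeletons

print(f"skeletons discarded: {discarded_fraction:.1%}")
print(f"mean compounds per kept skeleton: {mean_per_skeleton:,.0f}")
```

The mean works out to roughly 7,600 compounds per skeleton, which sits between the quoted low of 4 and high of 79,236.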

In this light, as the authors point out by an excellent astronomical analogy, their list of thirteen million stable compounds is actually surrounded and permeated by a huge unseen amount of "dark matter" - all those 3- and 4-membered rings. Many of them might be too strained to be stable, but many others would be fine. They just haven't been explored because they're too much of a pain to make. This, to me, was the single biggest surprise of the whole effort. I knew that there must be a lot of these compounds, but I never would have thought that their possible forms hugely outnumber all the other small molecules I've ever seen or thought of. What else don't we know?

COMMENTS

1. Daniel Newby on March 7, 2005 12:12 AM writes...

I once got to thinking about a slightly different question: in the vast sea of reasonably stable structures, what fraction are immune to conventional synthesis? I.e., stuff where the most fruitful synthesis would be to smash likely-looking precursors in a particle accelerator and sort the debris with single-molecule NMR. For more than a few dozen carbons, I bet the fraction is appallingly large.

Now that I think about it, carbon nanotubes and fullerenes fall squarely in the "impossible to synthesize" category. Trying to make them an atom at a time would be madness. It is sheer luck that they self-assemble so nicely.

2. David Govett on March 7, 2005 4:33 AM writes...

Will software ever be sophisticated enough and will hardware ever be fast and capacious enough to infer function from structure? If so, software would be able to model and characterize any number of compounds relatively quickly.

3. Derek Lowe on March 7, 2005 8:41 AM writes...

The authors did run some "virtual screening" software through their library, and broke it down into how many structures were potential receptor ligands and so on. But my trust in those methods just barely moves the meter off zero.

I think that de novo function-from-structure falls into the category of "probably not quite impossible." What, in other words, an engineer would call, with a pained expression, "nontrivial."

4. a Chemist on March 19, 2005 7:47 AM writes...

You might also want to have a look at Jonathan Goodman's article in the current Chem & Ind (when it makes its way across the pond, or online), Issue 6, p 18 (or the cited article from it, J Goodman and K de Silva, J. Chem. Inf. and Modeling 2005, 1, 81).

It looks at this from a slightly different perspective.
