About this Author

Derek Lowe, an Arkansan by birth, got his BA from Hendrix College and his PhD in organic chemistry from Duke before spending time in Germany on a Humboldt Fellowship for his post-doc. He's worked for several major pharmaceutical companies since 1989 on drug discovery projects against schizophrenia, Alzheimer's, diabetes, osteoporosis and other diseases. Twitter: Dereklowe


In the Pipeline


January 27, 2009

A Long Tail Indeed


Posted by Derek

A reader reminded me of this paper, which I meant to blog on when it came out last year. The authors looked over the entire Chemical Abstracts Service registry file – in theory, every compound that’s ever been reported in the chemical literature – and asked how many different chemical scaffolds make up the organic chemistry part of the collection. (That ran to a bit over 24 million compounds at the time the paper was written.)

You’d expect a power-law (“long tail”) distribution in a data set like this, and that’s just what they found. Among heteroatom-containing scaffolds, the most common 5% were found in about 75% of the compounds. In fact, it was even steeper than that – the most common 0.25% of the heteroatom frameworks made up half the compounds! The flip side of this is that about half of the known scaffolds occur only once, which is about as long a tail as you can get.
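For anyone who wants to run this sort of tally on their own compound list, here is a minimal sketch of the two statistics quoted above: the share of compounds covered by the most common fraction of scaffolds, and the fraction of scaffolds that occur only once. The function names and the toy scaffold labels are my own assumptions for illustration. None of it comes from the paper, and it skips the hard part, which is assigning each compound to a framework in the first place.

```python
import math
from collections import Counter


def coverage_of_top(counts, fraction):
    """Share of all compounds covered by the most common `fraction` of scaffolds."""
    ranked = sorted(counts, reverse=True)
    # Always look at at least one scaffold, rounding the cutoff up.
    top_n = max(1, math.ceil(fraction * len(ranked)))
    return sum(ranked[:top_n]) / sum(ranked)


def singleton_share(counts):
    """Fraction of scaffolds that show up in exactly one compound."""
    return sum(1 for c in counts if c == 1) / len(counts)


# Hypothetical toy data, one scaffold label per compound (NOT from the paper).
toy_compounds = (
    ["indole"] * 12
    + ["piperidine"] * 5
    + ["piperazine"] * 3
    + [f"rare_scaffold_{i}" for i in range(10)]
)
counts = list(Counter(toy_compounds).values())

print("top 25% of scaffolds cover:", coverage_of_top(counts, 0.25))
print("scaffolds seen only once:", singleton_share(counts))
```

Even in a made-up set this small, a few popular scaffolds account for most of the compounds while most scaffolds are one-offs, which is the same shape the authors report at vastly larger scale.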

That skew is almost completely accounted for by (1) the availability of certain starting materials, largely from petroleum and from natural products, and (2) the interest in preparing a given framework. Put more crassly, it depends on how much it’ll cost (in time and money), and how much you expect to get back. As the authors put it:

“We believe the presence of this power law is quantitative evidence that the minimization of synthetic cost has been a key factor in shaping the known universe of organic chemistry.”

Tiny variations can send a given scaffold diving off the charts. Think, for example, about the usual steroid framework – there have been a huge number of variations worked on that, since they’re of medical interest and the starting materials are available (thanks, in the early days, to some Mexican yams and their biggest fan). But imagine going in and replacing one or two of those carbon atoms with nitrogens: whoosh, down you go. Many of those frameworks have hardly been touched at all, partly because they’re quite difficult to make. You’d have to have a very good reason to go after them, and that hasn’t presented itself. Meanwhile, the vast numbers of indoles, piperazines, and piperidines in drug molecules help to perpetuate themselves.

The same goes, and even more so, for general compound shapes (heteroatoms or all-carbon). The authors found 836,708 different framework shapes, but that breaks down rather sharply: half the compounds are accounted for by 143 frameworks, and the other 836,565 make up the other half. I’ll let the authors have the last word:

“It seems plausible to expect that the more often a framework has been used as the basis for a compound, the more likely it is to be used in another compound. If many compounds derived from a framework have already been synthesized, these derivatives can serve as a pool of potential starting materials for further syntheses. The availability of published schemes for making these derivatives, or the existence of these derivatives as commercial chemicals, would then facilitate the construction of more compounds based on the same framework. Of course, not all frameworks are equally likely to become the focus of a high degree of synthetic activity. Some frameworks are intrinsically more interesting than others due to their functional importance (e.g., as a building block in drug design), and this interest will stimulate the synthesis of derivatives. Once this synthetic activity is initiated, it may be amplified over time by a rich-get-richer process. . .”
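The rich-get-richer idea in that quote is easy to play with in a few lines. What follows is a toy simulation of my own, not the authors' model: each new compound either starts a brand-new framework with probability `p_new` (an assumed parameter) or copies the framework of a randomly chosen existing compound, so frameworks that already have many compounds are proportionally more likely to get another one.

```python
import random


def simulate_rich_get_richer(n_compounds, p_new=0.1, seed=0):
    """Toy preferential-attachment run; returns compounds per framework."""
    rng = random.Random(seed)
    framework_of = [0]  # framework id for each compound made so far
    n_frameworks = 1
    for _ in range(n_compounds - 1):
        if rng.random() < p_new:
            # Someone makes a genuinely new framework.
            framework_of.append(n_frameworks)
            n_frameworks += 1
        else:
            # Picking a random existing compound favors big frameworks.
            framework_of.append(rng.choice(framework_of))
    sizes = [0] * n_frameworks
    for framework_id in framework_of:
        sizes[framework_id] += 1
    return sizes


sizes = simulate_rich_get_richer(10_000, p_new=0.1, seed=1)
ranked = sorted(sizes, reverse=True)
print("frameworks:", len(ranked))
print("largest five:", ranked[:5])
print("frameworks with one compound:", sum(1 for s in ranked if s == 1))
```

The value of `p_new` and the copying rule are assumptions picked for simplicity. Real synthetic choices depend on cost and interest, as discussed above, but a process of this kind is one standard way to end up with a long tail.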

Comments (5) + TrackBacks (0) | Category: Chemical News | Drug Industry History | The Scientific Literature


1. Jose on January 27, 2009 10:26 AM writes...

The real question becomes: are there prizes to be found by devising solid syntheses of funky heterocycles, and filling out that space with lead compounds, or does statistics always win?


2. John Spevacek on January 27, 2009 11:39 AM writes...

You should point out that the article has free access.


3. MTK on January 27, 2009 2:51 PM writes...

Sort of suggests that it isn't just combichem that only makes what it can make. It's all of chemistry.


4. UK Chemist on January 28, 2009 6:12 AM writes...

This could also show the damage that has been done to the synthetic capability of most med chem departments. The decline in this capability has been a not insignificant factor in the decline in big pharma R&D productivity.


5. SteveM on January 29, 2009 8:52 AM writes...

I am not a medicinal chemist. However, if the government is going to piss away over $800 billion with its “Stimulus” plan, perhaps it should consider shoveling some of that money to NIH. Let them hire a bunch of laid-off synthesis chemists (sorry, no H-1B’s) and have them create a set of modified scaffold libraries in sufficient quantities for downstream derivatization and biological testing.

If unique medicinal qualities are discovered, it could catalyze new areas of pharmaceutical activity. If not, well the scientific value generated would probably still exceed that of re-sodding the national Mall. And with all of the pharma R&D downsizing, there are plenty of shuttered facilities that NIH could lease out for the duration of the initiative.


