January 27, 2009
A Long Tail Indeed
A reader reminded me of this paper, which I meant to blog on when it came out last year. The authors looked over the entire Chemical Abstracts Service registry file – in theory, every compound that’s ever been reported in the chemical literature – and asked how many different chemical scaffolds make up the organic chemistry part of the collection. (That part ran to a bit over 24 million compounds at the time the paper was written.)
You’d expect a power-law (“long tail”) distribution in a data set like this, and that’s just what they found. Among heteroatom-containing scaffolds, the most common 5% were found in about 75% of the compounds. In fact, it was even steeper than that – the most common 0.25% of the heteroatom frameworks made up half the compounds! The flip side of this is that about half of the known scaffolds occur only once, which is about as long a tail as you can get.
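For anyone who wants to see how numbers like "the top 5% of scaffolds cover 75% of compounds" get computed, here's a minimal Python sketch. It sorts scaffolds from most to least common and walks down the list until a target share of compounds is covered; it also counts the scaffolds that show up only once. The function names and the counts are made up for illustration; they are toy numbers, not data from the paper.

```python
from typing import Sequence

def top_fraction_covering(counts: Sequence[int], target: float) -> float:
    """Smallest fraction of scaffolds (most common first) whose compounds
    add up to at least `target` of all compounds."""
    ordered = sorted(counts, reverse=True)
    total = sum(ordered)
    running = 0
    for i, c in enumerate(ordered, start=1):
        running += c
        if running >= target * total:
            return i / len(ordered)
    return 1.0

def singleton_fraction(counts: Sequence[int]) -> float:
    """Fraction of scaffolds that occur in exactly one compound."""
    return sum(1 for c in counts if c == 1) / len(counts)

# Hypothetical toy counts (compounds per scaffold): one heavily used
# scaffold, a few moderate ones, and a long tail of one-offs.
toy_counts = [500, 120, 60, 30, 10, 5, 3, 2, 2] + [1] * 11

print(top_fraction_covering(toy_counts, 0.5))  # -> 0.05
print(singleton_fraction(toy_counts))          # -> 0.55
```

Even in this tiny made-up set, one scaffold in twenty covers half the compounds while more than half the scaffolds are singletons, which is the same lopsided shape described above.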
That skew is almost completely accounted for by (1) the availability of certain starting materials, largely from petroleum and from natural products, and (2) the interest in preparing a given framework. Put more crassly, it depends on how much it’ll cost (in time and money), and how much you expect to get back. As the authors put it:
“We believe the presence of this power law is quantitative evidence that the minimization of synthetic cost has been a key factor in shaping the known universe of organic chemistry.”
Tiny variations can send a given scaffold diving off the charts. Think, for example, about the usual steroid framework – a huge number of variations have been made on that one, since steroids are of medical interest and the starting materials are available (thanks, in the early days, to some Mexican yams and their biggest fan). But imagine going in and replacing one or two of those carbon atoms with nitrogens: whoosh, down you go. Many of those frameworks have hardly been touched at all, partly because they’re quite difficult to make. You’d have to have a very good reason to go after them, and that hasn’t presented itself. Meanwhile, the vast numbers of indoles, piperazines, and piperidines in drug molecules help to perpetuate themselves.
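That kind of self-perpetuation is what modelers call a rich-get-richer process. The paper doesn't spell out a model, so the sketch below swaps in a textbook one, Simon's process: each new compound founds a brand-new scaffold with some probability (the `p_new` value of 0.1 here is an arbitrary assumption), and otherwise reuses the scaffold of a randomly chosen earlier compound. Picking a random earlier compound means a scaffold gets reused in proportion to how many compounds it already has. The function name `simon_process` is just an illustrative label.

```python
import random
from collections import Counter

def simon_process(n_compounds: int, p_new: float, seed: int = 0) -> Counter:
    """Rich-get-richer (Simon-style) model of scaffold usage.

    Each new compound founds a brand-new scaffold with probability p_new;
    otherwise it copies the scaffold of a uniformly chosen earlier compound,
    so a scaffold is reused in proportion to how many compounds it already has.
    """
    rng = random.Random(seed)
    scaffold_of = [0]  # first compound founds scaffold 0
    next_id = 1
    for _ in range(n_compounds - 1):
        if rng.random() < p_new:
            scaffold_of.append(next_id)  # a new, previously unused scaffold
            next_id += 1
        else:
            scaffold_of.append(rng.choice(scaffold_of))  # reuse an existing one
    return Counter(scaffold_of)  # scaffold id -> number of compounds

counts = simon_process(100_000, p_new=0.1)  # p_new is an arbitrary assumption
sizes = sorted(counts.values(), reverse=True)
total = sum(sizes)

# Walk down from the most common scaffold until half the compounds are covered.
running, k = 0, 0
while running < total / 2:
    running += sizes[k]
    k += 1

singletons = sum(1 for s in sizes if s == 1) / len(sizes)
print(f"{len(sizes)} scaffolds; the top {k} cover half the compounds")
print(f"{singletons:.0%} of scaffolds occur only once")
```

For this model, theory predicts a singleton fraction of about 1/(2 - p_new), so roughly half the scaffolds should come out as one-offs. That is qualitatively the long tail seen in the real data, but nothing here is fitted to the CAS numbers.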
The same goes, even more so, for overall framework shapes, where atom type is ignored so that heteroatom-containing and all-carbon frameworks count the same. The authors found 836,708 different framework shapes, but the distribution is sharply skewed: half the compounds are accounted for by just 143 frameworks, and the other 836,565 make up the other half. I’ll let the authors have the last word:
“It seems plausible to expect that the more often a framework has been used as the basis for a compound, the more likely it is to be used in another compound. If many compounds derived from a framework have already been synthesized, these derivatives can serve as a pool of potential starting materials for further syntheses. The availability of published schemes for making these derivatives, or the existence of these derivatives as commercial chemicals, would then facilitate the construction of more compounds based on the same framework. Of course, not all frameworks are equally likely to become the focus of a high degree of synthetic activity. Some frameworks are intrinsically more interesting than others due to their functional importance (e.g., as a building block in drug design), and this interest will stimulate the synthesis of derivatives. Once this synthetic activity is initiated, it may be amplified over time by a rich-get-richer process. . .”
Category: Chemical News | Drug Industry History | The Scientific Literature