Derek Lowe, an Arkansan by birth, got his BA from Hendrix College and his PhD in organic chemistry from Duke before spending time in Germany on a Humboldt Fellowship for his post-doc. He's worked for several major pharmaceutical companies since 1989 on drug discovery projects against schizophrenia, Alzheimer's, diabetes, osteoporosis and other diseases.
To contact Derek email him directly: derekb.lowe@gmail.com
Twitter: Dereklowe
A reader reminded me of this paper, which I meant to blog on when it came out last year. The authors looked over the entire Chemical Abstracts Service registry file – in theory, every compound that’s ever been reported in the chemical literature – and asked how many different chemical scaffolds make up the organic chemistry part of the collection. (That ran to a bit over 24 million compounds at the time the paper was written.)
You’d expect a power-law (“long tail”) distribution in a data set like this, and that’s just what they found. Among heteroatom-containing scaffolds, the most common 5% were found in about 75% of the compounds. In fact, it was even steeper than that – the most common 0.25% of the heteroatom frameworks made up half the compounds! The flip side of this is that about half of the known scaffolds occur only once, which is about as long a tail as you can get.
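To make numbers like these concrete, here's a minimal Python sketch of how you'd measure that sort of concentration from a list of per-scaffold compound counts. The toy data are made up (a Zipf-like 1000/rank falloff, not the CAS registry), and the function names are hypothetical, not anything from the paper.

```python
def coverage_of_top(counts, top_fraction):
    """Share of all compounds that belong to the most common
    `top_fraction` of scaffolds (e.g. 0.05 for the top 5%)."""
    ordered = sorted(counts, reverse=True)
    # Always include at least one scaffold, even for tiny fractions.
    k = max(1, round(len(ordered) * top_fraction))
    return sum(ordered[:k]) / sum(ordered)


def singleton_share(counts):
    """Fraction of scaffolds that occur in exactly one compound."""
    return sum(1 for c in counts if c == 1) / len(counts)


# Toy data, purely illustrative: the scaffold at rank r gets about
# 1000/r compounds, with a floor of one compound per scaffold.
toy_counts = [max(1, 1000 // r) for r in range(1, 2001)]

if __name__ == "__main__":
    print("top 5% of scaffolds cover:", coverage_of_top(toy_counts, 0.05))
    print("top 0.25% of scaffolds cover:", coverage_of_top(toy_counts, 0.0025))
    print("scaffolds seen only once:", singleton_share(toy_counts))
```

Feed it real per-scaffold counts instead of `toy_counts` and the same two functions give you the "top X% covers Y%" and "seen only once" figures directly.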
That skew is almost completely accounted for by (1) the availability of certain starting materials, largely from petroleum and from natural products, and (2) the interest in preparing a given framework. Put more crassly, it depends on how much it’ll cost (in time and money), and how much you expect to get back. As the authors put it:
“We believe the presence of this power law is quantitative evidence that the minimization of synthetic cost has been a key factor in shaping the known universe of organic chemistry.”
Tiny variations can send a given scaffold diving off the charts. Think, for example, about the usual steroid framework – there have been a huge number of variations worked on that, since they’re of medical interest and the starting materials are available (thanks, in the early days, to some Mexican yams and their biggest fan). But imagine going in and replacing one or two of those carbon atoms with nitrogens: whoosh, down you go. Many of those frameworks have hardly been touched at all, partly because they’re quite difficult to make. You’d have to have a very good reason to go after them, and that hasn’t presented itself. Meanwhile, the vast numbers of indoles, piperazines, and piperidines in drug molecules help to perpetuate themselves.
The same goes, and even more so, for general compound shapes (heteroatoms or all-carbon). The authors found 836,708 different framework shapes, but that breaks down rather sharply: half the compounds are accounted for by 143 frameworks, and the other 836,565 make up the other half. I’ll let the authors have the last word:
“It seems plausible to expect that the more often a framework has been used as the basis for a compound, the more likely it is to be used in another compound. If many compounds derived from a framework have already been synthesized, these derivatives can serve as a pool of potential starting materials for further syntheses. The availability of published schemes for making these derivatives, or the existence of these derivatives as commercial chemicals, would then facilitate the construction of more compounds based on the same framework. Of course, not all frameworks are equally likely to become the focus of a high degree of synthetic activity. Some frameworks are intrinsically more interesting than others due to their functional importance (e.g., as a building block in drug design), and this interest will stimulate the synthesis of derivatives. Once this synthetic activity is initiated, it may be amplified over time by a rich-get-richer process. . .”
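The rich-get-richer idea in that passage can be sketched as a toy simulation. This is a Simon-style preferential-attachment model, assumed here for illustration only; the quote doesn't spell out a specific model, and `p_new`, the seed, and the function name are all hypothetical.

```python
import random


def simulate_rich_get_richer(n_compounds, p_new=0.05, seed=0):
    """Toy model: each new compound either starts a brand-new framework
    (probability p_new) or copies the framework of a randomly chosen
    existing compound, so popular frameworks get picked more often.
    Returns the number of compounds per framework."""
    rng = random.Random(seed)
    framework_of = [0]  # framework id of each compound made so far
    n_frameworks = 1
    for _ in range(n_compounds - 1):
        if rng.random() < p_new:
            # A previously unexplored framework gets its first compound.
            framework_of.append(n_frameworks)
            n_frameworks += 1
        else:
            # Choosing a compound uniformly picks a framework with
            # probability proportional to its current size.
            framework_of.append(rng.choice(framework_of))
    counts = [0] * n_frameworks
    for f in framework_of:
        counts[f] += 1
    return counts


if __name__ == "__main__":
    counts = sorted(simulate_rich_get_richer(100_000), reverse=True)
    print("frameworks:", len(counts))
    print("largest five:", counts[:5])
    print("seen only once:", sum(1 for c in counts if c == 1))
```

Nothing in this sketch knows any chemistry: the only ingredients are a small chance of trying something new and a tendency to build on what already exists, which is the mechanism the authors are describing.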
The real question becomes: are there prizes to be found by devising solid syntheses of funky heterocycles, and filling out that space with lead compounds, or does statistics always win?
1. Jose on January 27, 2009 10:26 AM writes...
The real question becomes: are there prizes to be found by devising solid syntheses of funky heterocycles, and filling out that space with lead compounds, or does statistics always win?
2. John Spevacek on January 27, 2009 11:39 AM writes...
You should point out that the article has free access.
3. MTK on January 27, 2009 2:51 PM writes...
Sort of suggests that it isn't just combichem that only makes what it can make. It's all of chemistry.
4. UK Chemist on January 28, 2009 6:12 AM writes...
This could also show the damage that has been done to the synthetic capability of most med chem departments. The decline in this capability has been a not insignificant factor in the decline in big pharma R&D productivity.
5. SteveM on January 29, 2009 8:52 AM writes...
I am not a medicinal chemist. However, if the government is going to piss away over $800 billion with its “Stimulus” plan, perhaps it should consider shoveling some of that money to NIH. Let them hire a bunch of laid-off synthesis chemists (sorry, no H-1B’s) and have them create a set of modified scaffold libraries in sufficient quantities for downstream derivatization and biological testing.
If unique medicinal qualities are discovered, it could catalyze new areas of pharmaceutical activity. If not, well the scientific value generated would probably still exceed that of re-sodding the national Mall. And with all of the pharma R&D downsizing, there are plenty of shuttered facilities that NIH could lease out for the duration of the initiative.