Chemists who don't (or don't yet) work in drug discovery often wonder just what sort of chemistry we do over here. There are a lot of jokes about methyl-ethyl-butyl-futile, which have a bit of an edge to them for people just coming out of a big-deal total synthesis group in academia. They wonder if they're really setting themselves up for a yawn-inducing lab career of Suzuki couplings and amide formation, gradually becoming leery of anything that takes more than three steps to make.
Well, now there's some hard data on that topic. The authors took the combined publication output from their company, Pfizer, and GSK, as published in the Journal of Medicinal Chemistry, Bioorganic Med Chem Letters and Bioorganic and Medicinal Chemistry, starting in 2008. And they analyzed this set for what kinds of reactions were used, how long the synthetic routes were, and what kinds of compounds were produced. Their motivation?
. . .discussions with other chemists have revealed that many of our drug discovery colleagues outside the synthetic community perceive our syntheses to consist of typically six steps, predominantly composed of amine deprotections to facilitate amide formation reactions and Suzuki couplings to produce biaryl derivatives. These “typical” syntheses invariably result in large, flat, achiral derivatives destined for screening cascades. We believed these statements to be misconceptions, or at the very least exaggerations, but noted there was little if any hard evidence in the literature to support our case.
Six steps? You must really want those compounds, eh? At any rate, their data set ended up with about 7300 reactions and about 3600 compounds. And some clear trends showed up. For example, nearly half the reactions involved forming carbon-heteroatom bonds, with half of those (22% of the total) being acylations, mostly amide formation. But only about one tenth of the reactions were C-C bond-forming steps (40% of those were Suzuki-style couplings and 18% were Sonogashira reactions). One-fifth were protecting group manipulations (almost entirely on COOH and amine groups), eight per cent were heterocycle formation, and everything else was well down into the single digits.
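To get a feel for what those percentages mean in actual flasks, here's a back-of-the-envelope sketch. It assumes the rounded figures quoted above (about 7300 reactions, and the stated shares) are close enough to multiply out; the variable names and the `approx_count` helper are mine, not anything from the paper, and the results are only as good as that rounding.

```python
# Rough reaction counts implied by the rounded percentages quoted in the post.
# Everything here is approximate: the post itself says "about 7300".
TOTAL_REACTIONS = 7300

# Shares of the whole data set, as quoted in the post.
share_of_total = {
    "acylation": 0.22,
    "C-C bond formation": 0.10,
    "protecting group manipulation": 0.20,
    "heterocycle formation": 0.08,
}

# Shares *within* the C-C bond-forming class, as quoted in the post.
share_of_cc = {
    "Suzuki-style coupling": 0.40,
    "Sonogashira": 0.18,
}


def approx_count(total, *fractions):
    """Multiply a total by one or more nested fractions and round to an integer."""
    n = float(total)
    for f in fractions:
        n *= f
    return round(n)


# Approximate counts for each top-level class.
counts = {
    name: approx_count(TOTAL_REACTIONS, frac)
    for name, frac in share_of_total.items()
}

# Approximate counts for the named couplings inside the C-C class.
cc_counts = {
    name: approx_count(TOTAL_REACTIONS, share_of_total["C-C bond formation"], frac)
    for name, frac in share_of_cc.items()
}

if __name__ == "__main__":
    for name, n in {**counts, **cc_counts}.items():
        print(f"{name}: roughly {n} reactions")
```

On those assumptions, the Suzuki-style couplings come out to only about 4% of all the reactions in the set, while acylations alone are more than five times that.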
There are some interesting trends in those other reactions, though. Reduction reactions are much more common than oxidations - the frequency of nitro-to-amine reductions is one factor behind that, followed by reductions of other groups down to amines (few of these are typically run in the other direction). Among those oxidations, alcohol-to-aldehyde is the favorite. Outside of changes in oxidation state, alcohol-to-halide is the single most common functional group transformation, followed by acid to acid chloride, both of which make sense from their reactivity in later steps.
Overall, the single biggest reaction is. . .N-acylation to an amide. So that part of the stereotype is true. At the bottom of the list, with only one reaction apiece, were N-alkylation of an aniline, benzylic/allylic oxidation, and alkene oxidation. Sulfonation, nitration, and the Heck reaction were just barely represented as well.
Analyzing the compounds instead of the reactions, they found that 99% of the compounds contained at least one aromatic ring (with almost 40% showing an aryl-aryl linkage) and over half have an amide, totals that aren't going to do much to dispel the stereotypes, either. The most popular heteroaromatic ring is pyridine, followed by pyrimidine and then the most popular of the five-membered ones, pyrazole. 43% have an aliphatic amine, which I can well believe (in fact, I'm surprised that it's not even higher). Most of those are tertiary amines, and the most-represented of those are pyrrolidines, followed closely by piperazines.
In other functionality, about a third of the compounds have at least one fluorine atom in them, and 30% have an aryl chloride. In contrast to the amides, only about 10% of the compounds have sulfonamides. 35% have an aryl ether (mostly methoxy), and 10% have an aliphatic alcohol (versus only 5% with a phenol). The least-represented functional groups (of the ones that show up at all!) are carbonate, sulfoxide, alkyl chloride, and aryl nitro, followed by amidines and thiols. There's not a single alkyl bromide or aliphatic nitro in the bunch.
The last part of the paper looks at synthetic complexity. About 3000 of the compounds were part of traceable synthetic schemes, and most of these were 3 or 4 steps long. (The distribution has a pretty long tail, though, going out past 10 steps.) Molecular weights tend to peak at between 350 and 550, and clogP peaks at around 3.5 to 5. These all sound pretty plausible to me.
Now that we've got a reasonable med-chem snapshot, though, what does it tell us? I'm going to use a whole different post to go into that, but I think that my take-away was that, for the most part, we have a pretty accurate mental picture of the sorts of compounds we make. But is that a good picture, or not?