Here's a new paper in J. Med. Chem. on software that tries to implement matched-molecular-pair type analysis. The goal is a recommendation - what R group should I put on next?
Now, any such approach is going to have to deal with this paper from Abbott in 2008. In that one, an analysis of 84,000 compounds across 30 targets strongly suggested that most R-group replacements had, on average, very little effect on potency. That's not to say that they don't or can't affect binding, far from it - just that over a large series, those effects are pretty much a normal distribution centered on zero. There are also analyses that claim the same thing for adding methyl groups - to be sure, there are many dramatic "magic methyl" enhancement examples, but they are balanced out, on the whole, by a similar number of dramatic drop-offs, along with a larger cohort of examples where not much happened at all.
To their credit, the authors of this new paper reference these others right up front. The answer to these earlier papers, most likely, is that when you average across all sorts of binding sites, you're going to see all sorts of effects. You've got a far better chance of getting something useful out of this kind of analysis if you're working inside the same target or assay. Here we get to the nuts and bolts:
"The predictive method proposed, Matsy, relies on the hypothesis that a particular matched series tends to have a preferred activity order, for example, that not all six possible orders of [Br, Cl, F] are equally frequent. . .Although a rather straightforward idea, we have been unable to find any quantitative analysis of this question in the literature."
So they go on to provide one, with halogen substituents. There's not much to be found comparing pairs of halogen compounds head to head, but when you go to the longer series, you find that the order Br > Cl > F > H is by far the most common (and that appears to be just a good old grease effect). The next most common order just swaps the bromine and chlorine, but the third most common is the original order, in reverse. The other end of the distribution is interesting, too - for example, the least common order is Br > H > F > Cl, which is believable, since it doesn't make much sense along any property axis.
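To make that counting exercise concrete, here's a minimal Python sketch of how one might tally activity orders across matched series. This is not the authors' code: the potency values are invented for illustration, and the names `series_data`, `activity_order`, and `count_orders` are hypothetical.

```python
from collections import Counter

# Invented pIC50 values for four matched series: same scaffold within each
# series, only the R group varies. These numbers are NOT from the paper.
series_data = [
    {"Br": 7.9, "Cl": 7.5, "F": 6.8, "H": 6.1},
    {"Br": 6.4, "Cl": 6.9, "F": 6.0, "H": 5.5},
    {"Br": 8.2, "Cl": 7.7, "F": 7.1, "H": 6.6},
    {"Br": 5.0, "Cl": 5.4, "F": 5.9, "H": 6.3},
]


def activity_order(series):
    """Return the R groups of one series, most active first, as a tuple."""
    return tuple(sorted(series, key=series.get, reverse=True))


def count_orders(all_series):
    """Count how often each activity order shows up across the series."""
    return Counter(activity_order(s) for s in all_series)


order_counts = count_orders(series_data)
for order, n in order_counts.most_common():
    print(" > ".join(order), n)
```

With four R groups there are 24 possible orders, so a real analysis needs a lot of series before the frequencies mean much, and it also needs some rule for ties and for activities that sit within assay noise of each other. The sketch ignores both.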
They go on to do the same sorts of analyses for other matched series, and the question then becomes, if you have such a matched series in your own SAR, what does that order tell you about what to make next? The idea of "SAR transfer" has been explored, and older readers will remember the Topliss tree for picking aromatic substituents (do younger ones?).
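One way to picture the "what to make next" question is as a lookup: find reference series that show the same activity order as yours, then see which additional R groups tended to beat the best compound in those series. The Python sketch below does that under loud assumptions. The data are invented, the scoring rule (fraction of matching series in which the new group beat the current best) is my own simplification, and `reference_series` and `recommend` are hypothetical names. None of it should be read as the actual Matsy implementation.

```python
from collections import defaultdict

# Invented reference data (pIC50 per R group); NOT taken from the paper.
reference_series = [
    {"H": 6.1, "F": 6.8, "Cl": 7.5, "Br": 7.9},
    {"H": 5.5, "F": 6.0, "Cl": 6.9, "Br": 6.4, "OMe": 7.2},
    {"H": 6.6, "F": 7.1, "Cl": 7.7, "Br": 8.2, "OMe": 7.0},
    {"H": 6.3, "F": 5.9, "Cl": 5.4, "Br": 5.0},
]


def recommend(query_order, references):
    """Rank R groups not yet made, given your observed order (best first).

    Assumed scoring: for each reference series that contains every query
    R group in the same activity order, count how often each additional
    R group beat the best query R group.
    """
    tallies = defaultdict(lambda: [0, 0])  # r_group -> [times better, times seen]
    for ref in references:
        if not all(r in ref for r in query_order):
            continue  # series lacks one of the query R groups
        if sorted(query_order, key=ref.get, reverse=True) != list(query_order):
            continue  # series shows a different activity order
        best = ref[query_order[0]]
        for r_group, value in ref.items():
            if r_group in query_order:
                continue
            tallies[r_group][1] += 1
            if value > best:
                tallies[r_group][0] += 1
    ranked = sorted(
        ((better / seen, r_group, seen) for r_group, (better, seen) in tallies.items()),
        reverse=True,
    )
    return [(r_group, fraction, seen) for fraction, r_group, seen in ranked]


# Suppose your own series so far reads Cl > F > H.
suggestions = recommend(("Cl", "F", "H"), reference_series)
for r_group, fraction, seen in suggestions:
    print(f"{r_group}: better in {fraction:.0%} of {seen} matching series")
```

The number of matching series matters as much as the fraction here: a group that improved things in one of one series is a much weaker suggestion than one that did so in two of three, and any real tool would need to account for that.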
"The Matsy algorithm may be considered a formalism of aspects of how a medicinal chemist works in practice. Observing a particular trend, a chemist considers what to make next on the basis of chemical intuition, experience with related compounds or targets, and ease of synthesis. The structures suggested by Matsy preserve the core features of molecules while recommending small modifications, a process very much in line with the type of functional group replacement that is common in lead optimization projects. This is in contrast to recommendations from fingerprint-based similarity comparisons where the structural similarity is not always straightforward to rationalize and near-neighbors may look unnatural to a medicinal chemist."
And there's a key point: prediction and recommendation programs walk a fine line between "There's no way I'm going out of my way to make that" and "I didn't need this program to tell me this". Sometimes there's hardly any space between those two territories at all. Where do this program's recommendations fall? As companies try this out in-house, some people will be finding out. . .