Derek Lowe, an Arkansan by birth, got his BA from Hendrix College and his PhD in organic chemistry from Duke before spending time in Germany on a Humboldt Fellowship for his post-doc. He's worked for several major pharmaceutical companies since 1989 on drug discovery projects against schizophrenia, Alzheimer's, diabetes, osteoporosis and other diseases. Twitter: Dereklowe


In the Pipeline


March 17, 2014

Predicting What Group to Put On Next


Posted by Derek

Here's a new paper in J. Med. Chem. on software that tries to implement matched-molecular-pair type analysis. The goal is a recommendation - what R group should I put on next?

Now, any such approach is going to have to deal with this paper from Abbott in 2008. In that one, an analysis of 84,000 compounds across 30 targets strongly suggested that most R-group replacements had, on average, very little effect on potency. That's not to say that they don't or can't affect binding, far from it - just that over a large series, those effects are pretty much a normal distribution centered on zero. There are also analyses that claim the same thing for adding methyl groups - to be sure, there are many dramatic "magic methyl" enhancement examples, but are they balanced out, on the whole, by a similar number of dramatic drop-offs, along with a larger cohort of examples where not much happened at all?
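The "centered on zero" point can be made concrete with a toy calculation. What follows is only a sketch with invented numbers, not data from either paper: it pools the potency change for a single hypothetical H-to-methyl swap across several targets, then splits the same changes out by target. The target names, the values, and the helper `deltas_by_target` are all assumptions made up for illustration.

```python
from collections import defaultdict
from statistics import mean

# (target, potency with H, potency with Me), e.g. pIC50.
# These numbers are invented purely for illustration.
pairs = [
    ("target_A", 6.0, 7.1),
    ("target_A", 5.5, 6.4),
    ("target_B", 7.0, 6.0),
    ("target_B", 6.2, 5.3),
    ("target_C", 5.8, 5.8),
]

def deltas_by_target(pairs):
    """Group the potency change (after - before) by target."""
    grouped = defaultdict(list)
    for target, before, after in pairs:
        grouped[target].append(after - before)
    return grouped

grouped = deltas_by_target(pairs)

# Pooling every target together hides the per-target trends.
pooled = [d for ds in grouped.values() for d in ds]
print("pooled mean change:", round(mean(pooled), 2))

# Within one target the same swap can be consistently good or bad.
for target, ds in sorted(grouped.items()):
    print(target, "mean change:", round(mean(ds), 2))
```

In this made-up set the pooled average is close to zero even though the swap helps on one target and hurts on another, which is the situation the averaged analyses would be expected to produce.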

To their credit, the authors of this new paper reference these others right up front. The answer to these earlier papers, most likely, is that when you average across all sorts of binding sites, you're going to see all sorts of effects. For this to work, you've got a far better chance of getting something useful if you're working inside the same target or assay. Here we get to the nuts and bolts:

The predictive method proposed, Matsy, relies on the hypothesis that a particular matched series tends to have a preferred activity order, for example, that not all six possible orders of [Br, Cl, F] are equally frequent. . .Although a rather straightforward idea, we have been unable to find any quantitative analysis of this question in the literature.

So they go on to provide one, with halogen substituents. There's not much to be found comparing pairs of halogen compounds head to head, but when you go to the longer series, you find that the order Br > Cl > F > H is by far the most common (and that appears to be just a good old grease effect). The next most common order just swaps the bromine and chlorine, but the third most common is the original order, in reverse. The other end of the distribution is interesting, too - for example, the least common order is Br > H > F > Cl, which is believable, since it doesn't make much sense along any property axis.
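For anyone who wants to try this sort of tally on their own data, here is a minimal sketch of the counting step. It is not the authors' code: the function names `activity_order` and `order_frequencies`, the dictionary layout, and the potency values are all assumptions for illustration, and it takes larger numbers to mean more active.

```python
from collections import Counter

def activity_order(series):
    """Return the R groups sorted from most to least active.

    `series` maps an R group label to a potency value where larger
    means more active (for example pIC50). Ties keep dictionary order.
    """
    return tuple(sorted(series, key=lambda r: series[r], reverse=True))

def order_frequencies(all_series, groups):
    """Count how often each activity order of `groups` is observed."""
    counts = Counter()
    for series in all_series:
        # Only use series that contain every substituent of interest.
        if all(g in series for g in groups):
            subset = {g: series[g] for g in groups}
            counts[activity_order(subset)] += 1
    return counts

# Invented potencies, one dictionary per matched series (same scaffold).
toy_data = [
    {"Br": 7.2, "Cl": 6.9, "F": 6.1, "H": 5.8},
    {"Br": 6.5, "Cl": 6.4, "F": 6.0, "H": 5.5},
    {"Br": 6.8, "Cl": 7.0, "F": 6.2, "H": 6.0},
    {"Br": 5.0, "Cl": 5.4, "F": 5.9},  # no H, so skipped below
]

freqs = order_frequencies(toy_data, ["Br", "Cl", "F", "H"])
for order, n in freqs.most_common():
    print(" > ".join(order), n)
```

With four substituents there are 24 possible orders, so a real version of this needs a lot of series before the counts at the rare end of the distribution mean anything.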

They go on to do the same sorts of analyses for other matched series, and the question then becomes, if you have such a matched series in your own SAR, what does that order tell you about what to make next? The idea of "SAR transfer" has been explored, and older readers will remember the Topliss tree for picking aromatic substituents (do younger ones?)

The Matsy algorithm may be considered a formalism of aspects of how a medicinal chemist works in practice. Observing a particular trend, a chemist considers what to make next on the basis of chemical intuition, experience with related compounds or targets, and ease of synthesis. The structures suggested by Matsy preserve the core features of molecules while recommending small modifications, a process very much in line with the type of functional group replacement that is common in lead optimization projects. This is in contrast to recommendations from fingerprint-based similarity comparisons where the structural similarity is not always straightforward to rationalize and near-neighbors may look unnatural to a medicinal chemist.
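To make the "what do I make next" step concrete, here is a rough sketch in the spirit of that description. It should not be read as the Matsy implementation: the function `recommend_next`, the scoring rule (how often an untried group beat the best group in the query, among database series showing the same activity order), and all of the numbers are assumptions for illustration.

```python
from collections import defaultdict

def recommend_next(query, database):
    """Rank R groups not yet made by how often they beat the query's best.

    `query` and each entry of `database` map R group -> potency
    (larger = more active). Only database series that contain all of
    the query's R groups in the same activity order are used.
    Returns a list of (r_group, fraction_improved, times_seen).
    """
    query_order = sorted(query, key=lambda r: query[r], reverse=True)
    beats = defaultdict(int)
    seen = defaultdict(int)
    for series in database:
        if not all(r in series for r in query):
            continue
        db_order = sorted(query, key=lambda r: series[r], reverse=True)
        if db_order != query_order:
            continue  # different trend, so not a useful precedent
        best_known = max(series[r] for r in query)
        for r, value in series.items():
            if r in query:
                continue
            seen[r] += 1
            if value > best_known:
                beats[r] += 1
    ranked = [(r, beats[r] / seen[r], seen[r]) for r in seen]
    # Highest fraction first, then most observations, then name.
    ranked.sort(key=lambda item: (-item[1], -item[2], item[0]))
    return ranked

# Invented data: each dictionary is one matched series on one scaffold.
database = [
    {"H": 5.0, "F": 5.5, "Cl": 6.0, "Br": 6.4, "OMe": 5.2},
    {"H": 6.1, "F": 6.3, "Cl": 6.8, "Br": 6.7, "Me": 7.0},
    {"H": 5.9, "F": 5.2, "Cl": 5.0, "Br": 4.8},  # opposite trend
    {"H": 4.5, "F": 5.0, "Cl": 5.6, "Br": 6.0, "Me": 5.1},
]
query = {"H": 5.3, "F": 5.8, "Cl": 6.2}  # what you have made so far

ranked = recommend_next(query, database)
for r_group, fraction, n in ranked:
    print(r_group, round(fraction, 2), n)
```

The number of observations is kept next to each fraction on purpose, since a group that improved things one time out of one is a much weaker suggestion than one that did so in most of a few dozen series.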

And there's a key point: prediction and recommendation programs walk a fine line, between "There's no way I'm going out of my way to make that" and "I didn't need this program to tell me this". Sometimes there's hardly any space between those two territories at all. Where do this program's recommendations fall? As companies try this out in-house, some people will be finding out. . .

Comments (13) + TrackBacks (0) | Category: Drug Development | In Silico


1. petros on March 17, 2014 10:36 AM writes...

Long time since I've seen a Topliss tree mentioned. However, it was a very useful pragmatic approach to exploring substituents in the days when reactions were run in singlicate.

Interesting to see that this is another AZ study on matched pairs.


2. bhip on March 17, 2014 10:52 AM writes...

Can't help noticing that the chemist illustrated in the cartoon in the abstract appears to be of Asian ancestry...even cartoon chemists are being outsourced now....


3. does not meet journal standards on March 17, 2014 10:59 AM writes...

Pretty sure JMC has a policy of only accepting papers with experimental work. This paper has none. Sure they make a bunch of predictions and check them with chembl, but they used chembl to train. For shame, JMC editors.

We all know the real story, those rules only apply to the unclean.


4. Anonymous on March 17, 2014 11:12 AM writes...

Nature has already answered this question: just look at the range of amino acid side chains and add a few of those.


5. anonymous coward on March 17, 2014 11:27 AM writes...

3: Maybe they have an unspoken policy for publishing only studies with experimental work, but that isn't what they say:

They specifically include computational studies as examples in their topics list:

Substantially novel computational chemistry methods with demonstrated value for the identification, optimization, or target interaction analysis of bioactive molecules.

Though they say the method has to be validated, they don't say it has to be done experimentally - for example, it could more compactly account for previous results.


6. LeeH on March 17, 2014 12:00 PM writes...

People often miss the point of MMPs. They are not going to be indicative of average behavior. On the contrary. They are rare events, occurring way out on the edges of the curve. The question is - can those pairs (which really boil down to a particular molecular transformation) describe a change in structure that can be reapplied to some other compound, thereby changing some property in a favorable way? And is it obvious where on the starting structure it should be applied?


7. Noel O'Boyle on March 17, 2014 12:01 PM writes...

I'm one of the authors on the above paper. Thanks for the write-up Derek!

@3: The journal guidelines (see section are fairly transparent on what categories of computational work are accepted. In particular, we made a case that our manuscript was within scope as it fit under "Substantially novel methods along with evidence for utility in medicinal chemistry with significant potential for advancing the field".

Regarding the use of ChEMBL, if you are referring to the retrospective test, both the training data and test data were indeed from ChEMBL, but from different time periods (we predicted newer data using older data).


8. Ex Med Chem on March 17, 2014 1:13 PM writes...

There's a lot more to subtle group changes than trying to improve potency.

My experience is that potency on your intended target is the easiest part (i.e., usually the hit-to-lead phase gets you into the ballpark potency you want). It's the subtle or sometimes dramatic changes to balance everything else (during lead optimisation), from off-target activity, physical properties, PK, metabolism, that usually end in the not so futile Me, Et, Pr, Br, Cl, OR, etc etc.

I've seen many examples where this kind of scan of a position has found something unpredictable, such as a Cl picking up a strange H-bond type interaction.

As for "predictive computational models" resulting in "let's just make 1 or 2 to test the hypothesis" by using some flawed computational random ranking generator, I'd always balance this approach with a hefty dose of empirical med chem.


9. ScientistSailor on March 17, 2014 3:54 PM writes...

@1 Petros,
Funny I hadn't heard about Topliss in years either, however it was mentioned in a talk this morning at the meeting I'm at. So that's twice in one day. Maybe time to resurrect it?


10. Piero on March 18, 2014 3:41 AM writes...

An attempt of getting rid of even more "thinking head" chemists in favor of cheap hand labourers from far east?


11. Anonymous on March 18, 2014 6:20 AM writes...

Another useless paper.


12. Ex Med Chem on March 18, 2014 8:54 AM writes...

@11 nailed it!!


13. Noel O'Boyle on March 18, 2014 10:41 AM writes...

@6: But MMP does work quite well for physicochemical properties such as solubility/logP. This follows from the fact that group contribution approaches are widely used for such properties. But as you say, with activities, it doesn't work so well for the reason you and Derek describe.

@8: Sure, improving potency isn't everything and may indeed be the easiest part. That's no reason not to make it easier though. Also, the method is general; we've focused on potency as it's known that the matched pair approach doesn't work well for that.

I'm all against flawed computational random ranking generators too - we always use the Mersenne Twister.


