About this Author
Derek Lowe, an Arkansan by birth, got his BA from Hendrix College and his PhD in organic chemistry from Duke before spending time in Germany on a Humboldt Fellowship during his post-doc. He's worked for several major pharmaceutical companies since 1989 on drug discovery projects against schizophrenia, Alzheimer's, diabetes, osteoporosis, and other diseases. To contact Derek, email him directly or find him on Twitter: Dereklowe


In the Pipeline


June 4, 2014

Predicting New Targets - Another Approach


Posted by Derek

So you make a new chemical structure as part of a drug research program. What's it going to hit when it goes into an animal?

That question is a good indicator of the divide between the general public and actual chemists and pharmacologists. People without any med-chem background tend to think that we can predict these things, and people with it know that we can't predict much at all. Even just predicting activity at the actual desired target is no joke, and guessing what other targets a given compound might hit is, well, usually just guessing. We get surprised all the time.

That hasn't been for lack of trying, of course. Here's an effort from a few years ago on this exact question, and a team from Novartis has just published another approach. It builds on some earlier work of theirs (HTS fingerprints, HTSFP) that tries to classify compounds according to similar fingerprints of biological activity in suites of assays, rather than by their structures, and this latest one is called HTSFP-TID (target ID, and I think the acronym is getting a bit overloaded at that point).

"We apply HTSFP-TID to make predictions for 1,357 natural products (NPs) and 1,416 experimental small molecules and marketed drugs (hereafter generally referred to as drugs). Our large-scale target prediction enables us to detect differences in the protein classes predicted for the two data sets, reveal target classes that so far have been underrepresented in target elucidation efforts, and devise strategies for a more effective targeting of the druggable genome. Our results show that even for highly investigated compounds such as marketed drugs, HTSFP-TID provides fresh hypotheses that were previously not pursued because they were not obvious based on the chemical structure of a molecule or against human intuition."
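The core HTSFP idea (comparing compounds by their pattern of assay activities rather than by their structures) can be sketched in a few lines. This is a toy illustration, not the Novartis code; the function name and the min_shared cutoff are invented:

```python
import math

def htsfp_similarity(fp_a, fp_b, min_shared=10):
    """Pearson correlation of two biological-activity fingerprints over
    the assays both compounds were actually run in (the data are sparse:
    no compound has been through every assay)."""
    shared = sorted(set(fp_a) & set(fp_b))
    if len(shared) < min_shared:
        return None  # too little common assay data to compare profiles
    xs = [fp_a[a] for a in shared]
    ys = [fp_b[a] for a in shared]
    n = len(shared)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0.0 or sy == 0.0:
        return 0.0  # a flat profile carries no correlation information
    return cov / (sx * sy)
```

Compounds with similar fingerprints are then hypothesized to share targets, even when their structures look nothing alike.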

They have up to 230 or so assays to pick from, although it's for sure that none of the compounds have been through all of them. They required that any given compound have at least 50 different assays to its name, though (and these were dealt with as standard deviations off the mean, to keep things comparable). And what they found shows some interesting (and believable) discrepancies between the two sets of compounds. The natural product set gave mostly predictions for enzyme targets (70%), half of them being kinases. Proteases were about 15% of the target predictions, and only 4% were predicted GPCR targets. The drug-like set also predicted a lot of kinase interactions (44%), and this from a set where only 20% of the compounds were known to hit any kinases before. But it had only 5% protease target predictions, as opposed to 23% GPCR target predictions.
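The "standard deviations off the mean" treatment described above is ordinary per-assay z-scoring. A minimal sketch, with an invented data layout (a dict of sparse per-compound readouts) and the 50-assay floor as a parameter:

```python
def zscore_fingerprints(readouts, min_assays=50):
    """readouts: {compound: {assay: raw_value}}, sparse by construction.
    Returns z-scored fingerprints for compounds with enough assay data."""
    # Per-assay mean and (population) standard deviation across all
    # compounds actually tested in that assay.
    by_assay = {}
    for vals in readouts.values():
        for assay, v in vals.items():
            by_assay.setdefault(assay, []).append(v)
    stats = {}
    for assay, vs in by_assay.items():
        mean = sum(vs) / len(vs)
        var = sum((v - mean) ** 2 for v in vs) / len(vs)
        stats[assay] = (mean, var ** 0.5)
    fps = {}
    for comp, vals in readouts.items():
        if len(vals) < min_assays:
            continue  # drop compounds below the 50-assay floor
        fps[comp] = {
            a: (v - stats[a][0]) / stats[a][1] if stats[a][1] > 0 else 0.0
            for a, v in vals.items()
        }
    return fps
```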

The group took a subset of compounds and ran them through new assays to see how the predictions came out, and the results weren't bad - overall, about 73% of the predictions were borne out by experiment. The kinase predictions, especially, seemed fairly accurate, although the GPCR calls were less so. They identified several new modes of action for existing compounds (a few of which they later discovered buried in the literature). They also tried a set of predictions based on chemical descriptors (the other standard approach), but found a lower hit rate. Interestingly, though, the two methods tended to give orthogonal predictions, which suggests that you might want to run things both ways if you care enough. Such efforts would seem particularly useful as you push into weirdo chemical or biological space, where we'll take whatever guidance we can get.
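That "orthogonal predictions" observation suggests a simple way to use both methods together: pool the hypotheses and check how little they overlap. A toy sketch (function and target names invented):

```python
def prediction_overlap(hts_preds, struct_preds):
    """Jaccard overlap of two target-prediction sets; a value near zero
    means the two methods are largely orthogonal."""
    a, b = set(hts_preds), set(struct_preds)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def pooled_predictions(hts_preds, struct_preds):
    # Run things both ways, as suggested above, and pool the hypotheses.
    return set(hts_preds) | set(struct_preds)
```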

Novartis has 1.8 million compounds to work with, and plenty of assay data. It would be worth knowing what some other large collections would yield with the same algorithms: if you used (say) Merck's in-house data as a training set, and then applied it to all the compounds in the CHEMBL database, how similar would the set of predictions for them be? I'd very much like for someone to do something like this (and publish the results), but we'll see if that happens or not.

Comments (11) + TrackBacks (0) | Category: Drug Assays | In Silico


1. Anonymous on June 4, 2014 8:33 AM writes...

Will each additional assay provide as much value as the last? Seems like a great way to increase R&D costs and decrease R&D productivity even further, if you ask me.

Permalink to Comment

2. anon the II on June 4, 2014 8:58 AM writes...

If one guy can do this part time, then fine. Otherwise, hire a few medicinal chemists. They can do the job better, make more molecules at the same time and they need the work.

Permalink to Comment

3. Neo on June 4, 2014 9:24 AM writes...

Derek, your comment "People without any med-chem background tend to think that we can predict these things, and people with it know that we can't predict much at all." puzzles me.

The authors of this study achieved a 73.8% hit rate in prospective validations. Are you implying that they are lying? Or perhaps that these Novartis scientists don't have any med-chem background?

Perhaps you should look for other informatics collaborators if you think this way...

Permalink to Comment

4. Cellbio on June 4, 2014 10:01 AM writes...


You ask a very relevant question. In my opinion, drawn from experience, one does have to assess the value of each assay. At some point in the evolution of a data set, it becomes possible to compare how each assay is segregating compound behavior. The work I was associated with demonstrated that a small number of assays, ~4 bioassays chosen from a larger set, were sufficient to account for the bulk of behavioral diversity.

The value of these distinctions is not guaranteed; however, it could be demonstrated that screening for previously unappreciated biological impacts did segregate compounds that had very different profiles in tox studies (benign to nasty).

To be fair, most of my experience could suffer from over-fitting data sets. Only one example was prospective, but it did work: it found compounds that escaped tox problems which were not associated with biochemical profiles, and therefore only became known after scale-up and expensive tox studies. Only a few examples like this, ones that save you from a preclinical or clinical failure, are needed to offset the costs of the additional screening data.

However, it is true that measuring more is not an assurance of success. In my opinion, this is one of the ways that pharma responds, non-productively, to risk inherent in human biology and pharmacological intervention.

Permalink to Comment

5. Derek Lowe on June 4, 2014 10:39 AM writes...

#3 Neo - I'm not putting down the Novartis work at all; I think it's quite worthwhile. It does indeed look like a step towards predicting such things. But there's a long way to go: the fact that they confirmed 73% of their predictions is good news, but what we don't know are the false negatives: how many of the things that were predicted to be inactive were actually active? That would require a good deal of (tedious) work to get data on.

And even more importantly, this technique generates information on binding sites that it knows about, based on assay data. There are far more binding sites in vivo, though, and most of them are very poorly characterized, if at all. There's not much way an effort like this could tell you about many of those; they're "unknown unknowns".

What I run into when talking with the general public, though, is some sort of idea that we can look at a compound and get some sort of instant profile on it - "Oh, that'll do this and that and the other thing". Compared to that, we really can predict very little.

Permalink to Comment
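The false-negative point in comment 5 is the standard precision/recall distinction: the 73% figure is a precision-like number, and recall stays unknown until the predicted-inactive compounds are also assayed. A toy sketch with invented compound labels:

```python
def confusion_counts(predicted_active, truly_active, all_compounds):
    """Counts for a binary prediction: (tp, fp, fn, tn)."""
    p, t = set(predicted_active), set(truly_active)
    tp = len(p & t)
    fp = len(p - t)
    fn = len(t - p)  # actives the method missed: the unknown quantity here
    tn = len(set(all_compounds) - (p | t))
    return tp, fp, fn, tn

def precision(tp, fp):
    return tp / (tp + fp)  # what the 73% figure resembles

def recall(tp, fn):
    return tp / (tp + fn)  # unknowable without assaying predicted inactives
```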

6. simpl on June 4, 2014 11:03 AM writes...

"To predict truly novel and unexpected small molecule–target interactions, compounds must be compared by means other than their chemical structure alone".
Chemistry is essentially about predicting reactivity, m.p., solubility etc. on the basis of structure. I finally grasped the paradox when the HTS people explained that they were searching to maximise diversity of structures with a similar effect, so that they can choose between alternative structural strategies when tox. problems appear.

Permalink to Comment

7. JoJo on June 4, 2014 11:23 AM writes...

"we can look at a compound and get some sort of instant profile on it - 'Oh, that'll do this and that and the other thing'"

Isn't that what med. chemists do all the time? I see such comments on this blog all the time, including phys. chem. properties...

Permalink to Comment

8. Joshua Cranmer on June 4, 2014 3:25 PM writes...

Just reporting predictive success rates can be misleading. For example, I can build a spam detector with roughly 90% accuracy: just say every message is spam. It's not clear to me from this description (I never took any biochem) how much of the 73% accuracy can be gotten just by blind and naive guessing instead of actually intelligent prediction.

Permalink to Comment
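Comment 8's baseline worry is easy to make concrete: a predictor that always returns the majority class can score high on raw accuracy while saying nothing. A toy sketch with invented labels:

```python
def majority_baseline_accuracy(labels):
    """Accuracy of a 'predictor' that always guesses the most common label."""
    counts = {}
    for lab in labels:
        counts[lab] = counts.get(lab, 0) + 1
    return max(counts.values()) / len(labels)
```

Any reported hit rate has to beat this kind of floor before it means much.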

9. Harrison on June 4, 2014 5:46 PM writes...

Although he might not have meant it this way, I interpreted Derek's comment about in vivo testing as a statement towards the PETA types who insist that science is beyond animal testing and that it is completely unnecessary. I think most pharmacologists would love a completely in vitro system that accurately predicted activity 80% of the time.

Permalink to Comment

10. Anonymous on June 4, 2014 6:25 PM writes...

People should read Shoichet's paper on polypharmacology.

Permalink to Comment

11. pgwu on June 4, 2014 10:30 PM writes...

Wonder how this approach differs from what Terrapin (now Telik) started about 20 years ago using HTS data to get some kind of molecular fingerprints.

Permalink to Comment

