About this Author

Derek Lowe, an Arkansan by birth, got his BA from Hendrix College and his PhD in organic chemistry from Duke before spending time in Germany on a Humboldt Fellowship for his post-doc. He's worked for several major pharmaceutical companies since 1989 on drug discovery projects against schizophrenia, Alzheimer's, diabetes, osteoporosis and other diseases. Twitter: Dereklowe


In the Pipeline


March 14, 2012

The Blackian Demon of Drug Discovery


Posted by Derek

There's an on-line appendix to that Nature Reviews Drug Discovery article that I've been writing about, and I don't think that many people have read it yet. Jack Scannell, one of the authors, sent along a note about it, and he's interested to see what the readership here makes of it.

It gets to the point that came up in the comments to this post, about the order that you do your screening assays in (see #55 and #56). Do you run everything through a binding assay first, or do you run things through a phenotypic assay first and then try to figure out how they bind? More generally, with either sort of assay, is it better to do a large random screen first off, or is it better to do iterative rounds of SAR from a smaller data set? (I'm distinguishing those two because phenotypic assays provide very different sorts of data density than do focused binding assays).

Statistically, there's actually a pretty big difference there. I'll quote from the appendix:

Imagine that you know all of the 600,000 or so words in the English language and that you are asked to guess an English word written in a sealed envelope. You are offered two search strategies. The first is the familiar ‘20 questions’ game. You can ask a series of questions. You are provided with a "yes" or "no" answer to each, and you win if you guess the word in the envelope having asked 20 questions or fewer. The second strategy is a brute force method. You get 20,000 guesses, but you only get a "yes" or "no" once you have made all 20,000 guesses. So which is more likely to succeed, 20 questions or 20,000 guesses?

A skilled player should usually succeed with 20 questions (since 600,000 is less than 2^20) but would fail nearly 97% of the time with "only" 20,000 guesses.

Our view is that the old iterative method of drug discovery was more like 20 questions, while HTS of a static compound library is more like 20,000 guesses. With the iterative approach, the characteristics of each molecule could be measured on several dimensions (for example, potency, toxicity, ADME). This led to multidimensional structure–activity relationships, which in turn meant that each new generation of candidates tended to be better than the previous generation. In conventional HTS, on the other hand, search is focused on a small and pre-defined part of chemical space, with potency alone as the dominant factor for molecular selection.
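The arithmetic behind the 20 questions comparison is easy to check. Here is a minimal sketch, assuming (as the appendix does) 600,000 equally likely words and perfectly informative yes/no questions; the variable names are mine, not the authors':

```python
import math

VOCABULARY = 600_000  # words, as stated in the appendix
QUESTIONS = 20
GUESSES = 20_000

# Each perfect yes/no question halves the candidate set,
# so k questions can distinguish up to 2**k words.
questions_needed = math.ceil(math.log2(VOCABULARY))
distinguishable = 2 ** QUESTIONS

# Brute force succeeds only if the word is among the guesses made.
p_brute_success = GUESSES / VOCABULARY

print(questions_needed)                 # 20
print(distinguishable)                  # 1048576
print(round(1 - p_brute_success, 3))    # 0.967
```

So 20 ideal questions cover the vocabulary with room to spare, while 20,000 blind guesses cover only about 3.3% of it.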

Aha, you say, but the game of twenty questions is equivalent to running perfect experiments each time: "Is the word a noun? Does it have more than five letters?" and so on. Each question carves up the 600,000 word set flawlessly and iteratively, and you never have to backtrack. Good experimental design aspires to that, but it's a hard standard to reach. Too often, we get answers that would correspond to "Well, it can be used like a noun on Tuesdays, but if it's more than five letters, then that switches to Wednesday, unless it starts with a vowel".

The authors try to address this multi-dimensionality with a thought experiment. Imagine chemical SAR space - huge number of points, large number of parameters needed to describe each point.

Imagine we have two search strategies to find the single best molecule in this space. One is a brute force search, which assays a molecule and then simply steps to the next molecule, and so exhaustively searches the entire space. We call this "super-HTS". The other, which we call the “Blackian demon” (in reference to the “Darwinian demon”, which is used sometimes to reflect ideal performance in evolutionary thought experiments, and in tribute to James Black, often acknowledged as one of the most successful drug discoverers), is equivalent to an omniscient drug designer who can assay a molecule, and then make a single chemical modification to step it one position through chemical space, and who can then assay the new molecule, modify it again, and so on. The Blackian demon can make only one step at a time, to a nearest neighbour molecule, but it always steps in the right direction; towards the best molecule in the space. . .

The number of steps for the Blackian demon follows from simple geometry. If you have a d dimensional space with n nodes in the space, and – for simplicity – these are arranged in a neat line, square, cube, or hypercube, you can traverse the entire space, from corner to corner with d x (n^(1/d)-1) steps. This is because each edge is n^(1/d) nodes in length, and there are d edges to traverse. . . When the search space is high dimensional (as is chemical space) and there is a very large number of nodes (as is the case for drug-like molecules), the Blackian demon is many orders of magnitude more efficient than super-HTS. For example, in a 10 dimensional space with 10^40 molecules, the Blackian demon can search the entire space in 10^5 steps (or less), while the brute force method requires 10^40 steps.
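For anyone who wants to plug in their own numbers, here is a small sketch of the step count quoted above. It takes the side length of the hypercube (n^(1/d)) as an integer input rather than computing a floating-point root; the function names are just illustrative:

```python
def demon_steps(dimensions: int, side: int) -> int:
    """Steps to walk corner to corner of a hypercube grid, one axis move at a time."""
    return dimensions * (side - 1)


def brute_force_steps(dimensions: int, side: int) -> int:
    """Total number of nodes in the grid, n = side**dimensions."""
    return side ** dimensions


d, side = 10, 10**4  # 10 dimensions, 10^4 nodes along each edge
print(brute_force_steps(d, side) == 10**40)  # True
print(demon_steps(d, side))                  # 99990
```

That 99,990 is the "10^5 steps (or less)" figure from the appendix.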

These are idealized cases, needless to say. One problem is that none of us are exactly Blackian demons - what if you don't always make the right step to the next molecule? What if your iteration only gives one out of ten molecules that get better, or one out of a hundred? I'd be interested to see how that affects the mathematical argument.
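One rough way to put numbers on that question, under assumptions that are mine and not the authors': suppose each attempted modification independently has probability p of being a step in the right direction, and a bad attempt costs one assay but is simply discarded. Then each good step takes 1/p attempts on average, and the total scales accordingly:

```python
from fractions import Fraction


def expected_assays(dimensions: int, side: int, p_good_step: Fraction) -> Fraction:
    """Expected assays for an imperfect demon (hypothetical model).

    Assumes independent attempts and that a failed attempt leaves the
    demon where it was, so each good step needs 1/p attempts on average.
    """
    perfect_steps = dimensions * (side - 1)
    return perfect_steps / p_good_step


for p in (Fraction(1), Fraction(1, 10), Fraction(1, 100)):
    print(p, int(expected_assays(10, 10**4, p)))
```

Under this toy model, a one-in-a-hundred success rate costs about 10^7 assays instead of 10^5, still nowhere near 10^40. It says nothing about the harder case where a wrong step cannot be recognized as wrong.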

And there's another conceptual problem: for many points in chemical space, things are much sparser still. One assumption with this thought experiment (correct me if I'm wrong) is that there actually is a better node to move to each time. But for any drug target, there are huge regions of flat, dead, inactive, un-assayable chemical space. If you started off in one of those, you could iterate until your hair fell out and never get out of the hole. And that leads to another objection to the ground rules of this exercise: no one tries to optimize by random HTS. It's only used to get starting points for medicinal chemists to work on, to make sure that they're not starting in one of those "dead zones". Thoughts?
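The "dead zone" problem shows up even in a toy example. The sketch below is a plain greedy hill climber on a made-up one-dimensional activity landscape (the numbers are invented for illustration), and it only moves when a neighbour is measurably better:

```python
from typing import Sequence


def hill_climb(activity: Sequence[float], start: int) -> int:
    """Move to a neighbour only if it is strictly more active; return the final index."""
    position = start
    while True:
        neighbours = [i for i in (position - 1, position + 1) if 0 <= i < len(activity)]
        best = max(neighbours, key=lambda i: activity[i])
        if activity[best] <= activity[position]:
            return position
        position = best


# Toy 1-D "chemical space": mostly inactive, with one active region.
activity = [0, 0, 0, 0, 0, 0, 1, 3, 7, 4, 0, 0]

print(hill_climb(activity, start=2))  # 2: stuck in the dead zone
print(hill_climb(activity, start=6))  # 8: climbs to the peak
```

Started in the flat region, the climber never moves; started on the edge of the active region (the sort of place a screening hit puts you), it walks straight to the peak.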

Comments (45) + TrackBacks (0) | Category: Drug Assays | Drug Development | Drug Industry History


1. Generic Company Team Player on March 14, 2012 9:38 AM writes...

Off Topic Post - I am sure that this has been brought up before but all this talk about the nuts and bolts of drug discovery has really got me interested. Is there a 'Drug Discovery for Idiots' type of book that could provide some background? I have a BS in Chemistry but only started working in pharmaceuticals about 3 years ago. An Amazon search shows plenty of books but they get pricey. Any suggestions to separate the good from the bad? Thank you and sorry for the off topic post.


2. LeeH on March 14, 2012 10:04 AM writes...


You have broken the drug discovery optimization process down reasonably well, but you haven't accounted for hybrid methods. For instance, in genetic optimization, you have a greedy component (the crossing over, creating and selecting for offspring who have inherited exactly certain characteristics of both parents) which tends to push the process downhill, and the random component (mutations, which can give the offspring characteristics that are completely novel), which tries to keep the population from completely being mired in local minima (optimizing by random HTS in your example). I would argue that inherently lead optimization does have elements of both processes. New molecules definitely inherit structure from older ones, and new elements creep in by virtue of adding new fragments or transforming existing ones. The problem with performing too big a mutation is that, as you point out, chemical space is so vast that if one takes too big a step, you're almost guaranteed of having a completely inactive compound. No project team can tolerate (for whatever reason) very much of this.

So in the end, similarity is king, but sometimes he is a cruel despot.


3. Anonymous on March 14, 2012 10:13 AM writes...

HTS is one of the questions asked.


4. John Spevacek on March 14, 2012 10:17 AM writes...

Mathematically, this is multivariable optimization. While many routines exist, there is no way to guarantee that the found optimum is a global optimum and not just a local optimum. A corollary to this is that the global optimum is seldom found by optimizing each variable independently. And since this can be reduced to a mathematical problem, all fields of science (hard or otherwise) would kill for the results.

Consider the 2 variable problem of finding the latitude and longitude of the highest point in North America (Denali, in Alaska). HTS could be equivalent to getting the elevation at every degree of longitude and latitude, which would quickly tell you to search the western side of the continent. But if the Demon were to start at my hometown of St. Paul, it would spend a tremendous amount of time stuck in the Great Plains, and only by knowing a priori about the Rocky Mountains would it be able to move in that direction. But being a demon, it does have that knowledge.

The worst option (and yet this is so commonly done) would be to find the maximum elevation along the longitude and latitude separately. Again starting in St. Paul and looking at the 93W longitude line, you would find the greatest elevation somewhere up in northern Minnesota, about 48N latitude. Then looking along that line, you would end up somewhere in western Montana, nowhere near the optimal solution. If you were to explore longitude first and then latitude, you wouldn't even end up in the same spot, and while you could decide which of these two locations is better, you still wouldn't know that higher elevations exist.

And this is just with 2 variables. Imagine 10.
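The one-variable-at-a-time trap described above can be reproduced on a tiny invented grid. This is only a sketch: the "elevations" are made up, rows stand in for latitude and columns for longitude, and ties are broken by taking the first maximum:

```python
# Toy elevation map: a low hill in the upper left, the true summit at lower right.
ELEVATION = [
    [1, 2, 1, 0, 0],
    [2, 3, 2, 0, 0],
    [1, 2, 1, 0, 0],
    [0, 0, 0, 5, 9],
    [0, 0, 0, 4, 5],
]


def one_variable_at_a_time(grid, row, col):
    """Alternately pick the best row for the current column, then the best column for that row."""
    while True:
        new_row = max(range(len(grid)), key=lambda r: grid[r][col])
        new_col = max(range(len(grid[0])), key=lambda c: grid[new_row][c])
        if (new_row, new_col) == (row, col):
            return row, col
        row, col = new_row, new_col


def exhaustive(grid):
    """Check every cell, the analogue of measuring elevation everywhere."""
    cells = [(r, c) for r in range(len(grid)) for c in range(len(grid[0]))]
    return max(cells, key=lambda rc: grid[rc[0]][rc[1]])


print(one_variable_at_a_time(ELEVATION, 0, 0))  # (1, 1), elevation 3
print(exhaustive(ELEVATION))                    # (3, 4), elevation 9
```

Starting from the upper-left corner, the coordinate-wise search settles on the small hill and never learns that a higher summit exists, because no single row or column it inspects passes through it.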


5. luigi on March 14, 2012 10:45 AM writes...

Re #1 and "Drug Discovery for Dummies": a very interesting, although not perfect, book is "Real World Drug Discovery" by R.M. Rydzewski. Amazon has it. I found it to be pretty good without being either pedantic or having a big axe to grind.


6. Jim on March 14, 2012 10:54 AM writes...

To quote Chevy Chase imitating Gerald Ford, "I was told there'd be no math." Still, very interesting stuff. LeeH has it dead on - optimization for one variable (PK, affinity, solubility, CNS penetration) will affect the limit of quality of the other variables. There probably is no "right" answer for the best molecule, but there is a molecule that may best fit a given company's desired profile. It seems like you need a couple of demons going off and communicating with each other, building on their collective knowledge. Oh wait, that's a (good, high-functioning) chemistry department.


7. Rick Wobbe on March 14, 2012 11:06 AM writes...

There are at least three major themes in your question, each of which could fill volumes: 1, what's the optimal chemical diversity for primary screening? 2, What's the optimal screen for primary screening?, 3, What's the best biology and chemistry to use for early lead optimization? On top of that, answers to questions 2 and 3 would necessarily begin with "That depends..."

Here's my thought about Q1 and Q2 under conditions where you have a genetically validated target whose physiologic function we probably don't fully understand (which is probably most of them), no known small molecule substrate, the possibility of direct interaction with other proteins in a pathway, fewer than a dozen homologs (modest size gene family) and, finally, a phenotypic assay with a throughput of thousands of compounds per week. 1, run the phenotypic screen on mixtures of 10 compounds/assay from the most diverse library you can put together (defined by the best diversity algorithm you can find). 2, deconvolute the "hits" by titrating individual compounds in the primary assay, which'll also give you a ballpark potency. 3, Run the highest potency hits plus related compounds from your library that were inactive in the primary screen against the target assay.

This should leave you with 3 types of active compounds: 1, those that hit the phenotypic and target assays at sensical concentrations; related compounds that were inactive in phenotypic assay also inactive in target assay; 2, those that hit phenotypic and target assay; but related compounds showed divergent activity in phenotypic and target assays; 3, those that hit the phenotypic assay but did not hit the target assay. All three classes are interesting, but require different follow-up, which you may prioritize differently.
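The pooled primary screen and deconvolution in steps 1 and 2 above can be sketched in a few lines. This is a simplified model with assumptions of my own: a mixture reads positive if and only if it contains at least one active compound (no masking, synergy, or assay noise), and `pooled_screen`, `is_active`, and the compound IDs are all hypothetical:

```python
def pooled_screen(compounds, is_active, pool_size=10):
    """Return (hits, assays_run) for a pooled screen followed by deconvolution."""
    assays = 0
    hits = []
    for start in range(0, len(compounds), pool_size):
        pool = compounds[start:start + pool_size]
        assays += 1  # one assay per mixture
        if any(is_active(c) for c in pool):
            for c in pool:  # deconvolute: retest each member on its own
                assays += 1
                if is_active(c):
                    hits.append(c)
    return hits, assays


compounds = list(range(1000))
actives = {17, 503, 509}  # invented "true" actives for the simulation
hits, assays = pooled_screen(compounds, lambda c: c in actives)

print(hits)    # [17, 503, 509]
print(assays)  # 120
```

With a low hit rate, 1,000 compounds cost 120 assays here rather than 1,000, which is the appeal of pooling when phenotypic throughput is limited. Real mixtures will not behave this cleanly.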

Compounds that hit the phenotypic and target assays whose relatives have parallel activity in phenotypic and target assays would be candidates for a target-based optimization, though I would strongly recommend doing what you can to show that hits inhibit phenotypic assay BY hitting the target in situ (many of the same tools for genetic target validation are B-E-A-utiful for this!)

For compounds (IMHO potentially very interesting and novel) that hit the phenotypic assay, but not the target assay, run as many off-the-shelf analogs as you can to see if there's evidence of an SAR in the phenotypic assay. Deprioritize compounds that don't show evidence of SAR. Then, use a set of active and inactive compounds, along with those wonderful genetic validation tools (again!) to ask some rudimentary mechanism questions (e.g. is the actual target likely in the same pathway as my genetically validated target?) You will get two kinds of compounds, both of which are interesting: those that impinge on your validated target's pathway and those that don't. For both types, you can continue to apply SAR and genetic validation tools to identify and validate the actual target, which may prove to be more interesting and useful than your original target!

The middle class of compounds, those that suggest divergent SAR in phenotypic and target assays, will require some head-scratching and you may want to set those aside because the results may tell you more about the quality of your library than the SAR of the compounds.

This will give you a tremendous amount of information, almost all of which will give you very important insights into the druggability of the phenomenon you're trying to address. By starting with a phenotypic assay, you will learn things that you simply could not have learned any other way - think of it as sweeping "knowledge-space", not just chemistry space - no matter how clever you are, and that's often the basis of a competitive advantage.

Honestly, I'm a little embarrassed by what I've written above because I've taken a long time to describe what a great many people have considered to be the most sensible path for a very, very long time. Bottom line is that we should reconsider the paradigm of matching phenotypic assays with the broadest chemical diversity early, but NOT at the expense of using genetic target validation tools and target-based assays in following up. I still believe knowing the target can be like rocket fuel for any program in its early stages even as I believe that relying on it too much too early can be counterproductive.

By the way, if you want an example of compounds that followed this pathway, look at the HCV NS5A "inhibitors": pM compounds in whole cell assays with good preclinical and clinical pharmacology (so far!) that simply would not have been discovered using binding to the target as the primary assay.


8. HTSguy on March 14, 2012 11:07 AM writes...

I agree w/ #3. Unless there is an active starting point, a purely iterative approach is unlikely to work, as optimization requires that you get results other than 0 activity (or statistically indistinguishable from 0) within one "unit" of your starting point. Since >99% of compounds in "random" libraries are inactive, this seems unlikely. There are also the problems of local optima mentioned by #4.


9. Anon on March 14, 2012 11:26 AM writes...

While I know a point of this post is to deduce the best molecule using a logical, reductionist process, I feel that if it could be put in a process everyone would be doing it. And if it could be put in a linear process (test w, then modify it to x, test y, modify to z, etc) the Chinese CROs would have been dominating from day one.
I ask, is this a moot point? What is the culture of iterative vs. HTS?
We know how management works, and they'd never be patient enough for an iterative method. Can you imagine the face of an exec who is told that the project is moving by one experiment at a time? Maybe this is how smaller companies are going to be the way of the future? HTS on the other hand? That is WAY more in line with management's bonus structure. It allows you to accomplish your goals by hiring, firing, shifting, acquiring, merging, etc. Which is exactly what they are trained to do. They have been educated a certain way and have a manage.

The example I'd put up: in HTS, management likes to pick the best examples out of the screen, and that means the final product will be the best. (What it really gets you is for the Simpsons followers). As opposed to a refined method that inherently relies on advances to get to the next step (


10. g on March 14, 2012 12:05 PM writes...

I think that a big problem with the HTS method is that it reduces serendipity. You are looking for a specific key for a specific lock that may not even be important for that disease. This is the basic research bias-you think you know the disease, but you really don't. Even if you find that right key, it doesn't help treat the disease.

With more phenotypic approaches, you may find something from serendipity that works wonderfully, but does not target that one protein that you thought was so important. In fact, it might have high potency on multiple targets.

A limitation with phenotypic screening is that your models may not be very good and then you are simply finding compounds that pass your low-validity models.


11. DCRogers on March 14, 2012 12:12 PM writes...

#3, nice analogy.

My thought was similar: that the 'best' case is not a single Blackian demon, but a roomful of them, all advising you: one for binding, one for transport, one for liver tox, one for hERG, one for metabolic stability, one for... you get the picture. And in this 'best' case where each demon gives the dang-best advice, you'd still end up with an irreconcilable cacophony of conflicting suggestions.

It's important to distinguish two kinds of high dimensionality here: of the molecules, and of the dependent variables. The former gets more explicit attention (e.g., surveying this large space is the point of HTS) but the latter is where most real pain (e.g., Phase III failures) resides.


12. NJBiologist on March 14, 2012 12:12 PM writes...

@4, John Spevacek: The issues you're describing are analogous to those in nonlinear regression. Goodness of fit must be optimized by finding (sometimes several) good values for variables that don't have linear effects (and which may interact). Strategies for managing this seem to come in three flavors: observing the results for goodness of fit after you're done, selecting initial values that approximate the final value whenever possible, and tinkering with the search strategy by which you modify the initial value for new test values. All have their place, although attention to the last is probably the most helpful in your case. If you know the country is six time zones wide, and your algorithm has only searched the Great Plains, it's time for a change.


13. Jack Scannell on March 14, 2012 12:13 PM writes...

I can claim the dubious "honour" of inventing the idea of the “Blackian demon”. I will briefly explain why I did it, then suggest another possible analogy, and finally make a couple of comments on the other posts.

First, the motivation: I am struck that R&D efficiency has declined on some important measures while things that - superficially at least - should have made it more efficient have improved (by orders of magnitude in some cases). I wonder if part of the explanation could be a change in the way that chemical space is searched. I don't know enough about chemical space to answer the question myself. However, I do know that the efficiency of different search methods can vary massively with the nature of the space that is being searched, and that high dimensional spaces have some unusual properties that mean navigating them can be counter-intuitive. Perhaps a process that starts with a static pre-defined library (inevitably tiny vs. the universe of drug-like molecules) is intrinsically less likely to find an acceptable local optimum than one that allows directed iterative search from the start? That is the general question I am trying to raise. Clever quantitative chemists can probably answer the question fairly quickly one way or the other.

Second, there is another analogy that I nearly used and it may be better than either 20 questions or the Blackian demon. It is the somewhat familiar “six degrees of separation” idea. Suppose we want to find someone who has shaken hands with the president of the United States. Here the HTS analogy would be to build a big machine and ask 100,000 randomly selected people the question “have YOU shaken hands with the president?”. The pre-1980s method would be to randomly select one person and ask “can you direct me to the person you know who is most likely to have shaken hands with the president”. This should get you there in six (or fewer) steps. This closeness follows from the topography of high dimensional spaces, such as social networks [For more on this see Watts and Strogatz, who we cite in the NRDD article]. It is possible to span large distances with a small number of steps, and the chance of getting trapped in local optima can be low. This brings me on to the third point…..

Third, to respond specifically to post #4, we did, in the thought experiment, allow the Blackian demon to make only small steps. We did, however, allow the demon to look far into the distance before making each of the steps (as noted in post #4). Infinite vision is clearly cheating. Again, though, I wonder if someone with better skills than I have could model this question in a more plausible way. Given the nature of chemical space (local optima, sparse deserts, etc.), how far does the demon have to see in order to be better than a static library that has been pre-selected? What size steps could it plausibly take? How often would it step in the right direction given what can reasonably be measured to generate structure-activity relationships? How many molecules would you need at each iteration to generate decent structure-activity relationships? Etc? Etc?


14. alig on March 14, 2012 12:23 PM writes...

This also seems to make the flawed assumption that only one right answer exists. There is obviously more than one drug per disease and even more than one drug per target. So the question is how to find a right answer, not the right answer.


15. molecular architect on March 14, 2012 12:31 PM writes...

@ 1 Generic Company Team Player:

imho, one of the best introductory books for a chemist entering pharma research is

"Real World Drug Discovery: A Chemist's Guide to Biotech and Pharmaceutical Research" by Robert Rydzewski
i.s.b.n. 0080466176

I used to work with Bob, he knows this stuff well.


16. noname on March 14, 2012 12:47 PM writes...

Am I missing something? HTS doesn't produce drugs or candidates; it produces starting points for iterative medicinal chemistry. The 20 vs. 20K question thought experiment is therefore a red herring. The outputs are not comparable. In chemistry, unlike in English, there is no question you can ask that will neatly fractionate the space. i.e. you can't ask "will an active contain a pyridine ring?" and then subdivide the space on presence or absence of a pyridine.

When you have no starting point, what else can you do but randomly screen?


17. Rick Wobbe on March 14, 2012 12:48 PM writes...

Jack, 13, If I understand your high-level view of ways to survey chemistry and target space, it seems to assume that you're asking clear, unambiguous questions that elicit a small number of compact, completely non-deceptive answers, phrased in a syntax that you can understand and use immediately. I think that doesn't happen as often as we'd like, especially at the outset of a problem, and you often don't know if you've done that until after you've gone down the path a while to find out where you messed up the process. Generally, you find out you need to ask different questions, phrased differently, of a different group and treat their answers with greater skepticism, which is inherently an iterative system that's predominantly empirical at the outset.


18. Hap on March 14, 2012 1:20 PM writes...

I'm misunderstanding the output of HTS. I thought that HTS would either 1) tell you the hit/non-hit status of the members of the library (with a specified cutoff - so you can say which X members are active of a library of Y members and thus that Y-X members have activity below threshold) or 2) tell you whether there were any hits from a compound library, and if so, allow you to determine and identify the hits relatively rapidly (not requiring synthesis and testing of all compounds or a significant fraction of them individually) - in addition, the lack of hits would be like 20,000 "No" answers rather than just one (it would tell you what doesn't work, which is much better than nothing). In either of these cases, HTS would give you significant information (assuming the compounds and the assay results are reliable and useful). What am I missing?


19. Jim on March 14, 2012 2:20 PM writes...

As I looked at Jack #13's statement about the declining efficiency of R&D, something crossed my mind that made me wonder if we're not fooling ourselves a bit. I've always bought into the declining efficiency story and there's just too much data out there to say that it's not true. But, if you consider that molecular biology has completely changed how this game is measured, we may not be doing as poorly as we think. I've only been in industry during the HTS era, so when I think of how drug discovery worked prior to that, I can only base that on what I've been told. What I do know now is that we can say program X, which was attempting to design an inhibitor of GPCR Y, failed, so we're at 0 for 1. Or 100. Back when screening was done with tissue baths and there were functional screens for a disease, you might have been looking at 5 or 6 different targets, so you never knew what your success rate was. Was it 1 for 5, or 1 for 10? 0 for 5 or 0 for 10?

I know we're bringing fewer drugs to market, but molecular biology has also changed what's required to do so. I never heard anyone say that drug discovery used to be easy.


20. MoMo on March 14, 2012 2:44 PM writes...

The post and responses here are typical of hackneyed science in drug discovery, and I am sorry I wasted heartbeats reading this.

It's a sign of scientific chaos and weakness when over-intellectualization occurs in drug discovery, and you all are guilty. Such approaches resulted in CombiChem, proteomics and a host of other "technologies" that have decimated the Pharma industry. So keep up the good work, you'll be selling Slurpies at the 7-11 any day now!

Just go get some molecules, roll up your sleeves and get to work! Cell, receptors and animals-It don't matter! You either get lucky in chemical space and activity or you BITE the HOOTERKNOB! And there are many chewing ferociously these days.

Talk is Cheap, and when you can't discover you start throwing such useless tripe-thought-patterns out at the general science public, thinking you all are geniuses.

You don't fool all of us. The industry is declining because you all want "form" and forgot about "function".


21. MTK on March 14, 2012 2:46 PM writes...

@13 and 17,

I think Rick sort of nailed it regarding the assumption of clear, unambiguous questions which give clean answers.

So what's the most efficient approach if we turn Jack's analogy of how to find someone who has shaken hands with the President into "Identify the US citizen most likely to be elected President in 2036?"


In essence that's what discovery groups try to do, right? Identify those candidates (fortuitously the same word in politics and drug discovery) that are most likely to succeed in the clinic.

Now you have to think about a myriad of factors which determine a person's electability and things are not quite as straightforward. I'm sure it could be modeled, but that requires a large number of assumptions.


22. noname on March 14, 2012 3:26 PM writes...

Any day you read the word hooterknob is a good day.


23. LeeH on March 14, 2012 4:10 PM writes...

Face it, guys. Drug discovery should be a (not necessarily simple) exercise in multiparameter optimization, but it's not. All of the properties that need to be ultimately optimized are NEVER measured for each compound. For practical reasons (cost and time), a few sentinel properties are measured - always activity, sometimes or rarely (depending on circumstance) solubility, hERG, some measure of absorption or metabolism or tox or something else, but NEVER human properties (such as oral absorption, until you're in clinical trials). So you're trying to solve a problem where you're not even completely sure what the critical issues are. This will ALWAYS be a problem where it's better to be lucky than good.

It's funny how HTS is often blamed. After 1 compound in a billion (my guess) possible descendant compounds (based on some arbitrary starting point) fails a clinical trial, somehow it's the starting point that was the issue. Not the unbelievably complex and chaotic path (however well-intentioned and scholarly) that that project team took. How can having more choices for a starting point be worse than having fewer?

(And don't bellyache about the quality of the hits from an HTS. If you get crap out, someone didn't do their homework when they put the collection together. It's not the HTS's fault that someone put them there in the first place)

Permalink to Comment

24. MikeC on March 14, 2012 5:33 PM writes...

Hey Derek,
thanks for picking up this thread. After making that comment (#56),
I was reading the NRDD paper (and thank you Jack Scannell for providing a reprint!), thinking about phenotypic vs binding assays, and wondering if the "low hanging fruit" argument could be rescued by considering not how easy a condition is to drug per se, but rather the availability of assays for the condition that are highly predictive of an approvable drug. Conditions with solid, highly predictive phenotypic animal models (pain, infection, diabetes, etc) would be the lowest hanging fruit. Conditions where the best assays are less predictive (they are noisy, off target, or measure a proxy rather than the condition itself) would be higher up. Conditions where creating the assay required new technology would be higher as well, and conditions with no good assay yet (Alzheimer's) would be mostly out of reach.

One could attempt to quantitate how predictive the phenotypic assays of each era were for the drugs actually finally approved and then look for a trend ... and the idea of doing that correctly makes my head hurt.

Permalink to Comment

25. Biotechtranslated on March 14, 2012 5:47 PM writes...

@24, MikeC

The exact same thought has crossed my mind as well!

This is probably really oversimplifying things, but what if more R&D dollars went into developing reliable phenotype assays?

We have to remember this is drug DISCOVERY. If you look into the golden age of drug discovery, you'll see that a lot of it was driven by biological discoveries (new receptors, new endogenous ligands, etc).

I just wonder if the switch from "let's find something new" to "let's see if we can push the boundaries of what we already know" has anything to do with the current R&D woes?

After all, R&D hasn't been all bad: in the last 20 years the number of new, innovative biologics developed has been incredible.


Permalink to Comment

26. MikeC on March 14, 2012 6:33 PM writes...

@25 "This is probably really oversimplifying things, but what if more R&D dollars went into developing reliable phenotype assays?"

That's certainly been suggested as a better focus. The problem (I think pointed out here some weeks ago) is that you generally can't know if you've created a good in vitro/in vivo assay until you have a good drug to test it against. Until then you're just measuring how well your assay correlates with your chosen proxies for success, not success itself.

Permalink to Comment

27. anon2 on March 14, 2012 7:22 PM writes...

HTS is a tool. The rest......blah, blah, blah, human behavior and ego.......

Permalink to Comment

28. SteveM on March 14, 2012 7:35 PM writes...

This is a great discussion. I started out in Chemistry, but then migrated to Applied Mathematics, (Mathematical Optimization mostly, but not Pharma related).

Since I moved out of the lab years ago, the instrumentation has gotten fantastic, (I was pre-FFT NMR) and on its face, computational chemistry has exploded. So I'd think that the increase in throughput per chemist would be exponential.

So what went wrong? I mean there is so much money thrown into the Pharma R&D domain, it's hard to believe that a shared understanding about Discovery Best Practices has not been developed.

I'm really curious as to what's going on there. Is the flaw MBA intrusion or scientific hubris?

Permalink to Comment

29. Anonymous BMS Researcher on March 14, 2012 8:06 PM writes...

One major problem in my experience is we lack a good way to rank-order our SAR matrices. I've sat in many a Working Group meeting where people say, "well these have the smallest EC50 values, but those have better solubility and these others have longer half-lives..."
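One way to see why such a matrix resists a single rank order is to ask which compounds are beaten on every measured property by some other compound. What follows is a minimal sketch with invented numbers; the `Compound` fields, the property names and the values are all assumptions made for illustration, not data from any real program. Anything left on the resulting "Pareto front" cannot be ranked against the others without deciding how much each property matters.

```python
from typing import NamedTuple

class Compound(NamedTuple):
    name: str
    ec50_nm: float        # potency: lower is better
    solubility_um: float  # higher is better
    half_life_h: float    # higher is better

def dominates(a, b):
    """True if a is at least as good as b on every property and strictly better on one."""
    at_least_as_good = (a.ec50_nm <= b.ec50_nm
                        and a.solubility_um >= b.solubility_um
                        and a.half_life_h >= b.half_life_h)
    strictly_better = (a.ec50_nm < b.ec50_nm
                       or a.solubility_um > b.solubility_um
                       or a.half_life_h > b.half_life_h)
    return at_least_as_good and strictly_better

def pareto_front(compounds):
    """Keep every compound that no other compound dominates."""
    return [c for c in compounds
            if not any(dominates(other, c) for other in compounds)]

# Hypothetical series: made-up values, for illustration only.
series = [
    Compound("A", ec50_nm=5,  solubility_um=10,  half_life_h=1.0),  # most potent
    Compound("B", ec50_nm=50, solubility_um=200, half_life_h=2.0),  # most soluble
    Compound("C", ec50_nm=80, solubility_um=40,  half_life_h=9.0),  # longest-lived
    Compound("D", ec50_nm=90, solubility_um=30,  half_life_h=1.5),  # worse than B and C
]

front = [c.name for c in pareto_front(series)]
print(front)  # ['A', 'B', 'C']
```

Only D can be discarded without an argument. Choosing among A, B and C takes a weighting of the properties that the data alone do not supply.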

There is also an interesting mathematical proof known as The No Free Lunch Theorem -- look it up in Wikipedia -- which basically says for any multidimensional search space there cannot exist a globally optimal search strategy unless you can take advantage of domain knowledge (I've been in Big Pharma too long, I nearly wrote "leverage domain knowledge!").

Permalink to Comment

30. Biotechtranslated on March 14, 2012 8:21 PM writes...

@26, MikeC

I agree it's not an easy answer and I hesitate to over-generalize as to what might be the best solution.

One can't help but notice how many breakthroughs in drug discovery were completely serendipitous. And I don't mean "we tried random things", but rather "I thought this would be a good drug for X, but it turns out it works for Y!".

So taking an even simpler view: Are we spending too much effort on "see if compound X works for assay Y" and too little effort on "I wonder if compound X will work for diseases A,J and U?"

I say this because although a binding assay might give you great data (if it says it binds, you can be pretty confident it does), it also gives you a binary answer (good enough/not good enough).

Throw a drug into an animal model and when that drug you thought would make a good antibiotic causes the animal to stay up all night, find out why!

I remember in my days in the lab when an AIDS drug we were working on was discontinued because the patients being dosed couldn't stop laughing. Nobody followed up as to why that happened, it was just judged as "not a suitable drug" and all further research was halted. Maybe it potentiated endogenous dopamine or maybe it hit an unknown receptor? I guess we'll never know.

A healthy dose of scientific curiosity can go pretty far when you're looking to break new ground.


Permalink to Comment

31. pharmadude on March 14, 2012 8:52 PM writes...

@30 cross screening of hits against other targets is very common. Actually, much of an HTS compound collection is made up of hits from other drug discovery programs. My take: local minima make it impossible for the 'demon' strategy to work in drug design. Med chemists are blind to their structural surroundings. They can be one fluorine away from success and not have the slightest clue of it...and end up dropping the program. Med chemists also have no idea how to jump from one scaffold to another. I don't think any of this is being taken into account when describing the benefit of an iterative approach versus HTS. Scannell says he allows the demon a long distance view of its surroundings. But the med chemists have no distant view, they generally can't see even a single atom switch away.
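The local-optimum problem can be shown with a toy greedy search. This is a sketch only: the one-dimensional `landscape` and its scores are invented, and real chemical space is neither one-dimensional nor this tidy. A searcher that only ever accepts a better immediate neighbor stops on whichever peak is nearest its starting point.

```python
def hill_climb(scores, start):
    """Greedy search: step to the better adjacent position until no neighbor improves."""
    pos = start
    while True:
        neighbors = [i for i in (pos - 1, pos + 1) if 0 <= i < len(scores)]
        best = max(neighbors, key=lambda i: scores[i])
        if scores[best] <= scores[pos]:
            return pos  # no neighbor is better: a local (maybe not global) peak
        pos = best

# Invented "activity landscape": a small peak at index 2, a much higher
# peak at index 7, and a valley in between.
landscape = [1, 3, 4, 2, 1, 2, 6, 9, 5]

stuck = hill_climb(landscape, start=0)  # climbs the small peak and stops
lucky = hill_climb(landscape, start=5)  # happens to start on the right slope

print(stuck, landscape[stuck])  # 2 4
print(lucky, landscape[lucky])  # 7 9
```

Both runs use the same rule and differ only in where they start. That is one reading of why a wider choice of starting points could matter to an iterative approach.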

Permalink to Comment

32. EB on March 14, 2012 9:45 PM writes...

still unsure how 2^20 > 600,000 ensures that the 20 questions route is bound to succeed

couldn't get past that point to read the rest of this

Permalink to Comment

33. Bob on March 15, 2012 1:00 AM writes...

Agree with #16.

HTS is used to find hits and start a med chem 'iterative process' not to directly find drugs.

It gives the impression the authors are not fully aware of the drug discovery process.

Permalink to Comment

34. KG on March 15, 2012 1:09 AM writes...

@ 7 (Rick Wobbe) I really appreciated your insights. Thanks for posting! Can you go into a little more detail about "those wonderful genetic validation tool"? What are you specifically referring to? siRNA? What if the target isn't in the genetic family you were intending to look at?

Permalink to Comment

35. thomas on March 15, 2012 1:19 AM writes...

Relevant to some of Derek's original conceptual questions: Even with perfect assays, you do need some helpful structure in chemical space -- there needs to be a non-negligible fraction of the space where local improvements exist and can lead to a good molecule.

Without such structure the No Free Lunch theorems in optimization say that no approach does better on average than random search.

@EB: this is also relevant to your question -- the 2^20> 600000 is relevant if you can always find a question that cuts your remaining sample of possibilities in half. In generic optimization problems you can't. Whether you can in chemistry is not something I'm qualified to have an opinion on.

Permalink to Comment

36. RKN on March 15, 2012 1:22 AM writes...


I thought the same thing. While 2^20 (1048576) does enumerate a solution space >600K, it is not clear to me why 20 questions would "usually" succeed in identifying the one correct word among 600K possibilities.

To be true, it seems to me that each successive question would need to reduce the remaining solution space by half or better. Intuitively, it doesn't seem to me that one could count on that, given the properties of words generally.

Interesting problem, though.

Permalink to Comment

37. Matt on March 15, 2012 1:38 AM writes...

"still unsure how 2^20 > 600,000 ensures that the 20 questions route is bound to succeed"

Binary search. If you have an alphabetical list of English words, like a dictionary, pick the middle word in the list and ask "is the mystery word before or after this word?" Depending on the answer, pick the top or bottom half of the list and repeat, picking the middle word in the successively smaller sub-lists, until there is only one word remaining. It takes the base-2 logarithm of N, rounded up to the nearest whole number, to search N entries.
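The procedure described above can be sketched in a few lines. The "dictionary" here is a stand-in (600,000 zero-padded number strings, which sort the same way a word list would), and the function name `questions_needed` is made up for this example. It assumes the mystery word really is in the list.

```python
import math

def questions_needed(sorted_words, target):
    """Count the before/after questions binary search asks to isolate target."""
    lo, hi = 0, len(sorted_words) - 1
    questions = 0
    while lo < hi:
        mid = (lo + hi) // 2
        questions += 1               # "Is the mystery word after sorted_words[mid]?"
        if target > sorted_words[mid]:
            lo = mid + 1             # yes: keep the upper half
        else:
            hi = mid                 # no: keep the lower half (including mid)
    return questions

# Stand-in dictionary: 600,000 entries already in sorted order.
words = [f"{i:06d}" for i in range(600_000)]

bound = math.ceil(math.log2(len(words)))
print(bound)  # 20

# Spot-check a spread of targets: none should need more than 20 questions.
sample = words[::5000] + [words[-1]]
print(max(questions_needed(words, w) for w in sample) <= bound)  # True
```

Each question leaves at most half of the remaining entries (rounded up), which is why 20 halvings are enough for any list of up to 2^20 = 1,048,576 entries.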

Permalink to Comment

38. Anonymous BMS Researcher on March 15, 2012 6:25 AM writes...

As "#37 Matt" points out, the big win for 20 questions is binary search: if each iteration removes half the solution space from consideration then you get what computer science types call an "N Log N" problem where solution time grows much more slowly than does the size of the solution space. In fact, even if each iteration only eliminated five percent of the solution space you would still have an N log N problem: 260 iterations in which each iteration removes about 5% of the solution space could pick the winner from 600 thousand possibilities (mathematically, 0.95 to the power 261 equals 1 over 651837).
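The arithmetic in that last sentence is easy to check. This is a small sketch, with `iterations_needed` as a made-up helper name. It assumes every round removes the same fixed fraction of whatever candidates remain.

```python
import math

def iterations_needed(n_candidates, fraction_removed):
    """Smallest k with n_candidates * (1 - fraction_removed)**k <= 1."""
    keep = 1.0 - fraction_removed
    return math.ceil(math.log(n_candidates) / -math.log(keep))

halving = iterations_needed(600_000, 0.50)       # each question removes half
five_percent = iterations_needed(600_000, 0.05)  # each question removes only 5%

print(halving)       # 20
print(five_percent)  # 260
```

In both cases the number of rounds grows with the logarithm of the number of candidates. A weaker question only changes the constant in front.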

But of course the trouble is, we just do not know good ways to cut down the solution space for huge multivariate optimization problems such as drug discovery. Neuroscience has a particularly challenging solution space: the human brain is by far the most complex object of its size known to our science.

Permalink to Comment

39. Anonymous BMS Researcher on March 15, 2012 6:39 AM writes...

Further clarification on my comment about binary search: you can only apply binary search on a SORTED LIST of possibilities. If your dictionary or phone book were in random order then it would be much harder to find the desired entry.

Many years ago, back in the days when I still used printed phone books and the equivalent of Search Engine Optimization was naming your company "AAA-Able Plumbing" so it would be the first listing under Plumbing in the Yellow Pages, I recall a news item about somebody whose name was something like "Zimmerman" filing a lawsuit claiming the phone company was discriminating against customers whose names happened to be late in the alphabet. The Court dismissed the suit on the grounds that any alternative to alphabetical order would make the phone book nearly useless.

Permalink to Comment

40. Rick Wobbe on March 15, 2012 7:45 AM writes...

This may only make things worse, but I think we need to clarify the 20 questions metaphor, otherwise this discussion is a "3 blind men and an elephant" situation, to use another metaphor. Maybe I'm just slow, but I need a clearer idea of: 1, what is asking the question?; 2, to what is the question being addressed?; 3, how many degrees of freedom can the answer have?

It seems to me that, in this metaphor: 1, an ASSAY is asking the question "do you 'hit' me or not?"; 2, a PIECE OF CHEMICAL SPACE (i.e. a compound) is being queried; 3, the answer is either "YES" or "NO", or maybe "YOU'RE COLD" or "YOU'RE GETTING WARMER". Do we agree that that's a sufficiently accurate simplification?

If that's an accurate disambiguation of the metaphor, then it seems to me that an early problem is ensuring that a valid questioner asks a valid question. In the real world, it's often the case that one or both of those conditions is not met (e.g. the molecular target is not pharmacologically valid, the inherent assumption of cell-free assays - that the cellular milieu of the target has negligible influence on drug-target interaction - is wrong), in which case you will never get useful answers, regardless of how many "questions" you ask. In other words, the real search is actually a matrix of at least two query spaces, assay space and chemical diversity space, which is a vastly larger space.

Permalink to Comment

41. Rick Wobbe on March 15, 2012 7:56 AM writes...

KG, 34,
Derek is moving on (new post), so rather than making more people "sorry [they] wasted heartbeats reading this.", you can look me up on LinkedIn (that's right, I'm foolish enough to use my real name here) to share ideas.

Permalink to Comment

42. exGlaxoid on March 15, 2012 8:41 AM writes...


There is one other issue in drug discovery that also helps explain the lack of "success" currently. That is the fact that most of chemical space is now patented for something. So the HTS, the iterative 20 question game and others will not work when most of the chemicals found are already claimed. That may be one of the biggest reasons that the number of novel drugs getting discovered is dropping every year, since most compounds in the pool are already claimed by someone in some overly wide scope patent.

So if you want to see an improvement in drug discovery, I suggest the following:

1) Allow any new drug that is not already on the market to have 10-12 years exclusivity by FDA rules, rather than patents. I believe that this is one major reason that peptides and proteins are doing well currently, as they were not in the patented drug space until more recently, thus easier to get novelty on.

2) Allow low dose, phase 1 type, testing of a small number of drugs (say 10) per disease area, per year, by companies, to allow the projects a chance to get some relevant human data.

3) Allow clinical trials to proceed faster and with smaller numbers with more specific milestones and approval rules. If there is a clear guideline on what will be approved, then it is easier to tell if the trial is worth proceeding. But the FDA has frequently raised the bar after trials are started.

4) Reduce the liability of drug companies for FDA approved drugs. This would spur novel drugs as well.

Some people would complain about these, but they would all lead to more novel treatments, if that is the goal.

Permalink to Comment

43. Phil on March 15, 2012 9:08 AM writes...

As someone who started his career in pharma and moved to polymer chemistry, I've observed that formulation of paints, adhesives, and other products works a lot like the old-fashioned pre-HTS approach to drug discovery. Just like in pharma, there are too many variables to do a brute-force experiment, and we usually don't understand how all the ingredients in the formulation interact with one another. We generally develop formulations through iterative rounds of experiments with a good deal of intuition and educated guesses thrown in. It's not perfect, but it gets results, and it's a good way of dealing with a multi-variable system you don't fully understand (such as the human body)!

Permalink to Comment

44. Jack Scannell on March 15, 2012 9:29 AM writes...

This has been an education. A general thank you.

Also, in defense of my NRDD co-authors (particularly Brian Warrington), any naivety I have expressed on the subject of discovery chemistry (posts #16 and #33) is all mine not theirs.

Permalink to Comment

45. Lars on March 16, 2012 7:05 PM writes...

@Anonymous BMS researcher: Binary search is O(log n), not O(n log n) - though the sorting process (assuming you need to do that first because the items are unsorted) is, in fact, O(n log n).
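The gap between the two costs shows up when the comparisons are counted directly. This is a rough sketch: the helper names are invented, the list is random integers rather than anything chemical, and the sort is Python's built-in `sorted`, so the exact counts are specific to that implementation.

```python
import random
from functools import cmp_to_key

def make_counting_cmp():
    """Return a comparison function plus a counter of how often it was called."""
    count = {"n": 0}
    def cmp(a, b):
        count["n"] += 1
        return (a > b) - (a < b)
    return cmp, count

def binary_search(sorted_items, target, cmp):
    """Index of target in sorted_items (assumed present), using cmp for comparisons."""
    lo, hi = 0, len(sorted_items) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if cmp(target, sorted_items[mid]) > 0:
            lo = mid + 1
        else:
            hi = mid
    return lo

random.seed(0)
items = random.sample(range(10_000_000), 100_000)  # 100,000 distinct values, unsorted

sort_cmp, sort_count = make_counting_cmp()
ordered = sorted(items, key=cmp_to_key(sort_cmp))  # the one-time sorting cost

search_cmp, search_count = make_counting_cmp()
position = binary_search(ordered, items[0], search_cmp)

print("comparisons to sort:  ", sort_count["n"])
print("comparisons to search:", search_count["n"])
```

With 100,000 items the search needs at most 17 comparisons, while sorting an unsorted list of that size cannot take fewer than 99,999.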

Permalink to Comment

