There's an on-line appendix to that Nature Reviews Drug Discovery article that I've been writing about, and I don't think that many people have read it yet. Jack Scannell, one of the authors, sent along a note about it, and he's interested to see what the readership here makes of it.
It gets to the point that came up in the comments to this post, about the order in which you run your screening assays (see #55 and #56). Do you run everything through a binding assay first, or do you run things through a phenotypic assay first and then try to figure out how they bind? More generally, with either sort of assay, is it better to do a large random screen first off, or is it better to do iterative rounds of SAR from a smaller data set? (I'm distinguishing those two because phenotypic assays provide a very different sort of data density than focused binding assays do.)
Statistically, there's actually a pretty big difference there. I'll quote from the appendix:
Imagine that you know all of the 600,000 or so words in the English language and that you are asked to guess an English word written in a sealed envelope. You are offered two search strategies. The first is the familiar ‘20 questions’ game. You can ask a series of questions. You are provided with a "yes" or "no" answer to each, and you win if you guess the word in the envelope having asked 20 questions or fewer. The second strategy is a brute force method. You get 20,000 guesses, but you only get a "yes" or "no" once you have made all 20,000 guesses. So which is more likely to succeed, 20 questions or 20,000 guesses?
A skilled player should usually succeed with 20 questions (since 600,000 is less than 2^20) but would fail nearly 97% of the time with "only" 20,000 guesses.
Our view is that the old iterative method of drug discovery was more like 20 questions, while HTS of a static compound library is more like 20,000 guesses. With the iterative approach, the characteristics of each molecule could be measured on several dimensions (for example, potency, toxicity, ADME). This led to multidimensional structure–activity relationships, which in turn meant that each new generation of candidates tended to be better than the previous generation. In conventional HTS, on the other hand, search is focused on a small and pre-defined part of chemical space, with potency alone as the dominant factor for molecular selection.
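The arithmetic behind the word-guessing comparison is easy to check. Here's a minimal sketch, assuming that every yes/no question splits the remaining words exactly in half and that the 20,000 blind guesses are all distinct. The function names are mine, not the appendix's.

```python
def questions_needed(vocabulary_size: int) -> int:
    """Yes/no questions needed to isolate one word, assuming perfect halving.

    This is ceil(log2(vocabulary_size)), computed with integer arithmetic.
    """
    return (vocabulary_size - 1).bit_length()


def blind_guess_failure_rate(vocabulary_size: int, guesses: int) -> float:
    """Chance that none of `guesses` distinct blind guesses hits the hidden word."""
    return 1 - guesses / vocabulary_size


words = 600_000  # the appendix's round figure for English
print(questions_needed(words))                              # 20
print(2 ** 20)                                              # 1048576
print(round(blind_guess_failure_rate(words, 20_000), 3))    # 0.967
```

So 20 perfect questions are just enough for 600,000 words, while 20,000 blind guesses cover only one word in thirty.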
Aha, you say, but the game of twenty questions is equivalent to running perfect experiments each time: "Is the word a noun? Does it have more than five letters?" and so on. Each question carves up the 600,000 word set flawlessly and iteratively, and you never have to backtrack. Good experimental design aspires to that, but it's a hard standard to reach. Too often, we get answers that would correspond to "Well, it can be used like a noun on Tuesdays, but if it's more than five letters, then that switches to Wednesday, unless it starts with a vowel".
The authors try to address this multi-dimensionality with a thought experiment. Imagine chemical SAR space: a huge number of points, with a large number of parameters needed to describe each one.
Imagine we have two search strategies to find the single best molecule in this space. One is a brute force search, which assays a molecule and then simply steps to the next molecule, and so exhaustively searches the entire space. We call this "super-HTS". The other, which we call the “Blackian demon” (in reference to the “Darwinian demon”, which is used sometimes to reflect ideal performance in evolutionary thought experiments, and in tribute to James Black, often acknowledged as one of the most successful drug discoverers), is equivalent to an omniscient drug designer who can assay a molecule, and then make a single chemical modification to step it one position through chemical space, and who can then assay the new molecule, modify it again, and so on. The Blackian demon can make only one step at a time, to a nearest neighbour molecule, but it always steps in the right direction; towards the best molecule in the space. . .
The number of steps for the Blackian demon follows from simple geometry. If you have a d dimensional space with n nodes in the space, and – for simplicity – these are arranged in a neat line, square, cube, or hypercube, you can traverse the entire space, from corner to corner with d x (n^(1/d)-1) steps. This is because each vertex is n^(1/d) nodes in length, and there are d vertices. . . When the search space is high dimensional (as is chemical space) and there is a very large number of nodes (as is the case for drug-like molecules), the Blackian demon is many orders of magnitude more efficient than super-HTS. For example, in a 10 dimensional space with 10^40 molecules, the Blackian demon can search the entire space in 10^5 steps (or less), while the brute force method requires 10^40 steps.
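The numbers in that example can be reproduced in a few lines. This is only a sketch of the appendix's formula, assuming a regular grid with the same number of nodes along each of its d sides; I pass in the side length directly (n^(1/d)) to keep everything in exact integers, and the function names are my own.

```python
def demon_steps(nodes_per_side: int, dimensions: int) -> int:
    """Corner-to-corner steps on a regular grid: d x (n^(1/d) - 1)."""
    return dimensions * (nodes_per_side - 1)


def brute_force_steps(nodes_per_side: int, dimensions: int) -> int:
    """Exhaustive search has to visit every node: n = side^d."""
    return nodes_per_side ** dimensions


# The appendix's example: 10 dimensions and 10^40 molecules,
# which works out to 10^4 nodes along each side.
side, d = 10 ** 4, 10
print(demon_steps(side, d))                      # 99990
print(brute_force_steps(side, d) == 10 ** 40)    # True
```

That 99,990 is where the "10^5 steps (or less)" figure comes from.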
These are idealized cases, needless to say. One problem is that none of us are exactly Blackian demons - what if you don't always make the right step to the next molecule? What if your iteration only gives one out of ten molecules that get better, or one out of a hundred? I'd be interested to see how that affects the mathematical argument.
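One crude way to put numbers on that question is the sketch below. It rests on assumptions of my own, not the appendix's: each attempted modification is an improvement with a fixed chance of one in k, attempts are independent, and a failed attempt costs an assay but doesn't move you backward. Under those assumptions, the expected number of assays is just the perfect-demon step count multiplied by k.

```python
import random


def expected_assays(perfect_steps: int, one_in: int) -> int:
    """Mean assays needed when only one attempt in `one_in` is an improvement."""
    return perfect_steps * one_in


def simulate_assays(perfect_steps: int, one_in: int, rng: random.Random) -> int:
    """Count assays in one simulated campaign under the same assumptions."""
    assays = 0
    progress = 0
    while progress < perfect_steps:
        assays += 1
        if rng.randrange(one_in) == 0:  # success with probability 1/one_in
            progress += 1
    return assays


perfect = 99_990  # the perfect demon's path through the 10^40-node example
print(expected_assays(perfect, 10))     # 999900
print(expected_assays(perfect, 100))    # 9999000

# A small simulated run, for comparison with expected_assays(1_000, 10)
print(simulate_assays(1_000, 10, random.Random(0)))
```

On this toy model, even a one-in-a-hundred hit rate only takes you from about 10^5 assays to about 10^7, which is still nowhere near 10^40. It says nothing about stretches of chemical space where no step is an improvement at all.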
And there's another conceptual problem: for much of chemical space, the landscape is far sparser than that. One assumption with this thought experiment (correct me if I'm wrong) is that there actually is a better node to move to each time. But for any drug target, there are huge regions of flat, dead, inactive, un-assayable chemical space. If you started off in one of those, you could iterate until your hair fell out and never get out of the hole. And that leads to another objection to the ground rules of this exercise: no one tries to optimize by random HTS. It's only used to get starting points for medicinal chemists to work on, to make sure that they're not starting in one of those "dead zones". Thoughts?