A colleague e-mailed me last night with an observation that he’d heard recently: “Have you noticed,” he said, “that the numbers we use get less and less precise, the farther away they get from the chemists?”
Thinking about it, I’d have to say that’s right, although I don’t think that we can claim any particular credit. After all, we have our feet planted in physics. Our molecular weights are based on the weights of the elementary particles, which are known. . .pretty exactly. And we’ve got a pretty good handle on molecular formulae, too, so we can go around getting mass spectra out to four decimal places and learning all kinds of things from them.
But then when these compounds get run through the primary assays, purified enzymes or the like, the numbers start getting fuzzier. Protein preps are all subtly different – ideally, they should be different in ways that make no difference, but then there’s the actual running of the assay to consider. Reproducibility varies, but no one gets worked up about a compound that shows, say, 3 nanomolar inhibition in one assay and 6 nanomolar in another. “Single-digit nanomolar” is all we need to know, and it’s good odds that the next run will split the difference and come in at four or five, anyway.
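Because potency readings tend to scatter multiplicatively, "splitting the difference" between 3 and 6 nanomolar is most naturally done on a log scale, which means taking the geometric mean rather than the arithmetic one. A minimal Python sketch, assuming replicate IC50 values are roughly log-normal (an assumption, not something the post claims); the function name is illustrative:

```python
import math

def geometric_mean(values):
    """Geometric mean: average the log10 values, then convert back."""
    logs = [math.log10(v) for v in values]
    return 10 ** (sum(logs) / len(logs))

# The two readings from the paragraph above, in nanomolar.
ic50_nM = [3.0, 6.0]
print(f"geometric mean: {geometric_mean(ic50_nM):.1f} nM")
```

The geometric mean of 3 and 6 is the square root of 18, about 4.2 nM, which lands in the four-to-five range the paragraph expects if the scatter really is symmetric in log space.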
But then you go to cellular assays, and things get more complicated. Cells are ridiculously more complex than enzymes, and there are so many more things that can kick around your data. Where did this batch of cells come from? How many times have they divided? What stage of their life cycles are they in, on average? What are they growing on, and in? Are they clean (no nasty mycoplasma)? Even if you’ve got all those things under control, your compounds are going to be exposed to untold numbers of other proteins now, all with potential binding sites and activities of their own. And that’s if they can even get past the cell membrane at all – many don’t, for reasons that are not always clear. No, your cellular numbers are always going to have a pretty good spread in them.
But then you go to whole animals, which have all those problems and more. Absorption from the gut and later metabolism are tricky and poorly understood processes, and they’re affected by a bewildering number of variables. Is your compound crystalline? Same way each time? What’s the particle size? How much water does that powder have in it? What are you taking the compound up in to dose it? Have the rats eaten recently? What time of day are they getting the compound? Male rats, or female? Nothing bothering them, no loud noises or change in lighting? Every single one of these things can throw your data around all over the place.
But now you’re up to clinical trials, and animal data is as orderly as a brick wall compared to human data. All those variables listed above still obtain, although you've presumably controlled for several of them by the time you're in the clinic. But that's more than made up for by the heterogeneity of your human volunteers and that of your all-too-human clinical staff. (Ask anyone who's worked up close with clinical data, and you'll hear all about it.)
So we start from chemistry, where if we make a compound once we assume that we can always make it again - not always a warranted assumption, mind you, but mostly true. Then we move to in vitro assays, where you really need to have n-of-3, at least, so you can get error bars on your numbers. And we end up in human trials with hundreds (or thousands) of people taking the resulting drug, desperately hoping all the while that we'll be able to pick out an interpretable signal in all the noise. That's the business, all right.
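Here is roughly what an n-of-3 error bar buys. A minimal Python sketch, assuming normally distributed replicate pIC50 values (pIC50 being the negative log10 of the molar IC50); the triplicate values and the function name are hypothetical. With only two degrees of freedom, the two-sided 95% t multiplier is about 4.3, so the interval comes out wide even when the three numbers look tight:

```python
import math
import statistics

# Two-sided 95% critical value of Student's t for n - 1 = 2 degrees of freedom,
# taken from standard t tables and hard-coded to avoid a SciPy dependency.
T_CRIT_DF2 = 4.303

def ci95_n3(values):
    """Mean and 95% confidence interval for exactly three replicates,
    assuming normally distributed errors."""
    if len(values) != 3:
        raise ValueError("this sketch hard-codes the t value for n = 3")
    mean = statistics.mean(values)
    sem = statistics.stdev(values) / math.sqrt(len(values))  # sample SD / sqrt(n)
    half_width = T_CRIT_DF2 * sem
    return mean, mean - half_width, mean + half_width

# Hypothetical pIC50 triplicate; not data from the post.
pic50 = [7.5, 7.2, 7.4]
mean, lo, hi = ci95_n3(pic50)
print(f"pIC50 = {mean:.2f} (95% CI {lo:.2f} to {hi:.2f})")
# Width of the interval re-expressed as a fold-range in IC50.
print(f"CI spans about {10 ** (hi - lo):.1f}-fold in IC50")
```

Working in pIC50 rather than raw IC50 keeps the normality assumption more plausible, since potency errors tend to be multiplicative.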
1. fred on November 20, 2008 10:18 AM writes...
"if we make a compound once we assume that we can always make it again"
I rarely assume that, as a pessimist, but, of course, I almost always CAN.
As for assay numbers-- when I present, I don't tend to use more than 2 sig figs, knowing how "fuzzy" biology is...
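One way to apply a two-significant-figure rule when reporting or storing assay numbers is a small rounding helper. A sketch in Python; `round_sig` is a hypothetical name, and two figures is a presentation convention, not a statistical requirement:

```python
import math

def round_sig(x, sig=2):
    """Round x to `sig` significant figures; zero and non-finite values pass through."""
    if x == 0 or not math.isfinite(x):
        return x
    # Position of the leading digit decides how many decimal places to keep.
    digits = sig - 1 - math.floor(math.log10(abs(x)))
    return round(x, digits)

for value in [4.2426, 123456, 0.012345]:
    print(value, "->", round_sig(value))
```

Python's built-in `format(x, ".2g")` also rounds to two significant figures, but it switches to exponent notation for larger numbers (for example `1.2e+05`), which is why the sketch returns a number instead.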
2. daen on November 20, 2008 10:46 AM writes...
Good grief, Derek, have you been eavesdropping on our lunchtime conversations? This very topic came up today, regarding how precisely assay data should be stored in a database. I almost suggested "to the nearest order of magnitude", but thought better of it.
3. Cellbio on November 20, 2008 11:00 AM writes...
As someone reporting cell numbers back to chemists, I have a perspective. The number everyone wants is the IC50. For most curves, the IC10 to IC90 spans a log or so. This means that there will be few data points within the meaningful part of the curve, and therefore the "number" comes more from a curve fit determined by the top and bottom plateaus, one or two points in the middle, and assumptions about curve shape. In contrast, looking at the curve shape itself can be much more meaningful. We worked to have all the data for each curve captured so it could be called up for review, and we also reported every IC50 with its 95% confidence interval. This was helpful: as in the example Derek gave, we could almost never say that a 2-3 fold difference in IC50 was real, given the confidence limits on our measurements.
In clinical trials, we ran a PD assay. The range of responses to our stimulus was staggering. We had to forget about drawing IC50 curves for Schild analysis, since some patients never reached an upper plateau of response, exceeding the levels of other patients by more than a thousandfold. Sensitivity, or IC50, also ranged over 2-3 logs. This meant that for any given concentration of the stimulus we used to measure drug inhibition, the variability between patients in strength of stimulus and magnitude of response was huge. Try to measure an IC50 in this setting! I thought we should just revert to old-time drug development principles and ask the patients if they felt better.
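Cellbio's point about how little of the curve carries information can be made concrete with the four-parameter logistic (Hill) model commonly used for concentration-response fits. On an ideal curve, the concentration giving p% inhibition is IC50 × (p / (100 − p))^(1/h), so IC90/IC10 = 81^(1/h): about 1.9 log units for a Hill slope h of 1, and roughly one log for a slope of 2. A sketch assuming that textbook model (not necessarily the fitting model Cellbio's group used); the function names are illustrative:

```python
import math

def four_pl(conc, bottom, top, ic50, hill):
    """Four-parameter logistic curve for an inhibition assay:
    the response falls from `top` toward `bottom` as concentration rises."""
    return bottom + (top - bottom) / (1.0 + (conc / ic50) ** hill)

def ic_percent(pct, ic50, hill):
    """Concentration giving `pct`% inhibition on an ideal 4PL curve."""
    return ic50 * (pct / (100.0 - pct)) ** (1.0 / hill)

# How wide the informative middle of the curve is, for two Hill slopes.
for hill in (1.0, 2.0):
    span = ic_percent(90, 1.0, hill) / ic_percent(10, 1.0, hill)
    print(f"Hill slope {hill}: IC10 to IC90 spans {span:.0f}-fold "
          f"({math.log10(span):.2f} log units)")

# Sanity check: response (as % of control) at IC10, IC50 and IC90 for slope 1.
for pct in (10, 50, 90):
    c = ic_percent(pct, 1.0, 1.0)
    print(f"{pct}% inhibition at {c:.3g} x IC50, "
          f"response {four_pl(c, 0.0, 100.0, 1.0, 1.0):.0f}% of control")
```

With half-log dilution steps, only a handful of points fall inside that window, which is why the fitted IC50 leans so heavily on the plateaus and the assumed curve shape.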
4. CMC guy on November 20, 2008 1:38 PM writes...
Perhaps the trend as expressed does move in a fuzzier direction, but I'm not sure we're comparing apples to apples: the molecular formula and weight are set by the compound, but when you think of yields, purity, and other experimental aspects of chemistry, those can be highly variable too.
At the same time, I would have taken the path slightly further by adding "Marketing assessments" for an R&D project beyond Clinical, where there is an even greater difference between most estimates and real values.
5. Sili on November 20, 2008 1:43 PM writes...
You're telling me that none of that is controlled for?! But it's so obvious! And those are just the known unknowns.
6. HelicalZz on November 20, 2008 2:51 PM writes...
Chemists are well up the scale in this 'all too true' cartoon:
http://xkcd.com/435/
Zz
7. Great Molecular Crapshoot on November 20, 2008 6:27 PM writes...
Noise in the measured numbers always makes me uneasy when people claim to be able to predict human dose directly from a chemical structure. This is difficult enough even when you've measured logP, protein binding etc.
Ratios of noisy numbers can be problematic. What does "10-fold selective against an anti-target" really mean? In PK/PD modelling it is common to take the plasma concentration at particular time points (let's ignore uncertainties in how those time points are defined), multiply by the free fraction, and divide by the IC50.
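The multiply-and-divide calculation described in this comment compounds the noise of each input. If each term is treated as independent and log-normal (an assumption), the log-scale variances add, so three inputs each uncertain to about 2-fold (1 SD) give a margin uncertain to about 2^√3, or roughly 3.3-fold. A minimal Python sketch; the concentration, free fraction, IC50, and function name are all hypothetical:

```python
import math

def ratio_log_sd(*log10_sds):
    """SD (in log10 units) of a product or quotient of independent
    log-normal terms: the log-scale variances simply add."""
    return math.sqrt(sum(s * s for s in log10_sds))

# Hypothetical inputs, each assumed known only to within about 2-fold (1 SD).
cmax_nM = 500.0            # total plasma concentration at the chosen time point
fu = 0.05                  # free (unbound) fraction
ic50_nM = 10.0             # potency from the in vitro assay
sd_each = math.log10(2.0)  # 2-fold uncertainty expressed in log10 units

margin = cmax_nM * fu / ic50_nM
fold = 10 ** ratio_log_sd(sd_each, sd_each, sd_each)
print(f"free Cmax / IC50 = {margin:.1f}, uncertain by about {fold:.1f}-fold (1 SD)")
```

On these assumptions, a computed margin of 2.5 is hard to distinguish from 1 or from 8, which is the unease the comment describes.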
8. Pat Pending on November 20, 2008 6:32 PM writes...
"At the same time I would have taken the path slightly further by adding "Marketing assessments" for a R&D project beyond Clinical as even greater difference in most estimates verses real values."
I have seen major differences between market assessments made by R&D and by Sales, and projected sales changing every six months.
9. imatter on November 20, 2008 10:56 PM writes...
Too true.
I just did a series of experiments where the biologists missed some interesting observations because they were working in mg/mL instead of molar concentrations and never bothered to explore the limits of detection.
Forget real numbers as data. There are papers that have (+) or (-) as observations. If it's really good data, it's (+++)!! And Western blots?
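The mg/mL point in imatter's comment comes down to a unit conversion: since 1 mg/mL equals 1 g/L, dividing by the molecular weight in g/mol gives molarity, so equal mass concentrations of compounds with different molecular weights are different molar doses. A minimal Python sketch; the function name and the example molecular weights are hypothetical:

```python
def mg_per_ml_to_molar(conc_mg_per_ml, mw_g_per_mol):
    """Convert a mass concentration to molarity.
    1 mg/mL equals 1 g/L, so dividing by molecular weight (g/mol) gives mol/L."""
    return conc_mg_per_ml / mw_g_per_mol

# Hypothetical example: the same 0.1 mg/mL for two compounds of different MW.
for name, mw in [("compound A", 250.0), ("compound B", 500.0)]:
    molar = mg_per_ml_to_molar(0.1, mw)
    print(f"{name} (MW {mw:.0f}): {molar * 1e6:.0f} uM")
```

Here the lighter compound is present at twice the molar concentration, a difference that a mg/mL-only comparison hides.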
10. Kay on November 21, 2008 8:14 AM writes...
I see that chemists cringe at all the uncertainty that attempts to complicate their world. At least they can take comfort in the fact that the Rule of 5 continues to generate winners and that non-compliant compounds will never survive to waste the company’s assets. Within all this complexity, simple folks have at least one safe harbor.
11. carras on November 21, 2008 8:31 AM writes...
I hope we all realize that calculating error bars from 3 data points is absolutely meaningless, and, what is much worse, it gives your data a false air of scientific respectability. It would be much more honest to forgo error bars altogether. That said, let whoever has never done it throw the first stone (myself included).
12. John Johnson on November 21, 2008 12:50 PM writes...
In clinical, at least, that should change to "numbers, if obtained at all, are noisy."
13. Cyan on November 21, 2008 1:23 PM writes...
It's not meaningless; it's just highly sensitive to the statistical assumptions which justify the calculation. Hmm... perhaps this is a distinction without a difference.
14. rhovero on November 21, 2008 4:30 PM writes...
Unfortunately the numbers that are most relevant to having a useful drug are at the far end away from chemistry. What's worse is that most clinicians and biologists are unable to produce or handle much more than qualitative (not quantitative) assessments of the dynamic and heterogeneous nature of human disease. Until that reductionist paradigm changes to a more quantitative engineering approach, we're pretty much stuck with pharma R&D as it exists today.
15. Anonymous BMS Researcher on November 21, 2008 11:02 PM writes...
When I was an engineering student, one professor used to say "if you care about the second significant digit of tensile strength, you are already in trouble."
16. DavidInRichmond on November 22, 2008 3:16 AM writes...
When I was an undergraduate taking introductory p-chem the professor - not a kineticist - defined kinetics as follows: Kinetics is the science whereby you can calculate anything from anything else to within a factor of 10 to the fifth power.