Derek Lowe, an Arkansan by birth, got his BA from Hendrix College and his PhD in organic chemistry from Duke before spending time in Germany on a Humboldt Fellowship for his post-doc. He's worked for several major pharmaceutical companies since 1989 on drug discovery projects against schizophrenia, Alzheimer's, diabetes, osteoporosis and other diseases.
To contact Derek, email him directly: derekb.lowe@gmail.com
Twitter: Dereklowe
Anyone looking over large data sets from human studies needs to be constantly on guard. Sinkholes are everywhere, many of them looking (at first glance) like perfectly solid ground on which to build some conclusions. This, to be honest, is one of the real problems with full release of clinical trial data sets: if you're not really up on your statistics, you can convince yourself of some pretty strange stuff.
Even people who are supposed to know what they're doing can bungle things. For instance, you may well have noticed a lot of papers coming out in the last few years correlating neuroimaging studies (such as fMRI) with human behaviors and personality traits. Neuroimaging is a wonderfully wide-open, complex, and important field, and I don't blame people for a minute for pushing it as far as it can go. But just how far is that?
A recent paper (PDF) suggests that the conclusions have run well ahead of the numbers. Recent papers have been reporting impressive correlations between the activation of particular brain regions and associated behaviors and traits. But when you look at the reproducibility of the behavioral measurements themselves, the correlation is 0.8 at best. And the reproducibility of the blood-oxygen fMRI measurements is about 0.7. The highest possible correlation you could expect from those two is the square root of their product, or 0.74. Problem is. . .a number of papers, including ones that get the big press, show correlations much higher than that. Which is impossible.
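The arithmetic behind that ceiling is the classical correction-for-attenuation bound from test theory. It assumes the measurement errors of the two measures are independent of each other, and under that assumption the expected observed correlation can't exceed the square root of the product of the two reliabilities. Here is a minimal Python sketch. `max_observable_r` is a hypothetical helper name, and the two "reported" values are made up for illustration.

```python
import math


def max_observable_r(reliability_x: float, reliability_y: float) -> float:
    """Ceiling on the expected observed correlation between two measures.

    Classical correction-for-attenuation bound; assumes the measurement
    errors of the two measures are independent of each other.
    """
    return math.sqrt(reliability_x * reliability_y)


# Reliabilities quoted in the post: behavior ~0.8, BOLD fMRI ~0.7.
ceiling = max_observable_r(0.8, 0.7)
print(f"ceiling = {ceiling:.3f}")  # ceiling = 0.748

# Hypothetical reported correlations, for illustration only.
for reported in (0.60, 0.88):
    status = "within the ceiling" if reported <= ceiling else "exceeds the ceiling"
    print(f"r = {reported:.2f}: {status}")
```

Computed exactly, the square root of 0.8 × 0.7 is about 0.748, so any reported correlation much above that should raise eyebrows.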
The Neurocritic blog has more details on this. What seems to have happened is that many researchers found signals in their patients that correlated with the behavior that they were studying, and then used that same set of data to compute the correlations they reported across subjects. I find, by watching people go by in the street, that I can pick out a set of people who wear bright red jackets and have ugly haircuts. Herding them together and rating them on the redness of their attire and the heinousness of their hair, I find a notably strong correlation! Clearly, there is an underlying fashion deficiency that leads to both behaviors. Or people had their hair in their eyes when they bought their clothes. Further studies are indicated.
No, you can't do it like that. A selection error of that sort could let you relate anything to anything. The authors of the paper (Edward Vul and Nancy Kanwisher of MIT) have done the field a great favor by pointing this out. You can read how the field is taking the advice here.
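To see how selecting on the same data manufactures a correlation, here is a minimal simulation using only Python's standard library. It is a toy, not Vul and Kanwisher's analysis. The subject count, voxel count and selection threshold are hypothetical, and every voxel is pure noise by construction. Voxels are picked because their signal in one scan happens to track behavior. The correlation is then computed either on that same scan (circular) or on an independent second scan.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def double_dip_demo(n_subjects=16, n_voxels=5000, threshold=0.6, seed=1):
    """Compare a circular (same-data) estimate with an independent one.

    Everything here is pure noise: no voxel is truly related to behavior.
    """
    rng = random.Random(seed)
    behavior = [rng.gauss(0, 1) for _ in range(n_subjects)]
    # Two independent scans of the same (noise-only) voxels.
    scan_a = [[rng.gauss(0, 1) for _ in range(n_subjects)] for _ in range(n_voxels)]
    scan_b = [[rng.gauss(0, 1) for _ in range(n_subjects)] for _ in range(n_voxels)]

    # Select voxels whose scan-A signal happens to track behavior.
    picked = [v for v in range(n_voxels) if pearson(scan_a[v], behavior) > threshold]
    if not picked:
        raise ValueError("no voxels passed the threshold; lower it or add voxels")

    def region_score(scan):
        # Average the selected voxels into one "region" value per subject.
        return [sum(scan[v][s] for v in picked) / len(picked) for s in range(n_subjects)]

    circular = pearson(region_score(scan_a), behavior)     # reuses the selection data
    independent = pearson(region_score(scan_b), behavior)  # fresh data, same voxels
    return len(picked), circular, independent


n_picked, circular, independent = double_dip_demo()
print(f"{n_picked} voxels selected")
print(f"circular r = {circular:.2f}, independent r = {independent:.2f}")
```

With these settings the circular estimate typically lands far above the 0.74 ceiling, even though nothing real is there. The independent estimate scatters around zero. The remedy is to keep the steps separate: use one part of the data to pick the voxels and another part to measure the correlation.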
1. Cyan on January 18, 2010 11:15 AM writes...
Andrew Gelman had a couple of interesting posts on this a while back (here and here).
2. NJBiologist on January 18, 2010 12:33 PM writes...
I'm going to nitpick here--and I acknowledge that it's nitpicking, as I agree wholeheartedly with the general idea that neuroimaging people may have overinterpreted their data.
"But when you look at the reproducibility of the behavioral measurements themselves, the correlation is 0.8 at best. And the reproducibility of the blood-oxygen fMRI measurements is about 0.7. The highest possible correlation you could expect from those two is the square root of their product, or 0.74."
Here's the nit: that's only if the error is in the measurement. If the between-trials subject error is greater than the error of the physical measurement, you might well expect correlations to improve when considering individual trials.
So let's say you're trying to correlate discrimination of plaid with BOLD activity in V1 visual cortex, and you're seeing variability. Some of this will be inherent in the scanner, and some of it will be inherent in the behavioral task of plaid discrimination. But there may be isolated trials where the hypothesized V1 plaid discriminator neurons are behaving inappropriately--leading to both a wrong result on the behavioral assay ("why yes, doc, that does look like Black Watch" in response to a solid green slide) as well as an unexpected activation result (i.e., scanner detects activity). Trials like these would decrease test-retest reliability on either endpoint, but would correlate.
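NJBiologist's caveat can be illustrated with a small simulation. The assumption here (ours, for illustration) is that each trial carries a transient state, like the misfiring plaid detectors above, that pushes the behavioral and BOLD measures in the same direction. That state counts as error in the test-retest reliabilities. However, it is shared within a trial, so the independent-errors premise behind the square-root ceiling no longer holds. All variances and names are hypothetical.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def shared_state_demo(n=20000, noise_sd=0.5, seed=2):
    """Two trials per subject; each trial has a state shared by both measures."""
    rng = random.Random(seed)
    x1, x2, y1, y2 = [], [], [], []
    for _ in range(n):
        trait = rng.gauss(0, 1)                            # stable subject trait
        state1, state2 = rng.gauss(0, 1), rng.gauss(0, 1)  # per-trial, shared by X and Y
        x1.append(trait + state1 + rng.gauss(0, noise_sd))  # behavior, trial 1
        y1.append(trait + state1 + rng.gauss(0, noise_sd))  # BOLD, trial 1
        x2.append(trait + state2 + rng.gauss(0, noise_sd))  # behavior, trial 2
        y2.append(trait + state2 + rng.gauss(0, noise_sd))  # BOLD, trial 2
    rel_x, rel_y = pearson(x1, x2), pearson(y1, y2)  # test-retest reliabilities
    ceiling = (rel_x * rel_y) ** 0.5                  # naive square-root ceiling
    observed = pearson(x1, y1)                        # same-trial X-Y correlation
    return rel_x, rel_y, ceiling, observed


rel_x, rel_y, ceiling, observed = shared_state_demo()
print(f"reliabilities {rel_x:.2f}, {rel_y:.2f}; ceiling {ceiling:.2f}; observed {observed:.2f}")
```

The sketch uses unit-variance trait and state and a noise SD of 0.5. With those values, each reliability should come out near 1/2.25 ≈ 0.44, so the naive ceiling is about 0.44. The same-trial correlation should be near 2/2.25 ≈ 0.89, well above that ceiling.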
3. Anonymous BMS Researcher on January 18, 2010 12:49 PM writes...
http://xkcd.com/552/
4. PharmaHeretic on January 18, 2010 2:13 PM writes...
But people require seers and priests. Seriously, can you ever use fMRI for reading complex thoughts, given that we do not all think the same way about a given object or stimulus?
Example: when I see a chemical structure, I start remembering similar structures, their fragments, their pharmacology, etc. When a medicinal chemist sees the same chemical structure, his mind most likely starts considering its synthetic accessibility. If he is more experienced, he will probably remember similar projects (and maybe failures).
A lawyer is thinking of ways to screw the company and looking at his options. The MBAs are busy copying it so they can outsource its synthesis and get a bonus.
Each is interpreting the same object in different ways, based on their current situation, past experience and knowledge base. Are they using the same combination of neuronal pathways to see the same thing? NO!
5. Skeptic on January 18, 2010 5:15 PM writes...
"if you're not really up on your statistics..."
Or stated another way:
What you're spending on in-house research scientists so that they can fail *practically* all the time could better be spent on in-house expert statisticians while outsourcing and distributing the risk of early and intermediate R&D.
Gotcha. Already happening.
6. trrll on January 18, 2010 6:22 PM writes...
It can be easy to miss a correlation. I knew a student who was studying the effects of a drug, and using a known compound as a positive control. If the positive control did not produce the established effect, he discarded the data on the test compound. That sounded reasonable when he described the procedure to me. But when I was writing up the paper, I realized he was calculating the effects of both compounds relative to the same negative control, so the effects of the two compounds were not actually independent, but correlated. I had some very bad moments before I recalculated the statistics without exclusions, and found that the conclusions still held up.
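trrll's trap can be sketched with a toy simulation, using hypothetical numbers throughout. A positive control with a true effect of 1 and a test compound with no effect at all are each scored against the same negative-control value in every run. Because both effect estimates subtract that shared value, they are correlated. Throwing out runs where the positive control looks weak preferentially keeps runs whose control value happened to be low, and that inflates the test compound's apparent effect.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def shared_control_demo(n_runs=20000, criterion=1.0, seed=3):
    """Positive control (true effect 1.0) and null test compound (true effect 0),
    both scored against the same negative-control measurement in each run."""
    rng = random.Random(seed)
    pos_effects, test_effects = [], []
    for _ in range(n_runs):
        control = rng.gauss(0.0, 1.0)
        positive = rng.gauss(1.0, 1.0)
        test = rng.gauss(0.0, 1.0)
        pos_effects.append(positive - control)  # shares `control` with...
        test_effects.append(test - control)     # ...this estimate
    # Exclusion rule: drop runs where the positive control "failed".
    kept = [t for p, t in zip(pos_effects, test_effects) if p >= criterion]
    return (
        pearson(pos_effects, test_effects),     # about 0.5: shared control term
        sum(test_effects) / len(test_effects),  # all runs: about 0
        sum(kept) / len(kept),                  # kept runs: biased upward
    )


r, mean_all, mean_kept = shared_control_demo()
print(f"r = {r:.2f}; test effect, all runs = {mean_all:.2f}; kept runs = {mean_kept:.2f}")
```

Under these assumptions the correlation should be close to 0.5 and the all-runs estimate close to zero. The kept-runs estimate should be roughly 0.56, even though the test compound does nothing. trrll's own conclusions survived reanalysis without exclusions; the sketch only shows why that check was needed.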
7. A nonie mouse on January 18, 2010 9:08 PM writes...
Isn't 0.74 the expectation value of the correlation, not the upper-bound? Not that this lends any credibility to the correlations...
8. schinderhannes on January 19, 2010 4:10 AM writes...
Huge data sets, poor statistics, overactive researchers, and a ginormous publication bias (file drawer effect)...
It happens over and over again when new toys hit the unprepared.
Just like with genetics a while back (the gay gene, the gene for laziness etc...)
LOL
9. Cartesian on January 19, 2010 5:18 AM writes...
I wanted to continue the work of Descartes on the functioning of the body as it relates to the human passions, by studying hormones, but I am not very motivated at the moment.
10. In Vivo Veritas on January 20, 2010 10:02 AM writes...
Neuroimaging - it's the new Phrenology!
Pretty pictures and shaky correlations do not equal insight into the biology underlying behavior.
11. kwl on January 20, 2010 11:21 AM writes...
After reading this post, I remembered a paper along very similar lines (published last year as well):
http://dx.doi.org/10.1038/nn.2303
The authors also do a very fine job in pointing out the troubles involved with what they call 'double dipping'...
12. David G. on January 20, 2010 11:56 AM writes...
This is an issue that has carried over from psychology. I was attracted to my field of neuroscience because it was an acknowledgement of the fact that our minds are apparently the product of biological processes, and that we'll need to understand those processes to understand the mind. However, just as with psychologists, many neuroscientists simply aren't patient enough to work towards a basic understanding of the process of cognition and want to skip ahead to understanding the personality that is a result of it; so, they jump to conclusions.
Many neuroscientists and psychologists get into their fields thinking that they will come out of school knowing what makes people tick and how to help or understand them. These sorts of sudden conclusions are symptomatic of this impatience. I'll take a Santiago Ramon y Cajal or a Camillo Golgi over a Freud or a Dennett any day, even though the former two make no attempt to tell me why or how I think.
13. Henrik Olsen on January 21, 2010 8:18 PM writes...
Including the data that gave rise to the initial "Hmmm, that looks interesting" correlation in the data set that tries to confirm the correlation is a classic but still unforgivable mistake.
Once a set of experimental data has been used to select subjects for further experiments, those data are worse than useless in the subsequent analysis.
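Henrik Olsen's point, in its simplest form, is regression to the mean. This minimal simulation uses hypothetical numbers: true score and measurement noise are given equal variance, so the reliability is 0.5. Subjects are selected as the top 10% on one noisy measurement. Reusing that measurement overstates the selected group's true mean, while a fresh measurement does not. The function name and parameters are illustrative only.

```python
import random


def selection_reuse_demo(n_subjects=20000, top_fraction=0.1, seed=4):
    """Select subjects on a noisy first measurement, then compare estimates."""
    rng = random.Random(seed)
    true_scores = [rng.gauss(0, 1) for _ in range(n_subjects)]
    first = [t + rng.gauss(0, 1) for t in true_scores]   # used for selection
    second = [t + rng.gauss(0, 1) for t in true_scores]  # fresh measurement
    n_top = int(n_subjects * top_fraction)
    # Indices of the subjects with the highest first-measurement scores.
    selected = sorted(range(n_subjects), key=lambda i: first[i], reverse=True)[:n_top]

    def mean_over(values):
        # Mean of `values` over the selected subjects only.
        return sum(values[i] for i in selected) / n_top

    return mean_over(true_scores), mean_over(first), mean_over(second)


true_mean, reused_mean, fresh_mean = selection_reuse_demo()
print(f"true {true_mean:.2f}, reused {reused_mean:.2f}, fresh {fresh_mean:.2f}")
```

With these settings the reused estimate should come out at about twice the true mean of the selected group (roughly 2.5 versus 1.2). The fresh measurement should land close to the true value.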