In the Pipeline


July 9, 2014

Studies Show? Not So Fast.


Posted by Derek

Yesterday's post on yet another possible Alzheimer's blood test illustrates, yet again, that understanding statistics is not a strength of most headline writers (or most headline readers). I'm no statistician myself, but I have a healthy mistrust of numbers, since I deal with the little rotters all day long in one form or another. Working in science will do that to you: every result, ideally, is greeted with the hearty welcoming phrase of "Hmm. I wonder if that's real?"

A reliable source for the medical headline folks is the constant flow of observational studies. Eating broccoli is associated with this. Chocolate is associated with that. Standing on your head is associated with something else. When you see these sorts of stories in the news, you can bet, quite safely, that you're not looking at the result of a controlled trial - one cohort eating broccoli while hanging upside down from their ankles, another group eating it while being whipped around on a carousel, while the control group gets broccoli-shaped rice puffs or eats the real stuff while being duct-taped to the wall. No, it's hard to get funding for that sort of thing, and it's not so easy to round up subjects who will stay the course, either. Those news stories are generated by people who've combed through large piles of data, from other studies, looking for correlations.

And those correlations are, as far as anyone can tell, usually spurious. Have a look at the 2011 paper by Young and Karr to that effect (here's a PDF). If you go back and look at the instances where observational effects in nutritional studies have been tested by randomized, controlled trials, the track record is not good. In fact, it's so horrendous that the authors state baldly that "There is now enough evidence to say what many have long thought: that any claim coming from an observational study is most likely to be wrong."

They draw the analogy between scientific publications and manufacturing lines, in terms of quality control. If you just inspect the final product rolling off the line for defects, you're doing it the expensive way. You're far better off breaking the whole flow into processes and considering each of those in turn, isolating problems early and fixing them, so you don't make so many defective products in the first place. In the same way, Young and Karr have this to say about the observational study papers:

Consider the production of an observational study: Workers – that is, researchers – do data collection, data cleaning, statistical analysis, interpretation, writing a report/paper. It is a craft with essentially no managerial control at each step of the process. In contrast, management dictates control at multiple steps in the manufacture of computer chips, to name only one process control example. But journal editors and referees inspect only the final product of the observational study production process and they release a lot of bad product. The consumer is left to sort it all out. No amount of educating the consumer will fix the process. No amount of teaching – or of blaming – the worker will materially change the group behaviour.

They propose a process control for any proposed observational study that looks like this:

Step 0: Data are made publicly available. Anyone can go in and check it if they like.

Step 1: The people doing the data collection should be totally separate from the ones doing the analysis.

Step 2: All the data should be split, right at the start, into a modeling group and a group used for testing the hypothesis that the modeling suggests.

Step 3: A plan is drawn up for the statistical treatment of the data, but using only the modeling data set, and without the response that's being predicted.

Step 4: This plan is written down, agreed on, and not modified as the data start to come in. That way lies madness.

Step 5: The analysis is done according to the protocol, and a paper is written up if there's one to be written. Note that we still haven't seen the other data set.

Step 6: The journal reviews the paper as is, based on the modeling data set, and they agree to do this without knowing what will happen when the second data set gets looked at.

Step 7: The second data set gets analyzed according to the same protocol, and the results of this are attached to the paper in its published form.
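Here, as a rough sketch only, is what Steps 2 through 7 might look like in code for a generic tabular data set. None of this comes from the Young and Karr paper; the file name, the 50/50 split, and the logistic-regression "plan" are all stand-ins for whatever a real protocol would specify.

    # Hypothetical sketch of the split-sample protocol (all names made up).
    import pandas as pd
    from sklearn.model_selection import train_test_split
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import roc_auc_score

    df = pd.read_csv("observational_study.csv")         # Step 0: data are public
    X, y = df.drop(columns=["outcome"]), df["outcome"]

    # Step 2: one split, up front, into a modeling set and a locked-away test set.
    X_model, X_test, y_model, y_test = train_test_split(X, y, test_size=0.5, random_state=0)

    # Steps 3-4: the analysis plan (variables, model, metric) is frozen here,
    # before anyone looks at the held-out half.
    plan = LogisticRegression(max_iter=1000)

    # Step 5: fit and write up the results on the modeling set only.
    plan.fit(X_model, y_model)
    print("modeling-set AUC:", roc_auc_score(y_model, plan.predict_proba(X_model)[:, 1]))

    # Steps 6-7: only after the paper is reviewed does the frozen plan touch the test set.
    print("held-out-set AUC:", roc_auc_score(y_test, plan.predict_proba(X_test)[:, 1]))

The point is simply that the last line can't be run, or even peeked at, until everything else is already on the record.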

Now that's a hard-core way of doing it, to be sure, but wouldn't we all be better off if something like this were the norm? How many people would have the nerve, do you think, to put their hypothesis up on the chopping block in public like this? But shouldn't we all?

Comments (20) + TrackBacks (0) | Category: Clinical Trials | Press Coverage


COMMENTS

1. luysii on July 9, 2014 12:19 PM writes...

There's nothing wrong with observational studies (that's how thalidomide teratogenicity was found), but they should ALWAYS result in a controlled trial before acting on them. Here's a particularly horrible example of relying on observational studies. For a few more examples, please see - http://luysii.wordpress.com/2011/10/13/the-risks-of-risk-reduction/

[ Science vol. 297 pp. 325 - 326 '02 ] During the planning study for the Women’s Health Initiative, some argued that it was UNETHICAL to deny some women the benefit of hormones and give them a placebo. The basis for this was that 3 different meta-analyses concluded that estrogen replacement therapy decreases the risk of coronary heart disease by 35 – 50% — these were all meta-analyses of observational studies and not prospective and randomized.

The reason the HERS study was funded was that Wyeth couldn’t get the FDA to approve hormone replacement therapy as a treatment to prevent cardiovascular disease. So Wyeth funded HERS to prove their case.


More work from the Women’s Health Initiative trial in 16,608 women showed increased risk of stroke, dementia, and global cognitive decline. In addition, there was no benefit against mild cognitive impairment. This was published in the 28 May ’03 JAMA. The present work extends the initial findings to a follow-up of 5.6 years. The rate of stroke was 31% higher. 80% were ischemic. The increased risk was seen in all categories of baseline stroke risk. 40/61 women diagnosed with dementia were in the hormone (Prempro) group. This is in a subgroup of 4,532/16,608 women in the study. The references are all JAMA vol. 289 pp. 2663 – 2672, 2651 – 2662, 2673 – 2684, 2717 – 2719 ’03.

So what was the problem? Why were the results so different from what was expected? The women taking hormones in the 50 observational studies were (1) thinner (2) better educated (3) concerned enough about their health and vigor to take hormones — it is well known that compliers with medication — even placebo medication — have a better outcome than noncompliers — believe it or not (4) smoked less.
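The mechanism is easy to reproduce in a toy simulation: let a hidden "health-consciousness" variable drive both hormone use and the outcome, and the naive comparison finds a benefit that was never there. The numbers below are invented purely for illustration.

    # Toy confounding example (made-up numbers): hormone use and good outcomes
    # share a hidden cause, so users look better even with no real drug effect.
    import numpy as np

    rng = np.random.default_rng(0)
    n = 100_000
    health_conscious = rng.random(n) < 0.5                          # hidden confounder
    takes_hormones   = rng.random(n) < np.where(health_conscious, 0.6, 0.2)
    good_outcome     = rng.random(n) < np.where(health_conscious, 0.9, 0.7)
    # Note: the outcome does not depend on hormone use at all.

    for group, label in [(takes_hormones, "hormone users"), (~takes_hormones, "non-users")]:
        print(label, "good-outcome rate:", round(good_outcome[group].mean(), 3))
    # Users come out roughly 8 points better, purely via the hidden variable.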


2. Carmen on July 9, 2014 12:48 PM writes...

I'm a big fan of HealthNewsReview, a watchdog site for health stories. Their reviewers are a mix of MDs and PhDs, many with journalism chops too. They have a list of criteria for evaluating story quality that includes "Does the story seem to grasp the quality of the evidence?" (Click the link in my name for the full list).

Of course, none of this helps the folks who have to generate 22 articles a day and use a handy stack of press releases as a crutch.


3. Lisa Balbes on July 9, 2014 12:59 PM writes...

One of my favorite books ever - "How to Lie with Statistics". The examples are dated now, but it should be required reading for anyone who wants to be an informed citizen.


4. Ryan Powers on July 9, 2014 1:05 PM writes...

Whenever I see these sorts of stories on news sites, the xkcd Jellybean comic comes to mind:

http://xkcd.com/882/


5. Helical Investor on July 9, 2014 1:22 PM writes...

The plan you put out seems sensible, but in many instances the data available are limited, so binning into separate sets may not be practical. That is changing, though, especially as electronic medical records gain traction. Data in one silo can and should be blindly compared (after the initial analysis) to data in another.

I fully expect that 'outcomes' data on a wide array of different therapeutics and their use will be compared across populations in different insurance programs.


6. Andy mckenzie on July 9, 2014 1:48 PM writes...

This is often done in machine learning; see, for example, the website Kaggle or, for a more biomedical example, the DREAM challenges.


7. David Stone on July 9, 2014 2:01 PM writes...

I see someone's already posted the Jelly Bean XKCD comic. These are also worth taking a look at:

Spurious correlations http://www.tylervigen.com/
The origin of cell phones http://xkcd.com/925/

Anyone covering health news for a media outlet should be required to read and understand those first!


8. Anonymous on July 9, 2014 2:31 PM writes...

Remember that study that claimed that people have psychic powers? And how the results weren't replicable?

That was an interesting one. A good test of how to handle this kind of thing.

The thing is though, it's only because the results were so unusual (they could imply the need to expand our theories of fundamental physics, and reevaluate the way we think about biology and human behavior) that the thing was put up to so much scrutiny. I know the whole "extraordinary claims require extraordinary evidence" thing, but one has to wonder what percentage of all papers' conclusions are just as wrong, but haven't ever had to stand up to any scrutiny merely because the results seem reasonable.


9. bluefoot on July 9, 2014 3:39 PM writes...

A couple of years ago at a conference, one of the presenters was talking about how great their model was. I asked what the differences were between the training data set and their test set ... and they had used the same data. The scientist wasn't exactly junior, but didn't see anything wrong with doing it the way they did. Sometimes I despair.
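For anyone wondering how much that flatters a model, here is a minimal made-up example: score a flexible classifier on its own training data and it looks nearly perfect, even when the labels are pure noise.

    # Hypothetical illustration: evaluating on the training data is not a test.
    import numpy as np
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 20))
    y = rng.integers(0, 2, size=500)                  # labels are pure noise

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
    model = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)

    print("accuracy on its own training data:", model.score(X_tr, y_tr))   # ~1.0
    print("accuracy on held-out data:", model.score(X_te, y_te))           # ~0.5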


10. lynn on July 9, 2014 8:18 PM writes...

@David Stone - thanks for mentioning the Spurious Correlations site. Of course I've always been skeptical of correlations, but it's often hard to impart that skepticism to people who fall for them. But now I can go to Spurious Correlations and pick out a few graphs to show the gullible.


11. Esteban on July 9, 2014 8:19 PM writes...

The sad reality is that those receiving funding for observational studies are desperate to publish a positive result at the end, so they cannot afford to lay all of their cards on the table upfront. Instead, they churn the data looking at various endpoints/associations, find at least one with a p-value below .05 (unadjusted for the multiple looks at the data of course), concoct a story as to why the effect is scientifically plausible, then write it up for publication. If it was a very expensive study, they will dig up multiple such stories and publish multiple times.
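To put a rough number on that kind of data-churning, a small made-up simulation: twenty endpoints, all pure noise, each tested with an unadjusted t-test at p < .05.

    # Simulation (invented numbers): fish across 20 null endpoints per study.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    n_studies, n_endpoints = 2000, 20
    hits = 0
    for _ in range(n_studies):
        pvals = [stats.ttest_ind(rng.normal(size=50), rng.normal(size=50)).pvalue
                 for _ in range(n_endpoints)]
        if min(pvals) < 0.05:
            hits += 1

    print(f"studies with at least one 'significant' endpoint: {hits / n_studies:.0%}")
    # Close to 1 - 0.95**20, i.e. roughly 64%, with nothing real going on.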


12. Jack Scannell on July 10, 2014 12:46 AM writes...

It is not clear to me that the sensible concerns expressed here, or in the Young and Karr paper, necessarily reflect an observational vs. experimental distinction.

A high ratio of published false positives to true positives is a consequence of factors such as relatively lax thresholds for rejecting null hypotheses (e.g., p < 0.05 rather than p < 0.00001), uncorrected multiple comparisons, low experimental power (i.e., a low true positive detection rate), shifting the analytical goalposts once the data are in, the desire to publish stuff that looks interesting, and a genuine rarity of true positives, among other things.
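Those ingredients combine in a back-of-the-envelope calculation (the numbers here are purely illustrative): with true effects rare, power modest, and p < 0.05 as the bar, a large fraction of the "positives" are false.

    # Illustrative arithmetic: rare true effects + modest power + alpha = 0.05.
    hypotheses = 1000
    prior      = 0.10    # assumed fraction of tested hypotheses that are true
    power      = 0.50    # assumed chance a true effect is detected
    alpha      = 0.05    # significance threshold

    true_pos  = hypotheses * prior * power           # 50 real findings
    false_pos = hypotheses * (1 - prior) * alpha     # 45 spurious ones
    print("fraction of 'positive' results that are false:",
          round(false_pos / (true_pos + false_pos), 2))    # ~0.47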

Most of these factors apply to experimental studies too, as Derek's blog has pointed out over the years (e.g., Begley & Ellis on cancer work in 2012, Prinz et al. 2011, Perrin on mouse models in 2014).

The beauty of experiments over observations, of course, is that you can believe that your false results prove causation.


13. matt on July 10, 2014 2:44 AM writes...

The Young and Karr paper has it right, IMO.

Interesting to go back to the comments on the AD blood test, and pull out Feynman's Cargo Cult lecture: this follows precisely, I think, his "magic sauce" separating the science-make-believers from the actual science-makers. Feynman's advice to the students was to have integrity and set these controls to avoid fooling themselves; Young and Karr, I believe rightly, suggest the process be changed so that everything is in the open.

In other words, rather than the bank (and customers) strongly urging employees to be honest and not steal, strong controls are put in place and everything is done in the open.

I'd guess the National Academies, NIH, NSF, and the other major funding agencies would have to drive this for the work done for them, and that would generate enough attention to swing many of the publications that care about their reputation into some form of lip-service for this model.

But there are major professions (psychology being one, nutrition being another--probably many of the same culprits Feynman mentioned) which I think would be profoundly upset at having their routine disturbed (collect data, fish for surprising conclusion, publish, $$$!).


14. Esteban on July 10, 2014 7:25 AM writes...

@12,13: I agree that there is no reason to think this is only a problem with observational studies.


15. Darren on July 10, 2014 9:21 AM writes...

Another xkcd comic springs to mind:

http://xkcd.com/552/


16. Oblarg on July 10, 2014 9:33 AM writes...

That nutritional "science" generates almost nothing but false results is already known to any scientifically literate person. It is the same issue that we have with "big data" (though strangely that one seems to get more of a pass from people who really ought to know better - maybe it sounds more impressive?): it is statistically impossible to generate true results by looking at massive data sets confounded by large numbers of unknown variables without solid motivation for the effects being investigated.

As good as the proposed measures would be for cleaning up the literature (though, sadly, I think they're likely unfeasible as they'd spell the demise of essentially the entire field and a lot of careers - people's livelihood, unfortunately, depends on peddling this crap), they'd have no real end-effect on public gullibility. No one who isn't a scientist can be bothered to look at the data were it public, nor would they have the tools to understand what conclusions to draw from it even if they did. Hell, if we can't expect people who spend time studying "science" at actual universities to understand why you can't double-dip on your data sets, what hope is there for the general public?

The root cause of all of this is statistical illiteracy on the part of almost everyone. The only way to fix this is significant investment in practical math education for all students. It is a travesty that anyone graduates high school without a working knowledge of statistics, when it is probably the single most important tool needed to be an informed human being. Until we address this, no amount of reformation in our publication methods will do a thing.


17. db on July 11, 2014 9:58 AM writes...

@16,

It is similar to the idea that open source software inherently is more secure because it's code can be inspected by anyone. Of course it can be, but so few people have both the capability and time to actually do so, that the ideal is far from the reality.


18. db on July 11, 2014 10:00 AM writes...

@17


Autocorrect again shows its shortcomings. It's really shameful.


19. Joseph Hertzlinger on July 13, 2014 6:20 PM writes...

There's also the Science News Cycle, according to PhD Comics.


20. Vader on July 14, 2014 10:18 AM writes...

As a mere interested layman (in biological sciences; I'm a qualified scholar regarding world-shattering giant lasers) I simply apply a number of filters to these studies.

1. Is the relative risk less than 3? Ignore for at least the first three or four papers.

2. Is the study retrospective? Ignore.

3. Is the sample size less than three digits? Ignore.

I end up not feeling obligated to read very many observational papers this way.

