About this Author

Derek Lowe, an Arkansan by birth, got his BA from Hendrix College and his PhD in organic chemistry from Duke before spending time in Germany as a Humboldt Fellow during his post-doc. He's worked for several major pharmaceutical companies since 1989 on drug discovery projects against schizophrenia, Alzheimer's, diabetes, osteoporosis, and other diseases. To contact Derek, email him directly or find him on Twitter: Dereklowe


In the Pipeline


April 9, 2006

New Frontiers in Self-Deception


Posted by Derek

One of the big uses for human gene chips has been the search for biomarkers: genes that are up- or down-regulated in disease states. The hope is that gene expression changes will be the early warning signs of diseases, and could also help refine their diagnosis beyond what can be done by traditional means.

Cancer is the obvious place to start. As I've said in the past, there's no one disease by that name - just thousands of broadly similar diseases that we don't adequately distinguish between. We'll have to get down to the genetic and protein-expression levels to see the important differences. The process has already begun, as a look at the Iressa story shows.

Nothing good comes easy, though, and the field may have gotten ahead of itself. That's what a new paper in PNAS maintains, anyway. The authors, from the Weizmann Institute in Israel, point out that the various predictive gene lists proposed for different kinds of cancers have some disquieting problems. For one thing, lists for the same type of cancer don't seem to overlap very much. Worse, if you take two such lists and switch them (applying one group's list to the other group's patients), their success rates fall sharply.

The problems remain after all attempts to massage them away. Differences in gene-chip technology and in data-analysis methods between groups could explain some of the discrepancies, but they don't seem to be nearly enough to account for all the trouble. A bigger problem is how strongly the results depend on the particular patients enrolled in each study: the gene lists turn out to be unstable with respect to the training set used to generate them.

This latest paper lays this out in mathematical terms, and the results aren't real pretty. The published gene lists were derived from dozens, or at most a couple of hundred patients. But in order to have an overlap of at least 50% between two lists of candidate marker genes, with a confidence of 95%, the authors calculate that the number of patients needs to be in the low thousands. The existing proposals are almost certainly completely inadequate. The search goes on, but it just got harder, and a lot more expensive.
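That instability is easy to feel in a toy simulation (my own sketch, not the paper's PAC-based analysis; the gene count, effect size, and list size below are all made-up numbers for illustration). Two independent cohorts are drawn from the same synthetic population, genes are ranked by observed case-vs-control difference, and the two resulting top-50 "marker" lists are compared:

```python
# Toy simulation: how stable is a top-50 marker-gene list as a
# function of cohort size? (Illustrative only; not the paper's method.)
import random

random.seed(0)
N_GENES = 1000   # genes on the hypothetical chip
N_TRUE = 50      # genes with a real, modest expression difference
TOP_K = 50       # size of the candidate marker list
EFFECT = 0.25    # true mean case-vs-control difference for those genes

def observed_diff(gene, n_patients):
    """Mean of n_patients noisy measurements of the gene's true effect."""
    mu = EFFECT if gene < N_TRUE else 0.0
    return sum(random.gauss(mu, 1.0) for _ in range(n_patients)) / n_patients

def marker_list(n_patients):
    """Top-K genes ranked by absolute observed difference in one cohort."""
    ranked = sorted(range(N_GENES),
                    key=lambda g: abs(observed_diff(g, n_patients)),
                    reverse=True)
    return set(ranked[:TOP_K])

overlap = {}
for n in (20, 200, 2000):
    a, b = marker_list(n), marker_list(n)  # two independent cohorts
    overlap[n] = len(a & b) / TOP_K
    print(f"{n:5d} patients/cohort: top-{TOP_K} overlap = {overlap[n]:.0%}")
```

With a few dozen patients the two lists agree barely better than chance, while with thousands they converge on the genuinely differential genes - qualitatively the same behavior the authors derive analytically.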

Comments (4) + TrackBacks (0) | Category: Cancer


1. JSinger on April 9, 2006 10:05 PM writes...

Here, we introduce a previously undescribed mathematical method, probably approximately correct (PAC) sorting, for evaluating the robustness of such lists.

Errr, I think that algorithm could use a more confidence-inspiring name...

Never having had the energy to read those papers past the red and blue blob in Figure 2, I'm surprised it's not routine to test previously reported sets for similar classifications, especially if one's own results look good by comparison. Is there a gentleman's agreement that everyone is best off if it's not done?

Permalink to Comment

2. TWAndrews on April 9, 2006 10:39 PM writes...

To be dead honest, I don't think that biomarker detection has a hope of being effective until the detection technologies produce signal/noise ratios about 10 times better than what is currently available.

I've got colleagues who do very little beyond analysis of data for biomarker detection and typically they find that batch effects or related process issues dominate any sort of true effect.

Usually this goes unrecognized (or unacknowledged) by the bioinformatics groups who do the in-house data analysis, but the data that our firm sees rarely has much of anything usable in it.

Permalink to Comment

3. Epigenetics News on April 9, 2006 11:41 PM writes...

"Some possible reasons for them might be the different gene chips technologies used by different groups or different methods of analyzing the data, but these don't seem to be nearly enough to account for all the trouble."

Actually, this problem can dramatically affect the results. I would go as far as to say that comparing across different technologies or chips at all is impossible. That's one of the key limitations, but if everything is run on the same chip, the ability to reproduce results increases.

Permalink to Comment

4. RKN on April 10, 2006 8:43 AM writes...

I'm convinced you need to combine a proteomics approach with RNA microarray data to have a chance at discovering biomarkers.

I recently worked on a relatively small yeast project that involved comparing expression changes determined using microarrays and proteomics (2D gel electrophoresis followed by MS/MS). There was very little overlap between the two. Not all transcripts get translated, and not at the same copy number. Less-than-adequate experimental precision could account for some of the difference as well.

As for the statistics, it's been established that even for the same cancer, at the same stage (say colon cancer), inter-individual differences of gene expression can be significant. Those subtle differences may be lost in the noise of the present-day experimental techniques used to measure them.

Permalink to Comment

