About this Author

Derek Lowe, an Arkansan by birth, got his BA from Hendrix College and his PhD in organic chemistry from Duke before spending time in Germany on a Humboldt Fellowship during his post-doc. He has worked for several major pharmaceutical companies since 1989 on drug discovery projects against schizophrenia, Alzheimer's, diabetes, osteoporosis, and other diseases. To contact Derek, email him directly, or find him on Twitter: Dereklowe

In the Pipeline

December 4, 2013

Cancer Cell Line Assays: You Won't Like Hearing This

Posted by Derek

Here's some work that gets right to the heart of modern drug discovery: how are we supposed to deal with the variety of patients we're trying to treat? And the variety in the diseases themselves? And how does that correlate with our models of disease?

This new paper, a collaboration between eight institutions in the US and Europe, is itself a look at two other recent large efforts. One of these, the Cancer Genome Project, tested 138 anticancer drugs against 727 cell lines. Its authors said at the time (last year) that "By linking drug activity to the functional complexity of cancer genomes, systematic pharmacogenomic profiling in cancer cell lines provides a powerful biomarker discovery platform to guide rational cancer therapeutic strategies". The other study, the Cancer Cell Line Encyclopedia, tested 24 drugs against 1,036 cell lines. That one appeared at about the same time, and its authors said ". . .our results indicate that large, annotated cell-line collections may help to enable preclinical stratification schemata for anticancer agents. The generation of genetic predictions of drug response in the preclinical setting and their incorporation into cancer clinical trial design could speed the emergence of ‘personalized’ therapeutic regimens."

Well, will they? As the latest paper shows, the two earlier efforts overlap to the extent of 15 drugs, 471 cell lines, 64 genes and the expression of 12,153 genes. How well do they match up? Unfortunately, the answer is "Not too well at all". The discrepancies really come out in the drug sensitivity data. The authors tried controlling for all the variables they could think of - cell line origins, dosing protocols, assay readout technologies, methods of estimating IC50s (and/or AUCs), specific mechanistic pathways, and so on. Nothing really helped. The two studies were internally consistent, but their cross-correlation was relentlessly poor.
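The comparison at issue is essentially a rank-correlation question: for a given drug, do the two studies at least order the cell lines the same way by sensitivity? Here's a minimal sketch of the statistic involved (Spearman's rho), with invented IC50 values standing in for the real data:

```python
# Spearman rank correlation from scratch; the IC50 values below are
# invented for illustration (the real overlap was 15 drugs x 471 lines).

def rank(values):
    """Assign 1-based ranks, averaging over ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average 1-based rank for a tied block
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the ranks."""
    rx, ry = rank(x), rank(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# IC50s (uM) for one drug across six shared cell lines, per two studies:
study_1 = [0.01, 0.05, 0.2, 1.0, 5.0, 8.0]
study_2 = [0.03, 0.02, 0.9, 0.5, 7.0, 30.0]
print(round(spearman(study_1, study_2), 3))  # a well-correlated toy case
```

On this scale, 1.0 is perfect rank agreement, and (as quoted below) the CGP's own cross-site camptothecin comparison came in under 0.6.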

It gets worse. The authors tried the same sort of analysis on several drugs and cell lines themselves, and couldn't match their own data to either of the published studies. Their take on the situation:

Our analysis of these three large-scale pharmacogenomic studies points to a fundamental problem in assessment of pharmacological drug response. Although gene expression analysis has long been seen as a source of ‘noisy’ data, extensive work has led to standardized approaches to data collection and analysis and the development of robust platforms for measuring expression levels. This standardization has led to substantially higher quality, more reproducible expression data sets, and this is evident in the CCLE and CGP data where we found excellent correlation between expression profiles in cell lines profiled in both studies.

The poor correlation between drug response phenotypes is troubling and may represent a lack of standardization in experimental assays and data analysis methods. However, there may be other factors driving the discrepancy. As reported by the CGP, there was only a fair correlation (rs < 0.6) between camptothecin IC50 measurements generated at two sites using matched cell line collections and identical experimental protocols. Although this might lead to speculation that the cell lines could be the source of the observed phenotypic differences, this is highly unlikely as the gene expression profiles are well correlated between studies.

Although our analysis has been limited to common cell lines and drugs between studies, it is not unreasonable to assume that the measured pharmacogenomic response for other drugs and cell lines assayed are also questionable. Ultimately, the poor correlation in these published studies presents an obstacle to using the associated resources to build or validate predictive models of drug response. Because there is no clear concordance, predictive models of response developed using data from one study are almost guaranteed to fail when validated on data from another study, and there is no way with available data to determine which study is more accurate. This suggests that users of both data sets should be cautious in their interpretation of results derived from their analyses.

"Cautious" is one way to put it. These are the sorts of testing platforms that drug companies are using to sort out their early-stage compounds and projects, and very large amounts of time and money are riding on those decisions. What if they're gibberish? A number of warning sirens have gone off in the whole biomarker field over the last few years, and this one should be so loud that it can't be ignored. We have a lot of issues to sort out in our cell assays, and I'd advise anyone who thinks that their own data are totally solid to devote some serious thought to the possibility that they're wrong.

Here's a Nature News summary of the paper, if you don't have access. It notes that the authors of the two original studies don't necessarily agree that they conflict! I wonder if that's as much a psychological response as a statistical one. . .

Comments (21) + TrackBacks (0) | Category: Biological News | Cancer | Chemical Biology | Drug Assays


1. Tuck on December 4, 2013 9:51 AM writes...

Given this and the failure to find presumed reliable genetic markers for cancer, it's beginning to feel like the drug/gene approach to treating cancer is nearing a dead end. Or a cliff...

2. a. nonymaus on December 4, 2013 10:21 AM writes...

The quoted part is most eyebrow-raising. If two sites can't get good reproducibility in measuring the IC50 of the same compound on the same cell line, there's a problem. In that context, it makes me wonder if two attempts to measure the IC50 value by the same site will show similar levels of error. Is it a systematic bias or does it need more signal averaging? If it is systematic, is there any way to calibrate one site versus the other?
I mean, either adding these substances to cultures does something or it does not.
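One way to frame that calibration question: if paired log IC50 measurements from two sites differ by roughly a constant, a simple offset would calibrate one site to the other; if the differences scatter widely, the problem is noise and no shift helps. A sketch with invented numbers:

```python
import statistics

# Paired log10(IC50) values for the same compound and cell lines at two
# hypothetical sites (all numbers invented).
site_a = [-2.0, -1.3, -0.7, 0.0, 0.7]
site_b = [-1.5, -0.9, -0.2, 0.6, 1.1]

diffs = [b - a for a, b in zip(site_a, site_b)]
offset = statistics.median(diffs)   # the systematic shift, if there is one
spread = statistics.stdev(diffs)    # residual scatter around that shift

# A small spread around a nonzero offset suggests a calibratable bias;
# a large spread means the disagreement is not a simple shift.
calibrated_b = [b - offset for b in site_b]
print(f"offset {offset:+.2f} log units, spread {spread:.2f}")
```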

3. Ollie Rehn on December 4, 2013 11:32 AM writes...

I don't have access to the paper at the present time, but speaking from experience, one variable that is often overlooked in studies like these is the stability/purity of the chemotherapeutic agent.

Many, if not most, chemos have a limited lifetime in DMSO solution; cisplatin is a prime example, with a half-life of only a few minutes. For camptothecin, it's known that hydrolysis of the lactone leads to a compound that is both inactive and binds very tightly to any serum protein that might be present in the assay.

What if one group used freshly dissolved camptothecin and the other group used a sample that had sat in DMSO solution for months or years, undergoing multiple freeze-thaw cycles?

4. Neo on December 4, 2013 1:17 PM writes...

Ultimately this seems to be the result of bad experimental practices. Nowadays, it is more important to be first than to be correct, and more so in large-scale studies. The journals share the responsibility for accuracy: the better known the groups, the lighter the reviews. But hey, they can always accept a third Nature paper on how difficult the problem is once the inconsistencies arise.

5. JB on December 4, 2013 1:26 PM writes...

Doesn't their supplementary section 3 pretty significantly weaken the point they're trying to make? They note that Sanger used CellTiterGlo (ATP quantitation) and CCLE used a resazurin equivalent (mitochondrial metabolism) and also note that another study comparing the two concluded that they are often discordant:
"The study shows that the ATP-dependent luminescence assay is prone to underestimation of drug potency and efficacy, which was particularly problematic for assessing efficacy of DNA synthesis-targeting agents26. The ATP-dependent luminescence and fluorescent DNA-binding assays are measuring different aspects of the drug response phenotype, and therefore it is not surprising that the assays show only moderate correlation in the CGP/CCLE analysis. Given the limitations of each assay, it has been suggested that multi-parameter testing, incorporating multiple, complementary cell-viability assays yields the most robust and informative phenotypic measures"
Then they list a table of even more differences: 1536- vs. 384- and 96-well plates, pre-adherence time, use of a poscon compound or cell-free wells as 100% (which, if you're comparing IC50s, i.e. the point where you reach 50% of your 100% standard, is a REALLY HUGE DIFFERENCE).
So I think their totally valid message is that there should be standardization of methods for this sort of thing, around protocols shown to best align with inferences about biomarkers and compound sensitivities, and that someone should figure out what those best practices are; or at least there should be a recognition that certain experimental methods might make you miss correlations to genetic features. Instead, the message everyone's getting from it is that people are bad at running experiments like this and we should give up.
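The normalization point is worth making concrete. In the toy calculation below (all numbers invented), the same raw viability readings give different IC50s depending on whether cell-free blanks or a partial-kill positive control is taken as 100% inhibition:

```python
import math

doses = [0.01, 0.1, 1.0, 10.0, 100.0]   # uM
signal = [1000, 900, 600, 300, 250]     # raw viability counts (invented)
untreated = 1000                        # defines 0% inhibition
blank = 0                               # cell-free wells as "100%"
poscon = 200                            # positive control's floor as "100%"

def ic50(doses, signal, lo, hi):
    """Dose where inhibition crosses 50%, by log-linear interpolation.
    lo = signal at 0% inhibition; hi = signal at 100% inhibition."""
    inhib = [(lo - s) / (lo - hi) for s in signal]
    for i in range(1, len(doses)):
        if inhib[i - 1] < 0.5 <= inhib[i]:
            f = (0.5 - inhib[i - 1]) / (inhib[i] - inhib[i - 1])
            lg = math.log10(doses[i - 1]) + f * (
                math.log10(doses[i]) - math.log10(doses[i - 1]))
            return 10 ** lg
    return None  # never reaches 50% under this normalization

print(ic50(doses, signal, untreated, blank))   # ~2.15 uM with blank as 100%
print(ic50(doses, signal, untreated, poscon))  # 1.0 uM with poscon as 100%
```

Same plate, same wells, more than a two-fold difference in the reported IC50 purely from the choice of the 100% reference.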

6. Anonymous on December 4, 2013 1:29 PM writes...

#3 that's a good point. I also don't have access, but I would assume that they would compare the quality of the compounds used in both studies (and if they were prepared by the same protocol, purchased from the same vendor, stored the same way (in solution versus neat), etc.). Does anyone know?

7. MDACC Alum on December 4, 2013 2:15 PM writes...

This annoys me to no end. Buy a [cell biologist] postdoc a couple beers and get him talking. He'll tell you how unreliable even the same cell line can be if one isolates out random populations of cells and grows them up. IMO, the conclusion in the article has been a long time coming...and may actually still be a long way off seeing as everyone wants to vigorously defend their results.
It just goes back to your big names having the ability to pull in a large sum of money for busy work that they market as "innovation."
Why can't the NIH just give people with good ideas money?

8. MDACC Alum on December 4, 2013 2:19 PM writes...

@2. a. nonymaus
" there any way to calibrate one site versus the other?"

These projects are run by the cheapest lab techs you can find (postdocs and grad students aren't going to touch something they can't get a tangible authorship out of). Pay techs 30k and you get what you pay for. They aren't exactly doing industry-quality method transfers and validation.

9. anon the II on December 4, 2013 2:49 PM writes...

@ Ollie and Anonymous

Your point is valid but probably not particularly relevant. Most small molecules are stable under these conditions. You don't destroy small molecules with freeze-thaw cycles. You may pick up water which causes precipitation. If these assays are run with the wrong compounds (or no compound), then this is truly crap science. With only 34 different compounds, surely they got that part right. ;)

10. Kip Guy on December 4, 2013 5:23 PM writes...

We've done a lot of this type of work over the last 8 years, and the basic lesson is that you need to put about six weeks of process validation into each cell line in order to get internally reproducible potency numbers (systematically taking into account seeding density, growth rate, DMSO sensitivity, etc.). Doubling time and seeding density seem to be the biggest variables. The other thing to keep in mind is the QC of the cell lines themselves. We quit taking lines from other laboratories and shifted to sourcing everything directly from ATCC, because it was routine to get incorrectly identified lines. We've never tried quantitative comparisons between sites, but we find that if you take this level of care, you can routinely get within 3- to 10-fold consistency and good correlation between laboratories. The long and the short of it: I would be very cautious of such data if the lab cannot cough up an SOP at this level for each cell line, plus QC data with control compounds in dose-response for each run with each cell line.
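A per-run QC gate of the kind described might be sketched as follows (the 3-fold window and the numbers are hypothetical, not from the comment):

```python
import math

def qc_pass(run_ic50, historical_ic50, max_fold=3.0):
    """Accept a run only if the control compound's IC50 is within a
    max_fold window of its historical value for that cell line."""
    fold_off = abs(math.log10(run_ic50) - math.log10(historical_ic50))
    return fold_off <= math.log10(max_fold)

print(qc_pass(0.12, 0.10))  # 1.2-fold off: within a 3-fold window
print(qc_pass(1.5, 0.10))   # 15-fold off: the run fails QC
```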

11. Ollie Rehn on December 4, 2013 8:54 PM writes...

@9 anon the II

Most chemotherapeutic agents are not stable for long periods in DMSO. Many are highly reactive. We learned this the hard way: there are companies out there that collect as many known therapeutic and chemotherapeutic agents as possible and contract their services to other companies to test them in cellular assays. Our biologists set up an agreement with such a company, and once the data came in, it didn't make sense that many of the chemotherapeutics did nothing against both our cell lines of interest and the control cell lines as well. After some effort, we figured out that many of these fastidiously collected compounds had decomposed in DMSO.
Camptothecin is a very interesting case. At neutral pH in the presence of water, it has a half-life of about 20 minutes. Wet DMSO is certainly going to open the lactone. In addition, there is no trivial way to assay for the open form. The absorbance spectra of the lactone and the hydroxy acid are almost identical, and on an HPLC you will probably convert some or all of it back to the lactone. Perhaps there is a clever way to determine if your sample is intact, but the only one that I am aware of is fluorescence spectroscopy. Not something most cell biologists are going to do.
And of course, the lactone is active, while the hydroxy acid is completely inactive and highly bound to serum protein.

12. No one in particular on December 4, 2013 9:12 PM writes...

Another cautionary point about personalized medicine that seems underappreciated (perhaps because it is depressing) is that there is a lot of heterogeneity in tumors. Even in a sample from the same tumor from the same patient, you will find cells with different genetic lesions. And the range will be different if you look at a met, or at the same tumor after time passes. Cloned cell lines are not really representative of the actual disease! We have to work with the research tools we have, but don't ignore the limitations.

13. barry on December 5, 2013 2:22 AM writes...

and after you've found an agent that works against a cell line or cell lines, you must remember that these clones were highly selected to prosper in vitro, and are not representative of primary clinical isolates. Real tumors in real animals/patients are heterogeneous, representing many stages in the "multifactorial aetiology" of cancer and only those cells that do well in cell-culture plates will be represented in the assays described.

14. CAprof on December 5, 2013 7:45 PM writes...

Many years ago, Al Gilman's "Cell Signaling Alliance" tried a conceptually similar sort of analysis using many distributed sites. The lack of consistency was worrying, and when the boffins did a PCA of the data, the primary determinant was the identity of the person doing the experiment.

15. Anonymous on December 6, 2013 10:25 PM writes...

@Kip Guy:

If you're doing 1000 cell lines, I can guarantee that there will not be 6 weeks of rigorous QC per cell line. That's 115 man-years (without vacations!), if you're curious. It simply cannot be done with the available resources of any group.

When the Broad or the Sanger do this, they do have rigorous SOPs, but those SOPs state that you plate X number of cells per well, and it's the same for every line. Some lines will perform well, some won't, but that's what the other 800 cell lines are for. Both groups are careful to assign the correct cell line annotation by SNP genotyping, and the vast majority of both their stocks are from ATCC.

This type of screening isn't meant to produce highly reliable IC50 values with each compound in each line. It's meant to give rough partitioning into sensitive/insensitive buckets for a given compound. For that purpose, it's pretty good with most of the targeted agents I've ever seen. Cytotoxics like camptothecin... maybe not so much. There the therapeutic windows are much smaller, so the correlations are bound to be much worse, as measurement errors will dominate. The thing is, does anyone still really care how camptothecin performs in one lineage vs. another? I sure don't. I care about what cell lines a super-clean inhibitor of kinase X kills.
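The rough partitioning described here can be sketched as below (the IC50 values and the 10-fold cutoff are invented for illustration, not taken from either study):

```python
import math
import statistics

# Invented single-drug IC50s (uM) across a small cell line panel.
ic50s = {
    "LINE_A": 0.02, "LINE_B": 5.0, "LINE_C": 0.04,
    "LINE_D": 8.0, "LINE_E": 6.5, "LINE_F": 0.01,
}
logs = {k: math.log10(v) for k, v in ic50s.items()}
panel_median = statistics.median(logs.values())

# Call a line "sensitive" if it is at least 10-fold (1 log unit) more
# potent than the panel median -- a coarse bucket, not a potency claim.
sensitive = sorted(k for k, v in logs.items() if v <= panel_median - 1.0)
print(sensitive)
```

With real data the cutoff would be tuned per assay; the point is only that bucket membership is far more robust to IC50 measurement noise than the absolute values are.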

16. Nonymouse on December 7, 2013 9:19 AM writes...

Fortunately for the potential centenarian cell culture folks you can grow and QC lines in parallel.

17. Anonymous on December 7, 2013 9:56 AM writes...

The point still stands. These aren't meant to be rigorous hypothesis testing assays. These are bulk population experiments, mostly for hypothesis generation. You would never pull the trigger on a clinical trial based on data generated this way, you'd use this data to tell you where you should look with secondary investigations.

18. Kip Guy on December 9, 2013 11:02 AM writes...

I guess that I had two points:

1) You won't get reliable EC50s from an experiment like this that allow rigorous comparison of potency and generation of quantitative hypotheses.

2) I question how useful the entire approach is for hypothesis generation when you can't trust that inactivity is due to mechanism rather than poor growth of the cells, etc.

19. Mike B. on December 24, 2013 12:32 AM writes...

Welcome to the world of cell biology. This problem has been known for a long time. We had a guest from NIST give a presentation on trying to simply standardize a very simple toxicity assay. NIST sent out the same cell line, with the same tox assay, and the same exact protocols to random labs all over the country. The variability in the results obtained was truly shocking. If a simple tox assay can't be repeated, how repeatable is 99% of all the other more advanced science on cells that is currently being done?

One should also keep in mind that in vitro cancer lines are absolutely nothing like cells from primary tumors. The genomic data change once cells are grown in culture, and cultured lines exhibit very little of the stem-cell-like character that is so prominent in primary cancer cells. Just adding serum to your cell culture media radically alters the genomic behavior of cancer cell lines, making them nothing like cancer cells in vivo.

20. Mike B. on December 24, 2013 12:38 AM writes...

#7 is also correct. For example, doing something like passaging cells differently can significantly alter results. If you only take the cells that detach after 2 minutes of trypsinization, you'll select for a different set of cells with different characteristics than if, say, you trypsinized cells and used all of the cells that came off after 5 minutes of waiting. If you had to standardize every single step of doing cell work, nothing would ever get done, because we'd be troubleshooting and standardizing for the next 200 years.
