About this Author
Derek Lowe, an Arkansan by birth, got his BA from Hendrix College and his PhD in organic chemistry from Duke before spending time in Germany on a Humboldt Fellowship for his post-doc. He's worked for several major pharmaceutical companies since 1989 on drug discovery projects against schizophrenia, Alzheimer's, diabetes, osteoporosis, and other diseases. To contact Derek, email him directly or find him on Twitter: Dereklowe


In the Pipeline


March 4, 2009

Gene Expression: You Haven't Been Thinking Big Enough?


Posted by Derek

Well, here’s another crack at open-source science. Stephen Friend, the former head of Rosetta (both before and after its purchase by Merck), is heading out on his own to form a venture in Seattle called Sage. The idea is to bring together genomic studies from all sorts of laboratories into a common format and database, with the expectation that interesting results will emerge that couldn’t be found in any one lab’s data.

I’ll be interested to see if this yields something worthwhile – in fact, I’ll be interested to see if it gets off the ground at all. As I’ve discussed before, the analogy with open-source software doesn’t hold up so well for most scientific research these days, since the entry barriers (facilities, equipment, and money) are significantly higher than they are in coding. Look at genomics – the cost of sequencing has been dropping, for sure, but it’s still very expensive to get into the game. That falling cost is measured per base sequenced; today’s technology has you sequencing far more bases, so the absolute cost of an experiment hasn’t come down as much as you might think. I’m sure you can get ten-year-old equipment cheap, but it won’t let you do the kind of experiments you might want to do, at least not in the time you’ll be expected to do them in.
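To make that point concrete, here's a back-of-the-envelope sketch. The numbers are made up for illustration, not real sequencing prices: a 100-fold drop in per-base cost shrinks to a much smaller drop in per-project cost once the experiments themselves get bigger.

```python
# Illustrative (hypothetical) numbers: per-base cost falls 100-fold, but the
# number of bases a modern study sequences grows 30-fold, so the absolute
# cost of a project falls far less than the headline per-base figure.

old_cost_per_base = 1.00      # dollars per base, hypothetical
new_cost_per_base = 0.01      # 100-fold cheaper per base

old_bases_per_project = 1e6   # what an older study might have sequenced
new_bases_per_project = 3e7   # today's deeper, wider experiments

old_total = old_cost_per_base * old_bases_per_project
new_total = new_cost_per_base * new_bases_per_project

print(f"per-base cost: {old_cost_per_base / new_cost_per_base:.0f}-fold cheaper")
print(f"whole project: {old_total / new_total:.1f}-fold cheaper")
```

With these made-up figures, a 100-fold per-base improvement buys only about a 3-fold cheaper project.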

But even past that issue, once you get down to the many labs that can do high-level genomics (or to the even larger number that can do less extensive sequencing), the problems will be many. Sage is also going to look at gene expression levels, something that's easier to do (although we're still not in weekend-garage territory yet). Some people would say that it's a bit too easy to do: there are a lot of different techniques in this field, not all of which always yield comparable data, to put it mildly. There have been several attempts to standardize things, along with calls for more control experiments, but getting all these numbers together into a useful form will still not be trivial.
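One standard technique for forcing expression measurements from different platforms onto a comparable footing is quantile normalization. Here's a minimal sketch of the idea – not necessarily what Sage plans to use, and real pipelines handle ties and missing values far more carefully:

```python
import numpy as np

def quantile_normalize(matrix):
    """Make every column (one sample or platform) share the same value
    distribution: rank the entries within each column, then replace each
    entry with the mean across columns at that rank."""
    ranks = np.argsort(np.argsort(matrix, axis=0), axis=0)
    rank_means = np.sort(matrix, axis=0).mean(axis=1)
    return rank_means[ranks]

# Two toy "samples" measuring the same four genes on different scales
expr = np.array([[5.0, 2.0],
                 [2.0, 1.0],
                 [3.0, 4.0],
                 [4.0, 3.0]])
print(quantile_normalize(expr))
```

After normalization both columns draw from the same set of values while each gene keeps its within-sample rank – which is exactly why the method can paper over, but not resolve, genuine between-platform disagreements.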

Then you've got the really hard issues: intellectual property, for one. If you do discover something by comparing all these tissues from different disease states, who gets to profit from it? Someone will want to, that's for sure, and if Sage itself isn't getting a cut, how will they keep their operation going? Once past that question (which is a whopper), and past all the operational questions, there's an even bigger one: is this approach going to tell us anything we can use at all?

At first thought, you'd figure that it has to. Gene sequences and gene expression are indeed linked to disease states, and if we're ever going to have a complete understanding of human biology, we're going to have to know how. But. . .we're an awful long way from that. Look at the money that's been poured into biomarker development by the drug industry. A reasonable amount of that has gone into gene expression studies, trying to find clear signs and correlations with disease, and it's been rough sledding.

So you can look at this two ways: you can say fine, that means that the correlations may well be there, but they're going to be hard to find, so we're going to have to pool as much data as possible to do it. Thus Sage, and good luck to them. Or the systems may be so complex that useful correlations may not even be apparent at all, at least at our current level of understanding. I'm not sure which camp I fall into, but we'll have to keep making the effort in order to find out who's right.

Comments (16) + TrackBacks (0) | Category: Biological News | Drug Development


1. Retread on March 4, 2009 10:16 AM writes...

It's not clear (to me), even after looking at their site, just what they mean by gene expression. The easiest things to measure are messenger RNA (mRNA) levels -- not that measuring them was easy even 10 years ago. If so, they are at least 2 (and perhaps 3) steps removed from what might be called functional gene expression.

Not all mRNA hangs around for the same amount of time, and the rate of translation of a given mRNA into its protein varies with both the type of mRNA and the state of the cell the mRNA is found in. So by gene expression, do they also mean protein levels?

At the second remove, we have the various protein modifications, known to be crucial -- phosphorylation (think how much drug development is focused on kinase inhibition), sulfation, ubiquitination, sumoylation, conjugation with lipids -- the list goes on and on and I doubt that we presently know all the ways that proteins can be modified. Will each of these be measured and quantified? IMHO this is really what gene expression is all about.

At the third remove -- we know (at least in yeast) that > 90% of the genome is transcribed into RNA, and most estimates in man are over 50%. Will the noncoding RNA (noncoding for protein, that is) also be measured as gene expression? This certainly should include microRNAs and long RNAs such as HOTAIR.


2. Palo on March 4, 2009 11:48 AM writes...

From the Sage project:
"An incubation period of three to five years is anticipated in which new project data are generated, critical tools for building and mining disease models are developed and governing rules for sharing, accessing, and contributing to the platform are established"

I think it answers most of your questions. Business model: pay to access the annotated database. Worth doing: we'll know in five years.

I'm with you Derek. I'm not sure we'll get anything out of it, but it seems worth trying.


3. Mike on March 4, 2009 2:08 PM writes...

"An incubation period of three to five years is anticipated in which new project data are generated, critical tools for building and mining disease models are developed and governing rules for sharing, accessing, and contributing to the platform are established"

That describes an operational model, not necessarily a business (or commercial) model. I'm not saying it won't be pay-to-play, but you can't infer that from the quote.


4. Jose on March 4, 2009 2:45 PM writes...

A programmer friend was telling me folks like him always try to figure out how any computing process scales (linear, exponential, factorial, etc.) to decide whether it's even a tractable problem on human timescales. Presumably Sage has done similar analyses?
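That's the standard back-of-the-envelope complexity check: plug a realistic input size into each growth law and see whether the step count is survivable. A quick sketch, assuming a round machine speed of 1e9 steps per second:

```python
import math

def steps(n):
    """Step counts for common growth laws at input size n."""
    return {
        "linear": n,
        "quadratic": n ** 2,
        "exponential": 2 ** n,
        "factorial": math.factorial(n),
    }

# Even at a modest n = 30, dividing by ~1e9 steps/second separates
# the tractable from the hopeless:
for name, count in steps(30).items():
    print(f"{name:12s} {count:.2e} steps, ~{count / 1e9:.2e} seconds")
```

At n = 30 the exponential case is already around a second of work, and the factorial case runs to ~10^23 seconds – far longer than the age of the universe.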


5. darwin on March 4, 2009 2:50 PM writes...

Step right up. I have some wonderful snake oil I would like to sell you.


6. AR on March 4, 2009 5:08 PM writes...

What surprises me is that anyone is still working from the basic assumption that genome analysis is the be-all and end-all in the cell. There is a lot more to this than differences in laboratory methodology.

Hello! Two cells, treated under identical conditions, side by side in the same flask. Cell one, gene of interest, expression goes up; cell two, same gene, expression goes down. When this can be explained, maybe then, the whole field will be useful.


7. Mr. Gunn on March 4, 2009 7:12 PM writes...

Well, AR, there are general trends culture-wide and population-wide, as can be seen by comparing flow experiments with culture-based expression studies. I know what you mean, though, because I spent years puzzling over this stuff. I come down on the side of we'll know more if we have more data, but only if we have the proper tools to analyze it. The analysis project itself will be a genome-scale project, so they'd better have the right people for that side of things, or go out and get them.


8. AR on March 5, 2009 12:41 PM writes...

Mr Gunn

Sub-cell populations, cell cycling, all the typical hand-waving biologists do cannot explain some of the discrepant gene-expression results I see. More data? It seems to me we are overloaded with data now and still can’t find more than a few causal links between phenotype and molecular gene expression. Better data? Before answering, I would like to understand why the current databases are now considered to be of such poor quality. If it is methodological differences between labs – well, which one is right?

There is, of course, another answer. Perhaps I am premature, but I tend to dump gene expression into the same category as combi-chem, gene-based targeting, in silico drug design and some other fads that seemed like sure bets but turned out useless or, worse, just plain wrong. A lot of smart, well-funded scientists and IT gurus have been looking for a decade or so at gene expression data. If the conclusion of all this effort is that we can’t trust these results, what does that tell you?


9. g on March 5, 2009 1:02 PM writes...

This type of approach could be useful in very specific situations. If there is some type of cancer that is resistant to the traditional therapy, it would be tremendous to be able to genotype the tumor from a biopsy and determine if it is resistant. On the other hand, if you were to try to determine gene expression patterns for depressive disorders, good luck trying to ascertain anything useful or not already known!


10. bmp3 on March 5, 2009 6:06 PM writes...

Amongst all the negative comments, maybe one slightly optimistic note: if anyone can pull it off, it has to be Stephen Friend and Eric Schadt. Not sure how many of the commenters have read a couple of their recent papers... but they are clearly not newcomers to the field and know all too well about the issues being mentioned by Derek and the commenters. Don't be surprised if they pull it off in the end!


11. darwin on March 6, 2009 8:38 AM writes...

The only thing Friend pulled off was a gigantic snowjob in selling his company's technology to Merck. Merck finally figured out that genomics was just another tool whose utility needs to be assessed, to be used when appropriate, and that it can't make algorithmic decisions to replace scientists – which is the word that was trickling down 5 years ago. But Stephen walked off with ginormous sums of money, and he is off to his next infomercial for the Ronco gene analyzer software. Hell, I will bet he might even consider throwing in a Thighmaster for the first 50 people who call. Genomics clearly has its place in pharma, but it will likely take years to build the databases necessary to permit interpretation of data with confidence.


12. Cellbio on March 7, 2009 12:41 PM writes...

darwin, thanks for the laugh. I like the analogy you make between the Genomics salesmen and Ron Popeil of Ronco fame. My favorite is the pocket fisherman; never know when you might stumble upon a trout filled stream.

I started my training as a cloner and found out early, in the mid-80s, that expression of genes under a certain state in vitro was not very useful in informing biology, so I stayed away from the microarray boom, as I was skeptical. Certainly there is value, and things like the KRAS test for EGFR antibodies make sense and will be a growing trend in justifying drug use, and associated cost, in specific cases as opposed to all-comers.

AR, speaking openly about the variability in biology is not hand waving, but a truthful reflection of the complexity. I agree that simply getting more data doesn't solve the complexity, but certainly less data is not a productive route either.

If you are in the med-chem business, you either have experienced the complexity and variability of biology, or will when you start to collect clinical data. In fact, if one looks more broadly at the pharmacology of compounds within a series – that is, looks at a lot of biological impacts of hundreds to thousands of compounds within a series – the complexity of the SAR is clearly evident. Our beloved compound's pharmacology is not so easy to understand if we open up our analyses to reveal the complexity. All of us involved with a biology-rich screening approach at one company have yielded our theoretical approaches to rely on empirical observations. So, in short, the new technology, IMO, is best used to produce data that informs the next, rather traditional, empirical experiment, rather than to meet the great promise of the salesmen. As darwin above said, these approaches are a tool, to be "used when appropriate".


13. dnashave on March 17, 2009 3:02 PM writes...

derek, i recognize that you're a contrarian, but you're way off if you think sequencing cost has declined. human genome project: $10 billion. now:


14. Anonymous on March 17, 2009 3:14 PM writes...

oops, my previous comment was truncated, maybe because of the "less than" symbol in it. doh!

sequencing cost is way down, a genome that cost $10 billion now costs $10,000, that's a million-fold decrease in under 20 years. it's beyond me that anyone would think this wasn't a tremendous decrease in price.

the other huge factor, often overlooked, is that so many of these genomes will be available for free online, with extensive clinical information attached. we'll have thousands within the next few years and millions in the next fifteen.

that's a jaw-dropping decrease in cost of information.




