About this Author

Derek Lowe, an Arkansan by birth, got his BA from Hendrix College and his PhD in organic chemistry from Duke before spending time in Germany as a post-doc on a Humboldt Fellowship. He's worked for several major pharmaceutical companies since 1989 on drug discovery projects against schizophrenia, Alzheimer's, diabetes, osteoporosis and other diseases. To contact Derek, email him directly. Twitter: Dereklowe


In the Pipeline


March 15, 2013

More ENCODE Skepticism


Posted by Derek

There's another paper out expressing worries about the interpretation of the ENCODE data. (For the last round, see here). The wave of such publications seems to be largely a function of how quickly the various authors could assemble their manuscripts, and how quickly the review process has worked at the various journals. You get the impression that a lot of people opened up new word processor windows and started typing furiously right after all the press releases last fall.

This one, from W. Ford Doolittle at Dalhousie, explicitly raises a thought experiment that I think has occurred to many critics of the ENCODE effort. (In fact, it's the very one that showed up in a comment here to the last post I did on the subject). Here's how it goes: The expensive, toxic, only-from-licensed-sushi-chefs pufferfish (Takifugu rubripes) has about 365 million base pairs, with famously little of it looking like junk. By contrast, the marbled lungfish (Protopterus aethiopicus) has a humongous genome, 133 billion base pairs, which is apparently enough to code for three hundred different pufferfish with room to spare. Needless to say, the lungfish sequence features vast stretches of apparent junk DNA. Or does it need saying? If an ENCODE-style effort had used the marbled lungfish instead of humans as its template, would it have told us that 80% of its genome was functional? If it had done the pufferfish simultaneously, what would it have said about the difference between the two?
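The arithmetic behind that comparison is easy to check against the figures quoted above (365 million base pairs for fugu, 133 billion for the marbled lungfish):

```python
# Back-of-the-envelope check of the genome sizes quoted above.
fugu_bp = 365_000_000          # Takifugu rubripes, ~365 million base pairs
lungfish_bp = 133_000_000_000  # Protopterus aethiopicus, ~133 billion base pairs

ratio = lungfish_bp / fugu_bp
print(f"One lungfish genome holds roughly {ratio:.0f} fugu genomes")  # ~364
```

Which does indeed leave "three hundred different pufferfish with room to spare."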

I'm glad that the new PNAS paper lays this out, because to my mind, that's a damned good question. One ENCODE-friendly answer is that the marbled lungfish has been under evolutionary pressure that the fugu pufferfish hasn't, and that it needs many more regulatory elements, spacers, and so on. But that, while not impossible, seems to be assuming the conclusion a bit too much. We can't look at a genome, decide that whatever we see is good and useful just because it's there, and then work out what its function must then be. That seems a bit too Panglossian: all is for the best in the best of all possible genomes, and if a lungfish needs one three hundred times larger than the fugu's, well, it must be three hundred times harder to be a lungfish? Such a disparity between the genomes of two organisms, both of them (to a first approximation) running the "fish program", could also be explained by there being little evolutionary pressure against filling your DNA sequence with old phone books.

Here's an editorial at Nature about this new paper:

There is a valuable and genuine debate here. To define what, if anything, the billions of non-protein-coding base pairs in the human genome do, and how they affect cellular and system-level processes, remains an important, open and debatable question. Ironically, it is a question that the language of the current debate may detract from. As Ewan Birney, co-director of the ENCODE project, noted on his blog: “Hindsight is a cruel and wonderful thing, and probably we could have achieved the same thing without generating this unneeded, confusing discussion on what we meant and how we said it”

He's right - the ENCODE team could have presented their results differently, but doing that would not have made a gigantic splash in the world press. There wouldn't have been dozens of headlines proclaiming the "end of junk DNA" and the news that 80% of the genome is functional. "Scientists unload huge pile of genomic data analysis" doesn't have the same zing. And there wouldn't have been the response inside the industry that has, in fact, occurred. This comment from my first blog post on the subject is still very much worth keeping in mind:

With my science hat on I love this stuff, stepping into the unknown, finding stuff out. With my pragmatic, applied science, hard-nosed Drug Discovery hat on, I know that it is not going to deliver over the time frame of any investment we can afford to make, so we should stay away.

However, in my big Pharma, senior leaders are already jumping up and down, fighting over who is going to lead the new initiative in this exciting new area, who is going to set up a new group, get new resources, set up collaborations, get promoted etc. Oh, and deliver candidates within 3 years.

Our response to new basic science is dumb and we are failing our investors and patients. And we don't learn.

Comments (16) + TrackBacks (0) | Category: Biological News


1. Anonymous on March 15, 2013 7:20 AM writes...

What does the energy needed to replicate all those lungfish base pairs mean to the lungfish in practical terms? If its genome were 100x smaller, would the lungfish need to eat less to maintain normal cell turnover? 1% less? 10% less? I wonder how much pressure there is to maintain a small genome in simple terms.

Permalink to Comment

2. Anonymous BMS Researcher on March 15, 2013 7:24 AM writes...

I think small viral genomes show us what happens when there really is selection pressure on genome size: they have things like overlapping reading frames, funky codon slippage that makes alternate translations, etc.

Such genomes remind me of what programmers did back in the days when RAM came in kilobytes and people cared about conserving it. For instance, Bill Gates changed the prompt of his BASIC from "READY" to "OK" to save three bytes.

Eukaryote genomes remind me of current software, when RAM comes in gigabytes and nobody cares much about conserving it.

Permalink to Comment

3. Morten G on March 15, 2013 9:13 AM writes...

I heartily recommend The Logic of Chance: The Nature and Origin of Biological Evolution by Eugene Koonin. Even if you took some genetics courses 10 years ago so much has happened since that it really makes sense to read his book (it's from 2011).
He nicely explains the current understanding of genome expansion, regulatory RNAs etc.
Bit expensive so buy it through work - it's relevant. Even if it isn't written as a text book.

Permalink to Comment

4. MDA Student on March 15, 2013 9:33 AM writes...

As a non-evobiologist I ask, has there been any work on the idea that there are times where it is "non functional" in terms of protein expression, but functional in terms of a macro-scale structure? Maybe the "extra" exposes or protects other sequences, kind of like a pseudo histone? I imagine a house on cinder blocks. If you remove the cinder blocks it still looks and functions like a house. But if a specific type of event occurs (flooding), having those cinder blocks makes all the difference.

I took Dr. Graur's class some time ago as an undergrad. If you happen to be reading this, keep rocking those crazy glasses chains.

Permalink to Comment

5. David M. on March 15, 2013 9:55 AM writes...

One simple answer is that the lungfish and its ancestors have been overall pretty terrible at meiosis. Oops, my chromosomes didn't separate toward opposite poles.

Another interesting thing would be to look at the velocity AND fidelity of each fish's polymerase enzymes, especially the lungfish. If >80% of your genome isn't useful at all, your polymerase can blaze through it while sacrificing some fidelity.

The mutations introduced by a less faithful polymerase will mostly end up in "junk" regions by sheer probability, but those that are created in coding or regulatory regions will then have the opportunity to be selected for under pressure. If they are bad, they are removed, neutral they may be lost or kept, or beneficial, selected for. The real question is with that massive a genome, what might having a less faithful polymerase do in the context of natural selection?

OR, does the lungfish need to have massive amounts of polymerase available at all times just to replicate its DNA, and this somehow creates another advantage, such as "error checking"?

Permalink to Comment

6. Bob on March 15, 2013 11:02 AM writes...

Off topic (apologies) but I hear rumours of white tents being erected at all AZ R&D sites for Monday meeting, all staff. Oh dear.

Permalink to Comment

7. Anonymous on March 15, 2013 1:17 PM writes...

He's right - the ENCODE team could have presented their results differently, but doing that would not have made a gigantic splash in the world press.

Does anything more really need to be said?

Permalink to Comment

8. Anonymous on March 15, 2013 1:19 PM writes...

If an ENCODE-style effort had used the marbled lungfish instead of humans as its template, would it have told us that 80% of its genome was functional?

The rebuttal (which I don't necessarily wholeheartedly endorse) is that almost 100% of the components of a Rube Goldberg machine ("Heath Robinson machine" for UK-ers) are functional and necessary - if you remove any one of them the machine no longer works. That doesn't mean, however, that you couldn't do the same thing as well or better with a machine that had fewer components.

So it could very well be that the marbled lungfish has an extremely convoluted gene expression system where epicycles on epicyles are needed in order to get normal gene function, whereas the puffer fish cut out the bell-bird-toy car-toaster circuit and replaced it with a ten-cent relay from Radio Shack. (Meaning that it's the *puffer* that was subject to evolutionary pressure that the lungfish hasn't been.)

Permalink to Comment

9. ptm on March 15, 2013 2:18 PM writes...

There can only be one answer to this puzzle. Lungfish are in fact the most advanced animals on this planet. They have tens of different advanced and specialized forms, the fish form only serves as a means of concealment so as not to draw our unwanted barbaric attention to their refined pursuits.

Permalink to Comment

10. Sam on March 15, 2013 2:27 PM writes...

Interesting question!

@David M, your logic is flawed... junk DNA is typically interspersed and actually recombines more with itself, so it can either be lost or expanded during meiosis...

I don't think anyone has looked at polymerase fidelity with respect to junk DNA content, but that's another interesting question. Also, do they have more/better DNA repair enzymes?

FWIW, trees have much bigger, polyploid genomes than vertebrates, and a lot more junk DNA. Which might not make sense in terms of replication efficiency, if that was all that mattered. But DNA isn't just there to be replicated.

I like the cinder block analogy from MDA Student, and the Rube Goldberg one from Anonymous is also a smart one.

Permalink to Comment

11. Johannes on March 15, 2013 2:48 PM writes...

Non-coding, non-regulatory DNA: More resistance to mutations (less % chance to hit a functional NT)

Permalink to Comment

12. MH on March 15, 2013 7:16 PM writes...

It strikes me that some of the comments here run the risk of making the same mistake the ENCODE team did. If it is a stretch to say that 80% of DNA is functional it is equally a stretch to claim that there is little information in the non-coding part. Non-coding RNAs are clearly important (think ribosomes, spliceosomes and miRNA) and epigenetic marks are also likely to be relevant on occasion. The trick is figuring out what part of the metagenome is meaningful and a global accounting of this is surely a logical place to start. How to leverage this in drug discovery is a challenge and care is clearly warranted in the near term (especially given the modest utility of exploratory transcriptome and proteomics analyses.)

Permalink to Comment

13. metaphysician on March 16, 2013 9:15 AM writes...

Here's a postulate, mostly uninformed: the evolutionary cost of greater DNA copying difficulty, and the evolutionary benefit of greater resistance to mutation, are roughly equal. If so, any change in overall genome length would be preserved, since the length wouldn't actually affect the fitness of the species. Thus, based on chance, you'd get some with really big genomes, and some with really small genomes (out to certain limits, presumably).

Permalink to Comment

14. Roger Shrubber on March 17, 2013 8:53 PM writes...

"What is the energy cost?" of all that extra DNA?

The metabolic cost of DNA replication and transcription is well below 1% of a eukaryotic cell's energy budget. Basically, it's lost in the noise. And sequence specificity has to be understood as a broad continuum. Non-functional yet highly specific binding is expected. Non-functional yet tissue-specific transcription is expected. And anyone trained in the fundamentals of biochemistry and enzymology should know this.

Permalink to Comment

15. MIMD on March 17, 2013 10:18 PM writes...

This might be of interest:

"Junk No More" - Yale Medicine

Permalink to Comment

16. Brian Krueger, PhD on March 20, 2013 9:17 AM writes...

That evolutionary biologists have gotten stuck on one somewhat misguided sound bite is very disappointing. There is so much valuable information in the ENCODE dataset. Adding ENCODE tracks to my own ChIP-Seq datasets has been invaluable for hypothesis generation. Other researchers are using those tracks to determine how non-coding regions of protein-bound DNA contribute to disease. Why are evolutionary biologists arguing about the semantics of "what is functional" and smearing the entire project in the process? Many of us find value in the 99% of the ENCODE project that doesn't relate to this "tempest in a teapot" argument.

Permalink to Comment


