There's another paper out expressing worries about the interpretation of the ENCODE data. (For the last round, see here). The wave of such publications seems to be largely a function of how quickly the various authors could assemble their manuscripts, and how quickly the review process has worked at the various journals. You get the impression that a lot of people opened up new word processor windows and started typing furiously right after all the press releases last fall.
This one, from W. Ford Doolittle at Dalhousie, explicitly raises a thought experiment that I think has occurred to many critics of the ENCODE effort. (In fact, it's the very one that showed up in a comment here to the last post I did on the subject). Here's how it goes: The expensive, toxic, only-from-licensed-sushi-chefs pufferfish (Takifugu rubripes) has about 365 million base pairs, with famously little of it looking like junk. By contrast, the marbled lungfish (Protopterus aethiopicus) has a humungous genome, 133 billion base pairs, which is apparently enough to code for three hundred different puffer fish with room to spare. Needless to say, the lungfish sequence features vast stretches of apparent junk DNA. Or does it need saying? If an ENCODE-style effort had used the marbled lungfish instead of humans as its template, would it have told us that 80% of its genome was functional? If it had done the pufferfish simultaneously, what would it have said about the difference between the two?
I'm glad that the new PNAS paper lays this out, because to my mind, that's a damned good question. One ENCODE-friendly answer is that the marbled lungfish has been under evolutionary pressure that the fugu pufferfish hasn't, and that it needs many more regulatory elements, spacers, and so on. But that, while not impossible, seems to be assuming the conclusion a bit too much. We can't look at a genome, decide that whatever we see is good and useful just because it's there, and then work out what its function must then be. That seems a bit too Panglossian: all is for the best in the best of all possible genomes, and if a lungfish needs one three hundreds times larger than the fugu fish, well, it must be three hundred times harder to be a lungfish? Such a disparity between the genomes of two organisms, both of them (to a first approximation) running the "fish program", could also be explained by there being little evolutionary pressure against filling your DNA sequence with old phone books.
Here's an editorial at Nature about this new paper:
There is a valuable and genuine debate here. To define what, if anything, the billions of non-protein-coding base pairs in the human genome do, and how they affect cellular and system-level processes, remains an important, open and debatable question. Ironically, it is a question that the language of the current debate may detract from. As Ewan Birney, co-director of the ENCODE project, noted on his blog: “Hindsight is a cruel and wonderful thing, and probably we could have achieved the same thing without generating this unneeded, confusing discussion on what we meant and how we said it”
He's right - the ENCODE team could have presented their results differently, but doing that would not have made a gigantic splash in the world press. There wouldn't have been dozens of headlines proclaiming the "end of junk DNA" and the news that 80% of the genome is functional. "Scientists unload huge pile of genomic data analysis" doesn't have the same zing. And there wouldn't have been the response inside the industry that has, in fact, occurred. This comment from my first blog post on the subject is still very much worth keeping in mind:
With my science hat on I love this stuff, stepping into the unknown, finding stuff out. With my pragmatic, applied science, hard-nosed Drug Discovery hat on, I know that it is not going to deliver over the time frame of any investment we can afford to make, so we should stay away.
However, in my big Pharma, senior leaders are already jumping up and down, fighting over who is going to lead the new initiative in this exciting new area, who is going to set up a new group, get new resources, set up collaborations, get promoted etc. Oh, and deliver candidates within 3 years.
Our response to new basic science is dumb and we are failing our investors and patients. And we don't learn.