You'll have heard about the massive data wave that hit (30 papers!) courtesy of the ENCODE project. That stands for Encyclopedia of DNA Elements, and it's been a multiyear effort to go beyond the bare sequence of human DNA and look for functional elements. We already know that only around 1% of the human sequence is made up of what we can recognize as real, traditional genes: stretches that code for proteins, have start and stop codons, and so on. And it's not like that's so straightforward, either, what with all the introns and whatnot. But that leaves an awful lot of DNA that's traditionally been known by the disparaging name of "junk", and surely it can't just be that - can it?
Some of it does its best to make you think that way, for sure. Transposable elements like Alu sequences, which are repeated relentlessly hundreds of thousands of times throughout the human DNA sequence, must be junk, inert spacer, or so wildly important that we just can't have too many copies of them. But DNA is three-dimensional (and how), and its winding and unwinding is crucial to gene expression. Surely a good amount of that apparently useless stuff is involved in these processes and other epigenetic phenomena.
And the ENCODE group has indeed discovered a lot of this sort of thing. But as this excellent overview from Brendan Maher at Nature shows, it hasn't discovered quite as much as the headlines might lead you to think. (And neither has it demolished the idea that all of that 99% of noncoding DNA is junk, because you can't find anyone who believed that one in the first place.) The figure that's in all the press writeups is that this work has assigned functions to 80% of the human genome, which would be an astonishing figure on several levels. For one thing, it would mean that we'd certainly missed an awful lot before, and for another, it would mean that the genome is a heck of a lot more information-rich than we ever thought it might be.
But neither of those quite seems to be the case. It all depends on what you mean by "functional", and opinions most definitely vary. See this post by Ed Yong for some of the categories, which range out to some pretty broad, inclusive definitions of "function". A better estimate is that maybe 20% of the genome can directly influence gene expression, which is very interesting and useful, but ain't no 80%, either. That Nature overview provides a clear summary of the arguments about these figures.
But even that more-solid 20% figure is going to keep us all busy for a long time. Learning how to affect these gene transcription mechanisms should be a very important route to new therapies. If you remember all the hype about how the genome was going to unlock cures to everything - well, this is the level we're actually going to have to work at to make anything in that line come true. There's a lot of work to be done, though. Somehow, different genes are expressed at different times, in different people, in response to a huge variety of environmental cues. It's quite a tangle, but in theory, it's a tangle that can be unraveled, and as that happens, it's going to provide a lot of potential targets for therapy. Not easy targets, mind you - those are probably gone - but targets nonetheless.
One of the best ways to get a handle on all this work is this very interesting literature experiment at Nature - a portal into the ENCODE project data, organized thematically, and with access to all the papers involved across the different journals. If you're interested in epigenetics at all, this is a fine place to read up on the results of this work. And if you're not, it's still worth exploring to see how the scientific literature might be presented and curated. This approach, it seems to me, potentially adds a great deal of value. Eventually, the PDF-driven looks-like-a-page approach to the literature will go extinct, and something else will replace it. Some of it might look a bit like this.
Note, just for housekeeping purposes - I wrote this post for last Friday, but only realized today that it didn't publish, thus the lack of an entry that day. So here it is, better late, I hope, than never. There's more to say about epigenetics, too, naturally. . .