Thanks to an alert reader, I was put on to this paper in PNAS. It's from a team at Washington U. in St. Louis, and my fellow Cardinals fans are definitely stirring things up in the debate over "junk DNA" function and the ENCODE results. (The most recent post here on the debate covered the "It's functional" point of view - for links to previous posts on some vigorous ENCODE-bashing publications, see here).
This new paper, blogged about here at Homologus and here by one of its authors, Mike White, is an attempt to run a null-hypothesis experiment on transcription factor function. There are a lot of transcription factor recognition sequences in the genome. They're short DNA sequences that serve as flags for the whole transcription machinery to land and start assembling at a particular spot. Transcription factors themselves are the proteins that do the primary recognition of these sequences. With so many DNA motifs out there (and so many near-misses), some of their apparent targets are important and real, and some of them may well be noise. TFs have their work cut out.
What this new paper did was look at a particular transcription factor, Crx. They took a set of roughly 1,300 sequences that are (functionally) known to bind it - 865 of them with the canonical recognition motifs and 433 of them that are known to bind, but don't have the traditional motif. They compared that set to 3,000 control sequences, including 865 of them "specifically chosen to match the Crx motif content and chromosomal distribution" as compared to that first set. They also included a set of single-point mutations of the known binding sequences, along with sets of scrambled versions of both the known binding regions and the matched controls above, with dinucleotide ratios held constant - random but similar.
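For anyone wondering what "scrambled, with dinucleotide ratios held constant" could look like in practice, here's a minimal sketch. To be clear, this is not the authors' procedure (the post doesn't say how they did it), and the function names `dinucleotide_counts` and `scramble_preserving_dinucleotides` are made up for illustration. It uses a simple segment-swap trick: if the sequence reads ...a X b ... a Z b..., then exchanging X and Z leaves every overlapping two-letter count unchanged. It is not a uniform sampler over all such scrambles, just a way to see the idea.

```python
import random
from collections import Counter


def dinucleotide_counts(seq):
    """Count overlapping 2-mers (dinucleotides) in a sequence."""
    return Counter(seq[i:i + 2] for i in range(len(seq) - 1))


def scramble_preserving_dinucleotides(seq, n_attempts=1000, rng=None):
    """Scramble seq while keeping its exact dinucleotide counts.

    Illustrative sketch only (an assumption, not the paper's method).
    Repeatedly picks positions i < j < k < l; when seq[i] == seq[k] and
    seq[j] == seq[l], the segments strictly between (i, j) and (k, l)
    are exchanged, which preserves every overlapping 2-mer count.
    """
    rng = rng or random.Random()
    s = seq
    n = len(s)
    if n < 4:
        return s  # too short for a four-position swap
    for _ in range(n_attempts):
        i, j, k, l = sorted(rng.sample(range(n), 4))
        if s[i] == s[k] and s[j] == s[l]:
            # a X b Y' a Z b  ->  a Z b Y' a X b
            s = s[:i + 1] + s[k + 1:l] + s[j:k + 1] + s[i + 1:j] + s[l:]
    return s


if __name__ == "__main__":
    original = "ACGTACCTGGATCCAGTTACGATCGA" * 2  # made-up sequence
    scrambled = scramble_preserving_dinucleotides(original, rng=random.Random(0))
    print(original)
    print(scrambled)
    print(dinucleotide_counts(original) == dinucleotide_counts(scrambled))
```

The point of holding the dinucleotide content fixed is that the scrambled controls differ from the real sequences in arrangement, not in gross composition, so a difference in activity can't be blamed on something like GC content alone.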
What they found, first, was that the known binding elements do indeed drive transcription, as advertised, while the controls don't. But the ENCODE camp has a broader definition of function than just this, and here's where the dinucleotides hit the fan. When they looked at gene repression activity, they found that the 865 binders and the 865 matched controls (with Crx recognition elements, but in unbound regions of the genome) both showed similar amounts of activity. As the paper says, "Overall, our results show that both bound and unbound Crx motifs, removed from their genomic context, can produce repression, whereas only bound regions can strongly activate".
So far, so good, and nothing that the ENCODE people would disagree with - I mean, there you are, unbound regions of the genome showing functional behavior and all. But the problem is, most of the 1,300 scrambled (that is, random) sequences also showed regulatory effects:
"Our results demonstrate the importance of comparing the activity of candidate CREs [cis-regulatory elements - DBL] against distributions of control sequences, as well as the value of using multiple approaches to assess the function of CREs. Although scrambled DNA elements are unlikely to drive very strong levels of activation or repression, such sequences can produce distinct levels of enhancer activity within an intermediate range that overlaps with the activity of many functional sequences. Thus, function cannot be assessed solely by applying a threshold level of activity; additional approaches to characterize function are necessary, such as mutagenesis of TF binding sites."
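To make the threshold-versus-distribution point concrete, here's a small sketch. Everything in it is an assumption for illustration: the activity numbers are invented, and `passes_threshold` and `empirical_p_value` are hypothetical helpers, not anything from the paper. The idea is just that a candidate can clear a fixed activity cutoff and still sit comfortably inside the range that scrambled controls produce.

```python
def passes_threshold(activity, cutoff):
    """Naive call: 'functional' if activity meets a fixed cutoff."""
    return activity >= cutoff


def empirical_p_value(candidate, controls):
    """Fraction of controls at least as active as the candidate.

    Uses the common add-one correction so the estimate is never
    exactly zero with a finite number of controls.
    """
    n_at_least = sum(1 for c in controls if c >= candidate)
    return (n_at_least + 1) / (len(controls) + 1)


if __name__ == "__main__":
    # Invented reporter activities (arbitrary units) for scrambled controls.
    scrambled_controls = [0.8, 1.1, 1.4, 1.6, 1.9, 2.2, 2.4, 0.9, 1.7, 2.0]
    candidate_activity = 1.8  # invented value for a candidate element
    cutoff = 1.5              # invented fixed threshold

    print("passes threshold:", passes_threshold(candidate_activity, cutoff))
    print("empirical p vs. scrambled controls:",
          empirical_p_value(candidate_activity, scrambled_controls))
```

With numbers like these, the candidate passes the cutoff, but a good chunk of the scrambled controls are at least as active, which is the situation the quoted passage is warning about.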
In other words, to put it more bluntly than the paper does, one could generate ENCODE-like levels of functionality with nothing but random DNA. These results will not calm anyone down, but it's not time to calm down just yet. There are some important issues to be decided here - from theoretical biology all the way down to how many drug targets we can expect to have. I look forward to the responses to this work, which will most definitely be forthcoming.