We chemists have always looked at the chemical machinery of living systems with a sense of awe. A billion years of ruthless pruning (work, or die) have left us with some bizarrely efficient molecular catalysts, the enzymes that casually make and break bonds with a grace and elegance that our own techniques have trouble even approaching. The systems around DNA replication are particularly interesting, since that's one of the parts you'd expect to be under the most selection pressure (every time a cell divides, things had better work).
But we're not content with just standing around envying the polymerase chain reaction and all the rest of the machinery. Over the years, we've tried to borrow whatever we can for our own purposes - these tools are so powerful that we can't resist finding ways to do organic chemistry with them. I've got a particular weakness for these sorts of ideas myself, and I keep a large folder of papers (electronic, these days) on the subject.
So I was interested to have a reader send along this work, which I'd missed when it came out on PLOSONE. It's from Pehr Harbury's group at Stanford, and it's in the DNA-linked-small-molecule category (which I've written about, in other cases, here and here). Here's a good look at the pluses and minuses of this idea:
However, with increasing library complexity, the task of identifying useful ligands (the ‘‘needles in the haystack’’) has become increasingly difficult. In favorable cases, a bulk selection for binding to a target can enrich a ligand from non-ligands by about 1000-fold. Given a starting library of 1010 to 1015 different compounds, an enriched ligand will be present at only 1 part in 107 to 1 part in 1012. Confidently detecting such rare molecules is hard, even with the application of next-generation sequencing techniques. The problem is exacerbated when biologically-relevant selections with fold-enrichments much smaller than 1000-fold are utilized.
Ideally, it would be possible to evolve small-molecule ligands out of DNA-linked chemical libraries in exactly the same way that biopolymer ligands are evolved from nucleic acid and protein libraries. In vitro evolution techniques overcome the ‘‘needle in the haystack’’ problem because they utilize multiple rounds of selection, reproductive amplification and library re-synthesis. Repetition provides unbounded fold-enrichments, even for inherently noisy selections. However, repetition also requires populations that can self-replicate.
That it does, and that's really the Holy Grail of evolution-linked organic synthesis - being able to harness the whole process. In this sort of system, we're talking about using the DNA itself as a physical prod for chemical reactivity. That's also been a hot field, and I've written about some examples from the Liu lab at Harvard here, here, and here. But in this case, the DNA chemistry is being done with all the other enzymatic machinery in place:
The DNA brings an incipient small molecule and suitable chemical building blocks into physical proximity and induces covalent bond formation between them. In so doing, the naked DNA functions as a gene: it orchestrates the assembly of a corresponding small molecule gene product. DNA genes that program highly fit small molecules can be enriched by selection, replicated by PCR, and then re-translated into DNA-linked chemical progeny. Whereas the Lerner-Brenner style DNA-linked small-molecule libraries are sterile and can only be subjected to selective pressure over one generation, DNA-programmed libraries produce many generations of offspring suitable for breeding.
The scheme below shows how this looks. You take a wide variety of DNA sequences, and have them each attached to some small-molecule handle (like a primary amine). You then partition these out into groups by using resins that are derivatized with oligonucleotide sequences, and you plate these out into 384-well format. While the DNA end is stuck to the resin, you do chemistry on the amine end (and the resin attachment lets you get away with stuff that would normally not work if the whole DNA-attached thing had to be in solution). You put a different reacting partner in each of the 384 wells, just like in the good ol' combichem split/pool days, just with DNA as the physical separation mechanism.
In this case, the group used 240-base-pair DNA sequences, two hundred seventeen billion of them. That sentence is where you really step off the edge into molecular biology, because without its tools, generating that many different species, efficiently and in usable form, is pretty much out of the question with current technology. That's five different coding sequences, in their scheme, with 384 different ones in each of the first four (designated A through D), and ten in the last one, E. How diverse was this, really? Get ready for more molecular biology tools:
We determined the sequence of 4.6 million distinct genes from the assembled library to characterize how well it covered ‘‘genetic space’’. Ninety-seven percent of the gene sequences occurred only once (the mean sequence count was 1.03), and the most abundant gene sequence occurred one hundred times. Every possible codon was observed at each coding position. Codon usage, however, deviated significantly from an expectation of random sampling with equal probability. The codon usage histograms followed a log-normal distribution, with one standard deviation in log- likelihood corresponding to two-to-three fold differences in codon frequency. Importantly, no correlation existed between codon identities at any pair of coding positions. Thus, the likelihood of any particular gene sequence can be well approxi- mated by the product of the likelihoods of its constituent codons. Based on this approximation, 36% of all possible genes would be present at 100 copies or more in a 10 picomole aliquot of library material, 78% of the genes would be present at 10 copies or more, and 4% of the genes would be absent. A typical selection experiment (10 picomoles of starting material) would thus sample most of the attainable diversity.
The group had done something similar before with 80-codon DNA sequences, but this system has 1546, which is a different beast. But it seems to work pretty well. Control experiments showed that the hybridization specificity remained high, and that the micro/meso fluidic platform being used could return products with high yield. A test run also gave them confidence in the system: they set up a run with all the codons except one specific dropout (C37), and also prepared a "short gene", containing the C37 codon, but lacking the whole D area (200 base pairs instead of 240). When they mixed that in with the drop-out library (in a ratio of 1 to 384), and split that out onto a C-codon-attaching array of beads. They then did the chemical step, attaching one peptoid piece onto all of them except the C37 binding well - that one got biotin hydrazide instead. Running the lot of them past streptavidin took the ratio of the C37-containing ones from 1:384 to something over 35:1, an enhancement of at least 13,000-fold. (Subcloning and sequencing of 20 isolates showed they all had the C37 short gene in them, as you'd expect).
They then set up a three-step coupling of peptoid building blocks on a specific codon sequence, and this returned very good yields and specificities. (They used a fluorescein-tagged gene and digested the product with PDE1 before analyzing them at each step, which ate the DNA tags off of them to facilitate detection). The door, then, would now seem to be open:
Exploration of large chemical spaces for molecules with novel and desired activities will continue to be a useful approach in academic studies and pharmaceutical investigations. Towards this end, DNA-programmed combinatorial chemistry facilitates a more rapid and efficient search process over a larger chemical space than does conventional high-throughput screening. However, for DNA-programmed combinatorial chemistry to be widely adopted, a high-fidelity, robust and general translation system must be available. This paper demonstrates a solution to that challenge.
The parallel chemical translation process described above is flexible. The devices and procedures are modular and can be used to divide a degenerate DNA population into a number of distinct sub-pools ranging from 1 to 384 at each step. This coding capacity opens the door for a wealth of chemical options and for the inclusion of diversity elements with widely varying size, hydrophobicity, charge, rigidity, aromaticity, and heteroatom content, allowing the search for ligands in a ‘‘hypothesis-free’’ fashion. Alternatively, the capacity can be used to elaborate a variety of subtle changes to a known compound and exhaustively probe structure-activity relationships. In this case, some elements in a synthetic scheme can be diversified while others are conserved (for example, chemical elements known to have a particular structural or electrostatic constraint, modular chemical fragments that independently bind to a protein target, metal chelating functional groups, fluorophores). By facilitating the synthesis and testing of varied chemical collections, the tools and methods reported here should accelerate the application of ‘‘designer’’ small molecules to problems in basic science, industrial chemistry and medicine.
Anyone want to step through? If GSK is getting some of their DNA-coded screening to work (or at least telling us about the examples that did?), could this be a useful platform as well? Thoughts welcome in the comments.