About this Author

Derek Lowe, an Arkansan by birth, got his BA from Hendrix College and his PhD in organic chemistry from Duke before spending time in Germany on a Humboldt Fellowship during his post-doc. He's worked for several major pharmaceutical companies since 1989 on drug discovery projects against schizophrenia, Alzheimer's, diabetes, osteoporosis and other diseases. To contact Derek, email him directly or find him on Twitter: Dereklowe


In the Pipeline


August 23, 2013

Chemistry On The End of DNA

Posted by Derek

We chemists have always looked at the chemical machinery of living systems with a sense of awe. A billion years of ruthless pruning (work, or die) have left us with some bizarrely efficient molecular catalysts, the enzymes that casually make and break bonds with a grace and elegance that our own techniques have trouble even approaching. The systems around DNA replication are particularly interesting, since that's one of the parts you'd expect to be under the most selection pressure (every time a cell divides, things had better work).

But we're not content with just standing around envying the polymerase chain reaction and all the rest of the machinery. Over the years, we've tried to borrow whatever we can for our own purposes - these tools are so powerful that we can't resist finding ways to do organic chemistry with them. I've got a particular weakness for these sorts of ideas myself, and I keep a large folder of papers (electronic, these days) on the subject.

So I was interested to have a reader send along this work, which I'd missed when it came out in PLOS ONE. It's from Pehr Harbury's group at Stanford, and it's in the DNA-linked-small-molecule category (which I've written about, in other cases, here and here). Here's a good look at the pluses and minuses of this idea:

However, with increasing library complexity, the task of identifying useful ligands (the "needles in the haystack") has become increasingly difficult. In favorable cases, a bulk selection for binding to a target can enrich a ligand from non-ligands by about 1000-fold. Given a starting library of 10^10 to 10^15 different compounds, an enriched ligand will be present at only 1 part in 10^7 to 1 part in 10^12. Confidently detecting such rare molecules is hard, even with the application of next-generation sequencing techniques. The problem is exacerbated when biologically-relevant selections with fold-enrichments much smaller than 1000-fold are utilized.

Ideally, it would be possible to evolve small-molecule ligands out of DNA-linked chemical libraries in exactly the same way that biopolymer ligands are evolved from nucleic acid and protein libraries. In vitro evolution techniques overcome the "needle in the haystack" problem because they utilize multiple rounds of selection, reproductive amplification and library re-synthesis. Repetition provides unbounded fold-enrichments, even for inherently noisy selections. However, repetition also requires populations that can self-replicate.
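The arithmetic in that passage is worth checking for yourself. A minimal sketch, using only the numbers quoted above (the 1000-fold single-round enrichment and the stated library sizes):

```python
# Post-selection abundance of a true ligand after one bulk selection,
# using the 1000-fold enrichment figure from the quoted passage.
def post_selection_frequency(library_size, fold_enrichment=1000):
    """Frequency of one enriched ligand among library_size compounds."""
    return fold_enrichment / library_size

for n in (1e10, 1e15):
    f = post_selection_frequency(n)
    print(f"library of {n:.0e}: enriched ligand at 1 part in {1/f:.0e}")
```

This reproduces the 1-part-in-10^7 to 1-part-in-10^12 figures, which is why a single selection round leaves you hunting for very rare sequences.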

That it does, and that's really the Holy Grail of evolution-linked organic synthesis - being able to harness the whole process. In this sort of system, we're talking about using the DNA itself as a physical prod for chemical reactivity. That's also been a hot field, and I've written about some examples from the Liu lab at Harvard here, here, and here. But in this case, the DNA chemistry is being done with all the other enzymatic machinery in place:

The DNA brings an incipient small molecule and suitable chemical building blocks into physical proximity and induces covalent bond formation between them. In so doing, the naked DNA functions as a gene: it orchestrates the assembly of a corresponding small molecule gene product. DNA genes that program highly fit small molecules can be enriched by selection, replicated by PCR, and then re-translated into DNA-linked chemical progeny. Whereas the Lerner-Brenner style DNA-linked small-molecule libraries are sterile and can only be subjected to selective pressure over one generation, DNA-programmed libraries produce many generations of offspring suitable for breeding.

The scheme below shows how this looks. You take a wide variety of DNA sequences, and have them each attached to some small-molecule handle (like a primary amine). You then partition these out into groups by using resins that are derivatized with oligonucleotide sequences, and you plate these out into 384-well format. While the DNA end is stuck to the resin, you do chemistry on the amine end (and the resin attachment lets you get away with stuff that would normally not work if the whole DNA-attached thing had to be in solution). You put a different reacting partner in each of the 384 wells, just like in the good ol' combichem split/pool days, just with DNA as the physical separation mechanism.
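As a toy model of that split-and-couple step (the codon names and building blocks here are invented for illustration; they are not the paper's actual sequences or reagents), the routing-by-hybridization logic amounts to partitioning a library by the codon at one position and then appending a well-specific building block:

```python
# Toy model of DNA-programmed split/pool: the codon at the current
# coding position routes each DNA-tagged molecule to a well, where a
# distinct building block is coupled to its small-molecule end.
from collections import defaultdict

def split_by_codon(library, position):
    """Partition DNA-tagged molecules into wells keyed by the codon
    at the given coding position (mimics hybridization capture)."""
    wells = defaultdict(list)
    for gene, molecule in library:
        wells[gene[position]].append((gene, molecule))
    return wells

def couple(wells, building_blocks):
    """React the small-molecule end in each well with that well's reagent."""
    return [(gene, molecule + (building_blocks[codon],))
            for codon, members in wells.items()
            for gene, molecule in members]

# Two-codon genes, two wells, a primary-amine starting handle.
library = [(("A1", "B2"), ("amine",)), (("A2", "B2"), ("amine",))]
wells = split_by_codon(library, 0)
library = couple(wells, {"A1": "block-17", "A2": "block-203"})
print(library)
```

The real system does this with 384 oligonucleotide-derivatized resins per coding position, but the bookkeeping is exactly this: the gene decides which chemistry its molecule sees.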

In this case, the group used 240-base-pair DNA sequences, two hundred seventeen billion of them. That sentence is where you really step off the edge into molecular biology, because without its tools, generating that many different species, efficiently and in usable form, is pretty much out of the question with current technology. That's five different coding sequences, in their scheme, with 384 different ones in each of the first four (designated A through D), and ten in the last one, E. How diverse was this, really? Get ready for more molecular biology tools:

We determined the sequence of 4.6 million distinct genes from the assembled library to characterize how well it covered "genetic space". Ninety-seven percent of the gene sequences occurred only once (the mean sequence count was 1.03), and the most abundant gene sequence occurred one hundred times. Every possible codon was observed at each coding position. Codon usage, however, deviated significantly from an expectation of random sampling with equal probability. The codon usage histograms followed a log-normal distribution, with one standard deviation in log-likelihood corresponding to two-to-three fold differences in codon frequency. Importantly, no correlation existed between codon identities at any pair of coding positions. Thus, the likelihood of any particular gene sequence can be well approximated by the product of the likelihoods of its constituent codons. Based on this approximation, 36% of all possible genes would be present at 100 copies or more in a 10 picomole aliquot of library material, 78% of the genes would be present at 10 copies or more, and 4% of the genes would be absent. A typical selection experiment (10 picomoles of starting material) would thus sample most of the attainable diversity.
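The headline numbers here are easy to sanity-check. A quick sketch, using the coding scheme described above (four positions with 384 codons and one with 10) and the 10-picomole aliquot from the quote:

```python
import math

AVOGADRO = 6.022e23

# Coding scheme from the post: positions A-D with 384 codons each,
# plus position E with 10, giving ~2.17e11 possible genes.
possible_genes = 384**4 * 10

# Molecules in a 10-picomole selection aliquot.
molecules = 10e-12 * AVOGADRO

mean_copies = molecules / possible_genes
print(f"{possible_genes:.2e} possible genes, "
      f"{mean_copies:.1f} mean copies per gene in 10 pmol")

# Under uniform Poisson sampling the absent fraction would be e^-mean,
# i.e. essentially zero; the paper's observed log-normal codon bias
# fattens both tails, which is why ~4% of genes are expected missing.
print(f"absent fraction if sampling were uniform: {math.exp(-mean_copies):.1e}")
```

So "most of the attainable diversity" in 10 pmol is plausible on average copy number alone; the 4%-absent figure comes from the codon-usage bias, not from the raw counting.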

The group had done something similar before with 80-codon DNA sequences, but this system has 1546, which is a different beast. But it seems to work pretty well. Control experiments showed that the hybridization specificity remained high, and that the micro/meso fluidic platform being used could return products with high yield. A test run also gave them confidence in the system: they set up a run with all the codons except one specific dropout (C37), and also prepared a "short gene", containing the C37 codon but lacking the whole D area (200 base pairs instead of 240). They mixed that in with the drop-out library (at a ratio of 1 to 384) and split the mixture out onto a C-codon-attaching array of beads. They then did the chemical step, attaching one peptoid piece onto all of them except the C37 binding well - that one got biotin hydrazide instead. Running the lot of them past streptavidin took the ratio of the C37-containing ones from 1:384 to something over 35:1, an enhancement of at least 13,000-fold. (Subcloning and sequencing of 20 isolates showed they all had the C37 short gene in them, as you'd expect.)
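That fold-enrichment figure is just a change in odds, and it checks out directly:

```python
def fold_enrichment(ratio_before, ratio_after):
    """Enrichment of a target species, expressed as the change in its
    odds relative to the rest of the population."""
    return ratio_after / ratio_before

# C37 short gene: spiked in at 1:384, recovered at better than 35:1
# after the streptavidin capture step.
print(fold_enrichment(1 / 384, 35 / 1))
```

35 x 384 = 13,440, which matches the "at least 13,000-fold" figure in the text.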

They then set up a three-step coupling of peptoid building blocks on a specific codon sequence, and this returned very good yields and specificities. (They used a fluorescein-tagged gene and digested the products with PDE1 before analyzing them at each step, which stripped the DNA tags off to facilitate detection.) The door, then, would now seem to be open:

Exploration of large chemical spaces for molecules with novel and desired activities will continue to be a useful approach in academic studies and pharmaceutical investigations. Towards this end, DNA-programmed combinatorial chemistry facilitates a more rapid and efficient search process over a larger chemical space than does conventional high-throughput screening. However, for DNA-programmed combinatorial chemistry to be widely adopted, a high-fidelity, robust and general translation system must be available. This paper demonstrates a solution to that challenge.

The parallel chemical translation process described above is flexible. The devices and procedures are modular and can be used to divide a degenerate DNA population into a number of distinct sub-pools ranging from 1 to 384 at each step. This coding capacity opens the door for a wealth of chemical options and for the inclusion of diversity elements with widely varying size, hydrophobicity, charge, rigidity, aromaticity, and heteroatom content, allowing the search for ligands in a "hypothesis-free" fashion. Alternatively, the capacity can be used to elaborate a variety of subtle changes to a known compound and exhaustively probe structure-activity relationships. In this case, some elements in a synthetic scheme can be diversified while others are conserved (for example, chemical elements known to have a particular structural or electrostatic constraint, modular chemical fragments that independently bind to a protein target, metal chelating functional groups, fluorophores). By facilitating the synthesis and testing of varied chemical collections, the tools and methods reported here should accelerate the application of "designer" small molecules to problems in basic science, industrial chemistry and medicine.

Anyone want to step through? If GSK is getting some of their DNA-coded screening to work (or at least telling us about the examples that did?), could this be a useful platform as well? Thoughts welcome in the comments.

Comments (13) + TrackBacks (0) | Category: Chemical Biology | Chemical News | Drug Assays


1. Anonymous on August 23, 2013 8:48 AM writes...

Artificial evolution by replication of naturally selected ligand libraries: this is the future of drug discovery. With small ligands we are a long way off from connecting DNA code to diverse chemical structures, but we already have all the systems established to do this for protein drugs, so why don't we do this with various protein scaffolds besides antibodies? Delivery challenges could be solved in parallel.

Permalink to Comment

2. JB on August 23, 2013 8:56 AM writes...

You misread a slight detail there - one type of resin is used for the DNA capture, then the chemistry is done on a generic anion exchange resin that is not sequence-specific but is more broadly applicable to derivatization chemistry. It's somewhat complicated; the details are in reference 23, which oddly is missing part of the reference information (like, you know, the journal), but it's also PLOS ONE:
On another note, it's interesting but not surprising that you have crossover of people between the groups trying various approaches. One of the authors on this used to be in the Liu lab, one of the employees at Liu's company used to be at Praecis, one of Liu's postdocs went to Praecis at one point, there are others...

Permalink to Comment

3. med on August 23, 2013 10:11 AM writes...

This technology is awesome in terms of the number of compounds screened. I run the target validation in cells after the initial screen. But our chemists run into issues such as unexpected chemistries and poor solubility when you try to make larger amounts. I see a fair amount of toxicity too, due to the lack of an initial cell-based screen. But based on what I've seen so far it has a lot of potential if cytotox could be worked into a screen prior to making the bigger batch.

Permalink to Comment

4. inthegame on August 23, 2013 2:28 PM writes...

I think Harbury is mistaken regarding his claim that a binder may only be enriched 1000x. In the normal case, K = [complex]/([protein][ligand]). A nanomolar binder will be enriched 1000-fold more than a micromolar binder. However, the Praecis approach used 3 rounds of selection. Each round leads to another 1000-fold enrichment, assuming the use of nanomolar amounts of protein.
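Comment 4's equilibrium shorthand can be written out. With the association constant K = [complex]/([protein][ligand]) and the ligand present in trace amounts, the fraction of a ligand captured is [P]/([P] + Kd), and the nanomolar-over-micromolar advantage only approaches the full 1000-fold as the protein concentration drops below both Kd values. A sketch (the protein concentration here is an assumed value for illustration):

```python
def bound_fraction(kd_molar, protein_molar):
    """Equilibrium fraction of a ligand captured, [PL]/([PL] + [L]),
    assuming free protein ~ total protein (ligand in trace amounts)."""
    return protein_molar / (protein_molar + kd_molar)

protein = 1e-9  # assumed: 1 nM immobilized protein
strong = bound_fraction(1e-9, protein)   # nanomolar binder
weak = bound_fraction(1e-6, protein)     # micromolar binder
print(f"per-round enrichment of nM over uM binder: {strong / weak:.0f}x")
```

At 1 nM protein the ratio comes out near 500x rather than 1000x, because the strong binder is already half-saturated; the idealized 1000-fold per round is the low-protein limit.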

Permalink to Comment

5. noname on August 23, 2013 2:50 PM writes...

I agree with #4. With regular advances in sequencing throughput, and multiple selection rounds, finding the needle in the haystack isn't a problem. Maybe with libraries up to 10^15 it could be a problem, but libraries that large are not useful for other reasons (e.g. high MW).

Permalink to Comment

6. Nicolas Tilmans on August 23, 2013 4:05 PM writes...

I'm actually finishing up my thesis in the Harbury lab on this topic. We're working on a few papers to follow-up on this work that we are very excited about. It's a fantastic technology and I think that these DNA-Tethered library approaches are starting to really come into their own.

It's hard to know what's going on inside GSK/Praecis, but rumor has it they're still working on leads from their DNA-encoded libraries. Ensemble Therapeutics, the company based off of the Liu lab's work, seems to be signing co-development contracts with a number of companies so they seem to be doing OK as well. Again, it's hard to tell from the outside.

There are a few advantages to DNA Programmed Combinatorial Chemistry (DPCC) when compared to the Praecis and Liu Lab platforms.

Chemistry-wise, Praecis and DPCC can use off-the-shelf reagents for the chemistry, but the Liu system requires each monomer in the combinatorial scheme to be tethered to an oligo. This is cumbersome, so you'll notice they use relatively few monomers and generate relatively small libraries (13,800 members is the largest I know of). Nonetheless, they've gotten pretty cool results, including some reaction discovery work!

As was highlighted in the OP, DPCC and the Liu system can be iteratively selected, amplified, re-translated and selected again. The Praecis system can only go through one synthesis-selection cycle. They do take the selected material and re-subject it to selection a few times, but you lose material every time, so there's a limit to the number of pseudo-cycles they can do without re-synthesizing the library. This might be enough if you have sufficient starting material and a selection that provides massive enrichments, but as the library size increases and/or the selective pressure weakens, this method becomes more challenging.

This is especially true when the theoretical complexity of a library becomes larger than the number of molecules you can realistically synthesize. At that point you need to select a library, mix up the genes that make it through, then go for another round of selection. This is how nature makes better finches, and how macromolecule-based in-vitro evolution strategies (RNA aptamers, mRNA display, phage display, etc.) generate tight binders. It stands to reason it might also work for small molecules, but you can't do that if you can't iterate the synthesis-selection process.

med's comment on screening against cytotox raises the weak/noisy selection problem. You can use these systems to counter-select for undesirable properties, or string several selections together to ask for the co-occurrence of certain properties. For example, maybe you pass the whole library over an immobilized membrane column to enrich for things that aren't too greasy/too polar to get past a cell membrane. That's likely to be a weak and imperfect selection, so unless you can do it multiple times, you're not going to get meaningful results.

That said, the GSK/Praecis stuff is amazing. This stuff is very new, so we're all still figuring out how to run these technologies quickly and easily. They've really pushed the field forward in a big way, and maybe they can do these things "good enough" without iteration; that's why we do experiments!

In short, DPCC combines the versatile chemistry of the Praecis method with the power of iterative selection cycles. I'm clearly biased, but I think this combination will change the way we look at small-molecule discovery. What if getting a decent small-molecule binder were as standard as getting a research-grade antibody/scFv peptide? This set of papers is a major step; there's still a lot of work to do, but I think DPCC definitely holds that promise.

Permalink to Comment

7. Nicolas Tilmans on August 23, 2013 5:01 PM writes...

Some thoughts on comments 4 and 5: in practice, getting 1000x enrichment is hard. I know it shouldn't be, but in the lab's experience it's pretty easy to get 100x selections, while boosting that another factor of 10 is challenging. It's not just about Kd; it's about on-rates, off-rates and off-target binding events. For example, if you have a diffusion-limited on-rate, the off-rate for a nanomolar binder corresponds to a residence time on the order of tens of seconds. If your manipulation takes too long, then as you wash off excess library you'll also lose your binder. DNA (or the molecules you've made on the end of it) can stick to surfaces and proteins non-specifically, so you're trying to get the selection for the desired property 1000x above that background. Maybe the DNA gets modified non-specifically such that some fraction of the library gets selected independently of the molecules it represents. It's not enough just to have a low Kd and a slow off-rate; you have to have very few off-target interactions.
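The off-rate arithmetic in that comment can be sketched from koff = kon x Kd. The on-rate here is an assumed near-diffusion-limited value, since the comment doesn't give one:

```python
import math

def complex_half_life(kd_molar, kon_per_molar_s=1e8):
    """Half-life (s) of a protein-ligand complex, via koff = kon * Kd.
    The kon of 1e8 /M/s is an assumed near-diffusion-limited on-rate."""
    koff = kon_per_molar_s * kd_molar
    return math.log(2) / koff

print(f"1 nM binder: complex half-life ~ {complex_half_life(1e-9):.1f} s")
```

A half-life of roughly seven seconds for a 1 nM binder is why slow washes can cost you exactly the molecules you were trying to keep.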

Remember that 1000-fold enrichment means that if you start with a 1:1 mix of good and bad material, you end up with 99.9% pure good material. Not many purification schemes get you that in one go. Obviously it's not impossible, but it's not that easy.
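That purity claim is plain odds arithmetic:

```python
def purity_after(odds_before, enrichment):
    """Fraction of the pool that is good material after one selection
    round that multiplies the good material's odds by `enrichment`."""
    odds = odds_before * enrichment
    return odds / (1 + odds)

# 1:1 starting mix, 1000-fold enrichment
print(f"{purity_after(1.0, 1000):.4f}")  # prints 0.9990
```

1000:1 odds is 1000/1001, i.e. 99.9% purity from a single round, if you can actually achieve the full 1000-fold.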

High-throughput sequencing is a required technology for these platforms to work. As you get more reads, you do need fewer rounds of selection, and a single round is clearly enough for some applications. However, it's not as easy to characterize the pre-selection library with 1 billion members or so, and other parts of the process could throw off your sequencing data. If there are variations in the initial DNA population, or sequences that don't amplify well, or don't sequence well, those will all change the result in a single round of selection, such that the molecule with the most reads isn't necessarily the best molecule. Going through multiple rounds averages out those effects, so you get a better idea of which compounds to re-synthesize afterwards. Ideally, you'd get to the point where you just take the top 10 molecules from your sequencing results, sorted by enrichment ratio, and they're all awesome binders. We're not there yet, but I think iteration is part of making that happen down the line.
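The averaging argument can be illustrated with a toy simulation (a sketch, not anyone's actual protocol: one true binder enriched 10-fold per round among 99 decoys, with multiplicative noise on every compound to stand in for amplification and sequencing bias):

```python
import random

def select(pool, enrichment, noise=0.3, rng=random.Random(0)):
    """One noisy selection round: each compound's abundance is scaled
    by its enrichment factor, jittered multiplicatively, renormalized."""
    new = {c: a * enrichment[c] * rng.uniform(1 - noise, 1 + noise)
           for c, a in pool.items()}
    total = sum(new.values())
    return {c: a / total for c, a in new.items()}

# One true binder (10x per round) among 99 decoys (1x), equal start.
enrichment = {"binder": 10.0, **{f"decoy{i}": 1.0 for i in range(99)}}
pool = {c: 1 / 100 for c in enrichment}
for _ in range(3):
    pool = select(pool, enrichment)
print(f"binder fraction after 3 rounds: {pool['binder']:.2f}")
```

After three rounds the binder dominates the pool despite the per-round noise, whereas after a single noisy round a lucky decoy can still out-read a real hit.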

Finally, comment 5 is right about the library size: 10^15 gets hard to sequence, period. But see my previous comment about undersampled libraries. A tube with 10^15 possibilities where each molecule is present a single time contains about 1.6 nanomoles of material. To really be sure each molecule was represented once, you'd need quite a bit more than that. If you look at the macromolecular in-vitro evolution literature, 10^15 is just about the upper limit of how many molecules end up getting made. For peptides, that number is usually 10x or 100x lower than that. Basically, with that size of library, you're almost certainly undersampling it. To have a chance at sampling everything you have to select once, shuffle the genes that make it through, and select again. You can't do that without iteration.
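The mole arithmetic there is easy to verify:

```python
AVOGADRO = 6.022e23

# Amount of material if each of 10^15 distinct library members
# were present exactly once.
moles = 1e15 / AVOGADRO
print(f"{moles * 1e9:.2f} nmol")  # prints 1.66 nmol
```

So 10^15 one-copy molecules really is only about 1.6-1.7 nmol of DNA, and single-copy representation leaves no margin for losses at any step.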

That said, I could be wrong, and again, Praecis/GSK are getting really impressive results with just one round. Time will tell, and we're pretty excited to find out!

Permalink to Comment

8. a on August 25, 2013 9:00 AM writes...

Nicolas: thanks for your insightful comments.

Permalink to Comment

9. JB on August 26, 2013 10:18 AM writes...

Just a summary of the different pros and cons:
GSK/Lerner: Easiest operationally (Split, react, DNA precipitate, pool); very large hypothetical libraries; likely bias in reaction progress (although data interpretation can resolve); more limited reactions.
Harbury: Large libraries, more versatile chemical reaction toolkit, split-pool method more difficult.
Liu: Smaller libraries, difficult reagent prep; accesses molecules not possible with other methods (e.g. intramolecular reactions); probably the best purity, due to a selection step for bond formation.
Vipergen? Nuevolution?

As far as the size of the Liu libraries, the largest published is 13.8k, but you don't think the company using the technology is going to report their libraries? They claim >10M on their website. But notice they're not even focused on the library size but the class of molecules enabled by that particular technology.

Permalink to Comment

10. Nicolas Tilmans on August 26, 2013 2:28 PM writes...

I think that summary is about right.

I would say that the GSK/Lerner and Harbury platforms should be capable of running the same chemistries. There's no reason you couldn't take a GSK library, put it on a support, do the chemistry in organic solvent, elute it and tag it with your DNA at that point. Similarly, we're able to do solution-phase chemistry by just eluting a hybridized well into a 384-well plate and running the chemistry then. It's less ideal, but it works. The true advantage of the Harbury method is the ability to apply iterative selections while still using a wide variety of chemistries. As you point out, this comes at the expense of a more complex splitting step.

I'm not sure how Ensemble has put together their library. It's possible that they have invested the time in making a ton of DNA-monomer reagents. It's possible that they are just using longer syntheses with a smaller set of building blocks. Some of Liu's more recent publications show that you can use strand displacement to put together more building blocks. You're right, they're quite vague about this, focusing more on the nature of the molecules they've made.

Vipergen/Nuevolution I'm not so sure about. My understanding of their platforms is that they are similar to the Liu approach. As such, they would have many of the same pros/cons. Vipergen seems to require even more molecular biology than does the Harbury approach, which may be a downside. Neither has published very much, and I haven't heard many rumors so I don't know where they stand.

It will be interesting to see where this technology goes, I think there's room for many approaches to be successful!

Permalink to Comment

11. Jim Hartley on August 27, 2013 2:14 PM writes...

Can anyone comment on the use of competitors in screens against a purified protein? Does one ordinarily add something like BSA to select against hydrophobic interactions? Any other generic measures that are commonly used?

Permalink to Comment

12. JB on August 27, 2013 7:36 PM writes...

Free DNA, typically sheared genomic DNA such as salmon sperm. You can add a specific competitor if you want to bias for binders to alternate pockets, but I've heard that doesn't work too well.

Permalink to Comment

13. Nicolas Tilmans on September 2, 2013 1:35 AM writes...

We use a mix of BSA and yeast tRNA during selections. The tRNA is nice because it has no shot at amplifying in the PCR and you can get rid of it by base hydrolysis at any time, all while still blocking for "general nucleic acid interaction". We haven't experimented much with other blocking strategies but you can imagine how that would be done pretty easily. I'm curious about JB's point about specific competitors to block alternate pockets, why hasn't that been successful? Seems counterintuitive.

Permalink to Comment


