Corante

About this Author
Derek Lowe
Derek Lowe, an Arkansan by birth, got his BA from Hendrix College and his PhD in organic chemistry from Duke before spending time in Germany on a Humboldt Fellowship on his post-doc. He's worked for several major pharmaceutical companies since 1989 on drug discovery projects against schizophrenia, Alzheimer's, diabetes, osteoporosis and other diseases. To contact Derek email him directly: derekb.lowe@gmail.com Twitter: Dereklowe

In the Pipeline

Category Archives

July 16, 2009

The Further In You Go, The Bigger It Gets

Posted by Derek

I had a printout of the structure of maitotoxin on my desk the other day, mostly as a joke to alarm anyone who came into my office. "Yep, here's the best hit from the latest screen. . .I hear that you're on the list to run the chemistry end. . .what's that you say?"
[Image: structure of maitotoxin]
This is, needless to say, one of the largest and scariest marine natural product structures ever determined (and that determination has been no stroll past the dessert table, either).

But that hasn't stopped people from messing around with it. And there's much speculation that other people are strongly considering messing around with it, too - you synthetic chemists can guess the sorts of people that this might be, and their names, and what it might be like to sit through the seminars that result, and so on.

I fear that a total synthesis of maitotoxin would be largely a waste of time, but I'm willing to hear arguments against that position. Just looking at it, though, inspires thought. This eldritch beastie has 98 chiral centers. So let's do some math. If you're interested in the SAR of such molecules, you have your choice of (two to the 98th) possible isomers, which comes out to a bit over (3 times ten to the 29th) compounds. This is. . .a pretty large number. If you're looking for 10 mg of each isomer to add to your screening collection (no sense in going back and making them again), then you're looking at a good bit over half the mass of the entire Earth. And that's just in sheer compounds; we're not counting the weight of vials, which will, I'd say, safely move you up toward the planetary weight of a low-end gas giant. We will ignore shelving considerations in the interest of time.
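
For anyone who wants to check that arithmetic, here is a minimal sketch. The Earth-mass figure is an assumed round value that I've supplied, not something from the post's sources, and the count treats every chiral center as independently invertible.

```python
# Back-of-envelope check of the stereoisomer arithmetic above.
CHIRAL_CENTERS = 98
SAMPLE_MASS_KG = 10e-6      # 10 mg of each isomer
EARTH_MASS_KG = 5.97e24     # assumed round value for the mass of the Earth


def stereoisomer_count(centers: int) -> int:
    """Upper bound on stereoisomers: two configurations per chiral center."""
    return 2 ** centers


def collection_mass_kg(centers: int, sample_mass_kg: float) -> float:
    """Total mass of a collection holding one sample of every isomer."""
    return stereoisomer_count(centers) * sample_mass_kg


isomers = stereoisomer_count(CHIRAL_CENTERS)
total_kg = collection_mass_kg(CHIRAL_CENTERS, SAMPLE_MASS_KG)
earth_fraction = total_kg / EARTH_MASS_KG

print(f"isomers: {isomers:.3e}")
print(f"total mass: {total_kg:.3e} kg")
print(f"fraction of Earth's mass: {earth_fraction:.2f}")
```

Vials and shelving are left as an exercise.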

Recall that yesterday's post gave a number of about 27 million compounds with 11 or fewer heavy atoms. You could toss 27 million compounds into a collection of ten to the 29th and never see them again, of course. But that brings up two points: one, that the small-compound estimate ignores stereochemistry, and we've been getting those insane maitotoxin numbers by considering nothing but. The thing is, with only 11 non-hydrogen atoms, there aren't quite as many chances for things to get out of control. The GDB compound set goes up only to 110 million or so if you consider stereoisomers, which actually isn't nearly as much as I'd thought.

But the second point is that this shows you why the Berne group stopped at 11 heavy atoms, because the problem becomes intractable really fast as you go higher. It's worth remembering that the GDB people actually threw out over 98% of their scaffolds because they represented potential ring structures that are too strained to be very stable. And they only considered C, N, O and F as heavy atoms (even adding sulfur was considered too much to deal with, computationally). Then they tossed out another 98 or 99% of the structures that emerged from that enumeration as reactive and/or unstable. Relax your standards a bit, allow another atom or two, bump up the molecular weight, do any of those and you're going to exceed anyone's computational capacity. (Update: the Berne group has just taken a crack at it, and managed a reasonable set up to 13 heavy atoms, with various simplifying assumptions to ease the burden. If you want to mess around with it, it's here, free of charge).

No, there are a lot of compounds out there. And if you look at the really big ones - and maitotoxin is nothing if not a really big one - there are whole universes contained just in each of them. (Bonus points for guessing the source of the name of the post, by the way).

Comments (24) + TrackBacks (0) | Category: Chemical News | In Silico

July 15, 2009

Why Does Screening Work At All? (Free Business Proposal Included!)

Posted by Derek

I've been meaning to get around to a very interesting paper from the Shoichet group that came out a month or so ago in Nature Chemical Biology. Today's the day! It examines the content of screening libraries and compares them to what natural products generally look like, and they turn up some surprising things along the way. The main question they're trying to answer is: given the huge numbers of possible compounds, and the relatively tiny fraction of those we can screen, why does high-throughput screening even work at all?

The first data set they consider is the Generated Database (GDB), a calculated set of all the reasonable structures with 11 or fewer nonhydrogen atoms, which grew out of this work. Neglecting stereochemistry, that gives you between 26 and 27 million compounds. Once you're past the assumptions of the enumeration (which certainly seem defensible - no multiheteroatom single-bond chains, no gem-diols, no acid chlorides, etc.), then there is no human bias involved: that's the list.

The second list is everything from the Dictionary of Natural Products and all the metabolites and natural products from the Kyoto Encyclopedia of Genes and Genomes. That gives you 140,000+ compounds. And the final list is the ZINC database of over 9 million commercially available compounds, which (as they point out) is a pretty good proxy for a lot of screening collections as well.

One rather disturbing statistic comes out early when you start looking at overlaps between these data sets. For example, how many of the possible GDB structures are commercially available? The answer: 25,810 of them - in other words, you can only buy fewer than 0.1% of the possible compounds with 11 heavy atoms or below (25,810 out of 26-odd million), making the "purchasable GDB" a paltry list indeed.

Now, what happens when you compare that list of natural products to these other data sets? Well, for one thing, the purchasable part of the GDB turns out to be much more similar to the natural product list than the full set. Everything in the GDB has at least 20% Tanimoto similarity to at least one compound in the natural products set, not that 20% means much of anything in that scoring system. But only 1% of the GDB has a 40% Tanimoto similarity, and less than 0.005% has an 80% Tanimoto similarity. That's a pretty steep dropoff!
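
For readers who haven't run into it, the Tanimoto score is just the number of shared features divided by the number of features present in either compound. Here's a minimal sketch of that calculation, and of the "best match against the natural product set" number quoted above. The fingerprints are made-up sets of feature indices, purely for illustration; the paper's actual fingerprinting scheme is not reproduced here.

```python
def tanimoto(a: set, b: set) -> float:
    """Tanimoto (Jaccard) coefficient: shared features / features in either."""
    if not a and not b:
        return 0.0  # convention chosen for this sketch: two empty fingerprints score zero
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def best_match(query: set, reference: list) -> float:
    """Highest similarity of one compound to anything in a reference set."""
    return max(tanimoto(query, fp) for fp in reference)


def fraction_at_or_above(queries: list, reference: list, threshold: float) -> float:
    """Fraction of query compounds whose best match reaches the threshold."""
    hits = sum(1 for q in queries if best_match(q, reference) >= threshold)
    return hits / len(queries)


# Hypothetical toy fingerprints (sets of "on" feature indices).
natural_products = [{1, 2, 3, 4}, {2, 5, 8, 9}]
candidates = [{1, 2, 3, 4}, {1, 2, 3, 7}, {10, 11, 12}]

for cutoff in (0.2, 0.4, 0.8, 1.0):
    share = fraction_at_or_above(candidates, natural_products, cutoff)
    print(f"best match >= {cutoff:.0%}: {share:.0%} of candidates")
```

A score of 100% means the two fingerprints are identical, which is how the authors count compounds that are natural products themselves.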

But the "purchasable GDB" holds up much better. 10% of that list has 100% Tanimoto similarity (that is, 10% of the purchasable compounds are natural products themselves). The authors also compare individual commercial screening collections. If you're interested, ChemBridge and Asinex are the least natural-product-rich (about 5% of their collections), whereas IBS and Otava are the most (about 10%).

So one answer to "why does HTS ever work for anything" is that compound collections seem to be biased toward natural-product type structures, which we can reasonably assume have generally evolved to have some sort of biological activity. It would be most interesting to see the results of such an analysis run from inside several drug companies against their own compound collections. My guess is that the natural product similarities would be even higher than the "purchasable GDB" set's, because drug company collections have been deliberately stocked with structural series that have shown activity in one project or another.

That's certainly looking at things from a different perspective, because you can also hear a lot of talk about how our compound files are too ugly - too flat, too hydrophobic, not natural-product-like enough. These viewpoints aren't contradictory, though - if Shoichet is right, then improving those similarities would indeed lead to higher hit rates. Compared to everything else, we're already at the top of the similarity list, but in absolute terms there's still a lot of room for improvement.

So how would one go about changing this, assuming that one buys into this set of assumptions? The authors have searched through the various databases for ring structures, taking those as a good proxy for structural scaffolds. As it turns out, 83% of the ring scaffolds among the natural products are unrepresented among the commercially available molecules - a result that I assume that Asinex, ChemBridge, Life Chemicals, Otava, Bionet and their ilk are noting with great interest. In fact, the authors go even further in pointing out opportunities, with a table of rings from this group that closely resemble known drug-like ring systems.

But wait a minute. . .when you look at those scaffolds, a number of them turn out to be rather, well, homely. I'd be worried about elimination to form a Michael acceptor in compound 19, for example. I'm not crazy about the N,S acetal in 21 or the overall stability of the acetals in 15, 17 and 31. The propiolactone in 23 is surely reactive, as is the quinone in 25, and I'd be very surprised if that's not what they owe their biological activities to. And so on.
[Image: table of ring scaffolds from the Shoichet paper]
All that said, there are still some structures in there that I'd be willing to check out, and there must be more of them in that 83%. No doubt a number of the rings that do sneak into the commercial list are not very well elaborated, either. I think that there is a real commercial opportunity here. A company could do quite well for itself by promoting its compound collection as being more natural-product similar than the competition, with tractable molecules, and a huge number of them unrepresented in any other catalog.

Now all you'd have to do is make these things. . .which would require hiring synthetic organic chemists, and plenty of them. These things aren't easy to make, or to work with. And as it so happens, there are quite a few good chemists available these days. Anyone want to take this business model to heart?

Comments (12) + TrackBacks (0) | Category: Drug Assays | Drug Industry History | In Silico

July 7, 2009

What's So Special About Ribose?

Posted by Derek

While we're on the topic of hydrogen bonds and computations, there's a paper coming out in JACS that attempts to answer an old question. Why, exactly, does every living thing on earth use so much ribose? It's the absolute, unchanging carbohydrate backbone to all the RNA on Earth, and like the other things in this category (why L amino acids instead of D?), it's attracted a lot of speculation. If you subscribe to the RNA-first hypothesis of the origins of life, then the question becomes even more pressing.

A few years ago, it was found that ribose, all by itself, diffuses through membranes faster than the other pentose sugars. This result holds up for several kinds of lipid bilayers, suggesting that it's not some property of the membrane itself that's at work. So what about the ability of the sugar molecules to escape from water and into the lipid layers?

Well, they don't differ much in logP, that's for sure, as the original authors point out. This latest paper finds, though, by using molecular dynamics simulations, that there is something odd about ribose. In nonpolar environments, its hydroxy groups form a chain of hydrogen-bond-like interactions, particularly notable when it's in the beta-pyranose form. These aren't a factor in aqueous solution, and the other pentoses don't seem to pick up as much stabilization under hydrophobic conditions, either.

So ribose is happier inside the lipid layer than the other sugars, and thus pays less of a price for leaving the aqueous environment, and (both in simulation and experimentally) diffuses across membranes ten times as quickly as its closely related carbohydrate kin. (Try saying that five times fast!) This, as both the original Salk paper and this latest one note, leads to an interesting speculation on why ribose was preferred in the origins of life: it got there firstest with the mostest. (That's a popular misquote of Nathan Bedford Forrest's doctrine of warfare, and if he's ever come up before in a discussion of ribose solvation, I'd like to hear about it).

Comments (9) + TrackBacks (0) | Category: Biological News | In Silico | Life As We (Don't) Know It

Another Thing We Don't Know

Posted by Derek

Hydrogen bonds are important. There, that should be a sweepingly obvious enough statement to get things started. But they really are - hydrogen bonding accounts for the weird properties of water, for one thing, and it's those weird properties that are keeping us alive. And leaving out the water (a mighty big step), internal hydrogen bonding is still absolutely essential to the structure of large biological molecules - proteins, complex carbohydrates, DNA and RNA, and so on.

But we don't understand hydrogen bonds all that well, dang it all. It's not like we're totally ignorant of them, for sure, but there are a lot of important things that we don't have a good handle on. One of these may just have been illustrated by this paper in Nature Structural and Molecular Biology by a group from Scripps. They've been working on understanding the fact that all hydrogen bonds are not created equal. By carefully going through a lot of protein mutants, they have evidence for the idea that H-bonds that form in polar environments are weaker than ones that form in nonpolar ones.

That makes sense, on the face of it. One way to think of it is that a hydrogen bond in a locally hydrophobic area is the only game in town, and counts for more. But this work claims that such bonds can be worth as much as 1.2 kcal/mole more than the wimpier ones, which is rather a lot. Those kinds of energy differences could add up very quickly when you're trying to understand why a protein folds up the way it does, or why one small molecule binds more tightly than another one.
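
To put a rough scale on "rather a lot": through the standard relation between free energy and equilibrium constant (delta-G = -RT ln K), a given free energy difference translates into a multiplicative change in a binding or folding equilibrium. The sketch below is my own back-of-envelope illustration of that relation, not a calculation from the paper, and it assumes room temperature and that the whole 1.2 kcal/mole shows up in the equilibrium.

```python
import math

GAS_CONSTANT_KCAL = 1.987204e-3  # kcal / (mol * K)


def fold_change(delta_g_kcal: float, temp_k: float = 298.15) -> float:
    """Factor by which an equilibrium constant shifts for a given free energy gain."""
    return math.exp(delta_g_kcal / (GAS_CONSTANT_KCAL * temp_k))


one_bond = fold_change(1.2)           # a single "stronger" hydrogen bond
three_bonds = fold_change(3 * 1.2)    # three of them, assuming simple additivity

print(f"one bond:    about {one_bond:.1f}-fold")
print(f"three bonds: about {three_bonds:.0f}-fold")
```

Simple additivity is itself a big assumption, of course, but it shows how quickly these differences compound.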

Do we take such things into account when we're trying to compute these energies? Generally speaking, no, we do not - well, not yet. If these folks are right, though, we'd better start.

Update: note that the paper itself doesn't suggest that this is a new idea - they reference work going back to 1963 (!) on the topic. What they're trying to do is put more real numbers into the mix. And that's what my last paragraph above is trying to state (and perhaps overstate): it's difficult to account for these things computationally, since they vary so widely, and since we don't have that good a computational handle on hydrogen bonds in general. The more real world data that can be fed back into the models, the better.

Comments (7) + TrackBacks (0) | Category: In Silico

July 2, 2009

Jargon Will Save Us All

Posted by Derek

Moore's Law: number of transistors on a chip doubling every 18 months or so, etc. Everyone's heard of it. But can we agree that anyone who uses it as a metaphor or prescription for drug research doesn't know what they're talking about?

I first came across the comparison back during the genomics frenzy. One company that had bought into the craze in a big way press-released (after a rather short interval) that they'd advanced their first compound to the clinic based on this wonderful genomics information. I remember rolling my eyes and thinking "Oh, yeah", but on a hunch I went to the Yahoo! stock message boards (often a teeming heap of crazy, then as now). And there I found people just levitating with delight at this news. "This is Moore's Law as applied to drug discovery!" shouted one enthusiast. "Do you people realize what this means?" What it meant, apparently, was not only that this announcement had come rather quickly. It also meant that this genomics stuff was going to discover twice as many drugs as this real soon. And real soon after that, twice as many more, and so on until the guy posting the comment was as rich as Warren Buffett, because he was a visionary who'd been smart enough to load himself into the catapult and help cut the rope. (For those who don't know how that story ended, the answer is Not Well: the stock that occasioned all this hyperventilation ended up dropping by a factor of nearly a hundred over the next couple of years. The press-released clinical candidate was never, ever, heard of again).

I bring this up because a reader in the industry forwarded me this column from Bio-IT World, entitled, yes, "Only Moore's Law Can Save Big Pharma". I've read it three times now, and I still have only the vaguest idea of what it's talking about. Let's see if any of you can do better.

The author starts off by talking about the pressures that the drug industry is under, and I have no problem with him there. That is, until he gets to the scientific pressures, which he sketches out thusly:

Scientifically, the classic drug discovery paradigm has reached the end of its long road. Penicillin, stumbled on by accident, was a bona fide magic bullet. The industry has since been organized to conduct programs of discovery, not design. The most that can be said for modern pharmaceutical research, with its hundreds of thousands of candidate molecules being shoveled through high-throughput screening, is that it is an organized accident. This approach is perhaps best characterized by the Chief Scientific Officer of a prominent biotech company who recently said, "Drug discovery is all about passion and faith. It has nothing to do with analytics."

The problem with faith-based drug discovery is that the low hanging fruit has already been plucked, driving would be discoverers further afield. Searching for the next miracle drug in some witch doctor's jungle brew is not science. It's desperation.

The only way to escape this downward spiral is new science. Fortunately, the fuzzy outlines of a revolution are just emerging. For lack of a better word, call it Digital Chemistry.

And when the man says "fuzzy outline", well, you'd better take him at his word. What, I know you're all asking, is this Digital Chemistry stuff? Here, wade into this:

Tomorrow's drug companies will build rationally engineered multi-component molecular machines, not small molecule drugs isolated from tree bark or bread mold. These molecular machines will be assembled from discrete interchangeable modules designed using hierarchical simulation tools that resemble the tool chains used to build complex integrated circuits from simple nanoscale components. Guess-and-check wet chemistry can't scale. Hit or miss discovery lacks cross-product synergy. Digital Chemistry will change that.

Honestly, if I start talking like this, I hope that onlookers will forgo taking notes and catch on quickly enough to call the ambulance. I know that I'm quoting too much, but I have to tell you more about how all this is going to work:

But modeling protein-protein interaction is computationally intractable, you say? True. But the kinetic behavior of the component molecules that will one day constitute the expanding design library for Digital Chemistry will be synthetically constrained. This will allow engineers to deliver ever more complex functional behavior as the drugs and the tools used to design them co-evolve. How will drugs of the future function? Intracellular microtherapeutic action will be triggered if and only if precisely targeted DNA or RNA pathologies are detected within individual sick cells. Normal cells will be unaffected. Corrective action shutting down only malfunctioning cells will have the potential of delivering 99% cure rates. Some therapies will be broad based and others will be personalized, programmed using DNA from the patient's own tumor that has been extracted, sequenced, and used to configure "target codes" that can be custom loaded into the detection module of these molecular machines.

Look, I know where this is coming from. And I freely admit that I hope that, eventually, a really detailed molecular-level knowledge of disease pathology, coupled with a really robust nanotechnology, will allow us to treat disease in ways that we can't even approach now. Speed the day! But the day is not sped by acting as if this is the short-term solution for the ills of the drug industry, or by talking as if we already have any idea at all about how to go about these things. We don't.

And what does that paragraph up there mean? "The kinetic behavior. . .will be synthetically constrained"? Honestly, I should be qualified to make sense of that, but I can't. And how do we go from protein-protein interactions at the beginning of all that to DNA and RNA pathologies at the end, anyway? If all the genomics business has taught us anything, it's that these are two very, very different worlds - both important, but separated by a rather wide zone of very lightly-filled-in knowledge.

Let's take this step by step; there's no other way. In the future, according to this piece, we will detect pathologies by detecting cell-by-cell variations in DNA and/or RNA. How will we do that? At present, you have to rip open cells and kill them to sequence their nucleic acids, and the sensitivities are not good enough to do it one cell at a time. So we're going to find some way to do that in a specific non-lethal way, either from the outside of the cells (by a technology that we cannot even yet envision) or by getting inside them (by a technology that we cannot even envision) and reading off their sequences in situ (by a technology that we cannot even envision). Moreover, we're going to do that not only with the permanent DNA, but with the various transiently expressed RNA species, which are localized to all sorts of different cell compartments, present in minute amounts and often for short periods of time, and handled in ways that we're only beginning to grasp and for purposes that are not at all yet clear. Right.

Then. . .then we're going to take "corrective action". By this I presume that we're either going to selectively kill those cells or alter them through gene therapy. I should note that gene therapy, though incredibly promising as ever, is something that so far we have been unable, in most cases, to get to work. Never mind. We're going to do this cell by cell, selectively picking out just the ones we want out of the trillions of possibilities in the living organism, using technologies that, I cannot emphasize enough, we do not yet have. We do not yet know how to find most individual cell types in a complex living tissue; huge arguments ensue about whether certain rare types (such as stem cells) are present at all. We cannot find and pick out, for example, every precancerous cell in a given volume of tissue, not even by slicing pieces out of it, taking it out into the lab, and using all the modern techniques of instrumental analysis and molecular biology.

What will we use to do any of this inside the living organism? What will such things be made of? How will you dose them, whatever they are? Will they be taken up through the gut? Doesn't seem likely, given the size and complexity we're talking about. So, intravenous then, fine - how will they distribute through the body? Everything spreads out a bit differently, you know. How do you keep them from sticking to all kinds of proteins and surfaces that you're not interested in? How long will they last in vivo? How will you keep them from being cleared out by the liver, or from setting off a potentially deadly immune response? All of these could vary from patient to patient, just to make things more interesting. How will we get any of these things into cells, when we only roughly understand the dozens of different transport mechanisms involved? And how will we keep the cells from pumping them right back out? They do that, you know. And when it's time to kill the cells, how do you make absolutely sure that you're only killing the ones you want? And when it's time to do the gene therapy, what's the energy source for all the chemistry involved, as we cut out some sequences and splice in the others? Are we absolutely sure that we're only doing that in just the right places in just the right cells, or will we (disastrously) be sticking copies into the DNA of a quarter of a per cent of all the others?

And what does all this nucleic acid focus have to do with protein expression and processing? You can't fix a lot of things at the DNA level. Misfolding, misglycosylation, defects in transport and removal - a lot of this stuff is post-genomic. Are we going to be able to sequence proteins in vivo, cell by cell, as well? Detect tertiary structure problems? How? And fix them, how?

Alright, you get the idea. The thing is, and this may be surprising considering those last few paragraphs, that I don't consider all of this to be intrinsically impossible. Many people who beat up on nanotechnology would disagree, but I think that some of these things are, at least in broad hazy theory, possibly doable. But they will require technologies that we are nowhere close to owning. Babbling, as the Bio-IT World piece does, about "detection modules" and "target codes" and "corrective action" is absolutely no help at all. Every one of those phrases unpacks into a gigantic tangle of incredibly complex details and total unknowns. I'm not ready to rule some of this stuff out. But I'm not ready to rule it in just by waving my hands.

Comments (46) + TrackBacks (0) | Category: Drug Industry History | General Scientific News | In Silico | Press Coverage

April 1, 2009

Mexican Lemons To the Rescue

Posted by Derek

Thanks to a comment on this post, I’ve had a chance to read this interesting article from Stephen Johnson of Bristol-Myers Squibb, entitled “The Trouble with QSAR (Or How I Learned to Stop Worrying And Embrace Fallacy)”. (As a side note, it’s interesting to see that people still make references to the titling of Dr. Strangelove. I’ve never met Johnson, but I’d gather from that that he can’t be much younger than I am).

[Image: graph of US highway fatality rate versus tonnage of fresh lemons imported from Mexico]

The most arresting part of the article is the graph found in its abstract. No mention is made of it in the text, but none has to be. It’s a plot of the US highway fatality rate versus the tonnage of fresh lemons imported from Mexico, and I have to say, it’s a pretty darn straight line. I’ve seen a lot shakier plots used to justify some sweeping conclusions, and if those were justified, well, then I’m forced to conclude that Mexican lemons have improved highway safety a great deal. The vitamin C, maybe? The fragrance? Bioflavonoids?

None of the above, of course. Correlation, tiresomely, once again refuses to imply causation, even when you ask it nicely. And that’s the whole point of the article. QSAR, for those outside the business, stands for Quantitative Structure-Activity Relationship(s), an attempt to rationalize the behavior of a series of drug candidate compounds through computational means. The problem is, there are plenty of possible variables (size, surface area, molecular weight, polarity, solubility, charge, hydrogen bond donors and acceptors, and as many structural representation parameters as you can stand). As Johnson notes dryly:

“With such an infinite array of descriptions possible, each of which can be coupled with any of a myriad of statistical methods, the number of equivalent solutions is typically fairly substantial.”

That it is. And (as he rightly mentions) one of the other problems is that all these variables are discontinuous. Some region of the molecule can get larger, but only up to a point. When it’s too large to fit into the binding site any more, activity drops off steeply. Similarly, the difference between forming a crucial hydrogen bond and not forming one is a big difference, and it can be realized by a very small change in structure and properties. (Thus the “magic methyl” effect).

But that’s not the whole problem. Johnson takes many of his fellow computational chemists to task for what he sees as sloppy work. Too many models are advanced just because they’ve shown some (limited) correlations, and they’re not tested hard enough afterwards. Finding a model with a good “fitness score” becomes an end in itself:

“We can generate so many hypotheses, relating convoluted molecular factors to activity in such complicated ways, that the process of careful hypothesis testing so critical to scientific understanding has been circumvented in favor of blind validation tests with low resulting information content. QSAR disappoints so often, not only because the response surface is not smooth but because we have embraced the fallacy that correlation begets causation.”
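
It's easy to see how this happens with a toy experiment. The sketch below is my own illustration, not anything from Johnson's article: it invents a small series of "compounds" with purely random activities, tries a couple of thousand purely random "descriptors" against them, and keeps the one with the best correlation. Nothing here has any chemistry in it, so any fit it finds is noise, and the fit should evaporate on fresh compounds. The series size, descriptor count and seed are arbitrary choices.

```python
import random


def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def best_chance_descriptor(n_compounds, n_descriptors, rng):
    """Screen random descriptors against random activities; return the best r."""
    activity = [rng.gauss(0, 1) for _ in range(n_compounds)]
    best_r = 0.0
    for _ in range(n_descriptors):
        descriptor = [rng.gauss(0, 1) for _ in range(n_compounds)]
        r = pearson_r(descriptor, activity)
        if abs(r) > abs(best_r):
            best_r = r
    return best_r


def fresh_compound_r(n_compounds, rng):
    """The 'winning' descriptor has no real link to activity, so on new
    compounds it is just another pair of unrelated random columns."""
    activity = [rng.gauss(0, 1) for _ in range(n_compounds)]
    descriptor = [rng.gauss(0, 1) for _ in range(n_compounds)]
    return pearson_r(descriptor, activity)


rng = random.Random(2009)  # arbitrary seed so the run is repeatable
training_r = best_chance_descriptor(n_compounds=15, n_descriptors=2000, rng=rng)
validation_r = fresh_compound_r(n_compounds=200, rng=rng)

print(f"best r-squared on the 15-compound series: {training_r ** 2:.2f}")
print(f"r-squared on 200 fresh compounds:         {validation_r ** 2:.2f}")
```

Real descriptors are not random numbers, naturally, but the more of them you try against a short series, the more the winner owes to this effect.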

Comments (30) + TrackBacks (0) | Category: In Silico

March 26, 2009

The Motions of a Protein

Posted by Derek

So, people like me spend their time trying to make small molecules that will bind to some target protein. So what happens, anyway, when a small molecule binds to a target protein? Right, right, it interacts with some site on the thing, hydrogen bonds, hydrophobic interactions, all that – but what really happens?

That’s surprisingly hard to work out. The tools we have to look at such things are powerful, but they have limitations. X-ray crystal structures are great, but can lead you astray if you’re not careful. The biggest problem with them, though (in my opinion) is that you see this beautiful frozen picture of your drug candidate in the protein, and you start to think of the binding as. . .well, as this beautiful frozen picture. Which is the last thing it really is.

Proteins are dynamic, to a degree that many medicinal chemists have trouble keeping in mind. Looking at binding events in solution is more realistic than looking at them in the crystal, but it’s harder to do. There are various NMR methods (here's a recent review), some of which require specially labeled protein to work well, but they have to be interpreted in the context of NMR’s time scale limitations. “Normal” NMR experiments give you time-averaged spectra – if you want to see things happening quickly, or if you want to catch snapshots of the intermediate states along the way, you have a lot more work to do.

Here’s a recent paper that’s done some of that work. They’re looking at a well-known enzyme, dihydrofolate reductase (DHFR). It’s the target of methotrexate, a classic chemotherapy drug, and of the antibiotic trimethoprim. (As a side note, that points out the connections that sometimes exist between oncology and anti-infectives. DHFR produces tetrahydrofolate, which is necessary for a host of key biosynthetic pathways. Inhibiting it is especially hard on cells that are spending a lot of their metabolic energy on dividing – such as tumor cells and invasive bacteria).

What they found was that both inhibitors do something similar, and it affects the whole conformational ensemble of the protein:

". . .residues lining the drugs retain their μs-ms switching, whereas distal loops stop switching altogether. Thus, as a whole, the inhibited protein is dynamically dysfunctional. Drug-bound DHFR appears to be on the brink of a global transition, but its restricted loops prevent the transition from occurring, leaving a “half-switching” enzyme. Changes in pico- to nanosecond (ps-ns) backbone amide and side-chain methyl dynamics indicate drug binding is “felt” throughout the protein."

There are implications, though, for apparently similar compounds having rather different effects out in the other loops:

". . .motion across a wide range of timescales can be regulated by the specific nature of ligands bound. Occupation of the active site by small ligands of different shapes and physical characteristics places differential stresses on the enzyme, resulting in differential thermal fluctuations that propagate through the structure. In this view, enzymes, through evolution, develop sensitivities to ligand properties from which mechanisms for organizing and building such fluctuations into useful work can arise. . .Because the affected loop structures are primarily not in contact with drug, it is reasonable to envision inhibitory small-molecule drugs that act by allosterically modulating dynamic motions."

There are plenty of references in the paper to other investigations of this kind, so if this is your sort of thing, you'll find plenty of material there. One thing to take home, though, is to remember that not only are proteins mobile beasts (with and without ligand bound to them), but that this mobility is quite different in each state. And keep in mind that the ligand-bound state can be quite odd compared to anything else the protein experiences otherwise. . .

Comments (3) + TrackBacks (0) | Category: Biological News | Cancer | Chemical News | In Silico

February 24, 2009

Structure-Activity: Lather, Rinse, and Repeat

Posted by Derek

Medicinal chemists spend a lot of their time exploring and trying to make sense of structure-activity relationships (SARs). We vary our molecules in all kinds of ways, have the biologists run them through the assays, and then sit down to make sense of the results.

And then, like as not, we get up again after a few minutes, shaking our heads. Has anyone out there ever worked on a project where the entire SAR made sense? I’ve always considered it a triumph if even a reasonable majority of the compounds fit into an interpretable pattern. SAR development is a perfect example of things not quite working out the way that they do in textbooks.

The most common surprise when you get your results back, if that phrase “common surprise” makes any sense, is to find that you’ve pushed some trend a bit too far. Methyl was pretty good, ethyl was better, but anything larger drops dead. I don’t count that sort of thing – those are boundary conditions, for the most part, and one of the things you do in a med-chem program is establish the limits under which you can work. But there are still a number of cases where what you thought was a wall turns out to have a secret passage or two hidden in it. You can’t put any para-substituents on that ring, sure. . .unless you have a basic amine over on the other end of the molecule, and then you suddenly can.

I’d say that a lot of these get missed, because after a project’s been running a while, various SAR dogmas get propagated. There are features of the structure space that “everybody knows”, and that few people want to spend their time violating. But it’s worth devoting a small (but real) amount of effort to going back and checking some of these after the lead molecule has evolved a bit, since you can get surprised.

Some projects I’ve worked on have so many conditional clauses of this sort built into their SAR that you wonder whether there are any boundaries at all. This works, unless you have this, but if you have that over there it can be OK, although there is that other compound which didn’t. . .making sense of this stuff can just be impossible. The opposite situation, the fabled Perfectly Additive SAR, is something I’ve never encountered in person, although I’ve heard tales after the fact. That’s the closest we come to the textbooks, where you can mix and match groups and substituents any way you like, predicting as you go from the previous trends just how they’ll come out. I have to think that any time you can do this, it has to be taking place in a fairly narrow structure space – surely we can always break any trend like this with a little imagination.

Another well-known bit of craziness is the Only Thing That Works There. You’ll have whole series of compounds that have to have a methyl group at some position, or they’re all dead. Nothing smaller, nothing larger, nothing with a different electronic flavor: it’s methyl or death. (Or fluoro, or a thiazole, or what have you – I’ve probably seen this with methyl more than with other groups, but it can happen all over the place). A sharp SAR is certainly nothing to fear; it’s probably telling you that you really are making good close contacts with the protein target somewhere. But it can be unnerving, and sometimes there’s not a lot of room left on the ledge when you have more than one constraint like this.

Why does all this go on? Multiple binding modes, you have to think. Proteins are flexible beasts, and they've got lots of ways to react to ligands. And it's important never to forget that we can't predict their responses, at least not yet and not very well. And of course, in all this discussion, we've just been considering one target protein. When you think about the other things your molecule might be hitting in cells or in a whole animal, and that the SAR relationships for those off-target things are just as fluid and complicated as for your target, well. . .you can see why medicinal chemistry is not going away anytime soon. Or shouldn't, anyway.

Comments (40) + TrackBacks (0) | Category: Drug Assays | In Silico | Life in the Drug Labs

December 10, 2008

Floppiness Is Not Your Friend: Who Knew?

Posted by Derek

There’s a trick that every medicinal chemist learns very early, and continues to apply every time it’s feasible: take two parts of your compound, and tie them together into a ring.

The reason that works so well may not be immediately obvious if you’re not a medicinal chemist, so let me expand on it a bit. The first thing to know is that this method tends to work either really well or not at all – it’s a “death or glory” move. And that gives you a clue as to what’s going on. The idea is that the rotatable bonds in your molecule are, under normal conditions, doing just that: rotating. Any molecule the size of a normal drug has all kinds of possible shapes and rotational isomers, and room temperature is an energetic enough environment to populate a lot of them.

But there’s only one of them that’s the best for fitting into your drug target, most likely. So what are the odds? As your molecule approaches its binding pocket, there’s a complicated energetic dance going on. Different parts of your drug candidate will start interacting with the target (usually a protein), and that starts to tie down all that floppy rotation. The question is, does the gain resulting from these interactions cancel out the energetic price that has to be paid for them? Is there a pathway that leads to a favorable tight-binding situation, or is your molecule going to approach, flop around a bit, and dance away?

Several things are at work during that shall-we-dance period. The different conformations of your compound vary in energy, depending on how much its parts are starting to bang into each other, and how much you’re asking the bonds to twist around. The closer that desired drug-binding shape is to the shape your molecule wants to be in anyway, the better off you are, from that perspective. So tying back the molecule and making a ring in the structure does one thing immediately: it cuts down on the range of conformations it can take, in the same way that tying a rope between your ankles cuts down on your ability to dance. You’ve handcuffed your molecule, which would probably be cruel if they were sentient, but then, a lot of organic chemistry would be pretty unspeakable if molecules had feelings.

That’s why this method tends to be either a big winner or a big loser. If the preferred binding mode of your compound is close to the shape it takes when you tie it down, then you’ve suddenly zeroed in on just the thing you want, and the binding affinity is going to take a big leap. But if it’s not, well, you’ve now probably made it impossible for the thing to adopt the conformation it needs, and the binding affinity is going to take a big leap over a cliff.

There’s another effect to reducing the flexibility of your compound, and that has to do with entropy. All that favorable-interaction business is one component of the energy involved, namely the enthalpy, but entropy is the other. Loosely speaking, the more disordered a system, the higher its entropy. A floppy molecule, when it binds to a drug target, has to settle down into a much tighter fit, and entropically, that’s unfavorable. Energetically, you’re paying to do that. But if your molecule is already much less flexible, there’s not much of a toll as it fits into the pocket. If loss-of-floppiness is a bad thing, then don’t start out with so much of it.
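To put rough numbers on that bookkeeping, here is a minimal sketch assuming the standard relations ΔG = ΔH − TΔS and Kd = exp(ΔG/RT) at room temperature. The enthalpy and entropy figures are invented for illustration and describe no real compound; the function names are hypothetical.

```python
import math

R_KCAL = 0.0019872  # gas constant in kcal/(mol*K)
T = 298.15          # assumed room temperature in kelvin

def delta_g(delta_h, t_delta_s):
    """Binding free energy in kcal/mol from the enthalpy and the T*dS term."""
    return delta_h - t_delta_s

def kd_from_delta_g(dg):
    """Dissociation constant in mol/L implied by a binding free energy."""
    return math.exp(dg / (R_KCAL * T))

# Hypothetical pair: identical enthalpy of binding, different entropic toll.
floppy = delta_g(delta_h=-12.0, t_delta_s=-4.0)  # loses a lot of freedom on binding
rigid = delta_g(delta_h=-12.0, t_delta_s=-1.0)   # already tied down, loses little

print(f"floppy: dG = {floppy:.1f} kcal/mol, Kd = {kd_from_delta_g(floppy):.1e} M")
print(f"rigid:  dG = {rigid:.1f} kcal/mol, Kd = {kd_from_delta_g(rigid):.1e} M")
```

With these made-up inputs the two compounds make exactly the same contacts, and the only difference in the computed binding constant comes from the entropy term.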

So, how much do I and my medicinal chemistry colleagues think about this stuff, day to day? A fair amount, but there are parts of it that we probably don’t pay enough attention to. Entropy gets less respect from us than it deserves, I think. It’s easy to imagine molecules bumping into each other, sticking and unsticking, but the more nebulous change-in-disorder part of the equation is just as important. And it doesn’t just apply to our drug molecules – proteins get less disordered as they bind those molecules (or more disordered, in some cases), and those entropic changes can mean a lot, too.

I also mentioned molecules finding a pathway to binding, and that’s something that we don’t think about as much, either. We probably make things all the time that would be potent binders, if they just could get past some energetic hump and wedge themselves into place. But there are no crowbars available; our drug candidates have to be able to work their way in on their own. The can’t-get-there-from-here cases come back from the assays as inactive. The tendency is to imagine these in the binding site already, and to try to think of what could be going wrong in there – but it may be that they’d be fine, but that their structures won’t allow them to come in for a landing.

Picturing this accurately is very hard indeed. We have enough trouble with good representations of static pictures of our molecules bound to their targets, so making a movie of the process is a whole different story. Each frame is on a femtosecond scale – molecules flip around rather quickly – and every frame would have to be computed accurately (drug structure, protein structure, and the energetics of the whole system) for the resulting video clip to make sense. It’s been done, but not all that often, and we’re not good at it.
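For a sense of scale, here is a back-of-the-envelope sketch of how many frames such a movie would need. It assumes one frame per femtosecond, which is an illustrative choice rather than a statement about any particular simulation package; the names are hypothetical.

```python
FS_PER_NS = 10**6   # femtoseconds in a nanosecond
FS_PER_US = 10**9   # femtoseconds in a microsecond
FS_PER_MS = 10**12  # femtoseconds in a millisecond

def frames_needed(duration_fs, step_fs=1):
    """Frames required to cover duration_fs at one frame every step_fs femtoseconds."""
    return duration_fs // step_fs

for label, duration in [("1 ns", FS_PER_NS), ("1 us", FS_PER_US), ("1 ms", FS_PER_MS)]:
    print(f"{label}: {frames_needed(duration):,} frames")
```

Every one of those frames needs the full energetics computed, which is why long binding events are so expensive to film.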

Comments (13) + TrackBacks (0) | Category: In Silico | Pharma 101

September 25, 2008

Protein Folding: Complexity to Make More Complexity?

Posted by Derek

Want a hard problem? Something to really keep you challenged? Try protein folding. That'll eat up all those spare computational cycles you have lounging around and come back to ask for more. And it'll do the same for your brain cells, too, for that matter.

The reason is that a protein of any reasonable size has a staggering number of shapes it can adopt. If you hold a ball-and-stick model of one, you realize pretty quickly that there are an awful lot of rotatable bonds in there (not least because they flop around while you're trying to hold the model in your hands). My daughter was playing around with a toy once that was made of snap-together parts that looked like elbow macaroni pieces, and I told her that this was just like a lot of molecules inside her body. We folded and twisted the thing around very quickly to a wide variety of shapes, even though it only had ten links or so, and I then pointed out to her that real proteins all had different things sticking off at right angles in the middle of each piece, making the whole situation even crazier.

There's a new (open access) paper in PNAS that illustrates some of the difficulties. The authors have been studying man-made proteins that have substantially similar sequences of amino acids, but still have different folding and overall shape. In this latest work, they've made it up to two proteins (56 amino acids each) that have 95% sequence identity, but still have very different folds. It's just a few key residues that make the difference and kick the overall protein into a different energetic and structural landscape. The other regions of the proteins can be mutated pretty substantially without affecting their overall folding, on the other hand. (In the picture, the red residues are the key ones and the blue areas are the identical/can-be-mutated domains).
[Image: PNAS proteins.jpg]
This ties in with an overall theme of biology - it's nonlinear as can be. The systems in it are huge and hugely complicated, but the importance of the various parts varies enormously. There are small key chokepoints in many physiological systems that can't be messed with, just as there are some amino acids that can't be touched in a given protein. (Dramatic examples include the many single-amino-acid based genetic disorders).

But perhaps the way to look at it is that the complexity is actually an attempt to overcome this nonlinearity. Otherwise the system would be too brittle to work. All those overlapping, compensating, inter-regulating feedback loops that you find in biochemistry are, I think, a largely successful attempt to run a robust organism out of what are fundamentally not very robust components. Evolution is a tinkerer, most definitely, and there sure is an awful lot of tinkering that's been needed.

Comments (8) + TrackBacks (0) | Category: General Scientific News | In Silico

September 4, 2008

X-Ray Structures: Handle With Care

Posted by Derek

X-ray crystallography is wonderful stuff – I think you’ll get chemists to generally agree on that. There’s no other technique that can provide such certainty about the structure of a compound – and for medicinal chemists, it has the invaluable ability to show you a snapshot of your drug candidate bound to its protein target. Of course, not all proteins can be crystallized, and not all of them can be crystallized with drug ligands in them. But an X-ray structure is usually considered the last word, when you can get one – and thanks to automation, computing power, and to brighter X-ray sources, we get more of them than ever.

But there are a surprising number of ways that X-ray data can mislead you. For an excellent treatment of these, complete with plenty of references to the recent literature, see a paper coming out in Drug Discovery Today from researchers at AstraZeneca (Andy Davis and Stephen St.-Gallay) and Uppsala University (Gerard Kleywegt). These folks all know their computational and structural biology, and they’re willing to tell you how much they don’t know, too.

For starters, a small (but significant) number of protein structures derived from X-ray data are just plain wrong. Medicinal chemists should always look first at the resolution of an X-ray structure, since the tighter the data, the better the chance there is of things being as they seem. The authors make the important point that there’s some subjective judgment involved on the part of a crystallographer interpreting raw electron-density maps, and the poorer the resolution, the more judgment calls there are to be made:

Nevertheless, most chemists who undertake structure-based design treat a protein crystal structure reverently as if it was determined at very high resolution, regardless of the resolution at which the structure was actually determined (admittedly, crystallographers themselves are not immune to this practice either). Also, the fact that the crystallographer is bound to have made certain assumptions, to have had certain biases and perhaps even to have made mistakes is usually ignored. Assumptions, biases, ambiguities and mistakes may manifest themselves (even in high-resolution structures) at the level of individual atoms, of residues (e.g. sidechain conformations) and beyond.
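Checking the resolution first is easy to automate. The sketch below assumes a PDB-format text file that reports resolution on a `REMARK   2 RESOLUTION.` line; the 2.5 Å cutoff and the function names are hypothetical choices for illustration, not a recommendation from the paper.

```python
import re

# Matches a header line such as "REMARK   2 RESOLUTION.    1.80 ANGSTROMS."
RESOLUTION_RE = re.compile(
    r"^REMARK +2 +RESOLUTION\. +(\d+(?:\.\d+)?) +ANGSTROMS", re.MULTILINE
)

def resolution_from_pdb_text(text):
    """Return the reported resolution in angstroms, or None if no number is given."""
    match = RESOLUTION_RE.search(text)
    return float(match.group(1)) if match else None

def needs_extra_scrutiny(resolution, cutoff=2.5):
    """Flag structures with missing or coarse resolution (cutoff is an assumed value)."""
    return resolution is None or resolution > cutoff

sample = "HEADER    EXAMPLE\nREMARK   2\nREMARK   2 RESOLUTION.    2.80 ANGSTROMS.\n"
res = resolution_from_pdb_text(sample)
print(res, needs_extra_scrutiny(res))
```

A single number is only a first filter, of course; it says nothing about how well the ligand itself is defined in the density.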

Then there’s the problem of interpreting how your drug candidate interacts with the protein. The ability to get an X-ray structure doesn’t always correlate well with the binding potency of a given compound, so it’s not like you can necessarily count on a lot of clear signals about why the compound is binding. Hydrogen bonds may be perfectly obvious, or they can be rather hard to interpret. Binding through (or through displacement of) water molecules is extremely important, too, and that can be hard to get a handle on as well.

And not least, there’s the assumption that your structure is going to do you good once you’ve got it nailed down:

It is usually tacitly assumed that the conditions under which the complex was crystallised are relevant, that the observed protein conformation is relevant for interaction with the ligand (i.e. no flexibility in the active-site residues) and that the structure actually contributes insights that will lead to the design of better compounds. While these assumptions seem perfectly reasonable at first sight, they are not all necessarily true. . .

That’s a key point, because that’s the sort of error that can really lead you into trouble. After all, everything looks good, and you can start to think that you really understand the system, that is until none of your wonderful X-ray-based analogs work out the way you thought they would. The authors make the point that when your X-ray data and your structure-activity data seem to diverge, it’s often a sign that you don’t understand some key points about the thermodynamics of binding. (An X-ray is a static picture, and says nothing about what energetic tradeoffs were made along the way). Instead of an irritating disconnect or distraction, it should be looked at as a chance to find out what’s really going on. . .

Comments (15) + TrackBacks (0) | Category: Analytical Chemistry | Drug Assays | In Silico

May 23, 2008

Up Close and Personal

Posted by Derek

Something that’s come up in the last few posts around here is the way that we chemists think about the insides of enzymes. It’s a tricky subject, because when you picture things on that scale, the intuition you have for objects starts to betray you.

Consider water. We humans have a pretty good practical understanding of how water behaves in the bulk phase; we have the experience. But what about five water molecules sitting in the pocket of an enzyme? That’s not exactly a glass from the tap. These guys are interacting with the protein as much as (or more than) they’re interacting with each other, and our intuition about water molecules is based on how they act when they’re surrounded by plenty of their own kind.

And if five water molecules are hard to handle, how about one? There’s no hope of seeing any bulk properties now, because there’s no bulk. We’re more used to having trouble in the other direction, predicting group behavior from individuals: you can’t tell much about a thousand-piece jigsaw puzzle from one piece that you found under the couch, and you wouldn’t be able to say much about the behavior of an ant colony from observing one ant in a jar. And neither of those is worth very much compared to its group. But with molecules, the single-ant-in-a-jar situation is very important (that’s a single water molecule sitting in the active site of an enzyme), and knowledge of ant social behavior or water’s actions in a glass doesn’t help much.

Larger molecules than water are our business, of course, and those are tricky, too. We can study the shape and flexibility of our drug candidates in solution (by NMR, to pick the easiest method), and in the solid phase, surrounded by packed arrays of themselves (X-ray crystal structures). But the way that they look inside an enzyme's active site doesn't have to be related to either of those, although you might as well start there.

As single-molecule (and single-atom) techniques have become more possible, we're starting to get an idea of how small clusters of them have to be before they stop acting like tiny pieces of what we're used to, and start acting like something else. But these experiments are usually done in isolation, in the gas phase or on some inert surface. The inside of a protein is another thing entirely; molecules there are the opposite of isolated. And studying them in those small spaces is no small task.

Comments (4) + TrackBacks (0) | Category: In Silico

May 1, 2008

O Pioneers!

Posted by Derek

Drug Discovery Today has the first part of an article on the history of the molecular modeling field, this one covering about 1960 to 1990. It’s a for-the-record document, since as time goes on it’ll be increasingly hard to unscramble all the early approaches and players. I think this is true for almost any technology; the early years are tangled indeed.

As you would imagine, the work from the 1960s and 1970s has an otherworldly feel to it, considering the hardware that was available. And that brings up another thing common to the early years of new technologies: when you look back on them from their later years, you wonder how these people could possibly have even tried to do these things.

I mean, you read about, say, Richard Cramer establishing the computer-aided drug design program at Smith, Kline and French in nineteen-flipping-seventy-one, and on one level you feel like congratulating his group for their farsightedness. But mainly you just feel like saying “Oh, you poor people. I am so sorry.” Because from today's perspective, there is just no way that anyone could have done any meaningful molecular modeling for drug design in 1971. I mean, we have enough trouble doing it for a lot of projects in 2008.

Think about it: big ol’ IBM mainframe, with those tape drives that for many years were visual shorthand for Computer System but now look closer to steam engines and water wheels. Punch cards: riffling stacks of them, and whole mechanical devices with arrays of rods to make and troubleshoot stiff pieces of paper with holes in them. And the software – written in what, FORTRAN? If they were lucky. And written in a time when people were just starting to say, well, yes, I suppose that you could, in fact, represent attractive and repulsive molecular forces in terms that could be used by a computer program. . .hmm, let’s see about hydrogen bonds, then. . .

It gives a person the shudders. But that must be inevitable – you get the same feeling when you see an early TV set and wonder how anyone could have derived entertainment from a fuzzy four-inch-wide grey screen. Or see the earliest automobiles, which look to have been quite a bit more trouble than a horse. How do people persevere?

Well, for one thing, by knowing that they’re the first. Even if technology isn’t what you might dream of it being some day, you’re still the one out on the cutting edge, with what could be the best in the world as it is. They also do it by not being able to know just what the limits to their capabilities are, not having the benefit of decades of hindsight. The molecular modelers of the early 1970s did not, I’m sure, see themselves as tentatively exploring something that would probably be of no use for years to come. They must have thought that there was something good just waiting right there to be done with the technology they had (which was, as just mentioned, the best ever seen). They may well have been wrong about that, but who was to know until it was tried?

And all of this – the realizations that there’s something new in the world, that there are new things that can be done with it, and (later) that there’s more to it (both its possibilities and difficulties) than was first apparent – all of this comes on gradually. If it were to hit you all at once, you’d be paralyzed with indecision. But the gap in the trees turns into a trail, and then into a dirt path before you feel the gravel under your feet, speeding up before you realize that you’re driving down a huge highway that branches off to destinations you didn’t even know existed.

People are seeing their way through to some of those narrow footpaths right now, no doubt. With any luck, in another thirty years people will look back and pity them for what they didn’t and couldn’t know. But the people doing it today don’t feel worthy of pity at all – some of them probably feel as if they’re the luckiest people alive. . .

Comments (8) + TrackBacks (0) | Category: Drug Industry History | In Silico | Who Discovers and Why

March 27, 2008

Start Small, Start Right

Posted by Derek

There’s an excellent paper in the most recent issue of Chemistry and Biology that illustrates some of what fragment-based drug discovery is all about. The authors (the van Aalten group at Dundee) are looking at a known inhibitor of the enzyme chitinase, a natural product called argifin. It’s an odd-looking thing – five amino acids bonded together into a ring, with one of them (an arginine) further functionalized with a urea into a sort of side-chain tail. It’s about a 27 nM inhibitor of the enzyme.

(For the non-chemists, that number is a binding affinity, a measure of what concentration of the compound is needed to shut down the enzyme. The lower, the better, other things being equal. Most drugs are down in the nanomolar range – below that are the ultra-potent picomolar and femtomolar ranges, where few compounds venture. And above that, once you get up to 1000 nanomolar, is micromolar, and then 1000 micromolar is one millimolar. By traditional med-chem standards, single-digit nanomolar = good, double-digit nanomolar = not bad, triple-digit nanomolar or low micromolar = starting point to make something better, high micromolar = ignore, and millimolar = can do better with stuff off the bottom of your shoe.)
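That informal scale can be written down as a small lookup. This is only a sketch: the exact cutoffs (for instance, treating "low micromolar" as ending at 10 micromolar) are assumptions made for illustration, and `potency_bucket` is a hypothetical name.

```python
def potency_bucket(kd_molar):
    """Map a binding constant in mol/L to the rough med-chem verdicts described above."""
    nm = kd_molar * 1e9  # convert mol/L to nanomolar
    if nm < 1:
        return "ultra-potent (picomolar or below)"
    if nm < 10:
        return "good (single-digit nanomolar)"
    if nm < 100:
        return "not bad (double-digit nanomolar)"
    if nm < 10_000:
        return "starting point (triple-digit nanomolar to low micromolar)"
    if nm < 1_000_000:
        return "ignore (high micromolar)"
    return "bottom of your shoe (millimolar)"

for kd in (27e-9, 80e-6, 500e-6):
    print(f"{kd:.0e} M -> {potency_bucket(kd)}")
```

The three values in the loop are the 27 nM, 80 micromolar, and 500 micromolar figures quoted in this post.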

What the authors did was break this argifin beast up, piece by piece, measuring what that did to the chitinase affinity. And each time they were able to get an X-ray structure of the truncated versions, which turned out to be a key part of the story. Taking one amino acid out of the ring (and thus breaking it open) lowered the binding by about 200-fold – but you wouldn’t have guessed that from the X-ray structure. It looks to be fitting into the enzyme in almost exactly the same way as the parent.

And that brings up a good point about X-ray crystal structures. You can’t really tell how well something binds by looking at one. For one thing, it can be hard to see how favorable the various visible interactions might actually be. And for another, you don’t get any information at all about what the compound had to pay, energetically, to get there.

In the broken argifin case, a lot of the affinity loss can probably be put down to entropy: the molecule now has a lot more freedom of movement, which has to be overcome in order to bind in the right spot. The cyclic natural product, on the other hand, was already pretty much there. This fits in with the classic med-chem trick of tying back side chains and cyclizing structures. Often you’ll kill activity completely by doing that (because you narrowed down on the wrong shape for the final molecule), but when you hit, you hit big.
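A fold-change in affinity converts directly into a free-energy difference through ΔΔG = RT ln(fold). The sketch below runs that standard conversion on the roughly 200-fold loss from opening the ring, assuming 298 K; how much of that energy is really entropic is the post's suggestion, and the code does nothing to check it.

```python
import math

R_KCAL = 0.0019872  # gas constant in kcal/(mol*K)

def ddg_from_fold_change(fold, temperature=298.15):
    """Free-energy difference in kcal/mol for a given fold-change in binding constant."""
    return R_KCAL * temperature * math.log(fold)

print(f"200-fold loss ~ {ddg_from_fold_change(200):.1f} kcal/mol")
```

That works out to about 3 kcal/mol, a useful reminder that large swings in potency correspond to fairly small energies.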

The structure was chopped down further. Losing another amino acid only hurt the activity a bit more, and losing still another one gave a dipeptide that was still only about three times less potent than the first cut-down compound. Slicing that down to a monopeptide, basically just a well-decorated arginine, sent the activity down another sixfold or so – but by now we’re up to about 80 micromolar, which most medicinal chemists would regard as the amount of activity you could get by testing the lint in your pocket.

But they went further, making just the little dimethylguanylurea that’s hanging off the far end. That thing is around 500 micromolar, a level of potency that would normally get you laughed at. But wait. . .they have the X-ray structures all along the way, and what becomes clear is that this guanylurea piece is binding to the same site on the protein, in the same manner, all the way down. So if you’re wondering if you can get an X-ray structure of some 500 micromolar dust bunny, the answer is that you sure can, if it has a defined binding site.

And the value of these various derivatives almost completely inverts if you look at them from a binding efficiency standpoint. (One common way to measure that is to take the minus log of the binding constant and divide by the molecular weight in kilodaltons). That’s a “bang for the buck” index, a test of how much affinity you’re getting for the weight of your molecule. As it turns out, argifin – 27 nanomolar though it be – isn’t that efficient a binder, because it weighs a hefty 676. The binding efficiency index comes out to just under 12, which is nothing to get revved up about. The truncated analogs, for the most part, aren’t much better, ranging from 9 to 15.
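The index is simple enough to compute by hand, but here is a minimal sketch using the definition given above (minus log of the binding constant, divided by molecular weight in kilodaltons) with the argifin numbers from this post. The function name is hypothetical.

```python
import math

def binding_efficiency_index(kd_molar, mol_weight_da):
    """BEI = -log10(binding constant in mol/L) / molecular weight in kDa."""
    return -math.log10(kd_molar) / (mol_weight_da / 1000.0)

argifin = binding_efficiency_index(27e-9, 676)
print(f"argifin BEI = {argifin:.1f}")
```

Dividing by weight is what lets a weak but tiny fragment outrank a potent but heavy natural product.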

But that guanylurea piece is another story. It doesn’t bind very tightly, but it bats way above its scrawny size, with a BEI of nearly 28. That’s much more impressive. If the whole argifin molecule bound that efficiently, it would be down in the ten-to-the-minus nineteenth range, and I don’t even know the name of that order of magnitude. If you wanted to make a more reasonably sized molecule, and you should, a compound of MW 400 would be in the single-digit picomolar range with a binding efficiency like that. There’s plenty of room to do better than argifin.
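The same definition can be run backwards to see what binding constant a given efficiency would imply at a given molecular weight. A minimal sketch, taking a BEI of 28 as a round figure and using a hypothetical helper name:

```python
def kd_for_bei(bei, mol_weight_da):
    """Binding constant in mol/L implied by a BEI at a given molecular weight in Da."""
    return 10 ** (-bei * mol_weight_da / 1000.0)

for mw in (676, 400):
    print(f"MW {mw}: Kd ~ {kd_for_bei(28, mw):.1e} M")
```

This assumes the efficiency holds constant as the molecule grows, which is an idealization; in practice each added atom tends to buy less affinity than the ones before it.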

So the thing to do, clearly, is to start from the guanylurea and build out, checking the binding efficiency along the way to make sure that you’re getting the most out of your additions. And that is exactly the point of fragment-based drug discovery. You can do it this way, cutting down a larger molecule to find what parts of it are worth the most, or you can screen to find small fragments which, though not very potent in the absolute sense, bind very efficiently. Either way, you take that small, efficient piece as your anchor and work from there. And either way, some sort of structural read on your compounds (X-ray or NMR) is very useful. That’ll give you confidence that your important binding piece really is acting the same way as you go forward, and give you some clues about where to build out in the next round of analogs.

This particular story may be about as good an illustration as one could possibly find - here's hoping that there are more that can work out this way. Congratulations to van Aalten and his co-workers at Dundee and Bath for one of the best papers I've read in quite a while.

Comments (12) + TrackBacks (0) | Category: Analytical Chemistry | Drug Assays | In Silico

March 5, 2008

Smaller, Wetter, Harder to Work With

Posted by Derek

There’s an interesting article coming out in J. Med. Chem. on antibiotic compounds, which highlights something that’s pretty clear if you spend some time looking at the drugs in that area. We make a big deal (or have made one over the last ten years) about drug-like properties – all that Rule-of-Five stuff and its progeny. Well, take a look at the historically best-selling antibiotic drugs: you’ve never seen such a collection of Rule of Five violators in your life.

That’s partly because a lot of structures in that area have come from natural products, but hey, natural products are drugs, too. Erythromycin, the aminoglycosides, azithromycin, tetracycline: what a crew! But they’ve helped an untold number of people over the years. It’s true that the fluoroquinolones are much more normal-looking, but those are balanced out by weirdo one-shots like fosfomycin. I mean, look at that thing – would you ever believe that that’s a marketed drug? (And with decent bioavailability, too?)

No, you have to be broad-minded if you’re going to beat up on bacteria, and I think some broad-mindedness would do us all good in other therapeutic areas, too. I don’t mean we should ignore what we’ve learned about drug-like properties: our problem is that we tend to make allowances and exceptions on the greasy high-molecular weight end of the scale, since that’s where too many of our compounds end up. It wouldn’t hurt to push things on the other end, because I think that you have a better chance of getting away with too much polarity than you have of getting away with too little.

One reason for that might be that there are a lot of transporter proteins in vivo that are used to dealing with such groups. It’s easy to forget, but a great number of proteins are decorated with carbohydrate residues, and they’re on there for a lot of reasons. And a lot of extremely important small molecules in biochemistry are polar as well – right off the top of my head, I don’t know what the logD or polar surface area of things like ATP or NAD are, but I’ll bet that they’re far off the usual run of drugs. Admittedly, those aren’t going to reach good blood levels if you dose them orally; we’re trying to do something that’s rather unnatural as far as the body’s concerned. But we could still usefully take advantage of some of the transport and handling systems for such molecules.

But that’s not always easy to do. We all talk about making our compounds more polar and more soluble, but we balk at some of the things that will do that for us. Sure, you can slap a couple of methoxyethoxys on your ugly flat molecule, or hang a morpholine off the end of a chain to drag things into the water layer. But slap five or six hydroxyls on your molecule, and you’ll be lucky not to have the security guards show up at your desk.

There are, to be sure, some good reasons why they might. Hydroxyls and such tend to introduce chiral centers, which can make your synthesis difficult and dramatically increase the amount of work needed to fill out the structural possibilities of your lead series. That’s why these things tend to be (or derive from) natural products. Some bacterium or fungus has done most of the heavy lifting already, both in terms of working out the most active isomers and in synthesizing them for you. Erythromycin’s a fine starting material when you can get it by fermentation, but no one would ever, ever consider it if it had to be made by pure total synthesis.

There’s another consideration, which gets you right at the bench level. For an organic chemist, working with charged, water-soluble compounds is no fun. A lot of our lab infrastructure is built for things that would rather dissolve in ethyl acetate than water. A constant run of things with low logD values would mean that we’d all have to learn some new skills (and that we’d all probably have to spend a lot of time on the lyophilizer). Ion-exchange resins, gel chromatography, desalting columns – you might as well be a biochemist if you’re going to work with that stuff. But in the end, perhaps we might be better off, at least part of the time, if we were.

Comments (13) + TrackBacks (0) | Category: Drug Industry History | In Silico | Infectious Diseases

April 22, 2007

Melting Keys and Squishy Locks

Posted by Derek

Pretty much the only thing that an interested lay person has heard about ligand binding is the "lock and key" metaphor. I'm not saying that you could walk down the sidewalk getting nods of recognition with it, but if someone's heard anything about how enzymes or receptors work (well, anything correct), that's probably what they've heard.

And there's a lot to it. Many proteins are really, really good at picking out their ligands from crowds of similar compounds. (If they were perfect at it, on the other hand, we drug company types would be out of business). But the lock-and-key metaphor makes the listener believe that both the ligand and the protein are rigid objects, which they most definitely are not. There's no everyday analog to the way that two conformationally mobile objects fit to each other - well, OK, maybe there is, but it's not one that you can safely use for illustrative purposes. Ahem.

The other big breakdown of the lock and key is that it doesn't deal well with the numerous proteins that can recognize more than one ligand for their binding sites. Particularly impressive are the nuclear receptors and the CYP metabolizing enzymes. Both those classes bind a bewildering number of not-very-similar compounds, and they can do it impressively well. They manage the trick by having binding pockets that can drastically change their shapes and charge distributions, as parts of the proteins themselves slide, twist, and flip around. I can't come up with even a vulgar metaphor for that process.

I'm thinking of doing several posts on the limits of metaphor and simplification in science, and if I do, this will be the first. It's a constant struggle not to mistake the picture for the real thing, particularly if the simplification is a pretty useful one. But eventually, no matter how good, the metaphor will thin out on you, and you'll be in the position of a Greek bird pecking at some painted fruit and wondering why it's still hungry.

Comments (29) + TrackBacks (0) | Category: In Silico | Metaphors, Good and Bad

March 12, 2007

No Shortcuts

Posted by Derek

I wanted to link tonight to the "Milkshake Manifesto" over at OrgPrep Daily. It's a set of rules for med-chem, and looking them over, I agree with them pretty much across the board. There's a general theme in them of getting as close to the real system as you can, which is a theme I've sounded many times.

That applies to things like "Rule of Five" approximations and docking scores - useful, perhaps if you're sorting through a huge pile of compounds that you have to prioritize, not so useful if you've already got animal data.

He also takes a shot at Caco-2 cells and other such approximations to figure out membrane and tissue penetration. I've never yet seen an in vitro assay for permeability that I would trust - it's just too complicated, and it may never yield to a reductionist approach.

I'm a big fan of reductionism, don't get me wrong, but it's not the tool for every job. Living systems are especially tricky to pare down, and you can simplify yourself right out of any useful data if you're not very careful. The closer to the real world, the better off you are. It isn't easy, and it isn't cheap, but nothing good ever came easy or cheap, did it?

Comments (6) + TrackBacks (0) | Category: Drug Assays | Drug Development | In Silico

February 27, 2007

Wrong, But Still Convincing

Posted by Derek

SciTheory has a post, complete with links to the relevant articles in Science, etc., on a recent batch of trouble in structural biology. Geoffrey Chang and his group at Scripps have been working on the structures of transporter proteins, which sit in the cell membrane and actively move nonpermeable molecules in and out. There are a heap of these things, since (as any medicinal chemist will tell you) a lot of reasonable-looking molecules just won't get into cells without help. It's even tougher at a physiological level, because (from a chemist's perspective) many of the things that need to be shuttled around aren't very reasonable-looking at all - they're too small and polar or too large and greasy.

Many of these transporters, especially in bacteria, fall into a large group known as the ABC transporters, which have an ATP binding site in them for fuel. (For the non-scientists in the audience, ATP is the molecule used for energy storage in everything living on Earth. Thinking of an ATP-binding site as a NiCad battery pack gets you remarkably close to the real situation). Chang solved the structure of one of these, the bacterial protein MsbA, by X-ray crystallography back in 2001, and it was quite an accomplishment. Getting good X-ray diffraction data on proteins which spend their lives stuck in the cell membrane is rather a black art.

How dark an art is now apparent - here's the original paper's abstract in PubMed, but if you look just above the abstract, you'll see a retraction notice, and it's not alone. Five papers on various structures have been withdrawn. As SciTheory says, anyone who doubted the original MsbA structure had some real food for thought last year when another bacterial transporter was solved at the ETH in Zurich. These two should have looked more similar than they did, to most ways of thinking, but they were quite divergent.

And now we know why. Chang's group was done in by some homebrew software which swapped two columns of data. In a structure this large and complicated, you can have such disruptive things happen and still be able to settle down on a final protein picture - it's just that it'll be completely wrong. And so it was. The same software seems to have undermined the other determinations, too.

This is important (as well as sad and painful) on several levels. For one thing, transporters are essential to understanding resistance to antibiotics and cancer therapies, and they're vital parts of a lot of poorly understood processes in normal cells. We're not going to be able to get a handle on the often-inscrutable distribution of drug candidates in living systems until we know more about these proteins, but now some of what we thought we knew has evaporated on us.

Another point that people shouldn't miss is the trouble with relying too much on computational methods. There's really no alternative to them in protein crystallography, of course, but there always has to be a final "Does that make sense?" test. The difficulty is that many perfectly valid protein structures show up with odd and surprising features. Conversely, it's unnerving that the data for these things can be so thoroughly hosed and still give you a valid-looking structure, but that just serves to underline how careful you have to be.

And we're talking about X-ray data, which (done properly) is considered to be pretty solid stuff. So what does this say about basing research programs on the higher levels of abstraction found in molecular modeling and docking programs?

Comments (21) + TrackBacks (0) | Category: In Silico

December 14, 2006

Love and Anger

Posted by Derek

Glenn Reynolds gave the pharma industry a much-appreciated thank-you card over at Instapundit:

Only a moron would want to live in a society where people are ashamed to work for drug companies. And yet, I'm not surprised to see that resulting from the demagogy that abounds among politicians and "public interest" types who are not serving the public interest whatsoever.

I'm thinking of having that first sentence engraved on something expensive. Glenn's post prompted Dean Esmay to write a short post on the ethics of drug companies, though, and he's rather less positive. I suppose I shouldn't be surprised, given some of the things he's gone in for in the past. As usual, some of the problem is the difficulty that people have coming to terms with the fact that drug discovery is a for-profit industry.

One comment on his post came from Jerry Kindall, which is mostly favorable to the industry, but nonetheless contains this paragraph:

Drug discovery used to be a total crap-shoot but it's getting more and more targeted as the years go by thanks to ever more sophisticated computer modeling. They are now able to say "okay, this is the chemical receptor that we think we need to address, let's design a molecule that fits into it." This is essentially a nanotechnology, although not the type most people think of when they hear the term.

Ay, would that it were true. As my industry readers know, and as I've been ranting about here fairly often, drug discovery is just as much of a crap-shoot as it's ever been. And wouldn't it be great if "sophisticated computer modeling" helped that much? Instead, we get things like this. No, I think what's happening here is that we're being underestimated by our enemies and overestimated by our friends. . .

Comments (32) + TrackBacks (0) | Category: In Silico | Why Everyone Loves Us

September 11, 2006

Enzymes Do Whatever They Want To

Posted by Derek

It's been a while since I wrote about the neuraminidase inhibitors (Tamiflu and Relenza, oseltamivir and zanamivir). As we start to head into fall, though, I'm sure that avian flu will invade the headlines again, if nothing else (and I hope it's nothing else).

There's an interesting report in Nature (subscriber link) on how these drugs work. Bird flu is a Type A influenza, but there are two broad groups inside that class, which are defined by what variety of neuraminidase enzyme they express. (There are actually nine enzyme variants known, but four of them fall into one group and five into the other).

The drugs were developed against group-2 enzymes, but they're also effective against group-1 influenzas. Since the X-ray crystal structures showed that the drugs bound in the same way to all the group-2 neuraminidases, and since the active sites of all the subtypes across the two groups are extremely similar, no one ever thought that their binding modes would be different. Well, until last month, anyway, when the X-ray crystallographic data came in.

And what it showed was that the active sites of the group-1 enzymes, sequence homology be damned, have a much different structure than the group-2s. As it turns out, though, they can adopt a similar shape when an inhibitor binds to them, which is why the marketed inhibitors still work on them, but they're fundamentally quite different.

I can't resist the urge to use this example to illustrate some of the real problems in our current state of the art for computation and modeling. The differences between these two enzymes are due to their different amino acid residues far away from the active site, which makes modeling them much, much more difficult (and makes the error bars much, much wider when you do). That's why no one realized how far off the group-1 and group-2 neuraminidases were until the X-ray structure was available: modeling couldn't tell you. Any modeling efforts that tried would probably have decided, incorrectly, that the two groups were nearly identical. Why shouldn't they be?

But if we'd had that X-ray data from the start, modeling would very likely have told you, incorrectly, that there was little chance that either Relenza or Tamiflu would work on the group-1 enzyme variants. Why should they? The "induced fit" binding modes, where the enzyme changes shape significantly as the ligand binds, are understandably very difficult to model. There are just too many possibilities, too many of which are within each other's computational error bars.

Now, it's true that this latest work isn't based on molecular modeling at all. (You have to wonder how close these guys got, though). But plenty of projects that are using it are just as much in the dark as a neuraminidase team would have been, and they may not even realize it. Most molecular modelers are well aware of these limitations, but not all of them - or all of the managers over them - are willing to accept them. And when you get out to investors or the general public, it's all too easy for modelers or managers to act as if things are perfectly under control, when in reality they're lurching around in the dark. Like the rest of us. . .

Comments (11) + TrackBacks (0) | Category: In Silico | Infectious Diseases

March 23, 2006

Crystals of Doubt

Posted by Derek

Here's a limits-to-knowledge post for you. On Wednesday, when I was cranking out a batch of an intermediate we're using these days, I needed to separate two fairly closely related compounds (which I'll call A and B) from each other. One surefire way to have done that was chromatography, but I just didn't have time for that. While I was rota-vapping down the mixture, I noticed that some white crystals were starting to come out of the methylene chloride solution, so I took the flask off and checked a small sample of the solid. Sure enough, it was pretty pure A, so I filtered that off and continued.

Taking out all the solvent left me with more white stuff, which was mostly B, with some A still hanging in there. In the past, we'd purified B by crystallizing it from another solvent mixture (ethyl acetate/hexane, the first combination the lazy - or just plain experienced - organic chemist reaches for). So I tried that out, dissolving the solid in a small-to-medium amount of hot ethyl acetate, then adding hexane while it was still warm. I cooled the solution down by dipping the flask in ice water until it had come down to about room temperature, and was swirling it around when suddenly it started snowing white powder. Ta-daa! A check of this stuff showed that it was almost completely pure B. The solution, for its part, was now a majority of A with some B left around. I took what I had and ran with it - this was one of the bird-in-the-hand situations, because people were waiting on this stuff.

My point is that such things are almost completely empirical. I've never heard of anyone who could predict from first principles what solvent system to use to get something to crystallize. I'd be tremendously impressed if anyone could take the structures of my two compounds, feed them into a dissolvo-matic program and announce "Yep, methylene chloride for A, and ethyl acetate-hexane for B. That'll do the trick."

As far as I know, there's no such thing, and no one is even close. I'd be glad to hear if I'm wrong. But if we can't predict, even just in rank order, what solvents will dissolve (or crash out) a given molecule, just how good is our molecular modeling, anyway?

Comments (8) + TrackBacks (0) | Category: In Silico

November 1, 2005

Molecular Modeling Cage Match

Posted by Derek

I mentioned an interesting paper that's coming out in the Journal of Medicinal Chemistry on molecular modeling. It's a long one from a large group of people scattered across GlaxoSmithKline's worldwide research facilities, entitled "A Critical Assessment of Docking Programs and Scoring Functions." And that's what it is, all right.

For the non-med-chem readers, those are two of the key techniques in computational molecular modeling. Docking refers to taking a modeled version of your small molecule and trying to fit it into a similarly modeled version of the binding site of your protein target. The program tries to take into account the size and shape of the molecule and the binding site, of course, as well as more subtle interactions between the various functional groups. Scoring functions are what the programs use to try to rate how well the docking procedure went for a given compound, and to compare it to others in a given data set.

The GSK team did a very thorough job, evaluating ten different docking programs. They started with seven varying types of protein targets, mostly different classes of enzymes, all of which are known drug targets. An expert computational chemist took each one and polished up the model of the binding site. At the same time, lists of between one and two hundred potential binding compounds were put together for each target, including several series of related compounds. Another modeling chemist took these structures and got them ready for docking. They made sure that a crystal structure of each structural class was known for each case (to check the accuracy of the modeling later on), and also made sure that the binding affinity of the compounds ranged over at least four orders of magnitude (from pretty darn good, in other words, to pretty darn awful). The goal was to make the whole exercise as real-world as possible. Then each of those binding site models and their associated lists of potential ligands were turned over to separate chemists with experience in the various docking programs, who were told to have at it. As the paper puts it:

"To optimize the performance of each docking program, computational chemists with expertise in a particular program were identified from the worldwide GSK computational chemistry community. Each program expert was given complete freedom and sufficient time to maximize the performance of the docking program. . .No time deadlines were imposed so that even low-throughput docking programs could be evaluated. Indeed, no constraints whatsoever were placed on the level of agonizing over details of how each docking program was applied."

It's important to remember that the results of this paper come from experienced users who had a great deal of knowledge about the targets, and all the time they needed to mess with them. The aforementioned agonizing was devoted to three typical kinds of question that such software is designed to answer. The first was: what is the conformation (the 3-D physical "pose") of a small molecule once it's in a binding site? This is why they picked all these things with known crystal structures, since those provide a check with real data. Results of this test were OK, in some cases fairly good. Some of the target proteins seemed to have binding sites that were more suited for the capabilities of the programs, which could take the majority of the compounds in their list and fit them pretty close (within two angstroms) to the known crystal structures.

And every target had at least one program that could take at least a third or so of the test compounds and dock them fairly well. But the problem was, no one program could do that for more than 35% of the binding modes. The best performances were scattered among the different software packages, and there seems to be absolutely no way to know in advance whether a given program is going to perform well on a new target. The other problem, and it's a big one, was that the scoring functions couldn't reliably identify when the program had hit on one of the good answers. There wasn't much correlation between what the program thought was a well-docked conformation and its resemblance to the known crystal structure.
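For the curious, the "within two angstroms" test is usually expressed as a root-mean-square deviation between the docked atom positions and the crystal ones. Here is a minimal sketch of that check. It assumes the two poses list the same atoms in the same order, with coordinates in angstroms and already in the same frame of reference; the function names and the toy coordinates are hypothetical, and this is not necessarily how the GSK group computed it.

```python
import math

def pose_rmsd(docked, crystal):
    """Root-mean-square deviation between two matched lists of (x, y, z) atoms."""
    if not docked or len(docked) != len(crystal):
        raise ValueError("poses must be non-empty and list the same atoms")
    squared = sum(
        (a - b) ** 2
        for atom_d, atom_c in zip(docked, crystal)
        for a, b in zip(atom_d, atom_c)
    )
    return math.sqrt(squared / len(docked))

def is_well_docked(docked, crystal, cutoff=2.0):
    """True if the docked pose falls within the cutoff (angstroms) of the crystal pose."""
    return pose_rmsd(docked, crystal) <= cutoff

# Toy example: a two-atom "ligand" docked one angstrom off along z
crystal_pose = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
docked_pose = [(0.0, 0.0, 1.0), (1.5, 0.0, 1.0)]
print(pose_rmsd(docked_pose, crystal_pose), is_well_docked(docked_pose, crystal_pose))
```

Note that this check needs the crystal structure in hand, which is exactly what you don't have on a new target. The scoring function is supposed to stand in for it, and that's the part that didn't work.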

The second question they looked at was: given a list of molecules (some active, some inactive), how well can the software pick out some active ones? This process is often known as "virtual screening". Again, the results were fairly good, but with some significant problems. For all but one of the targets, at least one of the programs could find at least half of the top 10% of the active compounds. (I know, that sounds like a lot of defensive hedging compared to what some people think these programs can do, but that's the real world for you). The programs also did pretty well at pulling a variety of structures out, and not just making their total by grabbing only the members of one particular class.

But that fairly-decent performance is for the programs as a group. As before, though, the best performances were scattered through all the software packages, with no real standout. Most of the programs, at one point or another, had to grind through a significant fraction of the compound lists to do the job, too, which is something you really don't want in real-world use. Another disturbing result was that some of the scoring functions seemed to be picking the right compounds for the wrong reasons – that is, based on incorrect binding modes.

Now we're ready for the third question, a hard one which (in my experience) is one of the ones that medicinal chemists most would like molecular modeling software to answer: given a list of compounds, can the program rank-order them according to their expected affinity for the target? Unfortunately, the answer is "absolutely not." No scoring function in any of the software packages could even come close. The compounds that the programs ranked as winners were just as likely to stink, and the ones that they put into the discard heap were just as likely to be fine.
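One common way to put a number on "can it rank-order them" is a rank correlation between the docking scores and the measured affinities. What follows is a rough sketch using Spearman's coefficient, written out by hand so that nothing is hidden; I'm not claiming this is the statistic the paper used, and the function names are my own. A value near 1 means the program's ranking tracks the real one, and a value near zero means it's no better than shuffling the list.

```python
import math

def ranks(values):
    """Rank values from 1 upward, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over any run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def spearman(scores, affinities):
    """Spearman rank correlation: Pearson correlation computed on the ranks."""
    rx, ry = ranks(scores), ranks(affinities)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)

# Made-up numbers: docking scores versus measured pKi for four compounds
print(spearman([1.0, 2.0, 3.0, 4.0], [5.1, 6.0, 7.2, 8.5]))
```

The function will divide by zero if every score (or every affinity) is identical, since a flat list has no ranking to correlate.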

My way of looking at the first two tests is to say that if you have just one molecular modeling package, it is guaranteed to mislead you a fair amount of the time. And you have no way of knowing when it's doing that. If you have more than one program to work with, though, then they are guaranteed to disagree with each other a fair amount of the time, and you have no way of knowing which one of them is right – if either. I'll let the authors have the last word on the third test, and on the software in general:

". . .in the area of rank-ordering or affinity prediction, reliance on a scoring function alone will not provide broadly reliable or useful information. . .This study demonstrates unequivocally that significant improvements are needed before compound scoring by docking algorithms will routinely have a consistent and major impact on lead optimization. . .it is not completely obvious by what means these improvements will arise. . ."

Comments (5) + TrackBacks (0) | Category: In Silico

September 29, 2005

The Hazards of Molecular Modeling

Posted by Derek

A comment to the last post really gave me the shivers:

"I like to think of modelling as the "silent killer". It is easy to rely on it for quick answers, and easy to forget that there is no substitute for an actual experiment. . .

I remember asking a fellow scientist if a particular molecule performed as hypothesized, the response was: "I don't know. It did not dock well into the enzyme, so I didn't make it."

I've made this point before, but it needs to be made again: molecular modeling is not reality. Most models are not that good, or only good around a limited group of rather similar compounds. If you as a medicinal chemist are crossing out easy-to-make compounds in unexplored chemical space just because the software doesn't like them, you are handcuffing yourself and tying your thumbs together. Stop it, stop it for your own good, or you may never discover anything unexpected or useful.

"The silent killer": I like that phrase a lot. I get the occasional testy e-mail from the computational types when I talk like this, but I'm sticking to my beliefs here. Molecular models based on numerous high-resolution X-ray structures are, I think, sort of useful, sometimes. Models based on only one X-ray structure are to be approached with great caution. And binding models that are just calculated up de novo should be treated as hazardous to your scientific health, unless you have a great deal of evidence to make you think otherwise.

OK, you silicon jockeys, go ahead and flood my in-box. I've earned it.

Comments (7) + TrackBacks (0) | Category: In Silico

September 28, 2005

Clamping Down, or Loosening Up?

Posted by Derek

We medicinal chemists spend our days trying to make small molecules that bind to targets in living systems. Almost all of those targets are proteins of one sort or another, and most of them have binding pockets already built into them, which we're trying to hijack for our own purposes. Molecular modelers try to figure out how these things fit together, but there are still a lot of unknowns in what would seem so basic a process.

I'm willing to bet that most chemists and biologists have a mental picture of a small molecule ligand fitting into a binding site which involves the protein sort of folding down around things - gently biting down on the ligand, as it were. It seems intuitively obvious that a protein's motions would settle down once it complexes with its target molecule.

And like a lot of intuitively obvious things in drug research, that idea appears to be mistaken. There's a recent study in the Journal of Medicinal Chemistry from a group at Michigan that tackles this question in a rigorous manner. They looked through the X-ray crystal structure data banks for proteins that have had high-quality structures determined both with and without small molecules bound in them. After controlling for experimental conditions (the temperature that the X-ray structure was taken at, among other things) and for the way the data were processed, they still had a few dozen closely matched pairs.

What they found was that in most of these structures, at least some of the atoms in and near the binding site are more mobile when there's a ligand bound. At times, the effect was pretty dramatic, with the entire binding site becoming more flexible, weirdly enough. Examples where everything got less mobile were found, but that only happened in a minority of the cases. The proteins the authors studied were scattered across a wide range of structural and functional classes, and there's no reason to think that they hit on an anomalous data set.

So, we're going to have to adjust our mental pictures, and the molecular modelers will have to adjust their simulations. I'd like to know just how many of those in silico models of binding would have predicted this greater flexibility. I fear that the answer is "darn near none of them". We have a long way to go.

Comments (9) + TrackBacks (0) | Category: In Silico

September 6, 2005

Crossing Your Fingers, Authoritatively

Posted by Derek

I recall a project earlier in my career where we'd all been beating on the same molecular series for quite a while. Many regions of the molecule had been explored, and my urge was often to leave the reservation. I put some time into extending the areas we knew about, but I wanted to go off and make something that didn't look like anything that we'd done before.

Which I did sometimes, and then I'd often get asked: "Why did you make that compound?" My answer was simply "Because no one had ever messed with that area before, and I wanted to see what would happen." Reactions to that approach varied. Some folks found that a perfectly reasonable answer, sufficient by itself. Others didn't care for it much. "You have to have a hypothesis in mind," they'd say. "Are you trying to improve the pharmacokinetics? Fix a metabolic problem? Pick up a binding interaction that you think is out there in the XYZ loop of the protein? You can't just. . .make stuff."

I respected the people in that first group a lot more than I did the ones in the second. I thought then, and think now, that you can just go make stuff. In fact, you not only can, but you should. You probably don't want to spend all your time doing that, but if you never do it at all, you're going to miss the best surprises.

I take issue with the idea that there has to be a specific hypothesis behind every compound. That supposes amounts of knowledge that we just don't have. Most of the time, we don't know why our PK is acting weird, and we're not sure about the metabolic fate of the compounds. And we sure don't know their binding mode well enough to sit at our desks and talk about what amino acids in the protein backbone we're reaching out for. (OK, if you've got half a dozen X-ray structures of your ligands bound in the active site of your target, you have a much better idea. But if your next compound breaks new structural ground, off you may well go into a different binding mode, and half your presuppositions will go, too.)

I like to think that I've come to realize just how ignorant I am in issues of drug discovery. (In case you have any doubt, I'm very ignorant indeed.) But I still hear people confidently sizing up new analog ideas on the blackboard: "No, that one won't bind well in the Whoozat region. Doesn't have the right spacing. And that one should be able to reach out to that hydrophobic pocket we all know about. Let's make that one first." (These folks are talking without X-ray structures in hand, mind you.)

Well, if it makes you feel better, then go ahead, I suppose. But this kind of thing is one tiny step up from lucky rabbit feet, for which there is still a market.

Comments (4) + TrackBacks (0) | Category: In Silico | Who Discovers and Why

August 17, 2004

Kinases and Their Komplications

Posted by Derek

I'm going to take off from another comment, this one from Ron, who asks (in reference to the post two days ago): "would it not be fair to say that cellular biochemistry gets even more complicated the more we learn about it?"

It would indeed be fair. I think that as a scientific field matures it goes through several stages. Brute-force collection of facts and observations comes early on, as you'd figure. Then the theorizing starts, with better and better theories being honed by more targeted experiments. This phase can be mighty lengthy, depending on the depth of the field and the number of outstanding problems it contains. A zillion inconsistent semi-trivialities can take a long time to sort out (think of the mathematical proof of the Four-Color Theorem), as can a smaller number of profound headscratchers (like, say, a reconciliation of quantum mechanics with relativity as they deal with gravity.)

If the general principles discovered are powerful enough, things can get simpler to understand. Think of the host of problems that early 20th-century physics had, many of which resolved themselves as applications of quantum mechanics. Chemistry went through something similar earlier, on a smaller scale, with the adoption of the stereochemical principles of van't Hoff. Suddenly, what seemed to be several separate problems turned out to be facets of one explanation: that atoms had regular three-dimensional patterns of bonding to other atoms. (If that sounds too obvious for such emphasis, keep in mind that this notion was fiercely ridiculed and resisted at the time.)

Cell biology is up to its pith helmet in hypotheses, and is nowhere near out of the swamps of fact collection. As in all molecular biology, the sheer number of different systems is making for a real fiesta. Your average cell is a morass of interlocking positive and negative feedback loops, many of which only show up fleetingly, under certain conditions, and in very defined locations. Some general principles have been established, but the number of things that have to be dealt with is still increasing, and I'm not sure when it's going to level out.

For example, the other day a group at Sugen (now Pfizer) published a paper establishing just how many genes there are in mice that code for protein kinase enzymes. Through adding phosphoryl groups, these enzymes are extremely important actors in the activation, transport, and modulation of the activities of thousands upon thousands of other proteins, and it turns out that there are exactly 540 of them. (Doubtless there are some variations as they get turned into proteins, but that's how many genes there are.) And that's that.

Now, that earlier discovery of protein phosphorylation as a signaling mechanism was a huge advance, and it has been appropriately rewarded. And knowing just how many different kinase enzymes there are is a step forward, too. But figuring out all the proteins they interact with, and when, and where, and what happens when they do - well, that's first cousin to hard work.

Comments (0) + TrackBacks (0) | Category: Biological News | In Silico

August 15, 2004

FullCell 1.0?

Posted by Derek

Reader Maynard Handley, in a comment to the most recent post below, asks:

". . .how far are we from doing at least a substantial fraction of this stuff in silico? I've read that some amazing computational models of full cells now exist, but even so, this author didn't expect that drugs could be usefully tested computationally until 2030 which seems awfully far out."

I don't know the article that he's referring to, but "awfully far out" pretty much sums up my reaction, too. I just don't think we have enough data to do any real whole-cell modeling yet. It's coming, and perhaps for a few very well-worked-out subsystems we could do it now, but I'm sceptical even of that.

A few days' reading of the current cell biology literature will illustrate the problem. All sorts of proteins are found, all the time, to be players in systems that no one suspected them of being involved in. Kinases are found to phosphorylate things that no one had seen them do before, and lipases are found to accept substrates that no one had realized they could. A given signaling peptide is gradually found to have more uses than a Swiss army knife. We don't even really understand the basic mechanisms (like G-protein-coupled receptor signaling) enough to model them to any useful level.

The process of finding these things out doesn't seem like it's going to end soon, and there have to be many fundamental surprises waiting for us. Modeling the system in their absence is going to be risky - interesting, no doubt, and potentially lucrative (if you find a useful approximation), but risky. It's going to take some pretty convincing stuff for the drug industry to ever depend on it.

And all of this applies to single cells, which come in, naturally, an uncounted variety, each with its own peculiarities, the great majority of which we don't have any clue about. And then you come to the interactions between cells, which are highly significant and (in many ways) a closed book to us at present. If we knew more about these things, we'd be able, for example, to culture human cell lines that acted just like their primary tissue progenitors - but we can't do it, not yet.

No, although I have every belief that these things are susceptible to modeling, I just don't think we'll see it (on a useful scale) any time soon. Over the next twenty years, I'd expect to have some of the easier-to-handle cellular subsystems worked out to give robust in silico treatments, but a whole cell? And all the types of whole cells? Much longer than that. More than that I can't guess.

Comments (3) + TrackBacks (0) | Category: In Silico

April 15, 2004

The March of Folly

Posted by Derek

Thinking about molecular modeling, as I did in the last post, brings up another topic: when you go back to the late 1980s, in the real manic phase of the technological hype, what brings you up short is realizing that these folks were planning on doing all this with 1980s hardware.

That puts things in perspective. Here we are in 2004, and we still can't just sit down and design a drug from first principles. Don't believe anyone who tells you that we can, either - if that were possible, there would be a lot more drugs out there. I'm not saying that molecular modeling never makes a contribution (I know better, and from personal experience.) It's just that it hasn't (yet) caught up to the hallucinations of fifteen or twenty years ago, which is entirely the fault of the people who were doing the hallucinating.

You can make the same comments about other waves of hype that have broken over the pharmaceutical world (combinatorial chemistry comes immediately to mind.) What I'm wondering is: what's the hype of today? There's bound to be a hot new idea that's going to solve our problems, but will end up changed beyond recognition after twenty years of the real world. Any votes on what's going to look faintly ridiculous to us in 2024? As you'd guess, I have some candidates of my own. . .

Comments (2) | Category: Drug Industry History | In Silico

April 14, 2004

Reality's Revenge

Posted by Derek

Molecular modeling is a technology with a past. Specifically, it's a past of overoptimistic predictions (often made, to be fair, by people who didn't understand what they were talking about.) Back in the late 1980s, when I started in the drug industry, modeling was going to take over the world and pretty darn soon, too. Several companies were founded to take advantage of this brave new world that had such software in it, and they raised serious money with tales of how they were just going to zzzzzip right to the drug structures. No dead ends, no detours, no cast of thousands - just a few chemists standing by to make the structure as it printed out for them. This has not quite worked out.

For those not in the business, modeling is the attempt to figure out molecular shapes, properties, and interactions by computation. There are many levels, some more successful than others. The ones I'm speaking of involve predicting three-dimensional shapes of molecules (and their target binding sites), and deciding which ones are more likely to fit well. It sounds like just what we need. It also sounds reasonably doable, in the same way that Hercules was probably told at first that he was going to just have to round up a few stray animals.

Predicting the shapes involves modeling the individual chemical bonds, and the interactions as the atoms and functional groups rotate around them, banging into each other or sticking through various forces. Originally, these things were calculated as if they were in interstellar space, with nothing around them. Later (and ever since) a number of methods to add some real-world solvent effects have been tried.

Another set of programs evaluates intermolecular fits, trying to work out the energies in play when a drug molecule slides into its binding site. Many tricky refinements have been added to those packages over the years, too, taking advantage of the latest insights into how various groups stack, pack, and interact.

And often enough, it just isn't enough. Many times the structures we have for our binding sites aren't accurate - the best ones are from X-ray crystallography, and plenty of good stuff just doesn't crystallize. (There are other cases where the crystal structure doesn't bear much relation to what's going on inside the real system, too, just to keep everyone on their toes.) Modeling goes haywire for all kinds of reasons.

One of the companies that emerged back in the change-the-world era of modeling was Vertex, up in Cambridge. It was founded by Joshua Boger, a Merck chemist who wanted a piece of the new thing and wasn't sure that Merck was taking it seriously enough. Well, coming soon in the Journal of Medicinal Chemistry (it's in the web preprint section now) is a paper from Vertex which gives us all some idea of why things didn't work out quite as planned.

The Vertex guys went back over about 150 cases, and found that in the majority of them, the structure of the small molecule in its binding pocket wasn't the structure you would have predicted as the best (read: lowest-energy.) In many of them, it isn't even close. You'd literally never have picked some of these conformations to start a modeling effort - they look very disfavored, and if you're going to pick things that far from the ground state then there's no end to it. The number of candidate structures you'd have to consider grows very rapidly as you move away from the local energy minima.
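
For a rough feel of what "very disfavored" means in numbers, here's a minimal sketch. It is not anything taken from the Vertex paper; it just assumes plain Boltzmann statistics at room temperature and ignores solvent, entropy, and the protein entirely. The function name and the sample energy gaps are my own illustrative choices.

```python
import math

# Gas constant in kcal/(mol*K); energies below are in kcal/mol.
R_KCAL = 0.0019872


def relative_population(delta_e_kcal, temp_k=298.15):
    """Boltzmann weight of a conformer relative to the lowest-energy one.

    delta_e_kcal is how far the conformer sits above the minimum.
    This is a free-molecule estimate under the assumptions stated above.
    """
    return math.exp(-delta_e_kcal / (R_KCAL * temp_k))


if __name__ == "__main__":
    # Hypothetical energy gaps, chosen only to show the trend.
    for delta_e in (0.0, 1.0, 2.0, 5.0, 10.0):
        weight = relative_population(delta_e)
        print(f"{delta_e:5.1f} kcal/mol above the minimum: "
              f"relative population {weight:.2e}")
```

On this crude picture, a conformer only a few kcal/mol above the minimum is already a tiny fraction of the unbound population, which is why no one would pick such a shape as a starting point, and why finding it in the binding pocket anyway is so awkward for the modelers.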

We in the business had suspected as much, and everyone knew of an example or two, but this is a quantitative look at just how bad the situation is. When you add in the cases where the binding site changes its conformation unexpectedly in response to the ligand, it's a wonder that any modeling efforts work at all. (Frankly, in my experience, they mostly don't, but I'm willing to stipulate that my experience has been more negative than the average.)

I like to say that molecular modeling is a magic wand, one that we keep waving in the hope that sparks will eventually start to shoot out of it. Someday they will. But there's a lot more hard work ahead, and no shortcuts in sight.

Comments (0) | Category: Drug Industry History | In Silico