About this Author
Derek Lowe, an Arkansan by birth, got his BA from Hendrix College and his PhD in organic chemistry from Duke before spending time in Germany on a Humboldt Fellowship during his post-doc. He's worked for several major pharmaceutical companies since 1989 on drug discovery projects against schizophrenia, Alzheimer's, diabetes, osteoporosis and other diseases.
To contact Derek email him directly: derekb.lowe@gmail.com

Category Archives
May 23, 2008
Posted by Derek
Something that’s come up in the last few posts around here is the way that we chemists think about the insides of enzymes. It’s a tricky subject, because when you picture things on that scale, the intuition you have for objects starts to betray you.
Consider water. We humans have a pretty good practical understanding of how water behaves in the bulk phase; we have the experience. But what about five water molecules sitting in the pocket of an enzyme? That’s not exactly a glass from the tap. These guys are interacting with the protein as much as (or more than) they’re interacting with each other, and our intuition about water molecules is based on how they act when they’re surrounded by plenty of their own.
And if five water molecules are hard to handle, how about one? There’s no hope of seeing any bulk properties now, because there’s no bulk. We’re more used to having trouble in the other direction, predicting group behavior from individuals: you can’t tell much about a thousand-piece jigsaw puzzle from one piece that you found under the couch, and you wouldn’t be able to say much about the behavior of an ant colony from observing one ant in a jar. And neither of those is worth very much on its own, compared to the group. But with molecules, the single-ant-in-a-jar situation is very important (that’s a single water molecule sitting in the active site of an enzyme), and knowledge of ant social behavior or water’s actions in a glass doesn’t help much.
Larger molecules than water are our business, of course, and those are tricky, too. We can study the shape and flexibility of our drug candidates in solution (by NMR, to pick the easiest method), and in the solid phase, surrounded by packed arrays of themselves (X-ray crystal structures). But the way that they look inside an enzyme's active site doesn't have to be related to either of those, although you might as well start there.
As single-molecule (and single-atom) techniques have become more practical, we're starting to get an idea of how small clusters of molecules have to be before they stop acting like tiny pieces of what we're used to and start acting like something else. But these experiments are usually done in isolation, in the gas phase or on some inert surface. The inside of a protein is another thing entirely; molecules there are the opposite of isolated. And studying them in those small spaces is no small task.
Category: In Silico
May 1, 2008
Posted by Derek
Drug Discovery Today has the first part of an article on the history of the molecular modeling field, this one covering about 1960 to 1990. It’s a for-the-record document, since as time goes on it’ll be increasingly hard to unscramble all the early approaches and players. I think this is true for almost any technology; the early years are tangled indeed.
As you would imagine, the work from the 1960s and 1970s has an otherworldly feel to it, considering the hardware that was available. And that brings up another thing common to the early years of new technologies: when you look back on them from their later years, you wonder how these people could possibly have even tried to do these things.
I mean, you read about, say, Richard Cramer establishing the computer-aided drug design program at Smith, Kline and French in nineteen-flipping-seventy-one, and on one level you feel like congratulating his group for their farsightedness. But mainly you just feel like saying “Oh, you poor people. I am so sorry.” Because from today's perspective, there is just no way that anyone could have done any meaningful molecular modeling for drug design in 1971. I mean, we have enough trouble doing it for a lot of projects in 2008.
Think about it: big ol’ IBM mainframe, with those tape drives that for many years were visual shorthand for Computer System but now look closer to steam engines and water wheels. Punch cards: riffling stacks of them, and whole mechanical devices with arrays of rods to make and troubleshoot stiff pieces of paper with holes in them. And the software – written in what, FORTRAN? If they were lucky. And written in a time when people were just starting to say, well, yes, I suppose that you could, in fact, represent attractive and repulsive molecular forces in terms that could be used by a computer program. . .hmm, let’s see about hydrogen bonds, then. . .
It gives a person the shudders. But that must be inevitable – you get the same feeling when you see an early TV set and wonder how anyone could have derived entertainment from a fuzzy four-inch-wide grey screen. Or see the earliest automobiles, which look to have been quite a bit more trouble than a horse. How do people persevere?
Well, for one thing, by knowing that they’re the first. Even if the technology isn’t what you might dream of it being someday, you’re still the one out on the cutting edge, with what could be the best in the world as it is. They also do it by not being able to know just what the limits to their capabilities are, not having the benefit of decades of hindsight. The molecular modelers of the early 1970s did not, I’m sure, see themselves as tentatively exploring something that would probably be of no use for years to come. They must have thought that there was something good just waiting right there to be done with the technology they had (which was, as just mentioned, the best ever seen). They may well have been wrong about that, but who was to know until it was tried?
And all of this – the realizations that there’s something new in the world, that there are new things that can be done with it, and (later) that there’s more to it (both its possibilities and difficulties) than was first apparent – all of this comes on gradually. If it were to hit you all at once, you’d be paralyzed with indecision. But the gap in the trees turns into a trail, and then into a dirt path before you feel the gravel under your feet, speeding up before you realize that you’re driving down a huge highway that branches off to destinations you didn’t even know existed.
People are seeing their way through to some of those narrow footpaths right now, no doubt. With any luck, in another thirty years people will look back and pity them for what they didn’t and couldn’t know. But the people doing it today don’t feel worthy of pity at all – some of them probably feel as if they’re the luckiest people alive. . .
Category: Drug Industry History | In Silico | Who Discovers and Why
March 27, 2008
Posted by Derek
There’s an excellent paper in the most recent issue of Chemistry and Biology that illustrates some of what fragment-based drug discovery is all about. The authors (the van Aalten group at Dundee) are looking at a known inhibitor of the enzyme chitinase, a natural product called argifin. It’s an odd-looking thing – five amino acids bonded together into a ring, with one of them (an arginine) further functionalized with a urea into a sort of side-chain tail. It’s about a 27 nM inhibitor of the enzyme.
(For the non-chemists, that number is a binding affinity, a measure of what concentration of the compound is needed to shut down the enzyme. The lower, the better, other things being equal. Most drugs are down in the nanomolar range – below that are the ultra-potent picomolar and femtomolar ranges, where few compounds venture. And above that, once you get up to 1000 nanomolar, is micromolar, and then 1000 micromolar is one millimolar. By traditional med-chem standards, single-digit nanomolar = good, double-digit nanomolar = not bad, triple-digit nanomolar or low micromolar = starting point to make something better, high micromolar = ignore, and millimolar = can do better with stuff off the bottom of your shoe.)
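For readers who like the scale in one place, here is a minimal Python sketch of that rule-of-thumb rubric. The unit table and function names are made up for illustration, and the 10 µM line between "low" and "high" micromolar is an assumption, since the post doesn't draw an exact boundary.

```python
# Rough encoding of the med-chem potency rubric, for illustration only.
# Assumption: "low" vs "high" micromolar is split at 10 uM; the post
# doesn't give an exact boundary.

NM_PER_UNIT = {"pM": 1e-3, "nM": 1.0, "uM": 1e3, "mM": 1e6}


def to_nanomolar(value: float, unit: str) -> float:
    """Convert a concentration given in pM, nM, uM or mM to nanomolar."""
    return value * NM_PER_UNIT[unit]


def potency_verdict(value: float, unit: str) -> str:
    """Classify a binding constant on the rule-of-thumb med-chem scale."""
    nm = to_nanomolar(value, unit)
    if nm < 1:
        return "ultra-potent (picomolar or below)"
    if nm < 10:
        return "good (single-digit nanomolar)"
    if nm < 100:
        return "not bad (double-digit nanomolar)"
    if nm < 10_000:  # triple-digit nM up to the assumed 10 uM cutoff
        return "starting point"
    if nm < 1_000_000:
        return "ignore (high micromolar)"
    return "bottom of the shoe (millimolar)"


for value, unit in [(27, "nM"), (80, "uM"), (500, "uM"), (2, "mM")]:
    print(f"{value} {unit}: {potency_verdict(value, unit)}")
```

With these cutoffs argifin's 27 nM lands in "not bad", while the 80 µM and 500 µM compounds that come up later in this post land in "ignore", which is exactly the verdict that binding efficiency ends up overturning.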
What the authors did was break this argifin beast up, piece by piece, measuring what that did to the chitinase affinity. And each time they were able to get an X-ray structure of the truncated versions, which turned out to be a key part of the story. Taking one amino acid out of the ring (and thus breaking it open) lowered the binding by about 200-fold – but you wouldn’t have guessed that from the X-ray structure. It looks to be fitting into the enzyme in almost exactly the same way as the parent.
And that brings up a good point about X-ray crystal structures. You can’t really tell how well something binds by looking at one. For one thing, it can be hard to see how favorable the various visible interactions might actually be. And for another, you don’t get any information at all about what the compound had to pay, energetically, to get there.
In the broken argifin case, a lot of the affinity loss can probably be put down to entropy: the molecule now has a lot more freedom of movement, which has to be overcome in order to bind in the right spot. The cyclic natural product, on the other hand, was already pretty much there. This fits in with the classic med-chem trick of tying back side chains and cyclizing structures. Often you’ll kill activity completely by doing that (because you narrowed down on the wrong shape for the final molecule), but when you hit, you hit big.
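To put a number on that 200-fold loss: since the binding free energy goes as ΔG = RT ln(Kd), a fold change f in the binding constant corresponds to ΔΔG = RT ln(f). This is a minimal sketch assuming 25 °C; the function name is made up. Note that it gives only the total free-energy cost; it can't tell you how that cost splits between entropy and enthalpy, so the entropy explanation above remains an interpretation rather than something this arithmetic proves.

```python
import math

R_KCAL = 1.987204e-3  # gas constant, kcal/(mol*K)


def ddg_from_fold(fold_change: float, temp_k: float = 298.15) -> float:
    """Free-energy penalty (kcal/mol) for a fold-change weakening of binding.

    Uses dG = RT ln(Kd): if Kd gets `fold_change` times larger,
    binding becomes RT ln(fold_change) kcal/mol less favorable.
    """
    return R_KCAL * temp_k * math.log(fold_change)


print(f"200-fold loss: {ddg_from_fold(200):.1f} kcal/mol")  # about 3.1
print(f"10-fold loss:  {ddg_from_fold(10):.1f} kcal/mol")   # about 1.4
```

At room temperature that works out to roughly 1.4 kcal/mol per factor of ten, so a 200-fold loss is a bit over two log units' worth of binding energy.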
The structure was chopped down further. Losing another amino acid only hurt the activity a bit more, and losing still another one gave a dipeptide that was still only about three times less potent than the first cut-down compound. Slicing that down to a monopeptide, basically just a well-decorated arginine, sent the activity down another sixfold or so – but by now we’re up to about 80 micromolar, which most medicinal chemists would regard as the amount of activity you could get by testing the lint in your pocket.
But they went further, making just the little dimethylguanylurea that’s hanging off the far end. That thing is around 500 micromolar, a level of potency that would normally get you laughed at. But wait. . .they have the X-ray structures all along the way, and what becomes clear is that this guanylurea piece is binding to the same site on the protein, in the same manner, all the way down. So if you’re wondering if you can get an X-ray structure of some 500 micromolar dust bunny, the answer is that you sure can, if it has a defined binding site.
And the value of these various derivatives almost completely inverts if you look at them from a binding efficiency standpoint. (One common way to measure that is to take the minus log of the binding constant and divide by the molecular weight in kilodaltons). That’s a “bang for the buck” index, a test of how much affinity you’re getting for the weight of your molecule. As it turns out, argifin – 27 nanomolar though it be – isn’t that efficient a binder, because it weighs a hefty 676. The binding efficiency index comes out to just under 12, which is nothing to get revved up about. The truncated analogs, for the most part, aren’t much better, ranging from 9 to 15.
But that guanylurea piece is another story. It doesn’t bind very tightly, but it bats way above its scrawny size, with a BEI of nearly 28. That’s much more impressive. If the whole argifin molecule bound that efficiently, it would be down in the ten-to-the-minus-nineteenth range, and I don’t even know the name of that order of magnitude. If you wanted to make a more reasonably sized molecule, and you should, a compound of MW 400 with a binding efficiency like that would be in the single-digit picomolar range (a pKi of about 11). There’s plenty of room to do better than argifin.
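The arithmetic behind these numbers is simple enough to sketch. This is a minimal Python version of the BEI definition given above (pKi divided by molecular weight in kDa); the function names are made up for illustration. With the rounded inputs quoted in the post (27 nM, MW 676) it gives about 11.2 for argifin, a little below the "just under 12" figure, which presumably reflects the exact affinity and weight used in the paper. The back-calculation reproduces the ten-to-the-minus-nineteenth estimate for an argifin-sized molecule with a BEI of 28.

```python
import math


def bei(ki_molar: float, mw_daltons: float) -> float:
    """Binding efficiency index: pKi divided by molecular weight in kDa."""
    p_ki = -math.log10(ki_molar)
    return p_ki / (mw_daltons / 1000.0)


def ki_for_bei(target_bei: float, mw_daltons: float) -> float:
    """Back-calculate the Ki (molar) a compound of this weight would need
    in order to reach `target_bei`."""
    return 10 ** -(target_bei * mw_daltons / 1000.0)


# Argifin, using the rounded figures from the post: 27 nM, MW 676.
print(f"argifin BEI ~ {bei(27e-9, 676):.1f}")
# The whole argifin molecule binding with the fragment's efficiency (~28):
print(f"Ki at BEI 28, MW 676 ~ {ki_for_bei(28, 676):.1e} M")
```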
So the thing to do, clearly, is to start from the guanylurea and build out, checking the binding efficiency along the way to make sure that you’re getting the most out of your additions. And that is exactly the point of fragment-based drug discovery. You can do it this way, cutting down a larger molecule to find what parts of it are worth the most, or you can screen to find small fragments which, though not very potent in the absolute sense, bind very efficiently. Either way, you take that small, efficient piece as your anchor and work from there. And either way, some sort of structural read on your compounds (X-ray or NMR) is very useful. That’ll give you confidence that your important binding piece really is acting the same way as you go forward, and give you some clues about where to build out in the next round of analogs.
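A minimal sketch of "checking the binding efficiency along the way": compute the BEI for each analog in a growing series and flag any step where the added weight bought proportionally less affinity. The `Analog` class, the function name, and every number in the example series are hypothetical, chosen only to show the bookkeeping.

```python
import math
from dataclasses import dataclass


@dataclass
class Analog:
    """One compound in a hypothetical fragment-growing series."""
    name: str
    ki_molar: float
    mw: float  # molecular weight, daltons

    @property
    def bei(self) -> float:
        # Binding efficiency index: pKi divided by MW in kDa
        return -math.log10(self.ki_molar) / (self.mw / 1000.0)


def flag_inefficient_steps(series: list[Analog]) -> list[str]:
    """Report each step in the series where added weight lowered the BEI."""
    flags = []
    for prev, cur in zip(series, series[1:]):
        if cur.bei < prev.bei:
            flags.append(f"{prev.name} -> {cur.name}: "
                         f"BEI {prev.bei:.1f} -> {cur.bei:.1f}")
    return flags


# Hypothetical numbers, chosen only to exercise the bookkeeping.
series = [
    Analog("fragment", 1e-3, 150),
    Analog("step 1", 1e-6, 250),
    Analog("step 2", 3e-7, 400),
    Analog("step 3", 1e-8, 420),
]
for line in flag_inefficient_steps(series):
    print(line)
```

This version compares each analog only with the one before it, so "step 3" passes even though it never regains the efficiency of "step 1"; comparing against the best compound so far would be an equally reasonable design choice.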
This particular story may be about as good an illustration as one could possibly find - here's hoping that there are more that can work out this way. Congratulations to van Aalten and his co-workers at Dundee and Bath for one of the best papers I've read in quite a while.
Category: Analytical Chemistry | Drug Assays | In Silico
March 5, 2008
Posted by Derek
There’s an interesting article coming out in J. Med. Chem. on antibiotic compounds, which highlights something that’s pretty clear if you spend some time looking at the drugs in that area. We make a big deal (or have made one over the last ten years) about drug-like properties – all that Rule-of-Five stuff and its progeny. Well, take a look at the historically best-selling antibiotic drugs: you’ve never seen such a collection of Rule of Five violators in your life.
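For readers who haven't met it, Lipinski's Rule of Five is a set of simple cutoffs: molecular weight over 500, calculated logP over 5, more than 5 hydrogen-bond donors (OH plus NH) or more than 10 acceptors (N plus O), with poor oral absorption considered more likely when two or more are violated. A minimal sketch follows; in practice the descriptors would come from a cheminformatics toolkit, and the example values here are hypothetical rather than those of any drug named in this post.

```python
from typing import NamedTuple


class Descriptors(NamedTuple):
    mw: float          # molecular weight, daltons
    clogp: float       # calculated octanol/water logP
    h_donors: int      # count of OH + NH
    h_acceptors: int   # count of N + O


def rule_of_five_violations(d: Descriptors) -> list[str]:
    """Return the Lipinski criteria that a compound fails."""
    checks = {
        "MW > 500": d.mw > 500,
        "cLogP > 5": d.clogp > 5,
        "H-bond donors > 5": d.h_donors > 5,
        "H-bond acceptors > 10": d.h_acceptors > 10,
    }
    return [rule for rule, failed in checks.items() if failed]


def flagged(d: Descriptors) -> bool:
    """Lipinski's alert: poor absorption is more likely with 2+ violations."""
    return len(rule_of_five_violations(d)) >= 2


# Hypothetical large, polar, natural-product-like compound
big_polar = Descriptors(mw=750, clogp=2.0, h_donors=6, h_acceptors=14)
print(rule_of_five_violations(big_polar), flagged(big_polar))
```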
That’s partly because a lot of structures in that area have come from natural products, but hey, natural products are drugs, too. Erythromycin, the aminoglycosides, azithromycin, tetracycline: what a crew! But they’ve helped an untold number of people over the years. It’s true that the fluoroquinolones are much more normal-looking, but those are balanced out by weirdo one-shots like fosfomycin. I mean, look at that thing – would you ever believe that that’s a marketed drug? (And with decent bioavailability, too?)
No, you have to be broad-minded if you’re going to beat up on bacteria, and I think some broad-mindedness would do us all good in other therapeutic areas, too. I don’t mean we should ignore what we’ve learned about drug-like properties: our problem is that we tend to make allowances and exceptions on the greasy, high-molecular-weight end of the scale, since that’s where too many of our compounds end up. It wouldn’t hurt to push things on the other end, because I think that you have a better chance of getting away with too much polarity than you have of getting away with too little.
One reason for that might be that there are a lot of transporter proteins in vivo that are used to dealing with such groups. It’s easy to forget, but a great number of proteins are decorated with carbohydrate residues, and they’re on there for a lot of reasons. And a lot of extremely important small molecules in biochemistry are polar as well – right off the top of my head, I don’t know what the logD or polar surface area of things like ATP or NAD are, but I’ll bet that they’re far off the usual run of drugs. Admittedly, those aren’t going to reach good blood levels if you dose them orally; we’re trying to do something that’s rather unnatural as far as the body’s concerned. But we could still usefully take advantage of some of the transport and handling systems for such molecules.
But that’s not always easy to do. We all talk about making our compounds more polar and more soluble, but we balk at some of the things that will do that for us. Sure, you can slap a couple of methoxyethoxys on your ugly flat molecule, or hang a morpholine off the end of a chain to drag things into the water layer. But slap five or six hydroxyls on your molecule, and you’ll be lucky not to have the security guards show up at your desk.
There are, to be sure, some good reasons why they might. Hydroxyls and such tend to introduce chiral centers, which can make your synthesis difficult and dramatically increase the amount of work needed to fill out the structural possibilities of your lead series. That’s why these things tend to be (or derive from) natural products. Some bacterium or fungus has done most of the heavy lifting already, both in terms of working out the most active isomers and in synthesizing them for you. Erythromycin’s a fine starting material when you can get it by fermentation, but no one would ever, ever consider it if it had to be made by pure total synthesis.
There’s another consideration, which gets you right at the bench level. For an organic chemist, working with charged, water-soluble compounds is no fun. A lot of our lab infrastructure is built for things that would rather dissolve in ethyl acetate than water. A constant run of things with low logD values would mean that we’d all have to learn some new skills (and that we’d all probably have to spend a lot of time on the lyophilizer). Ion-exchange resins, gel chromatography, desalting columns – you might as well be a biochemist if you’re going to work with that stuff. But in the end, perhaps we might be better off, at least part of the time, if we were.
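For the non-chemists: logD is the octanol/water distribution coefficient at a particular pH, and for an ionizable compound it drops as the compound becomes charged, which is why such compounds would rather stay in the water layer. A minimal sketch of the textbook relation for a simple monoprotic acid or base, assuming that only the neutral form partitions into the organic layer. The example compound is hypothetical, and octanol is only a rough stand-in for ethyl acetate.

```python
import math


def log_d(log_p: float, pka: float, ph: float, acid: bool) -> float:
    """Distribution coefficient for a monoprotic acid or base.

    Simplifying assumption: only the neutral form partitions into the
    organic layer; the ionized form stays in the water.
    """
    if acid:
        ionized_ratio = 10 ** (ph - pka)   # [A-]/[HA]
    else:
        ionized_ratio = 10 ** (pka - ph)   # [BH+]/[B]
    return log_p - math.log10(1 + ionized_ratio)


# Hypothetical carboxylic acid: logP 2.0, pKa 4.0
for ph in (2.0, 7.4):
    print(f"pH {ph}: logD = {log_d(2.0, 4.0, ph, acid=True):.1f}")
```

The same neutral molecule goes from comfortably organic-soluble at pH 2 to a logD of about -1.4 at physiological pH, purely because most of it is ionized there.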
Category: Drug Industry History | In Silico | Infectious Diseases
April 22, 2007
Posted by Derek
Pretty much the only thing that an interested lay person has heard about ligand binding is the "lock and key" metaphor. I'm not saying that you could walk down the sidewalk getting nods of recognition with it, but if someone's heard anything about how enzymes or receptors work (well, anything correct), that's probably what they've heard.
And there's a lot to it. Many proteins are really, really good at picking out their ligands from crowds of similar compounds. (If they were perfect at it, on the other hand, we drug company types would be out of business). But the lock-and-key metaphor makes the listener believe that both the ligand and the protein are rigid objects, which they most definitely are not. There's no everyday analog to the way that two conformationally mobile objects fit to each other - well, OK, maybe there is, but it's not one that you can safely use for illustrative purposes. Ahem.
The other big breakdown of the lock and key is that it doesn't deal well with the numerous proteins that can recognize more than one ligand for their binding sites. Particularly impressive are the nuclear receptors and the CYP metabolizing enzymes. Both those classes bind a bewildering number of not-very-similar compounds, and they can do it impressively well. They manage the trick by having binding pockets that can drastically change their shapes and charge distributions, as parts of the proteins themselves slide, twist, and flip around. I can't come up with even a vulgar metaphor for that process.
I'm thinking of doing several posts on the limits of metaphor and simplification in science, and if I do, this will be the first. It's a constant struggle not to mistake the picture for the real thing, particularly if the simplification is a pretty useful one. But eventually, no matter how good, the metaphor will thin out on you, and you'll be in the position of a Greek bird pecking at some painted fruit and wondering why it's still hungry.
Category: In Silico | Metaphors, Good and Bad
March 12, 2007
Posted by Derek
I wanted to link tonight to the "Milkshake Manifesto" over at OrgPrep Daily. It's a set of rules for med-chem, and looking them over, I agree with them pretty much across the board. There's a general theme in them of getting as close to the real system as you can, which is a theme I've sounded many times.
That applies to things like "Rule of Five" approximations and docking scores - useful, perhaps, if you're sorting through a huge pile of compounds that you have to prioritize, not so useful if you've already got animal data.
He also takes a shot at Caco-2 cells and other such approximations to figure out membrane and tissue penetration. I've never yet seen an in vitro assay for permeability that I would trust - it's just too complicated, and it may never yield to a reductionist approach.
I'm a big fan of reductionism, don't get me wrong, but it's not the tool for every job. Living systems are especially tricky to pare down, and you can simplify yourself right out of any useful data if you're not very careful. The closer to the real world, the better off you are. It isn't easy, and it isn't cheap, but nothing good ever came easy or cheap, did it?
Category: Drug Assays | Drug Development | In Silico
February 27, 2007
Posted by Derek
SciTheory has a post, complete with links to the relevant articles in Science, etc., on a recent batch of trouble in structural biology. Geoffrey Chang and his group at Scripps have been working on the structures of transporter proteins, which sit in the cell membrane and actively move nonpermeable molecules in and out. There are a heap of these things, since (as any medicinal chemist will tell you) a lot of reasonable-looking molecules just won't get into cells without help. It's even tougher at a physiological level, because (from a chemist's perspective) many of the things that need to be shuttled around aren't very reasonable-looking at all - they're too small and polar or too large and greasy.
Many of these transporters, especially in bacteria, fall into a large group known as the ABC transporters, which have an ATP binding site in them for fuel. (For the non-scientists in the audience, ATP is the molecule used for energy storage in everything living on Earth. Thinking of an ATP-binding site as a NiCad battery pack gets you remarkably close to the real situation). Chang solved the structure of one of these, the bacterial protein MsbA, by X-ray crystallography back in 2001, and it was quite an accomplishment. Getting good X-ray diffraction data on proteins which spend their lives stuck in the cell membrane is rather a black art.
How dark an art is now apparent - here's the original paper's abstract in PubMed, but if you look just above the abstract, you'll see a retraction notice, and it's not alone. Five papers on various structures have been withdrawn. As SciTheory says, anyone who doubted the original MsbA structure had some real food for thought last year when another bacterial transporter was solved at the ETH in Zurich. These two should have looked more similar than they did, to most ways of thinking, but they were quite divergent.
And now we know why. Chang's group was done in by some homebrew software which swapped two columns of data. In a structure this large and complicated, you can have such disruptive things happen and still be able to settle down on a final protein picture - it's just that it'll be completely wrong. And so it was. The same software seems to have undermined the other determinations, too.
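Why can a swap like that still give a convincing structure? Here is a minimal geometric sketch under a loudly stated simplification: it treats a model as a plain list of Cartesian coordinates, which is not how diffraction data are actually stored, and it makes no claim to reproduce what Chang's program did. It shows only that exchanging two coordinate columns is a mirror reflection: every interatomic distance survives, so the model still looks internally reasonable, while its handedness (the sign of a signed volume) flips. All function names are made up for illustration.

```python
import itertools
import math

Point = tuple[float, float, float]


def swap_xy(points: list[Point]) -> list[Point]:
    """Swap the first two coordinate columns: a mirror reflection."""
    return [(y, x, z) for x, y, z in points]


def signed_volume(a: Point, b: Point, c: Point, d: Point) -> float:
    """Six times the signed volume of tetrahedron abcd; its sign encodes handedness."""
    u = [b[i] - a[i] for i in range(3)]
    v = [c[i] - a[i] for i in range(3)]
    w = [d[i] - a[i] for i in range(3)]
    # Scalar triple product u . (v x w)
    return (u[0] * (v[1] * w[2] - v[2] * w[1])
            - u[1] * (v[0] * w[2] - v[2] * w[0])
            + u[2] * (v[0] * w[1] - v[1] * w[0]))


def all_distances(points: list[Point]) -> list[float]:
    """Every pairwise distance, in a fixed order."""
    return [math.dist(p, q) for p, q in itertools.combinations(points, 2)]


# A toy four-atom "model"
model = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (0.0, 0.0, 3.0)]
mirrored = swap_xy(model)
print(signed_volume(*model), signed_volume(*mirrored))  # 6.0 -6.0
print(all(math.isclose(a, b)
          for a, b in zip(all_distances(model), all_distances(mirrored))))  # True
```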
This is important (as well as sad and painful) on several levels. For one thing, transporters are essential to understanding resistance to antibiotics and cancer therapies, and they're vital parts of a lot of poorly understood processes in normal cells. We're not going to be able to get a handle on the often-inscrutable distribution of drug candidates in living systems until we know more about these proteins, but now some of what we thought we knew has evaporated on us.
Another point that people shouldn't miss is the trouble with relying too much on computational methods. There's really no alternative to them in protein crystallography, of course, but there always has to be a final "Does that make sense?" test. The difficulty is that many perfectly valid protein structures show up with odd and surprising features. At the same time, it's unnerving that the data for these things can be so thoroughly hosed and still give you a valid-looking structure, but that just serves to underline how careful you have to be.
And we're talking about X-ray data, which (done properly) is considered to be pretty solid stuff. So what does this say about basing research programs on the higher levels of abstraction found in molecular modeling and docking programs?
Category: In Silico
December 14, 2006
Posted by Derek
Glenn Reynolds gave the pharma industry a much-appreciated thank-you card over at Instapundit:
Only a moron would want to live in a society where people are ashamed to work for drug companies. And yet, I'm not surprised to see that resulting from the demagogy that abounds among politicians and "public interest" types who are not serving the public interest whatsoever.
I'm thinking of having that first sentence engraved on something expensive. Glenn's post prompted Dean Esmay to write a short post on the ethics of drug companies, though, and he's rather less positive. I suppose I shouldn't be surprised, given some of the things he's gone in for in the past. As usual, some of the problem is the difficulty that people have coming to terms with the fact that drug discovery is a for-profit industry.
One comment on his post, from Jerry Kindall, is mostly favorable to the industry, but it nonetheless contains this paragraph:
Drug discovery used to be a total crap-shoot but it's getting more and more targeted as the years go by thanks to ever more sophisticated computer modeling. They are now able to say "okay, this is the chemical receptor that we think we need to address, let's design a molecule that fits into it." This is essentially a nanotechnology, although not the type most people think of when they hear the term.
Ay, would that it were true. As my industry readers know, and as I've been ranting about here fairly often, drug discovery is just as much of a crap-shoot as it's ever been. And wouldn't it be great if "sophisticated computer modeling" helped that much? Instead, we get things like this. No, I think what's happening here is that we're being underestimated by our enemies and overestimated by our friends. . .
Category: In Silico | Why Everyone Loves Us
September 11, 2006
Posted by Derek
It's been a while since I wrote about the neuraminidase inhibitors (Tamiflu and Relenza, oseltamivir and zanamivir). As we start to head into fall, though, I'm sure that avian flu will invade the headlines again, if nothing else (and I hope it's nothing else).
There's an interesting report in Nature (subscriber link) on how these drugs work. Bird flu is a Type A influenza, but there are two broad groups inside that class, which are defined by what variety of neuraminidase enzyme they express. (There are actually nine enzyme variants known, but four of them fall into one group and five into the other).
The drugs were developed against group-2 enzymes, but they're also effective against group-1 influenzas. Since the X-ray crystal structures showed that the drugs bound in the same way to all the group-2 neuraminidases, and since the active sites of all the subtypes across the two groups are extremely similar, no one ever thought that their binding modes would be different. Well, until last month, anyway, when the X-ray crystallographic data came in.
And what it showed was that the active sites of the group-1 enzymes, sequence homology be damned, have a much different structure than the group-2s. As it turns out, though, they can adopt a similar shape when an inhibitor binds to them, which is why the marketed inhibitors still work on them, but they're fundamentally quite different.
I can't resist the urge to use this example to illustrate some of the real problems in our current state of the art for computation and modeling. The differences between these two enzymes are due to their different amino acid residues far away from the active site, which makes modeling them much, much more difficult (and makes the error bars much, much wider when you do). That's why no one realized how different the group-1 and group-2 neuraminidases were until the X-ray structure was available: modeling couldn't tell you. Any modeling efforts that tried would probably have decided, incorrectly, that the two groups were nearly identical. Why shouldn't they be?
But if we'd had that X-ray data from the start, modeling would very likely have told you, incorrectly, that there was little chance that either Relenza or Tamiflu would work on the group-1 enzyme variants. Why should they? The "induced fit" binding modes, where the enzyme changes shape significantly as the ligand binds, are understandably very difficult to model. There are just too many possibilities, too many of which are within each other's computational error bars.
Now, it's true that this latest work isn't based on molecular modeling at all. (You have to wonder how close these guys got, though). But plenty of projects that are using it are just as much in the dark as a neuraminidase team would have been, and they may not even realize it. Most molecular modelers are well aware of these limitations, but not all of them - or all of the managers over them - are willing to accept them. And when you get out to investors or the general public, it's all too easy for modelers or managers to act as if things are perfectly under control, when in reality they're lurching around in the dark. Like the rest of us. . .
Category: In Silico | Infectious Diseases
March 23, 2006
Posted by Derek
Here's a limits-to-knowledge post for you. On Wednesday, when I was cranking out a batch of an intermediate we're using these days, I needed to separate two fairly closely related compounds (which I'll call A and B) from each other. One surefire way to have done that was chromatography, but I just didn't have time for that. While I was rota-vapping down the mixture, I noticed that some white crystals were starting to come out of the methylene chloride solution, so I took the flask off and checked a small sample of the solid. Sure enough, it was pretty pure A, so I filtered that off and continued.
Taking out all the solvent left me with more white stuff, which was mostly B, with some A still hanging in there. In the past, we'd purified B by crystallizing it from another solvent mixture (ethyl acetate/hexane, the first combination the lazy - or just plain experienced - organic chemist reaches for). So I tried that out, dissolving the solid in a small-to-medium amount of hot ethyl acetate, then adding hexane while it was still warm. I cooled the solution down by dipping the flask in ice water until it had come down to about room temperature, and was swirling it around when suddenly it started snowing white powder. Ta-daa! A check of this stuff showed that it was almost completely pure B. The solution, for its part, was now a majority of A with some B left around. I took what I had and ran with it - this was one of the bird-in-the-hand situations, because people were waiting on this stuff.
My point is that such things are almost completely empirical. I've never heard of anyone who could predict from first principles what solvent system to use to get something to crystallize. I'd be tremendously impressed if anyone could take the structures of my two compounds, feed them into a dissolvo-matic program and announce "Yep, methylene chloride for A, and ethyl acetate-hexane for B. That'll do the trick."
As far as I know, there's no such thing, and no one is even close. I'd be glad to hear if I'm wrong. But if we can't predict, even just in rank order, what solvents will dissolve (or crash out) a given molecule, just how good is our molecular modeling, anyway?
Category: In Silico
November 1, 2005
Posted by Derek
I mentioned an interesting paper that's coming out in the Journal of Medicinal Chemistry on molecular modeling. It's a long one from a large group of people scattered across GlaxoSmithKline's worldwide research facilities, entitled "A Critical Assessment of Docking Programs and Scoring Functions." And that's what it is, all right.
For the non-med-chem readers, those are two of the key techniques in computational molecular modeling. Docking refers to taking a modeled version of your small molecule and trying to fit it into a similarly modeled version of the binding site of your protein target. The program tries to take into account the size and shape of the molecule and the binding site, of course, as well as more subtle interactions between the various functional groups. Scoring functions are what the programs use to rate how well the docking procedure went for a given compound, and to compare it to others in a given data set.
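To make the docking/scoring split concrete, here is a deliberately crude Python sketch. Everything in it is a hypothetical stand-in, not how any of the GSK-tested programs work: the "scoring function" (`toy_score`) just rewards ligand-pocket atom pairs between 3 and 5 Å apart and penalizes clashes closer than 3 Å, and the "docking" step (`dock_by_translation`) only slides a rigid ligand around on a grid, with no rotations or flexibility. Real packages also search orientations and conformations and use far more elaborate energy terms.

```python
import itertools
import math

def toy_score(ligand, pocket, clash=3.0, contact=5.0):
    """Hypothetical scoring function; lower is better.

    Each ligand-pocket atom pair closer than `clash` Å adds a +10 penalty,
    and each pair between `clash` and `contact` Å earns a -1 reward.
    """
    score = 0.0
    for a in ligand:
        for b in pocket:
            d = math.dist(a, b)
            if d < clash:
                score += 10.0
            elif d < contact:
                score -= 1.0
    return score

def dock_by_translation(ligand, pocket, step=0.5, extent=3.0):
    """Rigid-body 'docking': try every translation on a cubic grid, keep the best-scoring pose."""
    n = int(round(extent / step))
    offsets = [i * step for i in range(-n, n + 1)]
    best_pose, best_score = None, math.inf
    for dx, dy, dz in itertools.product(offsets, repeat=3):
        pose = [(x + dx, y + dy, z + dz) for x, y, z in ligand]
        s = toy_score(pose, pocket)
        if s < best_score:
            best_pose, best_score = pose, s
    return best_pose, best_score

# A made-up "pocket" of six atoms around the origin, and a one-atom "ligand" placed off-center.
pocket = [(4.0, 0.0, 0.0), (-4.0, 0.0, 0.0), (0.0, 4.0, 0.0),
          (0.0, -4.0, 0.0), (0.0, 0.0, 4.0), (0.0, 0.0, -4.0)]
pose, score = dock_by_translation([(2.0, 2.0, 2.0)], pocket)
print(score)  # -6.0
```

The point of the toy is the division of labor: the search step can only be as good as the scoring function it is told to optimize, since the "answer" is simply whichever pose that function ranks best.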
The GSK team did a very thorough job, evaluating ten different docking programs. They started with seven protein targets of varying types, mostly different classes of enzymes, all of them known drug targets. An expert computational chemist took each one and polished up the model of the binding site. At the same time, lists of between one and two hundred potential binding compounds were put together for each target, including several series of related compounds. Another modeling chemist took these structures and got them ready for docking. They made sure that a crystal structure of each structural class was known for each case (to check the accuracy of the modeling later on), and also made sure that the binding affinity of the compounds ranged over at least four orders of magnitude (from pretty darn good, in other words, to pretty darn awful). The goal was to make the whole exercise as real-world as possible. Then each of those binding-site models and its associated list of potential ligands was turned over to separate chemists with experience in the various docking programs, who were told to have at it. As the paper puts it:
"To optimize the performance of each docking program, computational chemists with expertise in a particular program were identified from the worldwide GSK computational chemistry community. Each program expert was given complete freedom and sufficient time to maximize the performance of the docking program. . .No time deadlines were imposed so that even low-throughput docking programs could be evaluated. Indeed, no constraints whatsoever were placed on the level of agonizing over details of how each docking program was applied."
It's important to remember that the results of this paper come from experienced users who had a great deal of knowledge about the targets, and all the time they needed to mess with them. The aforementioned agonizing was devoted to three typical kinds of question that such software is designed to answer. The first: what is the conformation (the 3-D physical "pose") of a small molecule once it's in a binding site? This is why they picked compounds with known crystal structures, since those provide a check against real data. Results of this test were OK, in some cases fairly good. Some of the target proteins seemed to have binding sites that were better suited to the capabilities of the programs, which could take the majority of the compounds in their lists and fit them pretty close (within two angstroms) to the known crystal structures.
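A "within two angstroms" test like this is usually expressed as a root-mean-square deviation (RMSD) between the docked pose and the crystallographic one; that convention is assumed here rather than taken from the paper. The minimal Python sketch below also assumes the two poses list the same atoms in the same order and share the protein's coordinate frame, so no superposition is done, and it ignores symmetry-equivalent atoms, which real evaluation tools have to handle.

```python
import math

def rmsd(pose_a, pose_b):
    """Root-mean-square deviation (Å) between two poses of the same ligand.

    Assumes matched atom order and a shared (protein) coordinate frame, so no
    superposition is performed; symmetry-equivalent atoms are not handled.
    """
    if len(pose_a) != len(pose_b) or not pose_a:
        raise ValueError("poses must contain the same, nonzero number of atoms")
    total = sum(math.dist(a, b) ** 2 for a, b in zip(pose_a, pose_b))
    return math.sqrt(total / len(pose_a))

def pose_is_close(docked, crystal, cutoff=2.0):
    """True if the docked pose lies within `cutoff` Å RMSD of the crystal pose."""
    return rmsd(docked, crystal) <= cutoff

crystal = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
docked = [(x, y + 1.0, z) for x, y, z in crystal]  # whole ligand shifted by 1 Å
print(rmsd(docked, crystal), pose_is_close(docked, crystal))  # 1.0 True
```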
And every target had at least one program that could take at least a third or so of the test compounds and dock them fairly well. But the problem was, no one program could do that for more than 35% of the binding modes. The best performances were scattered among the different software packages, and there seems to be absolutely no way to know in advance whether a given program is going to perform well on a new target. The other problem, and it's a big one, was that the scoring functions couldn't reliably identify when the program had hit on one of the good answers. There wasn't much correlation between what the program thought was a well-docked conformation and its resemblance to the known crystal structure.
The second question they looked at was: given a list of molecules (some active, some inactive), how well can the software pick out some active ones? This process is often known as "virtual screening". Again, the results were fairly good, but with some significant problems. For all but one of the targets, at least one of the programs could find at least half of the top 10% of the active compounds. (I know, that sounds like a lot of defensive hedging compared to what some people think these programs can do, but that's the real world for you). The programs also did pretty well at pulling a variety of structures out, and not just making their total by grabbing only the members of one particular class.
But that fairly decent performance is for the programs as a group. As before, the best performances were scattered through all the software packages, with no real standout. Most of the programs, at one point or another, also had to grind through a significant fraction of the compound list to do the job, which is something you really don't want in real-world use. Another disturbing result was that some of the scoring functions seemed to be picking the right compounds for the wrong reasons – that is, based on incorrect binding modes.
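One common way to put numbers on virtual screening, and on the "grinding through the list" problem, is to rank the whole set by docking score and ask what fraction of the known actives turns up in the top slice, and how deep into the list you have to go to recover a given share of them. The Python sketch below assumes lower scores mean better predicted binding (typical of energy-style scores) and uses made-up data; the function names are hypothetical, not the paper's.

```python
def rank_by_score(scores, is_active):
    """Pair each compound's score with its activity label, best (lowest) score first."""
    return sorted(zip(scores, is_active), key=lambda pair: pair[0])

def recall_in_top(scores, is_active, fraction):
    """Fraction of all actives that land in the top `fraction` of the ranked list."""
    total = sum(1 for a in is_active if a)
    if total == 0:
        raise ValueError("no actives in the data set")
    ranked = rank_by_score(scores, is_active)
    n_top = max(1, round(len(ranked) * fraction))
    found = sum(1 for _, active in ranked[:n_top] if active)
    return found / total

def fraction_screened_for(scores, is_active, target_recall):
    """Smallest fraction of the ranked list you must go through to recover `target_recall` of the actives."""
    total = sum(1 for a in is_active if a)
    if total == 0:
        raise ValueError("no actives in the data set")
    found = 0
    ranked = rank_by_score(scores, is_active)
    for position, (_, active) in enumerate(ranked, start=1):
        if active:
            found += 1
        if found / total >= target_recall:
            return position / len(ranked)
    return 1.0

# Invented example: ten compounds, four of them actually active.
scores = [-9.1, -8.7, -8.5, -7.9, -7.2, -6.8, -6.5, -6.0, -5.5, -5.1]
labels = [True, False, True, False, False, True, False, False, False, True]
print(recall_in_top(scores, labels, 0.3))          # 0.5
print(fraction_screened_for(scores, labels, 0.5))  # 0.3
```

A program that needs to chew through most of the list before the actives show up would report a `fraction_screened_for` value close to 1.0, which is the real-world problem described above.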
Now we're ready for the third question, a hard one which (in my experience) is among those that medicinal chemists would most like molecular modeling software to answer: given a list of compounds, can the program rank-order them according to their expected affinity for the target? Unfortunately, the answer is "absolutely not." No scoring function in any of the software packages could even come close. The compounds that the programs ranked as winners were just as likely to stink, and the ones that they put into the discard heap were just as likely to be fine.
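Rank-ordering of this kind is typically checked with a rank correlation such as Spearman's rho between the program's scores and the measured affinities: +1 is perfect agreement, and 0 means the ranking tells you nothing, which is what "just as likely to stink" describes. Below is a minimal, self-contained Python version that averages ranks for ties. The example numbers are invented, and it assumes an energy-style score where more negative means tighter predicted binding, so the scores are negated before being compared with pKi values.

```python
import math

def average_ranks(values):
    """Rank values 1..n (smallest first), giving tied values the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences of at least two values")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one input is constant")
    return cov / math.sqrt(var_x * var_y)

predicted = [-10.2, -9.5, -8.8, -8.1]  # invented docking scores, more negative = better
measured_pki = [8.5, 7.9, 6.1, 5.0]    # invented affinities, higher = tighter binding
print(spearman([-s for s in predicted], measured_pki))  # 1.0
```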
My way of looking at the first two tests is to say that if you have just one molecular modeling package, it is guaranteed to mislead you a fair amount of the time. And you have no way of knowing when it's doing that. If you have more than one program to work with, though, then they are guaranteed to disagree with each other a fair amount of the time, and you have no way of knowing which one of them is right – if either. I'll let the authors have the last word on the third test, and on the software in general:
". . .in the area of rank-ordering or affinity prediction, reliance on a scoring function alone will not provide broadly reliable or useful information. . .This study demonstrates unequivocally that significant improvements are needed before compound scoring by docking algorithms will routinely have a consistent and major impact on lead optimization. . .it is not completely obvious by what means these improvements will arise. . ."
Category: In Silico
September 29, 2005
Posted by Derek
A comment to the last post really gave me the shivers:
"I like to think of modelling as the "silent killer". It is easy to rely on it for quick answers, and easy to forget that there is no substitute for an actual experiment. . .
I remember asking a fellow scientist if a particular molecule performed as hypothesized; the response was: 'I don't know. It did not dock well into the enzyme, so I didn't make it.'"
I've made this point before, but it needs to be made again: molecular modeling is not reality. Most models are not that good, or only good around a limited group of rather similar compounds. If you as a medicinal chemist are crossing out easy-to-make compounds in unexplored chemical space just because the software doesn't like them, you are handcuffing yourself and tying your thumbs together. Stop it, stop it for your own good, or you may never discover anything unexpected or useful.
"The silent killer": I like that phrase a lot. I get the occasional testy e-mail from the computational types when I talk like this, but I'm sticking to my beliefs here. Molecular models based on numerous high-resolution X-ray structures are, I think, sort of useful, sometimes. Models based on only one X-ray structure are to be approached with great caution. And binding models that are just calculated up de novo should be treated as hazardous to your scientific health, unless you have a great deal of evidence to make you think otherwise.
OK, you silicon jockeys, go ahead and flood my in-box. I've earned it.
Category: In Silico
September 28, 2005
Posted by Derek
We medicinal chemists spend our days trying to make small molecules that bind to targets in living systems. Almost all of those targets are proteins of one sort or another, and most of them have binding pockets already built into them, which we're trying to hijack for our own purposes. Molecular modelers try to figure out how these things fit together, but there are still a lot of unknowns in what would seem so basic a process.
I'm willing to bet that most chemists and biologists have a mental picture of a small molecule ligand fitting into a binding site which involves the protein sort of folding down around things - gently biting down on the ligand, as it were. It seems intuitively obvious that a protein's motions would settle down once it complexes with its target molecule.
And like a lot of intuitively obvious things in drug research, that idea appears to be mistaken. There's a recent study in the Journal of Medicinal Chemistry from a group at Michigan that tackles this question in a rigorous manner. They looked through the X-ray crystal structure data banks for proteins that have had high-quality structures determined both with and without small molecules bound in them. After controlling for experimental conditions (the temperature that the X-ray structure was taken at, among other things) and for the way the data were processed, they still had a few dozen closely matched pairs.
What they found was that in most of these structures, at least some of the atoms in and near the binding site are more mobile when there's a ligand bound. At times, the effect was pretty dramatic, with the entire binding site becoming more flexible, weirdly enough. Examples where everything got less mobile were found, but that only happened in a minority of the cases. The proteins the authors studied were scattered across a wide range of structural and functional classes, and there's no reason to think that they hit on an anomalous data set.
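For readers wondering how "more mobile" gets measured in crystal structures: the usual handle is the atomic B-factor (displacement parameter). Because B-factors vary from one data set to another, they are normally normalized within each structure before apo and ligand-bound forms are compared. That approach is assumed here; the study's actual protocol may differ. The Python sketch converts each structure's B-factors to z-scores and reports what fraction of chosen binding-site atoms are relatively more mobile with ligand bound. It assumes the two lists are already matched atom for atom, and the numbers are invented.

```python
import statistics

def normalize_b(b_factors):
    """Convert one structure's B-factors to z-scores so different data sets can be compared."""
    mean = statistics.fmean(b_factors)
    sd = statistics.pstdev(b_factors)
    if sd == 0:
        raise ValueError("B-factors have no spread; cannot normalize")
    return [(b - mean) / sd for b in b_factors]

def fraction_more_mobile(apo_b, holo_b, site_indices):
    """Share of the given binding-site atoms whose normalized B-factor is higher with ligand bound."""
    if len(apo_b) != len(holo_b):
        raise ValueError("apo and holo lists must be matched atom for atom")
    apo_z, holo_z = normalize_b(apo_b), normalize_b(holo_b)
    site = list(site_indices)
    higher = sum(1 for i in site if holo_z[i] > apo_z[i])
    return higher / len(site)

apo = [20.0, 20.0, 20.0, 40.0]   # invented B-factors, no ligand
holo = [20.0, 20.0, 40.0, 40.0]  # invented B-factors, ligand bound
print(fraction_more_mobile(apo, holo, [2, 3]))  # 0.5
```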
So, we're going to have to adjust our mental pictures, and the molecular modelers will have to adjust their simulations. I'd like to know just how many of those in silico models of binding would have predicted this greater flexibility. I fear that the answer is "darn near none of them". We have a long way to go.
Category: In Silico
September 6, 2005
Posted by Derek
I recall a project earlier in my career where we'd all been beating on the same molecular series for quite a while. Many regions of the molecule had been explored, and my urge was often to leave the reservation. I put some time into extending the areas we knew about, but I wanted to go off and make something that didn't look like anything that we'd done before.
Which I did sometimes, and then I'd often get asked: "Why did you make that compound?" My answer was simply "Because no one had ever messed with that area before, and I wanted to see what would happen." Reactions to that approach varied. Some folks found that a perfectly reasonable answer, sufficient by itself. Others didn't care for it much. "You have to have a hypothesis in mind," they'd say. "Are you trying to improve the pharmacokinetics? Fix a metabolic problem? Pick up a binding interaction that you think is out there in the XYZ loop of the protein? You can't just. . .make stuff."
I respected the people in that first group a lot more than I did the ones in the second. I thought then, and think now, that you can just go make stuff. In fact, you not only can, but you should. You probably don't want to spend all your time doing that, but if you never do it at all, you're going to miss the best surprises.
I take issue with the idea that there has to be a specific hypothesis behind every compound. That supposes amounts of knowledge that we just don't have. Most of the time, we don't know why our PK is acting weird, and we're not sure about the metabolic fate of the compounds. And we sure don't know their binding mode well enough to sit at our desks and talk about what amino acids in the protein backbone we're reaching out for. (OK, if you've got half a dozen X-ray structures of your ligands bound in the active site of your target, you have a much better idea. But if your next compound breaks new structural ground, off you may well go into a different binding mode, and half your presuppositions will go, too.)
I like to think that I've come to realize just how ignorant I am in issues of drug discovery. (In case you have any doubt, I'm very ignorant indeed.) But I still hear people confidently sizing up new analog ideas on the blackboard: No, that one won't bind well in the Whoozat region. Doesn't have the right spacing. And that one should be able to reach out to that hydrophobic pocket we all know about. Let's make that one first. (These folks are talking without X-ray structures in hand, mind you.)
Well, if it makes you feel better, then go ahead, I suppose. But this kind of thing is one tiny step up from lucky rabbit feet, for which there is still a market.
Category: In Silico | Who Discovers and Why
August 17, 2004
Posted by Derek
I'm going to take off from another comment, this one from Ron, who asks (in reference to the post two days ago): "would it not be fair to say that cellular biochemistry gets even more complicated the more we learn about it?"
It would indeed be fair. I think that as a scientific field matures it goes through several stages. Brute-force collection of facts and observations comes early on, as you'd figure. Then the theorizing starts, with better and better theories being honed by more targeted experiments. This phase can be mighty lengthy, depending on the depth of the field and the number of outstanding problems it contains. A zillion inconsistent semi-trivialities can take a long time to sort out (think of the mathematical proof of the Four-Color Theorem), as can a smaller number of profound headscratchers (like, say, a reconciliation of quantum mechanics with relativity as they deal with gravity.)
If the general principles discovered are powerful enough, things can get simpler to understand. Think of the host of problems that early 20th-century physics had, many of which resolved themselves as applications of quantum mechanics. Chemistry went through something similar earlier, on a smaller scale, with the adoption of van't Hoff's stereochemical principles. Suddenly, what seemed to be several separate problems turned out to be facets of one explanation: that atoms had regular three-dimensional patterns of bonding to other atoms. (If that sounds too obvious for such emphasis, keep in mind that this notion was fiercely ridiculed and resisted at the time.)
Cell biology is up to its pith helmet in hypotheses, and is nowhere near out of the swamps of fact collection. As in all molecular biology, the sheer number of different systems is making for a real fiesta. Your average cell is a morass of interlocking positive and negative feedback loops, many of which only show up fleetingly, under certain conditions, and in very defined locations. Some general principles have been established, but the number of things that have to be dealt with is still increasing, and I'm not sure when it's going to level out.
For example, the other day a group at Sugen (now Pfizer) published a paper establishing just how many genes there are in mice that code for protein kinase enzymes. By adding phosphoryl groups, these enzymes are extremely important actors in the activation, transport, and modulation of the activities of thousands upon thousands of other proteins, and it turns out that there are exactly 540 of them. (Doubtless there are some variations as they get turned into proteins, but that's how many genes there are.) And that's that.
Now, that earlier discovery of protein phosphorylation as a signaling mechanism was a huge advance, and it has been appropriately rewarded. And knowing just how many different kinase enzymes there are is a step forward, too. But figuring out all the proteins they interact with, and when, and where, and what happens when they do - well, that's first cousin to hard work.
Category: Biological News | In Silico
August 15, 2004
Posted by Derek
Reader Maynard Handley, in a comment to the most recent post below, asks:
". . .how far are we from doing at least a substantial fraction of this stuff in silico? I've read that some amazing computational models of full cells now exist, but even so, this author didn't expect that drugs could be usefully tested computationally until 2030 which seems awfully far out."
I don't know the article that he's referring to, but "awfully far out" pretty much sums up my reaction, too. I just don't think we have enough data to do any real whole-cell modeling yet. It's coming, and perhaps for a few very well-worked-out subsystems we could do it now, but I'm skeptical even of that.
A few days reading the current cell biology literature will illustrate the problem. All sorts of proteins are found, all the time, to be players in systems that no one suspected them of being involved in. Kinases are found to phosphorylate things that no one had seen them do before, lipases are found to accept substrates that no one had realized they could. A given signaling peptide is gradually found to have more uses than a Swiss army knife. We don't even really understand the basic mechanisms (like G-protein-coupled receptor signaling) enough to model them to any useful level.
The process of finding these things out doesn't seem like it's going to end soon, and there have to be many fundamental surprises waiting for us. Modeling the system in their absence is going to be risky - interesting, no doubt, and potentially lucrative (if you find a useful approximation), but risky. It's going to take some pretty convincing stuff for the drug industry to ever depend on it.
And all of this applies to single cells, which come in, naturally, an uncounted variety, each with its own peculiarities, the great majority of which we don't have any clue about. And then you come to the interactions between cells, which are highly significant and (in many ways) a closed book to us at present. If we knew more about these things, we'd be able, for example, to culture human cell lines that acted just like their primary tissue progenitors - but we can't do it, not yet.
No, although I have every belief that these things are susceptible to modeling, I just don't think we'll see it (on a useful scale) any time soon. Over the next twenty years, I'd expect to have some of the easier-to-handle cellular subsystems worked out to give robust in silico treatments, but a whole cell? And all the types of whole cells? Much longer than that. More than that I can't guess.
Category: In Silico
April 15, 2004
Posted by Derek
Thinking about molecular modeling, as I did in the last post, brings up another topic: when you go back to the late 1980s, in the real manic phase of the technological hype, what brings you up short is realizing that these folks were planning on doing all this with 1980s hardware.
That puts things in perspective. Here we are in 2004, and we still can't just sit down and design a drug from first principles. Don't believe anyone who tells you that we can, either - if that were possible, there would be a lot more drugs out there. I'm not saying that molecular modeling never makes a contribution (I know better, and from personal experience.) It's just that it hasn't (yet) caught up to the hallucinations of fifteen or twenty years ago, which is entirely the fault of the people who were doing the hallucinating.
You can make the same comments about other waves of hype that have broken over the pharmaceutical world (combinatorial chemistry comes immediately to mind.) What I'm wondering is: what's the hype of today? There's bound to be a hot new idea that's going to solve our problems, but will end up changed beyond recognition after twenty years of the real world. Any votes on what's going to look faintly ridiculous to us in 2024? As you'd guess, I have some candidates of my own. . .
Category: Drug Industry History | In Silico
April 14, 2004
Posted by Derek
Molecular modeling is a technology with a past. Specifically, it's a past of overoptimistic predictions (often made, to be fair, by people who didn't understand what they were talking about.) Back in the late 1980s, when I started in the drug industry, modeling was going to take over the world and pretty darn soon, too. Several companies were founded to take advantage of this brave new world that had such software in it, and they raised serious money with tales of how they were just going to zzzzzip right to the drug structures. No dead ends, no detours, no cast of thousands - just a few chemists standing by to make the structure as it printed out for them. This has not quite worked out.
For those not in the business, modeling is the attempt to figure out molecular shapes, properties, and interactions by computation. There are many levels, some more successful than others. The ones I'm speaking of involve predicting three-dimensional shapes of molecules (and their target binding sites), and deciding which ones are more likely to fit well. It sounds like just what we need. It also sounds reasonably doable, in the same way that Hercules was probably told at first that he was going to just have to round up a few stray animals.
Predicting the shapes involves modeling the individual chemical bonds, and the interactions as the atoms and functional groups rotate around them, banging into each other or sticking through various forces. Originally, these things were calculated as if they were in interstellar space, with nothing around them. Later (and ever since) a number of methods to add some real-world solvent effects have been tried.
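Here's a cartoon of the kind of terms involved, in Python, with hypothetical parameters rather than values from any real force field: a threefold cosine-series torsion term for rotation about a single bond, and a Coulomb term whose dielectric constant can be set to 1 (the "interstellar space" case) or to roughly 80 to mimic water's screening in the crudest possible way. Real solvent models are much more involved than swapping one constant.

```python
import math

COULOMB_K = 332.06  # kcal*Å/(mol*e^2): charges in units of e, distance in Å, energy in kcal/mol

def torsion_energy(phi_deg, v1=0.0, v2=0.0, v3=2.0):
    """Cosine-series torsion energy (kcal/mol); the v1/v2/v3 values are hypothetical."""
    phi = math.radians(phi_deg)
    return ((v1 / 2) * (1 + math.cos(phi))
            + (v2 / 2) * (1 - math.cos(2 * phi))
            + (v3 / 2) * (1 + math.cos(3 * phi)))

def coulomb_energy(q1, q2, r, dielectric=1.0):
    """Electrostatic energy (kcal/mol) of two point charges r Å apart in a uniform dielectric."""
    return COULOMB_K * q1 * q2 / (dielectric * r)

# Scan the torsion in 30-degree steps and report the lowest-energy angles.
scan = {phi: torsion_energy(phi) for phi in range(0, 360, 30)}
lowest = min(scan.values())
minima = [phi for phi, e in scan.items() if e - lowest < 1e-9]
print(minima)  # [60, 180, 300]

# An ion pair 3 Å apart, in "vacuum" and with crude water-like screening.
print(round(coulomb_energy(1, -1, 3.0), 1))                   # -110.7
print(round(coulomb_energy(1, -1, 3.0, dielectric=80.0), 2))  # -1.38
```

The factor-of-80 swing in the ion-pair energy is one reason the vacuum-phase calculations described above could be so misleading.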
Another set of programs evaluates intermolecular fits, trying to work out the energies in play when a drug molecule slides into its binding site. Many tricky refinements have been added to those packages over the years, too, taking advantage of the latest insights into how various groups stack, pack, and interact.
And often, it just isn't enough. Many times the structures we have for our binding sites aren't accurate - the best ones are from X-ray crystallography, and plenty of good stuff just doesn't crystallize. (There are other cases where the crystal structure doesn't bear much relation to what's going on inside the real system, too, just to keep everyone on their toes.) Modeling goes haywire for all kinds of reasons.
One of the companies that emerged back in the change-the-world era of modeling was Vertex, up in Cambridge. It was founded by Joshua Boger, a Merck chemist who wanted a piece of the new thing and wasn't sure that Merck was taking it seriously enough. Well, coming soon in the Journal of Medicinal Chemistry (it's in the web preprint section now) is a paper from Vertex which gives us all some idea of why things didn't work out quite as planned.
The Vertex guys went back over about 150 cases, and found that in the majority of them, the structure of the small molecule in its binding pocket wasn't the structure you would have predicted as the best (read: lowest-energy.) In many of them, it isn't even close. You'd literally never have picked some of these conformations to start a modeling effort - they look very disfavored, and if you're going to pick things that far from the ground state then there's no end to it. The number of candidate structures grows very rapidly as you move away from the local energy minima.
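Two back-of-the-envelope numbers help show why straying from the ground state hurts so much. First, by the Boltzmann distribution, a conformer sitting ΔE above the minimum is populated, relative to the minimum, by a factor of exp(-ΔE/RT); at room temperature, 3 kcal/mol costs a factor of roughly 160. Second, in a toy additive model where each rotatable bond sits in one anti (0 kcal/mol) or two gauche states, with a gauche penalty of 0.9 kcal/mol borrowed from butane, the number of candidate conformers inside an energy window balloons as the window widens. Both functions are illustrative assumptions, not the Vertex analysis.

```python
import math
from itertools import product

R_KCAL = 0.0019872  # gas constant in kcal/(mol*K)

def boltzmann_factor(delta_e, temperature=298.15):
    """Population of a conformer delta_e kcal/mol above the minimum, relative to the minimum."""
    return math.exp(-delta_e / (R_KCAL * temperature))

def conformers_within(n_bonds, window, gauche_penalty=0.9):
    """Count conformers of a toy molecule whose summed torsional penalty is <= window (kcal/mol).

    Each rotatable bond is anti (0) or one of two gauche states; energies are assumed additive.
    """
    states = (0.0, gauche_penalty, gauche_penalty)
    return sum(1 for combo in product(states, repeat=n_bonds) if sum(combo) <= window + 1e-9)

print(round(1 / boltzmann_factor(3.0)))  # 158
for window in (0, 1, 2, 3):
    print(window, conformers_within(6, window))
# 0 1
# 1 13
# 2 73
# 3 233
```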
We in the business had suspected as much, and everyone knew of an example or two, but this is a quantitative look at just how bad the situation is. When you add in the cases where the binding site changes its conformation unexpectedly in response to the ligand, it's a wonder that any modeling efforts work at all. (Frankly, in my experience, they mostly don't, but I'm willing to stipulate that my experience has been more negative than the average.)
I like to say that molecular modeling is a magic wand, one that we keep waving in the hope that sparks will eventually start to shoot out of it. Someday they will. But there's a lot more hard work ahead, and no shortcuts in sight.
Category: Drug Industry History | In Silico