About this Author
Derek Lowe, an Arkansan by birth, got his BA from Hendrix College and his PhD in organic chemistry from Duke before spending time in Germany on a Humboldt Fellowship on his post-doc. He's worked for several major pharmaceutical companies since 1989 on drug discovery projects against schizophrenia, Alzheimer's, diabetes, osteoporosis and other diseases.
To contact Derek email him directly: derekb.lowe@gmail.com
Twitter: Dereklowe

Category Archives
February 28, 2013
Posted by Derek
I saw this story this morning, about IBM looking for more markets for its Watson information-sifting system (the one that performed so publicly on "Jeopardy"). And this caught my eye for sure:
John Baldoni, senior vice president for technology and science at GlaxoSmithKline, got in touch with I.B.M. shortly after watching Watson’s “Jeopardy” triumph. He was struck that Watson frequently had the right answer, he said, “but what really impressed me was that it so quickly sifted out so many wrong answers.”
That is a huge challenge in drug discovery, which amounts to making a high-stakes bet, over years of testing, on the success of a chemical compound. The failure rate is high. Improving the odds, Mr. Baldoni said, could have a huge payoff economically and medically.
Glaxo and I.B.M. researchers put Watson through a test run. They fed it all the literature on malaria, known anti-malarial drugs and other chemical compounds. Watson correctly identified known anti-malarial drugs, and suggested 15 other compounds as potential drugs to combat malaria. The two companies are now discussing other projects.
“It doesn’t just answer questions, it encourages you to think more widely,” said Catherine E. Peishoff, vice president for computational and structural chemistry at Glaxo. “It essentially says, ‘Look over here, think about this.’ That’s one of the exciting things about this technology.”
Now, without seeing some structures and naming some names, it's completely impossible to say how valuable the Watson suggestions were. But I would very much like to know on what basis these other compounds were suggested: structural similarity? Mechanisms in common? Mechanisms that are in the same pathway, but hadn't been specifically looked at for malaria? Something else entirely? Unfortunately, we're probably not going to be able to find out, unless GSK is forthcoming with more details.
Eventually, there's going to be another, somewhat more disturbing answer to that "what basis?" question. As this Slate article says, we could well get to the point where such systems make discoveries or correlations that are correct, but beyond our ability to figure out. Watson is most certainly not there yet. I don't think anything is, or is really all that close. But that doesn't mean it won't happen.
For a look at what this might be like, see Ted Chiang's story "Catching Crumbs From the Table", which appeared first in Nature, and then, as "The Evolution of Human Science", in his collection Stories of Your Life and Others, which I highly recommend.
Comments (32)
+ TrackBacks (0) | Category: In Silico | Infectious Diseases
February 8, 2013
Posted by Derek
There's a fascinating paper out on the concept of "drug-likeness" that I think every medicinal chemist should have a look at. It would be hard to count the number of publications on this topic over the last ten years or so, but what if we've been kidding ourselves about some of the main points?
The big concept in this area is, of course, the Lipinski criteria, or Rule of Five. Here's what the authors, Peter Kenny and Carlos Montanari of the University of São Paulo, have to say:
No discussion of drug-likeness would be complete without reference to the influential Rule of 5 (Ro5) which is essentially a statement of property distributions for compounds taken into Phase II clinical trials. The focus of Ro5 is oral absorption and the rule neither quantifies the risks of failure associated with non-compliance nor provides guidance as to how sub-optimal characteristics of compliant compounds might be improved. It also raises a number of questions. What is the physicochemical basis of Ro5's asymmetry with respect to hydrogen bond donors and acceptors? Why is calculated octanol/water partition coefficient (ClogP) used to specify Ro5's low polarity limit when the high polarity cut off is defined in terms of numbers of hydrogen bond donors and acceptors? It is possible that these characteristics reflect the relative inability of the octanol/water partitioning system to ‘see’ donors (Fig. 1) and the likelihood that acceptors (especially as defined for Ro5) are more common than donors in pharmaceutically-relevant compounds. The importance of Ro5 is that it raised awareness across the pharmaceutical industry about the relevance of physicochemical properties. The wide acceptance of Ro5 provided other researchers with an incentive to publish analyses of their own data and those who have followed the drug discovery literature over the last decade or so will have become aware of a publication genre that can be described as ‘retrospective data analysis of large proprietary data sets’ or, more succinctly, as ‘Ro5 envy’.
There, fellow med-chemists, doesn't this already sound like something you want to read? Thought so. Here, have some more:
Despite widespread belief that control of fundamental physicochemical properties is important in pharmaceutical design, the correlations between these and ADMET properties may not actually be as strong as is often assumed. The mere existence of a trend is of no interest in drug discovery and strengths of trends must be known if decisions are to be accurately described as data-driven. Although data analysts frequently tout the statistical significance of the trends that their analysis has revealed, weak trends can be statistically significant without being remotely interesting. We might be confident that the coin that lands heads up for 51 % of a billion throws is biased but this knowledge provides little comfort for the person charged with predicting the result of the next throw. Weak trends can be beaten and when powered by enough data, even the feeblest of trends acquires statistical significance.
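The coin example in that passage is easy to put numbers on. Here is a minimal sketch, assuming the usual normal approximation for a test against a fair coin; the function and variable names are mine, not the authors'.

```python
import math


def z_score_for_bias(heads_fraction: float, n_throws: int) -> float:
    """z statistic for testing an observed heads fraction against a fair coin (p = 0.5)."""
    standard_error = math.sqrt(0.5 * 0.5 / n_throws)
    return (heads_fraction - 0.5) / standard_error


n_throws = 1_000_000_000
heads_fraction = 0.51

z = z_score_for_bias(heads_fraction, n_throws)

# The best anyone can do on the next throw is to always call the likelier side.
best_hit_rate = max(heads_fraction, 1.0 - heads_fraction)

print(f"z statistic for bias: {z:.0f}")
print(f"best achievable hit rate on the next throw: {best_hit_rate:.2f}")
```

The z statistic comes out in the hundreds, so the bias is beyond any statistical doubt, and yet the best prediction strategy is still wrong 49% of the time. That is the gap between statistical significance and the strength of a trend.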
So, where are the authors going with all this entertaining invective? (Not that there's anything wrong with that; I'm the last person to complain). They're worried that the transformations that primary drug property data have undergone in the literature have tended to exaggerate the correlations between these properties and the endpoints that we care about. The end result is pernicious:
Correlation inflation becomes an issue when the results of data analysis are used to make real decisions. To restrict values of properties such as lipophilicity more stringently than is justified by trends in the data is to deny one’s own drug-hunting teams room to maneuver while yielding the initiative to hungrier, more agile competitors.
They illustrate this by reference to synthetic data sets, showing how one can get rather different impressions depending on how the numbers are handled along the way. Representing sets of empirical points by using their average values, for example, can cause the final correlations to appear more robust than they really are. That, the authors say, is just what happened in this study from 2006 ("Can we rationally design promiscuous drugs?") and in this one from 2007 ("The influence of drug-like concepts on decision-making in medicinal chemistry"). The complaint is that showing a correlation between cLogP and median compound promiscuity does not imply that there is one between cLogP and compound promiscuity per se. And the authors note that the two papers manage to come to opposite conclusions about the effect of molecular weight, which does make one wonder. The "Escape from flatland" paper from 2009 and the "ADMET rules of thumb" paper from 2008 (mentioned here) also come in for criticism on this point - binning averaged data from a large continuous set and then treating those as real objects for statistical analysis. One's conclusions depend strongly on how many bins one uses. Here's a specific take on that last paper:
The end point of the G2008 analysis is ‘‘a set of simple interpretable ADMET rules of thumb’’ and it is instructive to examine these more closely. Two classifications (ClogP<4 and MW<400 Da; ClogP>4 or MW>400 Da) were created and these were combined with the four ionization state classifications to define eight classes of compound. Each combination of ADMET property and compound class was labeled according to whether the mean value of the ADMET property was lower than, higher than or not significantly different from the average for all compounds. Although the rules of thumb are indeed simple, it is not clear how useful they are in drug discovery. Firstly, the rules only say whether or not differences are significant and not how large they are. Secondly, the rules are irrelevant if the compounds of interest are all in the same class. Thirdly, the rules predict abrupt changes in ADMET properties going from one class to another. For example, the rules predict significantly different aqueous solubility for two neutral compounds with MW of 399 and 401 Da, provided that their ClogP values do not exceed 4. It is instructive to consider how the rules might have differed had values of logP and MW of 5 and 500 Da (or 3 and 300 Da) had been used to define them instead of 4 and 400 Da.
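The general complaint about averaging within bins can be sketched with made-up numbers. What follows is a toy illustration on synthetic data, not a reproduction of the authors' own synthetic sets: the slope, noise level, bin count, and all names are assumptions chosen only to show the effect.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


random.seed(0)
n_points = 10_000
n_bins = 10
x_max = 10.0

# A deliberately weak trend buried in noise (hypothetical "property" vs. "endpoint").
xs = [random.uniform(0.0, x_max) for _ in range(n_points)]
ys = [0.1 * x + random.gauss(0.0, 1.0) for x in xs]

r_raw = pearson_r(xs, ys)

# Bin on x, then replace each bin by its mean x and mean y.
bins = [[] for _ in range(n_bins)]
for x, y in zip(xs, ys):
    index = min(int(x / x_max * n_bins), n_bins - 1)
    bins[index].append((x, y))

bin_mean_x = [sum(x for x, _ in b) / len(b) for b in bins]
bin_mean_y = [sum(y for _, y in b) / len(b) for b in bins]

r_binned = pearson_r(bin_mean_x, bin_mean_y)

print(f"r on primary data: {r_raw:.2f}")
print(f"r on bin means:    {r_binned:.2f}")
```

Averaging throws away the scatter inside each bin, so the bin means line up far more neatly than the individual compounds ever did. Changing `n_bins` changes the picture again, which is the point about conclusions depending on how many bins one uses.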
These problems also occur in graphical representations of all these data, as you'd imagine, and the authors show several of these that they object to. A particular example is this paper from 2010 ("Getting physical in drug discovery"). Three data sets, whose correlations in their primary data do not vary significantly, generate very different looking bar charts. And that leads to this comment:
Both the MR2009 and HY2010 studies note the simplicity of the relationships that the analysis has revealed. Given that drug discovery would appear to be anything but simple, the simplicity of a drug-likeness model could actually be taken as evidence for its irrelevance to drug discovery. The number of aromatic rings in a molecule can be reduced by eliminating rings or by eliminating aromaticity and the two cases appear to be treated as equivalent in both the MR2009 and HY2010 studies. Using the mnemonic suggested in MR2009 one might expect to make a compound more developable by replacing a benzene ring with cyclohexadiene or benzoquinone.
The authors wind up by emphasizing that they're not saying that things like lipophilicity, aromaticity, molecular weight and so on are unimportant - far from it. What they're saying, though, is that we need to be aware of how strong these correlations really are so that we don't fool ourselves into thinking that we're addressing our problems, when we really aren't. We might want to stop looking for huge, universally applicable sets of rules and take what we can get in smaller, local data sets within a given series of compounds. The paper ends with a set of recommendations for authors and editors - among them, always making primary data sets part of the supplementary material, not relying on purely graphical representations to make statistical points, and a number of more stringent criteria for evaluating data that have been partitioned into bins. They say that they hope that their paper "stimulates debate", and I think it should do just that. It's certainly given me a lot of things to think about!
Comments (13)
+ TrackBacks (0) | Category: Drug Assays | Drug Development | In Silico | The Scientific Literature
January 30, 2013
Posted by Derek
Here are some angry views that I don't necessarily endorse, but I can't say that they're completely wrong, either. A programmer bids an angry farewell to the bioinformatics world:
Bioinformatics is an attempt to make molecular biology relevant to reality. All the molecular biologists, devoid of skills beyond those of a laboratory technician, cried out for the mathematicians and programmers to magically extract science from their mountain of shitty results.
And so the programmers descended and built giant databases where huge numbers of shitty results could be searched quickly. They wrote algorithms to organize shitty results into trees and make pretty graphs of them, and the molecular biologists carefully avoided telling the programmers the actual quality of the results. When it became obvious to everyone involved that a class of results was worthless, such as microarray data, there was a rush of handwaving about “not really quantitative, but we can draw qualitative conclusions” followed by a hasty switch to a new technique that had not yet been proved worthless.
And the databases grew, and everyone annotated their data by searching the databases, then submitted in turn. No one seems to have pointed out that this makes your database a reflection of your database, not a reflection of reality. Pull out an annotation in GenBank today and it’s not very long odds that it’s completely wrong.
That's unfair to molecular biologists, but is it unfair to the state of bioinformatic databases? Comments welcome. . .
Update: more comments on this at Ycombinator.
Comments (62)
+ TrackBacks (0) | Category: Biological News | In Silico
January 28, 2013
Posted by Derek
We medicinal chemists talk a good game when it comes to the hydrophobic effect. It's the way that non-water-soluble molecules (or parts of molecules) like to associate with each other, right? Sure thing. And it works because of. . .well, van der Waals forces. Or displacement of water molecules from protein surfaces. Or entropic effects. Or all of those, plus some other stuff that's, um, complicated to explain. Something like that.
Here's a paper in Angewandte Chemie that really bears down on the topic. The authors study the binding of simple ligands to thermolysin, a well-worked-out system for which very high-resolution X-ray structures are available. And what they find is, well, that things really are complicated to explain:
In summary, there are no universally valid reasons why the hydrophobic effect should be predominantly “entropic” or “enthalpic”; small structural changes in the binding features of water molecules on the molecular level determine whether hydrophobic binding is enthalpically or entropically driven.
Admittedly, this study reaches the limits of experimental accuracy accomplishable in contemporary protein–ligand structural work. . .Surprising pairwise systematic changes in the thermodynamic data are experienced for complexes of related ligands, and they are convincingly well reflected by the structural properties. The present study unravels small but important details. Computational methods simulate molecular properties at the atomic level, and are usually determined by the summation of many small details. However, details such as those observed here are usually not regarded by these computational methods as relevant, simply because we are not fully aware of their importance for protein–ligand binding, structure–activity relationships, and rational drug design in general. . .
I think that there are a lot of things in this area of which we're not fully aware. There are many others that we treat as unified phenomena, because we've given them names that make us imagine that they are. The hydrophobic effect is definitely one of these - George Whitesides is right when he says that there are many of them. But when all of these effects, on closer inspection, break down into tiny, shifting, tricky arrays of conflicting components, can you blame us for simplifying?
Comments (13)
+ TrackBacks (0) | Category: "Me Too" Drugs | Chemical News | In Silico
January 17, 2013
Posted by Derek
Here's a recent paper in J. Med. Chem. on halogen bonding in medicinal chemistry. I find the topic interesting, because it's an effect that certainly appears to be real, but is rarely (if ever) exploited in any kind of systematic way.
Halogens, especially the lighter fluorine and chlorine, are widely used substituents in medicinal chemistry. Until recently, they were merely perceived as hydrophobic moieties and Lewis bases in accordance with their electronegativities. Much in contrast to this perception, compounds containing chlorine, bromine, or iodine can also form directed close contacts of the type R–X···Y–R′, where the halogen X acts as a Lewis acid and Y can be any electron donor moiety. . .
What seems to be happening is that the electron density around the halogen atom is not as smooth as most of us picture it. You'd imagine a solid cloud of electrons around the bromine atom of a bromoaromatic, but in reality, there seems to be a region of slight positive charge (the "sigma hole") out on the far end. (As a side effect, this gives you more of a circular stripe of negative charge as well). Both these effects have been observed experimentally.
Now, you're not going to see this with fluorine; that one is more like most of us picture it (and to be honest, fluorine's weird enough already). But as you get heavier, things become more pronounced. That gives me (and probably a lot of you) an uneasy feeling, because traditionally we've been leery of putting the heavier halogens into our molecules. "Too much weight and too much hydrophobicity for too little payback" has been the usual thinking, and often that's true. But it seems that these substituents can actually earn out their advance in some cases, and we should be ready to exploit those, because we need all the help we can get.
Interestingly, you can increase the effect by adding more fluorines to the haloaromatic, which emphasizes the sigma hole. So you have that option, or you can take a deep breath, close your eyes, and consider. . .iodos:
Interestingly, the introduction of two fluorines into a chlorobenzene scaffold makes the halogen bond strength comparable to that of unsubstituted bromobenzene, and 1,3-difluoro-5-bromobenzene and unsubstituted iodobenzene also have a comparable halogen bond strength. While bromo and chloro groups are widely employed substituents in current medicinal chemistry, iodo groups are often perceived as problematic. Substituting an iodoarene core by a substituted bromoarene scaffold might therefore be a feasible strategy to retain affinity by tuning the Br···LB (Lewis base) halogen bond to similar levels as the original I···LB halogen bond.
As someone who values ligand efficiency, the idea of putting in an iodine gives me the shivers. A fluoro-bromo combo doesn't seem much more attractive, although almost anything looks good compared to a single atom that adds 127 mass units at a single whack. But I might have to learn to love one someday.
The paper includes a number of examples of groups that seem to be capable of interacting with halogens, and some specific success stories from recent literature. It's probably worth thinking about these things similarly to the way we think about hydrogen bonds - valuable, but hard to obtain on purpose. They're both directional, and trying to pick up either one can cause more harm than good if you miss. But keep an eye out for something in your binding site that might like a bit of positive charge poking at it. Because I can bet that you never thought to address it with a bromine atom!
Update: in the spirit of scientific inquiry, I've just sent in an iodo intermediate from my current work for testing in the primary assay. It's not something I would have considered doing otherwise, but if anyone gives me any grief, I'll tell them that it's 2013 already and I'm following the latest trends in medicinal chemistry.
Comments (16)
+ TrackBacks (0) | Category: Chemical Biology | Chemical News | In Silico
January 14, 2013
Posted by Derek
Virtual screening is what many people outside the field are thinking of when they talk about the use of computational models in drug discovery. There are many other places where modeling can pitch in, but one of the dreams has always been to take a given protein target and a long list of chemical structures, hit the button, and come back to a sorted list of which ones are going to bind well. That list could be as long as "every compound in our screening deck", or "every available compound in the commercial catalogs", or "everything that our chemists can think of and draw on a whiteboard, whether it's ever been made or not". So these virtual collections can get rather large, but that's what computer power is for, right?
Despite what some people might think, we're not exactly there yet. But we're not exactly not there, either, if you know what I mean. Like much of drug discovery, it's in that awkward age. Virtual screening is certainly real, and it can be useful, but it can also waste your time if you're not careful. And that's where this paper comes in - it's a fine overview of the issues that you need to think about if you're interested in trying this technique.
For one thing, you need to decide if you're going to be taking a drug target whose structure you know pretty well and modeling a bunch of small compounds into it, or if you're taking a bunch of small molecules whose activities you know pretty well and trying to find more compounds like them. These two approaches call for some different methods, and have different potential problems. The second one, especially in the older literature, often goes under the name of QSAR, for quantitative structure-activity relationship. But as the authors point out, "virtual screening" as a name has some advantages, because many people have been burned by things labeled "QSAR" over the years. They're also being used for different purposes, which is probably a good thing:
A fundamental assumption inherent in QSAR and pharmacophore-based VS is the “similar property principle”, that is, the general observation that molecules with similar structure are likely to have similar properties. While this assumption holds true in many cases, there are many counter-examples in the field of QSAR which lead to erroneous predictions and can shake the confidence of the experimental community in the prospective utility of QSAR modeling. Interestingly, this has not yet (or not to the same extent) been the case with VS. The difference is that QSAR is typically employed to evaluate a limited number of synthetic candidates, where errors are more noticeable and costly. However, when these techniques are applied on a massive scale to screen large chemical libraries, errors are much more easily tolerated as the objective is to increase the number and diversity of hits over what would have been otherwise a random selection.
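For anyone who hasn't seen the "similar property principle" put to work, here is a minimal sketch of a ligand-based ranking using Tanimoto similarity. It assumes each compound has already been reduced to a set of fingerprint feature IDs; the feature sets and compound names below are invented for illustration, and a real workflow would generate fingerprints with a cheminformatics toolkit.

```python
def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto (Jaccard) similarity between two sets of fingerprint features."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Hypothetical feature sets: a known active and a tiny "library" to rank against it.
known_active = frozenset({1, 4, 7, 9, 12, 15})
library = {
    "cpd_A": frozenset({1, 4, 7, 9, 12, 16}),  # close analog
    "cpd_B": frozenset({1, 4, 20, 21, 22}),    # shares a small substructure
    "cpd_C": frozenset({30, 31, 32}),          # unrelated chemotype
}

# Rank the library from most to least similar to the known active.
ranked = sorted(
    library,
    key=lambda name: tanimoto(known_active, library[name]),
    reverse=True,
)

for name in ranked:
    print(name, round(tanimoto(known_active, library[name]), 3))
```

The ranking is only as good as the assumption behind it. Two compounds can score as near-identical and still behave very differently in an assay, which is exactly the sort of counter-example the authors are warning about.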
The authors extensively cover the previous literature on computational screening - successful examples, warnings of trouble, theoretical predictions both optimistic and pessimistic. It would take you quite a while to assemble this list on your own, so that by itself recommends this paper to anyone interested in the area. But they go on to codify the various pitfalls to look out for.
"Such as expecting it to work", the cynics in the audience will remark. I say that sort of thing under my breath for time to time myself - or audibly, as the case may be. But this is the sort of paper that I can really endorse, because it's a completely realistic view of what you can expect with current technology. And that comes down to "Less than you want", but still "More than you might think". You're not going to able to feed the software the complete pile of all the chemical supplier catalogs and come back to find the nanomolar leads printing out. But you can get pointers toward parts of chemical space that you wouldn't have thought about (or wouldn't have been able to physically screen).
One tricky part is that when a virtual screening effort is successful (for whatever value you assign to "success"), it can be hard to tell why, and likewise for failures. There are so many places where things can disconnect - proteins are mobile, and small molecules even more so, and accounting for these conformational ensembles is not trivial. Binding interactions are not always well understood, or well modeled. Water molecules are pesky, but can be vitally important. You might have picked inappropriate controls (positive or negative), or be weighting the various computed factors in the wrong way. Either of those will send your calculation further and further off the rails.
And so on. The paper goes into detail on these possibilities and more; I highly recommend it for anyone getting into virtual screening (or for anyone already doing it, to keep the troubleshooting guide in one handy place).
Comments (2)
+ TrackBacks (0) | Category: In Silico
January 10, 2013
Posted by Derek
There's a paper out in Nature with the provocative title of "Automated Design of Ligands to Polypharmacological Profiles". Admittedly, to someone outside my own field of medicinal chemistry, that probably sounds about as dry as the Atacama desert, but it got my attention.
It's a large multi-center contribution, but the principal authors are Andrew Hopkins at Dundee and Bryan Roth at UNC-Chapel Hill. Using James Black's principle that the best place to find a new drug is to start with an old drug, what they're doing here is taking known ligands and running through a machine-learning process to see if they can introduce new activities into them. Now, those of us who spend time trying to take out other activities might wonder what good this is, but there are some good reasons: for one thing, many CNS agents are polypharmacological to start with. And there certainly are situations where you want dual-acting compounds, CNS or not, which can be a major challenge. And read on - you can run things to get selectivity, too.
So how well does their technique work? The example they give starts with the cholinesterase inhibitor donepezil (sold as Aricept), which has a perfectly reasonable med-chem look to its structure. The group's prediction, using their current models, was that it had a reasonable chance of having D4 dopaminergic activity, but probably not D2 (predictions that were borne out by experiment, and might have something to do with whatever activity Aricept has for Alzheimer's). I'll let them describe the process:
We tested our method by evolving the structure of donepezil with the dual objectives of improving D2 activity and achieving blood–brain barrier penetration. In our approach the desired multi-objective profile is defined a priori and then expressed as a point in multi-dimensional space termed ‘the ideal achievement point’. In this first example the objectives were simply defined as two target properties and therefore the space has two dimensions. Each dimension is defined by a Bayesian score for the predicted activity and a combined score that describes the absorption, distribution, metabolism and excretion (ADME) properties suitable for blood–brain barrier penetration (D2 score = 100, ADME score = 50). We then generated alternative chemical structures by a set of structural transformations using donepezil as the starting structure. The population was subsequently enumerated by applying a set of transformations to the parent compound(s) of each generation. In contrast to rules-based or synthetic-reaction-based approaches for generating chemical structures, we used a knowledge-based approach by mining the medicinal chemistry literature. By deriving structural transformations from medicinal chemistry, we attempted to mimic the creative design process.
Hmm. They rank these compounds in multi-dimensional space, according to distance from the ideal end point, filter them for chemical novelty, Lipinski criteria, etc., and then use the best structures as starting points for another round. This continues until you reach close enough to the desired point, or until you dead-end on improvement. In this case, they ended up with fairly active D2 compounds, by going to a lactam in the five-membered ring, lengthening the chain a bit, and going to an arylpiperazine on the end. They also predicted, though, that these compounds would hit a number of other targets, which they indeed did on testing.
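The ranking step is simple enough to sketch. This is a toy version under stated assumptions: the ideal point (D2 score = 100, ADME score = 50) is the one quoted above, but the candidate names, their scores, and the plain Euclidean distance are my own stand-ins, and the paper's Bayesian models, structural transformations, and novelty filters are not reproduced here.

```python
import math

# "Ideal achievement point" as quoted from the paper: (D2 score, ADME score).
IDEAL_POINT = (100.0, 50.0)


def distance_to_ideal(scores, ideal=IDEAL_POINT):
    """Euclidean distance from a compound's score vector to the ideal point."""
    return math.dist(scores, ideal)


def select_parents(population, keep):
    """Return the names of the `keep` compounds closest to the ideal point."""
    ordered = sorted(population, key=lambda name: distance_to_ideal(population[name]))
    return ordered[:keep]


# Hypothetical generation of analogs with made-up (D2 score, ADME score) pairs.
generation = {
    "analog_1": (40.0, 45.0),
    "analog_2": (85.0, 30.0),
    "analog_3": (95.0, 48.0),
    "analog_4": (60.0, 50.0),
}

# The survivors would seed the next round of structural transformations.
parents = select_parents(generation, keep=2)
print(parents)
```

In the real process each surviving parent gets a new set of literature-derived transformations applied to it, and the loop repeats until the population gets close enough to the ideal point or stops improving.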
How about something a bit more. . .targeted? They tried taking these new compounds through another design loop, this time trying to get rid of all the alpha-adrenergic activity they'd picked up, while maintaining the 5-HT1A and dopamine receptor activity they now had. They tried it both ways - running the algorithms with filtration of the alpha-active compounds at each stage, and without. Interestingly, both optimizations came up with very similar compounds, differing only out on the arylpiperazine end. The alpha-active series wanted ortho-methoxyphenyl on the piperazine, while the alpha-inactive series wanted 2-pyridyl. These preferences were confirmed by experiment as well. Some of you who've worked on adrenergics might be saying "Well, yeah, that's what the receptors are already known to prefer, so what's the news here?" But keep in mind, what the receptors are known to prefer is what's been programmed into this process, so of course, that's what it's going to recapitulate. The idea is for the program to keep track of all the known activities - the huge potential SAR spreadsheet - so you don't have to try to do it yourself, with your own grey matter.
The last example asks whether, starting from donepezil, potent and selective D4 compounds could be evolved. I'm going to reproduce the figure from the paper here, to give an idea of the synthetic transformations involved:

So, donepezil (compound 1) is 614 nM against D4, and after a few rounds of optimization, you get structure 13, which is 9 nM. Not bad! Then if you take 13 as a starting point, and select for structural novelty along the way, you get 18 (five micromolar against D4), 20, 21, and (S)-27 (which is 90 nM at D4). All of these compounds picked up a great deal more selectivity for D4 compared to the earlier donepezil-derived scaffolds as well.
Well, then, are we all out of what jobs we have left? Not just yet. You'll note that the group picked GPCRs as a field to work in, partly because there's a tremendous amount known about their SAR preferences and cross-functional selectivities. And even so, of the 800 predictions made in the course of this work, the authors claim about a 75% success rate - pretty impressive, but not the All-Seeing Eye, quite yet. I'd be quite interested in seeing these algorithms tried out on kinase inhibitors, another area with a wealth of such data. But if you're dwelling among the untrodden ways, like Wordsworth's Lucy, then you're pretty much on your own, I'd say, unless you're looking to add in some activity in one of the more well-worked-out classes.
But knowledge piles up, doesn't it? This approach is the sort of thing that will not be going away, and should be getting more powerful and useful as time goes on. I have no trouble picturing an eventual future where such algorithms do a lot of the grunt work of drug discovery, but I don't foresee that happening for a while yet. Unless, of course, you do GPCR ligand drug discovery. In that case, I'd be contacting the authors of this paper as soon as possible, because this looks like something you need to be aware of.
Comments (12)
+ TrackBacks (0) | Category: Drug Assays | In Silico | The Central Nervous System
December 11, 2012
Posted by Derek
I noticed this piece on Slate (originally published in New Scientist) about Kaggle, a company that's working on data-prediction algorithms. Actually, it might be more accurate to say that they're asking other people to work on data-prediction algorithms, since they structure their tasks as a series of open challenges, inviting all comers to submit their best shots via whatever computational technique they think appropriate.
PA: How exactly do these competitions work?
JH: They rely on techniques like data mining and machine learning to predict future trends from current data. Companies, governments, and researchers present data sets and problems, and offer prize money for the best solutions. Anyone can enter: We have nearly 64,000 registered users. We've discovered that creative-data scientists can solve problems in every field better than experts in those fields can.
PA: These competitions deal with very specialized subjects. Do experts enter?
JH: Oh yes. Every time a new competition comes out, the experts say: "We've built a whole industry around this. We know the answers." And after a couple of weeks, they get blown out of the water.
I have a real approach-avoidance conflict with this sort of thing. I tend to root for outsiders and underdogs, but naturally enough, when they're coming to blow up what I feel is my own field of expertise, that's a different story, right? And that's just what this looks like: the Merck Molecular Activity Challenge, which took place earlier this fall. Merck seems to have offered up a list of compounds of known activity in a given assay, and asked people to see if they could recapitulate the data through simulation.
Looking at the data that were made available, I see that there's a training set and a test set. They're furnished as a long run of molecular descriptors, but the descriptors themselves are opaque, no doubt deliberately (Merck was not interested in causing themselves any future IP problems with this exercise). The winning team was a group of machine-learning specialists from the University of Toronto and the University of Washington. If you'd like to know a bit more about how they did it, here you go. No doubt some of you will be able to make more of their description than I did.
But I would be very interested in hearing some more details on the other end of things. How did the folks at Merck feel about the results, with the doors closed and the speaker phone turned off? Was it better or worse than what they could have come up with themselves? Are they interested enough in the winning techniques that they've approached the high-ranking groups with offers to work on virtual screening techniques? Because that's what this is all about: running a (comparatively small) test set of real molecules past a target, and then switching to simulations and screening as much of small molecule chemical space as you can computationally stand. Virtual screening is always promising, always cost-attractive, and sometimes quite useful. But you never quite know when that utility is going to manifest itself, and when it's going to be another goose hunt. It's a longstanding goal of computational drug design, for good reason.
So, how good was this one? That also depends on the data set that was used, of course. All of these algorithm-hunting methods can face a crucial dependence on the training sets used, and their relations to the real data. Never was "Garbage In, Garbage Out" more appropriate. If you feed in numbers that are intrinsically too well-behaved, you can emerge with a set of rules that look rock-solid, but will take you completely off into the weeds when faced with a more real-world situation. And if you go to the other extreme, starting with wooly multi-binding-mode SAR with a lot of outliers and singletons in it, you can end up fitting equations to noise and fantasies. That does no one any good, either.
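To make the "fitting equations to noise" worry concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the descriptors are random numbers, the "activity" labels are coin flips, and the model is a plain nearest-neighbour memorizer. None of it reflects the Merck data or the winning team's method.

```python
import random

def nearest_label(x, train):
    """Return the label of the training point closest to x (1-nearest-neighbour)."""
    best = min(train, key=lambda pair: sum((a - b) ** 2 for a, b in zip(x, pair[0])))
    return best[1]

def accuracy(points, train):
    """Fraction of points whose label matches the nearest training point's label."""
    hits = sum(nearest_label(x, train) == y for x, y in points)
    return hits / len(points)

def make_noise_set(rng, n, n_descriptors=10):
    """Random descriptors with coin-flip 'activity' labels: no real SAR at all."""
    return [([rng.random() for _ in range(n_descriptors)], rng.randint(0, 1))
            for _ in range(n)]

rng = random.Random(0)
train = make_noise_set(rng, 200)
test = make_noise_set(rng, 200)

train_acc = accuracy(train, train)  # the model has simply memorized these points
test_acc = accuracy(test, train)    # held-out points it has never seen
print(f"training accuracy: {train_acc:.2f}, test accuracy: {test_acc:.2f}")
```

The training score looks perfect because each point is its own nearest neighbour, while the held-out score should hover around a coin flip. That gap is the reason a competition like this one keeps a separate test set.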
Back last year, I talked about the types of journal article titles that make me keep on scrolling past them, and invited more. One of the comments suggested "New and Original strategies for Predictive Chemistry: Why use knowledge when fifty cross-correlated molecular descriptors and a consensus of over-fit models will tell you the same thing?". What I'd like to know is, was this the right title for this work, or not?
Comments (28)
+ TrackBacks (0) | Category: In Silico
November 26, 2012
Posted by Derek
As mentioned the other day, this will be a post for people to ask questions directly to Philip Skinner (SDBioBrit) of Perkin-Elmer/Cambridgesoft. He's doing technical support for ChemDraw, ChemDraw4Excel, E-Notebook, Inventory, Registration, Spotfire, Chem3D, etc., and will be monitoring the comments and posting there. Hope it helps some people out!
Note - he's out on the West Coast of the US, so allow the poor guy time to get up and get some coffee in him!
Comments (76)
+ TrackBacks (0) | Category: Chemical News | In Silico
August 22, 2012
Posted by Derek
Hang around a bunch of medicinal chemists (no, really, it's more fun than you'd think) and you're bound to hear discussion of cLogP. For the chemists in the crowd, I should warn you that I'm about to say nasty things about it.
For the nonchemists in the crowd, logP is a measure of how greasy (or how polar) a compound is. It's based on a partition experiment: shake up a measured amount of a compound with defined volumes of water and n-octanol, a rather greasy solvent which I've never seen referred to in any other experimental technique. Then measure how much of the compound ends up in each layer, and take the log of the octanol/water ratio. So if a thousand times as much compound goes into the octanol as goes into the water (which for drug substances is quite common, in fact, pretty good), then the logP is 3. The reason we care about this is that really greasy compounds (and one can go up to 4, 5, 6, and possibly beyond), have problems. They tend to dissolve poorly in the gut, have problems crossing membranes in living systems, get metabolized extensively in the liver, and stick to a lot of proteins that you'd rather they didn't stick to. Fewer high-logP compounds are capable of making it as drugs.
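For anyone who wants the arithmetic spelled out, here is a minimal Python sketch of that definition. The function name and the use of concentrations (rather than raw amounts in defined volumes) are my own choices for illustration.

```python
import math

def log_p(conc_octanol, conc_water):
    """logP: log10 of the octanol/water concentration ratio at equilibrium."""
    if conc_octanol <= 0 or conc_water <= 0:
        raise ValueError("both concentrations must be positive")
    return math.log10(conc_octanol / conc_water)

# A thousand times as much compound in the octanol layer as in the water layer
print(round(log_p(1000.0, 1.0), 2))  # 3.0
```

Each additional unit of logP is another factor of ten in favor of the octanol layer, which is why a move from 3 to 5 is a much bigger deal than it sounds.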
So far, so good. But there are complications. For one thing, that description above ignores the pH of the water solution, and for charged compounds that's a big factor. logD is the term for the distribution of all species (ionized or not), and logD at pH 7.4 (physiological) is a valuable measurement if you've got the possibility of a charged species (and plenty of drug molecules do, thanks to basic amines, carboxylic acids, etc.). But there are bigger problems.
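A rough sketch of how pH enters, in Python. This uses the standard textbook approximation for a compound with a single ionizable group, and it assumes that only the neutral form partitions into octanol, which is a simplification. The example pKa and logP values are hypothetical.

```python
import math

def log_d(log_p, pka, ph=7.4, kind="base"):
    """Approximate logD for a monoprotic acid or base.

    Assumes only the neutral species enters the octanol layer.
    """
    if kind == "base":
        ionized_ratio = 10 ** (pka - ph)   # [BH+]/[B] in the water layer
    elif kind == "acid":
        ionized_ratio = 10 ** (ph - pka)   # [A-]/[HA] in the water layer
    else:
        raise ValueError("kind must be 'base' or 'acid'")
    return log_p - math.log10(1 + ionized_ratio)

# Hypothetical basic amine: logP 3.0, pKa 9.4, measured at physiological pH
print(round(log_d(3.0, 9.4), 2))
```

For that made-up amine, about two log units disappear at pH 7.4 because roughly 99% of the compound is sitting in the water as the charged form. A compound with no ionizable group in range gives a logD that is essentially its logP.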
You'll notice that the experiment outlined in the second paragraph could fairly be described as tedious. In fact, I have never seen it performed. Not once, and I'll bet that the majority of medicinal chemists never have, either. And it's not like it's just being done out of my sight; there's no roomful of automated octanol/water extraction machines clanking away in the basement. I should note that there are other higher-throughput experimental techniques (such as HPLC retention times) that also correlate with logP and have been used to generate real numbers, but even those don't account for the great majority of the numbers that we talk about all the time. So how do we manage to do that?
It has to do with a sleight of hand I've performed while writing the above sections, which some of you have probably already noticed. Most of the time, when we talk about logP values in early drug discovery, we're talking about cLogP. That "c" stands for calculated. There are several programs that estimate logP based on known values for different rings and functional groups, and with different algorithms for combining and interpolating them. In my experience, almost all logP numbers that get thrown around are from these tools; no octanol is involved.
And sometimes that worries me a bit. Not all of these programs will tell you how solid those estimates are. And even if they will, not all chemists will bother to check. If your structure is quite close to something that's been measured, then fine, the estimate is bound to be pretty good. But what if you feed in a heterocycle that's not in the lookup table? The program will spit out a number, that's what. But it may not be a very good number, even if it goes out to two decimal places. I can't even remember when I might have last seen a cLogP value with a range on it, or any other suggestion that it might be a bit fuzzy.
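Here is a deliberately crude Python sketch of the failure mode. It is a toy additive model, not how any real cLogP program works, and every fragment value in the table is invented for illustration. The only point is that the function returns a tidy two-decimal number whether or not it recognized every piece of the structure.

```python
# HYPOTHETICAL fragment contributions, invented purely for illustration.
TOY_FRAGMENT_VALUES = {"phenyl": 1.9, "methyl": 0.5, "hydroxyl": -1.0, "amide": -1.5}
DEFAULT_GUESS = 0.0  # what this toy silently uses for a fragment it has never seen

def toy_clogp(fragments):
    """Sum fragment contributions; also report which fragments were not in the table."""
    total = 0.0
    unknown = []
    for frag in fragments:
        if frag in TOY_FRAGMENT_VALUES:
            total += TOY_FRAGMENT_VALUES[frag]
        else:
            total += DEFAULT_GUESS
            unknown.append(frag)
    return round(total, 2), unknown

estimate, unknown = toy_clogp(["phenyl", "methyl", "odd_heterocycle"])
print(estimate, unknown)  # 2.4 ['odd_heterocycle']
```

If the caller only looks at the first return value, the 2.4 looks exactly as trustworthy as a number built entirely from measured fragments. Surfacing the list of unrecognized pieces is one cheap way to attach a "this may be fuzzy" flag to the estimate.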
There are more subtle problems, too - I've seen some oddities with substitutions on saturated heterocyclic rings (morpholine, etc.) that didn't quite seem to make sense. Many chemists get these numbers, look at them quizzically, and say "Hmm, I didn't know that those things sorted out like that. Live and learn!" In other words, they take the calculated values as reality. I've even had people defend these numbers by explaining to me patiently that these are, after all, calculated logP values, and the calculated log P values rank-order like so, and what exactly is my problem? And while it's hard to argue with that, we are not putting our compounds into the simulated stomachs of rationalized rodents. Real-world decisions can be made based on numbers that do not come from the real world.
Comments (38)
+ TrackBacks (0) | Category: Drug Assays | In Silico | Life in the Drug Labs
June 12, 2012
Posted by Derek
One of the major worries during a clinical trial is toxicity, naturally. There are thousands of reasons a compound might cause problems, and you can be sure that we don't have a good handle on most of them. We screen for what we know about (such as hERG channels for cardiovascular trouble), and we watch closely for signs of everything else. But when slow-building low-incidence toxicity takes your compound out late in the clinic, it's always very painful indeed.
Anything that helps to clarify that part of the business is big news, and potentially worth a lot. But advances in clinical toxicology come on very slowly, because the only thing worse than not knowing what you'll find is thinking that you know and being wrong. A new paper in Nature highlights just this problem. The authors have a structural-similarity algorithm to try to test new compounds against known toxicities in previously tested compounds, which (as you can imagine) is an approach that's been tried in many different forms over the years. So how does this one fare?
To test their computational approach, Lounkine et al. used it to estimate the binding affinities of a comprehensive set of 656 approved drugs for 73 biological targets. They identified 1,644 possible drug–target interactions, of which 403 were already recorded in ChEMBL, a publicly available database of biologically active compounds. However, because the authors had used this database as a training set for their model, these predictions were not really indicative of the model's effectiveness, and so were not considered further.
A further 348 of the remaining 1,241 predictions were found in other databases (which the authors hadn't used as training sets), leaving 893 predictions, 694 of which were then tested experimentally. The authors found that 151 of these predicted drug–target interactions were genuine. So, of the 1,241 predictions not in ChEMBL, 499 were true; 543 were false; and 199 remain to be tested. Many of the newly discovered drug–target interactions would not have been predicted using conventional computational methods that calculate the strength of drug–target binding interactions based on the structures of the ligand and of the target's binding site.
Now, some of their predictions have turned out to be surprising and accurate. Their technique identified chlorotrianisene, for example, as a COX-1 inhibitor, and tests show that it seems to be, which wasn't known at all. The classic antihistamine diphenhydramine turns out to be active at the serotonin transporter. It's also interesting to see what known drugs light up the side effect assays the worst. Looking at their figures, it would seem that the topical antiseptic chlorhexidine (a membrane disruptor) is active all over the place. Another guanidine-containing compound, tegaserod, is also high up the list. Other promiscuous compounds are the old antipsychotic fluspirilene and the semisynthetic antibiotic rifaximin. (That last one illustrates one of the problems with this approach, which the authors take care to point out: toxicity depends on exposure. The dose makes the poison, and all that. Rifaximin is very poorly absorbed, and it would take very unusual dosing, like with a power drill, to get it to hit targets in a place like the central nervous system, even if this technique flags them).
The biggest problem with this whole approach is also highlighted by the authors, to their credit. You can see from those figures above that about half of the potentially toxic interactions it finds aren't real, and you can be sure that there are plenty of false negatives, too. So this is nowhere near ready to replace real-world testing; nothing is. But where it could be useful is in pointing out things to test with real-world assays, activities that you probably hadn't considered at all.
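The "about half" figure can be checked directly from the tallies quoted above. A short Python sketch, using only the numbers in that quoted passage (the variable names are mine):

```python
total_predictions = 1644
already_in_chembl = 403             # overlapped the training set, so set aside
confirmed_in_other_databases = 348
tested_experimentally = 694
confirmed_by_experiment = 151

novel = total_predictions - already_in_chembl                        # 1241
unchecked = novel - confirmed_in_other_databases                     # 893
true_hits = confirmed_in_other_databases + confirmed_by_experiment   # 499
false_hits = tested_experimentally - confirmed_by_experiment         # 543
untested = unchecked - tested_experimentally                         # 199

resolved = true_hits + false_hits                                    # 1042
print(f"true among resolved predictions: {true_hits / resolved:.0%}")  # 48%
print(f"confirmed among those tested at the bench: "
      f"{confirmed_by_experiment / tested_experimentally:.0%}")        # 22%
```

So roughly 48% of the resolved predictions held up, which is where "about half" comes from. Note that the rate among the predictions that actually had to go to the bench works out lower, around 22%, since the database-confirmed ones are all true by construction.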
But the downside of that is that you could end up chasing meaningless stuff that does nothing but put the fear into you and delays your compound's development, too. That split, "stupid delay versus crucial red flag", is at the heart of clinical toxicology, and is the reason it's so hard to make solid progress in this area. So much is riding on these decisions: you could walk away from a compound, never developing one that would go on to clear billions of dollars and help untold numbers of patients. Or you could green-light something that would go on to chew up hundreds of millions of dollars of development costs (and even more in opportunity costs, considering what you could have been working on instead), or even worse, one that makes it onto the market and has to be withdrawn in a blizzard of lawsuits. It brings on a cautious attitude.
Comments (21)
+ TrackBacks (0) | Category: Drug Development | In Silico | Toxicology
April 4, 2012
Posted by Derek
Now here's something that might be about to remake the economy, or (on the other robotic hand) it might not be ready to just yet. And it might be able to help us out in drug R&D, or it might turn out to be mostly beside the point. What the heck am I talking about, you ask? The so-called "Artificial Intelligence Economy". As Adam Ozimek says, things are looking a little more futuristic lately.
He's talking about things like driverless cars and quadrotors, and Tyler Cowen adds the examples of things like Apple's Siri and IBM's Watson, as part of a wider point about American exports:
First, artificial intelligence and computing power are the future, or even the present, for much of manufacturing. It’s not just the robots; look at the hundreds of computers and software-driven devices embedded in a new car. Factory floors these days are nearly empty of people because software-driven machines are doing most of the work. The factory has been reinvented as a quiet place. There is now a joke that “a modern textile mill employs only a man and a dog—the man to feed the dog, and the dog to keep the man away from the machines.”
The next steps in the artificial intelligence revolution, as manifested most publicly through systems like Deep Blue, Watson and Siri, will revolutionize production in one sector after another. Computing power solves more problems each year, including manufacturing problems.
Two MIT professors have written a book called Race Against the Machine about all this, and it appears to be sort of a response to Cowen's earlier book The Great Stagnation. (Here's an article of theirs in The Atlantic making their case.)
One of the export-economy factors that it (and Cowen) bring up is that automation makes a country's wages (and labor costs in general) less of a factor in exports, once you get past the capital expenditure. And as the size of that expenditure comes down, it becomes easier to make that leap. One thing that means, of course, is that less-skilled workers find it harder to fit in. Here's another Atlantic article, from the print magazine, which looked at an auto-parts manufacturer with a factory in South Carolina (the whole thing is well worth reading):
Before the rise of computer-run machines, factories needed people at every step of production, from the most routine to the most complex. The Gildemeister (machine), for example, automatically performs a series of operations that previously would have required several machines—each with its own operator. It’s relatively easy to train a newcomer to run a simple, single-step machine. Newcomers with no training could start out working the simplest and then gradually learn others. Eventually, with that on-the-job training, some workers could become higher-paid supervisors, overseeing the entire operation. This kind of knowledge could be acquired only on the job; few people went to school to learn how to work in a factory.
Today, the Gildemeisters and their ilk eliminate the need for many of those machines and, therefore, the workers who ran them. Skilled workers now are required only to do what computers can’t do (at least not yet): use their human judgment.
But as that article shows, more than half the workers in that particular factory are, in fact, rather unskilled, and they make a lot more than their Chinese counterparts do. What keeps them employed? That calculation on what it would take to replace them with a machine. The article focuses on one of those workers in particular, named Maddie:
It feels cruel to point out all the Level-2 concepts Maddie doesn’t know, although Maddie is quite open about these shortcomings. She doesn’t know the computer-programming language that runs the machines she operates; in fact, she was surprised to learn they are run by a specialized computer language. She doesn’t know trigonometry or calculus, and she’s never studied the properties of cutting tools or metals. She doesn’t know how to maintain a tolerance of 0.25 microns, or what tolerance means in this context, or what a micron is.
Tony explains that Maddie has a job for two reasons. First, when it comes to making fuel injectors, the company saves money and minimizes product damage by having both the precision and non-precision work done in the same place. Even if Mexican or Chinese workers could do Maddie’s job more cheaply, shipping fragile, half-finished parts to another country for processing would make no sense. Second, Maddie is cheaper than a machine. It would be easy to buy a robotic arm that could take injector bodies and caps from a tray and place them precisely in a laser welder. Yet Standard would have to invest about $100,000 on the arm and a conveyance machine to bring parts to the welder and send them on to the next station. As is common in factories, Standard invests only in machinery that will earn back its cost within two years. For Tony, it’s simple: Maddie makes less in two years than the machine would cost, so her job is safe—for now. If the robotic machines become a little cheaper, or if demand for fuel injectors goes up and Standard starts running three shifts, then investing in those robots might make sense.
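The two-year payback rule in that passage is simple enough to sketch in Python. The $100,000 machine cost and the two-year window come from the article; the annual wage figure is a hypothetical placeholder (the excerpt doesn't give one), and treating each shift as one replaced operator is my own simplifying assumption.

```python
def robot_pays_back(machine_cost, annual_wage, shifts=1, payback_years=2):
    """True if the wages a machine would replace cover its cost within the payback window."""
    labor_replaced = annual_wage * shifts * payback_years
    return labor_replaced >= machine_cost

MACHINE_COST = 100_000       # from the article: robotic arm plus conveyance machine
HYPOTHETICAL_WAGE = 30_000   # assumed annual cost of one operator; NOT from the article

print(robot_pays_back(MACHINE_COST, HYPOTHETICAL_WAGE, shifts=1))  # False
print(robot_pays_back(MACHINE_COST, HYPOTHETICAL_WAGE, shifts=3))  # True
```

Under those assumed numbers the job is safe on one shift and gone on three, which is the article's point: nothing about the worker has to change for the answer to flip, only the machine's price or the plant's volume.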
At this point, some similarities to the drug discovery business will be occurring to readers of this blog, along with some differences. The automation angle isn't as important, or not yet. While pharma most definitely has a manufacturing component (and how), the research end of the business doesn't resemble it very much, despite numerous attempts by earnest consultants and managers to make it so. From an auto-parts standpoint, there's little or no standardization at all in drug R&D. Every new drug is like a completely new part that no one's ever built before; we're not turning out fuel injectors or alternators. Everyone knows how a car works. Making a fundamental change in that plan is a monumental challenge, so the auto-parts business is mostly about making small variations on known components to the standards of a given customer. But in pharma - discovery pharma, not the generic companies - we're wrenching new stuff right out of thin air, or trying to.
So you'd think that we wouldn't be feeling the low-wage competitive pressure so much, but as the last ten years have shown, we certainly are. Outsourcing has come up many a time around here, and the very fact that it exists shows that not all of drug research is quite as bespoke as we might think. (Remember, the first wave of outsourcing, which is still very much a part of the business, was the move to send the routine methyl-ethyl-butyl-futile analoging out somewhere cheaper). And this takes us, eventually, to the Pfizer-style split between drug designers (high-wage folks over here) and the drug synthesizers (low-wage folks over there). Unfortunately, I think that you have to go the full reductio ad absurdum route to get that far, but Pfizer's going to find out for us if that's an accurate reading.
What these economists are also talking about is, I'd say, the next step beyond Moore's Law: once we have all this processing power, how do we use it? The first wave of computation-driven change happened because of the easy answers to that question: we had a lot of number-crunching that was being done by hand, or very slowly by some route, and we now had machines that could do what we wanted to do more quickly. This newer wave, if wave it is, will be driven more by software taking advantage of the hardware power that we've been able to produce.
The first wave didn't revolutionize drug discovery in the way that some people were hoping for. Sheer brute force computational ability is of limited use in drug discovery, unfortunately, but that's not always going to be the case, especially as we slowly learn how to apply it. If we really are starting to get better at computational pattern recognition and decision-making algorithms, where could that have an impact?
It's important to avoid what I've termed the "Andy Grove fallacy" in thinking about all this. I think that it is a result of applying first-computational-wave thinking too indiscriminately to drug discovery, which means treating it too much like a well-worked-out human-designed engineering process. Which it certainly isn't. But this second-wave stuff might be more useful.
I can think of a few areas: in early drug discovery, we could use help teasing patterns out of large piles of structure-activity relationship data. I know that there are (and have been) several attempts at doing this, but it's going to be interesting to see if we can do it better. I would love to be able to dump a big pile of structures and assay data points into a program and have it say the equivalent of "Hey, it looks like an electron-withdrawing group in the piperidine series might be really good, because of its conformational similarity to the initial lead series, but no one's ever gotten back around to making one of those because everyone got side-tracked by the potency of the chiral amides".
Software that chews through stacks of PK and metabolic stability data would be worth having, too, because there sure is a lot of it. There are correlations in there that we really need to know about, that could have direct relevance to clinical trials, but I worry that we're still missing some of them. And clinical trial data itself is the most obvious place for software that can dig through huge piles of numbers, because those are the biggest we've got. From my perspective, though, it's almost too late for insights at that point; you've already been spending the big money just to get the numbers themselves. But insights into human toxicology from all that clinical data, that stuff could be gold. I worry that it's been like the concentration of gold in seawater, though: really there, but not practical to extract. Could we change that?
All this makes me actually a bit hopeful about experiments like this one that I described here recently. Our ignorance about medicine and human biochemistry is truly spectacular, and we need all the help we can get in understanding it. There have to be a lot of important things out there that we just don't understand, or haven't even realized the existence of. That lack of knowledge is what gives me hope, actually. If we'd already learned what there is to know about discovering drugs, and were already doing the best job that could be done, well, we'd be in a hell of a fix, wouldn't we? But we don't know much, we're not doing it as well as we could, and that provides us with a possible way out of the fix we're in.
So I want to see as much progress as possible in the current pattern-recognition and data-correlation driven artificial intelligence field. We discovery scientists are not going to automate ourselves out of business so quickly as factory workers, because our work is still so hypothesis-driven and hard to define. (For a dissenting view, with relevance to this whole discussion, see here). It's the expense of applying the scientific method to human health that's squeezing us all, instead, and if there's some help available in that department, then let's have it as soon as possible.
Comments (32)
+ TrackBacks (0) | Category: Drug Assays | Drug Development | Drug Industry History | In Silico | Pharmacokinetics | Toxicology
February 21, 2012
Posted by Derek
Here's a huge review that goes over most everything you may have wanted to know about what's called "rational drug design". The authors are especially addressing selectivity, but that's a broad enough topic to cover all the important features. (If you can't access the paper, here's a key graphic from it).
"Rational", it should be understood, generally tends to mean "computationally modeled" in the world of drug discovery. And that's certainly how this review is pitched. I'm of two minds - at least - about the whole area (a personal bias that has made for some lively discussions over the years). Some of those discussions have taken place between my own ears as well, because I'm still not sure that all my opinions about computational drug design are self-consistent.
On the one hand, drug potency is a physical act which is mediated by physical laws. Computing the change in free energy during such a process should be feasible. But it turns out to be rather difficult - proteins flex and bonds rotate, water molecules assist and interfere, electrostatic charges help and hinder, hydrogen bonds are vital (and hard to model), and a dozen other sorts of interactions between clouds of electrons weigh in as well. Never forget, too, that free energy changes have an entropy component, and that's not trivial to model, either. I keep wondering if the error bars of the various assumptions and approximations don't end up swamping out the small changes that we're interested in predicting.
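To put a number on the error-bar worry, here is a small Python sketch using the standard relation between binding free energy and dissociation constant, ΔG = RT ln Kd (1 M reference state). The example Kd and the size of the modeling error are assumptions picked for illustration.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)

def binding_free_energy(kd_molar, temp_k=298.15):
    """Standard binding free energy in kcal/mol from a dissociation constant in mol/L."""
    return R_KCAL * temp_k * math.log(kd_molar)

def fold_change_from_error(ddg_error_kcal, temp_k=298.15):
    """How large a factor in Kd a given free-energy error corresponds to."""
    return math.exp(ddg_error_kcal / (R_KCAL * temp_k))

# Energy difference between two compounds that differ tenfold in potency
tenfold = binding_free_energy(1e-8) - binding_free_energy(1e-9)
print(round(tenfold, 2))                       # about 1.36 kcal/mol
print(round(fold_change_from_error(1.0), 1))   # what a 1 kcal/mol error does to Kd
```

A full tenfold change in potency is worth only about 1.4 kcal/mol at room temperature, out of a total of around 12 kcal/mol for a nanomolar binder. An error of 1 kcal/mol in the calculation, which is not a large one by modeling standards, is already roughly a fivefold uncertainty in the predicted affinity.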
But, on that other hand, there are certainly cases where modeling has helped out a great deal. A cynic would say that we've been sure to hear about those, while the cases where it had no impact at all (or did actual harm) don't make the journals very often. It can't be denied, though, that modeling really has been (at times) the tool for the job. It would be interesting to know if the frequency of that happening has been increasing over time, as our tools get better.
Because on the third hand, it's been a poor bet to go against the relentless computational tide over the last few decades. You'd have to think that sheer computing power will end up making molecular modeling ever more capable and useful, as we learn more about what we're doing. Mind you, there were people back in the mid-1980s who thought we'd already reached that point. I'm not saying that they were the best-informed people at that time, but they certainly did exist. I wonder sometimes what it would have been like, to show people in 1985 what the state of rational drug design would be like in 2012. Would they be excited, or vaguely disappointed?
And then there's that word "rational". I think that its adoption might have been the best advertising that the field's ever achieved, because it makes everything else seem irrational (or at least arational) by default. I mean, do you just wanna make compounds, or do you want to think about what you're doing? I also wonder what might have changed if that phrase had never been adopted - perhaps expectations wouldn't have gotten out of hand in the computational field's early days, but it might not have received the attention (and money) that it did, either. . .
Comments (35)
+ TrackBacks (0) | Category: In Silico
January 26, 2012
Posted by Derek
There's a new paper out in Nature Chemistry called "Quantifying the Chemical Beauty of Drugs". The authors are proposing a new "desirability score" for chemical structures in drug discovery, one that's an amalgam of physical and structural scores. To their credit, they didn't decide up front which of these things should be the most important. Rather, they took eight properties over 770 well-known oral drugs, and set about figuring how much to weight each of them. (This was done, for the info-geeks among the crowd, by calculating the Shannon entropy for each possibility to maximize the information contained in the final model). Interestingly, this approach tended to give zero weight to the number of hydrogen-bond acceptors and to the polar surface area, which suggests that those two measurements are already subsumed in the other factors.
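As I understand the paper, the individual property scores are combined as a weighted geometric mean. Here is a minimal Python sketch of that kind of combination. The property names, desirability values, and weights below are all hypothetical placeholders, not the paper's fitted numbers.

```python
import math

def weighted_desirability(desirabilities, weights):
    """Weighted geometric mean of per-property desirabilities (each in (0, 1])."""
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("at least one weight must be positive")
    log_sum = sum(w * math.log(desirabilities[name])
                  for name, w in weights.items() if w > 0)
    return math.exp(log_sum / total_weight)

# HYPOTHETICAL per-property desirabilities and weights, for illustration only
desirabilities = {"mol_weight": 0.8, "logp": 0.6, "h_bond_acceptors": 0.2}
weights = {"mol_weight": 1.0, "logp": 1.0, "h_bond_acceptors": 0.0}

print(round(weighted_desirability(desirabilities, weights), 3))
```

With a weight of zero, the poor hydrogen-bond-acceptor score has no effect on the result at all, which is what it means for a property to be "subsumed" by the others. Because the mean is geometric, though, one very low score on a property that does carry weight drags the whole number down hard.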
And that's all fine, but what does the result give us? Or, more accurately, what does it give us that we haven't had before? After all, there have been a number of such compound-rating schemes proposed before (and the authors, again to their credit, compare their new proposal with the others head-to-head). But I don't see any great advantage. The Lipinski "Rule of 5" is a pretty simple metric - too simple for many tastes - and what this gives you is a Rule of 5 with both categories smeared out towards each other to give some continuous overlap. (See the figure below, which is taken from the paper). That's certainly more in line with the real world, but in that real world, will people be willing to make decisions based on this method, or not?

The authors go for a bigger splash with the title of the paper, which refers to an experiment they tried. They had chemists across AstraZeneca's organization assess some 17,000 compounds (200 or so for each) with a "Yes/No" answer to "Would you undertake chemistry on this compound if it were a hit?" Only about 30% of the list got a "Yes" vote, and the reasons for rejecting the others were mostly "Too complex", followed closely by "Too simple". (That last one really makes me wonder - doesn't AZ have a big fragment-based drug design effort?) Note also that this sort of experiment has been done before.
Applying their model, the mean score for the "Yes" compounds was 0.67 (s.d. 0.16), and the mean score for the "No" compounds was 0.49 (s.d. 0.23), which they say was statistically significant, although that must have been a close call. Overall, I wouldn't say that this test has an especially strong correlation with medicinal chemists' ideas of structural attractiveness, but then, I'm not so sure of the usefulness of those ideas to start with. I think that the two ends of the scale are hard to argue with, but there's a great mass of compounds in the middle that people decide that they like or don't like, without being able to back up those statements with much data. (I'm as guilty as anyone here).
The last part of the paper tries to extend the model from hit compounds to the targets that they bind to - a druggability assessment. The authors looked through the ChEMBL database, and ranked the various targets by the scores of the ligands that are associated with them. They found that their mean ligand score for all the targets in there is 0.478. For the targets of approved drugs, it's 0.492, and for the orally active ones it's 0.539 - so there seems to be a trend, although if those differences reached statistical significance, it isn't stated in the paper.
So overall, I find nothing really wrong with this paper, but nothing spectacularly right with it, either. I'd be interested in hearing other calls on it as it gets out into the community. . .
Comments (22)
+ TrackBacks (0) | Category: Drug Development | Drug Industry History | In Silico | Life in the Drug Labs
January 9, 2012
Posted by Derek
For a look into a possible drug-discovery future (from the computational optimist viewpoint), you might want to check out a brief bit of science fiction, "Alpha Shock", in the Journal of Computer-Aided Molecular Design. Some excerpts to give you the general idea:
". . .Of course, the compounds were of little value if they couldn’t be formulated. Sanjay was pressed for time, and nanobot development still took several weeks, so he had to go “old school.” Sanjay accessed World Crystallography Repository’s (WCR) formulation suite and entered the 2D structures of his compounds. The system linked to the Amazon Hyper-Cloud and initiated a series of quantum chemical calculations to develop a custom force field for the solid phase simulations. Unfortunately the preliminary results were disappointing, even after more than 100 million combinations of excipients, particle sizes, focusing tails, and polymorphs had been analyzed in detail. He would run a more complete search overnight, but the chances were that the 10-min simulation was telling him what he needed to know: don’t expect these exact compounds to be quite right. . .
. . .“In fact,” Dmitri continued, “I think the best tactic is to turn down the interaction of this transcription factor”—a protein popped out of one node on the map—“with that protein”—another protein materialized—“and this stretch of DNA.” A 3D model of the complex assembled in front of him, slowly rotating, with the most likely binding sites and points of intervention highlighted. “Of course, you only want to disrupt this interaction in the hippocampus, and only when D7 receptor functioning is high.” The relevant pathway maps showed the effects of the blockage on downstream signaling. “Oh, and naturally you also want to turn down oxphos in the mitochondria. So we need either a single molecule that can do both things, or a two-drug combo.”
The overall impression is a bit like Charles Stross, in its deliberate you-haven't-extrapolated-wildly-enough approach. But Stross doesn't put in as many computational chemistry inside jokes, which is probably better for his sales. My first impulse is the same one I have to, say, Ray Kurzweil, that all this stuff may be (in fact probably is) on its way, but not by the dates stated. That position allows me to take flak from both sides, which must be some sort of feature that I value.
Comments (16)
+ TrackBacks (0) | Category: In Silico
November 7, 2011
Posted by Derek
An e-mail correspondent and I were discussing this question, and I thought it would be an interesting one for everyone. He's a computational guy, and he's been wondering where the best use of computation/modeling effort in drug research might be. The obvious place to apply it is in lead generation and SAR development - but is that the best place? Is it the rate-limiting step enough of the time?
Problem is, the things that are often limiting steps are not as amenable to modeling. These are things like toxicology, target selection, and the like, and I'm not sure what they're susceptible to, except that simulation is probably not the answer. Or not yet, anyway. So what's the sweet spot, the place that maximizes importance and feasibility?
Update: an early vote for clinical trial design, which is a strong contender. Can't say that that doesn't get right to the hard part. . .
Comments (38)
+ TrackBacks (0) | Category: In Silico
September 20, 2011
Posted by Derek
I wrote last year about Foldit, a collaborative effort to work on protein structure problems that's been structured as an open-access game. Now the team is back with another report on how the project is going, and it's interesting stuff. The headlines have generally taken the "Computer Gamers Solve Incredible Protein Problem That Baffled Scientists!" line, but that's not exactly the full story.
The Foldit collaboration participated in the latest iteration of a regular protein-structure prediction challenge, CASP9. And their results varied - in the category of proteins with known structural homologs, for example, they didn't perform all that well. The players, it turned out, sort of over-worked the structures, and made a lot of unnecessary changes to the peripheral parts of the proteins. Another category took on proteins that have no identified structural homologs, a much harder problem. But that had its problems, too, which illustrate both the difficulties of the Foldit approach and protein modeling in general:
For prediction problems for which there were no identifiable homologous protein structures—the CASP9 Free Modeling category—Foldit players were given the five Rosetta Server CASP9 submissions (which were publicly available to other prediction groups) as starting points, along with the Alignment Tool. . .In this Free Modeling category, some of the shortcomings of the Foldit predictions became clear. The main problem was a lack of diversity in the conformational space explored by Foldit players because the starting models were already minimized with the same Rosetta energy function used by Foldit. This made it very difficult for Foldit players to get out of these local minima, and the only way for the players to improve their Foldit scores was to make very small changes ('tunneling' to the nearest local minimum) to the starting structures. However, this tunneling did lead to one of the most spectacular successes in the CASP9 experiment.
. . .the Rosetta Server, which carried out a large-scale search for the lowest-energy structure using computing power from Rosetta@home volunteers, produced a remarkably accurate model . . . However, the server ranked this model fourth out of the five submissions. The Foldit Void Crushers team correctly selected this near-native model and further improved it by accurately moving the terminal helix, producing the best model for this target of any group and one of the best overall predictions at CASP9 . . . Thus, in a situation where one model out of several is in a near-native conformation, Foldit players can recognize it and improve it to become the best model. Unfortunately for the other Free Modeling targets, there were no similarly outstanding Rosetta Server starting models, so Foldit players simply tunneled to the nearest incorrect local minima.
In the Refinement challenge, where participants take a minimized structure and try to improve its accuracy, the Foldit players had similar problems with starting from structures that had already been minimized by the same tools that they were using. Every change tended to make things look worse. The team improved their performance by reposting one of the structures as a new challenge, this time keeping the parts that were known with confidence to be near-native, while more or less randomizing the other parts to give a greater diversity to the starting points.
And those really are some of the key problems in this work. There are an awful lot of energy minima out there, and which ones you can get to depend crucially on where you start looking. In order to get to a completely different manifold of protein structures, even ones with much better energies, you may well have to go through a zone where you look like you're ruining everything. (And most of the time, you probably are ruining everything - there's no way to know if there's a safe haven on the other side or not).
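The dependence on starting point is easy to see even in a toy problem. The sketch below is purely illustrative: it uses a made-up one-dimensional "energy" with two wells and plain gradient descent, and has nothing to do with the actual Rosetta energy function or the Foldit tools. The function names and step sizes are my own choices.

```python
def energy(x: float) -> float:
    """Toy double-well energy: a deep well near x = -1.30, a shallow one near x = 1.13."""
    return x**4 - 3 * x**2 + x


def gradient(x: float) -> float:
    """Derivative of the toy energy."""
    return 4 * x**3 - 6 * x + 1


def descend(start: float, step: float = 0.01, iterations: int = 2000) -> float:
    """Plain gradient descent: always moves downhill, so it never crosses a barrier."""
    x = start
    for _ in range(iterations):
        x -= step * gradient(x)
    return x


for start in (-2.0, 2.0):
    x_min = descend(start)
    print(f"start {start:+.1f} -> minimum at x = {x_min:+.3f}, energy = {energy(x_min):+.3f}")
```

Starting at +2.0, the search settles into the shallow well and stays there, even though a much lower energy is available on the other side of the barrier. Getting there means going uphill first, which is exactly the move that looks like ruining everything.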
But this paper also reports the results that are getting the headlines, a structure for the Mason-Pfizer monkey retroviral protease. This is an interesting protein, because although it crystallizes readily (in several different forms), and although the structures of other retroviral proteases are known, no one has been able to solve this one from the available X-ray data. The Foldit players, however, came up with several proposals that fit the data well enough for the structure to finally fall out of the diffraction data. It does have some odd features in its protein loops, different enough from the other proteases for no one to have hit on it before.
And that really is an accomplishment, and the way it was solved (with different players building on the results of others, competing to get the best optimization scores) really is the way that Foldit is supposed to work. Their less impressive performance on the CASP9 problems, though, shows that the same protein prediction difficulties apply to Foldit players as apply to the rest of the modeling field. This isn't a magic technique, and Foldit gamers are not going to rampage through the structural biology world solving all the extant problems any time soon. But it's nothing to sneeze at, either.
Comments (16)
+ TrackBacks (0) | Category: In Silico | Press Coverage
August 26, 2011
Posted by Derek
For those of you who are (or have always wanted to try being) molecular modelers, Cresset Design is holding a contest you might enjoy. They're putting up a molecule and giving out temporary licenses to their modeling software, and inviting people to come up with the closest bioisosteric match. The winner gets a free iPad 2.
Of course, you're not going to be able to win by suggesting a para-fluoro group or by making a tetrazole-for-carboxylate switch. In their words:
We will use the Field alignment score for your molecule to the reference molecule as the primary judgment in designing the winner. However, molecules with high 2D similarity or high calculated logP with receive a penalty and are unlikely to win. Also entries with reasonable chemistry and good synthetic feasibility will be favoured. Feedback showing the score for your molecule and describing which properties of the molecule are being penalised will be provided on request. The winner will be the molecule that, in the opinion of the judges, represents the best design chosen from the top scoring results.
Fair enough, I'd say. I look forward to a follow-up from them at the end of the contest; I'd like to see what sort of stuff comes in.
Comments (18)
+ TrackBacks (0) | Category: In Silico
March 29, 2011
Posted by Derek
Man, am I getting all kinds of comments (here and by e-mail) about my views on modeling, QSAR, and the like. I thought it might be helpful for me to clarify my position on these things.
First off, structure. It's a valuable thing to have. My comments on the recent Nature Reviews Drug Discovery article were not meant to suggest otherwise, just to point out that the set of examples the authors picked to make this point was (in my view) flawed. It's actually surprisingly hard to come up with good comparison sets that isolate the effect of having structural information on the success of drug discovery projects. There are too many variables, and too many of them aren't independent. But just because a question (does having structural information help, overall?) is hard to answer doesn't mean that the answer is "no".
As an aside, since I've talked here about my admiration for fragment-based approaches, my own opinion should have been pretty clear already. Doing fragment-based drug discovery without good structural information looks to be very hard indeed.
Now, that said, there's structure and there's structure. Like every other tool in our kit, this one can be used well or used poorly. I think that fragment projects (to pick one example) get a lot of bang-for-the-buck out of structural data, and at the opposite end of the scale are those projects that only get good X-ray data after they've sent their compound to the clinic. No, wait, let me take that back. In those cases, the structure did no good, but it also did no harm. At the true opposite end of the scale are the projects where having structural data actually slowed things down. That's not frequent, but it does happen. Sometimes you have solid data, but for one reason or another the X-ray isn't corresponding to what's happening in real life. And sometimes this kicks in when medicinal chemists try to make too much out of less compelling structural data, just because it's all they have.
Now for in silico techniques. I have a similar attitude towards modeling of all kinds, but at one further remove than physical structure data. That is, I think it can be used well or used poorly, but I think that (for various reasons) the chances of using it poorly are somewhat increased. One reason is that modeling can be very hard to do well, naturally. And at the same time, tools with which to model conformations, docking, and so on are pretty widely available, which leads to a fair amount of work from people who really don't know what they're doing. Another reason is that the validity of any given model is of limited scope, as is the case with any mental construct that we have about what our molecules are doing, whether we used a software package or waved our hands around in the air. The software-package version of some binding model is more likely to have a wider range of usefulness than the hand-waving one, but they'll both break down at some point as you explore a range of compounds.
The key then is to figure out as quickly as possible if the project you're working on would be enhanced by modeling, or if such modeling would be merely ornamental, or even harmful. And that's not always easy to do. Any reasonable model is going to need a few iterations to get up to speed, generally requiring some specific compounds to be made by the chemists, and if you're running a project, you have to decide how much effort is worth spending to do that. You don't want to end up endlessly trying to refine the model, but at the same time, that model could turn out to be very useful after a few more turns of the crank. Which way to go? The same decisions apply, naturally, to the folks standing in front of the hoods, even without any modeling. How many more compounds are worth making in a given series? Would that effort be better used somewhere else? These calls are why we're paid the approximation of the big bucks.
So, while I don't think that modeling is an invariable boon to a project, neither do I think it's a waste of time. Sometimes it's one, and sometimes it's the other, and most of the time it's a mix of each - just like ideas at the bench. When modeling works, it can be a real help in sending the chemists down a productive path. On the other hand, you can certainly run a whole project with no modeling at all, just good old-fashioned analoging from the labs. It's the job of modelers to make the first possibility more likely and more attractive, and the job of the chemists and project managers to be open to that (and to be ready to emphasize or de-emphasize things as they develop).
This point of view seems reasonable to me (which is why I hold it!) But it also exposes me to complaints from people at both ends of the spectrum. I'm a lot more skeptical of in silico approaches than are many true believers, but I don't want to make the mistake of dismissing them outright.
Comments (15)
+ TrackBacks (0) | Category: In Silico
March 28, 2011
Posted by Derek
A friend on the computational/structural side of the business sent along this article from Nature Reviews Drug Discovery. The authors are looking through the Thomson database at drug targets that are the subject of active research in the industry, and comparing the ones that have structural information available to the ones that don't: enzyme targets (with high-resolution structures) and GPCRs without it. They're trying to see if structural data is worth enough to show up in the success rates (and thus the valuations) of the resulting projects.
Overall, the Thomson database has over a thousand projects in it from these two groups, a bit over 600 from the structure-enabled enzymes and just under 500 GPCR projects. What they found was that 70% of the projects in the GPCR category were listed as "suspended" or "discontinued", but only 44% of the enzyme projects were so listed. In order to correct for probability of success across different targets, the authors picked ten targets from each group that have led, overall, to similar numbers of launched drugs. Looking at the progress of the two groups, the structure-enabled projects are again lower in the "stopped" categories, with corresponding increases in discovery and the various clinical phases.
You have to go to the supplementary info for the targets themselves, but here they are: for the enzymes, it's DPP-IV, BCR-ABL, HER2 kinase, renin, Factor Xa, HDAC, HIV integrase, JAK2, Hep C protease, and cathepsin K. For the receptor projects, the list is endothelin A receptor, P2Y12, CXCR4, angiotensin II receptor, sphingosine-1-phosphate receptor, NK1, muscarinic M1, vasopressin V2, melatonin receptor, and adenosine A2A.
Looking over these, though, I think that the situation is more complicated than the authors have presented. For example, DPP-IV has good structural information now, but that's not how people got into the area. The cyanopyrrolidine class of inhibitors, which really jump-started the field, were made by analogy to a reported class of prolyl endopeptidase inhibitors (BOMCL 1996, p. 1163). Three years later, the most well-characterized Novartis compound in the series was being studied by classic enzymology techniques, because it still wasn't possible to say just how it was binding. But even more to the point, this is a well-trodden area now. Any DPP-IV project that's going on now is piggybacking not only on structural information, but on an awful lot of known SAR and toxicology.
And look at renin. That's been a target forever, structure or not. And it's safe to say that it wasn't lack of structural information that was holding the area back, nor was it the presence of it that got a compound finally through the clinic. You can say the same things about Factor Xa. The target was validated by naturally occurring peptides, and developed in various series by classical SAR. The X-ray structure of one of the first solid drug candidates in the area (rivaroxaban) bound to its target came after the compound had been identified and the SAR had been optimized. Factor Xa efforts going on now also are standing on the shoulders of an awful lot of work.
In the case of histone deacetylase, the first launched drug in that category (SAHA, vorinostat) had already been identified before any sort of X-ray structure was available. Overall, that target is an interesting addition to the list, since there are actually a whole series of them, some of which have structural information and some of which don't. The big difficulty in that area is that we don't really know what the various roles of the different isoforms are, and thus how the profiles of different compounds might translate to the clinic, so I wouldn't say that structural data is helping with the rate-determining steps in the field.
On the receptor side, I also wouldn't say that it's lack of structural information that's necessarily holding things back in all of those cases, either. Take muscarinic M1 - muscarinic ligands have been known for a zillion years. That encompasses fairly selective antagonists, and hardly-selective-at-all agonists, so I'm not sure which class the authors intended. If they're talking about antagonists, then there are plenty already known. And if they're talking about agonists, I doubt that even detailed structural information would help, given the size of the native ligand (acetylcholine).
And the vasopressin V2 case is similar to some of the enzyme ones, in that there's already an approved drug in this category (tolvaptan), with several others in the same structural class chasing it. Then you have the adenosine A2A field, where long lists of agonists and antagonists have been found over the years, structure or not. The problem there has been finding a clinical use for them; all sorts of indications have been chased over the years, a problem that structural information would not have helped with in the least.
Now, it's true that there are projects in these categories where structure has helped out quite a bit, and it's also true that detailed GPCR structures would be welcome (and are slowly coming along, for that matter). I'm not denying either of those. But what does strike me is that there are so many confounding variables in this particular comparison, especially among the specific targets that are the subject of the article's featured graphic, that I just don't think that its conclusions follow.
Comments (32)
+ TrackBacks (0) | Category: Drug Development | Drug Industry History | In Silico
August 9, 2010
Posted by Derek
David Baker's lab at the University of Washington has been working on several approaches to protein structure problems. I mentioned Rosetta@home here, and now the team has published an interesting paper on another one of their efforts, Foldit.
That one, instead of being a large-scale passive computation effort, is more of an active process - in fact, it's active enough that it's designed as a game:
We hypothesized that human spatial reasoning could improve both the sampling of conformational space and the determination of when to pursue suboptimal conformations if the stochastic elements of the search were replaced with human decision making while retaining the deterministic Rosetta algorithms as user tools. We developed a multiplayer online game, Foldit, with the goal of producing accurate protein structure models through gameplay. Improperly folded protein conformations are posted online as puzzles for a fixed amount of time, during which players interactively reshape them in the direction they believe will lead to the highest score (the negative of the Rosetta energy). The player’s current status is shown, along with a leader board of other players, and groups of players working together, competing in the same puzzle.
So how's it working out? Pretty well, actually. It turns out that human players are willing to do more extensive rearrangements to the protein chains in the quest for lower energies than computational algorithms are. They're also better at evaluating which positions to start from. Both of these remind me of the differences between human chess play and machine play, as I understand them, and probably for quite similar reasons. Baker's team is trying to adapt the automated software to use some of the human-style approaches, when feasible.
There are several dozen participants who clearly seem to have done better in finding low-energy structures than the rest of the crowd. Interestingly, they're mostly not employed in the field, with "Business/Financial/Legal" making up the largest self-declared group in a wide range of fairly evenly distributed categories. Compared to the "everyone who's played" set, the biggest difference is that there are far fewer students in the high-end group, proportionally. That group of better problem solvers also tends to be slightly more female (although both groups are still mostly men), definitely older (that loss of students again), and less well-stocked with college graduates and PhDs. Make of that what you will.
Their conclusion is worth thinking about, too:
The solution of challenging structure prediction problems by Foldit players demonstrates the considerable potential of a hybrid human–computer optimization framework in the form of a massively multiplayer game. The approach should be readily extendable to related problems, such as protein design and other scientific domains where human three-dimensional structural problem solving can be used. Our results indicate that scientific advancement is possible if even a small fraction of the energy that goes into playing computer games can be channelled into scientific discovery.
That's crossed my mind, too. In my more pessimistic moments, I've imagined the human race gradually entertaining itself to death, or at least to stasis, as our options for doing so become more and more compelling. (Reading Infinite Jest a few years ago probably exacerbated such thinking). Perhaps this is one way out of that problem. I'm not sure that it's possible to make a game compelling enough when it's hooked up to some sort of useful gear train, but it's worth a try.
Comments (16)
+ TrackBacks (0) | Category: Biological News | In Silico | Who Discovers and Why
June 22, 2010
Posted by Derek
The folks at Cresset sent me a note about a free download of some software that they've developed for molecular fields (an approach you can read more about here). Fieldview is a free tool for trying this out yourself, and can be had here. Worth a look for the computationally curious, especially at the price. . .
Comments (11)
+ TrackBacks (0) | Category: In Silico
June 18, 2010
Posted by Derek
A reader points me to this discussion, which is trying to figure out what the most useful discovery made via bioinformatics is so far. There's a $100 prize for the winning suggestion, just to keep the discussion moving (and no, I don't anticipate offering cash bounties around here any time soon!) The early going seems to have ended up in the "Hold it, that's not bioinformatics, is it?" ditch, but that's not a useless discussion, either.
So if you have some suggestions, hop over there and add them to the fray, or vote for the ones that you like so far. I'm racking my brain a bit myself.
Comments (11)
+ TrackBacks (0) | Category: In Silico
May 17, 2010
Posted by Derek
I'll have the opportunity to sit in on a few talks during a conference on free energy calculations in drug design. Since I'm not a computational guy myself, I'll be picking my sessions carefully, but I am interested in hearing what the state of the art is.
If we could just walk right up and calculate the free energies of binding events reliably, that would mean that the era of high throughput screening would begin to come to its end - well, in the physical world, anyway. Depending on how lengthy the computations needed to be, we could (in theory) just sit back and let the hardware hum while it ran through all the compounds we could think up - then we'd come back in on Monday and see who the winners were. Despite what some of you outside the field of medicinal chemistry might have read, we are not exactly to this point yet. That phrase "in theory" covers an awful lot of ground. But progress is apparently being made (here's a recent paper (PDF) with background).
So here's a question for the readership: what would you most want such calculations to be able to do for you? What would convince you that they're actually believable? And how close do you think that we actually are to that? Your comments will go directly to the ears of a roomful of high-powered modelers, so feel free to unload.
That thought of a roomful of computational chemists, though, reminds me inexorably of a story about Robert Oppenheimer that Freeman Dyson retells here. At a theoretical physics conference in Vancouver, the attendees were on a boat ride among the islands when the weather turned impenetrably foggy. Someone asked what the consequences for physics would be if the boat sank, and Oppenheimer instantly said "It wouldn't do any permanent good". There, that should ensure me a warm welcome at the meeting!
Comments (43)
+ TrackBacks (0) | Category: In Silico
May 10, 2010
Posted by Derek
Here's a new paper from the folks at the Burnham Institute and UCSD on a new target for vaccinia virus. They're going after a virulence factor (N1L) through computational screening, which is a challenge, since this is a protein-protein interaction.
They pulled out a number of structures, which have some modest activity in cell infection assays. In addition, they showed through calorimetry that the compounds do appear to be affecting the target protein, specifically its equilibrium between monomeric and oligomeric forms. But the structures of their best hits. . .well, here's the table. You can ignore compounds 6 and 8; they show up as cytotoxic. But the whole list is pretty ghastly, at least to my eyes.
These sorts of highly aromatic polyphenol structures have two long traditions in medicinal chemistry: showing activity in assays, for the first part, and not being realizable as actual drugs, for the second. There's no doubt that they can do a lot of things; it's just that getting them to do them in a real-world situation is not trivial. Part of the problem is specificity (and associated toxicity) and part of it is pharmacokinetics. As you'd imagine, these compounds can have rather funky clearance behavior, what with all those phenols.
So I'd regard these as proof-of-concept compounds that validate N1L as a target. I think that we'll need to wait for someone to format up an assay for high-throughput (non-virtual) screening to see if something more tractable comes up. Either that, or rework the virtual screens on the basis that we've seen enough polyphenols come up on this target already. . .
Note: readers of the paper will note that our old friend resveratrol turns up as an active compound as well. It's very much in the polyphenol tradition; make of that what you will.
Comments (25)
+ TrackBacks (0) | Category: In Silico | Infectious Diseases | Pharmacokinetics
Posted by Derek
My take on the recent news that Bill Gates has invested ten million dollars in the computational drug design company Schrödinger is here at Nature News. (They've recently made all their stories open-access, by the way, so you don't need a subscription to get the full stories).
In short, I think that patient billionaire money is just the sort of thing the field needs, because anyone with a short timeline and a need for a good return is going to have a rough time of it. . .
Comments (7)
+ TrackBacks (0) | Category: In Silico
May 4, 2010
Posted by Derek
I was talking with a colleague yesterday, and I suddenly had an insight into an opportunity in scientific publishing. We were discussing the various computational/modeling papers that you see out in the literature. Some of them are quite interesting, many are worth looking at if it's your particular field - but many others are, well, not so great. I should mention up front that the same objections apply - and how - to the non-computational literature, of course. But there are a number of second-tier (and lower) journals to soak up those sorts of papers in the other disciplines.
What surprises me is that there's no Computational Chemistry Letters or some such. Communications in Computational Chemistry? CADD Comm? This would be the dumping ground for the piles of unconvincing computer-driven stuff that gets sent around by people who have paid a bit too much attention to the sales brochures that came with their software packages.
The barriers for entry to such things have been getting lower and lower, while the real state of the art has been getting more and more complicated. That's created a gap into which too much stuff falls. Who will speak for the bottom-dwelling "We modeled it, therefore it's real" constituency? The advent of systems biology has created more opportunities than ever for these folks. Isn't it time that there was an expensive, low-impact, completely disregardable journal for them, too?
Comments (13)
+ TrackBacks (0) | Category: In Silico | The Scientific Literature
March 29, 2010
Posted by Derek
For the medicinal chemists in the audience, I wanted to strongly recommend a new paper from a group at Roche. It's a tour through the various sorts of interactions between proteins and ligands, with copious examples, and it's a very sensible look at the subject. It covers a number of topics that have been discussed here (and throughout the literature in recent years), and looks to be an excellent one-stop reference.
In fact, read the right way, it's a testament to how tricky medicinal chemistry is. Some of the topics are hydrogen bonds (and why they can be excellent keys to binding or, alternatively, of no use whatsoever), water molecules bound to proteins (and why disturbing them can account for large amounts of binding energy, or, alternatively, kill your compound's chances of ever binding at all), halogen bonds (which really do exist, although not everyone realizes that), interactions with aryl rings (some of which can be just as beneficial coming in 90 degrees to where you might imagine), and so on.
And this is just to get compounds to bind to their targets, which is the absolute first step on the road to a drug. Then you can start worrying about how to have your compounds not bind to things you don't want (many of which you probably don't even realize are out there). And about how to get them to decent blood levels, for a decent amount of time, and into the right compartments of the body. And at that point, it's nearly time to see if it does any good for the disease you're trying to target!
Comments (5)
+ TrackBacks (0) | Category: Drug Assays | In Silico | Life in the Drug Labs
March 23, 2010
Posted by Derek
You know, you'd think that we'd understand the way things bind to proteins well enough to be able to explain why biotin sticks so very, very tightly to avidins. That's one of the most impressive binding events in all of biology, short of pushing electrons and forming a solid chemical bond - biotin's stuck in there at femtomolar levels. It's so strong and so reliable that this interaction is the basis for untold numbers of laboratory and commercial assays - just hang a biotin off one thing, expose it to something else that has an avidin (most often streptavidin) coated on it, and it'll stick, or else something is Very Wrong. So we have that all figured out.
Wrong. Turns out that there's a substantial literature given to arguing about just why this binding is so tight. One group holds out for hydrophobic interactions (which seems rather weird to me, considering that biotin's rather polar by most standards). Another group has a hydrogen-bonding explanation, which (on the surface) seems more feasible to me. Now a new paper says that the computational methods applied so far can't handle electrostatic factors well, and that those are the real story.
I'm not going to take a strong position on any of these; I'll keep my head down while the computational catapults launch at each other. But it's definitely worth noting that we apparently can't explain the strongest binding site interaction that we know of. It's the sort of thing that we'd all like to be able to generate at will in our med-chem programs, but how can we do that when we don't even know what's causing it?
Comments (12)
+ TrackBacks (0) | Category: Drug Assays | In Silico
March 22, 2010
Posted by Derek
I mentioned Benford's Law in passing in this post (while speculating on how long people report their reactions to have run when publishing their results). That's the rather odd result that many data sets don't show a random distribution of leading digits - rather, 1 is the first digit around 30% of the time, 2 leads off about 18% of the time, and so on down.
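For anyone who wants to see where those percentages come from, here's a minimal sketch. It uses the standard statement of Benford's Law, that the leading digit d turns up with probability log10(1 + 1/d); the function name is just my own label for illustration.

```python
import math

def benford_expected(digit: int) -> float:
    """Expected share of values whose first significant digit is `digit` (1-9)."""
    if not 1 <= digit <= 9:
        raise ValueError("leading digit must be between 1 and 9")
    return math.log10(1 + 1 / digit)

# Table of expected leading-digit frequencies under Benford's Law.
expected = {d: benford_expected(d) for d in range(1, 10)}
for d, share in expected.items():
    print(f"{d}: {share:.1%}")
```

The shares fall off steadily from 1 down to 9, and they add up to exactly one, since the logarithms telescope to log10(10).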
For data that come from some underlying power-law distribution, this actually makes some sense. In that case, the data points spend more time being collected in the "lag phase" when they're more likely to start with a 1, and proportionally less and less time out in the higher-number-leading areas. The law only holds up when looking at distributions that cover several orders of magnitude - but all the same, it also seems to apply to data sets where there's no obvious exponential growth driving the numbers.
Lack of adherence to Benford's Law can be acceptable as corroborative evidence of financial fraud. Now a group from Astellas reports that several data sets used in drug discovery (such as databases of water solubility values) obey the expected distribution. What's more, they're suggesting that modelers and QSAR people check their training data sets to make sure that those follow Benford's Law as well, as a way to make sure that the data have been randomly selected.
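If you'd like to try that sort of check on a data set of your own, something along these lines would do as a first pass. To be clear, this is my own rough sketch and not the Astellas group's procedure: the chi-square comparison, the function names, and the two toy data sets are all assumptions made for illustration.

```python
import math
from collections import Counter

def leading_digit(x: float) -> int:
    """First significant digit, read off the scientific-notation string."""
    if x == 0:
        raise ValueError("zero has no leading digit")
    return int(f"{abs(x):e}"[0])

def benford_chi_square(values) -> float:
    """Chi-square statistic of observed leading digits against Benford's Law."""
    digits = [leading_digit(v) for v in values if v != 0]
    counts = Counter(digits)
    n = len(digits)
    chi2 = 0.0
    for d in range(1, 10):
        expected = n * math.log10(1 + 1 / d)
        chi2 += (counts.get(d, 0) - expected) ** 2 / expected
    return chi2

# Toy data: powers of two span many orders of magnitude and track the law well;
# the integers 100-999 have perfectly uniform leading digits and do not.
chi_powers = benford_chi_square([2 ** k for k in range(1, 201)])
chi_uniform = benford_chi_square(range(100, 1000))
print(f"powers of two: {chi_powers:.2f}, uniform digits: {chi_uniform:.2f}")
```

With nine digit classes there are eight degrees of freedom, so a statistic above about 15.5 would be a departure at the usual 5% level. A data set that only covers one order of magnitude will fail this no matter how honestly it was collected, which is the caveat about range mentioned above.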
Is anyone willing to try this out on a bunch of raw clinical data to see what happens? Could this be a way to check the integrity of reported data from multiple trial centers? You'd have to pick your study set carefully - a lot of the things we look for don't cover a broad range - but it's worth thinking about. . .
Comments (9)
+ TrackBacks (0) | Category: Clinical Trials | In Silico | The Dark Side
December 10, 2009
Posted by Derek
We spend a lot of time in this business talking about molecular scaffolds - separate chemical cores that we elaborate into more advanced compounds. And there's no doubt that such things exist, but is part of the reason they exist just an outcome of the way chemical research is done? Some analysis in the past has suggested that chemical types get explored in a success-breeds-success fashion, so that the (over)representation of some scaffold might not mean that it has unique properties. It's just that it's done what's been asked of it, so people have stuck with it.
A new paper in J. Med. Chem. from a group in Bonn takes another look at this question. They're trying to see if the so-called "privileged substructures" really exist: chemotypes that have special selectivity for certain target classes. Digging through a public-domain database (BindingDB), they found about six thousand compounds with activity toward some 259 targets. Many of these compounds hit more than one target, as you'd expect, so there were about 18,000 interactions to work with.
Isolating structural scaffolds from the compound set and analyzing them for their selectivity showed some interesting trends. They divide the targets up into communities (kinases, serine proteases, and so on), and they definitely find community-selective scaffolds, which is certainly the experience of medicinal chemists. Inside these sets, various scaffolds also show tendencies for selectivity against individual members of the community. Digging through their supporting information, though, it appears that a good number of the most-selective scaffolds tend to come from the serine protease community (their number 3), with another big chunk coming from kinases (their number 1a). Strip those (and some adenosine receptor ligands and DPP inhibitors, numbers 11 and 8) out, and you've taken all the really eye-catching selectivity numbers out of their supplementary table S5. So I'm not sure that they've identified as many hot structures as one might think.
Another problem I have, when I look at these structures, is that a great number of them look too large for any useful further development. That's just a function of the data this team had to start with, but this gets back to the question of "drug-like" versus "lead-like" structures. I have a feeling that too many of the compounds in the BindingDB set are in the former category, or even beyond, which skews things a bit. Looking at a publication on it from 2007, I get the impression that a majority of compounds in it have a molecular weight greater than 400, with a definite long tail toward the higher weights. What medicinal chemists would like, of course, is a set of smaller scaffolds that will give them a greater chance of landing in a selective chemical space that can be developed. Some of the structures in this paper qualify, but definitely not all of them. . .
Comments (6)
+ TrackBacks (0) | Category: Drug Assays | Drug Development | In Silico
December 7, 2009
Posted by Derek
Almost all of the drugs on the market target one or more small-molecule binding sites on proteins. But there's a lot more to the world than small-molecule binding sites. Proteins spend a vast amount of time interacting with other proteins, in vital ways that we'd like to be able to affect. But those binding events tend to be across broader surfaces, rather than in well-defined binding pockets, and we medicinal chemists haven't had great success in targeting them.
There are some successful examples, with a trend towards more of them in the recent literature. Inhibitors of interactions of the oncology target Bcl are probably the best known, with Abbott's ABT-737 being the poster child of the whole group.
But even though things seem to be picking up in this area, there's still a very long way to go, considering the number of possible useful interactions we could be targeting. And for every successful molecule that gets published, there is surely an iceberg of failed attempts that never make the literature. What's holding us back?
A new article in Drug Discovery Today suggests, as others have, that our compound libraries aren't optimized for finding hits in such assays. Given that the molecular weights of the compounds that are known to work tend toward the high side, that may well be true - but, of course, since the amount of chemical diversity up in those weight ranges is ridiculously huge, we're not going to be able to fix the situation through brute-force expansion of our screening libraries. (We'll table, for now, the topic of the later success rate of such whopper molecules).
Some recent work has suggested that there might be overall molecular shapes that are found more often in protein-protein inhibitors, but I'm not sure if everyone buys into this theory or not. This latest paper does a similar analysis, using 66 structurally diverse protein-protein inhibitors (PPIs) from the literature compared to a larger set (557 compounds) of traditional drug molecules. The PPIs tend to be larger and greasier, as feared. They tried some decision-tree analysis to see what discriminated the two data sets, and found a shape description and another one that correlated more with aromatic ring/multiple-bond count. Overall, the decision tree stuff didn't shake things down as well as it does with data sets for more traditional target classes, which doesn't come as a surprise, either.
So the big questions are still out there: can we go after protein-protein targets with reasonably-sized molecules, or are they going to have to be big and ugly? And in either case, are there structures that have a better chance of giving us a lead series? If that's true, is part of the problem that we don't tend to have such things around already? If I knew the answers to these questions, I'd be out there making the drugs, to be honest. . .
Comments (14)
+ TrackBacks (0) | Category: Drug Assays | Drug Industry History | In Silico
November 17, 2009
Posted by Derek
There's a new paper out in Nature that presents an intriguing way to look for off-target effects of drug candidates. The authors (a large multi-center team) looked at a large number of known drugs (or well-characterized clinical candidates) and their activity profiles. They then characterized the protein targets by the similarities of the molecules that were known to bind to them.
That gave a large number of possible combinations - nearly a million, actually, and in most cases, no correlations showed up. But in about 7,000 examples, a drug matched some other ligand set to an interesting degree. On closer inspection, some of these off-target effects turned out to be already known (but had not been picked up during their initial searching using the MDDR database). Many others turned out to be trivial variations on other known structures.
But what was left over was a set of 3,832 predictions of meaningful off-target binding events. The authors took 184 of these out to review them carefully and see how well they held up. 42 of these turned out to be already confirmed in the primary literature, although not reported in any of the databases they'd used to construct the system - that result alone is enough to make one think that they might be on the right track here.
Of the remaining 142 correlations, 30 were experimentally feasible to check directly. Of these, 23 came back with inhibition constants less than 15 micromolar - not incredibly potent, but something to think about, and a lot better hit rate than one would expect by chance. Some of the hits were quite striking - for example, an old alpha-blocker, indoramin, showed a strong association for dopamine receptors, and turned out to be an 18 nM ligand for D4, which is better than it does on the alpha receptors themselves. In general, they uncovered a lot of new GPCR activities for older CNS drugs, which doesn't surprise me, given the polypharmacy that's often seen in that area.
But they found four examples of compounds that jumped into completely new target categories. Rescriptor (delavirdine), a reverse transcriptase inhibitor used against HIV, showed a strong score against histamine subtypes, and turned out to bind H4 at about five micromolar. That may not sound like much, but the drug's blood levels make that a realistic level to think about, and its side effects include a skin rash that's just what you might expect from such off-target binding.
There are some limitations. To their credit, the authors mention in detail a number of false positives that their method generated - equally compelling predictions of activities that just aren't there. This doesn't surprise me much - compounds can look quite similar to existing classes and not share their activity. I'm actually a bit surprised that their method works as well as it does, and look forward to seeing refined versions of it.
To my mind, this would be an effort well worth some collaborative support by all the large drug companies. A better off-target prediction tool would be worth a great deal to the whole industry, and we might be able to provide a lot more useful data to refine the models used. Anyone want to step up?
Update: be sure to check out the comments section for other examples in this field, and a lively debate about which methods might work best. . .
Comments (20)
+ TrackBacks (0) | Category: Drug Assays | In Silico | Toxicology
Posted by Derek
I've been remiss in not mentioning this, but I just found out recently that Warren DeLano (the man behind the excellent open-source PyMOL program) passed away suddenly earlier this month. He was 37 - another unfortunate loss of a scientist who had done a lot of fine work and was clearly on the way to doing much more.
I notice that as I write this I have a PyMOL window open on my desktop; I use the program regularly to look at protein structures. Si monumentum requiris, circumspice.
Comments (8)
+ TrackBacks (0) | Category: Current Events | In Silico
July 16, 2009
Posted by Derek
I had a printout of the structure of maitotoxin on my desk the other day, mostly as a joke to alarm anyone who came into my office. "Yep, here's the best hit from the latest screen. . .I hear that you're on the list to run the chemistry end. . .what's that you say?"

This is, needless to say, one of the largest and scariest marine natural product structures ever determined (and that determination has been no stroll past the dessert table, either).
But that hasn't stopped people from messing around with it. And there's much speculation that other people are strongly considering messing around with it, too - you synthetic chemists can guess the sorts of people that this might be, and their names, and what it might be like to sit through the seminars that result, and so on.
I fear that a total synthesis of maitotoxin would be largely a waste of time, but I'm willing to hear arguments against that position. Just looking at it, though, inspires thought. This eldritch beastie has 98 chiral centers. So let's do some math. If you're interested in the SAR of such molecules, you have your choice of (two to the 98th) possible isomers, which comes out to a bit over (3 times ten to the 29th) compounds. This is. . .a pretty large number. If you're looking for 10mg of each isomer to add to your screening collection (no sense in going back and making them again), then you're looking at a good bit over half the mass of the entire Earth. And that's just in sheer compounds; we're not counting the weight of vials, which will, I'd say, safely move you up toward the planetary weight of a low-end gas giant. We will ignore shelving considerations in the interest of time.
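For those who'd like to check the arithmetic, here's a quick sketch. The figure for the Earth's mass (about 5.97 times ten to the 24th kilograms) is a value I'm supplying, and the isomer count assumes that all 98 centers can be set independently, with every combination counting as a distinct compound.

```python
STEREOCENTERS = 98          # chiral centers in maitotoxin, as counted above
MG_PER_ISOMER = 10          # sample size wanted for the screening collection
EARTH_MASS_KG = 5.97e24     # assumed reference value for the mass of the Earth

# Each center can be set two ways, so the isomer count is 2**98.
isomers = 2 ** STEREOCENTERS

# 1 mg = 1e-6 kg
total_kg = isomers * MG_PER_ISOMER * 1e-6
fraction_of_earth = total_kg / EARTH_MASS_KG

print(f"isomers:           {isomers:.3e}")
print(f"collection mass:   {total_kg:.3e} kg")
print(f"fraction of Earth: {fraction_of_earth:.2f}")
```

Under those assumptions the collection does come out to a bit more than half the planet, before vials.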
Recall that yesterday's post gave a number of about 27 million compounds below 11 heavy atoms. You could toss 27 million compounds into a collection of ten to the 29th and never see them again, of course. But that brings up two points: one, that the small-compound estimate ignores stereochemistry, and we've been getting those insane maitotoxin numbers by considering nothing but. The thing is, with only 11 non-hydrogen atoms, there aren't quite as many chances for things to get out of control. The GDB compound set goes up only to 110 million or so if you consider stereoisomers, which actually isn't nearly as much as I'd thought.
But the second point is that this shows you why the Berne group stopped at 11 heavy atoms, because the problem becomes intractable really fast as you go higher. It's worth remembering that the GDB people actually threw out over 98% of their scaffolds because they represented potential ring structures that are too strained to be very stable. And they only considered C, N, O and F as heavy atoms (even adding sulfur was considered too much to deal with, computationally). Then they tossed out another 98 or 99% of the structures that emerged from that enumeration as reactive and/or unstable. Relax your standards a bit, allow another atom or two, bump up the molecular weight, do any of those and you're going to exceed anyone's computational capacity. (Update: the Berne group has just taken a crack at it, and managed a reasonable set up to 13 heavy atoms, with various simplifying assumptions to ease the burden. If you want to mess around with it, it's here, free of charge).
No, there are a lot of compounds out there. And if you look at the really big ones - and maitotoxin is nothing if not a really big one - there are whole universes contained just in each of them. (Bonus points for guessing the source of the name of the post, by the way).
Comments (25)
+ TrackBacks (0) | Category: Chemical News | In Silico
July 15, 2009
Posted by Derek
I've been meaning to get around to a very interesting paper from the Shoichet group that came out a month or so ago in Nature Chemical Biology. Today's the day! It examines the content of screening libraries and compares them to what natural products generally look like, and they turn up some surprising things along the way. The main question they're trying to answer is: given the huge numbers of possible compounds, and the relatively tiny fraction of those we can screen, why does high-throughput screening even work at all?
The first data set they consider is the Generated Database (GDB), a calculated set of all the reasonable structures with 11 or fewer nonhydrogen atoms, which grew out of this work. Neglecting stereochemistry, that gives you between 26 and 27 million compounds. Once you're past the assumptions of the enumeration (which certainly seem defensible - no multiheteroatom single-bond chains, no gem-diols, no acid chlorides, etc.), then there is no human bias involved: that's the list.
The second list is everything from the Dictionary of Natural Products and all the metabolites and natural products from the Kyoto Encyclopedia of Genes and Genomes. That gives you 140,000+ compounds. And the final list is the ZINC database of over 9 million commercially available compounds, which (as they point out) is a pretty good proxy for a lot of screening collections as well.
One rather disturbing statistic comes out early when you start looking at overlaps between these data sets. For example, how many of the possible GDB structures are commercially available? The answer: 25,810 of them - in other words, out of 26 to 27 million, you can only buy fewer than 0.1% of the possible compounds with 11 heavy atoms or below, making the "purchasable GDB" a paltry list indeed.
Now, what happens when you compare that list of natural products to these other data sets? Well, for one thing, the purchasable part of the GDB turns out to be much more similar to the natural product list than the full set. Everything in the GDB has at least 20% Tanimoto similarity to at least one compound in the natural products set, not that 20% means much of anything in that scoring system. But only 1% of the GDB has a 40% Tanimoto similarity, and less than 0.005% has an 80% Tanimoto similarity. That's a pretty steep dropoff!
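For anyone who hasn't run into it, the Tanimoto score is just the overlap between two sets of structural features divided by their union. Here's a minimal sketch, assuming each compound has been boiled down to the set of fingerprint bits that are switched on. The bit sets below are made up for illustration; I'm not reproducing the fingerprints the authors actually used.

```python
def tanimoto(bits_a: set, bits_b: set) -> float:
    """Tanimoto (Jaccard) similarity: shared features over total features."""
    union = bits_a | bits_b
    if not union:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(bits_a & bits_b) / len(union)

# Hypothetical on-bit indices for three compounds.
compound_1 = {3, 17, 42, 88, 101}
compound_2 = {3, 17, 42, 90, 115}
compound_3 = {5, 9, 42, 200, 301, 512, 733, 801, 944}

print(f"1 vs 2: {tanimoto(compound_1, compound_2):.2f}")
print(f"1 vs 3: {tanimoto(compound_1, compound_3):.2f}")
print(f"1 vs 1: {tanimoto(compound_1, compound_1):.2f}")
```

A score of 100% means the two feature sets are identical, and a single shared feature among many is enough to produce a low nonzero score, which is part of why a 20% match doesn't tell you much.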
But the "purchasable GDB" holds up much better. 10% of that list has 100% Tanimoto similarity (that is, 10% of the purchasable compounds are natural products themselves). The authors also compare individual commercial screening collections. If you're interested, ChemBridge and Asinex are the least natural-product-rich (about 5% of their collections), whereas IBS and Otava are the most (about 10%).
So one answer to "why does HTS ever work for anything" is that compound collections seem to be biased toward natural-product type structures, which we can reasonably assume have generally evolved to have some sort of biological activity. It would be most interesting to see the results of such an analysis run from inside several drug companies against their own compound collections. My guess is that the natural product similarities would be even higher than the "purchasable GDB" set's, because drug company collections have been deliberately stocked with structural series that have shown activity in one project or another.
That's certainly looking at things from a different perspective, because you can also hear a lot of talk about how our compound files are too ugly - too flat, too hydrophobic, not natural-product-like enough. These viewpoints aren't contradictory, though - if Shoichet is right, then improving those similarities would indeed lead to higher hit rates. Compared to everything else, we're already at the top of the similarity list, but in absolute terms there's still a lot of room for improvement.
So how would one go about changing this, assuming that one buys into this set of assumptions? The authors have searched through the various databases for ring structures, taking those as a good proxy for structural scaffolds. As it turns out, 83% of the ring scaffolds among the natural products are unrepresented among the commercially available molecules - a result that I assume that Asinex, ChemBridge, Life Chemicals, Otava, Bionet and their ilk are noting with great interest. In fact, the authors go even further in pointing out opportunities, with a table of rings from this group that closely resemble known drug-like ring systems.
But wait a minute. . .when you look at those scaffolds, a number of them turn out to be rather, well, homely. I'd be worried about elimination to form a Michael acceptor in compound 19, for example. I'm not crazy about the N,S acetal in 21 or the overall stability of the acetals in 15, 17 and 31. The propiolactone in 23 is surely reactive, as is the quinone in 25, and I'd be very surprised if that's not what they owe their biological activities to. And so on.

All that said, there are still some structures in there that I'd be willing to check out, and there must be more of them in that 83%. No doubt a number of the rings that do sneak into the commercial list are not very well elaborated, either. I think that there is a real commercial opportunity here. A company could do quite well for itself by promoting its compound collection as being more natural-product similar than the competition, with tractable molecules, and a huge number of them unrepresented in any other catalog.
Now all you'd have to do is make these things. . .which would require hiring synthetic organic chemists, and plenty of them. These things aren't easy to make, or to work with. And as it so happens, there are quite a few good ones available these days. Anyone want to take this business model to heart?
Comments (13)
+ TrackBacks (0) | Category: Drug Assays | Drug Industry History | In Silico
July 7, 2009
Posted by Derek
While we're on the topic of hydrogen bonds and computations, there's a paper coming out in JACS that attempts to answer an old question. Why, exactly, does every living thing on earth use so much ribose? It's the absolute, unchanging carbohydrate backbone to all the RNA on Earth, and like the other things in this category (why L amino acids instead of D?), it's attracted a lot of speculation. If you subscribe to the RNA-first hypothesis of the origins of life, then the question becomes even more pressing.
A few years ago, it was found that ribose, all by itself, diffuses through membranes faster than the other pentose sugars. This result holds up for several kinds of lipid bilayers, suggesting that it's not some property of the membrane itself that's at work. So what about the ability of the sugar molecules to escape from water and into the lipid layers?
Well, they don't differ much in logP, that's for sure, as the original authors point out. This latest paper finds, though, by using molecular dynamics simulations that there is something odd about ribose. In nonpolar environments, its hydroxy groups form a chain of hydrogen-bond-like interactions, particularly notable when it's in the beta-pyranose form. These aren't a factor in aqueous solution, and the other pentoses don't seem to pick up as much stabilization under hydrophobic conditions, either.
So ribose is happier inside the lipid layer than the other sugars, and thus pays less of a price for leaving the aqueous environment, and (both in simulation and experimentally) diffuses across membranes ten times as quickly as its closely related carbohydrate kin. (Try saying that five times fast!) This, as both the original Salk paper and this latest one note, leads to an interesting speculation on why ribose was preferred in the origins of life: it got there firstest with the mostest. (That's a popular misquote of Nathan Bedford Forrest's doctrine of warfare, and if he's ever come up before in a discussion of ribose solvation, I'd like to hear about it).
Comments (9)
+ TrackBacks (0) | Category: Biological News | In Silico | Life As We (Don't) Know It
Posted by Derek
Hydrogen bonds are important. There, that should be a sweepingly obvious enough statement to get things started. But they really are - hydrogen bonding accounts for the weird properties of water, for one thing, and it's those weird properties that are keeping us alive. And leaving out the water (a mighty big step), internal hydrogen bonding is still absolutely essential to the structure of large biological molecules - proteins, complex carbohydrates, DNA and RNA, and so on.
But we don't understand hydrogen bonds all that well, dang it all. It's not like we're totally ignorant of them, for sure, but there are a lot of important things that we don't have a good handle on. One of these may just have been illustrated by this paper in Nature Structural and Molecular Biology by a group from Scripps. They've been working on understanding the fact that all hydrogen bonds are not created equal. By carefully going through a lot of protein mutants, they have evidence for the idea that H-bonds that form in polar environments are weaker than ones that form in nonpolar ones.
That makes sense, on the face of it. One way to think of it is that a hydrogen bond in a locally hydrophobic area is the only game in town, and counts for more. But this work claims that such bonds can be worth as much as 1.2 kcal/mole more than the wimpier ones, which is rather a lot. Those kinds of energy differences could add up very quickly when you're trying to understand why a protein folds up the way it does, or why one small molecule binds more tightly than another one.
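To put that 1.2 kcal/mole in perspective, here's a back-of-the-envelope sketch using the standard relation between a free energy difference and a ratio of equilibrium constants, ratio = exp(ΔΔG/RT). The temperature (298.15 K) is my own assumption, and the function name is only an illustrative label.

```python
import math

R_KCAL_PER_MOL_K = 1.987204e-3   # gas constant in kcal/(mol*K)
TEMPERATURE_K = 298.15           # assumed room temperature

def fold_change(ddg_kcal_per_mol: float, temperature_k: float = TEMPERATURE_K) -> float:
    """Ratio of equilibrium constants corresponding to a free energy difference."""
    return math.exp(ddg_kcal_per_mol / (R_KCAL_PER_MOL_K * temperature_k))

one_bond = fold_change(1.2)
three_bonds = fold_change(3 * 1.2)

print(f"one such H-bond:    ~{one_bond:.1f}-fold")
print(f"three such H-bonds: ~{three_bonds:.0f}-fold")
```

A single bond of that sort is worth somewhere between seven- and eight-fold in an equilibrium constant at room temperature, and the effect compounds multiplicatively if several of them are in play, which is why these differences add up so quickly.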
Do we take such things into account when we're trying to compute these energies? Generally speaking, no, we do not - well, not yet. If these folks are right, though, we'd better start.
Update: note that the paper itself doesn't suggest that this is a new idea - they reference work going back to 1963 (!) on the topic. What they're trying to do is put more real numbers into the mix. And that's what my last paragraph above is trying to state (and perhaps overstate): it's difficult to account for these things computationally, since they vary so widely, and since we don't have that good a computational handle on hydrogen bonds in general. The more real world data that can be fed back into the models, the better.
Comments (7)
+ TrackBacks (0) | Category: In Silico
July 2, 2009
Posted by Derek
Moore's Law: number of transistors on a chip doubling every 18 months or so, etc. Everyone's heard of it. But can we agree that anyone who uses it as a metaphor or prescription for drug research doesn't know what they're talking about?
I first came across the comparison back during the genomics frenzy. One company that had bought into the craze in a big way press-released (after a rather short interval) that they'd advanced their first compound to the clinic based on this wonderful genomics information. I remember rolling my eyes and thinking "Oh, yeah", but on a hunch I went to the Yahoo! stock message boards (often a teeming heap of crazy, then as now). And there I found people just levitating with delight at this news. "This is Moore's Law as applied to drug discovery!" shouted one enthusiast. "Do you people realize what this means?" What it meant, apparently, was not only that this announcement had come rather quickly. It also meant that this genomics stuff was going to discover twice as many drugs as this real soon. And real soon after that, twice as many more, and so on until the guy posting the comment was as rich as Warren Buffett, because he was a visionary who'd been smart enough to load himself into the catapult and help cut the rope. (For those who don't know how that story ended, the answer is Not Well: the stock that occasioned all this hyperventilation ended up dropping by a factor of nearly a hundred over the next couple of years. The press-released clinical candidate was never, ever, heard of again).
I bring this up because a reader in the industry forwarded me this column from Bio-IT World, entitled, yes, "Only Moore's Law Can Save Big Pharma". I've read it three times now, and I still have only the vaguest idea of what it's talking about. Let's see if any of you can do better.
The author starts off by talking about the pressures that the drug industry is under, and I have no problem with him there. That is, until he gets to the scientific pressures, which he sketches out thusly:
Scientifically, the classic drug discovery paradigm has reached the end of its long road. Penicillin, stumbled on by accident, was a bona fide magic bullet. The industry has since been organized to conduct programs of discovery, not design. The most that can be said for modern pharmaceutical research, with its hundreds of thousands of candidate molecules being shoveled through high-throughput screening, is that it is an organized accident. This approach is perhaps best characterized by the Chief Scientific Officer of a prominent biotech company who recently said, "Drug discovery is all about passion and faith. It has nothing to do with analytics."
The problem with faith-based drug discovery is that the low hanging fruit has already been plucked, driving would be discoverers further afield. Searching for the next miracle drug in some witch doctor's jungle brew is not science. It's desperation.
The only way to escape this downward spiral is new science. Fortunately, the fuzzy outlines of a revolution are just emerging. For lack of a better word, call it Digital Chemistry.
And when the man says "fuzzy outline", well, you'd better take him at his word. What, I know you're all asking, is this Digital Chemistry stuff? Here, wade into this:
Tomorrow's drug companies will build rationally engineered multi-component molecular machines, not small molecule drugs isolated from tree bark or bread mold. These molecular machines will be assembled from discrete interchangeable modules designed using hierarchical simulation tools that resemble the tool chains used to build complex integrated circuits from simple nanoscale components. Guess-and-check wet chemistry can't scale. Hit or miss discovery lacks cross-product synergy. Digital Chemistry will change that.
Honestly, if I start talking like this, I hope that onlookers will forgo taking notes and catch on quickly enough to call the ambulance. I know that I'm quoting too much, but I have to tell you more about how all this is going to work:
But modeling protein-protein interaction is computationally intractable, you say? True. But the kinetic behavior of the component molecules that will one day constitute the expanding design library for Digital Chemistry will be synthetically constrained. This will allow engineers to deliver ever more complex functional behavior as the drugs and the tools used to design them co-evolve.
How will drugs of the future function? Intracellular microtherapeutic action will be triggered if and only if precisely targeted DNA or RNA pathologies are detected within individual sick cells. Normal cells will be unaffected. Corrective action shutting down only malfunctioning cells will have the potential of delivering 99% cure rates. Some therapies will be broad based and others will be personalized, programmed using DNA from the patient's own tumor that has been extracted, sequenced, and used to configure "target codes" that can be custom loaded into the detection module of these molecular machines. .
Look, I know where this is coming from. And I freely admit that I hope that, eventually, a really detailed molecular-level knowledge of disease pathology, coupled with a really robust nanotechnology, will allow us to treat disease in ways that we can't even approach now. Speed the day! But the day is not sped by acting as if this is the short-term solution for the ills of the drug industry, or by talking as if we already have any idea at all about how to go about these things. We don't.
And what does that paragraph up there mean? "The kinetic behavior. . .will be synthetically constrained"? Honestly, I should be qualified to make sense of that, but I can't. And how do we go from protein-protein interactions at the beginning of all that to DNA and RNA pathologies at the end, anyway? If all the genomics business has taught us anything, it's that these are two very, very different worlds - both important, but separated by a rather wide zone of very lightly-filled-in knowledge.
Let's take this step by step; there's no other way. In the future, according to this piece, we will detect pathologies by detecting cell-by-cell variations in DNA and/or RNA. How will we do that? At present, you have to rip open cells and kill them to sequence their nucleic acids, and the sensitivities are not good enough to do it one cell at a time. So we're going to find some way to do that in a specific non-lethal way, either from the outside of the cells (by a technology that we cannot even yet envision) or by getting inside them (by a technology that we cannot even envision) and reading off their sequences in situ (by a technology that we cannot even envision). Moreover, we're going to do that not only with the permanent DNA, but with the various transiently expressed RNA species, which are localized to all sorts of different cell compartments, present in minute amounts and often for short periods of time, and handled in ways that we're only beginning to grasp and for purposes that are not at all yet clear. Right.
Then. . .then we're going to take "corrective action". By this I presume that we're either going to selectively kill those cells or alter them through gene therapy. I should note that gene therapy, though incredibly promising as ever, is something that so far we have been unable, in most cases, to get to work. Never mind. We're going to do this cell by cell, selectively picking out just the ones we want out of the trillions of possibilities in the living organism, using technologies that, I cannot emphasize enough, we do not yet have. We do not yet know how to find most individual cell types in a complex living tissue; huge arguments ensue about whether certain rare types (such as stem cells) are present at all. We cannot find and pick out, for example, every precancerous cell in a given volume of tissue, not even by slicing pieces out of it, taking it out into the lab, and using all the modern techniques of instrumental analysis and molecular biology.
What will we use to do any of this inside the living organism? What will such things be made of? How will you dose them, whatever they are? Will they be taken up through the gut? Doesn't seem likely, given the size and complexity we're talking about. So, intravenous then, fine - how will they distribute through the body? Everything spreads out a bit differently, you know. How do you keep them from sticking to all kinds of proteins and surfaces that you're not interested in? How long will they last in vivo? How will you keep them from being cleared out by the liver, or from setting off a potentially deadly immune response? All of these could vary from patient to patient, just to make things more interesting. How will we get any of these things into cells, when we only roughly understand the dozens of different transport mechanisms involved? And how will we keep the cells from pumping them right back out? They do that, you know. And when it's time to kill the cells, how do you make absolutely sure that you're only killing the ones you want? And when it's time to do the gene therapy, what's the energy source for all the chemistry involved, as we cut out some sequences and splice in the others? Are we absolutely sure that we're only doing that in just the right places in just the right cells, or will we (disastrously) be sticking in copies into the DNA of a quarter of a per cent of all the others?
And what does all this nucleic acid focus have to do with protein expression and processing? You can't fix a lot of things at the DNA level. Misfolding, misglycosylation, defects in transport and removal - a lot of this stuff is post-genomic. Are we going to be able to sequence proteins in vivo, cell by cell, as well? Detect tertiary structure problems? How? And fix them, how?
Alright, you get the idea. The thing is, and this may be surprising considering those last few paragraphs, that I don't consider all of this to be intrinsically impossible. Many people who beat up on nanotechnology would disagree, but I think that some of these things are, at least in broad hazy theory, possibly doable. But they will require technologies that we are nowhere close to owning. Babbling, as the Bio-IT World piece does, about "detection modules" and "target codes" and "corrective action" is absolutely no help at all. Every one of those phrases unpacks into a gigantic tangle of incredibly complex details and total unknowns. I'm not ready to rule some of this stuff out. But I'm not ready to rule it in just by waving my hands.
Category: Drug Industry History | General Scientific News | In Silico | Press Coverage
April 1, 2009
Posted by Derek
Thanks to a comment on this post, I’ve had a chance to read this interesting article from Stephen Johnson of Bristol-Myers Squibb, entitled “The Trouble with QSAR (Or How I Learned to Stop Worrying And Embrace Fallacy)”. (As a side note, it’s interesting to see that people still make references to the titling of Dr. Strangelove. I’ve never met Johnson, but I’d gather from that that he can’t be much younger than I am).

The most arresting part of the article is the graph found in its abstract. No mention is made of it in the text, but none has to be. It’s a plot of the US highway fatality rate versus the tonnage of fresh lemons imported from Mexico, and I have to say, it’s a pretty darn straight line. I’ve seen a lot shakier plots used to justify some sweeping conclusions, and if those were justified, well, then I’m forced to conclude that Mexican lemons have improved highway safety a great deal. The vitamin C, maybe? The fragrance? Bioflavonoids?
None of the above, of course. Correlation, tiresomely, once again refuses to imply causation, even when you ask it nicely. And that’s the whole point of the article. QSAR, for those outside the business, stands for Quantitative Structure-Activity Relationship(s), an attempt to rationalize the behavior of a series of drug candidate compounds through computational means. The problem is, there are plenty of possible variables (size, surface area, molecular weight, polarity, solubility, charge, hydrogen bond donors and acceptors, and as many structural representation parameters as you can stand). As Johnson notes dryly:
“With such an infinite array of descriptions possible, each of which can be coupled with any of a myriad of statistical methods, the number of equivalent solutions is typically fairly substantial.”
That it is. And (as he rightly mentions) one of the other problems is that all these variables are discontinuous. Some region of the molecule can get larger, but only up to a point. When it’s too large to fit into the binding site any more, activity drops off steeply. Similarly, the difference between forming a crucial hydrogen bond and not forming one is a big difference, and it can be realized by a very small change in structure and properties. (Thus the “magic methyl” effect).
But that’s not the whole problem. Johnson takes many of his fellow computational chemists to task for what he sees as sloppy work. Too many models are advanced just because they’ve shown some (limited) correlations, and they’re not tested hard enough afterwards. Finding a model with a good “fitness score” becomes an end in itself:
“We can generate so many hypotheses, relating convoluted molecular factors to activity in such complicated ways, that the process of careful hypothesis testing so critical to scientific understanding has been circumvented in favor of blind validation tests with low resulting information content. QSAR disappoints so often, not only because the response surface is not smooth but because we have embraced the fallacy that correlation begets causation.”
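For anyone who wants to see how easily a pile of descriptors turns up a lemon plot, here's a minimal sketch. It is not anything from Johnson's article: the function names, the twenty-compound series, and the purely random "descriptors" are all assumptions made up for illustration. The activity data are noise, the descriptors are noise, and the code simply reports the best correlation found by chance.

```python
import random


def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient for two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def best_chance_correlation(n_compounds, n_descriptors, seed=0):
    """Largest |r| between random 'activity' and any of n random 'descriptors'.

    Everything here is noise by construction (hypothetical data), so any
    correlation that turns up is meaningless.
    """
    rng = random.Random(seed)
    activity = [rng.gauss(0, 1) for _ in range(n_compounds)]
    best = 0.0
    for _ in range(n_descriptors):
        descriptor = [rng.gauss(0, 1) for _ in range(n_compounds)]
        best = max(best, abs(pearson_r(descriptor, activity)))
    return best


if __name__ == "__main__":
    # Twenty compounds, and an ever larger pool of descriptors to pick from.
    for n_desc in (1, 10, 100, 1000):
        print(n_desc, round(best_chance_correlation(20, n_desc), 2))
```

The more descriptors you allow yourself, the better the best one looks, even though none of them means anything. A real QSAR workflow has more safeguards than this toy does, but the underlying temptation is the same.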
Category: In Silico
March 26, 2009
Posted by Derek
So, people like me spend their time trying to make small molecules that will bind to some target protein. So what happens, anyway, when a small molecule binds to a target protein? Right, right, it interacts with some site on the thing, hydrogen bonds, hydrophobic interactions, all that – but what really happens?
That’s surprisingly hard to work out. The tools we have to look at such things are powerful, but they have limitations. X-ray crystal structures are great, but can lead you astray if you’re not careful. The biggest problem with them, though (in my opinion) is that you see this beautiful frozen picture of your drug candidate in the protein, and you start to think of the binding as. . .well, as this beautiful frozen picture. Which is the last thing it really is.
Proteins are dynamic, to a degree that many medicinal chemists have trouble keeping in mind. Looking at binding events in solution is more realistic than looking at them in the crystal, but it’s harder to do. There are various NMR methods (here's a recent review), some of which require specially labeled protein to work well, but they have to be interpreted in the context of NMR’s time scale limitations. “Normal” NMR experiments give you time-averaged spectra – if you want to see things happening quickly, or if you want to catch snapshots of the intermediate states along the way, you have a lot more work to do.
Here’s a recent paper that’s done some of that work. They’re looking at a well-known enzyme, dihydrofolate reductase (DHFR). It’s the target of methotrexate, a classic chemotherapy drug, and of the antibiotic trimethoprim. (As a side note, that points out the connections that sometimes exist between oncology and anti-infectives. DHFR produces tetrahydrofolate, which is necessary for a host of key biosynthetic pathways. Inhibiting it is especially hard on cells that are spending a lot of their metabolic energy on dividing – such as tumor cells and invasive bacteria).
What they found was that both inhibitors do something similar, and it affects the whole conformational ensemble of the protein:
". . .residues lining the drugs retain their μs-ms switching, whereas distal loops stop switching altogether. Thus, as a whole, the inhibited protein is dynamically dysfunctional. Drug-bound DHFR appears to be on the brink of a global transition, but its restricted loops prevent the transition from occurring, leaving a “half-switching” enzyme. Changes in pico- to nanosecond (ps-ns) backbone amide and side-chain methyl dynamics indicate drug binding is “felt” throughout the protein."
There are implications, though, for apparently similar compounds having rather different effects out in the other loops:
". . .motion across a wide range of timescales can be regulated by the specific nature of ligands bound. Occupation of the active site by small ligands of different shapes and physical characteristics places differential stresses on the enzyme, resulting in differential thermal fluctuations that propagate through the structure. In this view, enzymes, through evolution, develop sensitivities to ligand properties from which mechanisms for organizing and building such fluctuations into useful work can arise. . .Because the affected loop structures are primarily not in contact with drug, it is reasonable to envision inhibitory small-molecule drugs that act by allosterically modulating dynamic motions."
There are plenty of references in the paper to other investigations of this kind, so if this is your sort of thing, you'll find plenty of material there. One thing to take home, though, is to remember that not only are proteins mobile beasts (with and without ligand bound to them), but that this mobility is quite different in each state. And keep in mind that the ligand-bound state can be quite odd compared to anything else the protein experiences otherwise. . .
Category: Biological News | Cancer | Chemical News | In Silico
February 24, 2009
Posted by Derek
Medicinal chemists spend a lot of their time exploring and trying to make sense of structure-activity relationships (SARs). We vary our molecules in all kinds of ways, have the biologists run them through the assays, and then sit down to make sense of the results.
And then, like as not, we get up again after a few minutes, shaking our heads. Has anyone out there ever worked on a project where the entire SAR made sense? I’ve always considered it a triumph if even a reasonable majority of the compounds fit into an interpretable pattern. SAR development is a perfect example of things not quite working out the way that they do in textbooks.
The most common surprise when you get your results back, if that phrase “common surprise” makes any sense, is to find that you’ve pushed some trend a bit too far. Methyl was pretty good, ethyl was better, but anything larger drops dead. I don’t count that sort of thing – those are boundary conditions, for the most part, and one of the things you do in a med-chem program is establish the limits under which you can work. But there are still a number of cases where what you thought was a wall turns out to have a secret passage or two hidden in it. You can’t put any para-substituents on that ring, sure. . .unless you have a basic amine over on the other end of the molecule, and then you suddenly can.
I’d say that a lot of these get missed, because after a project’s been running a while, various SAR dogmas get propagated. There are features of the structure space that “everybody knows”, and that few people want to spend their time violating. But it’s worth devoting a small (but real) amount of effort to going back and checking some of these after the lead molecule has evolved a bit, since you can get surprised.
Some projects I’ve worked on have so many conditional clauses of this sort built into their SAR that you wonder whether there are any boundaries at all. This works, unless you have this, but if you have that over there it can be OK, although there is that other compound which didn’t. . .making sense of this stuff can just be impossible. The opposite situation, the fabled Perfectly Additive SAR, is something I’ve never encountered in person, although I’ve heard tales after the fact. That’s the closest we come to the textbooks, where you can mix and match groups and substituents any way you like, predicting as you go from the previous trends just how they’ll come out. I have to think that any time you can do this, that it has to be taking place in a fairly narrow structure space – surely we can always break any trend like this with a little imagination.
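One way to put a number on "additive" is a double-substitution cycle: take the parent, the two singly changed compounds, and the compound with both changes, and see whether the two effects simply sum on a log scale. The sketch below is a minimal illustration of that idea; the function name and every pIC50 value in it are hypothetical, made up to mirror the para-substituent-plus-basic-amine sort of surprise.

```python
def nonadditivity(parent, with_a, with_b, with_both):
    """Deviation from additivity for two changes A and B, on a log scale.

    All arguments are potencies such as pIC50. The additive prediction for
    the doubly changed compound is parent + (effect of A) + (effect of B).
    A result near zero means the two changes act independently.
    """
    predicted_both = parent + (with_a - parent) + (with_b - parent)
    return with_both - predicted_both


if __name__ == "__main__":
    # Hypothetical textbook case: the two effects add cleanly.
    print(nonadditivity(6.0, 6.5, 7.0, 7.5))

    # Hypothetical "secret passage": change A alone kills activity,
    # but is tolerated (and even helps) once change B is in place.
    print(nonadditivity(6.0, 4.0, 6.2, 7.1))
```

A project full of conditional clauses is one where this number is far from zero for a lot of pairs, which is another way of saying that the previous trends won't predict the next compound.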
Another well-known bit of craziness is the Only Thing That Works There. You’ll have whole series of compounds that have to have a methyl group at some position, or they’re all dead. Nothing smaller, nothing larger, nothing with a different electronic flavor: it’s methyl or death. (Or fluoro, or a thiazole, or what have you – I’ve probably seen this with methyl more than with other groups, but it can happen all over the place). A sharp SAR is certainly nothing to fear; it’s probably telling you that you really are making good close contacts with the protein target somewhere. But it can be unnerving, and sometimes there’s not a lot of room left on the ledge when you have more than one constraint like this.
Why does all this go on? Multiple binding modes, you have to think. Proteins are flexible beasts, and they've got lots of ways to react to ligands. And it's important never to forget that we can't predict their responses, at least not yet and not very well. And of course, in all this discussion, we've just been considering one target protein. When you think about the other things your molecule might be hitting in cells or in a whole animal, and that the SAR relationships for those off-target things are just as fluid and complicated as for your target, well. . .you can see why medicinal chemistry is not going away anytime soon. Or shouldn't, anyway.
Category: Drug Assays | In Silico | Life in the Drug Labs
December 10, 2008
Posted by Derek
There’s a trick that every medicinal chemist learns very early, and continues to apply every time it’s feasible: take two parts of your compound, and tie them together into a ring.
The reason that works so well may not be immediately obvious if you’re not a medicinal chemist, so let me expand on it a bit. The first thing to know is that this method tends to work either really well or not at all – it’s a “death or glory” move. And that gives you a clue as to what’s going on. The idea is that the rotatable bonds in your molecule are, under normal conditions, doing just that: rotating. Any molecule the size of a normal drug has all kinds of possible shapes and rotational isomers, and room temperature is an energetic enough environment to populate a lot of them.
But there’s only one of them that’s the best for fitting into your drug target, most likely. So what are the odds? As your molecule approaches its binding pocket, there’s a complicated energetic dance going on. Different parts of your drug candidate will start interacting with the target (usually a protein), and that starts to tie down all that floppy rotation. The question is, does the gain resulting from these interactions cancel out the energetic price that has to be paid for them? Is there a pathway that leads to a favorable tight-binding situation, or is your molecule going to approach, flop around a bit, and dance away?
Several things are at work during that shall-we-dance period. The different conformations of your compound vary in energy, depending on how much its parts are starting to bang into each other, and how much you’re asking the bonds to twist around. The closer that desired drug-binding shape is to the shape your molecule wants to be in anyway, the better off you are, from that perspective. So tying back the molecule and making a ring in the structure does one thing immediately: it cuts down on the range of conformations it can take, in the same way that tying a rope between your ankles cuts down on your ability to dance. You’ve handcuffed your molecule, which would probably be cruel if it were sentient, but then, a lot of organic chemistry would be pretty unspeakable if molecules had feelings.
That’s why this method tends to be either a big winner or a big loser. If the preferred binding mode of your compound is close to the shape it takes when you tie it down, then you’ve suddenly zeroed in on just the thing you want, and the binding affinity is going to take a big leap. But if it’s not, well, you’ve now probably made it impossible for the thing to adopt the conformation it needs, and the binding affinity is going to take a big leap over a cliff.
There’s another effect to reducing the flexibility of your compound, and that has to do with entropy. All that favorable-interaction business is one component of the energy involved, namely the enthalpy, but entropy is the other. Loosely speaking, the more disordered a system, the higher its entropy. A floppy molecule, when it binds to a drug target, has to settle down into a much tighter fit, and entropically, that’s unfavorable. Energetically, you’re paying to do that. But if your molecule is already much less flexible, there’s not much of a toll as it fits into the pocket. If loss-of-floppiness is a bad thing, then don’t start out with so much of it.
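To make the bookkeeping concrete: the standard relations are ΔG = ΔH − TΔS for the binding free energy, and ΔG = RT ln(Kd) connecting it to the dissociation constant. The sketch below just turns those two textbook equations into functions. The function names and the example numbers are assumptions for illustration, not measurements from any real compound.

```python
import math

R_KCAL = 0.0019872  # gas constant in kcal/(mol*K)


def binding_free_energy(delta_h_kcal, delta_s_kcal_per_k, temperature_k=298.15):
    """Delta G = Delta H - T * Delta S, all in kcal/mol (entropy per kelvin)."""
    return delta_h_kcal - temperature_k * delta_s_kcal_per_k


def fold_change_in_kd(delta_delta_g_kcal, temperature_k=298.15):
    """Factor by which Kd changes when the binding free energy changes.

    From Delta G = RT ln(Kd): Kd_new / Kd_old = exp(ddG / RT).
    A negative ddG (more favorable binding) gives a factor below 1.
    """
    return math.exp(delta_delta_g_kcal / (R_KCAL * temperature_k))


if __name__ == "__main__":
    # Hypothetical case: rigidifying the molecule removes an entropic
    # penalty worth about 1.4 kcal/mol at room temperature.
    print(fold_change_in_kd(-1.4))
```

At room temperature RT is only about 0.6 kcal/mol, so even a modest entropic saving shows up as a sizable change in affinity. That is why the death-or-glory swings from ring formation can be so large.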
So, how much do I and my medicinal chemistry colleagues think about this stuff, day to day? A fair amount, but there are parts of it that we probably don’t pay enough attention to. Entropy gets less respect from us than it deserves, I think. It’s easy to imagine molecules bumping into each other, sticking and unsticking, but the more nebulous change-in-disorder part of the equation is just as important. And it doesn’t just apply to our drug molecules – proteins get less disordered as they bind those molecules (or more disordered, in some cases), and those entropic changes can mean a lot, too.
I also mentioned molecules finding a pathway to binding, and that’s something that we don’t think about as much, either. We probably make things all the time that would be potent binders, if they just could get past some energetic hump and wedge themselves into place. But there are no crowbars available; our drug candidates have to be able to work their way in on their own. The can’t-get-there-from-here cases come back from the assays as inactive. The tendency is to imagine these in the binding site already, and to try to think of what could be going wrong in there – but it may be that they’d be fine, but that their structures won’t allow them to come in for a landing.
Picturing this accurately is very hard indeed. We have enough trouble with good representations of static pictures of our molecules bound to their targets, so making a movie of the process is a whole different story. Each frame is on a femtosecond scale – molecules flip around rather quickly – and every frame would have to be computed accurately (drug structure, protein structure, and the energetics of the whole system) for the resulting video clip to make sense. It’s been done, but not all that often, and we’re not good at it.
Category: In Silico | Pharma 101
September 25, 2008
Posted by Derek
Want a hard problem? Something to really keep you challenged? Try protein folding. That'll eat up all those spare computational cycles you have lounging around and come back to ask for more. And it'll do the same for your brain cells, too, for that matter.
The reason is that a protein of any reasonable size has a staggering number of shapes it can adopt. If you hold a ball-and-stick model of one, you realize pretty quickly that there are an awful lot of rotatable bonds in there (not least because they flop around while you're trying to hold the model in your hands). My daughter was playing around with a toy once that was made of snap-together parts that looked like elbow macaroni pieces, and I told her that this was just like a lot of molecules inside her body. We folded and twisted the thing around very quickly to a wide variety of shapes, even though it only had ten links or so, and I then pointed out to her that real proteins all had different things sticking off at right angles in the middle of each piece, making the whole situation even crazier.
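A back-of-the-envelope count shows how fast this gets out of hand (it's the same arithmetic behind what's usually called Levinthal's paradox). The sketch below assumes, purely for illustration, three preferred positions per rotatable bond, two rotatable backbone bonds per amino acid, and a sampling rate of ten trillion shapes per second. None of those numbers come from a real measurement.

```python
def conformer_count(rotatable_bonds, states_per_bond=3):
    """Number of shapes if every bond independently picks one of a few states."""
    return states_per_bond ** rotatable_bonds


def years_to_enumerate(count, samples_per_second=1e13):
    """Time to visit every shape once at a (hypothetical) fixed sampling rate."""
    seconds = count / samples_per_second
    return seconds / (3600 * 24 * 365.25)


if __name__ == "__main__":
    # A ten-link toy, like the snap-together macaroni model.
    print(conformer_count(10))

    # A 56-residue protein, assuming two rotatable backbone bonds per
    # residue and ignoring side chains entirely.
    shapes = conformer_count(2 * 56)
    print(shapes)
    print(years_to_enumerate(shapes))
```

Real proteins obviously don't fold by trying every shape in turn, which is rather the point: whatever they are doing, brute-force enumeration isn't it, and it isn't an option for our computers either.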
There's a new (open access) paper in PNAS that illustrates some of the difficulties. The authors have been studying man-made proteins that have substantially similar sequences of amino acids, but still have different folding and overall shape. In this latest work, they've made it up to two proteins (56 amino acids each) that have 95% sequence identity, but still have very different folds. It's just a few key residues that make the difference and kick the overall protein into a different energetic and structural landscape. The other regions of the proteins can be mutated pretty substantially without affecting their overall folding, on the other hand. (In the picture, the red residues are the key ones and the blue areas are the identical/can-be-mutated domains).

This ties in with an overall theme of biology - it's nonlinear as can be. The systems in it are huge and hugely complicated, but the importance of the various parts varies enormously. There are small key chokepoints in many physiological systems that can't be messed with, just as there are some amino acids that can't be touched in a given protein. (Dramatic examples include the many single-amino-acid based genetic disorders).
But perhaps the way to look at it is that the complexity is actually an attempt to overcome this nonlinearity. Otherwise the system would be too brittle to work. All those overlapping, compensating, inter-regulating feedback loops that you find in biochemistry are, I think, a largely successful attempt to run a robust organism out of what are fundamentally not very robust components. Evolution is a tinkerer, most definitely, and there sure is an awful lot of tinkering that's been needed.
Category: General Scientific News | In Silico
September 4, 2008
Posted by Derek
X-ray crystallography is wonderful stuff – I think you’ll get chemists to generally agree on that. There’s no other technique that can provide such certainty about the structure of a compound – and for medicinal chemists, it has the invaluable ability to show you a snapshot of your drug candidate bound to its protein target. Of course, not all proteins can be crystallized, and not all of them can be crystallized with drug ligands in them. But an X-ray structure is usually considered the last word, when you can get one – and thanks to automation, computing power, and to brighter X-ray sources, we get more of them than ever.
But there are a surprising number of ways that X-ray data can mislead you. For a thorough treatment of these, complete with plenty of references to the recent literature, see an excellent paper coming out in Drug Discovery Today from researchers at AstraZeneca (Andy Davis and Stephen St.-Gallay) and Uppsala University (Gerard Kleywegt). These folks all know their computational and structural biology, and they’re willing to tell you how much they don’t know, either.
For starters, a small (but significant) number of protein structures derived from X-ray data are just plain wrong. Medicinal chemists should always look first at the resolution of an X-ray structure, since the tighter the data, the better the chance there is of things being as they seem. The authors make the important point that there’s some subjective judgment involved on the part of a crystallographer interpreting raw electron-density maps, and the poorer the resolution, the more judgment calls there are to be made:
Nevertheless, most chemists who undertake structure-based design treat a protein crystal structure reverently as if it was determined at very high resolution, regardless of the resolution at which the structure was actually determined (admittedly, crystallographers themselves are not immune to this practice either). Also, the fact that the crystallographer is bound to have made certain assumptions, to have had certain biases and perhaps even to have made mistakes is usually ignored. Assumptions, biases, ambiguities and mistakes may manifest themselves (even in high-resolution structures) at the level of individual atoms, of residues (e.g. sidechain conformations) and beyond.
Then there’s the problem of interpreting how your drug candidate interacts with the protein. The ability to get an X-ray structure doesn’t always correlate well with the binding potency of a given compound, so it’s not like you can necessarily count on a lot of clear signals about why the compound is binding. Hydrogen bonds may be perfectly obvious, or they can be rather hard to interpret. Binding through (or through displacement of) water molecules is extremely important, too, and that can be hard to get a handle on as well.
And not least, there’s the assumption that your structure is going to do you good once you’ve got it nailed down:
It is usually tacitly assumed that the conditions under which the complex was crystallised are relevant, that the observed protein conformation is relevant for interaction with the ligand (i.e. no flexibility in the active-site residues) and that the structure actually contributes insights that will lead to the design of better compounds. While these assumptions seem perfectly reasonable at first sight, they are not all necessarily true. . .
That’s a key point, because that’s the sort of error that can really lead you into trouble. After all, everything looks good, and you can start to think that you really understand the system, that is until none of your wonderful X-ray-based analogs work out the way you thought they would. The authors make the point that when your X-ray data and your structure-activity data seem to diverge, it’s often a sign that you don’t understand some key points about the thermodynamics of binding. (An X-ray is a static picture, and says nothing about what energetic tradeoffs were made along the way). Instead of an irritating disconnect or distraction, it should be looked at as a chance to find out what’s really going on. . .
Category: Analytical Chemistry | Drug Assays | In Silico
May 23, 2008
Posted by Derek
Something that’s come up in the last few posts around here is the way that we chemists think about the insides of enzymes. It’s a tricky subject, because when you picture things on that scale, the intuition you have for objects starts to betray you.
Consider water. We humans have a pretty good practical understanding of how water behaves in the bulk phase; we have the experience. But what about five water molecules sitting in the pocket of an enzyme? That’s not exactly a glass from the tap. These guys are interacting with the protein as much as (or more than) they’re interacting with each other, and our intuition about water molecules is based on how they act when they’re surrounded by plenty of their own.
And if five water molecules are hard to handle, how about one? There’s no hope of seeing any bulk properties now, because there’s no bulk. We’re more used to having trouble in the other direction, predicting group behavior from individuals: you can’t tell much about a thousand-piece jigsaw puzzle from one piece that you found under the couch, and you wouldn’t be able to say much about the behavior of an ant colony from observing one ant in a jar. And neither of those is worth very much, compared to its group. But with molecules, the single-ant-in-a-jar situation is very important (that’s a single water molecule sitting in the active site of an enzyme), and knowledge of ant social behavior or water’s actions in a glass doesn’t help much.
Larger molecules than water are our business, of course, and those are tricky, too. We can study the shape and flexibility of our drug candidates in solution (by NMR, to pick the easiest method), and in the solid phase, surrounded by packed arrays of themselves (X-ray crystal structures). But the way that they look inside an enzyme's active site doesn't have to be related to either of those, although you might as well start there.
As single-molecule (and single-atom) techniques have become more possible, we're starting to get an idea of how small clusters of them have to be before they stop acting like tiny pieces of what we're used to, and start acting like something else. But these experiments are usually done in isolation, in the gas phase or on some inert surface. The inside of a protein is another thing entirely; molecules there are the opposite of isolated. And studying them in those small spaces is no small task.
Category: In Silico
May 1, 2008
Posted by Derek
Drug Discovery Today has the first part of an article on the history of the molecular modeling field, this one covering about 1960 to 1990. It’s a for-the-record document, since as time goes on it’ll be increasingly hard to unscramble all the early approaches and players. I think this is true for almost any technology; the early years are tangled indeed.
As you would imagine, the work from the 1960s and 1970s has an otherworldly feel to it, considering the hardware that was available. And that brings up another thing common to the early years of new technologies: when you look back on them from their later years, you wonder how these people could possibly have even tried to do these things.
I mean, you read about, say, Richard Cramer establishing the computer-aided drug design program at Smith, Kline and French in nineteen-flipping-seventy-one, and on one level you feel like congratulating his group for their farsightedness. But mainly you just feel like saying “Oh, you poor people. I am so sorry.” Because from today's perspective, there is just no way that anyone could have done any meaningful molecular modeling for drug design in 1971. I mean, we have enough trouble doing it for a lot of projects in 2008.
Think about it: big ol’ IBM mainframe, with those tape drives that for many years were visual shorthand for Computer System but now look closer to steam engines and water wheels. Punch cards: riffling stacks of them, and whole mechanical devices with arrays of rods to make and troubleshoot stiff pieces of paper with holes in them. And the software – written in what, FORTRAN? If they were lucky. And written in a time when people were just starting to say, well, yes, I suppose that you could, in fact, represent attractive and repulsive molecular forces in terms that could be used by a computer program. . .hmm, let’s see about hydrogen bonds, then. . .
It gives a person the shudders. But that must be inevitable – you get the same feeling when you see an early TV set and wonder how anyone could have derived entertainment from a fuzzy four-inch-wide grey screen. Or see the earliest automobiles, which look to have been quite a bit more trouble than a horse. How do people persevere?
Well, for one thing, by knowing that they’re the first. Even if technology isn’t what you might dream of it being some day, you’re still the one out on the cutting edge, with what could be the best in the world as it is. They also do it by not being able to know just what the limits to their capabilities are, not having the benefit of decades of hindsight. The molecular modelers of the early 1970s did not, I’m sure, see themselves as tentatively exploring something that would probably be of no use for years to come. They must have thought that there was something good just waiting right there to be done with the technology they had (which was, as just mentioned, the best ever seen). They may well have been wrong about that, but who was to know until it was tried?
And all of this – the realizations that there’s something new in the world, that there are new things that can be done with it, and (later) that there’s more to it (both its possibilities and difficulties) than was first apparent – all of this comes on gradually. If it were to hit you all at once, you’d be paralyzed with indecision. But the gap in the trees turns into a trail, and then into a dirt path before you feel the gravel under your feet, speeding up before you realize that you’re driving down a huge highway that branches off to destinations you didn’t even know existed.
People are seeing their way through to some of those narrow footpaths right now, no doubt. With any luck, in another thirty years people will look back and pity them for what they didn’t and couldn’t know. But the people doing it today don’t feel worthy of pity at all – some of them probably feel as if they’re the luckiest people alive. . .
Category: Drug Industry History | In Silico | Who Discovers and Why
March 27, 2008
Posted by Derek
There’s an excellent paper in the most recent issue of Chemistry and Biology that illustrates some of what fragment-based drug discovery is all about. The authors (the van Aalten group at Dundee) are looking at a known inhibitor of the enzyme chitinase, a natural product called argifin. It’s an odd-looking thing – five amino acids bonded together into a ring, with one of them (an arginine) further functionalized with a urea into a sort of side-chain tail. It’s about a 27 nM inhibitor of the enzyme.
(For the non-chemists, that number is a binding affinity, a measure of what concentration of the compound is needed to shut down the enzyme. The lower, the better, other things being equal. Most drugs are down in the nanomolar range – below that are the ultra-potent picomolar and femtomolar ranges, where few compounds venture. And above that, once you get up to 1000 nanomolar, is micromolar, and then 1000 micromolar is one millimolar. By traditional med-chem standards, single-digit nanomolar = good, double-digit nanomolar = not bad, triple-digit nanomolar or low micromolar = starting point to make something better, high micromolar = ignore, and millimolar = can do better with stuff off the bottom of your shoe.)
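For anyone who'd rather see that unit ladder as arithmetic, here's a minimal sketch in Python. The function name and the tier labels are made up for illustration, and the 10 micromolar line between "low" and "high" micromolar is my own assumption, since the rule of thumb above doesn't pin it down.

```python
def potency_tier(ki_molar: float) -> str:
    """Map a binding constant in mol/L onto the rough med-chem tiers above.

    Hypothetical helper for illustration only; the tier names are informal.
    """
    nm = ki_molar * 1e9  # convert mol/L to nanomolar
    if nm < 1:
        return "sub-nanomolar: ultra-potent"
    if nm < 10:
        return "single-digit nM: good"
    if nm < 100:
        return "double-digit nM: not bad"
    if nm < 10_000:  # ASSUMPTION: "low micromolar" taken to mean below 10 uM
        return "triple-digit nM or low uM: starting point"
    if nm < 1_000_000:  # 1000 uM = 1 mM
        return "high uM: ignore"
    return "mM: bottom of your shoe"


# The 27 nM natural product discussed in this post
print(potency_tier(27e-9))
```

The only real content here is the factor of 1000 between each named range; everything else is bookkeeping.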
What the authors did was break this argifin beast up, piece by piece, measuring what that did to the chitinase affinity. And each time they were able to get an X-ray structure of the truncated versions, which turned out to be a key part of the story. Taking one amino acid out of the ring (and thus breaking it open) lowered the binding by about 200-fold – but you wouldn’t have guessed that from the X-ray structure. It looks to be fitting into the enzyme in almost exactly the same way as the parent.
And that brings up a good point about X-ray crystal structures. You can’t really tell how well something binds by looking at one. For one thing, it can be hard to see how favorable the various visible interactions might actually be. And for another, you don’t get any information at all about what the compound had to pay, energetically, to get there.
In the broken argifin case, a lot of the affinity loss can probably be put down to entropy: the molecule now has a lot more freedom of movement, which has to be overcome in order to bind in the right spot. The cyclic natural product, on the other hand, was already pretty much there. This fits in with the classic med-chem trick of tying back side chains and cyclizing structures. Often you’ll kill activity completely by doing that (because you narrowed down on the wrong shape for the final molecule), but when you hit, you hit big.
The structure was chopped down further. Losing another amino acid only hurt the activity a bit more, and losing still another one gave a dipeptide that was still only about three times less potent than the first cut-down compound. Slicing that down to a monopeptide, basically just a well-decorated arginine, sent the activity down another sixfold or so – but by now we’re up to about 80 micromolar, which most medicinal chemists would regard as the amount of activity you could get by testing the lint in your pocket.
But they went further, making just the little dimethylguanylurea that’s hanging off the far end. That thing is around 500 micromolar, a level of potency that would normally get you laughed at. But wait. . .they have the X-ray structures all along the way, and what becomes clear is that this guanylurea piece is binding to the same site on the protein, in the same manner, all the way down. So if you’re wondering if you can get an X-ray structure of some 500 micromolar dust bunny, the answer is that you sure can, if it has a defined binding site.
And the value of these various derivatives almost completely inverts if you look at them from a binding efficiency standpoint. (One common way to measure that is to take the minus log of the binding constant and divide by the molecular weight in kilodaltons). That’s a “bang for the buck” index, a test of how much affinity you’re getting for the weight of your molecule. As it turns out, argifin – 27 nanomolar though it be – isn’t that efficient a binder, because it weighs a hefty 676. The binding efficiency index comes out to just under 12, which is nothing to get revved up about. The truncated analogs, for the most part, aren’t much better, ranging from 9 to 15.
But that guanylurea piece is another story. It doesn’t bind very tightly, but it bats way above its scrawny size, with a BEI of nearly 28. That’s much more impressive. If the whole argifin molecule bound that efficiently, it would be down in the ten-to-the-minus nineteenth range, and I don’t even know the name of that order of magnitude. If you wanted to make a more reasonably sized molecule, and you should, a compound of MW 400 would be down in the single-digit picomolar range with a binding efficiency like that (28 times 0.4 is a minus-log of about 11). There’s plenty of room to do better than argifin.
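The binding efficiency arithmetic is simple enough to sketch in a few lines of Python. This uses the definition given above (minus log of the binding constant, divided by molecular weight in kilodaltons). The function names are mine, and the inputs are the rounded figures quoted in this post, so the results may differ a little from the paper's own values.

```python
import math


def bei(ki_molar: float, mw_daltons: float) -> float:
    """Binding efficiency index: -log10(Ki in mol/L) / (MW in kDa)."""
    return -math.log10(ki_molar) / (mw_daltons / 1000.0)


def ki_at_bei(target_bei: float, mw_daltons: float) -> float:
    """Ki (mol/L) that a molecule of this weight would need to reach target_bei."""
    return 10 ** (-target_bei * mw_daltons / 1000.0)


# Argifin as quoted above: 27 nM, molecular weight 676
print(round(bei(27e-9, 676), 1))

# What argifin's Ki would be if it bound as efficiently as the small fragment
print(ki_at_bei(28, 676))
```

Running the second function in reverse like this is what makes the index useful: it tells you what a larger molecule ought to achieve if every added atom pulled its weight.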
So the thing to do, clearly, is to start from the guanylurea and build out, checking the binding efficiency along the way to make sure that you’re getting the most out of your additions. And that is exactly the point of fragment-based drug discovery. You can do it this way, cutting down a larger molecule to find what parts of it are worth the most, or you can screen to find small fragments which, though not very potent in the absolute sense, bind very efficiently. Either way, you take that small, efficient piece as your anchor and work from there. And either way, some sort of structural read on your compounds (X-ray or NMR) is very useful. That’ll give you confidence that your important binding piece really is acting the same way as you go forward, and give you some clues about where to build out in the next round of analogs.
This particular story may be about as good an illustration as one could possibly find - here's hoping that there are more that can work out this way. Congratulations to van Aalten and his co-workers at Dundee and Bath for one of the best papers I've read in quite a while.
Category: Analytical Chemistry | Drug Assays | In Silico
March 5, 2008
Posted by Derek
There’s an interesting article coming out in J. Med. Chem. on antibiotic compounds, which highlights something that’s pretty clear if you spend some time looking at the drugs in that area. We make a big deal (or have made one over the last ten years) about drug-like properties – all that Rule-of-Five stuff and its progeny. Well, take a look at the historically best-selling antibiotic drugs: you’ve never seen such a collection of Rule of Five violators in your life.
That’s partly because a lot of structures in that area have come from natural products, but hey, natural products are drugs, too. Erythromycin, the aminoglycosides, azithromycin, tetracycline: what a crew! But they’ve helped an untold number of people over the years. It’s true that the fluoroquinolones are much more normal-looking, but those are balanced out by weirdo one-shots like fosfomycin. I mean, look at that thing – would you ever believe that that’s a marketed drug? (And with decent bioavailability, too?)
No, you have to be broad-minded if you’re going to beat up on bacteria, and I think some broad-mindedness would do us all good in other therapeutic areas, too. I don’t mean we should ignore what we’ve learned about drug-like properties: our problem is that we tend to make allowances and exceptions on the greasy high-molecular weight end of the scale, since that’s where too many of our compounds end up. It wouldn’t hurt to push things on the other end, because I think that you have a better chance of getting away with too much polarity than you have of getting away with too little.
One reason for that might be that there are a lot of transporter proteins in vivo that are used to dealing with such groups. It’s easy to forget, but a great number of proteins are decorated with carbohydrate residues, and they’re on there for a lot of reasons. And a lot of extremely important small molecules in biochemistry are polar as well – right off the top of my head, I don’t know what the logD or polar surface area of things like ATP or NAD are, but I’ll bet that they’re far off the usual run of drugs. Admittedly, those aren’t going to reach good blood levels if you dose them orally; we’re trying to do something that’s rather unnatural as far as the body’s concerned. But we could still usefully take advantage of some of the transport and handling systems for such molecules.
But that’s not always easy to do. We all talk about making our compounds more polar and more soluble, but we balk at some of the things that will do that for us. Sure, you can slap a couple of methoxyethoxys on your ugly flat molecule, or hang a morpholine off the end of a chain to drag things into the water layer. But slap five or six hydroxyls on your molecule, and you’ll be lucky not to have the security guards show up at your desk.
There are, to be sure, some good reasons why they might. Hydroxyls and such tend to introduce chiral centers, which can make your synthesis difficult and dramatically increase the amount of work needed to fill out the structural possibilities of your lead series. That’s why these things tend to be (or derive from) natural products. Some bacterium or fungus has done most of the heavy lifting already, both in terms of working out the most active isomers and in synthesizing them for you. Erythromycin’s a fine starting material when you can get it by fermentation, but no one would ever, ever consider it if it had to be made by pure total synthesis.
There’s another consideration, which gets you right at the bench level. For an organic chemist, working with charged, water-soluble compounds is no fun. A lot of our lab infrastructure is built for things that would rather dissolve in ethyl acetate than water. A constant run of things with low logD values would mean that we’d all have to learn some new skills (and that we’d all probably have to spend a lot of time on the lyophilizer). Ion-exchange resins, gel chromatography, desalting columns – you might as well be a biochemist if you’re going to work with that stuff. But in the end, perhaps we might be better off, at least part of the time, if we were.
Category: Drug Industry History | In Silico | Infectious Diseases
April 22, 2007
Posted by Derek
Pretty much the only thing that an interested lay person has heard about ligand binding is the "lock and key" metaphor. I'm not saying that you could walk down the sidewalk getting nods of recognition with it, but if someone's heard anything about how enzymes or receptors work (well, anything correct), that's probably what they've heard.
And there's a lot to it. Many proteins are really, really good at picking out their ligands from crowds of similar compounds. (If they were perfect at it, on the other hand, we drug company types would be out of business). But the lock-and-key metaphor makes the listener believe that both the ligand and the protein are rigid objects, which they most definitely are not. There's no everyday analog to the way that two conformationally mobile objects fit to each other - well, OK, maybe there is, but it's not one that you can safely use for illustrative purposes. Ahem.
The other big breakdown of the lock and key is that it doesn't deal well with the numerous proteins that can recognize more than one ligand for their binding sites. Particularly impressive are the nuclear receptors and the CYP metabolizing enzymes. Both those classes bind a bewildering number of not-very-similar compounds, and they can do it impressively well. They manage the trick by having binding pockets that can drastically change their shapes and charge distributions, as parts of the proteins themselves slide, twist, and flip around. I can't come up with even a vulgar metaphor for that process.
I'm thinking of doing several posts on the limits of metaphor and simplification in science, and if I do, this will be the first. It's a constant struggle not to mistake the picture for the real thing, particularly if the simplification is a pretty useful one. But eventually, no matter how good, the metaphor will thin out on you, and you'll be in the position of a Greek bird pecking at some painted fruit and wondering why it's still hungry.
Category: In Silico | Metaphors, Good and Bad
March 12, 2007
Posted by Derek
I wanted to link tonight to the "Milkshake Manifesto" over at OrgPrep Daily. It's a set of rules for med-chem, and looking them over, I agree with them pretty much across the board. There's a general theme in them of getting as close to the real system as you can, which is a theme I've sounded many times.
That applies to things like "Rule of Five" approximations and docking scores - useful, perhaps if you're sorting through a huge pile of compounds that you have to prioritize, not so useful if you've already got animal data.
He also takes a shot at Caco-2 cells and other such approximations to figure out membrane and tissue penetration. I've never yet seen an in vitro assay for permeability that I would trust - it's just too complicated, and it may never yield to a reductionist approach.
I'm a big fan of reductionism, don't get me wrong, but it's not the tool for every job. Living systems are especially tricky to pare down, and you can simplify yourself right out of any useful data if you're not very careful. The closer to the real world, the better off you are. It isn't easy, and it isn't cheap, but nothing good ever came easy or cheap, did it?
Category: Drug Assays | Drug Development | In Silico
February 27, 2007
Posted by Derek
SciTheory has a post, complete with links to the relevant articles in Science, etc., on a recent batch of trouble in structural biology. Geoffrey Chang and his group at Scripps have been working on the structures of transporter proteins, which sit in the cell membrane and actively move nonpermeable molecules in and out. There are a heap of these things, since (as any medicinal chemist will tell you) a lot of reasonable-looking molecules just won't get into cells without help. It's even tougher at a physiological level, because (from a chemist's perspective) many of the things that need to be shuttled around aren't very reasonable-looking at all - they're too small and polar or too large and greasy.
Many of these transporters, especially in bacteria, fall into a large group known as the ABC transporters, which have an ATP binding site in them for fuel. (For the non-scientists in the audience, ATP is the molecule used for energy storage in everything living on Earth. Thinking of an ATP-binding site as a NiCad battery pack gets you remarkably close to the real situation). Chang solved the structure of one of these, the bacterial protein MsbA, by X-ray crystallography back in 2001, and it was quite an accomplishment. Getting good X-ray diffraction data on proteins which spend their lives stuck in the cell membrane is rather a black art.
How dark an art is now apparent - here's the original paper's abstract in PubMed, but if you look just above the abstract, you'll see a retraction notice, and it's not alone. Five papers on various structures have been withdrawn. As SciTheory says, anyone who doubted the original MsbA structure had some real food for thought last year when another bacterial transporter was solved at the ETH in Zurich. These two should have looked more similar than they did, to most ways of thinking, but they were quite divergent.
And now we know why. Chang's group was done in by some homebrew software which swapped two columns of data. In a structure this large and complicated, you can have such disruptive things happen and still be able to settle down on a final protein picture - it's just that it'll be completely wrong. And so it was. The same software seems to have undermined the other determinations, too.
This is important (as well as sad and painful) on several levels. For one thing, transporters are essential to understanding resistance to antibiotics and cancer therapies, and they're vital parts of a lot of poorly understood processes in normal cells. We're not going to be able to get a handle on the often-inscrutable distribution of drug candidates in living systems until we know more about these proteins, but now some of what we thought we knew has evaporated on us.
Another point that people shouldn't miss is the trouble with relying too much on computational methods. There's really no alternative to them in protein crystallography, of course, but there always has to be a final "Does that make sense?" test. The difficulty is that many perfectly valid protein structures show up with odd and surprising features. Conversely, it's unnerving that the data for these things can be so thoroughly hosed and still give you a valid-looking structure, but that just serves to underline how careful you have to be.
And we're talking about X-ray data, which (done properly) is considered to be pretty solid stuff. So what does this say about basing research programs on the higher levels of abstraction found in molecular modeling and docking programs?
Category: In Silico
December 14, 2006
Posted by Derek
Glenn Reynolds gave the pharma industry a much-appreciated thank-you card over at Instapundit:
Only a moron would want to live in a society where people are ashamed to work for drug companies. And yet, I'm not surprised to see that resulting from the demagogy that abounds among politicians and "public interest" types who are not serving the public interest whatsoever.
I'm thinking of having that first sentence engraved on something expensive. Glenn's post prompted Dean Esmay to write a short post on the ethics of drug companies, though, and he's rather less positive. I suppose I shouldn't be surprised, given some of the things he's gone in for in the past. As usual, some of the problem is the difficulty that people have coming to terms with the fact that drug discovery is a for-profit industry.
One comment on his post came from Jerry Kindall, which is mostly favorable to the industry, but nonetheless contains this paragraph:
Drug discovery used to be a total crap-shoot but it's getting more and more targeted as the years go by thanks to ever more sophisticated computer modeling. They are now able to say "okay, this is the chemical receptor that we think we need to address, let's design a molecule that fits into it." This is essentially a nanotechnology, although not the type most people think of when they hear the term.
Ay, would that it were true. As my industry readers know, and as I've been ranting about here fairly often, drug discovery is just as much of a crap-shoot as it's ever been. And wouldn't it be great if "sophisticated computer modeling" helped that much? Instead, we get things like this. No, I think what's happening here is that we're being underestimated by our enemies and overestimated by our friends. . .
Category: In Silico | Why Everyone Loves Us
September 11, 2006
Posted by Derek
It's been a while since I wrote about the neuraminidase inhibitors (Tamiflu and Relenza, oseltamivir and zanamivir). As we start to head into fall, though, I'm sure that avian flu will invade the headlines again, if nothing else (and I hope it's nothing else).
There's an interesting report in Nature (subscriber link) on how these drugs work. Bird flu is a Type A influenza, but there are two broad groups inside that class, which are defined by what variety of neuraminidase enzyme they express. (There are actually nine enzyme variants known, but four of them fall into one group and five into the other).
The drugs were developed against group-2 enzymes, but they're also effective against group-1 influenzas. Since the X-ray crystal structures showed that the drugs bound in the same way to all the group-2 neuraminidases, and since the active sites of all the subtypes across the two groups are extremely similar, no one ever thought that their binding modes would be different. Well, until last month, anyway, when the X-ray crystallographic data came in.
And what it showed was that the active sites of the group-1 enzymes, sequence homology be damned, have a much different structure than the group-2s. As it turns out, though, they can adopt a similar shape when an inhibitor binds to them, which is why the marketed inhibitors still work on them, but they're fundamentally quite different.
I can't resist the urge to use this example to illustrate some of the real problems in our current state of the art for computation and modeling. The differences between these two enzymes are due to their different amino acid residues far away from the active site, which makes modeling them much, much more difficult (and makes the error bars much, much wider when you do). That's why no one realized how far off the group-1 and group-2 neuraminidases were until the X-ray structure was available: modeling couldn't tell you. Any modeling efforts that tried would probably have decided, incorrectly, that the two groups were nearly identical. Why shouldn't they be?
But if we'd had that X-ray data from the start, modeling would very likely have told you, incorrectly, that there was little chance that either Relenza or Tamiflu would work on the group-1 enzyme variants. Why should they? The "induced fit" binding modes, where the enzyme changes shape significantly as the ligand binds, are understandably very difficult to model. There are just too many possibilities, too many of which are within each other's computational error bars.
Now, it's true that this latest work isn't based on molecular modeling at all. (You have to wonder how close these guys got, though). But plenty of projects that are using it are just as much in the dark as a neuraminidase team would have been, and they may not even realize it. Most molecular modelers are well aware of these limitations, but not all of them - or all of the managers over them - are willing to accept them. And when you get out to investors or the general public, it's all too easy for modelers or managers to act as if things are perfectly under control, when in reality they're lurching around in the dark. Like the rest of us. . .
Category: In Silico | Infectious Diseases
March 23, 2006
Posted by Derek
Here's a limits-to-knowledge post for you. On Wednesday, when I was cranking out a batch of an intermediate we're using these days, I needed to separate two fairly closely related compounds (which I'll call A and B) from each other. One surefire way to have done that was chromatography, but I just didn't have time for that. While I was rota-vapping down the mixture, I noticed that some white crystals were starting to come out of the methylene chloride solution, so I took the flask off and checked a small sample of the solid. Sure enough, it was pretty pure A, so I filtered that off and continued.
Taking out all the solvent left me with more white stuff, which was mostly B, with some A still hanging in there. In the past, we'd purified B by crystallizing it from another solvent mixture (ethyl acetate/hexane, the first combination the lazy - or just plain experienced - organic chemist reaches for). So I tried that out, dissolving the solid in a small-to-medium amount of hot ethyl acetate, then adding hexane while it was still warm. I cooled the solution down by dipping the flask in ice water until it had come down to about room temperature, and was swirling it around when suddenly it started snowing white powder. Ta-daa! A check of this stuff showed that it was almost completely pure B. The solution, for its part, was now a majority of A with some B left around. I took what I had and ran with it - this was one of the bird-in-the-hand situations, because people were waiting on this stuff.
My point is that such things are almost completely empirical. I've never heard of anyone who could predict from first principles what solvent system to use to get something to crystallize. I'd be tremendously impressed if anyone could take the structures of my two compounds, feed them into a dissolvo-matic program and announce "Yep, methylene chloride for A, and ethyl acetate-hexane for B. That'll do the trick."
As far as I know, there's no such thing, and no one is even close. I'd be glad to hear if I'm wrong. But if we can't predict, even just in rank order, what solvents will dissolve (or crash out) a given molecule, just how good is our molecular modeling, anyway?
Category: In Silico
November 1, 2005
Posted by Derek
I mentioned an interesting paper that's coming out in the Journal of Medicinal Chemistry on molecular modeling. It's a long one from a large group of people scattered across GlaxoSmithKline's worldwide research facilities, entitled "A Critical Assessment of Docking Programs and Scoring Functions." And that's what it is, all right.
For the non-med-chem readers, those are two of the key techniques in computational molecular modeling. Docking refers to taking a modeled version of your small molecule and trying to fit it into a similarly modeled version of the binding site of your protein target. The program tries to take into account the size and shape of the molecule and the binding site, of course, as well as more subtle interactions between the various functional groups. Scoring functions are what the programs use to try to rate how well the docking procedure went for a given compound, and to compare it to others in a given data set.
The GSK team did a very thorough job, evaluating ten different docking programs. They started with seven varying types of protein targets, mostly different classes of enzymes, all of which are known drug targets. An expert computational chemist took each one and polished up the model of the binding site. At the same time, lists of between one and two hundred potential binding compounds were put together for each target, including several series of related compounds. Another modeling chemist took these structures and got them ready for docking. They made sure that a crystal structure of each structural class was known for each case (to check the accuracy of the modeling later on), and also made sure that the binding affinity of the compounds ranged over at least four orders of magnitude (from pretty darn good, in other words, to pretty darn awful). The goal was to make the whole exercise as real-world as possible. Then each of those binding site models, along with its associated list of potential ligands, was turned over to separate chemists with experience in the various docking programs, who were told to have at it. As the paper puts it:
"To optimize the performance of each docking program, computational chemists with expertise in a particular program were identified from the worldwide GSK computational chemistry community. Each program expert was given complete freedom and sufficient time to maximize the performance of the docking program. . .No time deadlines were imposed so that even low-throughput docking programs could be evaluated. Indeed, no constraints whatsoever were placed on the level of agonizing over details of how each docking program was applied."
It's important to remember that the results of this paper come from experienced users who had a great deal of knowledge about the targets, and all the time they needed to mess with them. The aforementioned agonizing was devoted to three typical kinds of question that such software is designed to answer. The first was: what is the conformation (the 3-D physical "pose") of a small molecule once it's in a binding site? This is why they picked all these things with known crystal structures, since those provide a check with real data. Results of this test were OK, in some cases fairly good. Some of the target proteins seemed to have binding sites that were more suited for the capabilities of the programs, which could take the majority of the compounds in their list and fit them pretty close (within two angstroms) to the known crystal structures.
And every target had at least one program that could take at least a third or so of the test compounds and dock them fairly well. But the problem was, no one program could do that for more than 35% of the binding modes. The best performances were scattered among the different software packages, and there seems to be absolutely no way to know in advance whether a given program is going to perform well on a new target. The other problem, and it's a big one, was that the scoring functions couldn't reliably identify when the program had hit on one of the good answers. There wasn't much correlation between what the program thought was a well-docked conformation and its resemblance to the known crystal structure.
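For readers who want to see what "within two angstroms" looks like as a calculation, here is a minimal sketch of the usual pose check: a root-mean-square deviation (RMSD) between the docked coordinates and the crystal coordinates. This is my own illustration, not code or a protocol from the paper. It assumes both poses are already in the same protein frame with atoms listed in the same order, it ignores symmetry-equivalent atoms, and the function names and the three-atom coordinates are made up.

```python
import math

def pose_rmsd(docked, crystal):
    """RMSD in angstroms between two poses given as lists of (x, y, z).

    Assumes both poses share the protein's coordinate frame and list the
    same atoms in the same order. No superposition, no symmetry handling.
    """
    if len(docked) != len(crystal) or not docked:
        raise ValueError("poses must be non-empty and have the same atom count")
    total = 0.0
    for (x1, y1, z1), (x2, y2, z2) in zip(docked, crystal):
        total += (x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
    return math.sqrt(total / len(docked))

def is_well_docked(docked, crystal, cutoff=2.0):
    """True if the docked pose is within `cutoff` angstroms RMSD of the crystal pose."""
    return pose_rmsd(docked, crystal) <= cutoff

# Hypothetical three-atom ligand: the docked pose is shifted 1 angstrom in y.
crystal = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
docked = [(0.0, 1.0, 0.0), (1.5, 1.0, 0.0), (3.0, 1.0, 0.0)]
print(pose_rmsd(docked, crystal), is_well_docked(docked, crystal))
```

Note that a check like this only works when you already have the crystal structure in hand. The scoring-function problem described above is that the program's own score doesn't tell you which of its poses would pass.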
The second question they looked at was: given a list of molecules (some active, some inactive), how well can the software pick out some active ones? This process is often known as "virtual screening". Again, the results were fairly good, but with some significant problems. For all but one of the targets, at least one of the programs could find at least half of the top 10% of the active compounds. (I know, that sounds like a lot of defensive hedging compared to what some people think these programs can do, but that's the real world for you). The programs also did pretty well at pulling a variety of structures out, and not just making their total by grabbing only the members of one particular class.
But that fairly-decent performance is for the programs as a group. As before, though, the best performances were scattered through all the software packages, with no real standout. Most of the programs, at one point or another, had to grind through a significant portion of the compound lists to do the job, too, which is something you really don't want in real-world use. Another disturbing result was that some of the scoring functions seemed to be picking the right compounds for the wrong reasons – that is, based on incorrect binding modes.
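The two things being measured in a virtual screen (how many actives turn up near the top of the ranked list, and how far down the list you have to grind to get them) can be sketched in a few lines. This is an illustration under my own assumptions, not the paper's exact metric: I'm assuming a higher score means a better predicted binder, and the function names and the ten-compound data set are invented.

```python
import math

def rank_by_score(scores, is_active):
    """Activity flags sorted best-score-first (assumes higher score = better)."""
    paired = sorted(zip(scores, is_active), key=lambda p: p[0], reverse=True)
    return [active for _, active in paired]

def actives_recovered(scores, is_active, top_fraction):
    """Fraction of all actives found in the top slice of the ranked list."""
    ranked = rank_by_score(scores, is_active)
    total = sum(ranked)
    if total == 0:
        raise ValueError("no actives in the list")
    n_top = max(1, round(top_fraction * len(ranked)))
    return sum(ranked[:n_top]) / total

def fraction_screened_for(scores, is_active, recovery):
    """Fraction of the ranked list you must go through to find `recovery` of the actives."""
    if not 0 < recovery <= 1:
        raise ValueError("recovery must be in (0, 1]")
    ranked = rank_by_score(scores, is_active)
    total = sum(ranked)
    if total == 0:
        raise ValueError("no actives in the list")
    needed = max(1, math.ceil(recovery * total))
    found = 0
    for position, active in enumerate(ranked, start=1):
        found += active
        if found >= needed:
            return position / len(ranked)

# Made-up screen: ten compounds, four actives at ranks 1, 3, 8 and 10.
scores = [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
is_active = [True, False, True, False, False, False, False, True, False, True]
print(actives_recovered(scores, is_active, 0.3))
print(fraction_screened_for(scores, is_active, 0.75))
```

In this toy case the top 30% of the list holds half the actives, which sounds fine until you notice that getting three of the four means working through 80% of the list.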
Now we're ready for the third question, a hard one which (in my experience) is one of the ones that medicinal chemists most would like molecular modeling software to answer: given a list of compounds, can the program rank-order them according to their expected affinity for the target? Unfortunately, the answer is "absolutely not." No scoring function in any of the software packages could even come close. The compounds that the programs ranked as winners were just as likely to stink, and the ones that they put into the discard heap were just as likely to be fine.
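One common way to put a number on "can it rank-order them" is a rank correlation between the program's scores and the measured affinities. The sketch below uses Spearman's rho, written out by hand so that it's self-contained. It's my own illustration: I'm not claiming this is the statistic the GSK authors used, and the affinities and scores are invented.

```python
import math

def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation computed on the ranks."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least two values")
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return cov / math.sqrt(vx * vy)

# Hypothetical data: measured pKi values spanning four orders of magnitude,
# plus two invented sets of docking scores (higher = predicted tighter binder).
measured_pki = [9.0, 8.0, 7.0, 6.0, 5.0]
ideal_scores = [50, 40, 30, 20, 10]
scrambled_scores = [30, 10, 50, 20, 40]
print(spearman(ideal_scores, measured_pki))
print(spearman(scrambled_scores, measured_pki))
```

A scoring function that could really rank-order compounds would come out near 1. A value near zero, or below it as in the scrambled example, is what "just as likely to stink" looks like in numbers.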
My way of looking at the first two tests is to say that if you have just one molecular modeling package, it is guaranteed to mislead you a fair amount of the time. And you have no way of knowing when it's doing that. If you have more than one program to work with, though, then they are guaranteed to disagree with each other a fair amount of the time, and you have no way of knowing which one of them is right – if either. I'll let the authors have the last word on the third test, and on the software in general:
". . .in the area of rank-ordering or affinity prediction, reliance on a scoring function alone will not provide broadly reliable or useful information. . .This study demonstrates unequivocally that significant improvements are needed before compound scoring by docking algorithms will routinely have a consistent and major impact on lead optimization. . .it is not completely obvious by what means these improvements will arise. . ."
Comments (5)
+ TrackBacks (0) | Category: In Silico
September 29, 2005
Posted by Derek
A comment to the last post really gave me the shivers:
"I like to think of modelling as the "silent killer". It is easy to rely on it for quick answers, and easy to forget that there is no substitute for an actual experiment. . .
I remember asking a fellow scientist if a particular molecule performed as hypothesized, the response was: 'I don't know. It did not dock well into the enzyme, so I didn't make it.'"
I've made this point before, but it needs to be made again: molecular modeling is not reality. Most models are not that good, or only good around a limited group of rather similar compounds. If you as a medicinal chemist are crossing out easy-to-make compounds in unexplored chemical space just because the software doesn't like them, you are handcuffing yourself and tying your thumbs together. Stop it, stop it for your own good, or you may never discover anything unexpected or useful.
"The silent killer": I like that phrase a lot. I get the occasional testy e-mail from the computational types when I talk like this, but I'm sticking to my beliefs here. Molecular models based on numerous high-resolution X-ray structures are, I think, sort of useful, sometimes. Models based on only one X-ray structure are to be approached with great caution. And binding models that are just calculated up de novo should be treated as hazardous to your scientific health, unless you have a great deal of evidence to make you think otherwise.
OK, you silicon jockeys, go ahead and flood my in-box. I've earned it.
Comments (7)
+ TrackBacks (0) | Category: In Silico
September 28, 2005
Posted by Derek
We medicinal chemists spend our days trying to make small molecules that bind to targets in living systems. Almost all of those targets are proteins of one sort or another, and most of them have binding pockets already built into them, which we're trying to hijack for our own purposes. Molecular modelers try to figure out how these things fit together, but there are still a lot of unknowns in what would seem so basic a process.
I'm willing to bet that most chemists and biologists have a mental picture of a small molecule ligand fitting into a binding site which involves the protein sort of folding down around things - gently biting down on the ligand, as it were. It seems intuitively obvious that a protein's motions would settle down once it complexes with its target molecule.
And like a lot of intuitively obvious things in drug research, that idea appears to be mistaken. There's a recent study in the Journal of Medicinal Chemistry from a group at Michigan that tackles this question in a rigorous manner. They looked through the X-ray crystal structure data banks for proteins that have had high-quality structures determined both with and without small molecules bound in them. After controlling for experimental conditions (the temperature that the X-ray structure was taken at, among other things) and for the way the data were processed, they still had a few dozen closely matched pairs.
What they found was that in most of these structures, at least some of the atoms in and near the binding site are more mobile when there's a ligand bound. At times, the effect was pretty dramatic, with the entire binding site becoming more flexible, weirdly enough. Examples where everything got less mobile were found, but that only happened in a minority of the cases. The proteins the authors studied were scattered across a wide range of structural and functional classes, and there's no reason to think that they hit on an anomalous data set.
So, we're going to have to adjust our mental pictures, and the molecular modelers will have to adjust their simulations. I'd like to know just how many of those in silico models of binding would have predicted this greater flexibility. I fear that the answer is "darn near none of them". We have a long way to go.
Comments (9)
+ TrackBacks (0) | Category: In Silico
September 6, 2005
Posted by Derek
I recall a project earlier in my career where we'd all been beating on the same molecular series for quite a while. Many regions of the molecule had been explored, and my urge was often to leave the reservation. I put some time into extending the areas we knew about, but I wanted to go off and make something that didn't look like anything that we'd done before.
Which I did sometimes, and then I'd often get asked: "Why did you make that compound?" My answer was simply "Because no one had ever messed with that area before, and I wanted to see what would happen." Reactions to that approach varied. Some folks found that a perfectly reasonable answer, sufficient by itself. Others didn't care for it much. "You have to have a hypothesis in mind," they'd say. "Are you trying to improve the pharmacokinetics? Fix a metabolic problem? Pick up a binding interaction that you think is out there in the XYZ loop of the protein? You can't just. . .make stuff."
I respected the people in that first group a lot more than I did the ones in the second. I thought then, and think now, that you can just go make stuff. In fact, you not only can, but you should. You probably don't want to spend all your time doing that, but if you never do it at all, you're going to miss the best surprises.
I take issue with the idea that there has to be a specific hypothesis behind every compound. That supposes amounts of knowledge that we just don't have. Most of the time, we don't know why our PK is acting weird, and we're not sure about the metabolic fate of the compounds. And we sure don't know their binding mode well enough to sit at our desks and talk about what amino acids in the protein backbone we're reaching out for. (OK, if you've got half a dozen X-ray structures of your ligands bound in the active site of your target, you have a much better idea. But if your next compound breaks new structural ground, off you may well go into a different binding mode, and half your presuppositions will go, too.)
I like to think that I've come to realize just how ignorant I am in issues of drug discovery. (In case you have any doubt, I'm very ignorant indeed.) But I still hear people confidently sizing up new analog ideas on the blackboard: No, that one won't bind well in the Whoozat region. Doesn't have the right spacing. And that one should be able to reach out to that hydrophobic pocket we all know about. Let's make that one first. (These folks are talking without X-ray structures in hand, mind you.)
Well, if it makes you feel better, then go ahead, I suppose. But this kind of thing is one tiny step up from lucky rabbit feet, for which there is still a market.
Comments (4)
+ TrackBacks (0) | Category: In Silico | Who Discovers and Why
August 17, 2004
Posted by Derek
I'm going to take off from another comment, this one from Ron, who asks (in reference to the post two days ago): "would it not be fair to say that cellular biochemistry gets even more complicated the more we learn about it?"
It would indeed be fair. I think that as a scientific field matures it goes through several stages. Brute-force collection of facts and observations comes early on, as you'd figure. Then the theorizing starts, with better and better theories being honed by more targeted experiments. This phase can be mighty lengthy, depending on the depth of the field and the number of outstanding problems it contains. A zillion inconsistent semi-trivialities can take a long time to sort out (think of the mathematical proof of the Four-Color Theorem), as can a smaller number of profound headscratchers (like, say, a reconciliation of quantum mechanics with relativity as they deal with gravity.)
If the general principles discovered are powerful enough, things can get simpler to understand. Think of the host of problems that early 20th-century physics had, many of which resolved themselves as applications of quantum mechanics. Chemistry went through something similar earlier, on a smaller scale, with the adoption of the stereochemical principles of van 't Hoff. Suddenly, what seemed to be several separate problems turned out to be facets of one explanation: that atoms had regular three-dimensional patterns of bonding to other atoms. (If that sounds too obvious for such emphasis, keep in mind that this notion was fiercely ridiculed and resisted at the time.)
Cell biology is up to its pith helmet in hypotheses, and is nowhere near out of the swamps of fact collection. As in all molecular biology, the sheer number of different systems is making for a real fiesta. Your average cell is a morass of interlocking positive and negative feedback loops, many of which only show up fleetingly, under certain conditions, and in very defined locations. Some general principles have been established, but the number of things that have to be dealt with is still increasing, and I'm not sure when it's going to level out.
For example, the other day a group at Sugen (now Pfizer) published a paper establishing just how many genes there are in mice that code for protein kinase enzymes. Through adding phosphoryl groups, these enzymes are extremely important actors in the activation, transport, and modulation of the activities of thousands upon thousands of other proteins, and it turns out that there are exactly 540 of them. (Doubtless there are some variations as they get turned into proteins, but that's how many genes there are.) And that's that.
Now, that earlier discovery of protein phosphorylation as a signaling mechanism was a huge advance, and it has been appropriately rewarded. And knowing just how many different kinase enzymes there are is a step forward, too. But figuring out all the proteins they interact with, and when, and where, and what happens when they do - well, that's first cousin to hard work.
Comments (0)
+ TrackBacks (0) | Category: Biological News | In Silico
August 15, 2004
Posted by Derek
Reader Maynard Handley, in a comment to the most recent post below, asks:
". . .how far are we from doing at least a substantial fraction of this stuff in silico? I've read that some amazing computational models of full cells now exist, but even so, this author didn't expect that drugs could be usefully tested computationally until 2030 which seems awfully far out."
I don't know the article that he's referring to, but "awfully far out" pretty much sums up my reaction, too. I just don't think we have enough data to do any real whole-cell modeling yet. It's coming, and perhaps for a few very well-worked-out subsystems we could do it now, but I'm sceptical even of that.
A few days of reading the current cell biology literature will illustrate the problem. All sorts of proteins are found, all the time, to be players in systems that no one suspected them of being involved in. Kinases are found to phosphorylate things that no one had seen them do before, lipases are found to accept substrates that no one had realized they could. A given signaling peptide is gradually found to have more uses than a Swiss army knife. We don't even really understand the basic mechanisms (like G-protein-coupled receptor signaling) enough to model them to any useful level.
The process of finding these things out doesn't seem like it's going to end soon, and there have to be many fundamental surprises waiting for us. Modeling the system in their absence is going to be risky - interesting, no doubt, and potentially lucrative (if you find a useful approximation), but risky. It's going to take some pretty convincing stuff for the drug industry to ever depend on it.
And all of this applies to single cells, which come in, naturally, an uncounted variety, each with its own peculiarities, the great majority of which we don't have any clue about. And then you come to the interactions between cells, which are highly significant and (in many ways) a closed book to us at present. If we knew more about these things, we'd be able, for example, to culture human cell lines that acted just like their primary tissue progenitors - but we can't do it, not yet.
No, although I have every belief that these things are susceptible to modeling, I just don't think we'll see it (on a useful scale) any time soon. Over the next twenty years, I'd expect to have some of the easier-to-handle cellular subsystems worked out to give robust in silico treatments, but a whole cell? And all the types of whole cells? Much longer than that. More than that I can't guess.
Comments (3)
+ TrackBacks (0) | Category: In Silico
April 15, 2004
Posted by Derek
Thinking about molecular modeling, as I did in the last post, brings up another topic: when you go back to the late 1980s, in the real manic phase of the technological hype, what brings you up short is realizing that these folks were planning on doing all this with 1980s hardware.
That puts things in perspective. Here we are in 2004, and we still can't just sit down and design a drug from first principles. Don't believe anyone who tells you that we can, either - if that were possible, there would be a lot more drugs out there. I'm not saying that molecular modeling never makes a contribution (I know better, and from personal experience.) It's just that it hasn't (yet) caught up to the hallucinations of fifteen or twenty years ago, which is entirely the fault of the people who were doing the hallucinating.
You can make the same comments about other waves of hype that have broken over the pharmaceutical world (combinatorial chemistry comes immediately to mind.) What I'm wondering is: what's the hype of today? There's bound to be a hot new idea that's going to solve our problems, but will end up changed beyond recognition after twenty years of the real world. Any votes on what's going to look faintly ridiculous to us in 2024? As you'd guess, I have some candidates of my own. . .
Comments (2)
Category: Drug Industry History | In Silico
April 14, 2004
Posted by Derek
Molecular modeling is a technology with a past. Specifically, it's a past of overoptimistic predictions (often made, to be fair, by people who didn't understand what they were talking about.) Back in the late 1980s, when I started in the drug industry, modeling was going to take over the world and pretty darn soon, too. Several companies were founded to take advantage of this brave new world that had such software in it, and they raised serious money with tales of how they were just going to zzzzzip right to the drug structures. No dead ends, no detours, no cast of thousands - just a few chemists standing by to make the structure as it printed out for them. This has not quite worked out.
For those not in the business, modeling is the attempt to figure out molecular shapes, properties, and interactions by computation. There are many levels, some more successful than others. The ones I'm speaking of involve predicting three-dimensional shapes of molecules (and their target binding sites), and deciding which ones are more likely to fit well. It sounds like just what we need. It also sounds reasonably doable, in the same way that Hercules was probably told at first that he was going to just have to round up a few stray animals.
Predicting the shapes involves modeling the individual chemical bonds, and the interactions as the atoms and functional groups rotate around them, banging into each other or sticking through various forces. Originally, these things were calculated as if they were in interstellar space, with nothing around them. Later (and ever since) a number of methods to add some real-world solvent effects have been tried.
Another set of programs evaluates intermolecular fits, trying to work out the energies in play when a drug molecule slides into its binding site. Many tricky refinements have been added to those packages over the years, too, taking advantage of the latest insights into how various groups stack, pack, and interact.
And often enough, it just isn't enough. Many times the structures we have for our binding sites aren't accurate - the best ones are from X-ray crystallography, and plenty of good stuff just doesn't crystallize. (There are other cases where the crystal structure doesn't bear much relation to what's going on inside the real system, too, just to keep everyone on their toes.) Modeling goes haywire for all kinds of reasons.
One of the companies that emerged back in the change-the-world era of modeling was Vertex, up in Cambridge. It was founded by Joshua Boger, a Merck chemist who wanted a piece of the new thing and wasn't sure that Merck was taking it seriously enough. Well, coming soon in the Journal of Medicinal Chemistry (it's in the web preprint section now) is a paper from Vertex which gives us all some idea of why things didn't work out quite as planned.
The Vertex guys went back over about 150 cases, and found that in the majority of them, the structure of the small molecule in its binding pocket wasn't the structure you would have predicted as the best (read: lowest-energy.) In many of them, it isn't even close. You'd literally never have picked some of these conformations to start a modeling effort - they look very disfavored, and if you're going to pick things that far from the ground state then there's no end to it. The number of structures gets worse very rapidly as you move away from the local energy minima.
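To give a feel for what "very disfavored" means, here is a back-of-the-envelope sketch using the Boltzmann factor, which relates a conformation's energy above the ground state to how populated it is relative to the ground state at equilibrium. This is textbook statistical mechanics, not the analysis in the Vertex paper. The strain energies are illustrative numbers of my own, and the calculation ignores degeneracy, solvent, and anything the protein does to pay for the strain.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)

def relative_population(strain_kcal, temperature_k=298.15):
    """Population of a conformer relative to the ground state.

    Uses the Boltzmann factor exp(-dE / RT) for a conformer lying
    `strain_kcal` kcal/mol above the lowest-energy conformation.
    """
    if strain_kcal < 0:
        raise ValueError("strain is measured upward from the ground state")
    return math.exp(-strain_kcal / (R_KCAL * temperature_k))

# Illustrative strain energies in kcal/mol (not values from the paper).
for strain in (0.0, 1.36, 3.0, 5.0):
    print(f"{strain:4.2f} kcal/mol above minimum -> {relative_population(strain):.4f}")
```

Roughly every 1.4 kcal/mol of strain costs a factor of ten in population at room temperature, which is why nobody would start a modeling effort from a conformation several kcal/mol uphill unless they already knew the answer.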
We in the business had suspected as much, and everyone knew of an example or two, but this is a quantitative look at just how bad the situation is. When you add in the cases where the binding site changes its conformation unexpectedly in response to the ligand, it's a wonder that any modeling efforts work at all. (Frankly, in my experience, they mostly don't, but I'm willing to stipulate that my experience has been more negative than the average.)
I like to say that molecular modeling is a magic wand, one that we keep waving in the hope that sparks will eventually start to shoot out of it. Someday they will. But there's a lot more hard work ahead, and no shortcuts in sight.
Comments (0)
Category: Drug Industry History | In Silico