Just how many different small-molecule binding sites are there? That's the subject of this new paper in PNAS, from Jeffrey Skolnick and Mu Gao at Georgia Tech, which several people have sent along to me in the last couple of days.
This question has a lot of bearing on questions of protein evolution. The paper's intro brings up two competing hypotheses of how protein function evolved. One, the "inherent functionality model", assumes that primitive binding pockets are a necessary consequence of protein folding, and that the effects of small molecules on these (probably quite nonspecific) motifs has been honed by evolutionary pressures since then. (The wellspring of this idea is this paper from 1976, by Jensen, and this paper will give you an overview of the field). The other way it might have worked, the "acquired functionality model", would be the case if proteins tend, in their "unevolved" states, to be more spherical, in which case binding events must have been much more rare, but also much more significant. In that system, the very existence of binding pockets themselves is what's under the most evolutionary pressure.
The Skolnick paper references this work from the Hecht group at Princeton, which already provides evidence for the first model. In that paper, a set of near-random 4-helical-bundle proteins was produced in E. coli - the only patterning was a rough polar/nonpolar alternation in amino acid residues. Nonetheless, many members of this unplanned family showed real levels of binding to things like heme, and many even showed above-background levels of several types of enzymatic activity.
In this new work, Skolnick and Gao produce a computational set of artificial proteins (called the ART library in the text), made up of nothing but poly-leucine. These were modeled to the secondary structure of known proteins in the PDB, to produce natural-ish proteins (from a broad structural point of view) that have no functional side chain residues themselves. Nonetheless, they found that the small-molecule-sized pockets of the ART set actually match up quite well with those found in real proteins. But here's where my technical competence begins to run out, because I'm not sure that I understand what "match up quite well" really means here. (If you can read through this earlier paper of theirs at speed, you're doing better than I can). The current work says that "Given two input pockets, a template and a target, (our algorithm) evaluates their PS-score, which measures the similarity in their backbone geometries, side-chain orientations, and the chemical similarities between the aligned pocket-lining residues." And that's fine, but what I don't know is how well it does that. I can see poly-Leu giving you pretty standard backbone geometries and side-chain orientations (although isn't leucine a little more likely than average to form alpha-helices?), but when we start talking chemical similarities between the pocket-lining residues, well, how can that be?
But I'm even willing to go along with the main point of the paper, which is that there are not-so-many types of small-molecule binding pockets, even if I'm not so sure about their estimate of how many there are. For the record, they're guessing not many more than about 500. And while that seems low to me, it all depends on what we mean by "similar". I'm a medicinal chemist, someone who's used to seeing "magic methyl effects" where very small changes in ligand structure can make big differences in binding to a protein. And that makes me think that I could probably take a set of binding pockets that Skolnick's people would call so similar as to be basically identical, and still find small molecules that would differentiate them. In fact, that's a big part of my job.
But in general, I see the point they're making, but it's one that I've already internalized. There are a finite number of proteins in the human body. Fifty thousand? A couple of hundred thousand? Probably not a million. Not all of these have small-molecule binding sites, for sure, so there's a smaller set to deal with right there. Even if those binding sites were completely different from one another, we'd be looking at a set of binding pockets in the thousands/tens of thousands range, most likely. But they're not completely different, as any medicinal chemist knows: try to make a selective muscarinic agonist, or a really targeted serine hydrolase inhibitor, and you'll learn that lesson quickly. And anyone who's run their drug lead through a big selectivity panel has seen the sorts of off-target activities that come up: you hit someof the other members of your target's family to greater or lesser degree. You hit the flippin' sigma receptor, not that anyone knows what that means. You hit the hERG channel, and good luck to you then. Your compound is a substrate for one of the CYP enzymes, or it binds tightly to serum albumin. Who has even seen a compound that binds only to its putative target? And this is only with the counterscreens we have, which is a small subset of the things that are really out there in cells.
And that takes me to my main objection to this paper. As I say, I'm willing to stipulate, gladly, that there are only so many types of binding pockets in this world (although I think that it's more than 500). But this sort of thing is what I have a problem with:
". . .we conclude that ligand-binding promiscuity is likely an inherent feature resulting from the geometric and physical–chemical properties of proteins. This promiscuity implies that the notion of one molecule–one protein target that underlies many aspects of drug discovery is likely incorrect, a conclusion consistent with recent studies. Moreover, within a cell, a given endogenous ligand likely interacts at low levels with multiple proteins that may have different global structures.
"Many aspects of drug discovery" assume that we're only hitting one target? Come on down and try that line out in a drug company, and be prepared for rude comments. Believe me, we all know that our compounds hit other things, and we all know that we don't even know the tenth of it. This is a straw man; I don't know of anyone doing drug discovery that has ever believed anything else. Besides, there are whole fields (CNS) where polypharmacy is assumed, and even encouraged. But even when we're targeting single proteins, believe me, no one is naive enough to think that we're hitting those alone.
Other aspects of this paper, though, are fine by me. As the authors point out, this sort of thing has implications for drawing evolutionary family trees of proteins - we should not assume too much when we see similar binding pockets, since these may well have a better chance of being coincidence than we think. And there are also implications for origin-of-life studies: this work (and the other work in the field, cited above) imply that a random collection of proteins could still display a variety of functions. Whether these are good enough to start assembling a primitive living system is another question, but it may be that proteinaceous life has an easier time bootstrapping itself than we might imagine.