Note: critique of this paper continues here, in another post.
A reader sent along a puzzled note about this paper that's out in Science. It's from a large multicenter team (at least nine departments across the US, Canada, and Europe), and it's an ambitious effort to profile 3250 small molecules in a broad chemogenomics screen in yeast. This set was selected from an earlier 50,000 compounds, since these realiably inhibited the growth of wild-type yeast. They're looking for what they call "chemogenomic fitness signatures", which are derived from screening first against 1100 heterozygous yeast strains, one gene deletion per, representing the yeast essential genome. Then there's a second round of screening against 4800 homozygous deletions strain of non-essential genes, to look for related pathways, compensation, and so on.
All in all, they identified 317 compounds that appear to perturb 121 genes, and many of these annotations are new. Overall, the responses tended to cluster in related groups, and the paper goes into detail about these signatures (and about the outliers, which are naturally interested for their own reasons). Broad pathway effects like mitrochondrial stress show up pretty clearly, for example. And unfortunately, that's all I'm going to say for now about the biology, because we need to talk about the chemistry in this paper. It isn't good.
As my correspondent (a chemist himself) mentions, a close look at Figure 2 of the paper raises some real questions. Take a look at that cyclohexadiene enamine - can that really be drawn correctly, or isn't it just N-phenylbenzylamine? The problem is, that compound (drawn correctly) shows up elsewhere in Figure 2, hitting a completely different pathway. These two tautomers are not going to have different biological effects, partly because the first one would exist for about two molecular vibrations before it converted to the second. But how could both of them appear on the same figure?
And look at what they're calling "cyclohexa-2,4-dien-1-one". No such compound exists as such in the real world - we call it phenol, and we draw it as an aromatic ring with an OH coming from it. Thiazolidinedione is listed as "thiazolidine-2,4-quinone". Both of these would lead to red "X" marks on an undergraduate exam paper. It is clear that no chemist, not even someone who's been through second-year organic class, was involved in this work (or at the very least, involved in the preparation of Figure 2). Why not? Who reviewed this, anyway?
There are some unusual features from a med-chem standpoint as well. Is THF really targeting tubulin folding? Does adamantane really target ubiquinone biosynthesis? Fine, these are the cellular effects that they noted, I guess. But the weirdest thing on Figure 2's annotations is that imidazole is shown as having one profile, while protonated imidazole is shown as a completely different one. How is this possible? How could anyone who knows any chemistry look at that and not raise an eyebrow? Isn't this assay run in some sort of buffered medium? Don't yeast cells have any buffering capacity of their own? Salts of basic amine drugs are dosed all the time, and they are not considered - ever - as having totally different cellular effects. What a world it would be if that were true! Seeing this sort of thing makes a person wonder about the rest of the paper.
More subtle problems emerge when you go to the supplementary material and take a look at the list of compounds. It's a pretty mixed bag. The concentrations used for the assays vary widely - rapamycin gets run at 1 micromolar, while ketoconazole is nearly 1 millimolar. (Can you even run that compound at that concentration? Or that compound at left at 967 micromolar? Is it really soluble in the yeast wells at such levels? There are plenty more that you can wonder about in the same way.
And I went searching for my old friends, the rhodanines, and there they were. Unfortunately, compound SGTC_2454 is 5-benzylidenerhodanine, whose activity is listed as "A dopamine receptor inhibitor" (!). But compound SGTC_1883 is also 5-benzylidenerhodanine, the same compound, run at similar concentration, but this time unannotated. The 5-thienylidenerhodanine is SGTC_30, but that one's listed as a phosphatase inhibitor. Neither of these attributions seem likely to me. There are other duplicates, but many of them are no doubt intentional (run by different parts of the team).
I hate to say this, but just a morning's look at this paper leaves me with little doubt that there are still more strange things buried in the chemistry side of this paper. But since I work for a living (dang it), I'm going to leave it right here, because what I've already noted is more than troubling enough. These mistakes are serious, and call the conclusions of the paper into question: if you can annotate imidazole and its protonated form into two different categories, or annotate two different tautomers (one of which doesn't really exist) into two different categories, what else is wrong, and how much are these annotations worth? And this isn't even the first time that Science has let something like this through. Back in 2010, they published a paper on the "Reactome" that had chemists around the world groaning. How many times does this lesson need to be learned, anyway?
Update: this situation brings up a number of larger issues, such as the divide between chemists and biologists (especially in academia?) and the place of organic chemistry in such high-profile publications (and the place of organic chemists as reviewers of it). I'll defer these to another post, but believe me, they're on my mind.
Update 2 Jake Yeston, deputy editor at Science, tells me that they're looking into this situation. More as I hear it.
Update 3: OK, if Figure 2 is just fragments, structural pieces that were common to compounds that had these signatures, then (1) these are still not acceptable structures, even as fragments, and (2), many of these don't make sense from a medicinal chemistry standpoint. It's bizarre to claim a tetrahydrofuran ring (for example) as the key driver for a class of compounds; the chance that this group is making an actual, persistent interaction with some protein site (or family of sites) is remote indeed. The imidazole/protonated imidazole pair is a good example of this: why on Earth would you pick these two groups to illustrate some chemical tendency? Again, this looks like the work of people who don't really have much chemical knowledge.
A closer look at the compounds themselves does not inspire any more confidence. There's one of them from Table S3, which showed a very large difference in IC50 across different yeast strains. It was tested at 400 micromolar. That, folks, was sold to the authors of this paper by ChemDiv, as part of a "drug-like compound" library. Try pulling some SMILES strings from that table yourself and see what you think about their drug likeness.