This will be a long one. I'm going to take another look at the Science paper that stirred up so much comment here on Friday. In that post, my first objection (but certainly not my only one) was the chemical structures shown in the paper's Figure 2. A number of them are basically impossible, and I just could not imagine how this got through any sort of refereeing process. There is, for example, a cyclohexadien-one structure, shown at left, and that one just doesn't exist as such - it's phenol, and those equilibrium arrows, though very imbalanced, are still not drawn to scale.
Well, that problem is solved by those structures being intended as fragments, substructures of other molecules. But I'm still positive that no organic chemist was involved in putting that figure together, or in reviewing it, because the reason that I was confused (and many other chemists were as well) is that no one who knows organic chemistry draws substructures like this. What you want to do is put dashed bonds in there, or R groups, as shown. That does two things: it shows that you're talking about a whole class of compounds, not just the structure shown, and it also shows where things are substituted. Now, on that cyclohexadienone, there's not much doubt where it's substituted, once you realize that someone actually intended it to be a fragment. It can't exist unless that carbon is tied up, either with two R groups (as shown), or with an exo-alkene, in which case you have a class of compounds called quinone methides. We'll return to those in a bit, but first, another word about substructures and R groups.
Figure 2 also has many structures in it where the fragment structure, as drawn, is a perfectly reasonable molecule (unlike the example above). Tetrahydrofuran and imidazole appear, and there's certainly nothing wrong with either of those. But if you're going to refer to those as common fragments, leading to common effects, you have to specify where they're substituted, because that can make a world of difference. If you still want to say that they can be substituted at different points, then you can draw a THF, for example, with a "floating" R group as shown at left. That's OK, and anyone who knows organic chemistry will understand what you mean by it. If you just draw THF, though, then an organic chemist will understand that to mean just plain old THF, and thus the misunderstanding.
If the problems with this paper ended at the level of structure drawing, which many people will no doubt see as just a minor aesthetic point, then I'd be apologizing right now. Update: although it is irritating. On Twitter, I just saw that someone spotted "dihydrophyranone" on this figure, which someone figured was close enough to "dihydropyranone", I guess, and anyway, it's just chemistry. But they don't. It struck me when I first saw this work that sloppiness in organic chemistry might be symptomatic of deeper trouble, and I think that's the case. The problems just keep on coming. Let's start with those THF and imidazole rings. They're in Figure 2 because they're supposed to be substructures that lead to some consistent pathway activity in the paper's huge (and impressive) yeast screening effort. But what we're talking about is a pharmacophore, to use a term from medicinal chemistry, and just "imidazole" by itself is too small a structure, from a library of 3200 compounds, to be a likely pharmacophore. Particularly when you're not even specifying where it's substituted and how. There are all kinds of imidazole out there, and they do all kinds of things.
So just how many imidazoles are in the library, and how many caused this particular signature? I think I've found them all. Shown at left are the four imidazoles (and there are only four) that exhibit the activity shown in Figure 2 (ergosterol depletion / effects on membrane). Note that all four of them are known antifungals - which makes sense, given that the compounds were chosen for the their ability to inhibit the growth of yeast, and topical antifungals will indeed do that for you. And that phenotype is exactly what you'd expect from miconazole, et al., because that's their known mechanism of action: they mess up the synthesis of ergosterol, which is an essential part of the fungal cell membrane. It would be quite worrisome if these compounds didn't show up under that heading. (Note that miconazole is on the list twice).
But note that there are nine other imidazoles that don't have that same response signature at all - and I didn't even count the benzimidazoles, and there are many, although from that structure in Figure 2, who's to say that they shouldn't be included? What I'm saying here is that imidazole by itself is not enough. A majority of the imidazoles in this screen actually don't get binned this way. You shouldn't look at a compound's structure, see that it has an imidazole, and then decide by looking at Figure 2 that it's therefore probably going to deplete ergosterol and lead to membrane effects. (Keep in mind that those membrane effects probably aren't going to show up in mammalian cells, anyway, since we don't use ergosterol that way).
There are other imidazole-containing antifungals on the list that are not marked down for "ergosterol depletion / effects on membrane". Ketonconazole is SGTC_217 and 1066, and one of those runs gets this designation, while the other one gets signature 118. Both bifonazole and sertaconazole also inhibit the production of ergosterol - although, to be fair, bifonazole does it by a different mechanism. It gets annotated as Response Signature 19, one of the minor ones, while sertaconazole gets marked down for "plasma membrane distress". That's OK, though, because it's known to have a direct effect on fungal membranes separate from its ergosterol-depleting one, so it's believable that it ends up in a different category. But there are plenty of other antifungals on this list, some containing imidazoles and some containing triazoles, whose mechanism of action is also known to be ergosterol depletion. Fluconazole, for example, is SGTC_227, 1787 and 1788, and that's how it works. But its signature is listed as "Iron homeostasis" once and "azole and statin" twice. Itraconzole is SGTC_1076, and it's also annotated as Response Signature 19. Voriconazole is SGTC_1084, and it's down as "azole and statin". Climbazole is SGTC_2777, and it's marked as "iron homeostasis" as well. This scattering of known drugs between different categories is possibly and indicator of this screen's ability to differentiate them, or possibly an indicator of its inherent limitations.
Now we get to another big problem, the imidazolium at the bottom of Figure 2. It is, as I said on Friday, completely nuts to assign a protonated imidazole to a different category than a nonprotonated one. Note that several of the imidazole-containing compounds mentioned above are already protonated salts - they, in fact, fit the imidazolium structure drawn, rather than the imidazole one that they're assigned to. This mistake alone makes Figure 2 very problematic indeed. If the paper was, in fact, talking about protonated imidazoles (which, again, is what the authors have drawn) it would be enough to immediately call into question the whole thing, because a protonated imidazole is the same as a regular imidazole when you put it into a buffered system. In fact, if you go through the list, you find that what they're actually talking about are N-alkylimidazoliums, so the structure at the bottom of FIgure 2 is wrong, and misleading. There are two compounds on the list with this signature, in case you were wondering, but the annotation may well be accurate, because some long-chain alkylimidazolium compounds (such as ionic liquid components) are already known to cause mitochondrial depolarization.
But there are several other alkylimidazolium compounds in the set (which is a bit odd, since they're not exactly drug-like). And they're not assigned to the mitochondrial distress phenotype, as Figure 2 would have you think. SGTC_1247, 179, 193, 1991, 327, and 547 all have this moeity, and they scatter between several other categories. Once again, a majority of compounds with the Figure 2 substructure don't actually map to the phenotype shown (while plenty of other structural types do). What use, exactly, is Figure 2 supposed to be?
Let's turn to some other structures in it. The impossible/implausible ones, as mentioned above, turn out to be that way because they're supposed to have substituents on them. But look around - adamantane is on there. To put it as kindly as possible, adamantane itself is not much of a pharmacophore, having nothing going for it but an odd size and shape for grease. Tetrahydrofuran (THF) is on there, too, and similar objections apply. When attempts have been made to rank the sorts of functional groups that are likely to interact with protein binding sites, ethers always come out poorly. THF by itself is not some sort of key structural unit; highlighting it as one here is, for a medicinal chemist, distinctly weird.
What's also weird is when I search for THF-containing compounds that show this activity signature, I can't find much. The only things with a THF ring in them seem to be SGTC_2563 (the complex natural product tomatine) and SGTC_3239, and neither one of them is marked with the signature shown. There are some imbedded THF rings as in the other structural fragments shown (the succinimide-derived Diels-Alder ones), but no other THFs - and as mentioned, it's truly unlikely that the ether is the key thing about these compounds, anyway. If anyone finds another THF compound annotated for tubulin folding, I'll correct this post immediately, but for now, I can't seem to track one down, even though Table S4 says that there are 65 of them. Again, what exactly is Figure 2 supposed to be telling anyone?
Now we come to some even larger concerns. The supplementary material for the paper says that 95% of the compounds on the list are "drug-like" and were filtered by the commercial suppliers to eliminate reactive compounds. They do caution that different people have different cutoffs for this sort of thing, and boy, do they ever. There are many, many compounds in this collection that I would not have bothered putting into a cell assay, for fear of hitting too many things and generating uninterpretable data. Quinone methides are a good example - as mentioned before, they're in this set. Rhodanines and similar scaffolds are well represented, and are well known to hit all over the place. Some of these things are tested at hundreds of micromolar.
I recognize that one aim of a study like this is to stress the cells by any means necessary and see what happens, but even with that in mind, I think fewer nasty compounds could have been used, and might have given cleaner data. The curves seen in the supplementary data are often, well, ugly. See the comments section from the Friday post on that, but I would be wary of interpreting many of them myself.
There's another problem with these compounds, which might very well have also led to the nastiness of the assay curves. As mentioned on Friday, how can anyone expect many of these compounds to actually be soluble at the levels shown? I've shown a selection of them here; I could go on. I just don't see any way that these compounds can be realistically assayed at these levels. Visual inspection of the wells would surely show cloudy gunk all over the place. Again, how are such assays to be interpreted?
And one final point, although it's a big one. Compound purity. Anyone who's ever ordered three thousand compounds from commercial and public collections will know, will be absolutely certain that they will not all be what they say on the label. There will be many colors and consistencies, and LC/MS checks will show many peaks for some of these. There's no way around it; that's how it is when you buy compounds. I can find no evidence in the paper or its supplementary files that any compound purity assays were undertaken at any point. This is not just bad procedure; this is something that would have caused me to reject the paper all by itself had I refereed it. This is yet another sign that no one who's used to dealing with medicinal chemistry worked on this project. No one with any experience would just bung in three thousand compounds like this and report the results as if they're all real. The hits in an assay like this, by the way, are likely to be enriched in crap, making this more of an issue than ever.
Damn it, I hate to be so hard on so many people who did so much work. But wasn't there a chemist anywhere in the room at any point?