The Science paper on chemogenomic signatures that I went on about at great length has been revised. Figure 2, which drove me and every other chemist who saw it up the wall, has been completely reworked:
"To improve clarity, the authors revised Fig. 2 by (i) illustrating the substitution sites of fragments; (ii) labeling fragments numerically for reference to supplementary materials containing details about their derivation; and (iii) representing the dominant tautomers of signature compounds. The authors also discovered an error in their fragment generation software that, when corrected, resulted in slightly fewer enriched fragments being identified. In the revised Fig. 2, they removed redundant substructures and, where applicable, illustrated larger substructures containing the enriched fragment common among signature compounds."
Looking over the revised version, I can say that it is indeed much improved. The chemical structures now look like chemical structures, and some of the more offensive "pharmacophores" (like tetrahydrofuran) have now disappeared. Several figures and tables have been added to the supplementary material to highlight where these fragments are in the active compounds (Figure S25, an especially large addition), and to cross-index things more thoroughly.
So the most teeth-gritting parts of the paper have been reworked, and that's a good thing. I definitely appreciate the effort that the authors have put into making the paper more accurate and interpretable, although these things really should have been caught earlier in the process.
Looking over the new Figure S25, though, you can still see what I think are the underlying problems with the entire study. That's the one where "Fragments that are significantly enriched in specific sets of signature compounds (FDR ≤ 0.1 and signature compounds fraction ≥ 0.2) are highlighted in blue within the relevant signature compounds. . .". It's a good idea to put something like that in there, but the annotations are a bit odd. For example, the compounds flagged as "6_cell wall" have their common pyridines highlighted, even though there's a common heterocyclic core that all but one of those pyridines are attached to (it varies only by alkyl substituents). That single outlier compound seems to be the reason that the whole heterocycle isn't colored in, but there are plenty of other monosubstituted pyridines on the list that have completely different signatures, so it's not like "monosubstituted pyridine" carries much weight. Meanwhile, the next set ("7_cell wall") has more of the exact same series of heterocycles, but in this case, it's just the core heterocycle that's shaded in. That seems to be because one of them is a 2-substituted isomer, while the others are all 3-substituted, so the software just ignores the pyridines in favor of coloring in the central ring.
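For anyone wondering what a cutoff like "FDR ≤ 0.1 and signature compounds fraction ≥ 0.2" amounts to in practice, here is a minimal sketch in Python. The assumptions are considerable: the quoted caption gives only the thresholds, so the one-sided hypergeometric test and Benjamini-Hochberg correction below are stand-ins rather than the authors' documented method, and the fragment names and counts are invented purely for illustration.

```python
from math import comb

def hypergeom_tail(k, n, K, N):
    """P(X >= k) when n compounds are drawn from N, of which K contain the fragment."""
    total = comb(N, n)
    upper = min(n, K)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total

def benjamini_hochberg(pvals):
    """Return Benjamini-Hochberg adjusted p-values, in the original order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def enriched_fragments(counts, set_size, library_size, fdr_max=0.1, frac_min=0.2):
    """counts maps fragment -> (hits in signature set, hits in whole library).

    Returns the fragments that pass both thresholds.
    """
    names = list(counts)
    pvals = [hypergeom_tail(counts[f][0], set_size, counts[f][1], library_size)
             for f in names]
    fdrs = benjamini_hochberg(pvals)
    return [f for f, q in zip(names, fdrs)
            if q <= fdr_max and counts[f][0] / set_size >= frac_min]

# Invented numbers: a 10-compound signature set from a 1000-compound library.
toy_counts = {
    "pyridine": (8, 100),
    "fused heterocyclic core": (7, 12),
    "phenyl": (9, 800),
}
print(enriched_fragments(toy_counts, set_size=10, library_size=1000))
```

With these made-up counts, both the bare pyridine and the larger core clear the thresholds, while phenyl (common everywhere) does not. A filter of this kind says nothing about which of the passing fragments is the chemically sensible one to highlight, and that is where the annotations get odd.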
The same thing happens with "8_ubiquinone biosynthesis and proteosome". What gets shaded in is an adamantane ring, even though every single one of the compounds is also a Schiff base imine (which is a lot more likely to be doing something than the adamantane). But that functional group gets no recognition from the software, because some of the aryl substitution patterns are different. One could just as easily have colored in the imine, though, which is what happens with the next category ("9_ubiquinone biosynthesis and proteosome"), where many of the same compounds show up again.
I won't go into more detail; the whole thing is like this. Just one more example: "12_iron homeostasis" features more monosubstituted pyridines being highlighted as the active fragment. But look at the list: there are 3-aminopyridine pieces, 4-aminomethylpyridines, 3-carboxylpyridines, all of them substituted with all kinds of stuff. The only common thread, according to the annotation software, is "pyridine", but those are, believe me, all sorts of different pyridines. (And as the above example shows, it's not like pyridines form some sort of unique category in this data set, anyway.)
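Here is one way a label as generic as "pyridine" could end up as the only common thread. If the software reports whichever fragment is shared by the most compounds in a set, then any variation in substitution pattern is enough to knock out every more specific description. The Python sketch below is a guess at that behavior, not the authors' actual algorithm. The compound names and fragment labels are invented, and real software would be matching substructures rather than comparing strings.

```python
from collections import Counter

def most_shared_fragment(compounds):
    """Return (fragment, fraction) for the fragment found in the most compounds.

    compounds maps a compound name to its set of fragment labels.
    Ties are broken alphabetically, which is an arbitrary choice.
    """
    tally = Counter(frag for frags in compounds.values() for frag in frags)
    best = min(tally, key=lambda f: (-tally[f], f))
    return best, tally[best] / len(compounds)

# Hypothetical signature set: four differently substituted pyridines.
toy_set = {
    "cmpd_A": {"pyridine", "3-aminopyridine"},
    "cmpd_B": {"pyridine", "4-aminomethylpyridine"},
    "cmpd_C": {"pyridine", "3-carboxylpyridine"},
    "cmpd_D": {"pyridine", "3-aminopyridine"},
}
print(most_shared_fragment(toy_set))
```

In this toy set the bare ring covers every compound, while the most common specific pattern covers only half of them, so the generic label wins even though it tells a chemist the least.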
So although the most eye-rolling features of this work have been cleaned up, the underlying medicinal chemistry is still pretty bizarre, at least to anyone who knows any medicinal chemistry. I hate to be this way, but I still don't see anyone getting an awful lot of use out of this.