About this Author

Derek Lowe, an Arkansan by birth, got his BA from Hendrix College and his PhD in organic chemistry from Duke before spending time in Germany on a Humboldt Fellowship for his post-doc. He's worked for several major pharmaceutical companies since 1989 on drug discovery projects against schizophrenia, Alzheimer's, diabetes, osteoporosis and other diseases. To contact Derek, email him directly. Twitter: Dereklowe

In the Pipeline

May 15, 2013

GSK's Published Kinase Inhibitor Set

Posted by Derek

Speaking about open-source drug discovery (such as it is) and sharing of data sets (such as they are), I really should mention a significant example in this area: the GSK Published Kinase Inhibitor Set. (It was mentioned in the comments to this post). The company has made 367 compounds available to any academic investigator working in the kinase field, as long as they make their results publicly available (at ChEMBL, for example). The people at GSK doing this are David Drewry and William Zuercher, for the record - here's a recent paper from them and their co-workers on the compound set and its behavior in reporter-gene assays.

Why are they doing this? To seed discovery in the field. There's an awful lot of chemical biology to be done in the kinase field, far more than any one organization could take on, and the more sets of eyes (and cerebral cortices) that are on these problems, the better. So far, there have been about 80 collaborations, mostly in Europe and North America, all the way from broad high-content phenotypic screening to targeted efforts against rare tumor types.

The plan is to continue to firm up the collection, making more data available for each compound as work is done on them, and to add more compounds with different selectivity profiles and chemotypes. Now, the compounds so far are all things that have been published on by GSK in the past, obviating concerns about IP. There are, though, a multitude of other compounds in the literature from other companies, and you have to think that some of these would be useful additions to the set. How, though, does one get this to happen? That's the stage that things are in now. Beyond that, there's the possibility of some sort of open network to optimize entirely new probes and tools, but there's plenty that could be done even before getting to that stage.

So if you're in academia, and interested in kinase pathways, you absolutely need to take a look at this compound set. And for those of us in industry, we need to think about the benefits that we could get by helping to expand it, or by starting similar efforts of our own in other fields. The science is big enough for it. Any takers?

Comments (22) + TrackBacks (0) | Category: Academia (vs. Industry) | Biological News | Chemical News | Drug Assays


1. Varadh on May 15, 2013 9:01 AM writes...

Looks like a good beginning towards the future through open networks and towards open source.


3. Bill on May 15, 2013 9:04 AM writes...

Is there a website for the GSK Published Kinase Inhibitor Set? Because you would think there would be, but I can't find it, which doesn't speak so well for the effort.


4. ChrisL on May 15, 2013 9:21 AM writes...

Like (3) Bill I am having trouble getting the structures for the GSK set. The "recent paper" PlosOne reference that Derek links to did not help me. The references in the paper to PubChem data deposition were totally unhelpful. Why cannot the chemistry structures be deposited as a plain vanilla sdf file that everyone can read and examine in their favorite software? This is a good example of how to obfuscate any reasonable medicinal chemistry discussion by hiding the chemistry data in nearly inaccessible formats.


5. DCRogers on May 15, 2013 9:32 AM writes...

"The company has made 367 compounds available to any academic investigator working in the kinase field"

Sadly, there is less here than meets the eye -- I want more negative data. Without it, modeling is well-nigh useless -- only able to regurgitate the blindingly obvious surrounded by vast uncovered spaces of I-don't-know-Jack.

The authors acknowledge the importance of negative data, stating "importantly [includes] compounds inactive at their original kinase target". But it's doubtful to me that a mere 367 compounds can capture more than a fragment of what can go structurally wrong, or provide even a glimpse of the variety of structural motifs that might be worth exploring.

That said, I bet they have kinase assay data that would make me drool -- I'll wait until they make that available before I pop the champagne cork.


6. William Zuercher on May 15, 2013 9:35 AM writes...

@Bill and @ChrisL: Thank you for the interest in seeing the compounds in the set. We wish to eschew any obfuscation, so we’ll see if we can work with your suggestions on a simple link to a PDF or other easily digestible way to access the structures and data. In the meantime, here is how to get the structures and data from ChEMBL: from the main ChEMBL site, follow the “Activity Source Filter” link. Deselect all options save the “GSK Published Kinase Inhibitor Set” and click update. Any search will now be conducted only on the PKIS. To retrieve all data, enter the wildcard “%” into the search field and search for compound, target, or assay data with the three buttons to the right.

Any further suggestions, questions, or comments are most welcome!


7. William Zuercher on May 15, 2013 9:40 AM writes...

I don't know why the funky characters popped into my last post. The wildcard symbol for ChEMBL searching is the percent sign.

@DCRogers: A large and growing body of data is available at ChEMBL.


8. Chris Hayes on May 15, 2013 9:46 AM writes...

I'm an academic in the UK, and we have recently used the GSK PKIS set in a phenotypic screen. Bill Zuercher, and all involved at GSK, have been fantastic and they have helped us much more than I could have hoped. It has been very, very open indeed, and much more open than collaborations with some academic colleagues!

I would encourage all interested parties to contact GSK (that probably means Bill, as his e-mail is in the PLoS One paper!), and I'm sure that he will try to help you as much as he's helped us.

This is a fantastic (free!) resource for academics.

Many thanks Bill (If you are reading this post).


9. DCRogers on May 15, 2013 11:26 AM writes...

@7: "A large and growing body of data is available at ChEMBL"

Sorry for being slightly sour in my previous note: you are correct that ChEMBL has a growing number of interesting large-scale data sets. That said, you get a lot of noise because data samples arrive from different sources, with widely varying quality and techniques.

To re-spin my comment in a positive direction, it would be great if the larger universe of associated screening data you must have searched around your compound set was available in ChEMBL.

(Or perhaps it is, and I need to update my slightly-musty database?)

Anyhow, I should have been a bit more appreciative of the efforts you must have gone through to encourage the release of this data set -- I realize it must have been quite a bit of work getting it out through legal, accounting, etc. You folks deserve much thanks for that!


10. Matt Soellner on May 15, 2013 12:09 PM writes...

@DCRogers: they have deposited in ChEMBL binding data at 2 concentrations (1 uM and 100 nM) for a panel of 220 kinases. It's all there for the viewing whether you obtain the compounds or not. All of this data comes from Nanosyn.
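As a rough illustration of what two-concentration panel data allows, here is a minimal sketch that summarizes how broadly one compound hits a kinase panel. It assumes the readout has been exported as percent inhibition per kinase; the function name `hit_fraction`, the 50% cutoff, and the toy values are all hypothetical and do not come from the deposited data.

```python
def hit_fraction(inhibition_by_kinase, threshold=50.0):
    """Return the fraction of panel kinases inhibited at or above `threshold`.

    `inhibition_by_kinase` maps a kinase name to percent inhibition measured
    at one concentration. The 50% default cutoff is an assumption, not a
    value taken from the PKIS data.
    """
    if not inhibition_by_kinase:
        raise ValueError("empty panel")
    hits = sum(1 for value in inhibition_by_kinase.values() if value >= threshold)
    return hits / len(inhibition_by_kinase)


# Toy, made-up readings for one compound at the two screening concentrations.
at_1_uM = {"KINASE_A": 90.0, "KINASE_B": 10.0, "KINASE_C": 55.0, "KINASE_D": 0.0}
at_100_nM = {"KINASE_A": 60.0, "KINASE_B": 2.0, "KINASE_C": 20.0, "KINASE_D": 0.0}

broad = hit_fraction(at_1_uM)     # fraction of the panel hit at the high dose
tight = hit_fraction(at_100_nM)   # fraction still hit at the low dose
```

A compound whose fraction drops sharply between the two concentrations is behaving more selectively than one that stays high at both, which is one simple way to rank candidates before ordering the physical set.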

If you google "GSK PKIS" you can find a few powerpoint presentations that David and Bill have given on the PKIS.

I'll echo the comments of others. My lab has this compound set and both Bill and David have been great to work with.


11. NigelR on May 15, 2013 1:09 PM writes...

Given all of the data in ChEMBL, is it possible to define both the set of published kinase inhibitors that gives maximal interpretable coverage of the kinome (i.e. not just maximising coverage by using pan-inhibitors) and what profiles are still needed to increase that coverage?

At least having a list of which kinase inhibitors would be valuable for the less enlightened pharmas to release would be better than a random release. It would also help direct academic med chem efforts to the unrepresented regions.


12. David Borhani on May 15, 2013 1:29 PM writes...

The CHEMBL download doesn't work for me. I selected only the GSK set, searched with %, and got zero hits. When I search with *, I get 1,225,703 hits (even though the source filter says GSK is the only source selected).

An SDF file from the source would make life a bit easier.


13. William Zuercher on May 15, 2013 1:46 PM writes...

@David Borhani: The search worked earlier. As luck would have it, a new version of ChEMBL was just released today. If you email me your address, I will send an SD file.


14. DCRogers on May 15, 2013 2:05 PM writes...

@10: "they have deposited in ChEMBL binding data at 2 concentrations (1 uM and 100 nM) for a panel of 220 kinases"

Thanks Matt, I will update my ChEMBL database and have a go!

Kudos to the authors, who deserve appreciation rather than brickbats. I fully withdraw my earlier comment -- this'll teach me to think twice before writing cranky messages prior to my first cup of coffee in the morning.


15. David Drewry on May 16, 2013 12:30 PM writes...

Thank you, Derek, for posting this. As you mentioned we need more people thinking about (and talking about) the benefits of sharing compounds. Drug discovery is difficult, and we will make more headway in collaboration. Discovering new medicines is a rare event, but not because we are not trying, rather we just don't know enough.

Sharing well annotated compound sets so that many more experiments can be run is one way to improve our collective knowledge base and make discoveries.

Bill and I would also like to thank everyone for their comments and suggestions to improve data accessibility. Our friends at ChEMBL recently posted some information and instructions on their blog that will help:


16. Anonymous on May 18, 2013 5:08 AM writes...

Derek you should post the MTA. Very restrictive. While an admirable first step for a FIPCO, this is more a batched standard MTA than open source drug discovery. We are still waiting on compounds more than 8 months after the inquiry. But I hope it works to guide GSK toward an outcome for patients, and that the experience leads to true open access.


17. Anonymous on May 19, 2013 8:09 AM writes...

@Anonymous 16: We spent significant effort to make the agreement minimally restrictive and believe that the resulting MTA is consistent with the broad aim of openly advancing kinase science. Most of the collaborations have had no issue whatsoever with the MTA template, and we've been able to get the compound set into their hands within 4-6 weeks of initial contact. Any delays are due to changes requested by the recipient institution.


18. William Zuercher on May 19, 2013 8:13 AM writes...

I posted comment 17.


19. Steric clash on May 19, 2013 8:42 PM writes...

@18 Let's have a look!


20. William Zuercher on May 21, 2013 2:43 PM writes...

@19: I am happy to provide a copy of the MTA upon email request.


21. CDD Data Guy on June 24, 2013 11:56 AM writes...

Given the significant interest in this dataset, as well as the comments above that getting the desired data from ChEMBL was tricky, the data team at Collaborative Drug Discovery (CDD) have gathered the PKIS data that ChEMBL has kindly made available, and processed it so it can be accessed via CDD's public access web site (public access accounts are free).
The transfer to CDD makes the data available in a more med-chemist friendly manner. There was also some tidying up of the data set. For example there are actually only 364 compounds (some duplicates were due to salt forms or alternate names of the same molecule) and the target names were normalized where possible (for example, the kinases IKKA, IKKB and IKKE were called IKK-alpha, IKK-beta, IKK-epsilon for the dataset from UNC).
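The kind of tidying described here can be sketched roughly as follows. This is only an illustration under stated assumptions, not CDD's actual procedure: the function names, the `parent_key` field (imagined as a salt-stripped structure identifier computed elsewhere), and the sample records are hypothetical. Only the three IKK name pairs come from the text above.

```python
import re

# Suffix letters mapped to Greek-letter names; only the IKKA/IKKB/IKKE
# examples are taken from the comment, the approach itself is an assumption.
GREEK = {"A": "alpha", "B": "beta", "E": "epsilon"}


def normalize_target(name):
    """Rewrite IKKA/IKKB/IKKE as IKK-alpha/IKK-beta/IKK-epsilon.

    Any other name is returned unchanged apart from surrounding whitespace.
    """
    cleaned = name.strip()
    match = re.fullmatch(r"IKK([ABE])", cleaned.upper())
    if match:
        return "IKK-" + GREEK[match.group(1)]
    return cleaned


def dedupe_compounds(records):
    """Keep the first record seen for each `parent_key`.

    `parent_key` is a hypothetical field holding an identifier of the
    salt-free parent structure, so salt forms and alternate names of the
    same molecule collapse into one entry.
    """
    first_seen = {}
    for record in records:
        first_seen.setdefault(record["parent_key"], record)
    return list(first_seen.values())


# Made-up records: two entries share a parent structure.
records = [
    {"parent_key": "KEY-1", "name": "compound-1"},
    {"parent_key": "KEY-1", "name": "compound-1 hydrochloride"},
    {"parent_key": "KEY-2", "name": "compound-2"},
]
unique = dedupe_compounds(records)
```

Keying on a parent-structure identifier rather than on the compound name is what lets salt forms and synonyms merge; producing that identifier reliably would need a cheminformatics toolkit and is outside this sketch.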


22. Anonymous on July 10, 2013 11:17 AM writes...

Couldn't find a way to download the SD files from CDD. Is there a way to do it? Thanks.
