About this Author
Derek Lowe, an Arkansan by birth, got his BA from Hendrix College and his PhD in organic chemistry from Duke before spending time in Germany on a Humboldt Fellowship for his post-doc. He's worked for several major pharmaceutical companies since 1989 on drug discovery projects against schizophrenia, Alzheimer's, diabetes, osteoporosis and other diseases. Twitter: Dereklowe


In the Pipeline

May 10, 2013

Why Not Share More Bioactivity Data?

Posted by Derek

The ChEMBL database of compounds has been including bioactivity data for some time, and the next version of it is slated to have even more. There are a lot of numbers out in the open literature that can be collected, and a lot of numbers inside academic labs. But if you want to tap the deepest sources of small-molecule biological activity data, you have to look to the drug industry. We generate vast heaps of such; it's the driveshaft of the whole discovery effort.

But sharing such data is a very sticky issue. No one's going to talk about their active projects, of course, but companies are reluctant to open the books even to long-dead efforts. The upside is seen as small, and the downside (though unlikely) is seen as potentially large. Here's a post from the ChEMBL blog that talks about the problem:

. . .So, what would your answer be if someone asked you if you consider it to be a good idea if they would deposit some of their unpublished bioactivity data in ChEMBL? My guess is that you would be all in favour of this idea. 'Go for it', you might even say. On the other hand, if the same person would ask you what you think of the idea to deposit some of ‘your bioactivity data’ in ChEMBL the situation might be completely different.

First and foremost you might respond that there is no such bioactivity data that you could share. Well let’s see about that later. What other barriers are there? If we cut to the chase then there is one consideration that (at least in my experience) comes up regularly and this is the question: 'What’s in it for me?' Did you ask yourself the same question? If you did and you were thinking about ‘instant gratification’ I haven’t got a lot to offer. Sorry, to disappoint you. However, since when is science about ‘instant gratification’? If we would all start to share the bioactivity data that we can share (and yes, there is data that we can share but don’t) instead of keeping it locked up in our databases or spreadsheets this would make a huge difference to all of us. So far the main and almost exclusive way of sharing bioactivity data is through publications but this is (at least in my view) far too limited. In order to start to change this (at least a little bit) the concept of ChEMBL supplementary bioactivity data has been introduced (as part of the efforts of the Open PHACTS project,

There's more on this in an article in Future Medicinal Chemistry. Basically, if an assay has been described in an open scientific publication, the data generated through it qualifies for deposit in ChEMBL. No one's asking for companies to throw open their books, but even when details of a finished (or abandoned) project are published, there are often many more data points generated than ever get included in the manuscript. Why not give them a home?

I get the impression, though, that GSK is the only organization so far that's been willing to give this a try. So I wanted to give it some publicity as well, since there are surely many people who aren't aware of the effort at all, and might be willing to help out. I don't expect that data sharing on this level is going to lead to any immediate breakthroughs, of course, but even though assay numbers like this have a small chance of helping someone, they have a zero chance of helping if they're stuck in the digital equivalent of someone's desk drawer.

What can be shared, should be. And there's surely a lot more that falls into that category than we're used to thinking.

Comments (18) + TrackBacks (0) | Category: Drug Assays | The Scientific Literature


1. Teddy Z on May 10, 2013 7:48 AM writes...

The way I think it would be thunk about is this. Data is a corporate asset. That's what is constantly pounded in our heads (that's why you countersign your notebook, those of you that actually do). So, when do companies give away corporate assets? When they have no value. So, I would hope that companies would give away the biological data to at least the marketed drugs. The idea there is that there is no meat left to pick off of those bones, so they have no value. Imagine the wealth of GPCR data that exists from the anti-depressives alone.

2. Anonymous on May 10, 2013 8:44 AM writes...

I suspect it's an activation barrier problem. One is the time and effort taken to convince the organisation to release the data, and two is the time taken to find the data and curate it even in a minimal way when the project is long in the past (I guess most Med Chem papers get written at least a year after the work was current).

Given the recent post about data quality in assays, how much of this data would actually be useful anyway?

3. Chris Swain on May 10, 2013 9:02 AM writes...

Perhaps we need to move one step at a time. Perhaps a requirement of publication should be that all data in a publication must be made available in a standard format so that it can be very easily imported into ChEMBL and other public repositories.

The next step might be to disclose hERG and Ames activity for all structures in the public domain, with the hope that better predictive tools might be designed.

4. Pete on May 10, 2013 9:03 AM writes...

One way forward might be to provide those who hold data with financial incentives (e.g. tax breaks) to deposit the data. Sharing the results of toxicology studies would be particularly helpful (and some might suggest to be an ethical requirement). One issue that will need to be addressed (particularly in litigation-happy USA) is what legal liabilities might result from sharing data. For example, one would want some sort of guarantee that you're not going to have some smart ass lawyer building a patent infringement case out of the data that you've shared.

5. JAB on May 10, 2013 10:38 AM writes...

Kudos to Bill Zuercher and Dave Drewry of GSK for their efforts to distribute the well curated GSK kinase inhibitor set to as many investigators as possible, including us.

6. Cellbio on May 10, 2013 11:13 AM writes...

@2- There are many ways the data would have value, even with the limitations of reliability of absolute values. One such way is to inform academics, who by virtue of their capacity (they only run the assay that pays the bills) and environment, have limited access to broad data sets that help one to discern between an interesting lead, a class promiscuous compound (pan kinase inhibitor), or a compound that broadly reports as a hit but is either garbage or the cure for death of all causes.

7. will on May 10, 2013 1:56 PM writes...

@ Pete - generally, the use of a patented compound in a research setting of developing new drugs is exempted from infringement. I would be primarily concerned about someone making an invalidity attack on my patent based on previously unreleased in-house data

@ Teddy - data on even quite old drugs is still potentially valuable, as a new indication can breathe life into an otherwise decaying product

8. Pete on May 10, 2013 2:34 PM writes...

@ Will, Compounds are not the only things that get patented in drug discovery.

9. sgcox on May 10, 2013 3:18 PM writes...

Second to #5.JAB
The GSK guys go the extra mile with this project. Very helpful.

10. will on May 10, 2013 3:58 PM writes...

@ pete - I guess I misunderstood your comment then, I thought your concern was that a company might publish biodata on a particular test compound, and then a separate entity would rise up with a patent covering said compound

I don't know whether a method patent covering a particular assay would also be subject to the research exemption. Logically, I think it would.

It's too late in the day for me to think of any other patentable subject matter for which published biodata would constitute evidence of infringement.

11. Pete on May 10, 2013 4:34 PM writes...

@ Will, To be quite honest my original comment was fairly generic and I'd not been thinking too much about detailed scenarios. My main point was that we need to at least acknowledge the possibility that data could be used against those who have deposited it. Assay technology patents do have the potential to make life difficult especially when patent lawyers say what one might have developed in house was 'obvious'.

12. Anonymous BMS Researcher on May 11, 2013 9:03 AM writes...

Even getting people to submit content for *internal* data repositories can be like pulling teeth. Unless something is either required from on high or directly on the critical path to making the metrics, it ain't gonna happen. It took stringent auditing to make everybody maintain good lab notebooks, for instance.

13. Insiliconsulting on May 11, 2013 11:08 AM writes...

ChEMBL came into being when the Wellcome Trust paid ~4 million pounds for the data and further development. It was arguably a last-ditch effort by the content owner to make SOME money in the face of curated database competitors and ever-reducing profits. Still a good deal for scientists the world over, though. Thanks, Wellcome Trust.

14. SK on May 11, 2013 6:54 PM writes...

@ Will & Pete:

The risk of finding evidence of infringement when releasing such data is not such a big issue, although the research exemption is somewhat narrow and only really applies to experimentation on drugs which could be the subject of an FDA submission. There may be a non-negligible risk of infringement that might deter such release.

A bigger risk is that release of the data would act as a "defensive publication" which increases the chance of an obviousness challenge. This could impact on already marketed drugs and also prevent otherwise socially valuable compounds from getting to market due to unpatentability.

A more efficient solution would be for companies to share data and then provide FDA clinical exclusivity once a drug candidate is ready for clinical development and then an extended period of FDA-administered market exclusivity upon regulatory approval (similar to the Orphan Drug Act, but say, 12-15 years). This would allow researchers to share valuable data at the "pre-competitive" stage, while providing ownership rights to the company which is willing to enter clinical testing (there would likely be a period where trade secret protection is used by smaller biotechs that develop drug candidates). It would also reduce waste due to excessive patent litigation between generics and innovators. This will possibly address the current productivity issues facing the industry.

15. cdsouthan on May 12, 2013 3:05 AM writes...

In the first instance there is a big public payoff if patent assignees (academic, US Gov or commercial) can surface at least some of their better SAR data published in patents but never written up in the journals that ChEMBL captures. What would have even more impact is if journals a) desisted from publishing pharmacological data (in vitro or in vivo) on blinded structures (violating the principles of scientific reproducibility) and b) ensured even the most basic level of transparency by (promptly) publishing clinical trial results linked to a structure. (See and

16. cliffintokyo on May 13, 2013 8:10 AM writes...

Reality check:
Who has time to peruse other people's data when we are all so busy keeping up with, analysing, reporting, explaining, and utilizing (for patents and follow-up lead discovery) the mountain of data for our own compounds?

17. MO on May 14, 2013 1:32 PM writes...

I can't get that excited. As usual, the benefits to the wider community are oversold. Typically the best compounds from a series (most potent or best illustrating the SAR) are the ones selected for publication, so these additional ChEMBL-only data points will be the unexciting also-rans in the series, which fit with the trends but weren't interesting enough to talk about or to generate n=2. Also, those who don't know better will combine data from different sources inappropriately.

One outcome will be an increase in the number of publications "mining" this data, but how many new findings will these uncover? Few, I suggest.

18. Michael Overduin on May 18, 2013 12:29 AM writes...

@MO There is real value: the curated kinase inhibitor databases are a godsend to us academics who need decent starting points and binding profiles to get funding in this valley of death.

As an academic trying to set up a drug discovery initiative to open source precompetitive data, I can attest that UK funders and UK universities remain skeptical. This won't happen unless big pharma and Wellcome (or similar) co-invest in an academic-led consortium that is committed and resourced to do the work. The SGC@Oxford is a great model.

To move forward I'd like to know if a useful first step would be to share quality fragment binding data matrices (hits/affinities/sites) across a target family, comparing full length and single domains as well as activated states and complexes with published inhibitors/leads. This could allow academics to derisk potential targets and develop new mechanism-driven strategies to identify early lead matter. If this might be of interest, please let me know.
