The ChEMBL database of compounds has been including bioactivity data for some time, and the next version is slated to have even more. There are a lot of numbers out in the open literature that can be collected, and a lot of numbers inside academic labs. But if you want to tap the deepest sources of small-molecule biological activity data, you have to look to the drug industry. We generate vast heaps of it; it's the driveshaft of the whole discovery effort.
But sharing such data is a very sticky issue. No one's going to talk about their active projects, of course, but companies are reluctant to open the books even on long-dead efforts. The upside is seen as small, and the downside (though unlikely) is seen as potentially large. Here's a post from the ChEMBL blog that talks about the problem:
. . .So, what would your answer be if someone asked you if you consider it to be a good idea if they would deposit some of their unpublished bioactivity data in ChEMBL? My guess is that you would be all in favour of this idea. 'Go for it', you might even say. On the other hand, if the same person would ask you what you think of the idea to deposit some of ‘your bioactivity data’ in ChEMBL the situation might be completely different.
First and foremost you might respond that there is no such bioactivity data that you could share. Well let’s see about that later. What other barriers are there? If we cut to the chase then there is one consideration that (at least in my experience) comes up regularly and this is the question: 'What’s in it for me?' Did you ask yourself the same question? If you did and you were thinking about ‘instant gratification’ I haven’t got a lot to offer. Sorry, to disappoint you. However, since when is science about ‘instant gratification’? If we would all start to share the bioactivity data that we can share (and yes, there is data that we can share but don’t) instead of keeping it locked up in our databases or spreadsheets this would make a huge difference to all of us. So far the main and almost exclusive way of sharing bioactivity data is through publications but this is (at least in my view) far too limited. In order to start to change this (at least a little bit) the concept of ChEMBL supplementary bioactivity data has been introduced (as part of the efforts of the Open PHACTS project, http://www.openphacts.org).
There's more on this in an article in Future Medicinal Chemistry. Basically, if an assay has been described in an open scientific publication, the data generated through it qualifies for deposit in ChEMBL. No one's asking companies to throw open their books, but even when details of a finished (or abandoned) project are published, there are often many more data points generated than ever get included in the manuscript. Why not give them a home?
I get the impression, though, that GSK is the only organization so far that's been willing to give this a try. So I wanted to give it some publicity as well, since there are surely many people who aren't aware of the effort at all, and might be willing to help out. I don't expect that data sharing on this level is going to lead to any immediate breakthroughs, of course, but even though assay numbers like these have a small chance of helping someone, they have a zero chance of helping if they're stuck in the digital equivalent of someone's desk drawer.
What can be shared, should be. And there's surely a lot more that falls into that category than we're used to thinking.