About this Author
Derek Lowe, an Arkansan by birth, got his BA from Hendrix College and his PhD in organic chemistry from Duke before spending time in Germany on a Humboldt Fellowship for his post-doc. He's worked for several major pharmaceutical companies since 1989 on drug discovery projects against schizophrenia, Alzheimer's, diabetes, osteoporosis and other diseases. To contact Derek, email him directly. Twitter: Dereklowe

In the Pipeline

August 2, 2012

Public Domain Databases in Medicinal Chemistry

Posted by Derek

Here's a useful overview of the public-domain medicinal chemistry databases out there. It covers the big three databases in detail:

BindingDB (quantitative binding data to protein targets).

ChEMBL (wide range of med-chem data, overlaps a bit with PubChem).

PubChem (data from the NIH Roadmap screens and many others).

And these others:
Binding MOAD (literature-annotated PDB data).

ChemSpider (26 million compounds from hundreds of data sources).

DrugBank (data on 6700 known drugs).

GRAC and IUPHAR-DB (data on GPCRs, ion channels, and nuclear receptors, and ligands for all of these).

PDBbind (more annotated PDB data).

PDSP Ki (data from UNC's psychoactive drug screening program).

SuperTarget (target-compound interaction database).

Therapeutic Targets Database (database of known and possible drug targets).

ZINC (21 million commercially available compounds, organized by class, downloadable in various formats).

There is the irony of a detailed article on public-domain databases appearing behind the ACS paywall, but the literature is full of such moments. . .

Comments (11) + TrackBacks (0) | Category: Biological News | Chemical News | Drug Assays


1. Anonymous on August 2, 2012 8:36 AM writes...

Another just published reference that might be of interest is:

2. Matt on August 2, 2012 12:19 PM writes...

Does anybody know how to download and open an SDF file from DrugBank?

3. Anonymous on August 2, 2012 1:28 PM writes...

Go to Downloads and try opening the file with Marvin (ChemAxon).
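
For anyone who would rather script it: an SD file is plain text, with each record made up of a molblock ending in `M  END`, then data items of the form `> <NAME>`, then a `$$$$` terminator. Below is a minimal sketch of a reader using only the Python standard library. The field names `DRUGBANK_ID` and `GENERIC_NAME` and the sample records are assumptions made up for illustration; check the tags in the file you actually download.

```python
def _parse_record(lines):
    """Turn the lines of one SDF record into a dict (title, molblock, fields)."""
    molblock, fields = [], {}
    i = 0
    # The molblock runs from the first line through the "M  END" line.
    while i < len(lines):
        molblock.append(lines[i])
        i += 1
        if molblock[-1].startswith("M  END"):
            break
    # Data items: a "> <NAME>" header, value lines, then a blank line.
    while i < len(lines):
        line = lines[i]
        i += 1
        if not line.startswith(">"):
            continue
        start, end = line.find("<"), line.rfind(">")
        name = line[start + 1:end] if 0 <= start < end else line[1:].strip()
        value = []
        while i < len(lines) and lines[i].strip():
            value.append(lines[i])
            i += 1
        fields[name] = "\n".join(value)
    title = lines[0].strip() if lines else ""
    return {"title": title, "molblock": "\n".join(molblock), "fields": fields}


def parse_sdf(text):
    """Split SDF text on '$$$$' lines and parse each record."""
    records, current = [], []
    for line in text.splitlines():
        if line.strip() == "$$$$":
            if current:
                records.append(_parse_record(current))
            current = []
        else:
            current.append(line)
    if any(l.strip() for l in current):  # tolerate a missing final '$$$$'
        records.append(_parse_record(current))
    return records


# Made-up sample data (not real DrugBank entries), with empty atom blocks.
SAMPLE = """\
example one
  hypothetical

  0  0  0  0  0  0  0  0  0  0999 V2000
M  END
> <DRUGBANK_ID>
DB00000

> <GENERIC_NAME>
Example compound

$$$$
second
  hypothetical

  0  0  0  0  0  0  0  0  0  0999 V2000
M  END
> <DRUGBANK_ID>
DB99999

$$$$
"""

if __name__ == "__main__":
    for rec in parse_sdf(SAMPLE):
        print(rec["title"], rec["fields"])
```

For a real download, read the file with `open(path, encoding="utf-8").read()` and pass the text to `parse_sdf`. This only pulls out the text fields; to work with the structures themselves, a cheminformatics toolkit is the better choice.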

4. weirdo on August 2, 2012 2:51 PM writes...

On a complete threadjack, Derek, my deepest sympathies on the Indians designating you for assignment. It was good while it lasted . . . .

5. Anonymous on August 2, 2012 3:09 PM writes...

And another one here:

6. ScientistSailor on August 2, 2012 6:19 PM writes...

I picked two drugs at random and looked them up in DrugBank. It has Sutent listed as a statin, and Plavix as a nutraceutical. I guess you get what you pay for...

7. Derek Lowe on August 2, 2012 7:54 PM writes...

#9, that seemed strange, but you're right. Their summary of sunitinib's activity is fine, but under "taxonomy", it's listed as a statin. All the statins fall into that class too (as they'd better), but I note that there are other drugs misclassified as statins as well - conivaptan is one that I just found. What's up, I wonder? Have you sent them an email, or should I?

8. ddddddd on August 3, 2012 2:22 AM writes...

Thank you so much for this, Derek. A simple, to-the-point blog post which contains really valuable information. Many of these I knew, but a good few I didn't, so thanks for sharing.

9. anonymous on August 3, 2012 6:02 AM writes...

Could some kind soul provide a similar list for the more "biological" side of drug discovery?

10. Josh on August 3, 2012 1:32 PM writes...

@8 and @Derek
Same from me. Very helpful stuff. Thanks for sharing

11. Egon Willighagen on August 11, 2012 2:06 AM writes...

Dear Derek,

I always assumed you were located in the USA. Public Domain has a very special meaning there: anything in the public domain no longer has copyright. Similar, but often critically different, rules apply to the rest of the world.

However, most databases you list are *not* in the public domain. Some are not even free, as in free speech. I think they all are free, as in free beer. But public domain is really, really something different.

PubChem, for example, is one database that comes very close to public domain. On the other end there is ChemSpider, which you can look at, but not touch: you are not allowed to copy it. ChEMBL is somewhere in the middle: it is not public domain, but the authors decided to give it an Open license: the Creative Commons license with the ShareAlike and Attribution clauses; but, very importantly, anyone can download it, change it, and redistribute it under the same conditions.

And this matters: if you want to combine information from various databases, you are typically downloading and changing (the format at least) the data. Typically, you are changing the data even more, preferably using software, for steps such as salt removal. Ideally, you would share these results with a colleague, and ask her (or him) to look at them too. That is redistribution.

Public Domain allows that. Open licenses, like the Creative Commons licenses, allow that (depending on the exact additional clauses), and waivers like CCZero allow that.

Some of the above databases do not allow that.

This matters for science and for getting down the cost of virtual drug discovery. Data you cannot properly study is not helping us find new medicine.

Hoping that people will finally stop randomly calling databases Public Domain, just because they can look at them,

with kind regards,

Egon Willighagen
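
To make the "changing the data" point in this comment concrete, here is a rough sketch of the salt-removal step applied to SMILES strings. It is a crude heuristic of my own for illustration, not the method any of these databases uses: it keeps the dot-separated fragment with the most atoms and does not neutralize charges. The function names are hypothetical.

```python
import re

# Matches one atom: a bracket atom, a two-letter halogen, or a
# one-letter organic-subset atom (aliphatic or aromatic).
_ATOM = re.compile(r"\[[^\]]+\]|Br|Cl|[BCNOPSFI]|[bcnops]")


def atom_count(fragment):
    """Count heavy atoms in a SMILES fragment (rough, pattern-based)."""
    return len(_ATOM.findall(fragment))


def strip_salts(smiles):
    """Keep the largest dot-separated fragment of a SMILES string."""
    parts = [p for p in smiles.split(".") if p]
    if not parts:
        return smiles
    return max(parts, key=atom_count)


if __name__ == "__main__":
    for s in ["CC(=O)[O-].[Na+]", "Cl.CN1CCCC1", "c1ccccc1"]:
        print(s, "->", strip_salts(s))
```

Once a record has been rewritten like this, handing the result to a colleague is redistribution of a modified copy, which is exactly where the license terms start to matter.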
