Derek Lowe, an Arkansan by birth, got his BA from Hendrix College and his PhD in organic chemistry from Duke before spending time in Germany on a Humboldt Fellowship for his post-doc. He's worked for several major pharmaceutical companies since 1989 on drug discovery projects against schizophrenia, Alzheimer's, diabetes, osteoporosis and other diseases.
To contact Derek email him directly: derekb.lowe@gmail.com
Twitter: Dereklowe
ZINC (21 million commercially available compounds, organized by class, downloadable in various formats).
There is the irony of a detailed article on public-domain databases appearing behind the ACS paywall, but the literature is full of such moments as that. . .
1. Anonymous on August 2, 2012 8:36 AM writes...
Another just published reference that might be of interest is:
http://www.ncbi.nlm.nih.gov/pubmed/22821596
2. Matt on August 2, 2012 12:19 PM writes...
Does anybody know how to download and open an SDF file from DrugBank?
3. Anonymous on August 2, 2012 1:28 PM writes...
Go to Downloads and try opening it with Marvin (ChemAxon).
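For anyone who would rather peek inside an SDF file with a script than a desktop viewer, here is a minimal sketch in plain Python. It relies only on two features of the SD file format: records end with a line containing `$$$$`, and data fields start with a header line like `> <NAME>`. The sample text, the field names `EXAMPLE_ID` and `EXAMPLE_NAME`, and the function names are made up for illustration; they are not DrugBank's actual field names, and the sketch does not interpret the structure block at all.

```python
from typing import Dict, Iterator, List

# Made-up two-record SDF text; the field names are placeholders, not DrugBank's.
SAMPLE = """aspirin
  made-up-program

  0  0  0  0  0  0  0  0  0  0999 V2000
M  END
> <EXAMPLE_ID>
X0001

> <EXAMPLE_NAME>
Aspirin

$$$$
caffeine
  made-up-program

  0  0  0  0  0  0  0  0  0  0999 V2000
M  END
> <EXAMPLE_ID>
X0002

$$$$
"""


def _parse_record(lines: List[str]) -> Dict[str, str]:
    """Collect the title line and every '> <FIELD>' data item of one record."""
    record = {"_title": lines[0].strip()}
    field = None
    for line in lines:
        if line.startswith(">") and "<" in line:
            # Header such as "> <EXAMPLE_ID>": take the text between < and >.
            field = line[line.index("<") + 1 : line.rindex(">")]
            record[field] = ""
        elif field is not None:
            if line.strip() == "":
                field = None  # a blank line closes the current data item
            elif record[field]:
                record[field] += "\n" + line  # multi-line value
            else:
                record[field] = line
    return record


def iter_sdf_records(text: str) -> Iterator[Dict[str, str]]:
    """Yield one dict per record, splitting on the '$$$$' terminator lines."""
    lines: List[str] = []
    for line in text.splitlines():
        if line.strip() == "$$$$":
            if lines:
                yield _parse_record(lines)
            lines = []
        else:
            lines.append(line)
    if any(l.strip() for l in lines):  # tolerate a missing final '$$$$'
        yield _parse_record(lines)


for rec in iter_sdf_records(SAMPLE):
    print(rec["_title"], rec.get("EXAMPLE_ID"))
```

To try it on a real download, read the file into a string first and pass that to `iter_sdf_records`. For anything beyond listing fields, a proper cheminformatics toolkit is the better choice.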
4. weirdo on August 2, 2012 2:51 PM writes...
On a complete threadjack, Derek, my deepest sympathies on the Indians designating you for assignment. It was good while it lasted . . . .
5. Anonymous on August 2, 2012 3:09 PM writes...
And another one here:
http://www.ncbi.nlm.nih.gov/pubmed/22352914
6. ScientistSailor on August 2, 2012 6:19 PM writes...
I picked two drugs at random and looked them up in DrugBank. It has Sutent listed as a statin, and Plavix as a nutraceutical. I guess you get what you pay for...
7. Derek Lowe on August 2, 2012 7:54 PM writes...
#6, that seemed strange, but you're right. Their summary of sunitinib's activity is fine, but under "taxonomy", it's listed as a statin. All the statins fall into that class too (as they'd better), but I note that there are other drugs misclassified as statins as well - conivaptan is one that I just found. What's up, I wonder? Have you sent them an email, or should I?
8. ddddddd on August 3, 2012 2:22 AM writes...
Thank you so much for this, Derek. A simple, to-the-point blog post which contains really valuable information. Many of these I knew, but a good few I didn't, so thanks for sharing.
9. anonymous on August 3, 2012 6:02 AM writes...
Could some kind soul provide a similar list for the more "biological" side of drug discovery?
10. Josh on August 3, 2012 1:32 PM writes...
@8 and @Derek
Same from me. Very helpful stuff. Thanks for sharing.
11. Egon Willighagen on August 11, 2012 2:06 AM writes...
Dear Derek,
I always assumed you were located in the USA. Public Domain has a very special meaning there: anything in the public domain no longer has copyright. Similar, but often critically different, rules apply to the rest of the world.
However, most databases you list are *not* in the public domain. Some are not even free, as in free speech. I think they all are free, as in free beer. But public domain is really, really something different.
PubChem, for example, is one database that comes very close to public domain. On the other end there is ChemSpider, which you can look at, but not touch: you are not allowed to copy it. ChEMBL is somewhere in the middle: it is not public domain, but the authors decided to give it an Open license: the Creative Commons license with the ShareAlike and Attribution clauses. Very importantly, anyone can download it, change it, and redistribute it under the same conditions.
And this matters: if you want to combine information from various databases, you are typically downloading and changing (the format at least) the data. Typically, you are changing the data even more, preferably using software, for things such as salt removal. Ideally, you would share these results with a colleague, and ask her (or him) to look at them too. That is redistribution.
Public Domain allows that. Open licenses, like the Creative Commons licenses, allow that (depending on the exact additional clauses), and waivers like CCZero allow that.
Some of the above databases do not allow that.
This matters for science and for bringing down the cost of virtual drug discovery. Data you cannot properly study is not helping us find new medicines.
Hoping that people will finally stop randomly calling databases Public Domain, just because they can look at them,
with kind regards,
Egon Willighagen
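To make the "changing the data" point concrete, here is a rough sketch of the kind of salt removal mentioned above, written in plain Python. It assumes structures arrive as SMILES strings, where disconnected components are separated by `.`, and simply keeps the component with the most atoms. The function names and the atom-counting regular expression are illustrative assumptions, not any database's or toolkit's actual method; a real pipeline would use a cheminformatics toolkit's salt-stripping routine instead.

```python
import re

# Rough SMILES atom tokenizer (an assumption for this sketch): a bracket atom
# such as [Na+] counts as one atom; outside brackets only the organic subset
# (B, C, N, O, P, S, F, Cl, Br, I and aromatic b, c, n, o, p, s) can appear.
_ATOM = re.compile(r"\[[^\]]*\]|Cl|Br|[BCNOPSFI]|[bcnops]")


def atom_count(fragment: str) -> int:
    """Count atom tokens in one SMILES component (implicit hydrogens ignored)."""
    return len(_ATOM.findall(fragment))


def strip_salts(smiles: str) -> str:
    """Keep the largest '.'-separated component; the first one wins on ties."""
    return max(smiles.split("."), key=atom_count)


# A sodium salt and a hydrochloride, each reduced to its main component.
print(strip_salts("CC(=O)Oc1ccccc1C(=O)[O-].[Na+]"))
print(strip_salts("Cl.CN(C)CCc1ccccc1"))
```

Once a file has been run through even a small step like this, it is a modified copy of the source data, so whether it may be passed on to a colleague depends on the license terms discussed above.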