Corante

About this Author

Derek Lowe, an Arkansan by birth, got his BA from Hendrix College and his PhD in organic chemistry from Duke before spending time in Germany on a Humboldt Fellowship on his post-doc. He's worked for several major pharmaceutical companies since 1989 on drug discovery projects against schizophrenia, Alzheimer's, diabetes, osteoporosis and other diseases. To contact Derek email him directly: derekb.lowe@gmail.com Twitter: Dereklowe

In the Pipeline

June 25, 2010

What To Do With The Not-Quite-Worthless

Posted by Derek

Yesterday morning I went on and on about the low quality of much of what gets published in the scientific literature. And indeed, the low end is very likely of no use to anyone, except (apparently) the people publishing it. But what to do with the rungs above that?

For organic chemistry, those rungs are occupied by papers that report new compounds of little interest to anyone. But you never know: they might be worth someone else's time eventually. It's unlikely that any of these things will be the hinge on which a mighty question turns, but knowing that they've been made (and how), and knowing what their spectra and properties are, could save someone time down the line when they're doing something more useful. These are real bricks in the huge construction of scientific knowledge, and while they're not worth much, it's more than zero. Zero is the value I assign to the hunks of mud that some people offer instead, or to the things that look like real bricks but turn out to be made out of brick, yes, but about one millimeter thick and completely hollow.

So what to do with work that's mostly reference data for the future? It shouldn't have to appear in physical print, you'd think. How about the peer-reviewed journal part? Well, peer review is not magic. As it stands, that sort of information is the least-reviewed part of most papers. If someone tells you that they've made Compound X and Compound Y, and the synthesis isn't obviously crazy, you tend to take their word for it. It's a rare reviewer that gets all the way down to the NMR spectra in the supplementary material, that's for sure. And if one does, and the NMR spectra look reasonably believable, well, what else can you do? Even so, every working chemist has dealt with literature whose procedures Just Don't Work, and all those papers passed some sort of editorial review process at some point.

No, peer review is not going to do much to improve the quality of archival data. If someone really wants to fill up the low-level bins with junk, there's not much stopping them. You could sit down and draw out a bunch of stuff no one's ever made before, come up with plausible paper syntheses of all of it, use software to predict reasonable NMR spectra (which you might want to jitter around a bit to cover your tracks), and just flat-out fake the mass spec and elemental analyses. Presto, another paper that no one will ever read, until eventually someone has a reason to make similar compounds and curses your name in the distant future. The problem is, such papers will do you no real good, since they'll appear in the crappiest journals and pick up no citations from anyone.

Perhaps there should be a way to dump chemical data directly into some archives, the way X-ray data goes into the Protein Data Bank. That wouldn't count for much, but it would capture things for future use. Having it not count much would decrease the incentive for anyone to fill it full of fakery, too, since there would be even less point than usual. And before anyone objects to having a big pile of non-peer-reviewed chemical data like this, keep in mind that we already have one: it's called the patent literature, and it can be quite worthwhile. Although not always.

Comments (31) + TrackBacks (0) | Category: The Scientific Literature | Who Discovers and Why


COMMENTS

1. Maks on June 25, 2010 7:41 AM writes...

Yes, please publish all the synthetic routes to all compounds ever synthesized. You got public money for doing research, so help save public money by making your knowledge public.
But I have wasted so much time reproducing the syntheses of compounds published in "lower impact factor" journals, only to realize that the authors didn't care about isomers, which you only see if you look carefully at your spectra. Several times I have found my compound of interest "synthesized as previously described", and after going from reference to reference I found that there was no description whatsoever.
If your data is solid, put it out; otherwise keep it to yourself. Either way, you save other people time.

2. Kit on June 25, 2010 8:13 AM writes...

I always thought it would be wonderful to have a place to dump all kinds of successful but unpublishable material. 90% of the reactions I ran in graduate school didn't end up in published papers. But all those dead-end synthetic pathways had successful chemistry along the way. Some of it was clever (and some of it was mundane), but surely it could be useful to someone ... somewhere ... sometime. How about some giant searchable database of the world's electronic notebooks? Instead, it is inaccessible to the world, perhaps doomed to be repeated by another grad student being led down a similar dead-end synthetic path.

3. Uday Gokhale on June 25, 2010 8:29 AM writes...


Derek, I agree with you 100%. Your comments about patents are true, too. I distinctly remember two incidents in this context. 1. About writing patents: Dr. Houseley, who was heading Boots' pharmaceutical R&D (around 1988), was fanatically insistent about writing the experimental description EXACTLY as it was done in the lab, without bothering much about patentese. 2. Prof. Michael Ben at the University of Calgary pointed out a completely fake synthesis sequence published in Tet. Lett.; I don't know what its impact factor was in 1983. As green grad students, we were tempted to doubt our own techniques when some procedures didn't work as reported in journals. I believe patents these days disclose the synthesis steps in more detail. Insistence by seniors in the lab, and scientific scrutiny of the claimed matter by examiners who are 'not so much in a hurry', probably help.

4. Matthias Winkelmann on June 25, 2010 8:41 AM writes...

IIRC, the PDB requires publication for inclusion in the database. There's usually interest in structures anyway, because biologically and medically interesting targets are actively selected for crystallization.

5. Tok on June 25, 2010 9:07 AM writes...

Kit,
If you wrote it up in your thesis, it's out there for everyone to see. I don't think there's a structure search engine for thesis material though.

6. Jason on June 25, 2010 9:27 AM writes...

Here's an idea, feel free to steal it and make your fortune: set up a website that lets people review, comment on and discuss academic papers.

(There's no need for it to have direct access to the papers themselves, just a simple way of entering references)

7. Andy on June 25, 2010 9:49 AM writes...

I'm with Jason, I had hoped that Mendeley desktop would incorporate this function. Mendeley seem to be developing the vast user-base needed to annotate the vast quantity of literature.

If ten people say that they couldn't reproduce the research, and one person provides a link to an alternative, they'd save you a couple of months of pointless and painful work.

I certainly know a few papers that I'd love to put a red flag next to, stating 'approach with caution'.

8. Sili on June 25, 2010 10:16 AM writes...

I've long wondered why there aren't databases for NMR and IR spectra like the one the Cambridge Crystallographic Data Centre maintains for crystal structures. Preferably free, of course.

9. p on June 25, 2010 10:23 AM writes...

Anyone who thinks irreproducible procedures are confined to the realm of low-impact-factor journals isn't doing enough chemistry.

The trouble with lots of lower-impact-factor journals is that there is NO procedure at all. The advent of electronic supplementary information (ESI) is taking care of that somewhat.

10. JasonP on June 25, 2010 10:49 AM writes...

How about throwing out organic synthesis papers altogether and having a unified database where you could even submit "dead-end synthetic pathways had successful chemistry along the way", so that everyone would gain from it?

Leave the paper writing to us biologists. Much more interesting I'm sure you'll agree. :)

11. RM on June 25, 2010 10:50 AM writes...

Peer review cannot always detect deliberate, malicious fabrication performed with the intent of hoodwinking reviewers. Not even in the rarefied air of Science and Nature, and there have been a number of examples to prove it. But even if it can't detect *all* examples, it has the potential to detect some. Effectively, you're arguing that fences are worthless at containing animals because occasionally someone might forget to close the gate.

So bringing up deliberate fabrication to discount peer review is a bit of a straw man. The value of peer review is not primarily in the detection of deliberate fabrication (though it can do that with limited success), but mainly in the detection of honest omission and the reining in of overzealous conclusions: "You should be doing corrections as described in Smith, et al." "How did you account for known oxidation products?" "The methods section is word salad - clarify." "You haven't considered interpretations A, B, and C."

Finally, patent literature *is* reviewed. Not by peers, but by patent clerks and, more importantly, by courts. *That's* what keeps the patent literature honest, despite the lack of peer review: the knowledge that your patent could be invalidated by the courts for even an honest omission, let alone deliberate falsification. It's a poor comparison, as I doubt your hypothesized unreviewed database carries the risk of similar judicial proceedings. (As others mentioned, the PDB requires publication, and thus peer review.)

12. A Nonny Mouse on June 25, 2010 11:24 AM writes...

There is a scheme in the UK to make available all of the theses which were pre-electronic. I recently had to give permission to allow this to happen (though it would have gone ahead anyway as the university had already agreed to it).

From the letter

The Department of Chemistry has been contacted by representatives of SORD (Selective Organic Reactions Database).
SORD aims to give access to so-called ‘lost chemistry’. The initiators of the project aim to make a large amount of organic chemical reaction data accessible (currently inaccessible to most chemists). This includes translations from non-English language sources and focuses on compounds and/or procedures that are relevant with respect to drug development. Through SORD, reactions that are currently only described in academic theses and dissertations will become available to chemists all over the world.

13. Cloud on June 25, 2010 11:44 AM writes...

Yes, the PDB requires publication. And publication of a structure requires submission to the PDB.

For those of you wanting databases to store things, but want them to be free: keep in mind that someone has to pay to design, develop and (perhaps most importantly) maintain the database.

From what I hear from my colleagues in academia: there isn't that much grant money for these sorts of things, and there is a real problem with finding grant money to pay for the ongoing maintenance.

Your "free" resources aren't really free. They are just funded by taxpayers.

14. Kit on June 25, 2010 12:18 PM writes...

Tok,
No, my thesis isn't out there for everyone to see. As far as I'm aware, only three hard copies of my thesis exist (mine, my boss's, and the university library's, all of which are just collecting dust), and there are no electronic copies available in any form.

15. Michal Krompiec on June 25, 2010 12:39 PM writes...

There exist at least two free NMR databases to which anyone can add their spectra (published or not): nmrshiftdb.org (requires manual input of each shift, multiplicity etc) and nmrdb.org (mynmrdb.org), which accepts raw FID (both 1D and 2D) and has an online tool to process the spectrum, draw the structure and assign peaks to atoms. Nmrdb is sponsored by the EPFL, so it should stay up for at least a couple of years.

16. Sean on June 25, 2010 1:15 PM writes...

@Cloud

The PDB requires publication? I don't see that requirement listed in their policy page for accepting structures.

http://www.wwpdb.org/policy.html

17. Dan Barlow on June 25, 2010 1:31 PM writes...

As someone who makes web sites and databases but knows nothing about the data format you need, I would suggest writing an open-document specification for what you want. Specify exactly what data it would hold and how it would search. You might realize it's easy to do, or you might realize it's a huge project. Hosting such a database is a matter of a couple hundred dollars a month.
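
Taking up that suggestion, here is a minimal sketch of what such a specification might pin down, written as Python data classes. Everything in it is an assumption for illustration rather than an existing standard: the field names (`smiles`, `formula`, `procedure`, `spectra`, `source`) and the hypothetical `DepositArchive` class with its two searches (exact molecular-formula lookup, and "has a spectrum of this kind"). Real structure searching (substructure or similarity) would need a cheminformatics toolkit and is deliberately left out.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Spectrum:
    kind: str            # e.g. "1H NMR", "13C NMR", "IR", "MS"
    peaks: List[float]   # peak positions, in whatever units are usual for that technique


@dataclass
class CompoundDeposit:
    deposit_id: str
    smiles: str                      # structure as a SMILES string
    formula: str                     # molecular formula, used as a simple search key
    procedure: str                   # free-text experimental procedure
    spectra: List[Spectrum] = field(default_factory=list)
    source: Optional[str] = None     # DOI, thesis or patent reference, if one exists


class DepositArchive:
    """Hypothetical in-memory archive: stores deposits and supports two simple searches."""

    def __init__(self) -> None:
        self._by_id: Dict[str, CompoundDeposit] = {}
        self._by_formula: Dict[str, List[str]] = {}

    def add(self, deposit: CompoundDeposit) -> None:
        # Reject a reused ID rather than silently overwriting an earlier deposit.
        if deposit.deposit_id in self._by_id:
            raise ValueError(f"duplicate deposit id: {deposit.deposit_id}")
        self._by_id[deposit.deposit_id] = deposit
        self._by_formula.setdefault(deposit.formula, []).append(deposit.deposit_id)

    def find_by_formula(self, formula: str) -> List[CompoundDeposit]:
        # Exact string match, so isomers sharing a formula all come back together.
        return [self._by_id[i] for i in self._by_formula.get(formula, [])]

    def with_spectrum(self, kind: str) -> List[CompoundDeposit]:
        # Deposits that include at least one spectrum of the requested kind.
        return [d for d in self._by_id.values() if any(s.kind == kind for s in d.spectra)]
```

For example, depositing ethanol (`CCO`) and dimethyl ether (`COC`) under the formula `C2H6O` means `find_by_formula("C2H6O")` returns both. That is a reminder that a formula is only a coarse key; telling isomers apart is exactly what the deposited spectra are for.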

18. Ron on June 25, 2010 6:04 PM writes...

To the above posters: there is a good spectral database for NMR and IR spectra:

http://riodb01.ibase.aist.go.jp/sdbs/cgi-bin/cre_index.cgi

I have used it many many times in the past, and, while it often does not have the spectra for the somewhat esoteric compounds I want, I think it's a good reference nonetheless.

19. Martin on June 25, 2010 6:43 PM writes...

Surprised no one has mentioned the one-page, one-molecule concept started by the journal Molecules in, I believe, 1998. These pages were originally intended to host compounds that would otherwise never have seen the light of day. Of course, they then went and shot themselves in the foot by levying page charges or requiring the deposit of a physical sample of the compound, but "it seemed like a good idea at the time".

Obviously the real-world cost of hosting the journal had to be borne somewhere, but it was supposed to be borne by the normal article section of the journal, not by the "lost compounds" concept.

20. Mutatis Mutandis on June 25, 2010 8:00 PM writes...

I am not convinced dumping data in a database would be the right answer. It makes me think of the screening efforts coordinated by the NIH, in which various academic HTS labs register their compound activity data into a central database. From reviews I have heard and seen, much of this data can't be trusted. And that's not because these academic screening groups are dishonest or incompetent, but because the large majority of these putative "hits" have not been critically reviewed or experimentally confirmed. That is still a vital step, and even a low-grade effort is much better than none at all.

21. anon on June 25, 2010 9:36 PM writes...

Maybe someone should propose these ideas to the Google Brats (Sergey and Larry). They need a project to occupy their time, otherwise they will feel compelled to fly around the globe in the corporate 767, spewing tons of carbon in their wake.

22. Ante on June 26, 2010 12:15 AM writes...

ChemSpider could be (actually is) such a place to "publish" your compounds and reactions together with spectra.

23. Cassius on June 26, 2010 3:00 PM writes...

But wouldn't such a database screw over a lot of future patents? It would be easier to get compounds out there, so more people would likely submit them... thus crushing IP for that chemical space, and chemical space is getting crunched up enough. If the synthesis was too crappy to publish, or the compound was really that useless at the time, I don't think every random Joe should be able to put it out there with so little effort. I'm sure some people are thinking "who cares about IP?!", but if the next wonder drug is tossed into the organic waste because the company couldn't improve its portfolio with a patent application, that's a real blow to the already struggling drug discovery efforts.

24. Anonymous on June 28, 2010 4:47 AM writes...

Synthetic Pages/Chemspider is a good initiative as previously mentioned.

http://cssp.chemspider.com/

But let's not forget Organic Syntheses, where reproducibility isn't an issue as the review process depends on the reactions being repeated.

http://www.orgsynth.org/default.asp

25. Taitken on June 28, 2010 6:19 AM writes...

I would think, as Ante mentions, that Chemspider and Chemspider synthetic pages (Chemspider.com), hosted by the Royal Society of Chemistry, would be the perfect places to put this data. You can upload structures, reactions, procedures, spectral data, and it's 'peer-reviewed' via crowd sourcing, like Wikipedia.

Very powerful, structure and text searchable too.

26. Dick Wife on June 28, 2010 7:42 AM writes...

Not quite worthless? Lost chemistry is, we believe, very valuable, if carefully selected before entering a database. Here is a plea to all academics: collaborate with SORD and let the whole world share your chemistry! We would like to see all the synthetic organic chemistry from 1965 to today, so just get in contact!

27. p on June 28, 2010 8:11 AM writes...

I think the problem isn't with the literature itself. The problem is that it isn't what some (perhaps a lot of) people want. There is no reason at all that anyone should read every paper published, or that any paper should be of vital importance to most people. Most papers are good, solid work that contributes, perhaps not greatly, to the area in which it's published. And a few of those will become increasingly important with time, or when someone not yet born reads one and is inspired in some way.

In other words, we have billions of pennies with a few nickels and dimes, fewer quarters and the very occasional silver dollar. Individually, the pennies (and even nickels) aren't very valuable but combined they are immensely valuable. The problem is we have a bureaucratic set up and cultural mindset that wants to value each paper as if it's a 10K Barrow Bond.

There is simply no way to count up how many papers someone has published and learn anything about them or their work. You have to read the paper, follow up some references and be acquainted with the field (not to mention probably waiting a good long while to see the paper's effect on the field). But that is hard, and the folks making decisions about hiring, promotion, awards, etc. don't want to have to think or make tough decisions. They want to fill in a spreadsheet, count papers, sum impact factors and announce a "winner" as if it were a sporting event.

If we stop acting as if publishing is a sport and value the papers for what they are - and what they are not - all will be fine. With the publishing aspect, that is. The business aspect will still suck.

28. LeeH on June 28, 2010 9:17 AM writes...

It would be quite easy to host a simple reaction database (in SMIRKS format, for example) that holds reactions that have been published. This would serve as a mineable source of reaction knowledge, as well as a simple index that would point back to the paper itself. You wouldn't have to put much else in the database except for the pointer back to the paper.

It would also be a place to put failed reactions, which I believe would be a huge advantage. Imagine how many times reactions are duplicated. It would also serve as a mechanism to document how robust reactions are.
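
A minimal Python sketch of the kind of record described above, under stated assumptions: each hypothetical `ReactionReport` keeps the reaction as a SMIRKS or reaction-SMILES string ("reactants>>products"), a pointer back to its source, and a success flag, so failed attempts can be logged alongside successes. The `robustness` helper simply counts successes against attempts for each reaction string. The names are illustrative, not any existing database's schema, and a raw failure count is only a rough signal, since a failed run can also reflect user error.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class ReactionReport:
    reaction: str                      # SMIRKS / reaction SMILES, e.g. "reactants>>products"
    reference: str                     # pointer back to the paper, patent or notebook page
    succeeded: bool                    # False records a failed attempt
    yield_pct: Optional[float] = None  # isolated yield, if reported


def robustness(reports: List[ReactionReport]) -> Dict[str, Tuple[int, int]]:
    """Map each reaction string to (successful attempts, total attempts)."""
    tally: Dict[str, List[int]] = defaultdict(lambda: [0, 0])
    for report in reports:
        counts = tally[report.reaction]
        counts[1] += 1
        if report.succeeded:
            counts[0] += 1
    return {reaction: (ok, total) for reaction, (ok, total) in tally.items()}


def references_for(reports: List[ReactionReport], reaction: str) -> List[str]:
    """Every source that reports this exact reaction string, successful or not."""
    return [r.reference for r in reports if r.reaction == reaction]
```

Matching here is plain string equality, so the same reaction written two different ways would be counted separately; a real system would first canonicalize the reaction strings with a cheminformatics toolkit such as RDKit.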

29. Canuck Chemist on June 28, 2010 9:24 AM writes...

@LeeH: The database you describe is in place: It's called SciFinder. There's also Beilstein. The problem with any publication of "failed" results is that you can't really confirm a negative result, i.e. it could be a failed reaction for any number of reasons pertaining to user error.

30. Cloud on June 28, 2010 12:21 PM writes...

@Sean- when I submitted a structure to the PDB, I was required to provide a reference.

Maybe they have changed that to better handle data coming out of structural genomics efforts.

Or maybe it is an undocumented requirement that you don't discover until you hit that part of the submission form.

31. Anon anon anon on June 29, 2010 11:12 AM writes...

The suggestions here sound a lot like the Useful Chem project: http://usefulchem.wikispaces.com/
