Derek Lowe, an Arkansan by birth, got his BA from Hendrix College and his PhD in organic chemistry from Duke before spending time in Germany on a Humboldt Fellowship for his post-doc. He's worked for several major pharmaceutical companies since 1989 on drug discovery projects against schizophrenia, Alzheimer's, diabetes, osteoporosis and other diseases.
Bruce Booth has a look at some rules suggested by Glenn Begley of Amgen, who's been involved in trying to reproduce published data. He's had enough bad experiences in that line (and he's not alone) that he's advocating these standards for evaluating something new and exciting:
1) Were studies blinded?
2) Were all results shown?
3) Were experiments repeated?
4) Were positive and negative controls shown?
5) Were reagents validated?
6) Were the statistical tests appropriate?
Applying these tests would surely scythe down an awful lot of the literature - but a lot of the stuff that would be weeded out would deserve it. I really wonder, for example, how many n=1 experiments make it into print; I'm sure it's far more than anyone would be comfortable with if we knew the truth. As I've mentioned here before, different fields have different comfort levels with what needs to be done to assure reproducibility, but I think that everyone would agree that complex biology experiments need all the backing up that they can get. The systems are just too complex, and there are too many places where things can go silently wrong.
That "Were all results shown" test is a tough one, too. Imagine a synthetic paper where each reaction has a note next to it, like "4/5", to show the number of times the reaction worked out of the total number of times it was tried. There would be a lot of "2/2", which would be fine, and (in total synthesis papers) some "1/1" stuff in the later stages, which readers could take or leave. But wouldn't it be instructive to see the "1/14"s in print? We never will, though. . .
1. Anonymous on October 1, 2012 9:24 AM writes...
What about virtual or high throughput screening results where people don't report every compound tested? 1/500,000.
2. luysii on October 1, 2012 9:53 AM writes...
What the 6 points of reproducibility are really striving for is believability. In the medical literature you have to go beyond the summary to look at the data to see if this justifies the conclusion, and even if the conclusion is supported by the data (Begley's rule #6), whether the conclusion was cherry picked to trumpet a less than spectacular result.
For a horrible example of this sort of thing -- from Johns Hopkins no less please see http://luysii.wordpress.com/2009/10/05/low-socioeconomic-status-in-the-first-5-years-of-life-doubles-your-chance-of-coronary-artery-disease-at-50-even-if-you-became-a-doc-or-why-i-hated-reading-the-medical-literature-when-i-had-to/
3. petros on October 1, 2012 9:53 AM writes...
What about all the unreproducible yields reported in papers by an eminent academic?
4. RB Woodweird on October 1, 2012 9:55 AM writes...
Ahh... why did my copy of Tetrahedron Letters just burst into flame?
5. Anonymous on October 1, 2012 10:18 AM writes...
Yeah, or how about an "average % yield" instead of putting the highest % yield obtained from like 20 runs of the same reaction!
6. RM on October 1, 2012 10:45 AM writes...
The problem with specifying "fraction successful" is how you define the denominator. Do you include all attempts at the reaction? If you try 30 different Lewis acids, and find one that works which repeats perfectly nine additional times, is the success rate 10/39 or 10/10?
If it's the latter, I'm guessing you'll find a number of people who will "fudge" things by selective interpretation of conditions. "The in the unsuccessful trial you used 0.998 equivalents of reagent B, and in the successful ones you used 1.002 and 1.001 equivalents? We'll slap an "at least" into the methods section and that'll give us a perfect 2/2 versus a slightly suspect 2/3."
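The denominator question can be made concrete with a small sketch. It assumes, as in the Lewis acid example, that exactly one screened condition worked and that every repeat of it succeeded; the function and parameter names are hypothetical and only there for illustration.

```python
def success_fractions(conditions_screened, repeats_of_winner):
    """Return two (successes, attempts) pairs for the same piece of work.

    Assumption: exactly one of the screened conditions worked, and every
    repeat run under that winning condition also worked.
    """
    successes = 1 + repeats_of_winner
    # Denominator counts every attempt, including the failed screen.
    all_attempts = (successes, conditions_screened + repeats_of_winner)
    # Denominator counts only runs under the final, reported conditions.
    winner_only = (successes, successes)
    return all_attempts, winner_only


all_attempts, winner_only = success_fractions(conditions_screened=30,
                                              repeats_of_winner=9)
print("%d/%d" % all_attempts)  # 10/39
print("%d/%d" % winner_only)   # 10/10
```

Both numbers describe the same bench work; which one gets printed depends entirely on where the author decides "the reaction" starts.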
7. Chemjobber on October 1, 2012 10:57 AM writes...
Greg Fu's papers have an admirable habit of having "86% yield (average), n=3".
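As a rough illustration of that reporting style, here is a minimal sketch that turns a list of run yields into an "average, n" summary next to the best single run. The three yields are made up for the example and are not taken from any paper; `summarize_yields` is a hypothetical helper.

```python
from statistics import mean, stdev


def summarize_yields(yields_percent):
    """Summarize repeat runs of one reaction (yields given in percent)."""
    n = len(yields_percent)
    return {
        "n": n,
        "mean": mean(yields_percent),
        "best": max(yields_percent),
        # Sample standard deviation is undefined for a single run.
        "stdev": stdev(yields_percent) if n > 1 else None,
    }


runs = [84.0, 86.0, 88.0]  # hypothetical yields from three runs
summary = summarize_yields(runs)
print(f"{summary['mean']:.0f}% yield (average), n={summary['n']}; "
      f"best run {summary['best']:.0f}%")
```

Printing the best run alongside the average makes the gap between "highest yield obtained" and "typical yield" visible at a glance.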
8. Anonymous on October 1, 2012 11:34 AM writes...
It'd be interesting to have a league table of average yields for the 'big hitters'. It would prove nothing of course, but I'd still like to see it.
At the end of a lab book last year, I went back and worked out the average yield for all reactions where I'd obtained some of the desired product: 52%.
I guess I'm useless, but then that was custom synthesis where there was rarely time for optimisation.
9. Pete on October 1, 2012 11:45 AM writes...
Although it's not quite the same thing, one might want to take a really close look at some of the data 'analysis' that is presented to give weight to opinions and beliefs in the area of drug-likeness. Then again you might not, because you're probably not going to like what you find.
10. @Derick on October 1, 2012 3:25 PM writes...
Do you mean Six RULES of Reproducibility?
11. processchemict on October 1, 2012 3:45 PM writes...
just a recent real world story:
Someone is asking for a scale-up (3 kg of product) of a synthesis carried out on a 500 mg scale, with yields reported from 15 to 80%... and they're thinking about submitting an INDA application.
You destroy research centers, flush the experienced people down the toilet, cut the budgets to a hilarious size, you have some results with 100 mg of product and then... you throw the dice saying: "we have a clinical candidate!"... wow! What a giant step in the progress of science (or is it better to say "a little step for science, a great step for the bonuses of the managers involved"?)
12. Falcon on October 1, 2012 4:34 PM writes...
Crikey - if every reaction had to have a success rate quoted next to it, inorganic journals would go out of business!
Some of those reactions are bordering on the anecdotal.
13. Algirdas on October 1, 2012 5:39 PM writes...
There is something else academic researchers can do about reproducibility. With the advent of cheap android-based tablet PCs (with built-in HD cameras!), cheap cloud storage, and ubiquitous network connectivity for scientific equipment, lab notebooks in their entirety can be electronic. And they can be published *in their entirety* as a supporting info for the manuscript. A few gigs of data for every published manuscript is nothing these days. I do understand that there are some projects, involving lots of imaging, etc., that can generate hundreds of GB- or TB-sized stacks of data. Some computational work as well. I don't know what is the best way of dealing with these cases, but for majority of cell biology, biochemistry, chemistry, etc, a few GB should be enough. For instance, all my published papers generated =
I know there are some labs that are switching to purely electronic note keeping. But not to publishing everything in a full disclosure mode, AFAIK.
This is certainly not a panacea. A dedicated faker will fake original data - on a separate PC before uploading them as "lab data" into the e-notebook. For instance, I can simulate my NMR spectra with the appropriate signals, artefacts and noise and save them in varian format as if they came from the spectrometer. Ditto for gas chromatograms, if I really need to make those yields 90% on average.
Then, supposing the entire notebook is published, you, the reader, will have to wade through pages (possibly hundreds or thousands) detailing irrelevant dead ends and what not, before you find the info that you were after. Dead ends made deader still by the poor note-keeping habits of an overworked postdoc, for instance. Proprietary data formats in equipment will be an obstacle too.
Finally, many will be reluctant to go for such full disclosure mode simply because raw data from project A may have plenty of clues to a hot interesting project B, thus not to be divulged to competitors.
Many of these objections and obstacles can be dealt with, at least partially:
Indexing and search capabilities will help to find that needle in the haystack of raw data.
Proprietary formats are a problem, but not insurmountable if you really need the data - I can go to the collaborator's lab nearby if I need to read a raw fluorescence file from a SpectraMax M5 plate reader; to process the data, many people export data from the proprietary format into some sort of delimited text - and that should be there in the e-notebook, together with the processing scripts!
Finally, as for not giving the competitors clues for the follow-up: well, the very fact of publishing is a disclosure. Certainly, competitors could be prevented from following on anything if instead of, you know, manuscripts, we could just publish press releases (well, Nature and Science are halfway there already). And from the standpoint of society as a whole, one PI's problem with competitors pursuing hot follow-on research is not a bug but a feature.
No technology can solve social problems on its own. Doing lousy science and publishing lousy papers will not magically disappear if we pile more files into the cloud storage. The social and professional pressures for cutting corners and doing your job poorly will not disappear. But technology can help to some degree. Overall I am inclined to think that benefits of such full disclosure scheme outweigh its drawbacks. Funding agencies should not limit their demands to making research performed with public funds available in the form of open access manuscripts. Keeping extensive data archives is easy. Ease of searching and interoperability need some attention, but full disclosure is eminently doable and in my not-so-humble-opinion is highly desirable.
14. Martin on October 1, 2012 5:41 PM writes...
I had always assumed that in most med chem papers, unless explicitly stated, chemical yields were always n=1. i.e. I got enough to test/move to next step.
15. Algirdas on October 1, 2012 5:44 PM writes...
The sentence with equals sign should read:
"For instance, all my published papers generated less than 10GB of junk on the hard drive per manuscript (including gels, chromatograms, spectra, etc)."
Damn you HTML!
16. Martin on October 1, 2012 5:51 PM writes...
#13: ELNs are at once a blessing and a curse. I have said it before and here I am saying it again. They only work, only work, where an IT infrastructure exists that can mandate at a company policy level:
"here is your work-issued PC/tablet. On it you will find Office and the ELN of brand X. No we don't use brand Y or Z. Don't ask. Our last 10 years data is in brand X's data on platform Q. You will submit all experiments in brand X's ELN on platform Q. Everything, no exceptions."
i.e. totally void in any company or academic institution that has a heterogeneous PC/Mac/Linux/iOS/Android environment where "cross-platform" ELNs just don't work cross-platform (especially chemical-aware ELNs). Web-based solutions are available but the cost for small groups is prohibitive.
17. anonymous on October 1, 2012 6:58 PM writes...
Sweet. Not that all of the rules are possible in methods development chemistry: blind studies? How do you even do that with catalysts, never mind not trusting the other hyper-competitive students in your group.
At least the following correction wouldn't be an issue. And the first author wouldn't be a professor now as well, since it would never be highlighted in CEN or maybe not even published in JACS.
http://pubs.acs.org/doi/abs/10.1021/ja3066094
18. Beentheredonethat on October 1, 2012 7:50 PM writes...
#14: right on Bro. I've published dozens of SAR papers with many yields in the 15-20% range with an n=1. I put my thumbprint on the structure and am extremely confident the structure is as advertised. Move on and try to get the drug.
19. Flyovah on October 1, 2012 10:28 PM writes...
Dear Elsevier, if you are looking for new ways to add value, may I suggest the "negative citation", like a downvote on Reddit, which authors of manuscripts that cite an unreproducible work can use to indicate the nature of their citation (of course given the social nature of science this wouldn't have to be public, just somehow appended web of science or scopus style to the offending manuscript). As it stands citations accumulated by a "controversial" paper go straight into the h-index, which doesn't exactly provide a disincentive to publish stuff that's wrong (if provocative). It would also provide pharma an opportunity to take academics to the woodshed, which we often richly deserve.....
20. Student on October 1, 2012 11:07 PM writes...
Not that it matters much (since everyone reading here is aware of the issue), but there have been a few higher up academics probing their institutions to see if trainees and the like feel they are pressured into certain results.
21. Jose on October 2, 2012 3:43 AM writes...
re: "negative citation"
http://chem.chem.rochester.edu/nvdcgi/mojo.cgi
And re fraud, check out Benford's Law. I can guarantee the results from certain labs would fail.
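For anyone who wants to try the Benford's Law idea, a minimal sketch follows. Benford's Law says the leading digit d of many naturally occurring numbers appears with frequency log10(1 + 1/d). The check is only meaningful for data spanning several orders of magnitude, so bounded quantities such as percent yields are not expected to follow it. The helper names are hypothetical, and powers of two stand in for real data purely as a demonstration.

```python
import math
from collections import Counter


def benford_expected():
    """Expected leading-digit frequencies under Benford's Law."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def leading_digit(x):
    """First significant digit of a nonzero number."""
    # Scientific notation puts the first significant digit first.
    return int(f"{abs(x):.15e}"[0])


def first_digit_frequencies(values):
    """Observed leading-digit frequencies, ignoring zeros."""
    digits = [leading_digit(v) for v in values if v != 0]
    counts = Counter(digits)
    return {d: counts.get(d, 0) / len(digits) for d in range(1, 10)}


# Demonstration data only: powers of two are a classic Benford-like set.
sample = [2 ** k for k in range(100)]
observed = first_digit_frequencies(sample)
expected = benford_expected()
for d in range(1, 10):
    print(f"{d}: observed {observed[d]:.3f}, expected {expected[d]:.3f}")
```

A large gap between the two columns is a prompt for a closer look, not proof of fraud; small or narrow-range data sets can deviate for innocent reasons.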
22. Canageek on October 2, 2012 3:01 PM writes...
I've always wondered why they don't just ask for a small sample of the product before publishing. Oh, you made 1,5-dimethyl-ohgodthatexistsicene? We need enough to run an MS or NMR or such on it. Would it really be that hard? Natural products would have different C14 ratios than those extracted from biological sources, right? How hard would it be to test this?
23. alf on October 2, 2012 4:57 PM writes...
Lack of reproducible work is a bigger issue for studies with complex reagents and animal models. I believe selection bias is more common than people think. What worries me the most is how easy it is for people to throw out data because "experiment didn't work".
24. sepisp on October 3, 2012 3:33 AM writes...
With the current trends, the scientific community is heading straight for the cliff at full speed, with no signs of hope. The essence of science is that the researchers themselves are truthful, critical and skeptical. I can imagine everyone has had the moment when a supervisor/professor replies to a report with "This result is wrong, please correct it". The right thing to do, scientifically, is to dissect the experiment until you find the reason why the result was as it actually was, and then reply to the supervisor with a good analysis. But the fact is that, like the rest of the economy, science is being sinified. (I recently visited an American research group, which consisted of about 70% Chinese.) The problem with Asian cultures is that they place extreme weight on obedience and "saving face", even at the expense of correctness. This is disastrous in combination with the same pressures everyone else already has. If this problem goes unchecked, the scientific literature will be flooded - and eventually choked - with unrepeatable or irrelevant results. There are already respected professors who get 99% total yield from a 10-step total synthesis with 99% yield at each step (check the math). The problem isn't that this happens, it's that *nothing is being done about it*; even worse, there are no signs of improvement.
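The "check the math" remark about the 10-step synthesis can be worked through directly: the overall yield of a linear sequence is the product of the step yields. The sketch below assumes a strictly linear route with no convergent steps; `overall_yield` is a hypothetical helper.

```python
def overall_yield(step_yields):
    """Overall yield of a linear sequence, as a fraction between 0 and 1."""
    total = 1.0
    for y in step_yields:
        total *= y
    return total


ten_steps = overall_yield([0.99] * 10)
print(f"{ten_steps:.1%}")  # 90.4%

# Per-step yield that would actually be needed for 99% over ten steps.
needed = 0.99 ** (1 / 10)
print(f"{needed:.2%}")
```

So ten steps at 99% each give roughly 90% overall, and a genuine 99% overall would need every step to run at about 99.9%.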
25. James on October 3, 2012 8:17 AM writes...
And this is just the chemistry literature. One shudders to think about how reproducible the results in medical and psychological journals are.
26. Dave on October 3, 2012 11:40 AM writes...
I suppose all of us being the sophisticates that we are, Rule #0 need not be mentioned?
0. What were the financial and career pressures for the author(s) in publishing the report?
27. Matt on October 5, 2012 12:07 PM writes...
On "negative citations": I was involved in some research in information retrieval in which we were using computers to analyse citations in biomedical papers to classify them into different categories according to the function the citations seemed to have: results the present paper is reproducing; vaguely related work cited to show background; work we disagree with; and so on. One obstacle was that the "negative" categories were almost entirely absent in the papers we analysed. Researchers, at least in the fields we were looking at, just don't cite each other in that way, or at the very least, they do it in such veiled language that it's hard for even skilled humans, let alone our machine-learning systems, to detect. Maybe in some other academic disciplines (I'm thinking ones like literary analysis where it might be easier to find opposing "camps" of opinion) it might be more common. But I fear that such a feature implemented by a publisher like Elsevier would simply go unused, as authors would be collectively unwilling to go on record with such comments about each other's work.