About this Author
Derek Lowe, an Arkansan by birth, got his BA from Hendrix College and his PhD in organic chemistry from Duke before spending time in Germany on a Humboldt Fellowship for his post-doc. He's worked for several major pharmaceutical companies since 1989 on drug discovery projects against schizophrenia, Alzheimer's, diabetes, osteoporosis and other diseases. To contact Derek, email him directly. Twitter: Dereklowe


In the Pipeline


January 30, 2008

Recycle, Reuse, Republish


Posted by Derek

There’s an analysis in the latest Nature that puts some numbers on a problem that scientists the world over have suspected for some time: the number of duplicate papers that show up in the literature. The authors used this online text-similarity tool to go through papers in PubMed, and found a small (but not as small as it should be) percentage of papers that seem to be the same damn things, recycled.

As it turns out, the “most similar papers” function over on the right-hand side of the PubMed results was a good starting point for tracking these down, and this shortcut allowed them to search the entire PubMed database. The authors have set up a web site where they've deposited their data and their lists of duplicate papers. Out of about 7 million abstracts, some 70,000 (about 1%) were flagged as being highly similar to their corresponding "most related article" on Medline. Manual checking suggests that about 50,000 of these are going to be true duplicates - they've gone through about 2700 by hand so far (statistics here).
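For a rough feel of how abstract-level flagging can work, here is a minimal Python sketch. It is not the tool the authors used, and it makes no claim about how that tool scores text. It swaps in a simple, well-known stand-in (Jaccard overlap of three-word shingles), and the function names `shingles`, `jaccard`, and `flag_pairs`, as well as the 0.5 threshold, are assumptions chosen purely for illustration.

```python
import re


def shingles(text, k=3):
    """Return the set of k-word shingles (word tuples) in text, lowercased."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    if len(words) < k:
        # Too short for a full shingle: treat the whole text as one unit.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b, k=3):
    """Jaccard similarity of the shingle sets of two texts (0.0 to 1.0)."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa and not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def flag_pairs(pairs, threshold=0.5):
    """Keep the (abstract, candidate) pairs scoring at or above threshold.

    The threshold is a hypothetical cutoff, not a value from the study.
    """
    return [(x, y) for x, y in pairs if jaccard(x, y) >= threshold]
```

Anything flagged this way is only a candidate. As the numbers above suggest, a fair share of flagged pairs turn out not to be true duplicates once someone reads them side by side.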

They have drawn some preliminary conclusions from their data set. For one thing, duplication seems to have been steady or trending down in the database during the 1990s, but has been increasing since 2000 (and is currently at the highest level). Their explanation - the rising number of print and online journals, making copying easier to perform and harder to detect - seems right to me. Another interesting graph is the frequency of duplicates by country of origin, versus that country's relative contribution to the Medline database as a whole. Looked at that way, the US is under-represented in the duplicates (which is good to know), and Japan and China are quite over-represented. Several explanations for this are considered – original publication in a language less used for scientific publication, followed by a chance to expose the same work to a wider audience, for one. But the authors don't hesitate to cite "differences in ethics training and cultural norms" as a factor, too.

A further fascinating detail is that the papers which seem to have been duplicated in different journals by the same author (or authors) very often appear too soon after the first publication to have gone through the review process in sequence. In other words, they were most likely submitted simultaneously to both journals, which isn't a nice thing to do. By contrast, when the same stuff appears under someone else's name, there's generally an appropriate time lag.

The authors note that their manual inspections have, so far, found over seventy cases of what looks like outright plagiarism, and that they're starting to contact journal editors and universities for more details. And they also seem to have found a number of what they term "serial offenders", and are investigating those cases as well. They don't go into details, but my guess is that some of those people could possibly be found here.

Their hope is that if such authors realize that these tools exist, plagiarism and duplication will be seen as more risky. Thus all the publicity. Want to try it out yourself? The list of potential duplicates can be found here. Here's the list of journals, and you can plug those into this search page and see what you come up with. Here are some of the manually checked papers - click on the left-hand side ID number to see a side-by-side comparison.

Comments (16) + TrackBacks (0) | Category: The Dark Side | The Scientific Literature


1. Greg Hlatky on January 30, 2008 12:50 PM writes...

Perhaps there's room for a new series of journals: Journal of Plagiarized [fill in discipline] Research.

Some years ago, I gave a paper at an ACS meeting, for which a preprint was required. I wanted the results published in a journal where it would get more visibility. When I later submitted substantially the same paper to that journal, I explained the problem to the editor and included a copy of the preprint. The editor cleared it for review and publication.


2. A-non-y-mous on January 30, 2008 12:52 PM writes...

Wow, some of these serial offenders are brave . . . and lazy, they don't even re-word the abstract. I know I shouldn't be shocked, but I am. I can't not compare the articles. Thanks for the links.


3. Gerald Bothe on January 30, 2008 1:07 PM writes...

My apologies for an off-topic question - I need a source for 4-Epidoxycycline and as a mere biologist I don't know how to find one. Could anybody help me? Can answer directly to




4. reevej on January 30, 2008 4:48 PM writes...

this is a test.


5. azmanam on January 30, 2008 7:23 PM writes...

Haven't had the time to go through very rigorously.

Do you know if their 'similarity tool' includes total synthesis communications later published as full papers?

The intros might look similar, as well as some of the language for navigating through reaction schemes.


6. Rhenium on January 30, 2008 8:26 PM writes...

Wow... JACS and a slew of other high profile journals. I'm surprised no one has commented on this yet. Now it's out in public forever.

Still, eTBLAST will be a handy tool for when I review journal articles in future.


7. macabre on January 30, 2008 10:27 PM writes...

Mulzer's recent Pasteurestin A tot. syn.

Basically the same synthesis as Vollhardt published 15 years ago.

Not sure which is worse: doubling up your own work if papers are a bit slow that year, or blatantly copying already published work.

All in all, very depressing.


8. Anonymous BMS Researcher on January 30, 2008 11:09 PM writes...

Somebody close to me was once a journal editor; on more than one occasion referees called her attention to likely plagiarism -- needless to say these manuscripts did NOT get published! I wonder how many cases of plagiarism get detected before publication versus the number that slip through without detection and get published?


9. Anne on January 31, 2008 12:40 AM writes...

I hope I don't sound like a hopeless Neanderthal here, but what *are* the ethics of republishing a paper? Let's take, for example, a paper that goes in a conference proceedings in an abbreviated form and is then fleshed out and submitted to a normal journal. Okay or not? What about vice versa? Or a monster paper packed with technical details submitted to one journal, accepted and published, and then trimmed down to reveal the central facts and submitted for publication in a flashier journal? Or how about a thesis that generates a series of papers based on its chapters (tidied up to suit the audience)? Cribbing wholesale text used for one proposal (requesting telescope time) to go into another (both by the same author of course)? Cribbing "motivation" text from one paper to go into another paper studying the same phenomenon?

I'm genuinely unsure what the ethical rules are for, well, all those cases above; there are plenty of others that are plainly unethical (cribbing text from someone else's paper) or where they're clearly borderline (rewriting a paper so that it counters arguments made by another paper in press without citing the other paper). But it seems like a certain amount of textual similarity is inevitable. If your field of research is the mysterious 511 keV emission from the galactic centre, all your papers should have an explanation of what the observed excess is and why it's mysterious; is there any reason those introductions shouldn't be quite similar?


10. RKN on January 31, 2008 7:02 AM writes...

Pretty interesting. I wonder if reviewers now blast the abstract/intro of submitted papers to check for this sort of thing before accepting them for publication.


11. Ken Knott on January 31, 2008 1:59 PM writes...

I'm surprised by the lack of comments on this. To me this is fascinating... And some of the serial offenders are truly ridiculous with the sheer amount of plagiarism. I'd love to be a grad student in their group and call those professors asking for explanations... I would be very interested to hear the responses of the offenders and their universities, not to mention the journals....



12. RKN on January 31, 2008 8:04 PM writes...

Pretty interesting. I wonder if reviewers now blast the abstract/intro of submitted papers to check for this sort of thing before accepting them for publication.


13. Bunsen Honeydew on February 1, 2008 9:02 AM writes...

As troubling as a lot of this is, there is one part that I don't really have a problem with and that's duplicate publication in different languages. If someone wants to publish a paper in Chinese, Japanese, or Korean, and then publish it in English, I'm not sure I have a problem with that. Am I ever going to see that non-English paper? Aside from SciFinder, no. Could I ever read the non-English paper? No. Do I want to go through the hassle of getting that non-English paper and getting it translated? No. Am I happy that that paper appeared in English? Yes.

Now, all that being said (typed? written?), I can read papers in French and German without too much trouble. But if it's in a non-Latin text, forget it.

I also don't want to see national journals disappear. I believe that there is still a place for journals like Helvetica Chimica Acta, Australian Journal of Chemistry, and others- especially if the majority of articles are not in English. The authors are trying to reach two groups of people- the broader chemical community and their national community. Sometimes, in order to reach both groups you need to publish in two places and the groups are largely mutually exclusive.


14. Jose on February 1, 2008 12:56 PM writes...

"Re-issue ! Re-package ! Re-package !
Re-evaluate the songs
Double-pack with a photograph
Extra Track (and a tacky badge)..."


15. Charlotte on February 1, 2008 3:33 PM writes...

Derek, thanks for linking this - I'm a journal editor, and I've spent the last day or so going through the 60 hits in my journal - most of which I'm pretty comfortable with, as they're clinical guidelines and similar, but there's certainly a couple I'm expressly unhappy with.

I adore reviewers who are on the lookout for, and call attention to, plagiarism and other examples of publishing dishonesty. They are treasured individuals. We've caught poor ethics at every stage, but a sharp-eyed reviewer is far more effective than I am when I triage a paper, or when my copyeditor's ploughing through it. We're looking into doing more (and all suggestions are welcome!), but it'll be a happy day in the editorial office when ManuscriptCentral and EES have an inbuilt text similarity scanner.


16. anon on February 18, 2008 2:33 PM writes...

Fascinating article and it is fun to poke around in the results. The one search I did for my old advisor yielded a pair of almost identically worded abstracts but two articles with substantially different results (to my mind, at least).


