There’s an analysis in the latest Nature that puts some numbers on a problem that scientists the world over have suspected for some time: the number of duplicate papers that show up in the literature. The authors used this online text-similarity tool to go through papers in PubMed, and found a small (but not as small as it should be) percentage of papers that seem to be the same damn things, recycled.
As it turns out, the “most similar papers” function over on the right-hand side of the PubMed results was a good starting point for tracking these down, and this shortcut allowed them to search the entire PubMed database. The authors have set up a web site where they've deposited their data and their lists of duplicate papers. Out of about 7 million abstracts, some 70,000 (roughly 1%) were flagged as being highly similar to their corresponding "most related article" on Medline. Manual checking suggests that about 50,000 of these are going to be true duplicates - they've gone through about 2,700 by hand so far (statistics here).
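For anyone curious what "flagged as being highly similar" might look like in practice, here is a minimal sketch. To be clear, this is not the authors' tool or their algorithm: it swaps in plain Jaccard similarity over three-word shingles as a stand-in, and the function names, the 0.5 threshold, and the toy abstracts are all my own assumptions for illustration.

```python
import re

def shingles(text, k=3):
    """Return the set of k-word shingles (overlapping word runs) in text."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    if len(words) < k:
        # Too short for a full shingle: treat the whole text as one unit.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}

def jaccard(text_a, text_b, k=3):
    """Jaccard similarity of two texts' shingle sets (0.0 to 1.0)."""
    sa, sb = shingles(text_a, k), shingles(text_b, k)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)

def flag_pairs(pairs, threshold=0.5):
    """Keep (id_a, id_b, score) for pairs scoring at or above the threshold.

    `pairs` holds (id_a, text_a, id_b, text_b) tuples, e.g. an abstract and
    its "most related article". The threshold is an arbitrary assumption.
    """
    flagged = []
    for id_a, text_a, id_b, text_b in pairs:
        score = jaccard(text_a, text_b)
        if score >= threshold:
            flagged.append((id_a, id_b, round(score, 3)))
    return flagged

if __name__ == "__main__":
    # Made-up abstracts, purely for illustration.
    toy_pairs = [
        ("p1", "We report a novel kinase inhibitor with oral activity",
         "p2", "We report a novel kinase inhibitor with oral activity"),
        ("p3", "Crystal structure of a bacterial transporter",
         "p4", "Survey of hospital readmission rates in elderly patients"),
    ]
    print(flag_pairs(toy_pairs))
```

A score like this only nominates candidates. As the numbers above suggest (about 50,000 expected true duplicates out of 70,000 flagged), a fair fraction of highly similar pairs turn out not to be duplicates, which is why the manual checking matters.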
They have drawn some preliminary conclusions from their data set. For one thing, duplication seems to have been steady or trending down in the database during the 1990s, but has been increasing since 2000 (and is currently at its highest level). Their explanation - the rising number of print and online journals, making copying easier to perform and harder to detect - seems right to me. Another interesting graph is the frequency of duplicates by country of origin, versus that country's relative contribution to the Medline database as a whole. Looked at that way, the US is under-represented in the duplicates (which is good to know), and Japan and China are quite over-represented. Several explanations for this are considered – original publication in a language less used for scientific publication, followed by a chance to expose the same work to a wider audience, for one. But the authors don't hesitate to cite "differences in ethics training and cultural norms" as a factor, too.
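The country comparison boils down to a ratio: a country's share of the duplicates divided by its share of the database. Here is a rough sketch of that arithmetic. The function name is my own, and the counts for countries "A" and "B" are invented placeholders, not figures from the paper.

```python
def representation_ratio(dup_counts, db_counts):
    """Share of duplicates divided by share of the database, per country.

    A ratio above 1 means over-represented among duplicates; below 1 means
    under-represented. Both arguments map country -> paper count.
    """
    total_dup = sum(dup_counts.values())
    total_db = sum(db_counts.values())
    if total_dup == 0 or total_db == 0:
        raise ValueError("need non-empty duplicate and database counts")
    ratios = {}
    for country, db_n in db_counts.items():
        if db_n == 0:
            continue  # no database presence, so the ratio is undefined
        dup_share = dup_counts.get(country, 0) / total_dup
        db_share = db_n / total_db
        ratios[country] = dup_share / db_share
    return ratios

if __name__ == "__main__":
    # Placeholder numbers only: "A" and "B" are hypothetical countries.
    duplicates = {"A": 10, "B": 30}
    database = {"A": 500, "B": 500}
    print(representation_ratio(duplicates, database))
```

With these placeholder counts, A and B contribute equally to the database, but B supplies three quarters of the duplicates, so B comes out over-represented and A under-represented.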
A further fascinating detail is that the papers which seem to have been duplicated in different journals by the same author (or authors) very often appear too soon after the first publication to have gone through the full review process in sequence. In other words, they were most likely submitted simultaneously to both journals, which isn't a nice thing to do. By contrast, when the same stuff appears under someone else's name, there's generally an appropriate time lag.
The authors note that their manual inspections have, so far, found over seventy cases of what looks like outright plagiarism, and that they're starting to contact journal editors and universities for more details. They also seem to have found a number of what they term "serial offenders", and are investigating those cases as well. They don't go into details, but my guess is that some of those people could possibly be found here.
Their hope is that if authors realize that such tools exist, plagiarism and duplication will be seen as more risky. Thus all the publicity. Want to try it out yourself? The list of potential duplicates can be found here. Here's the list of journals, and you can plug those into this search page and see what you come up with. Here are some of the manually checked papers - click on the left-hand side ID number to see a side-by-side comparison.