I spoke here about Scigen, the program that'll concoct a load of total nonsense for you and make it look - from a distance - like a journal paper. It's a surprisingly valuable tool, since the scientific publishing world apparently has a bigger demand for total nonsense than you might think, especially after the checks clear.
The latest example of this comes from The Scholarly Kitchen, where Philip Davis generated "Deconstructing Access Points", a paper that's nothing but a string of gibberish and non sequiturs from first to last. It's here (in PDF form) if you want to try reading it. You won't get far; no human could.
Ah, but what if no human bothered to? That's what happened when Davis submitted this compost pile to the Open Information Science Journal, which is one of the new Bentham "open access" journals. You see, Bentham (like some other publishing houses) has heard that this open access stuff is like, the new trend, so they've started a line of their own journals. Once your paper's accepted, anyone can access it. Of course, there is a fee up front - to be fair, there pretty much has to be, if someone is actually going to do the back-end reviewing and editing work of a real journal. But what if you don't do any of that, and just charge the fee anyway?
Yes, the paper was accepted - of course it was accepted. It was accepted despite it being an unreadable mass of pseudo-English, and despite the fact that it was sent in under the banner of the Center for Research in Applied Phrenology. (Nice touch!) Here's the acceptance letter from an assistant manager at Bentham. All Davis had to do was send $800 to a tax-free zone in the United Arab Emirates and this manuscript would be inflicted on the world.
He pulled back at this juncture, but the point had been made. As he puts it, in milder tones than I would have: ". . .it does raise the question of whether, at least in some cases, the producer-pays-to-publish model may unduly influence editorial decision-making." Indeed it does, especially with a lower-tier publisher. Too much of the scholarly publishing world is involved in this sort of thing (and too much of the conference-organizing world, too, for that matter). I know that it's hard for many people to realize this, but it really is better not to publish at all than to abet this sort of thing.
1. Lucifer on June 12, 2009 7:53 AM writes...
Lilly Sold Drug for Dementia Knowing It Didn’t Help, Files Show
http://www.bloomberg.com/apps/news?pid=20601109&sid=aTLcF3zT1Pdo
By Margaret Cronin Fisk, Elizabeth Lopatto and Jef Feeley
June 12 (Bloomberg) -- Eli Lilly & Co. urged doctors to prescribe Zyprexa for elderly patients with dementia, an unapproved use for the antipsychotic, even though the drugmaker had evidence the medicine didn’t work for such patients, according to unsealed internal company documents.
In 1999, four years after Lilly sent study results to the U.S. Food and Drug Administration showing Zyprexa didn’t alleviate dementia symptoms in older patients, it began marketing the drug to those very people, according to documents unsealed in insurer suits against the company for overpayment.
2. Indy on June 12, 2009 8:23 AM writes...
Well... They caught up fast on this since the PDF is not available anymore. :p
3. daen on June 12, 2009 9:06 AM writes...
Wonderful. I especially liked the references in Davis' paper - I would certainly read anything co-authored by the long-dead Alan Turing and the not-quite-so-long-dead Timothy Leary ...
4. John Spevacek on June 12, 2009 9:13 AM writes...
This was rather discouraging for me as I am a big fan of open access, but given the incentive (i.e., follow the money) it's going to be tough to avoid this problem recurring anywhere down the road.
5. Satya on June 12, 2009 9:34 AM writes...
Goodness, was that paper generated by a Markov chain generator?
(I pulled that term from my hat, but "Markov chain" is relevant.)
6. Anne on June 12, 2009 10:36 AM writes...
Satya: I don't think so. (Yes, the term has meaning; basically a generator with internal states that transitions randomly between them with probabilities given in a matrix.) The authors of scigen work on context-free grammars, and as I understand it scigen started as basically a joke application of what they were doing: they supplied one of their tools, which generates strings in a context-free language, to a context-free language they dreamed up for research papers.
Come to think of it, I'm not sure how you do that; it may well be some kind of Markov chain generator, though generalized to handle the potentially infinite number of states needed to handle a context-free language.
In any case, scigen was initially a lark. That its results are occasionally accepted... says something alarming about the process of science.
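For anyone who wants to see the "internal states plus a matrix of probabilities" idea from this comment in concrete form, here is a minimal sketch in Python. The three states and the numbers in the matrix are invented purely for illustration; nothing here is taken from scigen, which (as noted above) works from a context-free grammar instead.

```python
import random

# Hypothetical three-state chain; states and probabilities are made up
# for illustration and have nothing to do with scigen itself.
STATES = ["noun", "verb", "adjective"]

# TRANSITIONS[i][j] is the probability of moving from STATES[i] to
# STATES[j]. Each row sums to 1.
TRANSITIONS = [
    [0.1, 0.8, 0.1],  # from "noun"
    [0.5, 0.1, 0.4],  # from "verb"
    [0.9, 0.0, 0.1],  # from "adjective" (never goes straight to "verb")
]


def walk(start, steps, rng):
    """Return the list of states visited, beginning at `start`."""
    path = [start]
    current = STATES.index(start)
    for _ in range(steps):
        # Choose the next state using the current state's row as weights.
        current = rng.choices(range(len(STATES)), weights=TRANSITIONS[current])[0]
        path.append(STATES[current])
    return path


if __name__ == "__main__":
    print(walk("noun", 5, random.Random(0)))
```

The key limitation is visible in `walk`: the next state depends only on the current one, so a plain chain like this has no memory of nesting or long-range structure, which is what a context-free grammar supplies.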
7. Palo on June 12, 2009 11:26 AM writes...
Derek, agree on everything, but, why "open access" instead of open access?
8. Inflatable Iquana Ready to Pounce on June 12, 2009 1:32 PM writes...
I'm just glad that the generated paper was not published, otherwise phrases like "we dogfooded our application" may become part of the vernacular, the way Google has been made into various verb forms.
9. CMCguy on June 12, 2009 2:33 PM writes...
#8 IIRtP I think it's too late, as this is just the grown-up version of the excuse "my dog ate my homework".
10. TFox on June 12, 2009 2:35 PM writes...
Some of the commentary on the blog notes the ethical difficulties with this kind of "experiment". Still, it's pretty funny. Judging by the long delay, and the fact that a previous nonsense submission *was* rejected after peer review, it's possible that the journal attempted to get a reviewer but failed. Maybe after four months the editor figured that the submitter had waited long enough, and so decided to let it go through. Final responsibility for the quality of a paper rests with the authors, and no one else.
What I find interesting is the general quality of the writing. There are no unparsable sentences, misspellings, typos or grammatical errors. Every paragraph is well structured, opening with a topic sentence before diving into detail. There is also a surprisingly high level of internal consistency in the paper, e.g. the text refers to the heavy tail of the CDF in Fig 4, and Fig 4 indeed has a plot of a CDF. The nonsense is only apparent on the semantic level, but perhaps this is addressable too. It seems entirely possible that future advances will allow computer-produced nonsense to approach that currently made by humans.
11. TFox on June 12, 2009 3:03 PM writes...
One more link: http://pdos.csail.mit.edu/scigen/blog/index.php?entry=entry050705-220421, where the author of the Scigen program describes giving a randomly generated talk to a seminar course at MIT, and having no one question it until he pointed it out...
12. Inflatable Iguana Ready to Pounce on June 12, 2009 5:34 PM writes...
CMC Guy,
Thanks for making me aware of the translation. It would have been a wasted day if I had not learned something today.
13. Anne on June 12, 2009 9:56 PM writes...
TFox, the writing is good at the syntactical level because that's how scigen works: the researchers' main line is in tools to work with abstract syntaxes, so they just wrote a syntax - word, sentence, paragraph, all the way up to whole paper - for a scientific article, and told their tools to generate one. You can actually download a tool, polygen, for doing this with your own syntaxes. Semantics are of course utterly beyond this sort of program. But I'm sure that the well-formedness of the papers helps scigen submissions get accepted; they're good enough to pass a casual glance from a non-expert. Any journal that had an editor remotely familiar with the topic wouldn't even send it out to peer review.
14. TFox on June 13, 2009 12:50 AM writes...
Okay, so I looked at their code. It looks more Mad Libs than Markov chains -- there's a lot of prewritten text embedded. Here's an excerpt:
SCI_INTRO_A Many SCI_PEOPLE would agree that, had it not been for SCI_GENERIC_NOUN , the SCI_ACT might never have occurred XXX
SCI_INTRO_A SCI_BUZZWORD_ADJ SCI_BUZZWORD_NOUN and SCI_THING_MOD have garnered LIT_GREAT interest from both SCI_PEOPLE and SCI_PEOPLE in the last several years XXX
SCI_INTRO_A SCI_THING_MOD must work XXX
SCI_INTRO_A In recent years, much research has been devoted to the SCI_ACT; LIT_REVERSAL, few have SCI_VERBED the SCI_ACT XXX
Still cool, but maybe less of an AI advance than I'd imagined.
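To make the Mad Libs comparison concrete, here is a rough sketch in Python of how templates like these could be expanded. It is not Scigen's actual code (I have only looked at the rule excerpt, not ported anything). Three of the SCI_INTRO_A templates are adapted from the excerpt, with the trailing XXX dropped since the excerpt doesn't say what it does; every fill-in word list and the names `RULES`, `PLACEHOLDER` and `expand` are invented for the example.

```python
import random
import re

# SCI_INTRO_A templates are adapted from the excerpt (trailing XXX dropped).
# Every other rule is a made-up filler, NOT taken from the real rule file.
RULES = {
    "SCI_INTRO_A": [
        "Many SCI_PEOPLE would agree that, had it not been for "
        "SCI_GENERIC_NOUN, the SCI_ACT might never have occurred",
        "SCI_THING_MOD must work",
        "In recent years, much research has been devoted to the SCI_ACT; "
        "LIT_REVERSAL, few have SCI_VERBED the SCI_ACT",
    ],
    "SCI_PEOPLE": ["cryptographers", "statisticians"],
    "SCI_GENERIC_NOUN": ["hash tables", "compilers"],
    "SCI_ACT": ["study of access points", "emulation of caches"],
    "SCI_THING_MOD": ["robust archetypes", "wireless models"],
    "LIT_REVERSAL": ["however", "on the other hand"],
    "SCI_VERBED": ["explored", "evaluated"],
}

# A placeholder is a run of capital letters containing at least one
# underscore, e.g. SCI_ACT or SCI_GENERIC_NOUN.
PLACEHOLDER = re.compile(r"\b[A-Z]+(?:_[A-Z]+)+\b")


def expand(symbol, rng):
    """Pick a random template for `symbol` and recursively fill it in."""
    template = rng.choice(RULES[symbol])
    # Each placeholder found in the template is itself expanded.
    return PLACEHOLDER.sub(lambda m: expand(m.group(0), rng), template)


if __name__ == "__main__":
    rng = random.Random()
    for _ in range(3):
        print(expand("SCI_INTRO_A", rng) + ".")
```

Because a rule's template can mention other rules, this is a small random context-free grammar expansion and not a Markov chain: the prewritten text supplies the grammar and the random choices only fill the slots.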
15. Dorf on June 14, 2009 5:26 PM writes...
I recall "any sufficiently advanced (technology) will appear as magic to the _____ add your fave..."
16. zayzayem on June 14, 2009 9:23 PM writes...
I'm also happy these guys at least realised they should not actually publish the paper. It's sad to see people think they can make a quick buck by streamlining quality control procedures.
Can I ask you what you mean by: "I know that it's hard for many people to realize this, but it really is better not to publish at all than to abet this sort of thing."
Who is abetting it and how?
17. DrJimbo on June 18, 2009 6:17 AM writes...
So, it looks like there's been a casualty out of this affair, with Nature reporting that the editor has resigned. The publishers, Bentham, are bleating a bit, but basically they've been "Gotcha'd".
http://www.nature.com/news/2009/090617/full/459901a.html