Some papers are more equal than others
There's a Commentary in this week's Nature about detecting plagiarism in scientific papers (free access this week) by using eTBLAST, a strange but seemingly effective hybrid of alignment search and heuristics originally designed to help search PubMed. Basically you give it a paragraph of text and it finds papers that contain similar words and phrases.
As many as 200,000 of the 17 million articles in the Medline database might be duplicates, either plagiarized or republished by the same author in different journals, according to a commentary published in Nature today [... the authors] used text-matching software to look for duplicate or highly-similar abstracts in more than 62,000 randomly selected Medline abstracts published since 1995.
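To make the "give it a paragraph, get back similar papers" idea concrete, here is a toy sketch in Python. It is not how eTBLAST works internally; it swaps in a much simpler technique (Jaccard similarity over three-word shingles), and the function names and sample abstracts are made up for illustration.

```python
import re


def shingles(text, k=3):
    """Return the set of k-word shingles (overlapping word runs) in text."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    if len(words) < k:
        # Too short to shingle: treat the whole text as a single shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets: size of intersection over size of union."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def rank_similar(query, abstracts, k=3):
    """Score every abstract against the query and return (score, id) pairs, best first.

    `abstracts` is a hypothetical dict mapping an article id to its abstract text.
    """
    q = shingles(query, k)
    scored = [(jaccard(q, shingles(text, k)), doc_id)
              for doc_id, text in abstracts.items()]
    return sorted(scored, reverse=True)


if __name__ == "__main__":
    # Invented abstracts, purely for illustration.
    abstracts = {
        "a": "We measured protein expression in mouse liver cells",
        "b": "The weather in London is often rainy in spring",
    }
    query = "We measured protein expression in mouse kidney cells"
    for score, doc_id in rank_similar(query, abstracts):
        print(doc_id, round(score, 2))
```

Anything scoring above some threshold would then be flagged for a human to look at; a high score only says the wording overlaps, not that anyone plagiarized anything.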
This is the second place this week that I've read about bioinformatics techniques being applied to document processing; the other was Deepak's post about IBM using the Teiresias algorithm to detect spam emails with great success. I don't know whether any other bioinformatics algorithms have been applied to non-biological problems. BLAST, surely, must have some other novel uses...
Anyway, the authors do mention that:
In general, the duplication of scientific articles has largely been ignored by the gatekeepers of scientific information — the publishers and database curators. Very few journal editors attempt to systematically detect duplicates at the time of submission.
Sort of. CrossRef - the academic publishing industry body that looks after DOIs for scientific papers, amongst other things - is building a plagiarism detection service called CrossCheck in association with the company that makes Turnitin, a popular piece of software used by high schools and colleges to make sure students don't crib off each other. If you are going to submit exactly the same paper to multiple journals in the hope of getting multiple citations then do it now...

Comments
I left a comment briefly discussing the status of CrossCheck on a different thread on Nature Network. In short, I refer to another study in which the authors noted that some students who were told not to plagiarize still did so, yet when the students were told that their papers were going to be compared with a database, plagiarism rates dropped. Of course, there might be many reasons for this, but I wondered if there might be a similar effect where publishers who just tell their authors that they are participating in CrossCheck see a decrease in plagiarism.
Posted by: Hilary Spencer | January 28, 2008 01:50 PM