Peer-to-Peer

Report of Nature’s peer review trial

Despite enthusiasm for the concept, open peer review was not widely popular, either among authors or among the scientists invited to comment.

by Philip Campbell et al.

On 1 June this year, Nature launched a trial of open peer review. The intention was to explore the interest of researchers in a particular model of open peer review, whether as authors or as reviewers. It was also intended to provide Nature’s editors and publishers with a test of the practicalities of a potential extension to the traditional procedures of peer review.

Several times during the exercise, researchers and journalists asked us whether the trial reflected a sense of dissatisfaction or concern about our long-standing procedure. On the contrary, we believe that this process works as well as any system of peer review can. Furthermore, in our occasional surveys of authors we receive strong signals of satisfaction: in the most recent survey, 74% agreed with the statement that their paper had been improved by the process, 20% felt neutral, while 6% disagreed.

Nevertheless, peer review is never perfect, and we need to keep subjecting it to scrutiny as community expectations and new opportunities evolve. In particular, we felt that it was time to explore a more participative approach.


The process

Nature receives approximately 10,000 papers every year and our editors reject about 60% of them without review. (Since the journal's launch in 1869, Nature's editors have been the only arbiters of what it publishes.) The papers that survive beyond that initial threshold of editorial interest are submitted to our traditional process of assessment, in which two or more referees chosen by the editors are asked to comment anonymously and confidentially. Editors then consider the comments and proceed with rejection, encouragement or acceptance. In the end we publish about 7% of our submissions.
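
As a back-of-the-envelope illustration only, the following minimal Python sketch works through the funnel these approximate figures imply; the variable names are illustrative, and the post-review acceptance rate is inferred arithmetic rather than a separately reported figure.

    # Approximate editorial funnel implied by the figures above (all rough).
    submissions_per_year = 10_000
    desk_reject_rate = 0.60      # share rejected without external review
    publish_rate = 0.07          # share of all submissions eventually published

    desk_rejected = submissions_per_year * desk_reject_rate     # ~6,000 papers
    sent_to_review = submissions_per_year - desk_rejected       # ~4,000 papers
    published = submissions_per_year * publish_rate             # ~700 papers

    # Inferred, not reported: acceptance among papers that reach external review.
    post_review_acceptance = published / sent_to_review         # ~0.175, roughly 1 in 6
    print(f"{post_review_acceptance:.0%} of externally reviewed papers are published")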

A survey of authors conducted ahead of the open-peer-review trial indicated a sufficient level of interest to justify it. Accordingly, between 1 June and 30 September 2006, we invited authors of newly submitted papers that survived the initial editorial assessment to have them hosted on an open server on the Internet for public comment. For those who agreed, we simultaneously subjected their papers to standard peer review. We checked all comments submitted for open display for potential legal problems or inappropriate language; in the event, none was held back. All comments were required to be signed. Once the standard process was complete (that is, once all solicited referees' comments had been received), we gathered the comments received on the server and removed the paper from it.

At the start of the trial and several times throughout, we sent e-mail alerts to all registrants on nature.com and to any interested readers, who could sign up to receive regular updates. On several occasions, editors contacted groups of scientists in a particular discipline who they thought might be interested in reviewing or commenting on specific papers. The trial was constantly highlighted on the Nature website's home page. Thus substantive efforts were made to bring it to the attention of potential contributors to the open review process.

Following this four-month trial period, we ran several surveys during October to collect feedback as the final papers went through the combination of open and solicited peer review.

Outcomes

We sent out a total of 1,369 papers for review during the trial period. The authors of 71 (or 5%) of these agreed to their papers being displayed for open comment. Of the displayed papers, 33 received no comments, while 38 (54%) received a total of 92 technical comments. Of these comments, 49 were directed at just 8 papers; the remaining 43 were spread fairly evenly across the other 30 papers. The most commented-on paper, an evolution paper about post-mating sexual selection, received 10 comments. There was no obvious time bias: the papers receiving the most comments were spread evenly throughout the trial, and the most recent papers showed no waning of interest.
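
To make the arithmetic easier to follow, here is a minimal Python sketch that recomputes the headline figures from the numbers reported above; the variable names are illustrative, and the per-paper average for the less-commented papers is derived arithmetic rather than a separately reported figure.

    # Figures reported for the trial (1 June - 30 September 2006).
    papers_sent_for_review = 1369
    papers_posted_openly = 71
    papers_with_comments = 38
    papers_without_comments = 33
    total_comments = 92
    comments_on_top_8_papers = 49

    participation_rate = papers_posted_openly / papers_sent_for_review  # ~0.052, i.e. "5%"
    commented_share = papers_with_comments / papers_posted_openly       # ~0.535, i.e. "54%"

    # Remaining comments were spread fairly evenly over the other commented papers.
    remaining_comments = total_comments - comments_on_top_8_papers      # 43 comments
    remaining_papers = papers_with_comments - 8                         # 30 papers
    average_on_remaining = remaining_comments / remaining_papers        # ~1.4 comments per paper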

The trial received a healthy volume of online traffic: an average of 5,600 html page views per week and about the same for RSS feeds. However, this reader interest did not convert into significant numbers of comments.

Distribution by subject area

We categorized the papers into 15 subject areas. The number of papers on the open server in each category was small, so any extrapolation from these numbers is uncertain.

The distribution of papers posted is shown in Figure 1. Most were from the fields of Earth/environment/climate science and ecology/evolution, with 14 and 13 papers, respectively, closely followed by physics with 11. Astronomy, immunology and neuroscience made up the bulk of the rest.

As predicted, there were fewer papers in most cellular and molecular fields, although arguably those that were open received as many comments as those in other disciplines. No papers were posted in biochemistry, chemical biology, chemistry, genetics/genomics, medical research, microbiology, palaeontology or zoology.

Figure 2 shows the average number of comments received per paper in each subject area. Ten subject areas received an average of more than one comment per paper: astronomy, cell biology, Earth/environment/climate, ecology/evolution, immunology, molecular biology, neuroscience, physics, plant sciences and structural biology. All of these fields featured one or two heavily commented-on papers. But it should once again be borne in mind that the absolute numbers of papers are small.

Editorial feedback

Each comment received was rated by the handling editor at the time of each paper’s decision, according to the following scale:

1. Actively unhelpful

2. Reasonable comments, but no useful information

3. Valid minor points and/or details

4. Major points in line with solicited reviewers’ comments

5. Directly influenced publication over and above reviewers’ comments

Each comment was given two such ratings: one for technical value and one for editorial value (that is, for comments regarding context and significance).

No editor rated any comment on their papers higher than 4 in either regard, and only four comments received a rating of 4. Average scores were 2.6 for editorial value and 1.8 for technical value. In other words, the comments were generally judged to be more valuable editorially than technically, partly because several papers received comments only on editorial points. No editor reported that the open comments influenced their decision on publication.

For a qualitative assessment, editors discussed the outcomes and reported the following views:

–A general sense of indifference towards the trial among key contacts in their fields; it was like 'pulling teeth' to obtain any comments.

–Direct attempts to solicit comments met with very limited success.

–Biologist editors in particular were not surprised that authors in very competitive areas did not wish to be involved.

–Anecdotally, some authors were reluctant to take part for fear of being scooped and out of concern for patent applications.

–Anecdotally, potential commenters felt that open peer review is ‘nice to do’ but did not want to provide any feedback on the papers on the server.

–Editors felt that most of the comments provided were of limited use for decision-making. Most were general remarks, such as "nice work", rather than contributions to the review process.

Author survey

All authors who participated in the trial were sent a survey. Sixty-four people were contacted and there were 27 responses (a 42% response rate).

–20 respondents thought it was an interesting experiment.

–Of the 14 respondents who received open comments, four described them as ‘not useful’, six as ‘somewhat useful’, and four as ‘very useful’.

–Although most respondents received no additional comments through other channels (such as e-mail or phone) as a result of taking part in the trial, those who did (five people) found them either 'useful' (four) or 'very useful' (one).

–Some authors expressed concern about possible scooping and others were disappointed that they didn’t receive more comments.

–Of the 27 respondents, 11 expressed a preference for open peer review.

Conclusions

Despite considerable interest in the trial, only a small proportion of authors chose to participate. Among the authors who did post their manuscripts openly and who responded to our survey afterwards, expressed interest in open peer review was significant, in contrast to the views of the editors. A small majority of participating authors received comments, but typically very few, despite significant web traffic. Most comments were not technically substantive. Feedback suggests that there is a marked reluctance among researchers to offer open comments.

Nature and its publishers will continue to explore participative uses of the web. But for now at least, we will not implement open peer review.

The full version of this report, together with the figures, is in Nature's peer review debate web focus.

This report is discussed in an Editorial in Nature vol 444, page 971, 21/28 December 2006.

Please join the debate: let us know your views by commenting on this post, or use the category “peer review debate” on this blog to see an archive of all contributions to the peer-review debate, all of which remain open for comments and feedback.

Comments

  1. David Schoppik said:

    For what it is worth, I read two of the posted papers in my field (Neuroscience) hoping that I'd have something worth posting, as I'm a firm believer in trying new things, and agree with the sentiment that peer review can always be made better. As far as I can tell, the process was deeply flawed, though not for some of the reasons that have been suggested (anonymity, competition). In my case, I knew the authors personally, and would not have had a problem providing forthright critiques of their papers in a public forum. Nor were these articles likely to yield any sort of commercially viable product: both were quite basic science.

    What OPR got wrong was that the editors didn't seem to have factored in that reasonable peer review (which would be the hallmark of a "good" comment) is quite time-consuming, and thus some incentive is in order. None of my peers who are asked to review papers anonymously enjoy the process; most agree that it is a necessary time-sink and, at best, take pride in their role as arbiters of publication. To come up with an insightful commentary on a paper that gets sent to Nature (which tend to be lighter on details and heavier on hype) requires a tremendous overhead, much more than would be required to simply read the paper. Most practicing scientists don't have the time to do what the editors seemed to be hoping would spontaneously happen. With OPR, there was absolutely no clear incentive for people to volunteer to review. Without an indication as to the value of the commentary, is it really surprising that Nature got so many "nice work" submissions? Why put effort forward to read the paper deeply enough to say more? In my case, it would have taken so much time to say something worth reading about the paper that, quite frankly, I gave up on the idea. I'd read the paper as closely as I read any other paper that didn't directly impact my work, which, for better or worse, isn't thorough enough to have anything to say in such a prestigious public forum.

    I was somewhat surprised to see so little depth in the article summary's evaluation of why there was such "indifference". I suggest the following: scientists are too busy, and the topics of science too broad, to expect a spontaneous discussion to erupt following pre-publication of the article. To overcome the first difficulty, it should have been clear what role the comments would play, and that role will have to be significant. I imagine that if the online community were told that each thorough review would count as if two anonymous reviewers had seen the paper, there would have been considerable commentary generated. To reiterate: without knowing that one's voice mattered greatly, why go through the effort of reviewing a paper? Alternatively, Nature could open the forum to a select subfield, where publication of a paper in Nature could conceivably change the way people practice day-to-day science. Neither of the papers that I read would have that level of impact, but I can imagine certain sub-fields where they might. In that situation, the incentive to comment is made stronger by the immediate relevance, which encourages a deeper reading of the articles.

  2. Radwan Abu-Issa said:

    Dear Editor,

    I think trying to improve the peer review process should not stop after the first trial; more trials with different approaches are needed.

    I would like to suggest another approach, one that enables the journal to evaluate the reviewers themselves. This could be achieved by asking the authors whose papers went through review to post their evaluation of the reviewers via a standard questionnaire that reflects the quality and competence of the reviewers, and/or by asking other reviewers to evaluate the review itself. The results could help the editorial team pick out the best reviewers over time!

  3. Jayanta Chatterjee said:

    The peer review process has many drawbacks, as highlighted by Nature and many other journals.

    The main problem is that submitted articles are reviewed by a group of scientists who are in the same field and, most probably, known to each other. This surely influences the decision, as the reviewer's own articles may in turn be reviewed by the author who submitted the paper. It would be a good idea not to reveal the name(s) of the author/co-authors of the submitted article to the reviewers.

    I have seen at least one article which was published in a high-impact-factor journal with wrong data and wrong material. But I had to keep my mouth shut (publicly) as I did not want to ruin my career as a postdoc in a US lab. In the strict sense of honesty and ethics the paper should be retracted or, at least, the authors should publish a "correction" to that effect. But it did not and will not happen. Most of the time the experiments published in papers are "verified" (only if needed) by researchers who work in the same or a related field. Even if they find any "abnormality" or "misrepresentation", they "try to sort things out" among themselves, without going public (which makes "practical sense"!). This puts less-known researchers and young postdocs in a very disadvantageous position. Politics in science is no less corrupt than in any other field of life. Unless this is addressed on an urgent basis, the general public will lose their remaining faith in scientists/researchers (but not in science) much faster, and it will be tough to attract good and honest talent to research (particularly if we consider the quality of life and salary of postdocs and young researchers). Probably we need to live with that to become "successful". "Success" is much sweeter and more desirable than honesty and ethics to many of the "scientists".

    Note from Maxine Clarke:

    Most scientific journals, including the Nature journals, go to considerable efforts to ensure that the peer-reviewers of a paper are independent both from the authors and from each other. Peer-reviewers see each other's reports at each revision stage of the submission.

    The articles in Nature’s web focus debate on peer review discussed many of these issues. See https://www.nature.com/nature/peerreview/debate/index.html.

    After publication, most journals, including the Nature journals, have a confidential corrections policy, which can and does include a peer-review component.

    Corrections of technical errors for the record are an important part of the scientific publication process. The evolution of the Internet has done much to improve general communication of such corrections, via online annotation of the paper.

  4. Robin Rice said:

    Thanks for this interesting study. It would be interesting to repeat it at regular intervals in the future and see whether anything changes. In addition to the momentum of the open access movement gaining ground (or not), another interesting factor might be the particular generation or cohort of scientists involved. Perhaps those who are used to sharing and commenting on each other's Facebook (or Myspace) entries at university on a daily basis would be more willing to openly comment and submit non-anonymous criticisms of research articles?

    Robin Rice

  5. CM Chang said:

    I'm currently working on my master's thesis on the feasibility of a global peer review system for academics, in which only members of universities are allowed to be "users". It is in effect a middle road between open and closed peer review: you may not know who is reviewing, but you know that they are at the academic level.

    A practical result of such a central, global system, with academics all working within it, is, I believe, that issues can be tackled more effectively and efficiently.

    For example, the issue of lack of incentive, mentioned on this blog by David Schoppik and an issue whose significance I wholeheartedly agree with (for both open and closed peer reviewing, but probably more for open peer reviewing), can be somewhat tackled by a global system with a special credit rating. In a nutshell: imagine a system that gives its registrants special ratings. (Anonymous) referees who have contributed positively get ratings, so the next time they post their own papers, those papers are ranked higher by default and are thus easier for users of the system to access. Think Google-rank importance: whoever sits on the first page (or first couple of pages) has a higher chance of being seen/read.

    That way, (anonymous) refereeing won't be a thankless job anymore, and it won't be all about finances (such as paid peer reviewing) but about actually giving referees the academic credibility that they have earned. Since this also works with closed peer reviewing, it could be a very effective, academically reputable way of encouraging peer review and making reviewers' effort count. You invest time in the system, and the system will make sure your time is indirectly invested back in you. In a way, it might not only reduce the lack of incentive to peer review, but it might also encourage those who have built up their credits to do a better job on their own research and post it for peer review, since it will have a higher chance of being read/reviewed.

    Naturally, this makes it that much more crucial for one central system to go global, and not just become another Wikipedia-style system that anyone can put on the web and "ask people to peer review". The network effect would be lost in the latter case.

    Well, that's just one of the issues I want to work out in my research. I do believe this kind of concept could also help to reduce the time taken by the entire peer-reviewing process and possibly increase its quality (depending on how many people are allowed to review one article). The open-versus-closed issue could be somewhat tackled by the system's ability to give those posting their work, and those reviewing it, the option of whether they wish to be identified or not, carefully avoiding the "open vs closed" debates and simply going with "it's your work: you decide what's best for it".

    As far as the theory goes, I want to take it one step further and also research the possibilities and added value of allowing journals to enter the system and be involved in this entire process, possibly contributing to reducing the cost of the peer review cycle.

    But one issue at a time; I can't run ahead of myself just yet. I would like to say, though, that peer reviewing can and should take more advantage of the things the internet/IT can offer. There's more to IT than just open peer reviewing on the internet.

    Maxine adds: this is an interesting idea. I wonder what metric you would use for "contributing positively". If authors would agree to post earlier versions of their manuscripts, with referees' comments attached, this could be judged. But many authors won't want to do that for various good (to them) reasons, and so you are back with the usual problem of how to "mark" a good referee: of course, a "positive" referee is not the same as a "good" referee; a good referee will recognise a flaw in a paper. So, though your system sounds good in theory, it needs some working out in practice.

  6. CM Chang said:

    Thank you very much for your input and for allowing my post. I appreciate such comments a lot. I'm still early in the research, so most of it is still brainstorming in a very general sense, which makes brainstorming with others' input that much more productive.

    My first thought on the credits/rating system was that the original author(s) would be the judge of whether comments were good or not, perhaps with a rating scale from 1 to 10, where 1 is "not useful" and 10 is "highly useful". Or, if we want to be more "mathematically representative", have it be the percentage impact of the change: 0 would mean 0% of the paper changed on the basis of the review, up to 100% changed on the basis of the review. Strictly from the review's perspective in this case: if every change suggested in the peer review is actually implemented, it would be 100%, thus giving the peer review a full 10 points, and if not, lower.

    Perhaps naturally, for every potential solution I can come up with, there are potential abuses, such as how objectively and fairly the author would assess this. If the system is practical, no doubt the highly competitive race to bring papers to the attention of journals will simply continue within the system, which means authors may not be falling over themselves to give good comments/reviews the right ratings. And, as you say, letting the papers/reviews be accessible by everyone might not be something the authors prefer either, for many different reasons.

    To address that issue more properly, I think I must briefly address another component of the system: user management. Currently I have in mind a special "Board" or "Committee" that is responsible for overseeing certain tasks that must be carried out, but in a very discreet fashion, such as user management. As I said before, I don't envision this system being open to just everyone; perhaps later, but currently it should be restricted to those who have the minimum "qualifications". In my context, that would be members of universities and of other credible organizations with such members, such as the NIH. But that means there has to be some sort of control to make sure that these are the people who can register with the system in the first place. That would be one of the responsibilities and authorities of the "Board/Committee". More precisely, they can directly or indirectly (by assigning Key Registrants, who could be special members of certain organizations, for example professors at the universities and so forth) register members to the system. I currently have no idea who they could be. Perhaps members of widely respected journals such as Nature or Science? Or some other organization widely respected by the science community, such as the NIH? This would actually conform quite closely with what editors are already doing now, being the only ones to know who is doing what to which paper. The non-Key Users will basically be in a blindfolded system of academics.

    Getting back to enforcing a credible credits/rating system in that same line of thought: in the "real world", editors are the ones who oversee the papers and the reviews and make their final decision based on at least both of those types of "input". One possibility could be that editors of journals (or of grant-funding organizations) are also made a special type of "user" in the system. They would have access to the original/revised work, the reviews of the original/revised work (if applicable) and the ratings given by the specific author(s). For the credits/ratings system, these editors, or Key Editors, would literally be the "second opinion". To support the Key Editors, they might be given the authority to grant access to a couple of reviewers in the system for this process (identities that only the editors can see/know). The authors might receive a function to restrict their submissions to only x users, or to x journals/grant-funding organizations and so forth. One thing that this measure doesn't address is the time between the author(s) rating the peer reviews and the "second opinion" that happens when editors process it (if at all). This could be addressed by the system not processing any ratings for a user until the Key Editors have confirmed them, or simply by showing the ratings of a user in a marked colour that indicates that one or more ratings of this user have not been confirmed yet.

    In any case, this measure could significantly lower the potential resistance to having "others" review authors' original/revised work, the reviews they've received for it and the ratings they themselves have given, as editors would have been able to do that in the first place anyhow. While this may not be much of an issue, because the system won't allow them to resist getting a second opinion on the accuracy of their ratings in the first place, from an incentive point of view it will certainly be significant. And, more importantly, it could thus significantly improve the objectivity, and therefore the accuracy, of the credits/ratings for peer reviewing.

  7. CM Chang said:

    Another Key Editors' tool to consider for tackling the accurate-rating issue is to somehow have IT assign ratings automatically, comparing the original and revised work in terms of the percentage of changes and its conformity with what the review covered. There are IT tools out there that can compare/synchronize different types of documents based on changes, so having IT do something like what I just suggested may not be far-fetched, if not already possible. Granted, this is a very rough indication, since it doesn't add any other kind of intelligence to the comparison, nor do I see that level of application being realistic for current IT. But combining the software-assigned ratings and the rating assigned by the author(s) may give the Key Editor more "pre-insight". If nothing else, using IT to sum up all the differences in a clearer format (think Word's Track Changes, but summarized), neatly summing up all the comments of the peer review and giving the percentage of changes between the original and revised work, could make a significant difference in the time and effort needed to "manually" confirm the accuracy of the ratings by the Key Editors.

    These kinds of measures, which fit very well with this kind of global system, could also aid in tackling another issue: fraud/plagiarism. But that's an issue for another day.

  8. CM Chang said:

    (I apologize for sending in multiple blog postings rather than just one. If you want to just put them all together in one post, I’m perfectly fine with that. I apologize for the inconvenience).

    Thinking some more on it now, I realize that I underestimated the depth of your question, and thus the depth of the issue. I consequently overestimated my proposed solutions, as they do not address the crucial difference between a "positive" referee and a "good" referee at all, or at least not accurately. The solutions make no distinction between "good" reviews, which focus more on (technical) substance, and other "positive" reviews, such as ones that focus more on issues like grammar and spelling. Moreover, they don't tackle other issues such as overall (scaling) impact on the paper. Under the percentage-of-conformity rating, one positive/good comment could technically achieve 100% conformity and earn full points, while someone giving 10 good comments, of which only 9 are actually used, receives 90% conformity and thus fewer points. Without addressing that distinction, it's hardly a good representation of the level of each review. If not handled properly, my solutions could in fact simply be loopholes for referees to exploit to get a good rating.

    Summarized, that means the system needs to account for at least three different types of value: substance, format/style and overall impact on the paper. As I do not want to make the same mistake as I did in the previous post on the rating mechanism, I shall refrain from commenting on it more deeply until I have something more practical in its details, as many issues still need much more thought before I can provide a somewhat practical answer.

    That is one of the pitfalls of sharing an idea that's still very young and needs much more research: in addressing issues too quickly, the solution may turn out not to be a solution at all, but in fact something that misses the mark completely. Though from a scientific perspective it's still valuable, as by process of elimination I've simply found a way that doesn't work.

  9. dibyadeep said:

    Reviewers associate some sort of prestige with reviewing for a prestigious journal, though they might not get any material benefit from doing so. As such, open review has no incentive for either the author or the reviewer, only for science. So although you will find a lot of enthusiasm from people for implementing this process, they personally won't want to be associated with it.

  10. Chris Blanchard said:

    I am not convinced of the applicability of the findings.

    For starters, what was different about the peer-review methodology conducted in the open from what occurred in the time-tested fashion of sending hard copies to editors?

    If those two methodologies are different, then we have an experiment that won't yield meaningful results. If the methodologies used in both open peer review and traditional peer review were the same, then what you have done is to indict the peer review system. That is, how is it that scholars found the open peer review process not at all helpful, but the traditional peer review process very helpful? If scholars use the same metrics in reviewing a paper under both schemes, then again, these findings don't add up.

    A second point, which is probably enough for one comment, is that it makes very little sense to say that the response rate for open peer review was low. In fact, this article states that one of the articles reviewed received 10 comments. That is somewhere between 3 and 5 times the number of comments that would be seen in a traditional peer review.

    Editors typically have a paper reviewed by 2-3 experts. If 10 have commented on a paper in the open process, you’ve done a remarkable thing. If those comments weren’t helpful, then again, it’s the peer-review methodology that your participants object to. It’s simple enough to ask people to fill in a form that solicits meaningful feedback.

    There is also probably quite a bit of the halo effect happening here as well. After enduring a 1-3 year wait on the traditional peer review process, most authors are just happy that the process is over. Thus, I am not convinced by the opening statement that a vast majority of people were satisfied with the comments they received from Nature's editors or reviewers. Try asking your authors the same question when they are in the middle of a two-year wait for feedback and they need the publication before tenure review, and I bet you see that number fall quite a bit.

    Nature should not give up on open peer review. "Mend it, don't end it!"

  11. Noam Y. Harel said:

    In light of Nature’s well-intentioned open peer review trial in 2006, EMBO’s recent ‘transparent peer review’ announcement, and the overall spread of the Open Science movement over the past two years, perhaps it’s time for another attempt at OPR?

    Some of the comments in this thread, especially about adding incentive and quantifying contributions, could substantially improve the process.

    In the ideal future, not only manuscripts but also grant applications will be posted online for open, communal review in real time. Of course, this would require significant attitude shifts by all those involved – journals, funding agencies, universities, and scientists themselves…

Comments are closed.