Quality and value: Statistics in peer review

David Ozonoff

Researchers need reviewers to check their stats.

Statistical methods are widely used in many areas of natural science, especially in my field of research, epidemiology. Although statistical procedures are often viewed as a black art, or as a black box, they are not limited to specialists. With today’s computing power and software, researchers can and do use computationally intensive methods of great complexity, often leading to the use of techniques that are more sophisticated and powerful than necessary. Many researchers have trouble interpreting the results, or interpret them incorrectly. Clearly, this is a matter for peer review.

Yet an enduring problem for journal editors is obtaining the services of expert reviewers. It is conventional to have at least two subject-area reviewers for a submitted paper, and their expertise tends not to be in statistics (except for purely methodological papers).

A lack of experts

Even in observational sciences, such as epidemiology, which routinely make heavy use of statistical methods, expertise is focused on accounting for the subtle workings of different kinds of systematic error (bias) that can affect comparisons between groups of people. Only secondarily are epidemiologists concerned with random variability and noise, despite the importance of these factors in the interpretation of their results.

Epidemiologists customarily rely on biostatisticians when designing, analysing and interpreting their studies. They learn the fundamentals of handling random variation just as molecular biologists learn physical chemistry, by taking often difficult courses and by using statistical methods in their work. But they are not statisticians any more than molecular biologists are physical chemists.

Obtaining two reviewers with appropriate specialist expertise is difficult enough without requiring yet another reviewer to evaluate the use of statistics in a paper. Statisticians are in unusually high demand. For the many fully engaged in (and sometimes overwhelmed by) service and support of clinical trials or other studies, the unpaid labour of peer review is at the bottom of their priority list.

Some journals tackle this problem by adding a ‘triage’ checkbox for reviewers to indicate whether an additional statistical reviewer is needed. If a reviewer knows that the statistical methods are beyond his or her expertise, this may be helpful, even though it lengthens the overall review process. But the reviewer may be unaware of some of the subtle pitfalls of the methods: he or she must know enough statistics to recognize the limits of his or her own expertise, and many reviewers are reluctant to confess that they do not know whether standard methods have been properly applied.

Unusual techniques

There are also many papers reporting new methods, or novel applications of existing methods. The authors often include the person or people who pioneered the technique and are hence uniquely placed to judge it; and the technical issues may be difficult and specialized.

In such cases there is a strong temptation for peer reviewers, especially when confronted with a brief technical appendix dense with integral signs or linear algebra, to give the authors a ‘pass’. But author expertise in an area is no reason to waive peer review. If it were, all review could be done by examining a curriculum vitae, skipping the paper itself. In such cases the only solution is to seek the services of a specialist, entailing additional delay. As the number of journals and articles keeps increasing at a greater rate than the reviewer pool, the situation gets worse.

Some high-circulation journals, such as the American Heart Association’s journal Circulation, the New England Journal of Medicine and the Journal of the American Medical Association, employ paid statistical consultants or editors. But this is beyond the reach of most. And it raises another question: how important is it that the statistical methods are ‘correct’ by conventional practice? This may seem a strange question, and in a perfect world no one would ask it. But the world is far from perfect. Audits of the extent of errors in statistical methodology in the literature have been done, but as far as I know none has evaluated the consequences (A. Vail & E. Gardner Hum. Reprod. 18, 1000–1004; 2003).

If the answer is that statistical errors are not much different from those in other methods (spectroscopy, bioinformatics, X-ray diffraction), the issue becomes a more general problem. But I believe we would find that the major cost is in researcher (reader) misinterpretation of statistical results. It is still distressingly common to read that effects ‘not statistically significant’ are due to chance. Scientists need better instruction in the interpretation of statistical results.
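A minimal simulation makes the point concrete (the numbers here — a true success rate of 0.6 against a null of 0.5, and 30 observations per experiment — are illustrative assumptions, not drawn from any particular study). With an underpowered design, most experiments fail to reach p < 0.05 even though the effect is real, so ‘not statistically significant’ cannot be read as ‘due to chance’:

```python
import math
import random

def exact_binomial_p(k, n, p0=0.5):
    """Two-sided exact binomial p-value for k successes in n trials
    under the null hypothesis that the success probability is p0."""
    obs_dev = abs(k - n * p0)
    return sum(
        math.comb(n, j) * p0**j * (1 - p0)**(n - j)
        for j in range(n + 1)
        if abs(j - n * p0) >= obs_dev
    )

random.seed(0)
n_flips, true_p, n_experiments = 30, 0.6, 2000

# Repeat a small experiment many times with a genuine effect present.
non_significant = 0
for _ in range(n_experiments):
    heads = sum(random.random() < true_p for _ in range(n_flips))
    if exact_binomial_p(heads, n_flips) > 0.05:
        non_significant += 1

frac = non_significant / n_experiments
print(f"{frac:.0%} of experiments with a real effect were 'not significant'")
```

Most runs of a real but modest effect come out ‘not significant’ here — evidence of low power, not evidence that the effect is due to chance.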

Like most other things, statistical peer review is a trade-off in time and expense versus some unknown practical pay-off. Perhaps, in the end, we will have to fall back on the observation of one of my colleagues. “Real peer review”, he says, “begins after publication.”

David Ozonoff is an environmental epidemiologist with a special interest in mathematical methods for use in small populations. He is co-editor of Environmental Health and professor of environmental health at Boston University School of Public Health in Massachusetts.

Read more: see this article in Nature’s web focus.


  1. Bob O'Hara said:

    As a statistician who works closely with ecologists and evolutionary biologists, I feel your pain. Certainly there are a lot of papers published with lousy stats in them, and others that could have benefited tremendously from the input of a trained statistician, but it is difficult to blame the scientists themselves. I am sure they are doing their best with the knowledge they have, but few people go into science to do statistics, and many are rather shocked when they have to start analysing their data during their PhD.

    What’s the solution? Certainly better statistical training at the undergraduate level will help. On top of this, better organisation of statistical consultancy will make a big difference. Of course this needs departments and faculties to take the problem seriously, and invest significant resources. With budgets being limited, we can see why this doesn’t happen very often.


  2. Douglas Kell said:

    This problem has become much worse in the post-genomic era, with thousands of variables being measured on a small number of samples. Bias is especially insidious and is not solved by increasing sample numbers (D. F. Ransohoff Nat. Rev. Cancer 5, 142–149; 2005); the same is true of the huge potential for false discoveries when multiple hypotheses are tested simultaneously (J. P. Ioannidis PLoS Med. 2, e124; 2005). Individual reviewers cannot necessarily spot these kinds of problem, certainly not without access to the raw data, but making all the raw data and metadata available solves this problem at a stroke, as THE ENTIRE SCIENTIFIC COMMUNITY becomes the referees. If groups have to send sequence and microarray data to repositories before publication, why should this not apply to ALL omics (including spectral) data and metadata as well? Nature and other journals have important roles to play here in insisting on it. Such availability makes problems easy to spot (K. A. Baggerly, J. S. Morris & K. R. Coombes Bioinformatics 20, 777–785; 2004). Modern multivariate statistics and machine-learning methods are powerful and dangerous, and their results need proper validation and cross-validation.
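The multiple-testing danger described above can be sketched in a few lines (the figures — 1,000 hypotheses tested at a 5% threshold — are illustrative assumptions). Even when every null hypothesis is true, a predictable fraction of tests comes out ‘significant’ by chance alone:

```python
import random

random.seed(1)
n_tests, alpha = 1000, 0.05

# Under a true null hypothesis, the p-value is uniform on [0, 1],
# so a uniform random draw simulates the p-value of one null test.
p_values = [random.random() for _ in range(n_tests)]
false_discoveries = sum(p < alpha for p in p_values)

print(f"{false_discoveries} 'significant' results from {n_tests} true nulls")
```

Roughly alpha × n_tests (about 50 here) false discoveries appear with no real effects present at all, which is why omics-scale studies need multiple-testing corrections and independent validation.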

  3. Douglas J. Keenan said:

    It is common to see peer-reviewed publications that have inadequate statistical analyses. As a possible solution for this, David Ozonoff discusses a “checkbox for reviewers to indicate whether [an] additional statistical reviewer is needed”. As he describes though, this solution has significant problems of its own.

    A slight variant of that checkbox, however, should work much better: ask each reviewer to sign a statement saying “I have checked all the statistical analyses in the manuscript”. Of course, no reviewer would be required to sign, and those who did sign might still make mistakes (just as they might with non-statistical aspects of the paper). It is clear, though, that this approach would go a long way towards addressing the problem.

  4. Michelle Hudson said:

    David Ozonoff quite rightly highlights the problems associated with the lack of sufficiently qualified statisticians to conduct peer review in the natural sciences. The FRAME Reduction Steering Committee (FRSC, see footnote) would like to point out that nowhere is this more important than in the case of research and testing involving laboratory animals, where the incorrect application and interpretation of statistics can lead to the wastage and unnecessary use of animals, and also to the possible invalidation of the results obtained with them.

    In addition to these problems we wish to draw attention to the crucial issues of strategic planning and good experimental design, which need to be addressed before any experimental work is undertaken. Strategic planning involves a critical appraisal of the necessity of an experiment. The experimental design should provide a clear definition of the overall objectives of the planned work, and a rationale for the way in which the study will be undertaken. The design stage should also be used to inform the application of the most appropriate statistical methods to ensure that experimental data are appropriately and efficiently obtained, analysed and interpreted.

    As with statistics, it can often be difficult for reviewers to identify whether strategic planning and good experimental design have been implemented, because of the usual format of published manuscripts in the natural sciences. The current practice of producing an overall materials and methods section that encompasses all the experiments carried out for a particular study, separated from the results and discussion, hinders the assessment of the data presented by referees and general readers alike. It is often difficult to identify which data relate to which experiment, how a particular experiment, of several described in the manuscript, was undertaken, and how it was planned to address one of the several objectives of the investigation. A lack of adequate consideration of each objective can result in poor or wasted experiments and the use of inappropriate group sizes (either more or fewer animals than are actually necessary).
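One way to see how design decisions drive group sizes is the standard normal-approximation sample-size formula for comparing two means, n = 2(z₁₋α/2 + z₁₋β)²(σ/Δ)². The sketch below uses the conventional two-sided α = 0.05 and 80% power; the effect sizes are illustrative assumptions, not FRSC recommendations:

```python
import math

def n_per_group(delta, sigma, z_alpha=1.96, z_beta=0.84):
    """Normal-approximation sample size per group for a two-sample
    comparison of means (two-sided alpha = 0.05, power = 80%).
    delta: smallest difference worth detecting; sigma: common SD."""
    return math.ceil(2 * (z_alpha + z_beta) ** 2 * (sigma / delta) ** 2)

# Detecting a difference of one standard deviation:
print(n_per_group(delta=1.0, sigma=1.0))   # 16 animals per group
# Halving the detectable difference roughly quadruples the group size:
print(n_per_group(delta=0.5, sigma=1.0))   # 63 animals per group
```

Stating such a calculation in the design section would let referees check directly whether group sizes match the declared objectives.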

    The FRSC proposes that researchers should be encouraged to give more consideration to strategic planning, experimental design and statistical accuracy through a system in which journal editors modify their instructions to authors to require a clear statement detailing not only the overall aims and objectives of the study, but also the objectives and design of each experiment described, including how the data were collected and how they relate to the overall aims of the work. We believe that an efficient and reasonable way to achieve this would be to ask authors to provide more informative individual legends for the data presented.

    The above is a solution that Nature has gratifyingly already implemented to some extent, but we believe that this practice needs to become much more widespread. In this way, it should be possible to improve: a) the quality of research; b) the peer review process; and c) the planning, execution and reporting of biomedical research using laboratory animals.

    Michelle Hudson, MBiolSci

    Secretary to the FRSC

    Professor Robert Combes

    Chair of the FRSC

    Footnote. Members of the committee are representatives from industry and academia, with expertise in statistics, experimental design, animal welfare and alternatives research. The Committee’s main objective is to reduce the number of animals used in research, education and testing without compromising the quality of research or hindering scientific progress.

Comments are closed.