Correlation of metrics with expert judgement

This Correspondence by Stevan Harnad of the universities of Montreal, Canada and Southampton, UK, was published in Nature last week (Nature 457, 785; 2009):

Your Editorial ‘Experts still needed’ (Nature 457, 7–8; 2009, free to access online) is correct in that no metric alone can substitute for expert evaluation, because no single metric (including citation counts) is correlated strongly enough with expert judgements for it to take their place. But some individual metrics, such as citation counts, are nevertheless significantly correlated with expert judgements. It is likely that a battery of multiple metrics, when considered jointly, will be even more strongly correlated.

The UK Research Assessment Exercise (RAE) provides an opportunity to test this, alongside the wealth of potential performance indicators that are increasingly available online. Both enable a candidate battery of metrics (such as citations, co-citations, downloads, tags and growth/decay metrics) to be systematically validated against expert judgements, field by field. The 2008 RAE has also provided data that make it possible to do this validation exercise now, across all disciplines, on an important nationwide scale.
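As a rough illustration of what validating a battery of metrics against expert judgements could look like, the sketch below compares the correlation of each single metric with the multiple correlation of all metrics fitted jointly. The data are synthetic, and the function names, the three metrics and the linear model are assumptions made for the example, not anything specified in the Correspondence.

```python
import numpy as np

def pearson(x, y):
    """Pearson correlation between two 1-D arrays."""
    return float(np.corrcoef(x, y)[0, 1])

def multiple_correlation(metrics, expert):
    """Multiple correlation R: the correlation between the expert scores
    and the best least-squares linear combination of the metric columns."""
    X = np.column_stack([np.ones(len(expert)), metrics])  # intercept + metrics
    coef, *_ = np.linalg.lstsq(X, expert, rcond=None)
    return pearson(X @ coef, expert)

# Synthetic, illustrative data only: 200 hypothetical units, 3 hypothetical
# metrics (think citations, downloads, co-citations, all standardized).
rng = np.random.default_rng(0)
metrics = rng.normal(size=(200, 3))
expert = metrics @ np.array([0.5, 0.3, 0.2]) + rng.normal(size=200)

single = [pearson(metrics[:, j], expert) for j in range(metrics.shape[1])]
joint = multiple_correlation(metrics, expert)

print("single-metric r:", [round(r, 3) for r in single])
print("joint multiple R:", round(joint, 3))
```

Fitted on the same data, the joint R can never be lower than the strongest single-metric correlation, so a real validation exercise would need to check the fitted weights on held-out data, field by field.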

Measuring the scientific integrity of nations

How to evaluate a nation’s scientific integrity is the question tackled in one of the Editorials in the current issue of Nature (457, 512; 29 January 2009, free to access online). From the Editorial: "Like many emerging countries, Saudi Arabia measures itself by indices, and has developed its own index for ‘responsible competitiveness’, based on a number of metrics. But fostering strong science-based innovation requires its own metrics of inputs and achievement. So here, for any country concerned about the reputation and integrity of its research base, are some metrics that might be developed into an index for responsible scientific competitiveness."

Four main sets of metrics are identified: (1) misconduct such as fraud, fabrication and plagiarism; (2) transparency and objectivity of a nation’s systems of evaluation, funding, staff appointments and promotion; (3) a nation’s framework for science policy, and the extent to which it allows talented scientists to follow their noses in the pursuit of what makes the world tick while also giving societal values and economic needs their due priority; and (4) the elusive concept of ‘openness’, a key corollary of trust.

The Editorial concludes that, “taken together, these qualitative metrics would amount to an index of responsible science for any country, whatever its stage of scientific development. They could be measured by the documentation of structures and practices and by independent surveys of scientists.”

Previous Nautilus posts on quality measures (mostly focusing on individuals’ scientific research output, rather than on a country’s ‘scientific integrity’).

Authors on authorship, collaboration and output measures

Publishing a paper in a journal has traditionally marked the end of a research project, but increasing numbers of academics are becoming interested in the publication process itself, according to the Editorial in the November issue of Nature Nanotechnology (3, 633; 2008). Many of these ‘papers about papers’ are concerned with citations and impact factors — researchers looking to get more citations for their papers are advised to write longer papers, work in teams or write the first paper on a topic (references in the Editorial). However, other authors have started to look behind the scenes at issues such as the changing nature of collaboration. The Editorial goes on to discuss some of these issues, including the h-index, a relatively recent yet controversial method of assessing a scientist’s output.
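For readers unfamiliar with the h-index mentioned above: a scientist has index h if h of their papers have each been cited at least h times. A minimal sketch of that calculation follows; the function name and the sample citation counts are made up for illustration.

```python
def h_index(citations):
    """Return the largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)  # most-cited paper first
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank      # the top `rank` papers all have >= rank citations
        else:
            break         # counts only fall from here, so h cannot grow
    return h

# Hypothetical citation counts for five papers.
print(h_index([10, 8, 5, 4, 3]))  # 4
```

The example also hints at why the measure is debated: one very highly cited paper and a paper cited exactly four times contribute equally to an h of 4.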

Previous Nautilus posts about the h-index.

Previous Nautilus posts about authorship.

Previous Nautilus posts about citation analysis.

Nature editorial on Zotero v Thomson Reuters

This is the text of an Editorial in Nature (455, 708; 9 October 2008), concluding that proprietary data formats may be legally defensible but open standards can be a better spur for innovation.

“A historian of science and computing, and a scholar whose PhD thesis was on “professionalization of cooking among domestic servants in eighteenth-century France”, might seem unlikely characters to find at the centre of a multimillion-dollar lawsuit. But that is exactly what has happened in the suit brought against George Mason University (GMU) in Fairfax, Virginia, by Thomson Reuters, the company probably best known for its ISI science indicators.

Dan Cohen, director of GMU’s Center for History and New Media, and Sean Takats, a GMU history professor, are also directors of Zotero: open-source software developed by the history centre that lets researchers organize and share their digital information iTunes style, whether it is in the form of citations, documents or web pages. Zotero is free and popular, and has attracted some 1 million downloads since its launch in October 2006.

Thomson makes the proprietary bibliography software EndNote, and claims that Zotero is causing its commercial business “irreparable harm” and is wilfully and intentionally destroying Thomson’s customer base. In particular, Thomson is demanding that GMU stop distributing the newer beta-version of Zotero that allegedly allows EndNote’s proprietary data format for storing journal citation styles to be converted into an open-standard format readable by Zotero and other software. Thomson claims that Zotero “reverse engineered or decompiled” not only the format, but also the EndNote software itself.

The company is seeking a minimum of US$10 million in damages annually until GMU halts distribution of Zotero’s new feature. It also demands that GMU “terminate” the ability of each Zotero user to use or distribute any open-source files converted from EndNote’s own data format. GMU seems ready to fight the suit; a spokesperson told Nature that the university believes it is “well within its rights”, but declined to go into further detail given the ongoing litigation. Thomson was contacted but declined to comment, saying: “It is the policy of Thomson Reuters that we do not comment on pending litigation.”


Video journal to be indexed in Medline and PubMed

The Journal of Visualized Experiments (JoVE) has announced that its online video protocols will be indexed in the popular US National Library of Medicine repositories MEDLINE and PubMed.

Founder and chief executive Moshe Pritsker views the MEDLINE–PubMed listing as a sign that the scientific community has accepted video-based publications. “It was a very important decision for us, and for scientific publishing,” he says.

Since JoVE was founded in 2006 with support from an angel investor, the journal has published more than 200 videos, most produced by professional videographers. It aims to improve the reproducibility of scientific results by using videos to clarify subtle experimental details. The journal was itself an experiment in video publishing and remains the only video-based scientific journal.

From Nature 455, 13 (2008).

Citation patterns in geoscience

Nature Geoscience’s September editorial (1, 563; 2008) broaches the subject of impact measures. From the editorial:

The ripples of the revolution in science evaluation have long reached the relatively uncompetitive backwaters of the geosciences. Indeed, Nature Geoscience received questions regarding its likely future impact factor before it was even accepted into Thomson Scientific’s Web of Science in April this year. So here are a few thoughts on the topic from us, long before our own impact factor (due in 2010) may skew our perspective.

Citation patterns vary hugely between disciplines. The impact factors of Nature and Science have ranged between 26 and 32 in the past few years. But a quick estimate, based on a sample of papers, suggests that geoscience papers in these journals score an impact factor of around 15 when evaluated on their own. This is high considering that the impact factors of journals publishing exclusively geoscience research have not exceeded 5 in the past several years. But far higher citation counts in the biological sciences drive up the statistics of journals that publish across disciplines.

The timescales of the publication cycle in a field determine a journal’s impact factor. These are defined as all citations in one year to citable content published in the two preceding years, thereby excluding all references more than two years from publication. This can be problematic for the slower-moving sciences. For example, the ten most cited papers in Geology in 2004 were collectively referred to about 1.5 times more often in 2007 than in 2006 — citations that have never entered the index.

For geoscientists, taking guidance from impact factors alone would mean favouring interdisciplinary journals (whereas many biologists would, for the same reasons, favour their own disciplines). It would also lead to reading preferentially short-lived, quickly cited papers over those that develop more slowly — not necessarily a good idea. Other more time-consuming ways of assessing quality are therefore needed to supplement the quick and easy number check.
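The two-year window described in the editorial can be made concrete with a short sketch. The standard calculation also divides the citation count by the number of citable items published in that window, a step the quoted text does not spell out. The function name, dictionary layout and all numbers below are hypothetical.

```python
def two_year_impact_factor(citations_by_pub_year, items_by_pub_year, year):
    """Impact factor for `year`: citations received in `year` to content
    published in the two preceding years, divided by the number of citable
    items published in those two years."""
    window = (year - 1, year - 2)
    cites = sum(citations_by_pub_year.get(y, 0) for y in window)
    items = sum(items_by_pub_year.get(y, 0) for y in window)
    if items == 0:
        raise ValueError("no citable items in the two-year window")
    return cites / items

# Made-up journal: citations received during 2007, keyed by the
# publication year of the cited paper.
citations_in_2007 = {2004: 300, 2005: 150, 2006: 90}
citable_items = {2004: 100, 2005: 100, 2006: 100}

print(two_year_impact_factor(citations_in_2007, citable_items, 2007))  # 1.2
```

In this invented case the 300 citations to 2004 papers are the largest group, yet they fall outside the window and never count, which is the effect the editorial describes for slower-moving fields.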

2007 Journal Impact Factors are announced

The 2007 Impact Factors are now out (published on 17 June 2008). The ten Nature Publishing Group journals with the highest Impact Factors are as follows:

1 NAT REV MOL CELL BIO 31.921

2 NAT REV CANCER 29.190

3 NATURE 28.751

4 NAT REV IMMUNOL 28.300

5 NAT MEDICINE 26.382

6 NAT IMMUNOLOGY 26.218

7 NAT GENETICS 25.556

8 NAT REV NEUROSCI 24.520

9 NAT REV DRUG DISCOV 23.308

10 NAT BIOTECHNOLOGY 22.848

The Impact Factors of the Nature journals that publish original research are:

1 NATURE 28.751

2 NAT MEDICINE 26.382

3 NAT IMMUNOLOGY 26.218

4 NAT GENETICS 25.556

5 NAT BIOTECHNOLOGY 22.848

6 NAT MATERIALS 19.782

7 NAT CELL BIOLOGY 17.623

8 NAT NEUROSCIENCE 15.664

9 NAT METHODS 15.478

10 NAT NANOTECHNOLOGY 14.917

11 NAT PHYSICS 14.677

12 NAT CHEM BIOLOGY 13.683

13 NAT STRUCT MOL BIOLOGY 11.085

(Nature Photonics and Nature Geoscience are not old enough to have been awarded an Impact Factor this year.)

Readers can create their own lists of journals by subject area, title, Impact Factor or publisher, at ISI Web of Knowledge.

There is a free-access account at the Thomson ISI website that explains how the Impact Factor for journals is calculated.

Discussion of the 2007 Impact Factors, and of citation in science in general, is taking place at the Nature Network Citation in Science group, which you are warmly invited to join.

Nature Neuroscience on web traffic and citations

The June editorial in Nature Neuroscience (11, 619; 2008) discusses the relationship between web traffic and citations. The journal’s preliminary analysis indicates that the number of downloads a paper receives immediately following its appearance online correlates very well with its citation frequency years after publication. Noah Gray, one of the Nature Neuroscience editors, has written a post at Action Potential, the journal’s blog, to provide more of the details behind the data and analysis, and to initiate discussion. He writes (edited for length):

Everyone has their own pet problem with impact factors, but despite these concerns, these numbers are typically used to rate the importance or prominence of a particular journal, and thus by proxy, the importance of the individual papers published within. This is a seriously flawed use of association, leading scientists to often equate the total number of citations with scientific impact, which can be fraught with problems. Searching for an alternative measure of impact that is perhaps free of the “bias of authority” (citing a paper because it is from a famous lab) or the “lemming bias” (citing a paper just because everyone else seems to do so whenever broaching a particular subject) led us to explore readership….

The “number of downloads” measure potentially provides a piece of an alternative solution for deciphering the impact of an individual paper. In this current scientific climate where tenure and grant funding decisions are influenced by flawed metrics like impact factor, it is important to make good use of all available technology in an attempt to realize a better system of measuring the scientific impact of any particular paper. This analysis is obviously preliminary and flawed in its own ways, but perhaps metrics such as paper downloads can find a place in a compilation of aggregated stats, painting a more accurate and informative picture of manuscript influence.
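One way to check whether early downloads track later citations is a rank correlation, which asks only whether papers downloaded more often also tend to be cited more often. The sketch below computes Spearman's coefficient in plain Python. Neither the editorial nor the blog post, as quoted here, says which statistic the journal used, so the choice of Spearman, the function names and the sample numbers are all assumptions.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical papers: downloads soon after publication, citations years later.
downloads = [120, 340, 90, 560, 210]
citations = [4, 15, 2, 40, 9]
print(spearman(downloads, citations))
```

Because it uses ranks, the coefficient is not thrown off by a handful of papers with extremely high download or citation counts, which are common in this kind of data.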

The Nature Neuroscience editorial.

The Action Potential post and discussion.

Nature Network Citation in science forum discussion.

Nature Network Citation in science group homepage.

Further reading: Connotea bookmarks “citation”

Further reading: Connotea bookmarks “impact factor”: thanks to Bob O’Hara for this library.

EMBO Reports on research ranking metrics

In his Editorial ‘Measuring success’ in the April issue of EMBO Reports (9, 301; 2008), Frank Gannon looks at the pluses and minuses of metrics used to measure the success of scientists, their institutions and whole nations. He writes: “Notwithstanding the imperfection of the metrics, the resulting league tables are having real effects: university presidents world-wide await with trepidation the outcome of the latest scores. They know that it is easier to attract staff to a university that is moving up the ranking tables and this, inevitably, is leading to policy changes. Research areas that contribute little to the overall ranking might be closed and the appointments of new faculty members will reflect, to some extent, their potential to contribute to the university’s metric success. Perversely, universities are entering a time of greater competition when co-operation might in fact be more appropriate. Governments also watch what is happening in the league tables, which translates into funding decisions. In this way, the power of the tables becomes amplified—although in keeping with the maxim quoted at the beginning of this article, such measurements will probably improve research in the long run because they stimulate competition. Often an external wake-up call is needed to end complacency and instigate much needed changes.”

Citation rates not appropriate for funding assessment

Peter A. Todd of the National University of Singapore, and Richard J. Ladle of Oxford University, write in Correspondence this week (Nature 451, 244; 2008):

On 22 November, the Higher Education Funding Council for England announced that the assessment and funding of science-based disciplines will in future be “based on citation rates per paper, aggregated for each subject group at each institution”.

Changes in performance indicators always strongly influence individual and institutional behaviour and ‘citation game-playing’ will no doubt become a staple of coffee-room conversation. What is less clear is how the citation practices of authors may influence bibliometric indicators.

Citation practices are known to be imperfect. The documented problems include excessive citation of an author’s own work. Papers cited can be inappropriate or ambiguous in their support and, in some cases, the authors may not have read the papers they cite. Authors may form ‘citation coalitions’ within research networks. They may fail to provide citations to intellectual precursors or to work reporting conflicting conclusions. There are geographical and language biases. The increasing number of many-authored papers makes it impossible to have a clean-cut general metric in which one author is associated with one paper.

Taken together, these factors represent a problematic degree of error for the proposed bibliometric system of assessment. They place added responsibility on journal editors and reviewers as arbiters of appropriate author conduct.

Unfortunately, there are no simple solutions. Currently, identifying poor citation practices is not emphasized in the peer-review process, so perhaps journals could adopt a system of random citation audits, or periodically request evidence of citation appropriateness from authors. In reality, time constraints and the sheer volume of submissions to many journals mean that such measures are unlikely to be implemented soon.

Until referencing practices improve, we would argue that using citation rates to assess performance is fundamentally flawed.
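For reference, the indicator quoted at the start of the letter, “citation rates per paper, aggregated for each subject group at each institution”, could be computed along the following lines. This is only a sketch of the arithmetic: the record layout, function name, institutions and counts are invented, and it does nothing to correct for the citation practices the authors criticize.

```python
from collections import defaultdict

def citations_per_paper(papers):
    """Mean citations per paper for each (institution, subject) group.

    `papers` is an iterable of (institution, subject, citation_count) tuples.
    """
    totals = defaultdict(lambda: [0, 0])  # group -> [citations, paper count]
    for institution, subject, cites in papers:
        entry = totals[(institution, subject)]
        entry[0] += cites
        entry[1] += 1
    return {group: cites / count for group, (cites, count) in totals.items()}

# Hypothetical records.
records = [
    ("Univ A", "Biology", 12),
    ("Univ A", "Biology", 4),
    ("Univ A", "Geoscience", 3),
    ("Univ B", "Biology", 20),
]
rates = citations_per_paper(records)
print(rates[("Univ A", "Biology")])  # 8.0
```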