Guest post from Ellen Collins, Research Information Network.
The Research Information Network is a small independent policy consultancy working on scholarly communications. We’ve existed since 2005 in various guises, working with librarians, publishers, research funders and academics themselves to understand how researchers want to find, use and share information.
Our aim has always been to create an evidence base that will help others to make informed decisions about the best way to support researchers. We’ve worked with a number of methodologies and techniques over the years to do this, qualitative and quantitative.
When Nature Publishing Group approached us earlier this year to undertake a brief and independent statistical analysis of usage and citation data for Nature Communications, we were happy to do it. They wanted a report that they could use to kick off a bigger conversation about what the data might tell us about open access and what this means for article use and citation.
The data about the 2,878 articles published in Nature Communications was easily machine-harvestable, and therefore fairly basic. For every article published between the journal’s launch in April 2010 and the end of 2013 we were given its open access status (open or not), discipline, year and date of publication, Web of Knowledge citation data and, where available, Altmetric scores. For the articles published in the first half of 2013, we were also given the number of HTML views and PDF downloads, 90 and 180 days after publication.
From this, we did some fairly simple analysis. We grouped the articles, first by subject area and subsequently by year of publication, and compared citation and usage information for open access and subscription articles in each group. We used SPSS – a standard and widely used statistical package originally designed for social scientists and thus appropriate for this piece of social science research – to run the analysis. The Wilcoxon rank-sum test identified statistically significant differences between OA and subscription content, and a simple formula gave us effect sizes, which standardise the differences and allow comparison within and across studies (for more detail, see the full report).
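For readers who would like to try this kind of comparison on the released data, here is a minimal sketch of a rank-sum test in plain Python. It is not the SPSS procedure we ran. The function name, the normal approximation with a tie correction, and the effect size r = |Z| / sqrt(N) are all assumptions made for illustration; the formula we actually used is set out in the full report. The numbers in the usage example are invented, not taken from the Nature Communications data.

```python
import math


def rank_sum_test(open_access, subscription):
    """Wilcoxon rank-sum test (normal approximation, tie-corrected).

    Hypothetical helper for illustration only. Returns (z, p, r), where
    r = |z| / sqrt(N) is one commonly used effect size for rank tests;
    it is assumed here, not taken from the report.
    """
    n1, n2 = len(open_access), len(subscription)
    n = n1 + n2
    if n1 == 0 or n2 == 0:
        raise ValueError("both groups need at least one value")

    # Pool both groups, remembering which group each value came from.
    pooled = sorted(
        [(v, 0) for v in open_access] + [(v, 1) for v in subscription]
    )

    rank_sum_oa = 0.0  # sum of ranks held by the open access group
    tie_term = 0.0     # sum of (t^3 - t) over each run of tied values
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        # Tied values share the average of ranks i+1 .. j.
        mid_rank = (i + 1 + j) / 2
        t = j - i
        tie_term += t ** 3 - t
        rank_sum_oa += mid_rank * sum(
            1 for k in range(i, j) if pooled[k][1] == 0
        )
        i = j

    expected = n1 * (n + 1) / 2
    variance = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if variance == 0:
        raise ValueError("all values are identical; the test is undefined")

    z = (rank_sum_oa - expected) / math.sqrt(variance)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value
    r = abs(z) / math.sqrt(n)             # assumed effect size formula
    return z, p, r


# Invented citation counts for two small groups of articles.
oa_citations = [12, 15, 9, 20, 7]
subscription_citations = [5, 8, 6, 11, 4]
z, p, r = rank_sum_test(oa_citations, subscription_citations)
print(f"z = {z:.3f}, p = {p:.3f}, r = {r:.3f}")
```

A rank-based test suits citation and usage counts because they tend to be heavily skewed, and ranks are not dominated by a handful of very highly cited articles. With groups as small as the invented ones here the normal approximation is rough, so an exact test would be the safer choice.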
We found, on the whole, a small positive effect on citations for the OA articles, and a larger positive effect for the online usage measures of HTML views and PDF downloads from the Nature Communications website. This chimes with the findings of other studies, which also suggest that OA has a positive effect on the visibility of work that is hosted online.
But it’s not a definitive answer to the question about what happens when an article is made OA, and it would be a mistake for anyone to read it as such. Our analysis was limited by a number of factors, most importantly data availability. A number of other variables could be important in determining citation counts, and especially usage from the publisher’s own website. We had no data on these variables, so our findings can only ever claim to be partial.
As we identify within the report, open access is not limited to gold publishing models. Green OA is a very important and well-established route in disciplines such as physics, and it may be that online usage for the articles published under the subscription models would have been a lot higher if we’d had some data from, for example, arXiv. (The same might be true for biological sciences and PubMed Central.)
Usage and citation figures might also be affected by certain characteristics of the authors or their articles which may correlate with the choice of open access or subscription business models for publication. We identify some in the report – for example, an author might choose to make their best work open access, explaining the citation advantage for OA articles. But there are many others. Perhaps more senior authors are both more likely to be cited and more likely to have access to the funding needed to pay for gold OA. Authors might be more likely to share their OA articles on social media, leading to more clickthroughs to the HTML or PDF on the publisher’s website.
There will be many other factors that we haven’t considered – and that’s why we’re so pleased that Nature Publishing Group are releasing the data for other researchers to undertake their own analysis. Our work is very much a starting point, and we look forward to seeing how others take it forward. That, after all, is what research is all about.
