News blog

The decline and fall of Microsoft Academic Search

Five years after it launched, Microsoft’s free scholarly search engine has fallen into shabby disrepair, failing to track even a fraction of papers published since 2011. But the team behind the product says that they are shifting their focus to a yet-to-be-released, next-generation version of the service.

A few years ago, Microsoft Academic Search (MAS) was vying with Google Scholar to be the web’s pre-eminent free scholarly search engine. Both products indexed tens of millions of scholarly documents, tracked their citations, and made profile pages for academics. MAS, apparently envisaged as a research project as well as a free tool, seemed to have the edge on some features, such as visualizing connections between research fields. The stage was set for bibliometric battle.

But the competition never happened. A team of Spanish researchers who study science communication at the University of Granada, led by Emilio Delgado López-Cózar, decided to compare Google Scholar and MAS. They discovered, to their surprise, that Microsoft’s product had been failing to efficiently index scholarly documents since around 2011. (Last year, it captured only 8,000-odd documents.) “Is Microsoft Academic Search dead?” they asked in a working paper published on the arXiv preprint server on 28 April.

Others had noticed the issue too, judging from complaints left on the service’s message board last year, to which the only answer given was that the company was “actively working on indexing additional content”.

A phoenix may be rising from the ashes. Asked about the collapse, a spokesperson for Microsoft Research declined to address the problem directly, writing in an e-mail:

“Microsoft Academic Search (MAS) continues as a research project within Microsoft Research. Over the years, we have used the service as a mechanism to explore various challenges related to searching scholarly works, including author disambiguation, relative influence of publications, and graphs of related authors.”

But, he added:

“In parallel, Microsoft Research began an initiative on a next-generation version of MAS, which focuses on enhancing the user experience and evolving it from a research project to an integrated offering within Microsoft’s services portfolio. During this transition, Microsoft has maintained the features, functionality, and the ability for third parties to enter new and updated content into the existing search engine, but the majority of our focus has now shifted to this new initiative.”

He later clarified that the new version, yet to be released, would remain free. At one stage, the company had wondered whether to “evolve the service through third-party collaborators”, he said, but in the end decided to keep the product within Microsoft. The Spanish team notes that the lack of fuss about MAS’s sudden decline suggests not many people were actually using it.

Indeed, Google Scholar has far outstripped MAS by now. It can find about 99.3 million, or 87%, of an estimated 114 million English-language scholarly documents on the web, according to a study published last week by Lee Giles and Madian Khabsa at Pennsylvania State University in University Park (PLOS ONE 9, e93949; 2014). ‘Documents’ include books, technical reports and other grey literature, and the computer scientists arrived at the total by combining results from Google Scholar and MAS.

At least 24% are freely available, they added. In a score of well-known journals (those classified as ‘multidisciplinary’ under MAS, a category that includes not only Nature, Science, Proceedings of the National Academy of Sciences and PLOS ONE but also Nano Letters, Journal of Applied Meteorology, Journal of the Royal Society Interface and others), 43% are free, give or take an estimate error of 10%.
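For readers curious how two overlapping indexes can be combined into an estimate of a total, here is a minimal sketch using the Lincoln-Petersen capture-recapture estimator. It is an assumption here that this resembles the study's approach; the article says only that results from Google Scholar and MAS were combined. The function name and the sample counts are hypothetical and are not figures from the study. The last lines simply check the 87% share quoted above.

```python
def lincoln_petersen(n_first: int, n_second: int, n_both: int) -> float:
    """Estimate a total population from two overlapping samples.

    n_first  -- items found by the first source
    n_second -- items found by the second source
    n_both   -- items found by both sources
    """
    if n_both <= 0:
        raise ValueError("the two samples must overlap to give an estimate")
    return n_first * n_second / n_both


# Hypothetical counts, for illustration only (not data from the study).
found_by_a = 800
found_by_b = 500
found_by_both = 400

estimated_total = lincoln_petersen(found_by_a, found_by_b, found_by_both)
share_found_by_a = found_by_a / estimated_total

print(f"estimated total: {estimated_total:.0f}")
print(f"share found by source A: {share_found_by_a:.0%}")

# Check of the share quoted in the article: 99.3 million of 114 million.
reported_share = 99.3 / 114
print(f"reported Google Scholar share: {reported_share:.0%}")
```

The smaller the overlap between the two sources relative to their sizes, the larger the estimated total, which is why the method needs two independent indexes rather than one.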

Even Google Scholar has its weaknesses, however, the team notes. One is that it doesn’t provide an automated way for computer programs to make searches in the tool through an application programming interface (API), so searches must be made by hand. It was only by using MAS’s API that the team could download and randomly sample documents for their survey. And of course, quantity is not necessarily quality: Google Scholar indexes more documents than do subscription products such as Thomson Reuters’ Web of Science or Elsevier’s Scopus databases, but it may not yet match their reliability.
