
March 24, 2006

Wikipedia vs Britannica Part Deux

Further to the comparison of Wikipedia and Britannica in the 15 December 2005 issue of Nature, Britannica has taken issue with that study. Nature has responded with this statement.

I don't really have anything to add; people can judge for themselves. But for anyone who wants to read more, I'm collecting links to this stuff on Connotea under the tag "wikipedia".

March 22, 2006

How computing will change science

This week's Nature, which has just landed on my desk (yup, I still like to see the dead-tree version ;), is a special issue on scientific computing. It's well worth a read, and all the relevant articles are freely available.

Declan Butler, one of the journalists involved in putting it together, has the lowdown on this blog:

Barely a month after Google Earth made the front cover of Nature, computing is back on the cover again. Tomorrow’s issue contains a big special on the future of scientific computing. All the articles are free, thanks to sponsorship from Microsoft; the special was produced in conjunction with the 2020 report published today by an international group of experts convened by Microsoft. The special is, however, of course completely editorially independent of Microsoft.

The special, by journalists and top computing experts, looks at some of the key emerging technologies and concepts that look set to have a major impact on scientific computing by 2020. I’ve a three pager on “sensor webs” – “2020 computing: Everything, everywhere” — in it; there is also a short pop-up box — “Batteries not included” — on the problems of powering these small remote devices.

The full list of articles is here. There's also an editorial, but it doesn't seem to be online as I write. (I'll post a link in the comments as soon as I see it appear.)

Disclosure: I contributed a bit to the 2020 Science report that Declan mentions above. Whether or not people agree with its conclusions, I think I speak for all the authors when I say that we hope this Nature special issue proves to be just the first of many analyses and activities inspired by it. Microsoft Research, particularly Stephen Emmott, deserve credit for the huge effort they put into this initiative.

Jon Kleinberg on web search

I'm running a bit late with this, but last week's issue of Nature includes a book review by the ever-excellent Jon Kleinberg. He writes about The Search by John Battelle and The Google Story by David Vise & Mark Malseed.

I understand that this review will be open to non-subscribers for the next week or two, so if you don't have a subscription to Nature then (i) shame on you ;), and (ii) enjoy Jon's article while you can.

March 17, 2006

David Lipman visits Nature

We were very lucky a couple of weeks ago to have David Lipman, director of the NCBI, come to visit us in London. David was kind enough to give a talk to assembled NPG staff. Here are my notes:

PubMed/Medline records have grown linearly since the late 1960s. But GenBank and other databases show closer to exponential growth. NCBI serves up to 1.4m users a day and these users are downloading ~2.25 Terabytes each day. Generally speaking, growth in usage parallels closely the growth in the amount of data in the database.

NCBI spends 90% of its budget on people. 20% is for basic research with the other 80% involved in 'production'. 75% or more of the production side is involved with sequence data.

Overview

Main area of activity is the 'Sequence Core', which is composed of:

  • Sequence repository (e.g., GenBank)
  • Assembly, integration, annotation & curation (e.g., RefSeq)
  • Comparison and classification (e.g., BLAST)

Other activities include:

  • Retrieval Core (e.g., Entrez)
  • Text Core (e.g., PubMed)
  • Visualisation Core

Recent projects:

  • PubChem
  • GAIN (with Pfizer)
  • Framingham: Long-term study (since 1940s) of a town in MA, especially heart data. Now surviving participants being genotyped.

PubMed Central

PMC started as an archive for journals that choose to deposit their content. XML DTD adopted by HighWire, JSTOR, PLoS, Atypon and others.

Portable PMC allows quick setup of a local mirror of PMC (e.g., Wellcome Trust and BL in the UK).

Literature Archiving Software Suite (LASS): Takes books and articles in NLM DTD and allows search, rendering for the web, etc. Now working on a Word-based authoring tool.

PMC submission system. PMC submission rate is still very low (<5% of NIH grantees), which was predictable because it's not mandatory and makes no difference to future funding. A lot of discussion now about whether it should be made mandatory. 80-85% of grantees know about the policy, but are often sketchy on the details.

Making Entrez more user-friendly

Real-world example of using Entrez: Searches in Entrez return results from across the various databases (e.g., "Fanconi renal failure"). The entry in OMIM (annotated bibliography) mentions a family in Wisconsin with a genetic version of the disease. One of the papers in OMIM locates the gene. Skipping to Entrez Gene, we can get information about this gene: it is hypothetical and there have been no experiments on it to date. Precomputed BLAST results show this protein to be very well conserved across eukaryotes but with little experimental data. Zeroing in on yeast to look for data shows that it is a sodium stress-response regulator. This makes sense given its apparent role in a renal disorder.

This took about 10 mins to discover, but unfortunately most Entrez users wouldn't do this. They follow a simpler, Google-like pattern of typing queries and browsing results. In this respect, scientists don't use the web in a very different way to other types of users.

A while ago Entrez introduced a search term spellcheck (like Google's). This works really well and a lot of people click on the link to a search that uses the correct spelling. In contrast, the "Links" option provided in GenBank is very obscure and little used. The expectation that scientists would work out how to use this sort of feature proved to be incorrect.

Now trying to determine which additional links are most valued by users and give these greater prominence. For example, PubMed's "Related Articles" link is used by only 4% of users. If some information (e.g., partial titles of the related articles) were presented then the link ought to be used a lot more. NCBI will be trying out these kinds of changes with small proportions of users (e.g., 1% = 10k users a day) and measuring the effect.
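To make the "small proportion of users" idea concrete, here is a minimal sketch of one common way to pick a stable 1% sample: hash an identifier and compare it against a threshold. This is only an illustration, not a description of how NCBI does it. The function name, the experiment label and the user IDs are all made up for the example.

```python
import hashlib


def in_experiment(user_id: str, experiment: str, fraction: float = 0.01) -> bool:
    """Decide whether a user falls into the trial group for an experiment.

    Hypothetical helper: hashing the experiment name together with the user
    ID gives each user a stable pseudo-random position in [0, 1), so the
    same user always sees the same variant.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).digest()
    # Interpret the first 8 bytes as an integer and scale it into [0, 1).
    position = int.from_bytes(digest[:8], "big") / 2**64
    return position < fraction


# Example: roughly 1% of a set of made-up user IDs end up in the trial group.
users = [f"user-{n}" for n in range(100_000)]
trial_group = [u for u in users if in_experiment(u, "related-articles-titles")]
share = len(trial_group) / len(users)
```

Because the assignment depends only on the ID and the experiment label, no per-user state needs to be stored, and a different label yields a different 1% of users.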

Taking full advantage of the connected information space requires more work by the server to determine what might be of most value to users (in the NCBI's case, scientists) in any given context.

March 16, 2006

Tagging Tool on the Southampton ECS EPrints Service

The University of Southampton's Electronics and Computer Science department have added the Tagging Tool to their EPrints repository.

See, for example, this item, where the tags and related articles seem to be connecting it nicely with other relevant material.

March 13, 2006

Tagging and Bookmarking In Institutional Repositories

If you're familiar with Connotea, our free online information management tool, or with the general idea of social bookmarking then you'll know what we mean when we say that we've released some software that adds tagging and social bookmarking to EPrints-based institutional repositories. On top of that, it uses tags and bookmarks to recommend related articles.

If not, then I'll try to explain.

Institutional repositories are online document archives in which researchers can deposit and share copies of their work. Perhaps the best known is PubMed Central, run by the NIH, but there are many others and NPG encourages its authors to use them. EPrints is a popular package, developed at the University of Southampton, for creating these repositories.

Social bookmarking is the process of saving your bookmarks (or links, or favourites, whichever term you prefer) on a website and making them available for others to see. Most social bookmarking services use tags to help users organize their collections. Tags are just keywords or labels for bookmarks: for example, if you bookmarked this article you might tag it with "institutional repositories", "connotea", "bookmarking" and "NPG". Del.icio.us is perhaps the best known social bookmarking service. Connotea is a similar service that is tailored specifically for use by scientists and other academics.

But back to institutional repositories: each article in a repository has an information page listing the title, authors, where and when it was published, and giving a link to the author-deposited copy. What we've done is to create an extension to EPrints that supplements this information with a Tags and Related Articles section, plugging it into either del.icio.us or Connotea. In fact, since the software behind Connotea is open source, this tool will work with any service based on Connotea Code.

The Tagging Tool, as we call it, lists tags that have been applied to the article you're viewing in the repository. Clicking on a tag brings up a list of other articles that share that tag. You can also quickly save the article to your own collection on del.icio.us or Connotea. All this happens within EPrints, without you having to leave the article information page.

The tool also shows related articles directly. It calculates these based on shared tags and popularity — articles that share more tags with the one you're looking at will appear higher in the list, with articles that have been bookmarked by more people appearing higher than less popular articles. This method is a bit experimental, and we're hoping for feedback from repository users about this, as well as the other features. If you're an institutional repository administrator, download and install the tool — it has minimal impact on the rest of EPrints, so is easy to experiment with. If you're an IR user, please point your administrator in the same direction.
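For the curious, the ranking idea can be sketched in a few lines of Python. This is a simplified illustration of "shared tags first, popularity second", not the Tagging Tool's actual code. The function name, the data layout and the sample articles are invented for the example.

```python
def related_articles(target, tags_by_article, bookmark_counts, limit=5):
    """Rank other articles by shared tags, then by number of bookmarks.

    Assumed inputs (hypothetical layout):
      tags_by_article  maps an article ID to the set of tags applied to it
      bookmark_counts  maps an article ID to how many people bookmarked it
    """
    target_tags = tags_by_article[target]
    candidates = []
    for article, tags in tags_by_article.items():
        if article == target:
            continue
        shared = len(target_tags & tags)
        if shared == 0:
            continue  # nothing in common, so not related
        candidates.append((shared, bookmark_counts.get(article, 0), article))
    # More shared tags first; ties broken by popularity, then by ID for stability.
    candidates.sort(key=lambda c: (-c[0], -c[1], c[2]))
    return [article for _, _, article in candidates[:limit]]


# Made-up sample data.
tags_by_article = {
    "a": {"repositories", "tagging", "eprints"},
    "b": {"repositories", "tagging"},
    "c": {"repositories", "tagging"},
    "d": {"eprints"},
    "e": {"genomics"},
}
bookmark_counts = {"b": 3, "c": 10, "d": 50}

ranked = related_articles("a", tags_by_article, bookmark_counts)
```

In this sample, "c" and "b" both share two tags with "a", so they outrank the more heavily bookmarked "d", which shares only one. Between the two, "c" comes first because more people have bookmarked it.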

What does this development mean for institutional repositories? At a functional level, it offers an alternative way of navigating the repository content and finding relevant material, one that is based on readers' behaviour and opinions. At a wider level, because repository content will be bookmarked directly in online, public services like Connotea or del.icio.us, it will increase the exposure of that content, and connect it more directly with the rest of the academic literature. All good things, we hope.

The work behind this development was funded by the Joint Information Systems Committee, as part of their PALS Metadata and Interoperability Projects 2 programme.

"Nascent Web publishing efforts have their genesis in a burning need to say something, but their ultimate success comes from people wanting to listen, needing to hear each other’s voices, and answering in kind."
Rick Levine
The Cluetrain Manifesto
