
Nature.com adds metadata

Nature.com has now added metadata (using HTML meta tags) to all its newly published pages, including full-text, abstract and landing pages (all bar four titles, which are currently being worked on). Metadata coverage extends back through the Nature archives, though the depth of coverage varies by title. This conforms to Checkpoint 13.2 of the W3C's Web Content Accessibility Guidelines 1.0, which exhorts content publishers to "provide metadata to add semantic information to pages and sites".

Metadata is provided in Dublin Core (DC) and PRISM formats, as well as in a bespoke Google format (the citation_* tags). The encoding generally follows the DCMI recommendation "Expressing Dublin Core metadata using HTML/XHTML meta and link elements" and the earlier RFC 2731, "Encoding Dublin Core Metadata in HTML".
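As a rough illustration of that convention, the Python sketch below renders one link element declaring the schema prefix, followed by one meta element per value, so a repeated element such as dc.creator simply becomes repeated tags. The function name encode_meta and its dict-of-lists input are hypothetical and not part of any Nature.com tooling. Writing non-ASCII characters as numeric character references (so © becomes &#169;) is an assumption about the desired output, not a requirement of the DCMI recommendation.

```python
import html


def encode_meta(prefix: str, schema_uri: str, fields: dict[str, list[str]]) -> str:
    """Render a schema <link> plus one <meta> per value, in the RFC 2731 / DCMI style.

    `fields` maps an element name (e.g. "creator") to its values; an element
    with several values (e.g. two creators) becomes several <meta> tags.
    Hypothetical helper for illustration only.
    """

    def attr(value: str) -> str:
        # Escape &, <, > and quotes, then write non-ASCII characters (e.g. ©)
        # as numeric character references such as &#169;.
        escaped = html.escape(value, quote=True)
        return escaped.encode("ascii", "xmlcharrefreplace").decode("ascii")

    # The schema declaration comes first, e.g. rel="schema.dc".
    lines = [
        f'<link title="schema({prefix.upper()})" rel="schema.{prefix}" '
        f'href="{attr(schema_uri)}" />'
    ]
    for element, values in fields.items():
        for value in values:
            lines.append(f'<meta name="{prefix}.{element}" content="{attr(value)}" />')
    return "\n".join(lines)
```

For example, calling encode_meta("dc", "http://purl.org/dc/elements/1.1/", {"creator": ["Midori Murakami", "Tsutomu Kouyama"]}) returns the schema link followed by two dc.creator meta tags. Keeping one tag per value, rather than joining values into one string, is what lets a consumer tell the two creators apart.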

The actual HTML metadata sets from an example landing page are presented below.

If you view the HTML page source you should see something like the text below. (Note that you may have to scroll past whitespace which is emitted by the HTML template generator.)

<link title="schema(DC)" rel="schema.dc" href="http://purl.org/dc/elements/1.1/" />
<meta name="dc.publisher" content="Nature Publishing Group" />
<meta name="dc.language" content="en" />
<meta name="dc.rights" content="&#169; 2008 Nature Publishing Group" />
<meta name="dc.title" content="Crystal structure of squid rhodopsin" />
<meta name="dc.creator" content="Midori Murakami" />
<meta name="dc.creator" content="Tsutomu Kouyama" />
<meta name="dc.identifier" content="doi:10.1038/nature06925" />
					
<link title="schema(PRISM)" rel="schema.prism" href="http://prismstandard.org/namespaces/1.2/basic/" />
<meta name="prism.copyright" content="&#169; 2008 Nature Publishing Group" />
<meta name="prism.rightsAgent" content="permissions@nature.com" />
<meta name="prism.publicationName" content="Nature" />
<meta name="prism.issn" content="0028-0836" />
<meta name="prism.eIssn" content="1476-4687" />
<meta name="prism.volume" content="453" />
<meta name="prism.number" content="7193" />
<meta name="prism.startingPage" content="363" />
<meta name="prism.endingPage" content="367" />

<meta name="citation_journal_title" content="Nature" />
<meta name="citation_publisher" content="Nature Publishing Group" />
<meta name="citation_authors" content="Midori Murakami, Tsutomu Kouyama" />
<meta name="citation_title" content="Crystal structure of squid rhodopsin" />
<meta name="citation_volume" content="453" />
<meta name="citation_issue" content="7193" />
<meta name="citation_firstpage" content="363" />
<meta name="citation_doi" content="doi:10.1038/nature06925" />
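Because the prefixes are declared by the schema.dc and schema.prism link elements, a consumer can map names such as dc.creator back to full element URIs. The sketch below uses the html.parser module from Python's standard library to collect those declarations together with the meta name/value pairs. The class name HeadMetadataParser is hypothetical. Unprefixed names such as citation_title are kept but not resolved, since no schema link is declared for them.

```python
from collections import defaultdict
from html.parser import HTMLParser


class HeadMetadataParser(HTMLParser):
    """Collect schema.* <link> declarations and <meta name=... content=...> pairs.

    Hypothetical helper for illustration only.
    """

    def __init__(self):
        super().__init__()
        self.schemas = {}              # prefix -> namespace URI, e.g. "dc" -> DC 1.1 URI
        self.meta = defaultdict(list)  # meta name -> list of values (repeats kept)

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag/attribute names and decodes entities in values;
        # self-closing tags such as <meta ... /> are routed here by default too.
        a = dict(attrs)
        rel = a.get("rel") or ""
        if tag == "link" and rel.lower().startswith("schema."):
            self.schemas[rel[len("schema."):].lower()] = a.get("href") or ""
        elif tag == "meta" and a.get("name"):
            self.meta[a["name"]].append(a.get("content") or "")

    def resolved(self):
        """Yield (namespace URI + element, value) for names whose prefix has a schema link."""
        for name, values in self.meta.items():
            prefix, dot, element = name.partition(".")
            uri = self.schemas.get(prefix.lower())
            if dot and uri:
                for value in values:
                    yield uri + element, value
```

Feeding it the markup above and iterating over resolved() would yield pairs such as ("http://purl.org/dc/elements/1.1/creator", "Midori Murakami") and ("http://prismstandard.org/namespaces/1.2/basic/volume", "453").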


We do not expect search engines to index these terms directly, and no direct SEO (search engine optimization) is intended, but we think the terms are valuable enough for applications to make use of them. They are reasonably accessible to simple scripts: even RFC 2731 (published in 1999) lists, in Section 9, a Perl script that pulls out the metadata name/value pairs. Running that script over the example page yields the following output:

@(urc;
@|MISSING ELEMENT NAME; text/css
@|MISSING ELEMENT NAME; text/html; charset=iso-8859-1
@|robots; noarchive
@|keywords; Nature, science, science news, biology, physics, genetics, astronomy, astrophysics, quantum physics, evolution, evolutionary biology, geophysics, climate change, earth science, materials science, interdisciplinary science, science policy, medicine, systems biology, genomics, transcriptomics, palaeobiology, ecology, molecular biology, cancer, immunology, pharmacology, development, developmental biology, structural biology, biochemistry, bioinformatics, computational biology, nanotechnology, proteomics, metabolomics, biotechnology, drug discovery, environmental science, life, marine biology, medical research, neuroscience, neurobiology, functional genomics, molecular interactions, RNA, DNA, cell cycle, signal transduction, cell signalling.
@|description; Nature is the international weekly journal of science: a magazine style journal that publishes full-length research papers in all disciplines of science, as well as News and Views, reviews, news, features, commentaries, web focuses and more, covering all branches of science and how science impacts upon all aspects of society and life.
@|dc.publisher; Nature Publishing Group
@|dc.language; en
@|dc.rights; #169; 2008 Nature Publishing Group
@|dc.title; Crystal structure of squid rhodopsin
@|dc.creator; Midori Murakami
@|dc.creator; Tsutomu Kouyama
@|dc.identifier; doi:10.1038/nature06925
@|prism.copyright; © 2008 Nature Publishing Group
@|prism.rightsAgent; permissions@nature.com
@|prism.publicationName; Nature
@|prism.issn; 0028-0836
@|prism.eIssn; 1476-4687
@|prism.volume; 453
@|prism.number; 7193
@|prism.startingPage; 363
@|prism.endingPage; 367
@|citation_journal_title; Nature
@|citation_publisher; Nature Publishing Group
@|citation_authors; Midori Murakami, Tsutomu Kouyama
@|citation_title; Crystal structure of squid rhodopsin
@|citation_volume; 453
@|citation_issue; 7193
@|citation_firstpage; 363
@|citation_doi; doi:10.1038/nature06925
@)urc;
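The RFC's Perl script is not reproduced here, but a rough Python equivalent of what it does is short. The sketch below, built around the hypothetical helper extract_urc, emits one "@|name; value" line per meta element between the "@(urc;" and "@)urc;" markers. Substituting "MISSING ELEMENT NAME" for meta elements that lack a name attribute (such as http-equiv tags) mirrors the output above. It is an assumption about the original script's behaviour, not a reading of its source.

```python
from html.parser import HTMLParser


class _MetaCollector(HTMLParser):
    """Record (name, content) for every <meta> element, in document order."""

    def __init__(self):
        super().__init__()
        self.pairs = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        a = dict(attrs)
        # Tags such as <meta http-equiv=...> have no name; use the placeholder
        # seen in the output above (an assumption about the Perl script).
        name = a.get("name") or "MISSING ELEMENT NAME"
        # Collapse runs of whitespace so each pair fits on a single line.
        content = " ".join((a.get("content") or "").split())
        self.pairs.append((name, content))


def extract_urc(html_text: str) -> str:
    """Return the page's <meta> pairs wrapped in @(urc; ... @)urc; markers."""
    collector = _MetaCollector()
    collector.feed(html_text)
    collector.close()
    body = [f"@|{name}; {content}" for name, content in collector.pairs]
    return "\n".join(["@(urc;", *body, "@)urc;"])
```

To run it over a saved copy of the page, something like print(extract_urc(open("page.html", encoding="iso-8859-1").read())) should work. The file name is hypothetical, and the encoding follows the charset=iso-8859-1 reported above. Because html.parser decodes character references, the dc.rights line comes out as "© 2008 Nature Publishing Group" rather than "#169; 2008 Nature Publishing Group".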

We look forward to seeing applications making use of this metadata and providing new value for users.


Comments

And the citation_year??

Hi!

Great stuff. However, I wonder why you haven't used microformats or RDFa. These seem to be more state of the art for embedding metadata in HTML pages.

Hi Noel:

Good catch. We thought we had included the citation date but it seems to have gone missing as you rightly point out. We'll get it added back in.

Cheers,

Tony

Hi Valentin:

Well, we're constantly tracking the development of microformats and RDFa and will add these to our pages as and when we can. For the moment the page bodies are produced by a separate process, and we need to bring that up to speed. We thought that adding HTML meta tags to the page heads would be a good starting point.

You might also want to look at the Nature Network and/or Nature News sites, where we have begun deploying microformats. Feel free to comment on our implementation; we'd like to hear what folks think of it.

Cheers,

Tony
