XMP Labelling for Nature
There was an interesting piece by Steve Mollman on CNN.com yesterday (Making sense of the 'semantic Web') which put forward this example:
"The kiosk takes advantage of the fact that MP3 files are "things" that have already been described in ways that machines can understand. That's because they have ID3 tags, which supply information on the artist and album. An MP3 file on an iPhone is already a semantic annotated object, which means it's easily read by a computer."
Now if that story had talked about a PDF instead of an MP3, and had substituted XMP packets for ID3 tags, then any scholarly article could lay claim to being a "semantic annotated object ... easily read by a computer".
Yesterday's issue of Nature was the first NPG title to go live with such marked-up PDFs. The screenshots below from Acrobat (File > Properties, Cmd-D / Ctrl-D) show what the user might see both with (bottom-left) and without (top-right) semantic markup.

Fair enough as far as that goes, but to a machine it's a whole other game. We now have a complete bibliographic record (including the DOI) embedded in the PDF as structured markup. Moreover, we have a solid bedrock for adding any further metadata should the need arise. This semantic labelling is available on all new issues of Nature and will be added to other NPG titles over the coming months.
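To make "a bibliographic record embedded as structured markup" concrete, here is a minimal Python sketch (standard library only) of how a program might read such a record back out of a PDF. It assumes the XMP metadata stream is stored uncompressed, so that a raw byte scan for the xpacket wrapper finds it, and it simply takes the first packet it meets; a robust reader would follow the document catalog's /Metadata entry instead, since images inside a PDF can carry packets of their own. The helper names are hypothetical, and the PRISM namespace version and the prism:doi property are assumptions for illustration; the packets NPG actually ships may record the DOI differently (for example in dc:identifier).

```python
import re
import xml.etree.ElementTree as ET
from typing import Optional

# RDF and Dublin Core URIs are standard; the PRISM version is an assumption.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
    "prism": "http://prismstandard.org/namespaces/basic/2.0/",
}

# An XMP packet sits between <?xpacket begin=...?> and <?xpacket end=...?>.
PACKET_RE = re.compile(rb"<\?xpacket begin=.*?\?>(.*?)<\?xpacket end=.*?\?>", re.DOTALL)


def extract_xmp(pdf_bytes: bytes) -> Optional[bytes]:
    """Return the body of the first XMP packet found by a raw byte scan, or None."""
    match = PACKET_RE.search(pdf_bytes)
    return match.group(1).strip() if match else None


def simple_value(root: ET.Element, qname: str) -> Optional[str]:
    """Read a simple property written either as an element or as an rdf:Description attribute."""
    element = root.find(f".//{qname}", NS)
    if element is not None:
        return element.text
    prefix, local = qname.split(":")
    for desc in root.iter(f"{{{NS['rdf']}}}Description"):
        value = desc.get(f"{{{NS[prefix]}}}{local}")
        if value is not None:
            return value
    return None


def read_record(xmp: bytes) -> dict:
    """Pull a small bibliographic record out of an XMP packet body (hypothetical helper)."""
    root = ET.fromstring(xmp)
    title = root.find(".//dc:title/rdf:Alt/rdf:li", NS)
    return {
        "title": title.text if title is not None else None,
        # dc:creator is an ordered rdf:Seq, so authors come back in the order written.
        "authors": [li.text for li in root.findall(".//dc:creator/rdf:Seq/rdf:li", NS)],
        "doi": simple_value(root, "prism:doi"),
    }
```

A reference manager could call read_record(extract_xmp(data)) on the bytes of a downloaded article and file it by title, authors and DOI without any manual typing.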
XMP as a labelling technology could well go a long way towards addressing concerns raised by Olivia Judson in an op-ed piece in the New York Times earlier this week (Defeating Bedlam). The author laments that "downloading papers from journal Web sites" means that "access to information is easier and faster than ever before ... but there’s been no obvious way to manage it once you’ve got it." Those days may soon be over.
With XMP, all manner of scholarly content, including documents, images and other media types, can be properly labelled, and many programs (not just Zotero and Papers, which she reviews) can profit directly from the richness of semantic web descriptions.
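Labelling works in the other direction too. The following Python sketch serializes a minimal XMP packet using Dublin Core fields for the title, the ordered list of authors and a description. build_xmp and lang_alt are illustrative names, and the PRISM namespace and prism:doi property are assumptions rather than necessarily what NPG writes. Embedding the resulting packet into a PDF, image or other file is left to an XMP-aware library or tool.

```python
from xml.sax.saxutils import escape

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC = "http://purl.org/dc/elements/1.1/"
PRISM = "http://prismstandard.org/namespaces/basic/2.0/"  # assumed PRISM version


def lang_alt(text: str) -> str:
    """Wrap text as an XMP language alternative with a single x-default entry."""
    return f'<rdf:Alt><rdf:li xml:lang="x-default">{escape(text)}</rdf:li></rdf:Alt>'


def build_xmp(title: str, authors: list, description: str, doi: str) -> str:
    """Serialize a minimal XMP packet labelling one article (hypothetical helper)."""
    # dc:creator is an ordered array (rdf:Seq), so author order is kept.
    creators = "".join(f"<rdf:li>{escape(name)}</rdf:li>" for name in authors)
    body = (
        '<x:xmpmeta xmlns:x="adobe:ns:meta/">'
        f'<rdf:RDF xmlns:rdf="{RDF}">'
        f'<rdf:Description rdf:about="" xmlns:dc="{DC}" xmlns:prism="{PRISM}">'
        f"<dc:title>{lang_alt(title)}</dc:title>"
        f"<dc:creator><rdf:Seq>{creators}</rdf:Seq></dc:creator>"
        # Acrobat shows dc:description as "Subject" in Document Properties.
        f"<dc:description>{lang_alt(description)}</dc:description>"
        f"<prism:doi>{escape(doi)}</prism:doi>"
        "</rdf:Description></rdf:RDF></x:xmpmeta>"
    )
    # About 2 KB of whitespace padding lets tools edit the packet in place.
    padding = (" " * 99 + "\n") * 20
    return (
        '<?xpacket begin="\ufeff" id="W5M0MpCehiHzreSzNTczkc9d"?>\n'
        + body + "\n" + padding
        + '<?xpacket end="w"?>'
    )
```

Writing dc:creator as an rdf:Seq rather than an unordered bag is what lets a reader recover the author list in its published order.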

Comments
This is interesting, thanks. All journals should do this; it's clearly good practice! A few comments: you only include one author; I can't see why the whole set of authors can't be included (unless 100 authors breaks PDF ;-). And no keywords, why not?
Also, the PDF is not a Tagged PDF. Tagged PDF is better for screen readers and, apparently, for other kinds of text mining, allowing better semantic analysis of the content (or so we're told). It's a simple check box, so why not turn it on by default?
But drat the niggles: thanks!
Posted by: Chris Rusbridge | December 19, 2008 11:11 AM
@Chris: Thanks for the comments. I'm not sure, though, what you mean by "only include one author", as we do include all authors (in order), as you can (just about) see by peering at the image in the post and also in this RDF sample I sent to the semantic-web list. What are you using to view the metadata?
Keywords we would like to add but do not presently have them. That's something we're looking at.
Thanks for the comment on tagging. I'll follow up on that to see what can be done.
Posted by: Tony Hammond | December 19, 2008 11:26 AM
I would like the "subject" section, which corresponds to the "description" section in Bridge (TM), to be populated with the abstract of the paper instead of the DOI. The DOI and references could be placed in another section (e.g. "Headline").
Thanks
Posted by: Diego | May 15, 2009 03:04 PM
Hi Diego:
We took the opportunity to add a full article citation to the "subject" section, as we are not currently distributing abstracts in the XMP packets. Should we start adding abstracts to the XMP, we would do so using this field and relocate the citation information. In any case, I will forward your comment for internal review.
Thanks for the feedback.
Posted by: Tony Hammond | May 16, 2009 12:05 PM