
XMP Labelling for Nature

There was an interesting piece by Steve Mollman on CNN.com yesterday (Making sense of the 'semantic Web') which put forward this example:

"The kiosk takes advantage of the fact that MP3 files are "things" that have already been described in ways that machines can understand. That's because they have ID3 tags, which supply information on the artist and album. An MP3 file on an iPhone is already a semantic annotated object, which means it's easily read by a computer."
Now if that story had talked about a PDF instead of an MP3, and if "XMP packet" were substituted for "ID3 tags", then any scholarly article could lay claim to being a "semantic annotated object ... easily read by a computer".

Yesterday's issue of Nature was the first NPG title to go live with such marked-up PDFs. The screenshots below from Acrobat (File > Properties, CMD-D / CTL-D) show what the user might see both with (bottom-left) and without (top-right) semantic markup.

[Image: pdf_props.png]

Fair enough as far as that goes, but to a machine it's a whole other game. We now have a complete bibliographic record (including DOI) embedded in the PDF using structured markup. And, moreover, we also have a solid bedrock for adding in any additional metadata should the need arise. This semantic labelling is available on all new issues of Nature and will be added to other NPG titles over the coming months.
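To give a rough idea of what "to a machine" means here, the following is a minimal Python sketch of how a program might pull an XMP packet out of a PDF and read a few bibliographic fields from it. It rests on several assumptions that are not stated in this post: that the packet is stored uncompressed in the file (so a plain byte scan can find it), that the fields of interest use the Dublin Core namespace (dc:title, dc:creator, dc:identifier), and that the sample values, including the DOI, are placeholders invented for illustration. The function names are hypothetical as well.

```python
import re
import xml.etree.ElementTree as ET

# Namespace URIs for RDF and Dublin Core (assumed schema, see lead-in).
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# An XMP packet is wrapped in <?xpacket begin=...?> ... <?xpacket end=...?>.
PACKET_RE = re.compile(
    rb"<\?xpacket begin=.*?\?>(.*?)<\?xpacket end=.*?\?>", re.DOTALL
)


def extract_xmp(pdf_bytes):
    """Return the first XMP packet body found in the bytes, or None.

    Only works when the packet is stored uncompressed (an assumption).
    """
    match = PACKET_RE.search(pdf_bytes)
    return match.group(1).strip() if match else None


def read_bibliographic_record(pdf_bytes):
    """Parse a few Dublin Core fields out of the embedded XMP packet."""
    packet = extract_xmp(pdf_bytes)
    if packet is None:
        return None
    root = ET.fromstring(packet)
    title = root.find(".//dc:title/rdf:Alt/rdf:li", NS)
    creators = root.findall(".//dc:creator/rdf:Seq/rdf:li", NS)
    return {
        "title": title.text if title is not None else None,
        # rdf:Seq is an ordered container, so author order is preserved.
        "creators": [c.text for c in creators],
        "identifier": root.findtext(".//dc:identifier", namespaces=NS),
    }


# Placeholder data standing in for a real PDF; all values are made up.
SAMPLE = (
    b"%PDF-1.4\n1 0 obj\nstream\n"
    b'<?xpacket begin="" id="W5M0MpCehiHzreSzNTczkc9d"?>\n'
    b'<x:xmpmeta xmlns:x="adobe:ns:meta/">'
    b'<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">'
    b'<rdf:Description rdf:about=""'
    b' xmlns:dc="http://purl.org/dc/elements/1.1/">'
    b'<dc:title><rdf:Alt>'
    b'<rdf:li xml:lang="x-default">An example article</rdf:li>'
    b"</rdf:Alt></dc:title>"
    b"<dc:creator><rdf:Seq>"
    b"<rdf:li>A. Author</rdf:li><rdf:li>B. Author</rdf:li>"
    b"</rdf:Seq></dc:creator>"
    b"<dc:identifier>doi:10.1000/example</dc:identifier>"
    b"</rdf:Description></rdf:RDF></x:xmpmeta>\n"
    b'<?xpacket end="w"?>\nendstream\nendobj\n'
)

if __name__ == "__main__":
    print(read_bibliographic_record(SAMPLE))
```

A real tool would more likely use a PDF library to locate the metadata stream rather than scanning raw bytes, but the byte scan shows why a clearly delimited packet is easy for even PDF-unaware software to find.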

XMP as a labelling technology could well go a long way towards addressing concerns raised by Olivia Judson in an op-ed piece earlier this week in the New York Times: Defeating Bedlam. The author laments that "downloading papers from journal Web sites" means that "access to information is easier and faster than ever before ... but there’s been no obvious way to manage it once you’ve got it." Those days may soon be over.

Now, with XMP, all manner of scholarly content - documents, images and other media types - can be properly labelled, and many programs (not just Zotero and Papers, which she reviews) can directly profit from the richness of semantic web descriptions.

Comments

This is interesting, thanks. All journals should do this; it's clearly good practice! A few comments: you only include one author; I can't see why the whole set of authors can't be included (unless 100 authors breaks PDF ;-). And no keywords, why not?

Also, the PDF is not a Tagged PDF. Tagged PDF is better for screen readers and, for other kinds of text mining, apparently allows better semantic analysis of the content (or so we're told). It's a simple check box, so why not turn it on by default?

But drat the niggles: thanks!

@Chris: Thanks for the comments. I'm not sure, though, what you mean by "only include one author", as we do include all authors (in order), as you can see (maybe just) by peering at the image in the post and also at this RDF sample I sent to the semantic-web list. What are you using to view the metadata?

We would like to add keywords but do not presently have them. That's something we're looking at.

Thanks for the comment on tagging. I'll follow up on that to see what can be done.

I would like the "subject" section, which actually corresponds to the "description" section in Bridge (TM), to be populated with the abstract of the paper instead of the DOI. The DOI and references could be placed in another section (i.e. "Headline").
Thanks

Hi Diego:

We took the opportunity to add a full article citation to the "subject" section, as we are not currently distributing abstracts in the XMP packets. Should we start adding abstracts to the XMP, we would do so using this field and relocate the citation information. In any case, I will forward your comment for internal review.

Thanks for the feedback.
