There was an interesting piece by Steve Mollman on CNN.com yesterday (Making sense of the ‘semantic Web’) which put forward this example:
“The kiosk takes advantage of the fact that MP3 files are ‘things’ that have already been described in ways that machines can understand. That’s because they have ID3 tags, which supply information on the artist and album. An MP3 file on an iPhone is already a semantic annotated object, which means it’s easily read by a computer.”
Now if that story had talked about a PDF instead of an MP3, and if an XMP packet were substituted for the ID3 tags, then any scholarly article could lay claim to being a “semantic annotated object … easily read by a computer”.
Nature is the first NPG title to go live with such marked-up PDFs, starting with yesterday’s issue. The screenshots below from Acrobat (File > Properties, Ctrl-D) show what the user might see both with (bottom-left) and without (top-right) semantic markup.
Fair enough as far as that goes, but to a machine it’s a whole other game. We now have a complete bibliographic record (including the DOI) embedded in the PDF using structured markup. Moreover, we have a solid bedrock for adding further metadata should the need arise. This semantic labelling is available on all new issues of Nature and will be added to other NPG titles over the coming months.
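To make the machine’s view a little more concrete, here is a minimal Python sketch of how a program might pull such a record out of a PDF. It is an illustration only, not NPG’s tooling. It assumes the XMP packet is stored uncompressed, so that it can be found by scanning the file for the xpacket markers, and that the record uses Dublin Core properties such as dc:title, dc:creator and dc:identifier. The sample bytes, the DOI 10.1000/example and the helper names extract_xmp and dc_values are all made up for the example.

```python
import re
import xml.etree.ElementTree as ET

# Namespace URIs for Dublin Core and RDF.
DC = "http://purl.org/dc/elements/1.1/"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# An XMP packet is wrapped in <?xpacket begin=...?> ... <?xpacket end=...?>.
PACKET_RE = re.compile(
    rb"<\?xpacket begin=.*?\?>(.*?)<\?xpacket end=.*?\?>", re.DOTALL
)

# Hypothetical stand-in for the bytes of a marked-up PDF.
SAMPLE_PDF = b"""%PDF-1.4
1 0 obj << /Type /Metadata /Subtype /XML >> stream
<?xpacket begin="" id="W5M0MpCehiHzreSzNTczkc9d"?>
<x:xmpmeta xmlns:x="adobe:ns:meta/">
 <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:about="" xmlns:dc="http://purl.org/dc/elements/1.1/">
   <dc:title><rdf:Alt><rdf:li xml:lang="x-default">An Example Article</rdf:li></rdf:Alt></dc:title>
   <dc:creator><rdf:Seq><rdf:li>A. Author</rdf:li><rdf:li>B. Author</rdf:li></rdf:Seq></dc:creator>
   <dc:identifier>doi:10.1000/example</dc:identifier>
  </rdf:Description>
 </rdf:RDF>
</x:xmpmeta>
<?xpacket end="w"?>
endstream endobj
%%EOF
"""


def extract_xmp(data):
    """Return the body of the first XMP packet in the raw bytes, or None."""
    match = PACKET_RE.search(data)
    return match.group(1).strip() if match else None


def dc_values(xmp, name):
    """Collect the text values of one Dublin Core property, e.g. 'title'."""
    root = ET.fromstring(xmp)
    values = []
    for prop in root.iter(f"{{{DC}}}{name}"):
        # Titles and creators are RDF containers holding rdf:li items.
        items = [li.text for li in prop.iter(f"{{{RDF}}}li") if li.text]
        if items:
            values.extend(items)
        elif prop.text and prop.text.strip():
            # Simple properties carry their value directly as text.
            values.append(prop.text.strip())
    return values
```

With the sample above, dc_values(extract_xmp(SAMPLE_PDF), "identifier") returns ['doi:10.1000/example']. For a real file you would pass in the bytes read from disk; a PDF whose metadata stream is compressed would need a proper PDF library rather than this byte scan.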
XMP as a labelling technology could well go a long way towards addressing concerns raised by Olivia Judson in an op-ed piece earlier this week in the New York Times: Defeating Bedlam. The author laments that “downloading papers from journal Web sites” means that “access to information is easier and faster than ever before … but there’s been no obvious way to manage it once you’ve got it.” Those days may soon be over.
Now, with XMP, all manner of scholarly content – documents, images and other media types – can be properly labelled, and many programs (not just Zotero and Papers, which she reviews) can profit directly from the richness of semantic web descriptions.