Nascent

Agile Descriptions

[Images, left to right: 1500 Plymouth Street; ‘Agile Descriptions’ slide; the meeting room]

Last Thursday week, March 8, the Library of Congress held a public meeting of the Future of Bibliographic Control Working Group, hosted by Google at their Mountain View home. The theme of the meeting was ‘Users and Uses of Bibliographic Data’. This was the first in a series of three meetings that form part of a one-year review (Nov. ’06 through Nov. ’07) of where LC (and by extension other libraries) may want to invest their future budgetary allocations in providing descriptors of curated works of intellectual property, both for collection maintenance and resource discovery. This quote from Karen Coyle’s meeting summary may be apt:

“This committee has a huge task: to define the future of “bibliographic control.” No one defined the term bibliographic control during this meeting, and in fact it was rarely voiced as a term. That may be for the better, because it describes something that libraries have traditionally done, and at least some people are suggesting that we shouldn’t do it in the future. Thus the “future of bibliographic control” may be an oxymoron.”

The meeting agenda is posted here. Besides the working group and speakers, there were some 50 members of the public, mainly from California.

So, a disclaimer up front. This post is focused on my presentation (powerpoint is here) as the only publisher speaking and reflects our ongoing interests in fielding machine-readable metadata to support value-add services. This is in contrast to the traditional library focus on human cataloguing, an activity about which we as a publisher might be said to be largely ignorant. (For an impressive overview of the other presentations see Karen Coyle’s notes of the meeting on her blog Coyle’s InFormation, March 7-9 entries. Also a WG summary of the meeting is given here.)

I felt humbled to apologize at the beginning of my talk for the title (but it was just simply irresistible and called out to be played upon before the ‘agile’ meme gets all used up), also for being predominantly a journals publisher (although we are now branching out into a number of database publishing projects as this blog posts on from time to time) with a fairly narrow perspective on descriptions of objects and processes, and lastly for being just plain unremittingly technical with a weird emphasis on angle brackets – but then that’s just me. So, to the talk.

Digression #1 – That Photo

I could not, however, come to apologize for the photo which I used as a backdrop to the slides (see middle panel above). It was shamelessly lifted from Flickr and I thought suited well the general theme of my talk. I initially noted that the photo was available for public use but did endeavour one Saturday afternoon to track it down for proper attribution, sifting endlessly through never-ending mounds of Flickr photo clusters – alas, to no avail. I later contrived to lose the original JPEG and with it all the EXIF data goodness buried within, which would have provided me at least a timestamp. I had thought that the photo was from Berlin and this seemed to be confirmed at the meeting by Murtha Baca, Head of Standards & Vocabulary Programs at the Getty Research Institute, who immediately recognized the portrait as being that of the artist Joseph Beuys and was kind enough to tell me after and to follow up with further details in an email.

Digression #2 – That Room

The large meeting room (see right panel above) at 1500 Plymouth Street (see left panel above) was interesting in its own right. Large and bright with the required primary colour seating in centre and chill-out chairs arranged higgledy-piggledy at back, it had meeting rooms immediately off to left and right which were in general use with people coming and going. On the left was an elevator in use throughout the meeting with at one time a young man and a motorized scooter. Behind on left was an open kitchen area and at back was a general passageway with a stream of bright young things coming and going. (Hey, I’m only remarking that as I’m rather envious not to be one of them. 🙂) So, while the main meeting was in progress there was a constant tide of comings and goings conducted very openly and generally unobtrusively. I’m not sure I’ve seen anything quite like this before and wondered if it might be some kind of metaphor for a new busy-broom-sweeps-clean, bristling-and-bustling enterprise which is growing up all in a hurry.

So, what did I talk about? Well, the usual suspects: web feeds – tagging – microformats – use cases on collaborative filtering & social networking – and OTMI, our proposed open text mining interface.

Web feeds are well established and can be viewed properly as pure rivers of metadata. (This found an echo later in Lorcan Dempsey’s summing up in which he talked about Eric Hellman’s picture of lakes and rivers, with library collections likened to lakes contrasted with rivers of open data flows circulating through the Web.) Feeds differ from web pages in being generally well structured (XML), semantically rich (markup is descriptive rather than presentational), and updated on a periodic basis.
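To make the ‘well structured, semantically rich’ point a little more concrete, here is a minimal sketch in Python of pulling descriptive metadata straight out of an Atom feed. The feed content, the example.org identifier and the `entry_metadata` helper are all invented for illustration and are not taken from the talk; only the Atom namespace is real.

```python
import xml.etree.ElementTree as ET

# The Atom namespace, in ElementTree's {uri}localname notation.
ATOM = "{http://www.w3.org/2005/Atom}"

# Hypothetical feed content, made up for this example.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example Journal</title>
  <updated>2007-03-08T12:00:00Z</updated>
  <entry>
    <title>An example article</title>
    <id>http://example.org/articles/1</id>
    <updated>2007-03-08T12:00:00Z</updated>
    <author><name>A. Author</name></author>
  </entry>
</feed>"""


def entry_metadata(feed_xml):
    """Return one metadata record (a dict) per Atom entry in the feed."""
    root = ET.fromstring(feed_xml)
    records = []
    for entry in root.findall(f"{ATOM}entry"):
        records.append({
            "id": entry.findtext(f"{ATOM}id"),
            "title": entry.findtext(f"{ATOM}title"),
            "updated": entry.findtext(f"{ATOM}updated"),
            "authors": [a.findtext(f"{ATOM}name")
                        for a in entry.findall(f"{ATOM}author")],
        })
    return records


for record in entry_metadata(SAMPLE_FEED):
    print(record["updated"], record["title"], record["authors"])
```

Because the markup names what each value is, no screen scraping or layout guessing is needed, which is really the whole appeal.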

As a descriptive language, tags and tagging are decidedly ‘street’. (Hence the background image of graffiti.) Tags are often treated as simple text tokens and passed off as dc:subject or prism:category or somesuch, but sometimes they are accorded a unique identifier – a URI, although there are naming authority and persistence issues. Examples of tagging markup from del.icio.us, Flickr, CiteULike, Connotea, Unalog and HubMed are revealing – each showing its own proclivities.
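The two treatments of a tag can be sketched side by side. This is an illustration only: the helper names and the example.org base URI are assumptions of mine, and pointing dc:subject at a URI with rdf:resource is just one common RDF/XML convention, not the markup of any of the services named above.

```python
import xml.etree.ElementTree as ET
from urllib.parse import quote

DC = "http://purl.org/dc/elements/1.1/"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
ET.register_namespace("dc", DC)
ET.register_namespace("rdf", RDF)


def tag_as_literal(label):
    """Tag as a plain text token inside dc:subject."""
    element = ET.Element(f"{{{DC}}}subject")
    element.text = label
    return element


def tag_as_uri(label, base="http://example.org/tags/"):
    """Tag as an identifier: dc:subject pointing at a (hypothetical) URI."""
    element = ET.Element(f"{{{DC}}}subject")
    element.set(f"{{{RDF}}}resource", base + quote(label))
    return element


print(ET.tostring(tag_as_literal("text mining"), encoding="unicode"))
print(ET.tostring(tag_as_uri("text mining"), encoding="unicode"))
```

The literal form is trivial to produce but two users' ‘text mining’ tags are only equal as strings. The URI form can be compared and linked, at the cost of somebody having to own and maintain the base URI.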

Microformats – the poorboy’s semantic web. An XMDP profile here, a GRDDL transform there, and we can upgrade (or should that be downgrade?) to an RDF serialization that can be merged into a larger, more general graph. Throw away the APIs, the data can speak for itself. Pentecostal, almost. Smart data. Examples of microformats actively being used (hCard, hCalendar, etc.) were shown for the new Nature Network social networking site.
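As a rough sketch of ‘the data can speak for itself’, the following Python pulls a few hCard properties out of ordinary HTML using nothing but class names. The sample markup, the `HCardParser` class and the short property list are my own assumptions for illustration; this is not a full hCard parser, does not handle nested properties, and stops well short of the GRDDL-to-RDF step mentioned above.

```python
from html.parser import HTMLParser

# Hypothetical page fragment marked up with hCard class names.
SAMPLE_HTML = """
<div class="vcard">
  <span class="fn">Jane Example</span>,
  <span class="org">Example Institute</span>
</div>
"""


class HCardParser(HTMLParser):
    """Collect a few simple hCard properties from class attributes."""

    PROPERTIES = {"fn", "org", "title"}  # small subset, for illustration

    def __init__(self):
        super().__init__()
        self.cards = []       # one dict per class="vcard" element
        self._current = None  # property whose text we are waiting for

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if "vcard" in classes:
            self.cards.append({})
        for name in classes:
            if name in self.PROPERTIES and self.cards:
                self._current = name

    def handle_data(self, data):
        # Take the first non-blank text after a property start tag.
        if self._current and data.strip():
            self.cards[-1][self._current] = data.strip()
            self._current = None


parser = HCardParser()
parser.feed(SAMPLE_HTML)
print(parser.cards)
```

The same page serves both the human reader and the harvesting script, with no separate API in between.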

And so to OTMI. A couple of slides for context, and then a general anatomy of what goes where within the Atom Entry document that we use as the vehicle for delivering OTMI. Basically, OTMI makes scholarly articles available as word vectors (terms with counts) and ‘snippets’ (the full text in non-document order). These are provided by subsection: ‘methods’, ‘conclusions’, ‘other’. Also available are figure captions and references.
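The two ideas of word vectors and snippets can be sketched in a few lines of Python. To be clear, this is not OTMI's actual processing or file format: the tokenization rule, the sentence-level snippet granularity, the shuffling and the sample text are all assumptions made for illustration.

```python
import random
import re
from collections import Counter


def word_vector(text):
    """Terms with counts: lower-cased alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def snippets(text, seed=0):
    """Sentences of the text, returned in non-document (shuffled) order."""
    parts = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    random.Random(seed).shuffle(parts)
    return parts


# Hypothetical 'methods' subsection text.
section = ("We measured the samples. The samples were stored cold. "
           "Results were stable.")

vector = word_vector(section)
print(vector.most_common(3))
print(snippets(section))
```

The point of the scrambling is that a text miner still gets every term and every sentence to work on, while what is delivered is no longer a readable substitute for the article itself.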

This was just a brief tour of some of the data services that Nature is currently looking at. The emphasis is very much on adding production-grade metadata routinely during the publishing workflow rather than on handcrafting of document records.

ps/

And many thanks to Helga – our GPS car nav system voice – for getting us there. She did an excellent job although once or twice got a trifle tetchy when we deliberately flouted her directions and she had to rein us back.
