
May 30, 2008

WikiProteins - are a million minds listening?

A couple of days ago Barend Mons and colleagues published an article in Genome Biology about WikiProteins - a new way of asking a "million minds to annotate a million [biomedical] concepts". On the face of it, it seems like a fine idea: combine text mining and other database trawling (Medline, GO, UniProt and others), distill some concept maps from that (Knowlets, they call them) and invite scientists to chip in via a wiki.

The Genome Biology article has a stellar author list, with big names in scientific databases and wikis. That said, they're taking on a tricky task in getting scientists to contribute to a scientific wiki. Of course scientists do contribute to Wikipedia heartily, but I don't know of any specialized wikis (or Professional wikis, to slip into WikiProteins' terminology) that are burning rubber on the information highways.

WikiProteins is a product of KnewCo, and it's not clear what their business model is. You'd have to guess that some level of paid-for services is planned. I could imagine that working -- if the user interface is easy enough for non-informaticians, and the Knowlets prove genuinely useful, people might pay to get automatic Knowlet Updates or to see reviews from commissioned experts. Maybe I'm wrong and KnewCo has other plans. Let's hope they stay around long enough for us to find out.

Now, anybody who has read this far down this post must have at least a passing interest in literature services, so some of you might like to know about a job going at the EBI for a Team Leader in the Literature Service Development.

May 29, 2008

Stamen talk

Stamen is a hip online design outfit based in San Francisco. They're well known for working on data visualizations for Trulia and Digg, and their own high profile websites like Oakland Crime Map and Cabspotting. Last week we were lucky enough to get founder Eric Rodenbeck to come in to give us a talk, which I will now liveblog eight days after the fact...

Welcome Eric! Eric is the founder of Stamen. Stamen is a 7 person studio that mostly does mapping and visualization work, mostly on live data. They're based in San Francisco.

They try not to take stuff that's 'complete' and draw conclusions from it; rather, they prefer to take flowing data and build structures for it to flow into.


Eric talks about EJ Marey and his work visualizing movement. It's pretty cool stuff. Single photographic plate showing multiple frames of movement.

This was the 1840s. He invented the first device to take the pulse non-invasively. Studied the flights of insects and birds, then moved onto air and water, smoke studies. Some of the time he was obviously just having fun - Eric shows a motion study photograph of two fencers.

There's a correlation with the kind of thing Stamen does. A more modern twist on this kind of thing. Eric shows some images from a Korean artist showing planes taking off from an airport - as each plane takes off it has been overlaid on the same photograph. Another example is somebody who has played a console racing game hundreds of times, recorded each path through the level and then overlaid them all on top of each other, as if there are hundreds of cars racing against each other.

Moving on to what Stamen does. After Tufte nobody has an excuse to produce a bad chart. But there's lots of data out there, flowing, live, bumping up against the limitations of existing viz tools.

Eric shows us the work Stamen has been doing at Digg Labs and on Cab Spotting (I would blog descriptions of these but it's far faster and cooler to click through the links and play around with the visualizations themselves). For Cab Spotting they've got access to GPS data from 400 yellow cabs in San Francisco, updated once a minute.

They approached the data in lots of different ways. First, most obvious thing: cabs are animated dots on a black screen, when a cab picks somebody up that dot flashes, yellow dots are passengered cabs, gray dots are empty cabs.

Second: leave traces according to the speed that the cabs are travelling at: red roads are 35+ mph on average, white roads are slower on average.

They could do a 'where's the nearest cab now' type site but somehow that seems less interesting than this sort of pulsing, flowing data, showing cities as living organisms.

GPS stops working sometimes, like when cabs travel on the lower deck of the Golden Gate bridge. The data gets messy. Stamen like this, it'll average out with enough data, don't try and fuzzy it out.

Eric brings up the Trulia Hindsight housing maps. Trulia wanted to establish themselves as an expert in the real estate data field and show off the data they'd collected. They've got the location, price and build dates of properties across the US.

Stamen's first pass: they displayed dots on a map as houses were built. You can see the city growing like a mossy organism. Then they tried incorporating price data, but this didn't really work, not least because the value of money changes over time.

Those were both paid projects. They do spare time stuff too. Michal Migurski sat down over the xmas holidays one year and created the first iteration of Oakland Crimespotting, which plots crime reports from the Oakland police department on a map with different colours and icons for different types of crime. Oakland city council already had a site that did this, but the interface sucked.

The Stamen version allows you to see where quality of life crimes (bums on street corners, drinking), violence, prostitution and theft correlate. Which streets have which kind of crime. Eric mentioned how as a side effect you could pick out patterns of how the Oakland PD operate - from the dates of the arrests you can see how they move up a particular highway arresting prostitutes, for example.

The site was up for a fortnight before the city found out and shut off access. Because it made Oakland look bad, perhaps. After discussions and a local paper getting involved the city opened up the data again.

Eric suggests that Crimespotting works well because you can see everything at once and then filter unwanted stuff out, rather than having to select exactly what you need a priori.

(for some more Crimespotting goodness check out Tom Carden's blog)

Finally he brings up the London property visualizations (scroll down to see the actual applet), made in conjunction with MySociety. These mash up data on house prices and travel times in London. Sadly they're fixed in that you can only get travel times to either the Olympic site at Stratford or London Transport HQ (from where they scrape the data).

Again, rather than specifying exactly what you want - to live within 45 minutes of x in a house costing no more than y - you have all of the relevant data plotted on a map of London straight away. By moving two sliders (one for maximum house price and one for travel time) you can visualize multiple scenarios quickly and easily.

Eric wraps up. He shows a Venn diagram of "useful" and "cool". He reckons Stamen works in the overlap, between analysis and spectacle.

Thanks Eric! We move on to some Q&A.

May 28, 2008

Talk about Citations

Last night the British Library held the first of their Talk Science events. The topic of the evening was 'Citation in Science - don't quote me on that' and it was hosted by Tim Birkhead, Professor of Evolutionary Biology at the University of Sheffield. Back in January Tim wrote a piece for the Times Higher Education Supplement discussing citation in science, and specifically mis-citation in science. This formed the core of the discussion last night. Tim chatted about his views on the topic for about 20 minutes and then opened it up to the floor.

There was a great diversity of people present, from publishers, funders, senior academics and journalists to PhD students and library staff. There may even have been some elusive members of the general public (if so they were keeping quiet). The discussions mainly revolved around the burden of peer review, the way that citation metrics change the practice of science, and how citation should be viewed and practised.

I found the discussions very informative. It's a big topic, and one that is bound to get most academics animated, one way or another. One area that we have discussed on this blog from time to time is how people can get credit for work outside of the traditional peer reviewed literature. This didn't really come up last night, but it is a topic worth spending more than a little time discussing. They also served some free booze, and we all got a chocolate bar (which I am polishing off as I type this). The discussion has rolled over to Nature Network so you can head over and join in if you wish.

May 20, 2008

Social Tagging for Science

Over the next few weeks we're going to run a short series of guest posts from people working on the sharp edge of science 2.0 who we think are particularly cool and interesting.
Our first guest author is Ben Good, a grad student at the University of British Columbia where he works on bioinformatics projects and semantic web goodness for the life sciences. As well as producing a number of cool apps and mashups he's almost certainly the first person to have used Greasemonkey in anger in a peer reviewed journal article.
 
Social Tagging for Science, now with added meaning!

By now, most readers of Nascent will be familiar with the concept of social tagging as it is displayed in services like Connotea, Flickr, and Delicious. Readers may also be familiar with the concept of the semantic web, AKA the web of data, the giant global graph, web 3.0, linked data, the structured web and so on. At a high level, both concepts are fundamentally about the same thing, the creation and use of associations between terms (broadly speaking) and the things that are represented on the Web; however, they traditionally approach the problems of creating and representing those terms and associations in different ways. In this post I will explore an emerging species of social tagging system that, as a symbiosis of social tagging and the semantic web, seems to offer powerful enhancements to both.

You say 'potatoe', I say 'potato'
You say 'sonic hedgehog', I say 'SHH'

Social tagging systems effectively let anyone connect whatever string of symbols they like to whatever Web resource they like. When I bookmark a web page with Connotea, I am free to tag it however I choose. While this makes it very easy for me to quickly create metadata that can help to organize the personal yet public information in my bookmark collection, it has the unfortunate result that, in the context of the whole web or even just in the context of Connotea, the tags are ambiguous. When you tag 'cell' and I tag 'cell' we may mean completely different things and, of course, when you tag 'SHH' and I tag 'sonic hedgehog' we might actually mean the same thing. Not to mention that if I tag something with 'koala', I will never be able to find it with a search for 'marsupial'. As a consequence, search and navigation through information collections organized by tags can sometimes leave much to be desired.

As information organization professionals have known for some time, there are quite effective ways to improve upon uncontrolled terms for the purposes of indexing. Controlled vocabularies of varying shapes and sizes are applied throughout the world where effective retrieval over large databases is required. (Such indexing is particularly important when the content indexed has no text suitable for automatic processing). Terminological structures, such as the MeSH thesaurus and the Gene Ontology, provide professional indexers in many different domains with authoritative sets of terms, linked together with meaningful relationships, that facilitate the process of indexing for later retrieval.

As part of my laboratory's research into socially distributed mechanisms for creating, maintaining, and using semantic web content for applications in bioinformatics, we are building tools to explore how the millions of terms already represented in Web-accessible controlled vocabularies can be used to enhance the process and resultant products of social tagging. Our approach is simple: provide the users of social tagging services with a way to use terms from controlled vocabularies, which we call 'semantic tags', as easily as they now use tags created by themselves. In principle, this could allow for much better organization of tagged collections through the inference and disambiguation made possible by the semantic tags. Tags that meant the same thing to their users at the time they were applied, like 'SHH', 'sonic hedgehog', and 'hedgehog', would mean the same thing at the time they were used for retrieval, with consequent improvements not only in search but also in 'related-user' and 'related-tag' functionality. This is fundamentally the same approach as that taken by ZigTag, one of the first companies to incorporate around the idea of semantic tagging.

Challenges of building a semantic social tagging system

To achieve such a happily meaningful state of social semantic symbiosis, we need effective ways to connect users to semantic tags in intuitive, non-alienating ways. The experience of using a semantic social tagging system needs to be as pleasingly simple and fast as that produced by the current free-text tagging systems. The challenges to creating such a system are not insignificant. Here are a few that we've faced in the development of an extension to Connotea called the Entity Describer (which is in its second version and in what you might call a permanent public alpha stage of development):

  • you need lots of tags
Once you have made the commitment to use a semantic tagging system, it is quite frustrating when you can't find the term you want to use in the system.
  • you need to present the users with a fast, non-annoying way to use these tags during the tagging process
The first iteration of the Entity Describer was too slow to present the tags to the users when large terminologies were made available. Of course you need large terminologies to handle the first point, so...
  • you need to devise new interfaces that let the users take advantage of their new, meaning-enhanced tag collections
Without demonstrating improved functionality for the individual, users are left wondering "what was the point of that" as they wander off to other services that attend directly to their needs. Especially for researchers, the truly intriguing aspects of these systems emerge at the level of the collective. Because of this, there is a danger of forgetting the Delicious lesson and not focusing first and foremost on improving the experience of individual users. The trees must come before the forest.
  • you need a way for the users to contribute new semantic tags or to alter the definitions for existing tags
Without such a mechanism, the system can not grow and change to meet the needs of the community. Ideally a social semantic tagging system should be both a beneficiary of and a contributor to semantic repositories.

How it works

To create a semantic tagging application that lets users author the tags for their posts (rather than trying to predict them), the user has to be presented with a way to quickly and easily find the tags she needs. So far, all of the interfaces I've seen use some form of a type-ahead query over a centralized repository of semantic tags. The user begins to tag something as they normally would, but then they are presented with a list of candidate semantic tags based on what they type and are then allowed to select the tag that they mean to use. If the term they want can not be found, they are, at a minimum, allowed to enter it as a normal, free-text tag.

To make the type-ahead work, you need a database of semantic tags (the bigger and faster the better) on the backend and some clever JavaScript on the front. The current development version of the Entity Describer draws its semantic tags and a lot of its JavaScript from Freebase, an "open shared database of the world's knowledge". Other semantic tagging efforts, like ZigTag, Fuzzzy, and the first version of the Entity Describer, rely on their own databases of semantic tags, but, for now, the size, the openness and the quality of the provided API made Freebase a natural choice. It already contains more than 3 million terms (many garnered from Wikipedia), anyone can load whatever terms they want into it (for example, someone loaded the whole Gene Ontology), anyone can edit the textual definitions and the relationships that exist between the terms (called 'Topics'), and it provides a very fast JavaScript component for type-ahead search that can easily be embedded in external Web pages.
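To make the backend half of a type-ahead concrete, here is a minimal Python sketch of a prefix lookup over a small in-memory set of semantic tags. It is not how Freebase or the Entity Describer does it; the `SemanticTag` fields, the `TagIndex` class and the `example:` identifiers are all hypothetical names invented for illustration.

```python
from bisect import bisect_left
from dataclasses import dataclass


@dataclass(frozen=True)
class SemanticTag:
    label: str     # the text a user would type
    tag_id: str    # hypothetical identifier in the tag repository
    tag_type: str  # hypothetical type used to narrow a search


class TagIndex:
    """In-memory prefix index over semantic tags (illustrative only)."""

    def __init__(self, tags):
        # Sort by lowercased label so all labels sharing a prefix are adjacent.
        # The position i breaks ties, so tags themselves are never compared.
        self._entries = sorted((t.label.lower(), i, t) for i, t in enumerate(tags))
        self._keys = [entry[0] for entry in self._entries]

    def suggest(self, typed, tag_type=None, limit=5):
        """Return up to `limit` tags whose label starts with what was typed."""
        prefix = typed.strip().lower()
        if not prefix:
            return []
        # Binary search for the first label that could match the prefix.
        start = bisect_left(self._keys, prefix)
        results = []
        for key, _, tag in self._entries[start:]:
            if not key.startswith(prefix):
                break  # past the run of matching labels
            if tag_type is None or tag.tag_type == tag_type:
                results.append(tag)
                if len(results) == limit:
                    break
        return results


index = TagIndex([
    SemanticTag("sonic hedgehog", "example:gene/shh", "gene"),
    SemanticTag("SHH", "example:gene/shh", "gene"),
    SemanticTag("Sonic the Hedgehog", "example:game/sonic", "video game"),
    SemanticTag("koala", "example:animal/koala", "animal"),
])
candidates = index.suggest("son", tag_type="gene")
```

An empty result from `suggest` is the cue for the interface to fall back to a plain free-text tag. Note that 'SHH' and 'sonic hedgehog' carry the same hypothetical `tag_id`, which is what lets two different strings resolve to one meaning.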

The Entity Describer works by embedding a Freebase search in the tagging form that appears in the Add2Connotea bookmarklet. Using this modified bookmarklet, users have the choice of specifying which Types of semantic tags to search for, searching through them all, or adding in their own free text tags. Types can be used to limit the search to, for example, terms from the Gene Ontology Group or Anatomical Structures or films. When a bookmark is posted through the system, it is stored both in Connotea, using just the free text forms of the tags, and in an RDF database that captures the semantic relationships that define these strings. This database is served as a SPARQL endpoint which essentially functions as a generic HTTP API for accessing the collected data. Applications that provide access to this data are starting to be written and we welcome others to contribute their own.
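As a rough illustration of what "a generic HTTP API" means for a SPARQL endpoint, the Python sketch below builds a query and encodes it as a GET request URL using the `query` parameter defined by the SPARQL protocol. The endpoint address, the `ex:` vocabulary and the property names are placeholders made up for this example; they are not the Entity Describer's actual endpoint or schema.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical endpoint and vocabulary, used only for illustration.
ENDPOINT = "http://example.org/sparql"
EX = "http://example.org/tagging#"


def bookmarks_tagged_with(tag_uri, limit=10):
    """Build a SELECT query for bookmarks that carry a given semantic tag.

    tag_uri is inserted as-is, so it must come from a trusted source.
    """
    return (
        f"PREFIX ex: <{EX}>\n"
        "SELECT ?bookmark ?label WHERE {\n"
        f"  ?bookmark ex:semanticTag <{tag_uri}> .\n"
        "  ?bookmark ex:freeTextTag ?label .\n"
        "}\n"
        f"LIMIT {int(limit)}"
    )


def sparql_get_url(endpoint, query):
    """Encode a query as a GET URL, passing it in the 'query' parameter."""
    return f"{endpoint}?{urlencode({'query': query})}"


query = bookmarks_tagged_with("http://example.org/tags/sonic_hedgehog", limit=5)
url = sparql_get_url(ENDPOINT, query)
```

Any HTTP client can then fetch `url`; the format of the response (XML or JSON result sets, for instance) depends on what the particular endpoint supports.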

So far, we've had a very positive experience using Freebase in this manner, but there are some problems.

  • data isn't provided according to semantic web standards and thus requires translation machinery to import and export from information sources that do
  • once a vocabulary such as the Gene Ontology is loaded onto the Freebase platform, there is a chance that the definitions of the terms could diverge from those in the external source
  • as Freebase isn't actually designed as a terminological resource per se, it does not, by default, contain support for common relationships such as broader-than and narrower-than
However, we feel that each of these problems can be dealt with effectively and that the benefits far outweigh the costs of composing, maintaining, and hosting another semantic repository with similar functionality.

"... and in the darkness bind them"

To conclude, the worlds of social tagging and the semantic web are rapidly crashing together in the form of a new generation of intelligent, personal/public information organization systems. In this new symbiosis of the often separate worlds of the social and the semantic web, social tagging systems can benefit from the increased precision and opportunities for reasoning provided by semantic representations and the semantic Web can benefit from the flow of user-generated content currently shaping the rest of the Web.

May 19, 2008

Nature.com adds metadata

Nature.com has now added metadata (using HTML meta tags) into all its newly published pages including full text, abstracts and landing pages (all bar four titles which are currently being worked on). Metadata coverage extends back through the Nature archives (and depth of coverage varies depending on title). This conforms to Checkpoint 13.2 of the W3C's Web Content Accessibility Guidelines 1.0, which exhorts content publishers to "provide metadata to add semantic information to pages and sites".

Metadata is provided in both DC and PRISM formats as well as in a Google bespoke metadata format. This generally follows the DCMI recommendation "Expressing Dublin Core metadata using HTML/XHTML meta and link elements", and the earlier RFC 2731 "Encoding Dublin Core Metadata in HTML".

The actual HTML metadata sets from an example landing page are presented below.

If you view the HTML page source you should see something like the text below. (Note that you may have to scroll past whitespace which is emitted by the HTML template generator.)

<link title="schema(DC)" rel="schema.dc" href="http://purl.org/dc/elements/1.1/" />
<meta name="dc.publisher" content="Nature Publishing Group" />
<meta name="dc.language" content="en" />
<meta name="dc.rights" content="&#169; 2008 Nature Publishing Group" />
<meta name="dc.title" content="Crystal structure of squid rhodopsin" />
<meta name="dc.creator" content="Midori Murakami" />
<meta name="dc.creator" content="Tsutomu Kouyama" />
<meta name="dc.identifier" content="doi:10.1038/nature06925" />
					
<link title="schema(PRISM)" rel="schema.prism" href="http://prismstandard.org/namespaces/1.2/basic/" />
<meta name="prism.copyright" content="&#169; 2008 Nature Publishing Group" />
<meta name="prism.rightsAgent" content="permissions@nature.com" />
<meta name="prism.publicationName" content="Nature" />
<meta name="prism.issn" content="0028-0836" />
<meta name="prism.eIssn" content="1476-4687" />
<meta name="prism.volume" content="453" />
<meta name="prism.number" content="7193" />
<meta name="prism.startingPage" content="363" />
<meta name="prism.endingPage" content="367" />

<meta name="citation_journal_title" content="Nature" />
<meta name="citation_publisher" content="Nature Publishing Group" />
<meta name="citation_authors" content="Midori Murakami, Tsutomu Kouyama" />
<meta name="citation_title" content="Crystal structure of squid rhodopsin" />
<meta name="citation_volume" content="453" />
<meta name="citation_issue" content="7193" />
<meta name="citation_firstpage" content="363" />
<meta name="citation_doi" content="doi:10.1038/nature06925" />


While we do not expect search engines to index these terms directly, and no direct SEO (search engine optimization) is intended, we think there is enough value here for applications to make use of these terms. The terms are reasonably accessible to simple scripts, etc. Note that even in RFC 2731 (published in 1999) there is a Perl script listed in Section 9 which allows the metadata name/value pairs to be easily pulled out. Running this over the example page yields the following output:

@(urc;
@|MISSING ELEMENT NAME; text/css
@|MISSING ELEMENT NAME; text/html; charset=iso-8859-1
@|robots; noarchive
@|keywords; Nature, science, science news, biology, physics, genetics, astronomy, astrophysics, quantum physics, evolution, evolutionary biology, geophysics, climate change, earth science, materials science, interdisciplinary science, science policy, medicine, systems biology, genomics, transcriptomics, palaeobiology, ecology, molecular biology, cancer, immunology, pharmacology, development, developmental biology, structural biology, biochemistry, bioinformatics, computational biology, nanotechnology, proteomics, metabolomics, biotechnology, drug discovery, environmental science, life, marine biology, medical research, neuroscience, neurobiology, functional genomics, molecular interactions, RNA, DNA, cell cycle, signal transduction, cell signalling.
@|description; Nature is the international weekly journal of science: a magazine style journal that publishes full-length research papers in all disciplines of science, as well as News and Views, reviews, news, features, commentaries, web focuses and more, covering all branches of science and how science impacts upon all aspects of society and life.
@|dc.publisher; Nature Publishing Group
@|dc.language; en
@|dc.rights; #169; 2008 Nature Publishing Group
@|dc.title; Crystal structure of squid rhodopsin
@|dc.creator; Midori Murakami
@|dc.creator; Tsutomu Kouyama
@|dc.identifier; doi:10.1038/nature06925
@|prism.copyright; © 2008 Nature Publishing Group
@|prism.rightsAgent; permissions@nature.com
@|prism.publicationName; Nature
@|prism.issn; 0028-0836
@|prism.eIssn; 1476-4687
@|prism.volume; 453
@|prism.number; 7193
@|prism.startingPage; 363
@|prism.endingPage; 367
@|citation_journal_title; Nature
@|citation_publisher; Nature Publishing Group
@|citation_authors; Midori Murakami, Tsutomu Kouyama
@|citation_title; Crystal structure of squid rhodopsin
@|citation_volume; 453
@|citation_issue; 7193
@|citation_firstpage; 363
@|citation_doi; doi:10.1038/nature06925
@)urc;
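For readers who would rather not reach for Perl, the same kind of name/value extraction can be sketched with Python's standard library. This is not a port of the RFC 2731 script and does not reproduce its output format; the class and function names are our own, and it simply skips meta elements that lack a name attribute rather than reporting them.

```python
from html.parser import HTMLParser


class MetaExtractor(HTMLParser):
    """Collect (name, content) pairs from <meta> elements in document order."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.pairs = []

    def handle_starttag(self, tag, attrs):
        # Self-closing <meta ... /> tags are also routed here by HTMLParser.
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        content = attributes.get("content")
        # Skip elements such as <meta http-equiv=...> that carry no name.
        if name is not None and content is not None:
            self.pairs.append((name, content))


def extract_meta(html_text):
    """Return a list of (name, content) tuples found in html_text."""
    parser = MetaExtractor()
    parser.feed(html_text)
    parser.close()
    return parser.pairs


sample = (
    '<link title="schema(DC)" rel="schema.dc" href="http://purl.org/dc/elements/1.1/" />\n'
    '<meta name="dc.creator" content="Midori Murakami" />\n'
    '<meta name="dc.creator" content="Tsutomu Kouyama" />\n'
    '<meta name="dc.rights" content="&#169; 2008 Nature Publishing Group" />\n'
    '<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />\n'
)
pairs = extract_meta(sample)
```

A list of pairs is used rather than a dictionary because names such as dc.creator legitimately repeat, once per author.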

We look forward to seeing applications making use of this metadata and providing new value for users.

Thinking our way to the future


Prediction Markets have had an interesting role to play in many areas of business and in some cases have produced interesting results about the markets, not least the predictions.

We experimented with Inkling for a while to run a very small prediction market amongst web publishing people a few months ago, and we correctly predicted that it was going to snow, but we only got about 5 people using it so it sort of died off quietly. In spite of this there was a strong feeling that this kind of thing could be really interesting for science and so a few weeks ago when I came across the X2 club I was really intrigued. It's a project from the Institute for the Future, which is an independent research group based in Palo Alto.

The X2 club is named after the X club, a dinner club of influential scientists in London in the 1860s. The aim of the project is to assemble predictions about the near and medium future through crowdsourcing. You can add a signal to the site, which is essentially a link to a piece of news with a short explanation of why this news points to some future trend.

That's not so novel, but what is novel is that you can then assemble signals that come in to the site to create a Forecast about the future. What is also really innovative is that editors on the site pull together specific signals and forecasts and present games to the community. A game is an invitation to write about the future from a perspective that imagines that future has already come into existence.

The amount of content on the site is currently quite small (a few hundred members signed up, a few dozen signals contributed), but the quality of the contributions to date has been very high.

May 15, 2008

Indelible Ink


Considerable time and effort goes into producing print copies of journals, both here at Nature and at other scientific publishers. It's something that pains my web publishing heart. Is print really necessary? Do the benefits outweigh the costs? If they do, are those benefits to consumers... or really just to us publishers? If we dropped print altogether could the savings fund a free bar at the next NPG xmas party?

Certainly print still has the edge over online in some situations. I'm a recent convert to the print version of Nature journal - it's far easier to browse bitty front half content (research highlights, news and views, book reviews) by flicking through pages than it is to navigate nature.com. It's also more aesthetically pleasing - the layout is nicer.

Nature and other magazine style journals are the exception to the rule though. I also read OUP's Nucleic Acids Research (have to keep a hand in...) but I've never felt the need to pick up a paper copy to browse in the bathtub. When a journal is all papers then a simple eTOC conveys everything you need to know, conveniently and efficiently.

Most consumers seem to agree. Last year a study in the excitingly named Serials Review by Chandra Prabha noted that the percentage of journals held by libraries that were only available online had jumped from 5% in 2002 to 37% in 2007. For every one researcher reading a paper on paper there are hundreds reading it online and the gap is even more pronounced if you only look at students and young postdocs.

When publishers talk about the article of the future - interactive figures, semantic markup, replicable workflows, aggregated conversation - they're talking electronic versions. Until journals come out on e-paper improvements are going to be restricted to articles online. Print is already the poor cousin when it comes to functionality: it's far easier to collect a citation, follow a reference, quote or use supplemental data from an article that you're reading online.

Costs are higher when you're maintaining print versions. Though print and online have the same workflow up to a point, printing, binding, storing and mailing out journals isn't cheap (economically or environmentally). Nor is cataloging, shelving or building huge new extensions to your library to house your growing back catalog.

So given the costs, limitations and lack of consumer enthusiasm, why bother?

a brief disclaimer: I'm talking about scientific publishing in general, not Nature in particular.

Tax: is one slightly disappointing reason. In much of Europe VAT is higher on electronic items than on printed ones, so to remain competitive publishers simply bundle 'free online access' with a print subscription. In the Netherlands VAT is 6% on print and 19% on electronic items - it's potentially cheaper to buy a print subscription and then bin copies of the journal as they arrive (saving on shelving costs) than to buy online access alone.

Elsevier has noticed a migration to 100% e-only in countries like Sweden (where there are VAT exemptions or reimbursements) and a more gradual change in countries like the UK (where only particular libraries can reclaim some VAT).
(from Elsevier Library Connect)

Fear of losing your subscriber base: the vast majority of a journal's audience is happy working online but what about the five percent of elderly, persnickety professors who eschew PCs and rely on paper copies? What if they're the same persnickety professors who sit on the board of the society whose journals you publish? What if you've already annoyed that society by suggesting that from now on their members won't get a nicely bound hard copy of the latest issue, simply an eTOC with links to ScienceDirect or some other mega-repository which has swallowed their brand and identity?

Prestige: Given the choice between being published on paper (hang the journal cover on your office wall, send your mum a reprint) or online only who goes for the latter? Journals that are printed also imply a broad readership. Ad buyers prefer paper too - despite (or perhaps because of? ;) ) the better measure of ROI you get from being able to track views and clickthroughs. Authors and advertisers may be a relatively small percentage of consumers... but they have a relatively large effect on a journal's bottom line.

Supposedly a solution to the first issue is forthcoming. The others I'm not so sure about.

Some further reading (from which most of this post was cribbed):

The E-Only Tipping Point for Journals
Richard Johnson and Judy Luther

May 13, 2008

A book publisher’s manifesto for the 21st century

Sara Lloyd from Pan Macmillan (one of Nature's sister companies) is serializing on The Digitalist blog an article that she wrote for Library Trends, 'A book publisher’s manifesto for the 21st century'. I've already had the pleasure of seeing the whole thing, and can only say that you should definitely read it.

Update on 20/5/08: Sara has now posted all six parts: 1 2 3 4 5 6.

Where are we, where are we now?


When Jeff Jonas came in a few weeks ago to give a talk he stressed the importance of what - where - who - when questions for understanding what is going on within corporations. Science too is generally concerned with figuring things out, and earlier this month Nature ran an editorial pointing out that "Among the basic elements of scientific record-keeping, too often the 'where?' gets neglected. Now advances in satellite-positioning technology, online databases and geographical information systems offer opportunities to make good that neglect". So it was very timely that in the same week we had Tom Coates and Seth Fitzsimmons come in to talk to us about some of the work that they have been doing at Brickhouse, the Yahoo! R&D incubator located in San Francisco. They specifically came in to talk about Fire Eagle, a location brokerage service, but before getting into the specifics of Fire Eagle Tom talked a little about Brickhouse and some of the stuff that is coming out of there.

Innovation is important for every company, if for no other reason than the "Red Queen" hypothesis, in which no evolution leads to extinction by default. Finding a way to innovate and to feed new ideas into an existing company is a classic optimization problem: you want to cast about your local space with a random sampling of ideas that will get you out of your local minimum, but not spend all of your efforts pursuing new ideas that won't settle down to anything. Many companies have tried different methods, with Google's 20% rule being a notable case in point.

One of the approaches that Yahoo! have taken is to set up a semi-autonomous group with the feel and spirit of a start-up. This is Brickhouse, which is located away from the main company in the heart of San Francisco. Tom characterized it as an attempt to bring great ideas together with great people who could make these ideas come to life. He pointed out that sometimes the people who come up with great ideas are not in the best position to execute them, and vice versa.

Indeed some of the products that they have shipped, and are in the process of shipping, are pretty impressive. This approach to innovation has led to Yahoo Live, Pipes, Bravo Nation and Fire Eagle. You can get a list of some other projects here.

Fire Eagle is a location brokerage service. It helps you share your location, and gives you a high degree of control over this information. It is easy to build on top of, and to use to make location-based services. The way the system works is that you tell Fire Eagle where you are. You tell Fire Eagle who, or what, you want to share that information with, and then Fire Eagle tells all of these people or services where you are. Because the framework is open, it is easy for other people to create tools that plug into Fire Eagle for both input and output. Instead of building an N^2 infrastructure where every interconnection in the communication network has to worry about passing, parsing and verifying location-based data, the Fire Eagle service takes care of this, and makes it less complex for rich location-aware services to emerge.
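To make the N^2 point a bit more concrete, here is a minimal sketch of the broker idea in Python. It is not Fire Eagle's actual API: the `LocationBroker` class, its method names, and the user and service names are all hypothetical, made up purely to illustrate the "tell the broker where you are, tell it who may know" flow described above.

```python
from dataclasses import dataclass, field


@dataclass
class LocationBroker:
    """Toy location broker. Hypothetical, NOT the real Fire Eagle API."""
    # user -> last reported location (free text here, for simplicity)
    locations: dict = field(default_factory=dict)
    # user -> set of consumer names the user has chosen to share with
    permissions: dict = field(default_factory=dict)

    def allow(self, user, consumer):
        """The user grants one consumer access to their location."""
        self.permissions.setdefault(user, set()).add(consumer)

    def revoke(self, user, consumer):
        """The user withdraws that access again."""
        self.permissions.get(user, set()).discard(consumer)

    def update(self, user, location):
        """Any input source (phone app, web form, ...) reports a location."""
        self.locations[user] = location

    def where_is(self, user, consumer):
        """A consumer asks for a location; None unless the user allowed it."""
        if consumer in self.permissions.get(user, set()):
            return self.locations.get(user)
        return None


def direct_links(sources, consumers):
    """Integrations needed if every source talks to every consumer."""
    return sources * consumers


def brokered_links(sources, consumers):
    """Integrations needed if everyone talks only to the broker."""
    return sources + consumers


broker = LocationBroker()
broker.update("alice", "San Francisco")
broker.allow("alice", "friend-finder")

print(broker.where_is("alice", "friend-finder"))  # consumer alice allowed
print(broker.where_is("alice", "ad-network"))     # consumer never granted access
print(direct_links(10, 10), brokered_links(10, 10))
```

With ten location sources and ten consuming services, wiring everything together directly would take 100 pairwise integrations, whereas going through a broker takes 20. The permission check living in one place is also what lets the user stay in control of who sees what.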

So why would you want to have your location known to other parties? The number of applications is limited only by the imagination of developers. There are many cool social, scientific and commercial things that could be created. (I'm averse to using the term application in this context, as what we are talking about are things that lie on top of a piece of infrastructure, so they could be higher level pieces of infrastructure, physical objects, tools, toys, or a combination of all of these.) Examples include being able to see where at a conference or in a city all of your friends are; finding the closest available taxi or bus; sending your location back to a recorder in real time while you are in the field collecting data; listening to music that friends of yours listened to, on that spot, at some point in the past; and being alerted to papers or stories that are being published about the region that you are visiting (Connotea supports geo tagging).

Tom said that three things informed their design decisions: creating a service that can manifest anywhere that the network touches, a service that will play well with other services, and a service that decouples the creation and use of data.

The problem that Fire Eagle solves is that getting location data is pretty hard, and the people who are good at getting the data often only have an interest in a small number of use cases for it. Other people, who have lots of ideas about what to do with data like this, can't pursue them because they can't get the data.

This is where having a location broker comes in.

Although the service has only been in beta for a few weeks, there have already been a lot of applications and tools integrated with it. Manually updating your location is going to be a burden, so the obvious sensor that can connect your location to Fire Eagle is your mobile phone, and already some mobile phone applications have built-in integration with Fire Eagle. These include Zonetags and Navizon. Other services that auto-update your location to Fire Eagle include Dopplr, Plazes and Loki. Seth pointed out that this does raise the question of whether one is then recording the position of a person or of a device. But so many of the electronic trails that we leave (e.g. emails, blog posts, Twitter updates) are only reflections of our existence, and we are so used to thinking of them as sufficient representations of persons, that fictional characters often now have blogs of their own, in effect to deepen the illusion of their existence. One could easily expect to see a Fire Eagle update for some fictionalised persona at some point in the future.

Some services that consume your location include Brightkite, Lightpoe and a Movable Type plugin.

Many of these other services are connecting to Fire Eagle using OAuth, and Wikinear is an example of an OAuth + Fire Eagle + Google Maps + Wikipedia mashup that the prolific Simon Willison put together in about half a day.

The Fire Eagle guys are currently working on "Friends on Fire", a Facebook app that will show you where your Facebook friends are (presumably so you can avoid the zombie horde).

That kind of wrapped up the talk, and then there was a quick Q&A. (My notes are a bit brief here, so I am only paraphrasing the questions and answers.)

Q: Timo: All the examples concentrate on the location of people. What about the location of objects? In the scientific world we think of buoys in the ocean transmitting geo data.
A: Tom: We are thinking about that, but the next kind of things we track will probably be pets.

Q: Ian: Are there any concerns about privacy of data?
(The bottom line of the answer is that the Fire Eagle guys believe that the data belongs to the user, and that every step and decision made in development is focused on keeping the user in control of their data and privacy. As an example, Fire Eagle will stop following you if you don't periodically grant it permission to do so.)
A: Tom: There is a code of conduct for the kinds of applications that interface with Fire Eagle, like having to announce that they are tracking people, and that they are sharing their data. If we see an external app that is not playing by these rules we can turn it off.

Q: Matt Brown: Are there any good entry-point sites for newcomers, specifically London-based ones?
A: Tom: Fire Eagle is only 7 weeks old. Wikinear was built in half a day, and there is an app gallery on the site.

Q: Timo: What is the advantage to Yahoo! of developing this yourselves? Why should Yahoo! bear the costs of setting something like this up?
A: Seth: We have a really extensive database of place names, which makes something like this possible.
A: Tom: I'm a strong believer that a rising tide lifts all ships.
A: Seth: if location based services become mainstream we also win.
A: Tom: We try to work out what the future that is going to happen anyway is going to be like, and make it happen faster.

Q: Peter: You could liken this to PayPal. Is there a risk that you are building a monopoly?
A: Seth: Anyone can implement our API, so there is no chance of this being a monopoly, but we think our name database provides better value.
A: Tom: Auction sites are a good example: bigger is better. But it doesn't make any sense to charge for Fire Eagle. There are open data sources in the world, so someone could go and build another version of this. However, people already trust Yahoo! with a lot of information; Yahoo! is the biggest provider of email in the world.

Q: Alf: What happens with people hammering the service? Are you going to be pushing updates, or will people always have to poll for data?
A: Seth: We are thinking of building an XMPP interface.
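
For anyone wondering why Alf's poll-versus-push question matters, here is a rough sketch in Python. It is purely illustrative: the `LocationFeed` class and its methods are hypothetical, plain in-process callbacks stand in for whatever a push interface such as XMPP might eventually offer, and none of it reflects how Fire Eagle is actually built.

```python
class LocationFeed:
    """Hypothetical feed of one user's location, for illustration only."""

    def __init__(self):
        self.location = None
        self.subscribers = []      # callbacks to notify on each update
        self.requests_served = 0   # how many poll requests we had to answer

    def poll(self):
        """Polling: the client asks, whether or not anything has changed."""
        self.requests_served += 1
        return self.location

    def subscribe(self, callback):
        """Push: the client registers once and waits to be told."""
        self.subscribers.append(callback)

    def update(self, location):
        """A new location arrives; every subscriber is notified immediately."""
        self.location = location
        for callback in self.subscribers:
            callback(location)


feed = LocationFeed()

# Push client: one subscription, and it hears about changes as they happen.
seen_by_push = []
feed.subscribe(seen_by_push.append)

# Polling client: asks on every tick and keeps only genuine changes.
seen_by_poll = []
for tick in range(5):
    if tick == 2:
        feed.update("London")  # the only real change in this run
    current = feed.poll()
    if current is not None and (not seen_by_poll or seen_by_poll[-1] != current):
        seen_by_poll.append(current)

print(seen_by_push, seen_by_poll, feed.requests_served)
```

Both clients end up knowing about the single move, but the polling client had to make five requests to find out, which is exactly the "hammering" Alf was asking about.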

And that was it for the talk. There is no question that location is hugely important, and the number of APIs for map and location services is continuing to grow. Just before I posted this write-up, Yahoo! launched a preview of the Yahoo! Internet Location Platform, and there is a good write-up about that on O'Reilly Radar.

May 08, 2008

Nature.com wins a Webby

As some of you may already have noticed, Nature.com has won a Webby. Yeah! :) Here are some more details.

There are so many things that we still want to do with the site that it feels very much like a work in progress to those of us spending our days (and nights) on it. But we're delighted that the judges already consider it useful and impressive. And thanks also to David P for the kind namecheck on BoingBoing.

May 07, 2008

Science in the Streamosphere

I was hoping to coin 'the streamosphere' but it's already in Google. Neh. Anyway...

The last month or two has seen many science 2.0 (for lack of a better term) bloggers pick up Twitter and FriendFeed.

If you've never heard of the former then you probably shouldn't be reading Nascent. The latter is an activity aggregator: you sign up, tell it which other services you use (del.icio.us? last.fm? blogs?) and it generates a page listing all of your public activity across those services like the Facebook mini-feed writ large. You can see feeds from your friends and attach short comments to their activity.

Services like these are less effort than blogging and you get more instant feedback in the form of little smiley faces from other users. The downside is that with everybody communicating in SMS-length 140-character bursts it can feel a little bit like one of those 'txt cafe' premium hotlines you see advertised on satellite TV late at night, albeit with fewer muscly bikers and bikini'ed hot-tubbers (Nature staff excepted).

If that description hasn't put you off too much it's worth dipping a toe into the activity stream. For the social networking aspect to work you really need a social network, so I've listed a couple of science bloggers below with links to their FriendFeed accounts. Not quite a blogroll... a twitlist? Maybe not. ;) You can follow people without having them follow you, so don't be shy (but don't expect a follow in return straight away):

See you there!

May 01, 2008

50ft podcasting

Nature put up an awesome billboard advert for the podcast in the Gaslamp district of San Diego around the time of the FASEB and AACR conferences. Just in case you're wondering, the poster girl silhouette is that of Nature Reviews Neuroscience editor Monica Hoyos Flight.

"Nascent Web publishing efforts have their genesis in a burning need to say something, but their ultimate success comes from people wanting to listen, needing to hear each other’s voices, and answering in kind."
Rick Levine
The Cluetrain Manifesto
