Nascent

July 01, 2009

"I am not a scientist, I am a number"

On Monday I was at the BioLINK Special Interest Group at the Intelligent Systems for Molecular Biology meeting in Stockholm. Amongst the many thought-provoking talks was one by Phil Bourne, he of the Protein Data Bank, SciVee and other goodies. Phil made a cogent plea for a system of unique identifiers for scientists.


June 16, 2009

Welcome to the Streamosphere

Web publishing as a discipline has few tenets, but I think "release early, release often" and "don't be afraid to fail" are pretty sound. That was the philosophy behind Connotea when Timo and Ben Lund launched it in 2004, and it's the spirit in which I've just put up an early version of Streamosphere.

Streamosphere is a pet side project which I'm running according to what I guess you could call the Paul Graham principles (it'd be disingenuous to say "as a start-up", as most start-ups don't have NPG-level resources; OTOH, we lack a fussball table and free M&Ms). Think of it as a pre-alpha alpha.

The elevator pitch

Streamosphere lets you track scientific discussion on the web, in real time.

What it does

If you visit streamosphere.nature.com/preview.php#24 you'll see a page of stacked timelines like these:

[Screenshot: stacked item timelines]

Each timeline shows discussion around a particular item, for now always a web page. The portrait on the left is of one of the people who first started talking about the item. The slice of time in which the discussion was active (people were leaving comments, tweeting, liking or bookmarking it) is coloured a shade of magnolia. Behind the active slice is a graph - this shows you how much activity there was at any one point.
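To make the timeline idea concrete, here's a minimal sketch of how an item's activity could be binned for display. It assumes hourly buckets and that the active slice simply runs from the first to the last tracked update; the function name and bucket size are hypothetical, not Streamosphere's actual code.

```python
from collections import Counter
from datetime import datetime, timedelta


def activity_timeline(timestamps, bucket=timedelta(hours=1)):
    """Bin one item's update timestamps into fixed-width buckets.

    Returns (start, end, counts): start/end bound the active slice
    (first bucket to last non-empty bucket) and counts holds the
    per-bucket activity that would be drawn as the background graph.
    Returns None if the item has no activity.
    """
    if not timestamps:
        return None
    origin = min(timestamps)
    # timedelta // timedelta gives the integer bucket index of each update.
    per_bucket = Counter((t - origin) // bucket for t in timestamps)
    last = max(per_bucket)
    counts = [per_bucket.get(i, 0) for i in range(last + 1)]
    return origin, origin + (last + 1) * bucket, counts
```

For three updates at 09:00, 09:20 and 11:05 this gives an active slice from 09:00 to 12:00 with hourly counts of `[2, 0, 1]`.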

Click on an item's active slice to pop up more details about it, including an activity breakdown and a selection of associated comments and tweets. If the item is a video or photograph, it should be embedded in the popup. If the item description is in a foreign language, hover your mouse cursor over it to get the English translation.

[Screenshot: item details popup]

Streamosphere only ever shows the most active items in a given time period. Use the controls on the right-hand side of the screen to see the most active items in the past few hours, day, week or month. You can also filter items by domain or by keywords in their description.
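A rough sketch of that "most active items" view, with the optional domain and keyword filters, might look like the following. The `Update` record and the function names are hypothetical, and "most active" is assumed here to mean "most updates inside the window".

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timedelta
from urllib.parse import urlparse


@dataclass
class Update:
    item: str          # URL the update is about (hypothetical field)
    description: str   # title or text describing the item
    when: datetime


def in_domain(url, domain):
    """True if the URL's host is the domain or one of its subdomains."""
    host = urlparse(url).hostname or ""
    return host == domain or host.endswith("." + domain)


def most_active(updates, now, window, limit=10, domain=None, keyword=None):
    """Return up to `limit` items with the most updates in (now - window, now]."""
    counts = Counter()
    for u in updates:
        if not (now - window < u.when <= now):
            continue
        if domain and not in_domain(u.item, domain):
            continue
        if keyword and keyword.lower() not in u.description.lower():
            continue
        counts[u.item] += 1
    return [item for item, _ in counts.most_common(limit)]
```

Switching between "past few hours", "day", "week" and "month" is then just a matter of passing a different `window`.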

In smaller time periods you'll see some items that have nothing to do with science: recently, for example, there's been stuff about Iran and a viral video. I'm not sure if this is a bug or a feature, or how to filter out non-science stuff if that's a requirement - suggestions welcome.

In the future I'd like to see the page update dynamically as new activity gets tracked, but for now, to refresh the page, you need to reload it or choose a new time period.

How it works

Streamosphere tracks ~ 4k accounts on half a dozen different social media sites including Friendfeed, Twitter and bookmarking services like Delicious. The account owners have all self-identified (sometimes implicitly) as scientists or people interested in science.

It uses a combination of polling, web hooks (via GNIP) and SUP feeds to aggregate public updates from tracked accounts as soon after they happen as possible. Average latency is ~ 3 minutes for Friendfeed and a few seconds for Twitter.
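SUP (Simple Update Protocol) is what makes polling thousands of feeds bearable: instead of fetching every feed, you fetch one SUP document listing which feeds have changed recently, then poll only those. The sketch below assumes the JSON layout FriendFeed published for SUP, where `updates` is a list of `[sup_id, update_token]` pairs; the mapping from SUP-IDs to feed URLs and the example.com URLs are hypothetical.

```python
import json


def feeds_to_poll(sup_json, tracked):
    """Return the feed URLs of tracked accounts that the SUP document says changed.

    sup_json: the SUP document as a JSON string (assumed layout, see above).
    tracked:  dict mapping each tracked feed's SUP-ID to its feed URL.
    """
    doc = json.loads(sup_json)
    # Each entry is [sup_id, update_token]; only the SUP-ID matters here.
    changed = {sup_id for sup_id, _token in doc.get("updates", [])}
    return sorted(tracked[s] for s in changed if s in tracked)
```

Feeds that aren't in the SUP document are skipped until the next check, which is how latency can stay at minutes without hammering the source site.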

Right now there's only one view on the data: by item. Items are the URIs associated with or mentioned in updates: if I tweet "I love http://lolcats.com" and you bookmark it on Delicious, then the Streamosphere database will record a single item (lolcats.com) associated with two updates.
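Here's a minimal sketch of the item view: pull URLs out of each update, normalise them so trivially different spellings collapse together, and group update IDs by item. The regex and normalisation rules are simplified assumptions for illustration (real tweets need more care with punctuation and brackets), and the names are hypothetical.

```python
import re
from collections import defaultdict
from urllib.parse import urlsplit, urlunsplit

# Deliberately simple URL pattern for illustration.
URL_RE = re.compile(r'https?://[^\s<>"]+')


def normalise(url):
    """Strip trailing punctuation, lower-case the host, drop a trailing
    slash and any #fragment so variants of a URL map to one item."""
    url = url.rstrip(".,;:!?)")
    parts = urlsplit(url)
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(),
                       parts.path.rstrip("/"), parts.query, ""))


def group_by_item(updates):
    """updates: iterable of (update_id, text). Returns {item: [update_id, ...]}."""
    items = defaultdict(list)
    for update_id, text in updates:
        # dict.fromkeys de-duplicates while keeping first-seen order.
        for item in dict.fromkeys(normalise(u) for u in URL_RE.findall(text)):
            items[item].append(update_id)
    return dict(items)
```

With the example above, `group_by_item([("tweet-1", "I love http://lolcats.com"), ("bookmark-1", "http://lolcats.com/")])` records one item, `http://lolcats.com`, with both updates attached.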

Items are currently always websites, but in the future I'd like to add views for users and topics; these are non-trivial because of problems with account owner disambiguation and classifying short messages respectively.

Owner disambiguation relies on the Google Social Graph API. We need to disambiguate owners because otherwise the same person could post a single link on multiple services and Streamosphere would believe it's amazingly popular.
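One way to use that kind of identity data is a union-find over account identifiers: every "these two accounts belong to the same person" link merges two sets, and popularity is then counted in distinct owners rather than raw updates. The sketch doesn't call the Social Graph API; it assumes you already have such account pairs, and the `service:username` labels are hypothetical.

```python
from collections import defaultdict


class Owners:
    """Union-find over account identifiers such as "twitter:alice"."""

    def __init__(self):
        self.parent = {}

    def find(self, account):
        self.parent.setdefault(account, account)
        root = account
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every account on the path straight at the root.
        while self.parent[account] != root:
            self.parent[account], account = root, self.parent[account]
        return root

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def owner_counts(owners, updates):
    """updates: iterable of (account, item). Returns {item: distinct owners}."""
    people = defaultdict(set)
    for account, item in updates:
        people[item].add(owners.find(account))
    return {item: len(who) for item, who in people.items()}
```

If Alice posts the same link from Twitter and Delicious and Bob tweets it once, the item counts as discussed by two people, not three.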

Sometimes users have set up rules to automatically route updates from one service to another (e.g. they share an item on Google Reader, which appears in their Friendfeed stream, which in turn gets pushed out to their Twitter account). Rules like this are the bane of Streamosphere's existence - it's non-trivial to detect them and handle them correctly.

I'm collecting hashtags and tags and extracting key terms from all updates, but I don't quite know what to do with them yet; I still need a good algorithm to detect trending topics. Links are extracted from updates, but right now there's no disambiguation for papers (Buggotea is alive and well in Streamosphere). There's a best-effort attempt to resolve shortened URLs, though occasionally one will slip through.
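A best-effort resolver for shortened URLs can be as simple as following redirects while the host is a known shortener, with a hop limit and loop check. In this sketch the shortener list is a hypothetical sample, and `lookup` is an injected function standing in for an HTTP HEAD request that returns the `Location` header (or `None`); when a lookup fails, the short URL is returned unchanged, which is exactly how one slips through.

```python
from urllib.parse import urlsplit

# Hypothetical sample of shortener hosts; a real list would be longer.
SHORTENERS = {"bit.ly", "tinyurl.com", "is.gd", "tr.im", "ow.ly"}


def resolve(url, lookup, max_hops=5):
    """Follow shortened URLs to their target, best effort.

    lookup(url) returns the redirect target or None. Stops at the first
    non-shortener host, on a failed lookup, on a redirect loop, or after
    max_hops, returning the last URL reached.
    """
    seen = set()
    for _ in range(max_hops):
        host = urlsplit(url).hostname or ""
        if host not in SHORTENERS or url in seen:
            return url
        seen.add(url)
        target = lookup(url)
        if not target:
            return url
        url = target
    return url
```

Injecting `lookup` keeps the logic testable offline; in production it would wrap whatever HTTP client is already doing the polling.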

There's no API, but if anybody has a good use for the data, I'm happy to set something up using GNIP or long polling to support real-time updates if necessary - just send me a use case.

May 29, 2009

Which web 2.0 services do scientists use?

Which web services are scientists actively contributing to?

There are ~ 1,240 Friendfeeders in science-related rooms (the-life-scientists, scienceapps, science-2-0, science-online...). What percentage have listed usernames associated with the science-related tools supported by Friendfeed?


Service         Count (of ~1,240 Friendfeeders)
citeulike            41
connotea             31
delicious           431
digg                208
googlereader        394
reddit               68
slideshare          143
twitter             675
youtube             341
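Since the question is about percentages, here's the arithmetic on the counts in the table. The ~1,240 denominator is approximate, so treat the results as rough shares rather than precise figures.

```python
# Counts from the table above; TOTAL is the approximate number of
# Friendfeeders in the science-related rooms.
COUNTS = {
    "citeulike": 41, "connotea": 31, "delicious": 431,
    "digg": 208, "googlereader": 394, "reddit": 68,
    "slideshare": 143, "twitter": 675, "youtube": 341,
}
TOTAL = 1240


def shares(counts, total):
    """Percentage of the total with an account on each service, to 1 d.p."""
    return {svc: round(100 * n / total, 1) for svc, n in counts.items()}


# Print services from most to least used.
for svc, pct in sorted(shares(COUNTS, TOTAL).items(), key=lambda kv: -kv[1]):
    print(f"{svc:<13}{pct:5.1f}%")
```

That works out at about 54% for Twitter, 35% for Delicious, 32% for Google Reader and 2.5% for Connotea.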

Why this dataset isn't very good...

There's a bias towards services formally supported by Friendfeed - it's easy to add feeds from supported services. Connotea and CiteULike aren't formally supported though you can add your library RSS feeds manually. Many Friendfeed users won't bother to do this.

People may be contributing to services (like YouTube...) for reasons that have nothing to do with science.

People who use Friendfeed aren't a representative sample of scientists (though they may well be a representative sample of blog friendly, web savvy scientists).

People sometimes remove their Twitter feeds from Friendfeed to help keep the conversations that they start there in one place.

I picked the set of services to look at, which is why you don't see, say, Wikipedia or OpenWetWare above (some preliminary analysis suggested that the numbers would be negligible).

That said...

We can still use it to guess at broad trends.

Roughly a third of Friendfeed scientists have Delicious bookmarks. Don't discount non-academic bookmarking services as a source of paper metadata.

A similar number use the share functionality in Google Reader.

Despite rumors to the contrary, not everybody is on Twitter.

A surprising (to me) number of people are uploading and favouriting items on Slideshare.

We're hiring (May/June 2009 edition)

Interested in a senior position on the growing, hard-working, award-winning Nature.com team? If so, we have two vacancies that you should check out: Head of Online Communities and Assistant Publisher.

Enquiries and CVs to the contact address given in the ads, or to me via my Nature Network page.

May 26, 2009

Gobbledygook Interview

I was interviewed by Martin Fenner, a Clinical Fellow in Oncology at Hannover Medical School, for his column Gobbledygook on Nature Network. The interview is mainly about our new OAI-PMH service (which I blogged about earlier) but also touches on the broader picture of Public Interfaces.

"Nascent Web publishing efforts have their genesis in a burning need to say something, but their ultimate success comes from people wanting to listen, needing to hear each other’s voices, and answering in kind."
Rick Levine
The Cluetrain Manifesto

Powered by
Movable Type 3.2