Data portability for scientific web apps


In theory I know these people

Executive summary: we created a new room on FriendFeed which might be a good place to discuss data portability from a scientific networking perspective.

Being able to share one’s network of friends and contacts (or a particular subset of it) across different social networking sites is a good idea. It has been kicking around for a while, and it’s a feature Nature definitely wants to support in its social software.

The issue has cropped up before on FriendFeed and in the Network forums, and Cameron has highlighted it in his open letter to the developers of scientific web apps.

We’ve been thinking quite hard about how we could enable this (as well as debating whether or not we should do anything before a wider standard is developed). Data portability is not a trivial problem. If there were a relatively simple way to share networks between scientific web apps, though, it might be worth just pushing ahead with something imperfect but workable in the short term.

What is that simple way, though?

Currently we expose a user’s social network on Nature Network through XFN (though right now there’s a bug that limits it to the first six people in any network, doh). You can use the Google Social Graph API to read this: here’s my network.
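For anyone who hasn’t met XFN before: it is nothing more than `rel` values on ordinary links, so reading a network off a profile page is straightforward. Here is a minimal sketch in Python. The markup, URLs and user IDs in `SAMPLE` are made up for illustration and real Nature Network pages may well look different.

```python
from html.parser import HTMLParser

# A subset of the XFN 1.1 relationship vocabulary.
XFN_VALUES = {"friend", "contact", "acquaintance", "colleague", "co-worker", "met", "me"}


class XFNParser(HTMLParser):
    """Collects (href, [xfn values]) pairs from <a rel="..."> links."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        # rel can hold several space-separated values; keep only the XFN ones.
        rels = [r for r in (attrs.get("rel") or "").split() if r in XFN_VALUES]
        if rels and attrs.get("href"):
            self.links.append((attrs["href"], rels))


# Hypothetical profile markup (assumption, not copied from the real site).
SAMPLE = """
<a href="http://example.org/people/U000001" rel="friend met">Alice</a>
<a href="http://example.org/people/U000002" rel="contact">Bob</a>
<a href="http://example.org/about">About</a>
"""


def extract_xfn(html):
    """Return every XFN-annotated link found in the given HTML string."""
    parser = XFNParser()
    parser.feed(html)
    return parser.links
```

Note that what comes out is a list of URLs on our own site, which is exactly why the identifier question matters.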

We use Nature Network identifiers (the ‘Uxxxxxx’ part). How can you correlate those with identifiers on other social networks?

You could attempt to match on real names, but then you’ll hit a wall the first time you make friends with a Mr Smith or Xu.

Should we use plain text email addresses? Not without access control, or spammers would harvest the user database. And not without privacy levels, or I could add somebody who wants to stay private (say, Charles Darwin) as a contact and then pull their otherwise hidden email address out of Network with the portability API.

Could we restrict use of the portability API to ‘trusted’ apps that promise to respect user privacy? Probably not. Apart from anything else it would mean no mashups, no Greasemonkey scripts, no fast bedroom coder prototypes.

We could hash email addresses, but we’d need a dynamic salt or they’d be vulnerable to being cracked by somebody with a good enough rainbow table. Would rehashing every email address in a user database every time somebody wants to import their social graph be scalable for larger apps?
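To make the scalability worry concrete, here is a rough sketch of what matching on salted hashes might look like. SHA-256, the 16-byte salt and the function names are all assumptions chosen for illustration, not a description of anything Nature Network actually runs.

```python
import hashlib
import secrets


def normalise(email):
    """Trim whitespace and lowercase so the same address always hashes alike."""
    return email.strip().lower()


def hash_email(email, salt=b""):
    """Hash an address with an optional salt. With no salt the output is
    fixed for a given address, which is what makes rainbow tables work."""
    return hashlib.sha256(salt + normalise(email).encode("utf-8")).hexdigest()


def match_contacts(imported_emails, user_db):
    """Return the IDs in user_db whose address appears in imported_emails.

    user_db maps a (hypothetical) user ID to an email address. A fresh salt
    is drawn for every import, so every address in user_db has to be
    rehashed each time: the cost is O(len(user_db)) per request.
    """
    salt = secrets.token_bytes(16)
    wanted = {hash_email(e, salt) for e in imported_emails}
    return [uid for uid, email in user_db.items() if hash_email(email, salt) in wanted]
```

The loop in `match_contacts` is the expensive part: with a per-request salt nothing can be precomputed or indexed, so the work grows with the size of the whole user database rather than with the size of the network being imported.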

Could we use Bloom filters somehow? Is the precision good enough? Are there security considerations?
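For the sake of discussion, this is roughly what a Bloom filter over email addresses could look like. It is a toy version using only the Python standard library. The sizes, the double-hashing trick for deriving bit positions and the class name are our own choices for the sketch.

```python
import hashlib
import math


class BloomFilter:
    """A set-like structure that can give false positives but never false negatives."""

    def __init__(self, num_bits, num_hashes):
        self.m = num_bits
        self.k = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)
        self.count = 0  # number of items added so far

    def _positions(self, item):
        # Derive k bit positions from one SHA-256 digest (double hashing).
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item):
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)
        self.count += 1

    def __contains__(self, item):
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))

    def expected_false_positive_rate(self):
        # Standard approximation: (1 - e^(-k*n/m))^k
        return (1 - math.exp(-self.k * self.count / self.m)) ** self.k
```

On precision: the false positive rate can be pushed as low as you like by spending more bits per address, but it never reaches zero, so some users would occasionally be offered a contact who isn’t really there. On security: anyone holding the filter can test whether a guessed address is in it, so it hides the list from casual reading but not from somebody with a dictionary of candidate addresses.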

Could we just tell users that if they want to keep their email address private they won’t ever be connected automatically to their contacts on other sites?

Could correlations between different usernames be defined by the users themselves? The user could tell each site what their username is on all of the other scientific websites that they use. Alternatively they could create and host a public FOAF file containing this information and just point sites to that (or it could be harvested automatically by a crawler). Is that pure fantasy?
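As a thought experiment, here is how user-declared correlations might be used to carry a network from one site to another. Everything in it is hypothetical: the site names, the usernames and the shape of the `CLAIMS` table, which stands in for whatever each site (or a crawled FOAF file) would actually store. The sketch also assumes a claim only counts when both accounts point at each other, since otherwise anybody could claim to be anybody.

```python
# Hypothetical self-declared claims: (site, username) -> set of (site, username).
CLAIMS = {
    ("naturenetwork", "U000001"): {("friendfeed", "alice")},
    ("friendfeed", "alice"): {("naturenetwork", "U000001")},
    # U000002 claims a FriendFeed account that does not claim them back.
    ("naturenetwork", "U000002"): {("friendfeed", "darwin")},
}


def verified_account(account, target_site, claims=CLAIMS):
    """Return the username on target_site that account claims and that
    claims account in return, or None if there is no such pair."""
    for other in claims.get(account, ()):
        if other[0] == target_site and account in claims.get(other, ()):
            return other[1]
    return None


def port_network(contacts, source_site, target_site, claims=CLAIMS):
    """Translate a list of usernames on source_site into the matching
    usernames on target_site, skipping anyone without a verified match."""
    ported = []
    for username in contacts:
        match = verified_account((source_site, username), target_site, claims)
        if match is not None:
            ported.append(match)
    return ported
```

Contacts who haven’t declared anything simply drop out of the result, which is the trade-off of this approach: no email addresses change hands, but coverage depends entirely on users bothering to fill in their other usernames.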

Your thoughts are welcome. We’ve created a room on FriendFeed to try and help organize any discussion (feel free to share relevant links and blog posts there!).

