
Data portability for scientific web apps

network.png
In theory I know these people

Executive summary: we created a new room on FriendFeed which might be a good place to discuss data portability from a scientific networking perspective.

Having the ability to share one network (or a particular subset thereof) of friends and contacts across different social networking sites is a good idea. It has been kicking around for a while and it's a feature Nature definitely wants to support in its social software.

The issue has cropped up before on FriendFeed and in the Network forums, and Cameron has highlighted it in his open letter to the developers of scientific web apps.

We've been thinking quite hard about how we could enable this (as well as debating whether or not we should do anything before a wider standard is developed). Data portability is not a trivial problem. If there were a relatively simple way to share networks between scientific web apps, though, it might be worth just pushing ahead with something imperfect but workable in the short term.

What is that simple way, though?

Currently we expose a user's social network on Nature Network through XFN (though right now there's a bug that limits it to the first six people in any network, doh). You can use the Google Social Graph API to read this: here's my network.
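To make the XFN part concrete: XFN is just a set of agreed values in the rel attribute of ordinary links, so any client can read a network from a profile page. Here is a minimal sketch using only Python's standard library. The sample markup and the example.org URLs are made up for illustration and are not Nature Network's actual page structure.

```python
from html.parser import HTMLParser

# Relationship values defined by XFN 1.1.
XFN_VALUES = {
    "contact", "acquaintance", "friend", "met", "co-worker", "colleague",
    "co-resident", "neighbor", "child", "parent", "sibling", "spouse", "kin",
    "muse", "crush", "date", "sweetheart", "me",
}


class XFNParser(HTMLParser):
    """Collects (href, [xfn values]) for every link carrying an XFN rel value."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        # rel is a space-separated list; keep only the XFN values.
        rels = set((attributes.get("rel") or "").split()) & XFN_VALUES
        href = attributes.get("href")
        if rels and href:
            self.links.append((href, sorted(rels)))


def extract_xfn(html):
    parser = XFNParser()
    parser.feed(html)
    return parser.links


# Hypothetical profile markup, for illustration only.
SAMPLE = """
<ul>
  <li><a href="http://example.org/people/U000001" rel="friend met colleague">A. Contact</a></li>
  <li><a href="http://example.org/people/U000002" rel="contact">B. Contact</a></li>
  <li><a href="http://example.org/about" rel="nofollow">About</a></li>
</ul>
"""

for href, rels in extract_xfn(SAMPLE):
    print(href, rels)
```

Note that all this gives you is a list of profile URLs on one site, which is exactly why the identifier question below matters.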

We use Nature Network identifiers (the 'Uxxxxxx' part). How can you correlate those with identifiers on other social networks?

You could attempt to match on real names, but then you'll hit a wall the first time you make friends with a Mr Smith or Xu.

Should we use plain text email addresses? Not without access control, or spammers would harvest the user database. And not without privacy levels, or I could make somebody anonymous (say, Charles Darwin) a contact and then pull their (otherwise hidden) email address out of Network with the portability API.

Could we restrict use of the portability API to 'trusted' apps that promise to respect user privacy? Probably not. Apart from anything else it would mean no mashups, no Greasemonkey scripts, no fast bedroom coder prototypes.

We could hash email addresses, but we'd need a dynamic salt or they'd be vulnerable to being cracked by somebody with a good enough rainbow table. Would rehashing every email address in a user database every time somebody wants to import their social graph be scalable for larger apps?
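A rough sketch of what the salted-hash exchange might look like, to show where the scaling cost comes from. Everything here is an assumption for illustration: the function names, the idea that the importing site sends a fresh salt along with the hashed contacts, and the toy user database. Note also that a salt only defeats precomputed tables; anyone holding both the salt and a hash can still test guessed addresses against it.

```python
import hashlib
import secrets


def normalise(email):
    # Hash a canonical form so "Darwin@Example.org " and "darwin@example.org" match.
    return email.strip().lower()


def salted_hash(email, salt):
    return hashlib.sha256(salt + normalise(email).encode("utf-8")).hexdigest()


def match_contacts(hashed_contacts, salt, user_db):
    """Return local user ids whose salted email hash appears in hashed_contacts.

    user_db maps email address -> local id (hypothetical schema).
    The index is rebuilt for every request because the salt changes each time,
    so the cost is one hash per user in the database, per import.
    """
    index = {salted_hash(email, salt): uid for email, uid in user_db.items()}
    return sorted(index[h] for h in hashed_contacts if h in index)


# Toy example: the importing side picks a fresh salt and hashes its contact list.
user_db = {"darwin@example.org": "U000001", "wallace@example.org": "U000002"}
salt = secrets.token_bytes(16)
outgoing = [salted_hash(e, salt) for e in ["Darwin@example.org", "nobody@example.org"]]
matches = match_contacts(outgoing, salt, user_db)
```

The dictionary comprehension in match_contacts is the expensive part: it touches every user on every import, which is the scalability worry raised above.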

Could we use Bloom filters somehow? Is the precision good enough? Are there security considerations?
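For anyone who hasn't met them, here is a small Bloom filter sketch so the precision question has something concrete behind it. The sizes, the double-hashing scheme and the idea of publishing a filter of contacts' email addresses are all assumptions for illustration. A Bloom filter never gives a false negative but can give false positives, and for m bits, k hash functions and n stored items the usual estimate of the false positive rate is (1 - e^(-kn/m))^k.

```python
import hashlib
import math


class BloomFilter:
    def __init__(self, size_bits=1024, num_hashes=5):
        self.size = size_bits
        self.k = num_hashes
        self.bits = bytearray((size_bits + 7) // 8)
        self.count = 0  # number of add() calls, used for the estimate below

    def _positions(self, item):
        # Derive k bit positions from one SHA-256 digest (double hashing).
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.size for i in range(self.k)]

    def add(self, item):
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)
        self.count += 1

    def __contains__(self, item):
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))

    def estimated_false_positive_rate(self):
        return (1 - math.exp(-self.k * self.count / self.size)) ** self.k


# Hypothetical use: a site publishes a filter of one user's contacts.
contacts = BloomFilter()
for email in ["darwin@example.org", "wallace@example.org"]:
    contacts.add(email)

print("darwin@example.org" in contacts)
print(contacts.estimated_false_positive_rate())
```

On the security question, one thing is clear even from this sketch: anybody holding the filter can test guessed addresses against it, so it hides the list from casual reading but not from somebody who already has candidate addresses.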

Could we just tell users that if they want to keep their email address private they won't ever be connected automatically to their contacts on other sites?

Could correlations between different usernames be defined by the users themselves? The user could tell each site what their username is on all of the other scientific websites that they use. Alternatively they could create and host a public FOAF file containing this information and just point sites to that (or it could be harvested automatically by a crawler). Is that pure fantasy?
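One way the user-declared approach could work, sketched under heavy assumptions: each account lists the other accounts its owner claims, and a link only counts when both sides claim each other (the same idea as reciprocal rel="me" links in XFN). The site names, usernames and data layout below are invented for illustration.

```python
def confirmed_links(claims):
    """Return the set of account pairs that claim each other.

    claims maps (site, username) -> set of (site, username) that the account's
    owner says are also theirs. One-way claims are ignored, so nobody can
    attach themselves to somebody else's profile just by saying so.
    """
    confirmed = set()
    for account, claimed in claims.items():
        for other in claimed:
            if account in claims.get(other, set()):
                confirmed.add(frozenset((account, other)))
    return confirmed


# Hypothetical declarations gathered from two sites.
claims = {
    ("site-a", "U000001"): {("site-b", "cdarwin")},
    ("site-b", "cdarwin"): {("site-a", "U000001")},
    # An impostor claims the same site-b account, but it does not claim them back.
    ("site-a", "U000099"): {("site-b", "cdarwin")},
}

for pair in confirmed_links(claims):
    print(sorted(pair))
```

No email addresses change hands in this scheme, at the cost of asking users to do the bookkeeping themselves.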

Your thoughts are welcome. We've created a room on FriendFeed to try and help organize any discussion (feel free to share relevant links and blog posts there!).


Comments

I'm always amazed by the concerns of the modern net generation. The next net generation, now in middle school and high school, has no qualms about sharing private information with everybody. To them an email address is just as private or public as a postal address. Creating all these fancy hash scripts would be silly for that type of data mentality.

I think this generation has a responsibility to protect the next generation and be grateful there wasn't a facebook/myspace when we were young and stupid.

Are people still worried about spammers capturing email addresses?? For real? My gmail a/c (baoilleach@gmail.com for all you spammers out there) lets through one spam a week or so, and captured 1200 in the last month. Anyone who's 'connected' to the web has their address all over the place, from mailing lists, to subversion commit messages, etc., etc.

Having a scanned list of email addresses and having a list of email addresses that map to a host of other useful identifying information are different things.

Going from the former to the latter in an automated way is obviously possible but I see little point in helping this along.

In the end the trade-off between convenience and privacy often just comes down to a personal judgement, but I think it's important that services make clear what those trade-offs are.

If you think "the modern net generation" is handing out private information indiscriminately, have a look at this study: http://www.pewinternet.org/pdfs/PIP_Teens_Privacy_SNS_Report_Final.pdf.

I just don't think e-mail addresses are stable enough for this purpose. Maybe something like OpenID or, for science, if it ever takes off, something like ResearcherID or the CrossRef equivalent.

@Eric,

Thanks for the link; it's an interesting report. What I take from it is that for a substantial number the answer is yes. Of course, the report cannot ask these teenagers whether their future selves, ten years down the line, would be happy with what they made available online.

Letting the users define the correlations might be a good thing to try, both because it's flexible and also because it gives each user more control over their ID.
