Data Matters: Interview with Stephen Friend

Stephen Friend is the President, Co-Founder, and Director of Sage Bionetworks, USA.

How open has biomedical research been to the open sharing of data?

Totally unopen. Compare the culture of biomedical research with the cultures around synthetic biology, around model organisms, and around the generation of genomic data. These are three examples where particular leaders in each field have driven sharing and open collaboration that is orders of magnitude more open than in biomedical research. In biomedical research there are individuals who work for five to ten years to develop a cohort, which they then believe is their cohort: it’s not anyone else’s, it’s theirs, and they have the right, until the last postdoc leaves their lab, to not let others look at the data until they get authorship on a paper. They will share it, but only if they can make sure they get their tenure off of it.

Data Matters presents a series of interviews with scientists, funders and librarians on topics related to data sharing and standards.

How important do you think openness is for the credibility and efficiency of science?

In the last three or four years people have opened up the silo a little bit and have looked into issues of transparency and reproducibility in all research. Particularly in biomedical research, this has been just the beginning of what I think is going to be an ever-expanding light being shone down on reproducibility and transparency. Open approaches can allow greater transparency and reproducibility. They’re not nice to have; they’re essential to counteract what I think is going to be increasing, emotional, high-profile attacks on biomedical research and how it’s done.

For the efficiency of research, openness is important because it accelerates the work. It’s not just about access to the data; open data is part of a shift in the reward and incentive structures. That shift will take us from ‘do not let others see what your data, your insights, your hypothesis, or your solution are until you make your own personal publication of the complete data, analysis, and solution’ to ‘any time you have data, the first thing you should do is figure out who else can help you solve your problem’. So instead of saying ‘I need to keep my problem, my data, my analyses to myself’, we have to shift to ‘who else in the world could complement my data, who could I get to do the analysis, who could help solve the problem?’, and get as many eyeballs, as many minds, as many insights on that data as soon as possible.

What needs to be done to bring about this change?

There are three things that can begin to help that transition. I don’t know if they are going to be sufficient, but they are necessary. First, we fundamentally need to uncouple data analysis from data generation: those who generate the data should not be assumed to be the only ones who analyse it. Secondly, funders need terms of agreement on their grants stating that the data must be in the hands of others before publication.

Thirdly, there need to be incentives that match that. How do we incentivise it so that when you get a set of data, the first thing you want to do is ask ‘how many people can I get to look at my data?’ How do we invert the reward system so that as soon as you get data, the first thing you want to do is have other people using that data, instead of the way it currently is? If we solve that, the whole thing becomes enormously efficient.

Who is responsible for creating a more open science?

You cannot do this unless the funders change their rules, because they have the currency. You can’t change the marketplace by saying you’re going to change the price of a product unless you have the resources that allow that to happen. It’s essential that the way we communicate grows, but even if the perfect, ultimate system were built for how that recognition could be given, I don’t think it would work until the funders say, ‘I want you, in your research plan, to describe how in the first year you’ll have at least 10 people working on your data.’ If every grant had that paragraph, ‘how do I get 10 people other than me analysing this data within 3 months’, you watch: the whole system will change. It doesn’t cost them anything; that’s why they have to do it. A funder could do that today; it’s their money.

How do you think a product like Scientific Data contributes to a more open science?

Without Scientific Data it’s very hard for funders to say ‘I demand that you do this or that’, because there’s no vehicle. Until there is a formal mechanism that allows data to be transmitted prior to the full package of data plus analysis plus conclusion, it’s going to be done in informal ways. Scientific Data provides a mechanism for the robust, reproducible transmission of the data itself.

Interview by David Stuart, a freelance writer based in London, UK

