What are the data sharing practices in your field at the moment?
At the moment in biomedical research it’s a hodgepodge. There are some broad policies across all of NIH, for example, and there are policies that pertain to research funded by particular foundations, but there is no single policy. There are areas of biomedical research, such as genomics, that have a rich history and an active data sharing policy, and there are some particular initiatives, or projects that NIH is funding, that have requirements for data sharing, for example, the Human Connectome Project and the Alzheimer’s Disease Neuroimaging Initiative. But there is increasing interest across much of the federal government in how to increase access to the results of research. For example, a memo came from the Office of Science and Technology Policy in February 2013 asking agencies that fund significant amounts of research for plans to increase access to both data and publications. In the not too distant future, in response to such policy initiatives, increasing technical capabilities to share, recognition of the importance of data sharing from the scientific perspective, and changes in the expectations and perspectives of society, we’re going to see a big increase in data sharing in biomedical research and other kinds of scientific research as well.
How are these policies and initiatives changing attitudes to data?
I think a lot of people haven’t thought broadly about the benefits that will come from data sharing. Once we have a comprehensive set of information about data, as we do about the scientific literature, it’ll let us start looking at the landscape of biomedical research from a different perspective. It will give us another metric for assessing science and progress and it will allow us to find data that might be useful for any of a number of scientific purposes. Now, for most biomedical research the public products are the conclusions and the interpretations about data, and those conceptual aspects are probably the most fragile part of that scientific process. The data are probably the most robust part, and yet that is the part of the process that is seldom seen by the public. The value of sharing data in the biomedical research enterprise is becoming more apparent overall, but this value seems to be best appreciated in the new generation of scientists. This generational change is going to come forward and converge with changing policies and technical capabilities to share data. So, a few years from now, I see a lot of the biomedical research enterprise becoming much more data centric.
How much thought is being given to data discoverability at the moment?
We have launched at NIH something called the Big Data to Knowledge (BD2K) initiative. Many initiatives at NIH have focused on supporting research and development and training, in particular areas of priority, but I see BD2K as being different and potentially transformative because in addition to support for research and training, it also has a component that is exclusively meant to facilitate the broad use of biomedical data, not just big data, but all research data. We see three activities that we’ve got to do: make data available, and that’s being done through focusing on data sharing policies and practices; make the data usable, and it’s not usable unless it can work with other data and tools and data resources through the use of standards; and bring data into the research and scholarship ecosystem, and that means making it discoverable, citable, and linking it to information about other datasets, about software tools, about data resources, and importantly, linking information about particular datasets to the scientific literature.
We have pushed this idea forward with regard to data discoverability, citability, and linking it to various resources through the Data Discovery Index Coordination Consortium. The idea here is to support a group of experts who will help to think through, plan and engage the community and other stakeholders in putting together something that will make data discoverable, citable, and linkable.
What role do you see for traditional publishers and Scientific Data in the new data ecosystem?
There are lots of possibilities, and all of these things are interdependent, so if it were done one way from the funders’ point of view, then the publisher’s role would be X, and if it were done a different way by the funders, then the publisher’s role might be Y. So I think the relative roles will have to be figured out over time, but the key thing to do is to make sure that they are figured out and that publishers and funders and other stakeholders – both scientists and non-scientists – discuss these things in a way that will make it happen efficiently and effectively. We’re all in this together, and without peer-reviewed publications that look at data as Scientific Data does, I think an important piece would be missing.
The incentive structure in science is one that relates to publications, and up until now publications have been all about the concepts. Journals like Scientific Data reinforce the value of talking about the data. They allow for appropriate attribution, and therefore incentives, for people who are doing data-related things, and having these things peer-reviewed adds value by giving an indication of quality. For all those reasons, Scientific Data and other journals that relate to data are really crucial in the whole picture.
All the things that we can do with the literature right now, citing it, discovering it, and linking it to resources and other information, have empowered science. Once we can do the same kinds of things with data, and when that whole ecosystem of data is linked to the ecosystem of the published scientific literature, I think a lot of interesting ideas are going to pop out, and whole perspectives will change. As this moves forward, the enterprise of biomedical research will become more data centric and that’s going to be transformative. That’s why I’m so excited about seeing journals like Scientific Data, and initiatives like BD2K at NIH.
Interview by David Stuart, a freelance writer based in London, UK