Scientific Data | Scientific Data

Data Matters: interview with Simon Hay

May 27, 2014 | 9:35 am | Posted by Andrew Hufton | Category: Data Matters

Simon Hay is Professor of Epidemiology at the University of Oxford, UK

How open has your field traditionally been in the sharing of data?

The field I’m working in is tropical medicine, and more generally, epidemiology and public health. Historically they have not been great at sharing data, but I think there is progress. It has been going on for about the last 5-10 years, but I would also still say that we have got a way to go.

My personal perspective on this comes from experiences with the Malaria Atlas Project. This is something we started with many colleagues, and collectively we asked the malaria community to share information about malaria prevalence across the world to make global maps. Over that time we’ve seen a change from people being very reluctant to share, to sharing becoming almost routine. We as a research project have also evolved over that time to a position where not only are we open access, but we put everything that we receive online straight away. We don’t give ourselves a data head start because we think it’s more important to get the data out there so everybody can use them. All of the articles that we produce go into open access journals or ones in which we can pay to make them open access. All the code that we write also goes up and becomes freely available to anyone that wants to use or reuse it. So we’re from top to bottom as open access as we can be in our project and we essentially think that is the way forward. There are big pluses to doing this for researchers, which I’m not sure are widely realized.

What are the pluses for the researchers?

Well, if you write a bit of code that gets widely used, for example, the individuals that wrote it in the first place get a huge amount of kudos, both professionally and within the research community. For example, a bit of phylogenetic code that’s freely available is called BEAST (Bayesian Evolutionary Analysis by Sampling Trees; https://www.beast2.org), and many in the community in that area use that bit of software. The individuals are leaders in the methodological and software development of that field and get cited heavily. You would have to say, if they had gone another route, if they had just kept the code to themselves, it would have had much less impact, and my contention is that as individuals they would have had less impact in their field.

What are the barriers to the open sharing of data at the moment?

There are three common things. The one that we experience mostly is the dirty laundry syndrome. Whenever you share data, you share the good parts and the bad parts, so there’s always a risk that you’ve made a mistake in there that somebody else will see. I think that’s a false worry: if you’ve done work that’s good enough to be peer-reviewed, the data underneath it should be good enough to be scrutinized! It is also a good mechanism for finding your mistakes, and none of us are perfect.

Another barrier is inertia, and I hope this is something that Scientific Data can really get around. With all the demands that I, as the generic hypothetical scientist, have on my life, making data available to a third party, who may be my perceived competitor, doesn’t come high on my agenda. This is amplified if I get no credit for sharing. However, if you have a forum like Scientific Data, even if it’s something that you’ve not done before, I think you’re much more likely to move down that route. This is human nature, but when data beneficiaries can cite, acknowledge and credit the hard work that went into collecting that data in the first place, the probability of sharing must increase.

The final thing, which I think is more of a genuine concern than any of the others, is that certain individuals that create datasets may not have the technical capacity that others have to exploit those datasets. Somebody spending years collecting data “in the field” might not have the software, time and experience to be able to fully exploit their dataset. That’s one of the only valid arguments that I’ve ever heard against sharing, and it is of course overcome by addressing the core issues of disparity in technical capacity.

Whose responsibility is it to create a more open science?

In my opinion it all comes very squarely down to the researchers. The researchers are the institutions: it is they that provide all the peer review, do all the work, apply for all the grants, and it’s generally they that are sitting on the panels of the funders and evaluating applications. So the primary vehicle for change must be each individual researcher taking responsibility for making what they do widely available to their research community and judging others by this yardstick. The journals, the publications, and the funders all have a responsibility to set the rules of engagement, but all of those researchers, whether they’re sitting on a grant review panel reviewing a particular grant application, or reviewing a particular paper, can take a view as to how important open access is and bring that into their considerations when they’re reviewing other researchers. If they do that collectively, cultures will change very rapidly.

In my field, where much of my funding comes from charitable organisations, either the Wellcome Trust or the Bill and Melinda Gates Foundation, I can’t with any moral justification keep hold of any information that I’m paid to collect on their behalf to influence global health. For me, anybody that’s paid by a charity (or the taxpayer for that matter) to do science has a moral obligation to make the information that they collect and generate available.

Where do you see a product like Scientific Data in the ecosystem?

It is a dynamic time in publishing, so it will be very interesting to see what the demand and the uptake are. I really do hope it’s strong and that Scientific Data is absolutely inundated with publications, so that it is a real validation of this kind of exercise going forward. If there are successes in this area and if people buy into it, data sharing and archiving will grow and become a solid part of the ecosystem. It will be a success when we take sharing data in such a way for granted, as best practice. There are competitors out there, but I’m just glad that more and more diversity is coming into the journal offerings and I hope that they all have a big audience!

See the Data Descriptor by Simon Hay and coauthors:
Messina, J. P. et al. A global compendium of human dengue virus occurrence. Sci. Data 1:140004 (2014).

Interview by David Stuart, a freelance writer based in London, UK


About this blog

Scientific Data is an online-only, peer-reviewed publication for descriptions of scientifically valuable datasets. Follow this blog for news about Scientific Data, as well as commentary from our editors and the diverse set of researchers, funders, and data managers who are supporting us.