Scientific Data

Data Matters: Interview with Gavin Simpson

Gavin Simpson is a Quantitative Environmental Scientist at the Institute of Environmental Change and Society, University of Regina, Canada.

How open has your field traditionally been in the sharing of data?

As far as ecology goes in general, I don’t think we’ve been very good at sharing data; we haven’t been very open in the way that we share data. There are a lot of people who collaborate on datasets and do things in private, but little in the open. There are a few big exceptions, things like the Breeding Bird Survey and a few other larger datasets, but they’re the exceptions rather than the rule. It’s changing; there’s been a movement towards making more and more data freely and openly available, but I think it’s still very much lagging behind open access in preparing journal articles. It’s growing, but it’s still very small and I think we’ve got a long way to go to move the community towards seeing this as something that people just do with their data, as opposed to being something that a few people do while most people have worries about giving away their data. We’d like all data to be available to some extent, and it’s just a case of how we get there: how we move to a situation where all data is open and secured for future use, for future generations, and where people can actually start to make greater use of it.

Data Matters presents a series of interviews with scientists, funders and librarians on topics related to data sharing and standards.

We had the same problem with open access to some extent, but data are a bit closer to people’s hearts. If you’ve toiled in the field for years to collect data then you’re not going to be very easily convinced to make the data available. Often when people suggest that researchers make their data openly available there’s a bit of pushback, and often it’s because people think that those advocating open data are trying to force a particular way of working and are ignoring the particular concerns of the community.

What do you think the barriers are to open sharing of data?

It’s not part of our culture. There are some fields where sharing data is much more prevalent; think of genomics and high-throughput microbiological data. To some extent that comes down to the fact that we don’t really have collaborative datasets, databases or repositories where people can feel as though they’re contributing to some greater good by putting their data in there. That’s not to say we don’t have these more general databases, things like Pangaea and The Paleobiology Database, but we also have a lot of these very specific datasets for one particular community. If we had more ecologically focussed but broader repositories that allowed users to go in and search and extract those data rather than being just generic data silos, then that might help the culture of open data.

The reward is also important. The reward at the moment is getting recognition for your data. The argument that several people have made to me is that this doesn’t really help them. They’re operating on small research budgets out of small schools; they don’t really need extra citations, what they need is to be able to do more research. So the collaborative side of things is important to them, or there are still things they want to do with the dataset that would lead to a paper a few years down the line, and the threat of being scooped would really undermine their ability to generate publications, and that affects their job prospects.

Who’s responsible for changing to a more open data culture?

I see the open data movement as having two main aims. One is the openness of data and being able to have a system which scales well, but there’s also securing and archiving data. So even if you don’t necessarily go down the route of being open and putting your data in an open repository, just getting the metadata in the correct format and having your data in an institutional repository is a key thing that many researchers don’t do, because they either don’t know how to do it, or they don’t have time, or they envisage it’s going to be costly. The people that fund our data are going to start expecting much more of that from us, and then it’s only a small step to them expecting us to make these data freely available, and if the public are paying for this stuff, then other researchers and the public ought to be able to get at those data and make use of them.

Journals also have a responsibility. PLOS, for example, have just announced that from now on they are going to expect that data are archived and freely available for papers that are published in PLOS One. So that has an effect, although it might actually put people off publishing in journals like PLOS One because they don’t want to give up their data. It might have taken researchers 20 years to get these datasets together, and they would have real issues about just making those data instantly available, just for one paper in PLOS One.

As individuals and as a community we need to make sharing data normal and allay people’s fears that they are going to get scooped, or if they do get scooped then the people that do it are treated as pariahs. At the moment tenure applications or job interviews give very little regard to these extra things that advance science. We need to move towards recognising these additional things, not just sharing data, but open access and open code. Everybody in the community has a responsibility to do that, but it’s not just something we can impose on people through journals and through our funders.

So how do you think a product like Scientific Data can contribute to creating more open science?

It forces people to do good things with their data. It forces people to record how the data were collected, the appropriate metadata, and get the data into some formal repository that will secure the data forever.

It will also help by providing credit for sharing the data. In the main, getting a citation, having a tangible product that can be cited, or that people view as something that can be cited, is probably the main benefit that a journal like Scientific Data can provide. There is a proportion of people within the ecological community who would feel better about making their data freely available if they had a paper behind it that people could cite and if they could at least see that they were getting the credit that way.

Interview by David Stuart, a freelance writer based in London, UK

