Scientific Data

Data Matters: interview with Patricia Soranno

Patricia Soranno is a Professor in the Department of Fisheries and Wildlife, Michigan State University, USA.

How important is data reuse in your field?

In the field of ecology, data reuse is increasingly being recognised as important. In the last 2-3 years in particular, there have been more calls to recognise the benefits of reusing data. A lot of problems in ecology and the environment occur across very broad spatial extents, the continent, even the globe, and typically we have not studied our natural systems at that scale. These include climate change, invasive species, and biodiversity, to name a few. One strategy that’s been identified as valuable for such studies is to take finer-scaled studies and mash them together. Consequently, these finer-scaled studies are being recognised as being extremely valuable beyond their initial design.

What are the barriers to data reuse in ecology?

Our field doesn’t have a culture of sharing our data yet, but I’m hoping that improves through time. One of the outcomes of that is that we have not done a great job of creating standards for how we document our data or for the format of the databases that we create; we need standards at both the data level and the metadata level. In ecology we have had a standard for a while now called the Ecological Metadata Language (EML), so I would say anyone sharing their data now in ecology would know to use it, but studies that predate the standard would not necessarily follow it. Also, ecological data can come from many sources. For example, government agencies that collect ecological data may not feel the need to use that standard. I’m also using a lot of data from other fields, as many ecologists do, such as climatology, hydrology, and geology. These fields have their own standards, which further complicates data integration.

There are also a lot of logistical barriers in doing this: it’s really hard to take individual datasets that are very idiosyncratic and put them together. I’m involved in a project doing that right now, and it’s taking far longer than we had projected because of the lack of standards and the lack of effective database management in smaller-scale studies.

Is the discoverability of data a problem for data reuse?

Discoverability is still a problem because only a relatively small proportion of datasets in ecology are currently available in online repositories. For the vast majority of studies we would have to go to individual researchers and ask them for their data, and past research on that strategy, in ecology and in other fields, has found it to be a pretty poor way to get data. You get reactions across the board: no response to the email, a response saying ‘I would love to share but I don’t have time to work up the metadata’, or emails saying ‘I’m not comfortable sharing’ or ‘I’ll share but I want to be a co-author’. The culture of sharing is a really important issue, and I think right now in ecology that culture is not quite there yet. The cultural shift is the most fundamental shift that has to happen, because as individual scientists we have to be willing, and want, to share our data. Every year it gets a little better, but I don’t think we’re quite there yet.

Our culture is that we feel such ownership of the data we collect. People think there’s more they could do with the data, so they just hang on to it and wait to see, or they have some vague plans of what they want to do with the data, rather than saying, ‘ok, I’m going to throw this out to the rest of the research community and there will be people who have ideas for this data that I never imagined, and so actually I will increase my odds of this data having impact if I make it available.’ That latter viewpoint, in my mind, is one of the key reasons we need to share data, because there’s a rich opportunity for data to become useful in ways that one individual researcher could not think of, particularly when you combine it with other data that the researcher does not have access to.

There are strategies out there to address the discoverability problem. We’re starting to get repositories specifically for ecology, and people are starting to use them, so I feel like we’ve got some solutions in place now, but for using and accessing past data we’re still far behind other fields.

How do you think a product like Scientific Data helps with data reuse?

It’s really important. It’s a new way to think about getting our data out there; it gives us the opportunity to look at data in a new way, as a research product in its own right. It’s a way to have data be a recognised product of research that can have impact and that can count as much as the traditional publication. It’s something that researchers can get credit for, which is also a part of our culture that we need to be aware of, and I think we need to try out a variety of different ways to increase sharing and to increase the value that scientists and society as a whole place on data.

Publications that focus on data are going to be key players, in addition to all these other changes that need to happen.

Interview by David Stuart, a freelance writer based in London, UK
