Scientific Data

Data Matters: Interview with Maryann Martone

Maryann Martone is Professor of Neuroscience in the Department of Neuroscience, University of California San Diego, USA.

What are the data sharing practices in your field at the moment?

The routine sharing of imaging data in neuroscience is still very nascent. We ran a database called the Cell Centered Database, which is still up and has merged with the American Society for Cell Biology’s Cell Image Library, and while we get a few unsolicited contributions from the neuroscience community, most of the time if we want data we have to explicitly ask for it, or it simply doesn’t exist. There are some neuroscientists who share their images by just putting them on web pages, and others who are starting to share through www.openconnectomeproject.org, but I would say that in the imaging field it’s still very ad hoc.

Data Matters presents a series of interviews with scientists, funders and librarians on topics related to data sharing and standards.

There’s a certain group of people who are just dead set against it and don’t think it would be useful, and there’s another group that does this as a matter of practice. Most of the unsolicited data that gets shared comes from people with backgrounds in the genomics or structural biology communities, where it’s simply expected that you’re going to deposit your structures and your sequences, so they don’t see it as a big deal. There is also a large middle group that thinks it’s a good idea and would like to do it, but either isn’t aware of the resources available for sharing data or feels that the effort it would take is more than they can afford to give. It takes a while to assemble all of the metadata and all of the information that you need.

“There’s a certain group of people who are just dead set against it and don’t think it would be useful.”

Is it specifically with metadata that researchers lack skills?

I think there was a lot of hope that if you had detailed enough input forms or a guided process, people would be able to supply these things on their own. There are some things that they most certainly can supply, but there are others that they are just not qualified to do. As I’ve learned, having come out of neuroscience: structuring data is not a natural human process. Researchers aren’t thinking about what is required for somebody other than themselves to understand it. At some point we’d like to believe that a lot of the basic information will come out of electronic laboratory notebooks and machines, and we’re certainly seeing trends towards that, but there’s still that polish and additional round of review that curators provide, which I haven’t seen researchers generally be able to do very well.

How big a problem is discoverability?

The Royal Society came up with a number of rules: if you want to have data reused, the number one rule is that first you have to find it, then you have to be able to access it, and then you have to be able to understand it. At the very minimum it has to be available through some networked search, whether that comes via a data paper or a web search. Just putting it up on a webpage and letting Google discover it doesn’t mean that anyone will be able to find it among the other billions of things that are out there, or understand the way you describe it on a webpage.

“Structuring data is not a natural human process.”

We should encourage people to make sure, at a minimum, that the metadata is discoverable by Google, because that puts you way up there. But people should also use terminology and semantic links that will ensure readers get funnelled to them appropriately. It’s not something that researchers ever think about. We find simple problems, such as a webpage banner that is an image, which of course a search engine can’t interpret, so we encourage people to make sure that what they have can be indexed. Even people in informatics, people who create databases, think that everyone is going to stay inside their databases, and that as long as everything is internally consistent within their database they have satisfied discoverability. But, as we know, people first need to know that the database itself exists, and we’ve been cataloguing databases for almost seven years now, and there are still ones we find that we didn’t know existed.

These topics are coming up more and more often at various venues, mostly in informatics. Researchers are starting to understand that there is such an emphasis on data that the management and sharing of data is going to be something they have to pay attention to.

What role do you see for the traditional publisher in the discoverability of data?

Data journals are an important avenue to explore because they fit into the current reward system of academia, which is paper based. That avenue is going to be important in getting researchers to start thinking of data as a primary product of research. There is still a need to pay attention to the stewardship and long-term preservation of data, and the long-term sustainability of those resources remains an issue; I don’t think that publishers can rely on those resources without also supplying some support to help store and manage the data.

What I like about the data paper is that, with NLP (Natural Language Processing) tools getting better at extracting information, it allows authors to supply the metadata and the key pieces required to understand what the data are in a form that is very natural to them. They know how to write papers, they know how to write methods sections, they know how to write those sorts of descriptors, so when we learn how to interpret all that text and are able to extract the structured information, I think it’s going to be a very powerful marriage. Right now it’s too early to know whether it will work, but first you’ve got to get the information somewhere, and if you have the information somewhere then it’s possible at some point to figure out a way to extract it. It’s an interesting model and one I can’t imagine would not yield some benefits.

Interview by David Stuart, a freelance writer based in London, UK
