Scientific Data

Data Matters: Interview with Kaylene Simpson

Kaylene Simpson is the head of the Victorian Centre for Functional Genomics, an RNAi screening facility at the Peter MacCallum Cancer Centre, Australia.

What is your experience of data practices in your field?

I’ve been performing RNAi screens since the early days, and when we published our screen in 2008, we set up a repository where you could look at the analysed data, images and time-lapse information: it was very custom and specific to us. I’ve always been passionate about the idea that there’s no point doing the screens when you can’t publish the screen data. We need to be able to understand how people got their data. These screens are extremely important, and need to be exposed to everybody so that data can be analysed in different ways and reinterpreted in light of new information. It also means you can see, from the technical side, exactly how a screen gets done.

Data Matters presents a series of interviews with scientists, funders and librarians on topics related to data sharing and standards.

I see people coming in and spending several years doing their screens, and they might finish their PhD with a list of targets and that might be it. Sometimes they’re not getting published because they haven’t performed sufficient mechanistic studies. The post-docs are so passionate about the technical details and getting it just right, and they’re so proud of what they’ve done, but in the end they feel a little deflated because there isn’t an avenue to publish their screen in all its detail.

We published a paper last year and submitted all the data to PubChem. My drive as a leader of a facility is to reach a submission rate of ~90% of users publishing in a public repository. Although we certainly wouldn’t publish every screen in Scientific Data, we’d hope to publish the really unique ones. I like the idea of repositories where you can go and get the raw data and have it all described there for you. We’ve always worked with MIARE [Minimal Information About an RNAi Experiment], and most users of our lab will submit a MIARE-style document with any publication, generally as supplementary information. One of the biggest issues we’ve had to date within the RNAi field is providing publications with sufficient and appropriate details to fully interpret the screen, and even having researchers complete a MIARE document seems a challenge. Scientific Data helps put such issues to rest.

What’s the wider culture of data ownership and sharing in your field?

For us as biological scientists doing our own screening, the real owners of data are just a couple of people: an academic group leader driving the project, one or two scientists who actually run the screens, then maybe down the line a couple of other people who come in to do some specialised biological analysis, and a bioinformatician. So it is still a small team.

There is no doubt that we’ll have a couple of labs that will absolutely not share anything and will be paranoid about putting data out, but I see it as a need for the greater good. There’s only so much they can work on but their argument to me would be they might come back to this in 5 years’ time. My student who has submitted her screen to Scientific Data is really excited by the idea that it could all come to light, and other researchers will be able to get a lot out of it. You become very passionate when you’ve spent such a long time performing a screen, and it’s not just about the end target of the publication, but it’s about sharing a gold mine of data.

“Data should not be buried; we invest a huge amount of time and money and people should not be reinventing the wheel.”

What role can Scientific Data play in the open data ecosystem?

I see Scientific Data as filling an absolutely essential need within the high-throughput fields. It’s long overdue that we have something like this that enables us to share data in a fully open and comprehensive fashion. Data should not be buried; we invest a huge amount of time and money and people should not be reinventing the wheel. There are links to PubChem and screen data, but not the detailed description of how a screen was run. That’s the key thing that we’re losing by forcing people to publish a screen with detailed mechanistic insights into several targets: we’re losing all the information about what has created that screen.

Because Scientific Data is an NPG journal, people will pay attention to it. I expect Scientific Data will be the go-to journal for assigning a fully descriptive resource; I imagine you’re going to have an absolute flood of manuscripts. There are so many screening facilities that will want to be involved in doing this, and I’m only speaking from my perspective of RNAi and compound screening. There’s a whole world of high-content imaging where people really should be showing how they’ve analysed data and depositing images. This raises a new problem: is Scientific Data going to want to publish half a dozen screens from my lab? If a paper doesn’t get into Scientific Data, there’s currently no plan B on such a scale as there is for most other journals.

See the Data Descriptor by Kaylene Simpson and coauthors:
Falkenberg, K. J. et al. Genome-wide functional genomic and transcriptomic analyses for genes regulating sensitivity to vorinostat. Sci. Data 1:140017 (2014).

Interview by David Stuart, a freelance writer based in London, UK
