Ask not what you can do for open data; ask what open data can do for you

Mathias Astell, marketing manager for Scientific Data and Scientific Reports, outlines the benefits of open research data and provides some tips and tools researchers can use to make their data more open.

It has been shown that research articles receive more citations when their underlying data are openly linked to them. With this in mind, it’s time to consider not just the ideological reasons for making research data open, but also the selfish benefits of openly sharing data that all researchers can (and should) be taking advantage of.

This infographic can be downloaded under a CC BY licence.

As an increasing number of funders mandate data sharing, and publishers implement more consistent data policies at their journals, it is worth seriously considering how and why you should make the research data you generate more openly available.

Promoting open science from a pub: the Panton Principles

Follow the Panton Principles to ensure your data is licensed and accessible for immediate reuse, says Atma Ivancevic.

In a world where scientific discovery is driven by impact factor and funding, the idea of open data may seem idealistic. But the open data movement has been growing since the early 2000s, spurred by the rise of big data and computational capabilities. For the sake of reproducibility in science, we need to encourage data sharing after publication.

Founders of the Panton Principles at the Panton Arms, Cambridge, UK.
Copyright Panton Principles Authors (CC BY 3.0).

#scidata16: Open data should be easy

There’ll always be reasons not to share data. It’s time we stop making excuses and start making plans, says Atma Ivancevic.

On the morning of October 26, 2016, a group of scientists convened in London to discuss the state of open data. The third Publishing Better Science through Better Data conference kicked off with morning tea, international introductions, and furious scribing from @roystoncartoons. The premise was simple: “Today is all about being open”, said conference chair Iain Hrynaszkiewicz. We settled in to learn the advantages of data sharing at both the individual level and for the scientific community at large.

“Open data should be easy,” said Dr Jenny Molloy from the University of Cambridge as she explained the importance of building a data management plan. She pulled up a poster of a missing black backpack: “CASH REWARD” it read, “contains 5 years of research data which are crucial for my PhD thesis!” I laughed along with everyone else, internally reflecting on how similar my life had been before I discovered version control.

Think you don’t need a research data management plan?

Opening doors to open data at #scidata16

Want to embrace open data but don’t know where to start? The tools are out there, says Matthew Edmonds.

The Publishing Better Science through Better Data conference, or #scidata16 for short, took place at the Wellcome Collection in London at the end of October. This one-day event, organised by the journal Scientific Data, Springer Nature and the Wellcome Trust, explored the challenges facing early-career researchers as we enter the era of open data.

As a data novice, I arrived without really knowing what to expect. The types of experiments I perform generate only small datasets needing a simple statistical test, easily summarised in a graph in the manuscript. The original data can be safely left to gather dust on a shared drive.

Data sharing: Contribute to the community

Data sharing can make a significant contribution to the scientific community, but it comes with challenges, says Caroline Weight.

Guest contributor Caroline Weight

We have all heard of it. We are all worried about it. We hear whispers of it in the corridors. We are advised to be careful what we say to ‘others’. We constantly check the literature. It matters to us. After all, it is our careers on the line.

‘Scooped’.

The process of publication is rigorous, competitive and tricky. It’s not uncommon for five years to pass between writing the grant application and publishing the work. Big labs with state-of-the-art facilities stand a better chance of getting their work out there first, given the extra manpower and often more-established protocols. This race for ownership of the data makes it difficult to share information and present new findings at meetings or conferences. Even at manuscript submission, there is often an option to exclude particular referees because of conflicts of interest or direct competition, so that novel concepts and data stay private until they have been made public. Not until the publication has been accepted and is in print can you heave a sigh of relief and move on to the next project. Yet data sharing is essential to the progression of science in the modern world.

Focus on TCGA Pan-Cancer Analysis

Nature Genetics is pleased to present today the first installment of our Focus on TCGA Pan-Cancer Analysis.

The Cancer Genome Atlas (TCGA) has analyzed over 8,000 cancer cases across 27 tumor types to date, and aims to have over 100,000 specimens analyzed by the end of 2015. TCGA has commendably made both data and exploration tools publicly available at https://www.cancergenome.nih.gov, and has previously published 8 papers reporting in-depth genomic characterization of individual tumor types.

The TCGA Pan-Cancer initiative, launched in October 2012 at a meeting in Santa Cruz, California, seeks to combine analysis across tumor types in order to identify both similarities and differences in genomic alterations. The work presented in this collection of Pan-Cancer publications includes analysis of the first 12 TCGA tumor types. This includes over 3,000 cancer patients profiled with 6 different platforms to assess genomic, transcriptional, epigenetic and proteomic alterations, combined with clinical data. The authors demonstrate that, while most tumor samples show unique genomic alterations, combining the analysis across tumor types both increases statistical power for the detection of molecular drivers and identifies common pathways that are altered across tumor types.

The Pan-Cancer initiative provides a model for large-scale collaborative analysis as well as data sharing, bringing together over 250 collaborators from ~30 institutions on over 60 projects analyzing the same dataset. These efforts required a strong collaborative framework, a commitment to rapid distribution of data, and the means to facilitate shared analysis. Josh Stuart and colleagues provide an overview of this project in an accompanying Commentary.

This work also relied on the development of new bioinformatics tools and platforms, providing a foundation that should prove useful in future large-scale analysis projects. A Commentary by Larsson Omberg and colleagues highlights these approaches and the use of the Synapse software platform to share and evolve data, analysis and results among the Pan-Cancer Working Group. The Synapse platform was developed by Sage Bionetworks to facilitate open, data-driven collaborative research, and is also widely used in DREAM challenges. The use of this platform supported the discovery efforts reported in this collection of Pan-Cancer papers, which also provide a public resource of highly curated and standardized data sets across a series of data freezes, along with automated analysis systems.

In the first of two Analysis papers published today in Nature Genetics, Chris Sander and colleagues provide a hierarchical classification of 3,299 tumors from 12 cancer types from the Pan-Cancer dataset, using a newly developed algorithmic approach. Their analysis separates tumors into those with primarily somatic mutations and those with primarily copy number alterations. They also identify oncogenic signatures that characterize ~30 tumor subclasses, which may suggest therapeutic targets of relevance across tumor types.

In a second Analysis published in Nature Genetics, Rameen Beroukhim and colleagues characterized somatic copy number alterations (SCNAs) in 4,934 primary cancer specimens across 11 cancer types from the Pan-Cancer dataset. They observed whole-genome doubling in 37% of cancers, associated with higher rates of all other types of SCNA.

We are pleased to support the TCGA Pan-Cancer efforts as a model for large-scale collaborative genomics projects combined with open data sharing, and as a demonstration of the ready benefits this can bring to our understanding of the molecular drivers of cancer. The TCGA Pan-Cancer project continues to develop, and so will this Focus, so please get primed with this selection of publications and stay tuned. In the meantime, here is a selection of social media and press stories: https://storify.com/obahcall/nature-genetics-pan-cancer-focus.