Working towards harmonised peer-review of controlled-access data at human data repositories

Guest post by Viki Hurst, Locum Associate Editor for Scientific Data

Scientific Data is exploring how peer-review mechanisms for sensitive human data can be improved. Here, we outline some of the initial feedback we received from leaders of human data repositories (HDRs), and some innovative alternatives to peer-review. Continue reading

The layered cake of FAIR coordination: how many is too many?

Guest post by Alastair Dunning¹, Susanna-Assunta Sansone², Marta Teperek¹

(authors listed alphabetically by surname)

¹ Delft University of Technology, the Netherlands; ² Oxford e-Research Centre, Department of Engineering Science, University of Oxford, UK

Science is living in the era of data – the reuse of other people’s data can drive new research questions and products, and inspire new scientific discoveries. This was the motivation behind the FAIR Principles (published in 2016), which provide researchers with a framework to improve the quality of their research by making their data more Findable, Accessible, Interoperable and Reusable. To turn these aspirational principles into reality, however, we need to provide researchers with FAIR-enabling tools and services that take the friction out of the (complex) technical machinery of (meta)data standards and identifiers that underpins FAIR. Service providers, librarians, journal publishers and funders, among others, are actively working to deliver the next-generation framework for FAIR data, which is collaborative, interdisciplinary and sustainable. FAIR has also become the (core) mission of a growing number of initiatives – especially in Europe, the USA and Australia – encompassing R&D projects and programmes, institutional, national and global service provision, alliances and societies, and training and educational efforts. In particular, numerous policy makers from around the world have articulated a vision of global open science and embraced FAIR as its driving principles. In Europe, for example, the vision is being realised across all disciplines through the ambitious European Open Science Cloud (EOSC) programme. Continue reading

Author’s Corner: Open data, open review and open dialogue in making social sciences plausible

Guest post by Quan-Hoang Vuong of Centre for Interdisciplinary Social Research, Western University Hanoi, Vietnam

A growing awareness of the lack of reproducibility has undermined society’s trust in, and esteem for, the social sciences. In some cases, well-known results have been fabricated or the underlying data have turned out to have weak technical foundations.

Dr Quan Hoang Vuong

{credit}Quan Hoang Vuong{/credit}

Many researchers have investigated the plausibility of findings in the social sciences and humanities. A typical example is the mysterious Critical Minimum Positivity Ratio 2.9013 by Fredrickson and Losada (2005), which claimed to show that there exists such a positivity ratio and that “an individual’s degree of flourishing could be predicted by that person’s ratio of positive to negative emotions over time”. This ratio had once been a well-known, highly influential and greatly admired psychological “constant” until it was shown by Brown, Sokal and Friedman (2013) to be an unfounded, arbitrary and meaningless number.

To address the plausibility problem, I suggest that a combination of open data, open peer-review and open community dialogue could serve as a feasible option for the social sciences.

Continue reading

Author’s Corner: Revisiting the personalities of wild chimpanzees

Alexander Weiss

Guest post by Alexander Weiss of the University of Edinburgh, United Kingdom

Early on in her behavioural observations of the chimpanzees at what is now known as Gombe National Park, Jane Goodall was struck by their personalities, which were as distinct as our own¹. However, upon sharing her observations with a ‘respected ethologist’, she was told that, yes, animals differed in their behaviour, but that this was best ‘swept under the carpet’ (pp 11-12)². Continue reading

Author’s corner: Providing incentives and ensuring quality in citizen science

Guest post by Steffen Fritz, Linda See & Ian McCallum of the International Institute for Applied Systems Analysis, Laxenburg, Austria

{credit}Steffen Fritz, Linda See & Ian McCallum{/credit}

Citizen science, the collection or analysis of research data by the general public, has existed in one form or another for centuries, with contributions ranging from plant and animal observations to weather phenomena¹. In the field of land cover and land use, however, its application is relatively new². Previously this was a task left largely to governments, research institutes and global bodies. With the recent availability of high-resolution satellite imagery, this has changed, opening up new possibilities for citizen participation³. In our recent article in Nature Research’s Scientific Data⁴, we have made available a global dataset of crowdsourced land cover and land use reference data, containing the results of our first four citizen-science campaigns. Continue reading

Research data policy going back to basics in Barcelona

This blog was written by Iain Hrynaszkiewicz, Head of Data Publishing.

This week the Springer Nature research data team are exploring how our research data policy initiative can help facilitate wider adoption of clear, consistent policies for publishing research data.

We are attending the Research Data Alliance (RDA) 9th Plenary meeting in Barcelona where, amongst other things, we are chairing the inaugural Interest Group meeting on standardisation of policy for publishing research data.

At a hastily organised unofficial meeting at the RDA 8th Plenary, it became clear that there is a lot of interest in addressing the problems researchers face in understanding and complying with data policies. We are now – officially – working via the RDA to do this. Continue reading

An open approach to Huntington’s disease research

Guest post by Rachel Harding, postdoctoral fellow at the Structural Genomics Consortium, University of Toronto, Canada

Rachel Harding

{credit}Rachel Harding{/credit}

Huntington’s disease (HD) is a fatal neurodegenerative disorder caused by a mutation in the huntingtin gene¹. The progressive breakdown of brain neuronal cells in HD patients leads to deteriorating mental and physical abilities over a 10-20 year period prior to death, with symptoms often described as having Parkinson’s disease, Alzheimer’s disease and amyotrophic lateral sclerosis (ALS) simultaneously². At the start of the huntingtin gene there is a CAG trinucleotide repeat region that encodes a stretch of poly-glutamine residues in the amino-terminus of the encoded protein. This repeat tract is expanded in HD patients, and its length correlates with the age of symptom onset³. HD affects approximately 1 in 10,000 of the population⁴. Rare juvenile forms of the disease exist in patients with the longest CAG expansions, although adult-onset HD patients typically have between 40 and 50 CAG repeats, with symptom onset between the ages of 35 and 50. Continue reading

Author’s corner: A testbed for reproducible and standardized human MRI connectomics

Guest post by Xi-Nian Zuo, Project Coordinator and Co-Founder of Consortium for Reliability and Reproducibility (CoRR), Professor of Psychology and Director of the Magnetic Resonance Imaging Research Center in the Institute of Psychology at Chinese Academy of Sciences, China.

XI-NIAN ZUO

{credit}Xi-Nian Zuo{/credit}

About a decade ago (2006), as a PhD student graduating from the School of Mathematics at Beijing Normal University, I stepped into the field of neuroimaging of the human brain by way of a short job interview offered by Dr. Yu-Feng Zang, my postdoc mentor in China. The most important thing that I learned and developed during my postdoc training was how to question a study, probably a reflection of my somewhat different background (mathematics versus brain sciences). Probability and statistics became my major tools in bridging new learning experiences with my existing knowledge, pushing me to further pursue research training offered by Dr. Michael Peter Milham at New York University. Ongoing work in his laboratory really interested me, particularly a study of the test-retest reliability of resting-state functional connectivity¹, the first study of test-retest reliability in the nascent field of functional connectivity. However, that study, and a series of test-retest reliability studies I subsequently carried out², shared an obvious limitation: the small sample size. This directly motivated me to seek and build up a truly big data set for test-retest reliability in connectomics. Continue reading

Author’s corner: Sharing proteomics data to build community-based resources

Ruedi Aebersold & George Rosenberger photo

{credit}Ruedi Aebersold & George Rosenberger{/credit}

Guest post by Ruedi Aebersold, Professor of Systems Biology with a joint appointment at ETH Zurich and the University of Zurich, & George Rosenberger, PhD student in the Aebersold group at the Institute of Molecular Systems Biology, ETH Zurich.

Mass spectrometry-based proteomics is a data-intense research discipline that primarily aims at identifying and quantifying the proteins that constitute the proteome¹. This is achieved by generating large numbers (10⁴ to 10⁶) of fragment ion spectra that represent peptides generated by proteolysis of the respective proteome. Mass spectrometers can operate in different data acquisition modes, referred to as data-dependent acquisition (DDA), targeted acquisition exemplified by selected reaction monitoring (SRM) or data-independent acquisition (DIA)² exemplified by SWATH-MS³,⁴. Specific software tools then generate processed mass spectra from these raw data, from which sets of identified peptides, proteins and their abundance are inferred and annotated with metadata. Both the generation and the processing of such raw data sets are resource and time intensive. Further, if unique, irreplaceable samples are being analyzed, as is often the case with clinical cohorts, the data cannot be re-generated. Therefore, the proteomics community has started to embrace data sharing by means of different specialized public repositories, for example GPMDB⁵, PRIDE⁶, PeptideAtlas⁷ or ProteomicsDB⁸. For the last few years, the ProteomeXchange⁹ consortium has provided centralized deposition of raw data and their meta-annotation. Continue reading

Author’s Corner: Advancing the sharing and standardization of metabolomics data

Mark Viant photo

{credit}Mark Viant{/credit}

Guest post by Mark Viant, Professor of Metabolomics in the School of Biosciences at the University of Birmingham, UK, and Director of both the national NERC Biomolecular Analysis Facility – Metabolomics and the Phenome Centre Birmingham

In 2014, my research team published the first Scientific Data Data Descriptor for metabolomics measurements, “Direct infusion mass spectrometry metabolomics dataset: a benchmark for data processing and quality control”. This article described in great detail the many steps that are critical for ensuring the production of high-quality direct infusion mass spectrometry (DIMS) data. It was our intention that this publication would help to establish the benchmark for DIMS metabolomics, derived using best-practice workflows and rigorous quality assessment. The data were also made freely available in the MetaboLights public database for metabolomics data (dataset MTBLS79).¹

Continue reading