Take a lesson on open data from scientific megaprojects, says Sara Osman.
Today, the Protein Data Bank houses over 130,000 structures of proteins and nucleic acids — the molecules that constitute life. In this carefully curated and indexed database, users can browse entries and examine the three-dimensional arrangement of the atoms that make up life’s molecular machines. The shapes can help scientists from various fields to make informed guesses about their functions, formulate hypotheses and research questions, and explain and connect observations.
The bank represents generations of work from structural biologists over more than 70 years. Biologists have determined the structure of molecule after molecule, each new structure a brushstroke in the larger painting of how cells work.
The ultimate aim is nothing less than a complete visual understanding of life. Endeavors of such proportions hinge on international collaboration and open data sharing, with hundreds of research groups working together to contribute to the bank.
For years, the methods of structural biology required scientists to take a molecule out of the cell in order to study it. That has changed in the last five years with a technique called cryo-electron tomography (“cryo” meaning cold).
The technique — which involves freezing a whole cell, slicing it, and investigating individual slices with electron beams — means we are now able to see molecules in their native environment inside cells: snapshots of tiny molecular machines going about their daily business at unprecedented resolution.
Only, “we” are not, because the images are not made publicly available. And why would they be? Obtaining these images, known as “tomograms”, takes work and time and precious grant dollars. Scientists with the know-how and expertise are sought after, and the data is still a goldmine of new information that can be analyzed and published, giving its owners a hefty competitive advantage. It comes as no surprise that laboratories may be reluctant to share their data at the outset.
Sharing data still lies well outside the traditional incentive structure in academic research. All too often, the interests of individuals in terms of recognition, funding security and career growth square off against the interests of overall scientific advancement. The result is that impending breakthroughs in our knowledge of cells are right now lying idle on the magnetic disk stacks of a local server somewhere. Although no single laboratory has the capacity to exhaustively analyze these images, getting credit for being first still holds more weight than discovery itself.
Frameworks for data sharing in various fields have been put in place in the past, and huge collaborative projects have consistently demonstrated astounding success. So why is data sharing not the norm when a new development arises?
The responsibility lies as much with publishers and funding agencies as it does with scientists. Publishers have the leverage to require the submission of data and metadata alongside results. They could also set up a system in which scientists are credited whenever their data is used. This way, publishing data could become inherently rewarding for scientists themselves.
However, for open data to become the norm, a suitable platform (like the Protein Data Bank) is needed, which requires curation and manpower. Funding agencies aiming to promote transparency in science can help by allocating money to build and maintain these kinds of databases.
Moreover, funding agencies need to stop treating publication record as the sole indicator of success when evaluating scientists’ careers. This will help to gradually reduce the tremendous pressure that drives scientists away from sharing data for fear of being “scooped”.
Science is a fundamentally collaborative enterprise, and opening data to others for both discovery and scrutiny is crucial. Systematizing the publishing of data will help scientists worry less about their careers, and more about the collective pursuit of truth – advancing science for everyone.
Sara Osman is a PhD student doing basic science research in molecular and structural biology at the Max Planck Institute for Biophysical Chemistry in Germany. Originally from Egypt, she became interested in science as a teenager through reading popular science magazines, fascinated by exciting new ideas from quantum theory and relativity to the evolution of life.