The Human Genome Project led to a paradigm shift in the way science is conducted and data is shared, says Rehma Chandaria.
Guest contributor Rehma Chandaria
In 1996, an international group of scientists came together in Bermuda to discuss how sequence data from the Human Genome Project (HGP) should be released. The meeting concluded with the formulation of the ‘Bermuda Principles’, a set of rules ensuring the data would be immediately shared on publicly accessible databases as it was generated. This ground-breaking accord broke with the conventional practice of releasing data only after publication in scientific journals. It changed the way we see data sharing and, ultimately, the way scientific research is conducted.
The success of this approach demonstrated how a global community of scientists could collectively produce and use data far more efficiently than any individual could. This greatly benefited scientific progress and led to many important new insights and discoveries. For example, information on 30 genes associated with disease was published before the draft sequence appeared in 2001.
Because data sharing can accelerate progress, there is now an enormous push for all scientists to make raw data publicly available for others to analyse and use. It is becoming increasingly common for journals and funding bodies to insist that data is shared openly as a prerequisite for publication or for receiving grants.
The tools and infrastructure to support this are improving all the time. Shared data are only valuable if they are searchable and usable. Numerous data repositories for different types of data have been set up. General data repositories, such as Dryad and Figshare, can be used for any data type. The addition of digital object identifiers (DOIs) to resources in these public repositories can also make searching for specific entries easier. Furthermore, a DOI means that data can be cited in a way that properly credits the producers of the data. The majority of journals and publishers, for example F1000 Research, welcome research articles reporting analysis and conclusions based on previously published datasets with a DOI. Ensuring that the scientists who produce and share their data are rewarded with citations is crucial for encouraging this practice.
The HGP took 13 years and US$3 billion to sequence the first human genome. It is now possible to sequence a human genome in a matter of days for only US$1,000. This rise of ‘big data’ (the rapid collection of large volumes of information) requires researchers from different backgrounds and specialities to work together. There are now interdisciplinary research centres like Cambridge Big Data, which bring together laboratory-based life science researchers, computer scientists and mathematicians. This combination of expertise is essential for processing, analysing, storing and utilising vast quantities of data.
Gone are the days when important scientific discoveries were made by one scientist alone in a laboratory, making scribbles in a notebook. The advent of new technologies that produce reams of data every minute means that it is almost impossible for scientists to fully analyse and interpret their results in isolation. Working together with others in your field as well as experts in other disciplines is vital in these times of data-intensive research. The world has become a smaller place in the internet era, meaning that it is now possible to quickly and easily share large amounts of data that can be accessed by researchers across the globe. This can only be a good thing for science – sharing knowledge is critical for progress.
Rehma Chandaria is a winner of the 2015 Scientific Data writing competition. She is also a PhD student at the University of Nottingham researching tissue engineering of colon epithelium. She has just finished an internship at the Fund for the Replacement of Animals in Medical Experiments (FRAME) and enjoys various activities away from the bench such as contributing to HEARTblog. When not working, Rehma likes experimenting in the kitchen as well as watching and umpiring cricket matches.