TechBlog: C. Titus Brown: Predicting the paper of the future

C. Titus Brown, a bioinformatician at the University of California, Davis, participated in a January workshop at Caltech on “The Paper of the Future,” and wrote about the experience on his blog. Here, he expands on how academic publishing may change in the years to come.


Escape gene name-mangling with ‘Escape Excel’

It’s been nearly a decade since Eric Welsh first noticed some weirdness with Microsoft Excel. A senior staff scientist in the Cancer Informatics Core at the H. Lee Moffitt Cancer Center and Research Institute in Tampa, Florida, Welsh was using Microsoft’s venerable spreadsheet application to view mouse and human gene expression data, the better to sort and understand the numbers. But a quick glance revealed the import hadn’t gone exactly as planned. “Excel would screw them up every time,” he says.

How so? When data are imported into Excel, the program works hard to figure out what kind of value each cell holds. Most of the time, Excel is smart enough to do that correctly, and values like ‘BRCA1’ and ‘12345’ are converted into text and integers, as expected. But “Excel is a little too smart for its own good,” Welsh says. If a cell reads “SEPT7,” the program assumes the author meant to write a date, and converts it automatically. It also sometimes translates what appear to be numbers in scientific notation – say, ‘2310009E13’ – into actual scientific notation (‘2.31E+13’). The problem is, those two terms are neither dates nor numbers – they are proper names, scientifically speaking: gene names, sample identifiers or accession numbers. And when Excel autoconverts them, those names are lost, or at least obscured.
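To make the failure mode concrete, here is a minimal Python sketch that flags identifiers a spreadsheet might reinterpret before they are ever opened in one. It is not how Escape Excel works, and the patterns are illustrative assumptions rather than Excel's actual conversion rules; the function name `conversion_risk` and both regular expressions are hypothetical.

```python
import re

# Month-like prefixes that could be read as the start of a date.
# This list is an assumption for illustration, not Excel's real rule set.
MONTH_PREFIXES = (
    "SEPT", "MARCH", "JAN", "FEB", "MAR", "APR", "MAY", "JUN",
    "JUL", "AUG", "SEP", "OCT", "NOV", "DEC",
)

# A month-like prefix followed by one or two digits, e.g. SEPT7.
DATE_LIKE = re.compile(
    r"^(?:%s)-?\d{1,2}$" % "|".join(MONTH_PREFIXES), re.IGNORECASE
)

# Digits, a single E, then more digits, e.g. 2310009E13.
SCI_LIKE = re.compile(r"^\d+(?:\.\d+)?E[+-]?\d+$", re.IGNORECASE)


def conversion_risk(identifier):
    """Return a label for the kind of autoconversion risk, or None."""
    text = identifier.strip()
    if DATE_LIKE.match(text):
        return "date"
    if SCI_LIKE.match(text):
        return "scientific notation"
    return None


if __name__ == "__main__":
    for name in ["BRCA1", "12345", "SEPT7", "2310009E13"]:
        print(name, "->", conversion_risk(name))
```

A check like this could be run over the identifier column of a gene expression table to list the names worth protecting before the file is imported.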

https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-5-80


How is the rise of data-intensive research changing what it means to be a scientist?

Data-intensive research requires a new breed of scientist: interdisciplinary analysts who enjoy swimming in data, says Atma Ivancevic.

There has always been an emphasis on the generation of novel data in science. Being a scientist involves progressing from observation to hypothesis to experiment to output. In the past, a combination of scarce data to look at and low-throughput machinery to make more led to limited experimental outcomes.


Atma Ivancevic


Big data jobs are out there – are you ready?

Jungwoo Ryoo, Pennsylvania State University

Big data is increasingly becoming part of everyday life. Network security companies use it to improve the accuracy of their intrusion detection services. Dating services use it to help clients find soulmates. It can enhance the efficiency and accuracy of fraud detection, in turn helping protect your personal finances.

“Big data” is a catchall term for any data set of exceedingly large volume. It could be transaction information at a credit card company, invoice data at an online retailer or meteorological measurements from a weather station. All these data sets have unique characteristics that make it extremely difficult to use conventional computing technologies and techniques to store and process them for analysis. Their variety is daunting, and high velocity is required to handle them in a timely manner.

Organizations in any field can use big data to enhance their effectiveness, which is why there are seemingly unlimited career opportunities in big data these days. The big data industry is growing fast, with the market predicted to grow at a compound annual growth rate of 23.1 percent over the 2014-2019 period.

So who is going to store, manage and process all this information? Well, why not you? Companies are starved for people with this kind of expertise. Big data is a growth industry and people from a variety of academic backgrounds can find successful careers in this area.

Get ready, get set….
World Bank Photo Collection, CC BY-NC-ND


#Scidata15: Big data: Challenges create opportunities

The era of big data brings with it a sea of opportunities for development and innovation.

Guest contributor Daniela Quaglia


Image credit: SCIENTIFIC DATA/LUDIC GROUP

Big data is here to stay. As scientists, we stand to benefit by being part of this exciting revolution. At the second Publishing Better Science through Better Data conference, held in London on October 23rd, Dr. Ewan Birney, joint associate director of the European Bioinformatics Institute (EBI), and Dr. Timo Hannay, founder of SchoolDash (a website that provides statistics about schools in England), walked us through some of the opportunities that arise from working with big data.

Opportunities in biology

Birney spoke about how the increase in big data is influencing the way we do biology. He promised to give the audience “an EBI centric view of the world”. I’m glad he did, because every scientist wanting to use big data should understand how EBI can help them.

EBI takes data provided by laboratories and stores, verifies, classifies and shares it. This approach means that a wealth of molecular-biology data, from DNA sequences to full systems (such as biomolecular pathways and metabolomics data), can be found in one place. As most scientists do not want to have to work from shared data in their raw form, the institute also works with the scientific community to convert original data into useful formats. Data from the Human Genome Project provides a compelling example of how such transformations can benefit the community: as Birney pointed out, not even the most experienced researchers want to analyse such complex raw data.

People, publishing, and policy: Q&A with Janet Thornton, director of the European Bioinformatics Institute.

Janet Thornton has been named Dame Commander of the Order of the British Empire. She feels it is an important recognition of bioinformatics.

Image credit: EMBL-EBI

The scientist profiled in the February issue of Nature Methods (the Author File) is Janet Thornton, the director of the European Bioinformatics Institute.

Here, she shares some additional insight about publishing, science policy, and mentoring. What follows is an edited excerpt of her conversation with Nature Methods. Read more here.

VM: In an era of not-so-plentiful funds, ELIXIR (interviewer looks up acronym…), the European life-sciences Infrastructure for biological Information, and other initiatives take you deep into policy-making, which tends to not resemble a picnic on a sunny Nottinghamshire day. What motivates you?

JT: ELIXIR was launched Dec 18 and now has its own director. It does feel a bit that it’s my child. But it’s a child that has grown up and is really on its way to becoming independent and moving forward to being an independent adult. It’s still got a long way to go. It’s a bit like a teenager, actually. (laughs)

I honestly believe that these initiatives are the best way forward because, despite the setbacks, everyone broadly agrees. So it is a case of getting through the politics and making the science happen. As we know, science has no borders—and all scientists agree with this—so in the end, common sense will win and we can go forward.

VM: You have published around 400 papers. What does a paper mean to you?

JT: Probably for me the most important part of the process of science is publishing a paper. Because it’s the time when you really sort out what matters, why you did it, what you discovered and then you try and make it understandable for other people. And I have to say I get really upset when my papers are rejected.

VM: What types of papers do you enjoy reading?

JT: I love reading good solid papers, which are logical and explain how the results are obtained and why they are important. I used to spend hours in the library, like a detective tracking down information and knowledge.

VM: Rumor has it, you still present posters.

JT: I don’t often present posters but there was one particular occasion when the University of Cambridge organized an event and they asked all the senior staff throughout the university to present posters. That was the last sort of official poster presentation. Of course, my students and post-docs have posters all the time. And I do man those posters as appropriate. It’s fun. You talk about your work.

VM: What is the best way for a scientist to select members most suited to his or her lab?

JT: Five things I look for: a) Bright/clever, b) Committed and interested in a project or area of research, c) Relevant expertise – though this is not the most important thing, d) What does the lab think? e) Would I like to have a meeting at 9am on a Monday morning with this person?

VM: Computational resources in the life sciences are not always appreciated. What do you recommend to scientists keen on being and staying tool-builders and resource-providers?

JT: Find a good place to go to follow your dream; find someone you want to work with and prepare yourself for the future. Not all scientists can be principal investigators (PIs), nor indeed want to be, so the key is to find your own niche.

VM: You studied physics at the University of Nottingham, then shifted to biophysics for your PhD at the National Institute for Medical Research. What do you advise when students of any stripe wonder: ‘Shall I choose physics? Computer science? Biology?’

JT: I am afraid I am biased—go with biology—it is amazing, beautiful, complex, but still an open book with lots to discover. And even if this were not enough, it has so many really important applications —many of the so-called grand challenges that will literally affect the future of this planet and everyone on it.

Bioinformatics: what is it and how can it bring prehistory to life?


Ivan Karabaliev joined Eagle Genomics located at the Babraham Research Centre in Cambridge, UK, a bit more than a year ago and has been discovering the essence of bioinformatics. Coming from a business marketing background, Ivan likes to explain the complex world of bioinformatics to new audiences and the general public.

Explained in just one sentence, bioinformatics is the science of managing, analysing, storing and merging biological data (DNA sequences, proteins, etc.) using advanced computing techniques. Put another way, it is the application of computer science and information technologies to solve biological questions. Simple questions include asking what a specific region of given DNA is responsible for, or how closely related one organism is to another by comparing their genomes.
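As a rough illustration of what comparing sequences can mean in practice, the sketch below computes the percent identity between two short, already aligned DNA strings. The sequences are made-up toy data and `percent_identity` is a hypothetical helper; real genome comparisons rely on alignment tools and far longer sequences.

```python
def percent_identity(seq_a, seq_b):
    """Percentage of positions at which two aligned sequences agree."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    # Compare position by position, ignoring letter case.
    matches = sum(a == b for a, b in zip(seq_a.upper(), seq_b.upper()))
    return 100.0 * matches / len(seq_a)


if __name__ == "__main__":
    # Toy sequences invented for this example; they differ at one position.
    print(percent_identity("ACGTACGT", "ACGTTCGT"))
```

The higher the percentage, the more similar the two sequences are at the compared positions, which is one simple proxy for relatedness.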

The genome is the entirety of an organism’s hereditary information; the genetic make-up of all living organisms. It contains the instructions needed for a living organism to grow and function. When we know the sequence of a gene, the role it has in an organism and the diseases caused by malfunctioning copies of the gene, this information can be used to improve life for the organism. This is where bioinformatics comes along, to better interpret and understand genetic messages.

The genomes of organisms, some of which can be several billion DNA base pairs long, can be stored in biological databases. The data stored may include gene function, structure, localization (both cellular and chromosomal), physiological or clinical effects of genetic mutations, as well as similarities of biological sequences and structures.

In 1990 the Human Genome Project was formally given a green light, encouraged by the need to understand and help cure human diseases – the genomic revolution started to take its first steps. The project was led by Dr. Francis Collins, head of the National Human Genome Research Institute. The whole human genome, which is 3 billion base pairs long, was sequenced in 2000. The news was proclaimed by Bill Clinton:

Humankind is on the verge of gaining immense, new power to heal. It will revolutionize the diagnosis, prevention and treatment of most, if not all, human diseases!

You can watch a YouTube video of the announcement here. During the announcement a very important fact was neglected: the sequence was not truly complete, but a mere first draft. About 10 percent of the human genome had not been read.

It wasn’t until 2003 that the human genome’s sequencing was officially completed. Since then, along with the constant improvement of bioinformatics, genetic investigations have enabled the development of new tests and drug targets, and have given fresh insights into the basis of human disease. However, these pioneering investigations have also revealed just how complicated human biology is and how much remains to be understood.

The Human Genome Project is a great example of the application of bioinformatics. The project stores huge amounts of genetic data in a database that maintains human genome sequences. Complex, biologically aware algorithms are run on this database to analyse the massive amount of information and to compare it to other related data. This enables the efficient sequencing and identification of all three billion chemical units in the human genetic instruction set, helping to find the genetic roots of diseases. But this is just one example of how bioinformatics can be used. Below is an overview of some of the other interesting applications of bioinformatics:

• In the Microbial Genome Project, scientists are determining the DNA sequence of C. crescentus, one of the microorganisms used for sewage treatment. Genomes of highly resistant bacteria are sequenced and analyzed to aid the waste treatment industry. Some bacteria can reduce levels of uranium in water. Other bacterial species, such as Geobacter, are capable of breaking down petroleum compounds so polluted waters can be treated.

• Efforts to address climate change can also be aided by bioinformatics. How? Well, the US Department of Energy launched a program to decrease atmospheric carbon dioxide levels. One method of doing so is to study the genomes of microbes that use carbon dioxide as their sole carbon source.

• In the food industry, researchers anticipate that understanding the physiology and genetic make-up of Lactococcus lactis bacteria used in the dairy industry (buttermilk, yogurt, cheese, also used to prepare pickled vegetables, beer, wine and breads) will prove invaluable for food manufacturers as well as the pharmaceutical industry. Similar advances are expected in forensic science where bioinformatics tools are used to compare crime-scene samples to existing databases to see if they are present there or if they are related to other microbes.

• Another, potentially controversial, application of bioinformatics is in defence. Scientists have built the poliovirus using entirely artificial means. They did this using genomic data available on the Internet and materials from a mail-order chemical supply. The research was financed by the US Department of Defence as part of a biowarfare response program to prove to the world the reality of bioweapons. The researchers also hope their work will discourage officials from ever relaxing programs of immunization.

• In agriculture, sequencing of the genomes of plants and animals has enormous benefits for the field. Bioinformatics tools are used to search for potentially useful genes within these genomes and to elucidate their functions. The gathered genetic knowledge could then be used to produce stronger, more drought-, disease- and insect-resistant crops, or to improve the quality of livestock making them healthier, more disease-resistant and more productive.

Future uses of bioinformatics

• Medicine will become more personalised with the development of the field of pharmacogenomics, which is the study of how an individual’s genetic make-up affects the body’s response to drugs. At present, many drugs fail to make it to the market because a small percentage of patients show adverse effects, often due to sequence variants in their DNA.

• Enhancement of gene therapies. Gene therapy is the approach used to treat, cure or even prevent disease by changing the expression of a person’s genes. This field is still in its infancy, but there are many ongoing clinical trials for different types of cancer and other diseases.

• And finally, my favourite example of a potential use of bioinformatics is sequencing dinosaur DNA. Remember Spielberg’s movie Jurassic Park, based on the book by Michael Crichton? Scientist Mark Boguski read the book and decided to do a simple experiment to replicate the movie’s premise of dinosaur DNA having been preserved inside an amber-encased mosquito. He found out that the genetic sequence quoted in the book and movie had nothing to do with dinosaurs, so he wrote a journal article about his findings. Crichton came across this manuscript and approached Boguski to provide him with a real DNA sequence for his second book: The Lost World. (Read the full story here.) This is the actual paper where Boguski wrote his findings:

Conclusion

Bioinformatics isn’t going to replace lab experiments any time soon. For now it is best used to help “focus” and complement scientific research. In most cases, bioinformatics helps to eliminate false positives, saving time and money that would otherwise be spent pursuing false leads. However, with the ever-increasing volumes of data, bioinformatics has become an important part of all genomic research projects and the future is bright. As genomic and molecular research technologies improve, in line with developments in information technology, bioinformatics is becoming a major player in the understanding of biological processes and disease.