Computational biology is one of the fastest-growing areas of biomedical research, but many biology students and researchers may not have the time for, or access to, formal training. Geoffrey Siwo shares some tips on how to follow a less formal path.
When I started my undergraduate degree in biomedical science and technology 18 years ago, a student owning a personal computer would have been the talk of campus, even if that student was studying computer science.
At the time, internet access in many parts of Kenya (and Africa at large) was still limited, with only about 200,000 of Kenya's 30m inhabitants online.
Here are some examples, based on my personal experience, of how a biologist can keep pace with computational biology.
Learn by solving a problem that matters to you
Computational biology is changing rapidly and covers a wide area of knowledge, from nucleic acid and protein sequence analysis to statistics, machine learning and data mining. Knowing where to start is a challenge because there are so many topics to cover.
If you are a full-time biology student or researcher, you may not have enough time to learn several areas of computational biology, and you may end up not using much of what you learn.
One practical way is to start with a biological question that really interests you. Then begin to think how to solve it using a computer instead of an experiment.
If this proves too hard, try breaking down your biological question into smaller, more manageable units. This is exactly how I first got into bioinformatics and computational biology.
In 2002 I undertook an internship at the Institute of Primate Research (IPR) in Nairobi. I worked with Professor Jason Mwenda, who was studying the role of a family of retrovirus-like elements in the human genome during pregnancy.
Referred to as human endogenous retroviruses (HERVs), they occupy about 10% of the human genome and are thought to be remnants of retroviral invasions of the human germ line.
At that time I had never heard of these retroviral elements, but I was intrigued by the possible interactions between them and HIV. I was particularly curious whether viruses like HIV could recombine with HERVs in infected cells, leading to HIV strains with resistance to drugs or other new biological properties.
Testing this idea in the lab at IPR would have required more resources than we had. But the idea of testing this hypothesis on a computer by analysing publicly available DNA sequences intrigued me, and I was able to do so at an internet café on campus at Egerton University, about two hours from Nairobi, where I did my undergraduate degree.
At the time my online activity was mostly limited to checking emails. The internet café charged about US$1 per hour.
There were no formal courses on computational analysis of biological data, so I went to the café almost daily. It is there that I first learned how to analyze biological data sets with computers, laying a foundation for my research in the area without any formal computational biology training.
I first had to teach myself how to use tools like the NIH’s Basic Local Alignment Search Tool (BLAST) to find similar nucleic acid or amino acid sequences of interest, CLUSTALW to align multiple sequences so as to see how similar they are, and literature databases such as PubMed to read peer-reviewed publications.
Collectively, these tools enabled me to learn more about HERVs, HIV and bioinformatics, with the goal of determining similarities between drug resistance mutations in HIV reverse transcriptase and natural genetic variants in HERVs.
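As a rough illustration of what "seeing how similar they are" means once sequences have been aligned, here is a minimal Python sketch that computes percent identity between two pre-aligned sequences. It is not how BLAST or CLUSTALW work internally; the function name `percent_identity` and the toy sequences are hypothetical, made up for this example, and real tools use far more sophisticated scoring.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percentage of aligned columns in which both sequences carry the same residue.

    Assumes the two sequences are already aligned (same length, gaps as '-').
    Columns where both sequences have a gap are skipped.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")

    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == gap and y == gap:
            continue  # a gap-only column carries no information
        compared += 1
        if x == y:
            matches += 1

    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


# Made-up toy fragments for illustration only; these are NOT real
# HIV or HERV reverse transcriptase sequences.
toy_a = "MKV-LIDGTY"
toy_b = "MKVALLDG-Y"
print(f"{percent_identity(toy_a, toy_b):.1f}% identity")
```

In practice the aligned strings would come from an alignment tool's output rather than being typed by hand.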
Share your ideas, interact with crowds, don’t be afraid of being wrong!
When you learn a new skill on your own, there’s a good chance you’ll make mistakes. Discussing your work with other biologists can help you catch them.
While learning how to analyze DNA sequences, I got some interesting results comparing HERV and HIV reverse transcriptases. I shared these results with fellow undergraduates at seminars where I talked about HERVs and DNA sequence analysis.
Sharing my experience with others led us to organize a series of bioinformatics workshops to train other students, and enhance our own training in the process. At least four of my colleagues from these workshops now work full-time on computational biology projects.
Many online resources allow users to ask questions about the computational problem they are working on. Biostars, SEQanswers and Bioconductor, for instance, all offer forums where beginners and those with more experience can ask and answer questions.
Be sure to read their forums or FAQs first: you may find that other people have already asked the question you had in mind.
Another option is to contribute to an open-source programming project in your field. Or participate in one of the growing number of open-innovation or crowd-sourcing challenges, in which participants are invited to use biological data to make specific predictions that are then validated using withheld datasets.
Such challenges are powerful because they provide a means of testing your skills using real-world data, and many are posted on sites such as InnoCentive, Kaggle and DREAM (Dialogue for Reverse Engineering Assessments and Methods).
Some have cash rewards and others offer the chance of joining a vibrant community of computational biologists, or even of publishing your findings.
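To make the idea of validating predictions against withheld data concrete, here is a minimal Python sketch of how a submission might be scored. It assumes the score is a Pearson correlation between predicted and withheld values, which is only one possible metric; each challenge defines its own scoring rules. The function names, gene identifiers and numbers are all hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists of numbers."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant values")
    return cov / math.sqrt(var_x * var_y)


def score_submission(predictions: dict, withheld: dict) -> float:
    """Compare a participant's predictions with values the organizers held back.

    Both arguments map an item identifier to a numeric value. Every withheld
    item must have a prediction; extra predictions are ignored.
    """
    ids = sorted(withheld)
    missing = [i for i in ids if i not in predictions]
    if missing:
        raise ValueError(f"no prediction for: {missing}")
    return pearson([predictions[i] for i in ids], [withheld[i] for i in ids])


# Made-up values for illustration; not data from any real challenge.
toy_withheld = {"g1": 1.0, "g2": 2.0, "g3": 3.0, "g4": 4.0}
toy_predictions = {"g1": 2.0, "g2": 4.0, "g3": 6.0, "g4": 8.0}
print(f"score: {score_submission(toy_predictions, toy_withheld):.3f}")
```

Because participants never see the withheld values, a good score suggests the method generalizes rather than having memorized the training data.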
A few years ago I teamed up with friends at the University of Notre Dame in South Bend, Indiana, and participated in a DREAM challenge for predicting gene expression activity from DNA sequence.
Working on this challenge enabled us to learn more about gene regulation and machine learning. Later on, as part of another team, I entered a challenge on InnoCentive for proposing new ways of estimating a person’s age based on extremely small DNA samples.
This challenge enabled us to learn more about the relationship between epigenetic modification and DNA.
Participate in internships and hands-on workshops that solve problems that matter
Internships in computational biology labs can help you acquire the skills you need while also bringing your biological background to computational biologists.
You can also participate in hands-on workshops or hackathons, which can focus on highly specific topics. One recent hackathon, for example, used gene expression to predict drug resistance in malaria. It was organized by the H3Africa Bioinformatics Network, the University of Notre Dame and IBM Research-Africa.
Hackathons bring together participants with different sets of expertise, and they are a good way to learn soft skills, such as how to work in teams, in addition to technical skills.
Computational biology is an integral part of many aspects of biological research. A hands-on approach, starting with a problem of interest and learning the skills needed to solve it, can play a vital role in adapting to the rapid change in the field.
Geoffrey Siwo (@gsiwo) is currently a scientist at IBM Research Africa, where he leads the data-driven healthcare team. He will soon be joining the University of Notre Dame, Indiana, USA, as an assistant research professor. Previously, he co-founded Helix Nanotechnologies, a DNA nanotechnology company.