Consider for a moment the logistics of rewriting a genome from scratch. Starting from a reference genome sequence, you nip and tuck, recode and reorganize. Any change to one element shifts the genetic coordinates of every element downstream, so the process requires considerable genetic bookkeeping.
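To make that bookkeeping concrete, here is a minimal Python sketch (not BioStudio's code) of what a single edit does to downstream annotations. The `Feature` class and `shift_features` function are hypothetical names. The sketch assumes 0-based, half-open coordinates and that an edit never partially overlaps a feature boundary.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Feature:
    """A named annotation with 0-based, half-open coordinates [start, end)."""
    name: str
    start: int
    end: int


def shift_features(features, pos, old_len, new_len):
    """Return features updated after replacing seq[pos:pos + old_len] with new_len bases.

    Assumes the edited region lies wholly outside a feature or wholly inside
    one; a partial overlap raises ValueError rather than guessing.
    """
    delta = new_len - old_len
    edit_end = pos + old_len
    updated = []
    for f in features:
        if f.end <= pos:
            # Upstream of the edit: coordinates are untouched.
            updated.append(f)
        elif f.start >= edit_end:
            # Downstream of the edit: the whole feature slides by delta.
            updated.append(replace(f, start=f.start + delta, end=f.end + delta))
        elif f.start <= pos and f.end >= edit_end:
            # Edit falls inside the feature: only its end moves.
            updated.append(replace(f, end=f.end + delta))
        else:
            raise ValueError(f"edit at {pos} partially overlaps {f.name}")
    return updated
```

Inserting five bases at position 15 leaves a feature at 0-10 alone but moves one at 20-30 to 25-35, and every feature after it moves too. That is the cascade the paragraph describes.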
Joel Bader, Jef Boeke, and an international team of colleagues faced precisely that problem as they rebuilt five chromosomes from the yeast Saccharomyces cerevisiae – an effort Amy Maxmen covered yesterday in Nature News.
Bader likens the problem to a large collaborative software project, and in a sense, it is. Only instead of instructions to program a computer, this project involves genetic instructions for programming yeast – 12 million bases' worth of them. Multiple researchers around the world need to be able to access those instructions simultaneously, and to modify them if desired. They must be able to review changes and to roll them back if errors or missteps are discovered. And they need a way to establish provenance, determining which researcher made which change, and why.
Those needs essentially recapitulate the features of version-control software, tools like Git and Subversion that allow developers to manage big, collaborative software projects. And so, when it came time to tackle the yeast genome, Bader and his team built exactly that kind of tool, only for DNA.
BioStudio, he says, blends the properties of version-control software with those of a word processor. Researchers can use the tool to make both local and global alterations to the genome, from individual base changes to global search-and-replace operations. Each change is tagged with the researcher who created it, who must also explain the alteration, like a well-annotated variant of the Track Changes feature in Microsoft Word. The system also automates tasks that would be difficult if not impossible for researchers to handle on their own, he adds: for instance, creating and managing a unique set of watermarks across both the synthetic and wild-type genomes.
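The "tracked changes" idea can be illustrated with a small Python sketch: a global search-and-replace that refuses to run without an author and an explanation, and logs both. This is a hypothetical illustration, not BioStudio's interface; `Change` and `search_and_replace` are invented names, and the replacement is a plain string substitution, not a codon- or reading-frame-aware one.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Change:
    """One logged edit: who made it, why, and what was substituted."""
    author: str
    reason: str
    find: str
    replace_with: str
    count: int
    timestamp: str


def search_and_replace(seq, find, replace_with, author, reason, log):
    """Globally replace `find` with `replace_with`, refusing unexplained edits.

    Uses str.replace semantics (non-overlapping, left to right); it does not
    respect reading frames, so it is only an illustration.
    """
    if not find:
        raise ValueError("search pattern must be non-empty")
    if not author or not reason.strip():
        raise ValueError("every change needs an author and an explanation")
    count = seq.count(find)  # non-overlapping matches, same as str.replace
    new_seq = seq.replace(find, replace_with)
    log.append(Change(author, reason, find, replace_with, count,
                      datetime.now(timezone.utc).isoformat()))
    return new_seq
```

For example, an illustrative TAG-to-TAA swap on `"ATGTAGCCCTAG"` returns `"ATGTAACCCTAA"` and appends one log entry recording two substitutions along with the author and reason.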
According to Bader, BioStudio is similar in concept to the popular version-control software Git. Each round of edits produces a new 'version' of the genome (or rather, of its underlying annotations). Users can track those changes, roll them back, and run a 'difference finder' to compare one version to another. The primary difference between the tools, he says, is that BioStudio maintains just a single sequence repository, whereas in Git each user works from their own cloned copy.
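A hypothetical Python sketch of that model: one shared history of annotation snapshots (no per-user clones), a difference finder between any two versions, and a rollback that restores an old version as a new commit so nothing is lost. The class and method names are illustrative assumptions, not BioStudio's API, and annotations are simplified to a mapping from feature name to `(start, end)`.

```python
from dataclasses import dataclass, field


@dataclass
class Version:
    number: int
    author: str
    message: str
    annotations: dict  # feature name -> (start, end)


@dataclass
class Repository:
    """A single shared history of annotation snapshots."""
    versions: list = field(default_factory=list)

    def commit(self, author, message, annotations):
        # Copy the mapping so later edits by the caller cannot alter history.
        v = Version(len(self.versions) + 1, author, message, dict(annotations))
        self.versions.append(v)
        return v.number

    def get(self, number):
        if not 1 <= number <= len(self.versions):
            raise KeyError(f"no version {number}")
        return self.versions[number - 1]

    def diff(self, a, b):
        """Report features added, removed, or moved between versions a and b."""
        old, new = self.get(a).annotations, self.get(b).annotations
        return {
            "added": sorted(new.keys() - old.keys()),
            "removed": sorted(old.keys() - new.keys()),
            "changed": sorted(k for k in old.keys() & new.keys() if old[k] != new[k]),
        }

    def rollback(self, number, author):
        """Restore an earlier version as a new commit, keeping history intact."""
        return self.commit(author, f"roll back to v{number}", self.get(number).annotations)
```

Recording rollbacks as new commits, rather than deleting later versions, keeps the provenance trail complete. That design choice is an assumption here, borrowed from how Git's `revert` behaves.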
Researchers can run BioStudio from the command line or via a graphical user interface based on GBrowse, the genome viewer from the Generic Model Organism Database (GMOD) project. Once they've made all the desired changes, the software takes over, Bader says, tweaking the sequence to accommodate the user's choice of synthesis and assembly strategy (for instance, adding restriction enzyme sites for traditional cloning, or overlaps for homology-based strategies). It then "segments" the chromosome into oligonucleotides and writes them to electronic order sheets that are ready to send directly to the synthesis provider. (You can watch a nifty video of the software in action here.)
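The segmentation step can be pictured as tiling the sequence with fixed-length oligos whose neighbors share a set number of bases, then listing them on an order sheet. This Python sketch assumes a simple fixed-length, fixed-overlap tiling and a hypothetical `name,sequence` sheet format; it ignores strand, melting temperature, and the enzyme sites or homology arms mentioned above, all of which a real design tool would have to handle.

```python
def segment(seq, oligo_len, overlap):
    """Tile seq with oligos of up to oligo_len bases; neighbors share `overlap` bases.

    Returns a list of (start, oligo) pairs. The final oligo may be shorter so
    that it ends exactly at the end of the sequence.
    """
    if not 0 <= overlap < oligo_len:
        raise ValueError("need 0 <= overlap < oligo_len")
    if not seq:
        return []
    step = oligo_len - overlap
    oligos = []
    start = 0
    while True:
        end = min(start + oligo_len, len(seq))
        oligos.append((start, seq[start:end]))
        if end == len(seq):
            break
        start += step
    return oligos


def order_sheet(oligos, prefix="oligo"):
    """Format oligos as a hypothetical comma-separated order sheet."""
    lines = ["name,sequence"]
    for i, (_, s) in enumerate(oligos, 1):
        lines.append(f"{prefix}_{i:03d},{s}")
    return "\n".join(lines)
```

With a 100-base sequence, 40-base oligos, and a 10-base overlap, the oligos start at positions 0, 30, and 60. Stitching them back together (dropping each overlap once) reproduces the original sequence.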
According to Bader, BioStudio was made possible by a trio of talented biologists and computer scientists in his lab – Sarah Richardson, Giovanni Stracquadanio, and Kun Yang – who were sufficiently well-versed in biology to understand the researchers’ needs – an object lesson in the value of multidisciplinary education.
“It’s just so important to have somebody who really understands the biology and understands the design goals and why we’re doing the things that we’re doing,” he says. “And also is able to understand algorithms so that when they have a problem, that if it’s a problem with a known solution, they know the solution, and if it’s a problem that doesn’t have a known solution, they understand how to think through how to develop an efficient algorithm.”
Though BioStudio was developed for the Sc2.0 project, anyone can use it, Bader says. It is based on open-source software, and installation instructions are included in the lead paper's supplementary materials. But installation is complicated, he warns, as BioStudio depends on many other applications. An Amazon Web Services instance is also available.
Jeffrey Perkel is Nature's Technology Editor