Cuddly Koala Genomics

{credit}Rebecca Johnson{/credit}

The genome assembly of the koala is reported in a paper published online in Nature Genetics. This high-quality genome represents the most complete genome sequence for a marsupial to date. The data give insight into the highly specialized koala diet, consisting of eucalyptus leaves, and provide information that may be useful for combating infectious disease.

Koalas are a vulnerable species, and part of the aim of the project was to use the genomic data to inform conservation efforts. We spoke with lead author Rebecca Johnson to get some background on this work:

Koala{credit}Rebecca Johnson{/credit}

How did the koala genome project come to be?

The genome project started as a small group of Australian researchers (from the Australian Museum, University of the Sunshine Coast and University of Sydney) who were enthusiastic about koala conservation and using genomics to manage populations and diseases. We partnered up with colleagues at the Ramaciotti Centre at the University of New South Wales (UNSW) who were enthusiastic to try out their new sequencing equipment on a ‘de novo mammal sized genome’. This hadn’t been done before in Australia.

We decided to take a bit of a risk and announce to the world in 2013 that we were establishing the Koala Genome Consortium and sequencing the genome. This was a very effective way of getting our project on the scientific horizon but then the pressure was on us to deliver! Fortunately for me (and the koala) one of my biggest career risks (announcing the genome well ahead of time) has resulted in a brilliant collaboration of scientists producing a high quality genome with many exciting outcomes and applications.

 

What do you think were the most interesting or surprising findings that came out of the genome data?

So many interesting things have come out of this work, so it is difficult for me to pinpoint one in particular. However, as a conservation geneticist I’m particularly fond of the conservation genomics work, especially the historical population reconstruction, which infers what koala populations would have looked like through evolutionary time. It was a little surprising to discover that koalas underwent such a dramatic decrease in population size 30-40 kya (thousand years ago), which was around the time many of the megafauna were experiencing extinction in Australia. Another surprise was that the three koalas used for this analysis are from two quite geographically separate locations (~600 km apart), yet both suggest a dramatic reduction in population size, indicative of widespread pressures across the continent.

Having this ‘deep-time’ perspective on koala populations, combined with the contemporary population work we did as part of this study, we have a long-term understanding of koalas in the landscape (i.e. the importance of long-term regional gene flow). Conservation management efforts can now be based on this holistic knowledge rather than a single genetic snapshot taken in time.

 

What are the biggest threats to koalas now?

The koala is now classified as ‘vulnerable’ due to habitat loss and widespread disease. Threats to koalas are multifaceted, the biggest being loss and fragmentation of habitat, urbanization, climate change and disease. Current estimates put the number of koalas in Australia at only 329,000 animals (range 144,000-605,000), and a continuing decline is predicted unless measures are put in place to arrest it.

 

How do you envision that this genomic information can aid conservation efforts?

The benefit of the genome to conservation efforts is widespread. The population diversity information presented in our work provides the impetus for a conservation management strategy to maintain gene flow regionally while incorporating the genetic legacy of biogeographic barriers. We have also identified the huge contrast in genome-wide levels of diversity across the northern and southern populations of koalas which will be factored into future decision making. The importance of genetic diversity indices for koala conservation has been included in the recently released NSW koala strategy so we will be focusing on highlighting the genetically healthy koala populations and ensuring they maintain regional gene flow. If more intensive measures such as translocations are required (for example from the genetically diverse populations to the genetically depauperate populations), we now have the tools and data to inform those decisions.

The immune gene repertoire we report as part of the genome is also being used directly in efforts to understand the response of koalas to diseases such as chlamydia and the koala retrovirus (KoRV). Several of our collaborators on this work are involved in very important work developing and trialing vaccines for both chlamydia and KoRV. The genome affords the ability to understand which immune genes are up- or down-regulated in response to disease or treatment and provides the platform for future therapies to be tailored to the genome level.

 

What is it like working with koalas? Do you have any good stories that you would like to share?

It never gets tiring working with koalas and it was not difficult at all to bring collaborators on board to work on this project!

Koalas are notoriously chilled out animals (spending most of their time sleeping or eating), although my friends and colleagues who wrangle them in the field do report how unpleasant it is to be on the receiving end of their extremely sharp claws and nippy diprotodon teeth!

As part of sequencing the genome, our efforts to extract suitable quality DNA from koala blood were unsuccessful (possibly because they have a high lipid content in their blood). The only way we could get suitable quality DNA was to wait for an animal to be euthanized so we could access tissues suitable for genome and transcriptome work. Our two females were euthanized because they had advanced untreatable chlamydia. It is an extremely sobering experience to be involved in these necropsies because you can see the ravages of the disease on the body. While these moments are very tough, they also inspire you to work harder to ensure we are producing the best possible science to conserve this amazing species.

 

For more video information, please see:

 

https://www.youtube.com/watch?v=tcMCni28nNo&t=4s

The Colorful Carrot Genome

Iorizzo et al. Nature Genetics, 2016

A high-quality assembly of the carrot (Daucus carota) genome is reported this week in Nature Genetics. Carrot is an important crop due to its high content of vitamin A precursors, alpha- and beta-carotenes, as well as its popularity in global cuisines. The bright orange color of the modern carrot and its high carotenoid content are features that emerged through selection and breeding; the complete genome sequence will serve as a resource to aid breeders in crop improvement strategies.

Sequencing the carrot genome allowed for the identification of two novel whole-genome duplication events and 634 proposed pest and disease resistance genes. In addition, a novel candidate gene regulating carotenoid accumulation was found. Finally, the authors re-sequenced 35 carrot species and outgroups to determine genomic regions associated with domestication and to estimate genetic diversity. Further phylogenomic comparisons with other plants clarified evolutionary divergence between carrot and tomato, grape and kiwifruit.

We spoke with lead author Philipp Simon to get some background on the research.

How did you end up working on carrots?

The position I am in focuses on carrot genetics and breeding. It was advertised soon after I completed my Ph.D. in genetics. The ability to do genetic research on a crop with a strong positive impact on consumers appealed to me. I was fortunate enough to enter that position.

What do you consider your most surprising result coming out of sequencing the whole genome?

The discovery of a candidate gene for the Y locus, which conditions the accumulation of carotenoid pigments in carrot roots. In previous work we were able to map the trait and also genes for enzymes in the carotenoid biosynthetic pathway, but none of those genes involved in carotenoid biosynthesis mapped with the Y locus. With a well-characterized genome available, we discovered a candidate for that important gene. The Y locus is one of the two genes responsible for the domestication of wild white carrots (ancestral wild type) to orange.

What user group do you think will benefit the most from these data?

The immediate users of the whole genome sequence will be plant breeders, for the marker-assisted selection they have underway for carrot disease resistance and seed production traits. There are also several public sector labs doing more basic research on carrot pigments, biotic and abiotic stress response, reproduction, and evolution that will find it useful.

You propose an interesting model for carotenoid accumulation in the carrot. How might this knowledge be applied to the potential improvement of other crops?

There are several possibilities. The knowledge of this mutation in carrot may provide insights for identifying similar mutations in sequenced genomes of other crops, or generating similar mutations with genome editing technologies, for example. This could have application with other root crops such as cassava, but similar mutations are also known to influence pigment accumulation in fruit crops, so there may be applications beyond root crops.

What are some of your future directions going forward now that the genome assembly is complete?

Now we are using the carrot genome to understand genes for other carrot traits, including traits influencing accumulation of carotenoids, anthocyanins, carbohydrates and flavor terpenoids; pest and disease resistance; abiotic stress responses; plant reproduction and growth.

Bonus: do you have a favorite carrot recipe?

Regarding carrots in my diet, I usually eat raw carrots, but roasted or stir-fried carrots are also very tasty.

Ancient migrants left Africa with a ‘mutational load’

The populations that broke off from early out-of-Africa migrants may have progressively accumulated harmful genetic mutations, suggests a new study published this month in the Proceedings of the National Academy of Sciences.

Modern humans, originating in Africa, started migrating out of the homeland towards Asia and the Americas around 50,000 years ago. Theoretical models predict that the expansion out of Africa might have happened through small bands that started expanding into multiple continents.

Population genetics theory says that each population breaking off from these small bands carried a mutational load.

Scientists in this study say that not only did the migrations leave a mark on the genetic diversity of different populations, but they also gave rise to classes of harmful alleles that have different patterns across said populations. The farther away from Africa (in other words, the greater the distance covered away from the homeland), the more harmful the mutations or genetic variants are.

To test their hypothesis, the team of scientists sequenced the full genomes and high-coverage exomes from seven geographically divergent human populations from Namibia, Congo, Algeria, Pakistan, Cambodia, Siberia, and Mexico.

The next-generation sequencing technology they used confirmed that the mutations under scrutiny evolved with the migrations, and revealed that the degree of the harm is directly proportional to the distance traveled away from Africa.

“To be able to see this, you need a huge amount of data in many populations from different continents. Only 5 years ago, this would not have been possible,” says study co-author Laurent Excoffier, in comments to Science Daily.

Biting into the pineapple genome

"Pineapple and cross section" by fir0002 | flagstaffotos.com.au (Canon 20D + Sigma 150mm f/2.8). Own work. Licensed under GFDL 1.2 via Commons - https://commons.wikimedia.org/wiki/File:Pineapple_and_cross_section.jpg#/media/File:Pineapple_and_cross_section.jpg

The genome sequences of cultivated pineapple (Ananas comosus) and a related wild species (Ananas bracteatus) were published last week by Ming et al. in Nature Genetics. The genome has already led to insights into monocot evolution and CAM photosynthesis. In the future, studies that use the pineapple genome have the potential to lead to innovations in engineering drought resistant crops.

Every species, plant, animal or microorganism, that is sequenced is a useful resource for the research community. But each time a new genome is sequenced, we ask “what is really new about this one” and “what are we learning about biology”?  Pineapple is of course a delicious and economically important crop, but what makes its genome special?

Several aspects of pineapple biology make its genome an important one to sequence. First, pineapple uses a metabolic strategy known as crassulacean acid metabolism (CAM). CAM allows the plant to conserve water, making it more resistant to drought. Only one other CAM plant has had its genome sequenced, the orchid Phalaenopsis equestris.

{credit}Zhong-Jian Liu, National Orchid Conservation Center of China {/credit}

Another reason to study the pineapple’s genome is to understand how self-incompatibility has evolved in monocotyledon plants. Wild pineapple species are self-compatible, but cultivated pineapples are not. As a result, cultivated pineapple is highly heterozygous. This aspect of pineapple biology also makes sequencing its genome technically challenging. Fortunately, the authors of the study devised a way around this potential problem to generate an extremely high-quality genome assembly (see the image on the right, courtesy of Zhong-Jian Liu, who was not affiliated with the study).

One of the most interesting aspects of the pineapple genome was only discovered after the genome was assembled. As the study’s authors found, pineapple has conserved the order of genes on its chromosomes more so than any other monocot studied to date. This high degree of synteny with the hypothetical ancestral monocot makes pineapple an ideal outgroup for comparative evolutionary studies involving other monocot species, such as grasses.

We spoke to the lead author of the study, Ray Ming, to learn a little more about how the study was conducted.

The genomes of many plants have been sequenced, or are in the process of being sequenced. Why did you decide to focus on pineapple?

I started my career at the Hawaii Agriculture Research Center and have been working on genomics of Hawaiian crops, including papaya, pineapple, sugarcane, and coffee.  We sequenced the papaya genome first.  It is a logical choice to sequence the smallest genome of the remaining three next. In addition, pineapple is the most economically important CAM plant crop, the second most important tropical fruit, is self-incompatible, and prone to somatic mutations.

How was the idea arrived at to use hybrids (the F153 x CB5 F1 cross) to overcome issues of high heterozygosity in the assembly process? Was this the initial plan, or were there other ideas as well?

We anticipated the difficulty of assembling the heterozygous pineapple genome.  Before we started the genome project, I discussed this issue with co-author John Bowers during the International Plant and Animal Genome Conference in San Diego, and John was the one who came up with the idea to sequence an F1 individual at deep coverage to have a single molecule from each parent for phasing to improve the assembly of the reference genome F153. Co-author Michael Schatz implemented this strategy, and also designed sophisticated approaches to improve the assembly of this heterozygous genome as detailed in the method section. Mike’s team did an outstanding job to produce a high quality assembly of this highly heterozygous genome. Mike is a pioneer and a leading scientist in assembling complicated and complex plant genomes.

We also tried to sequence the genome from a single sperm cell to generate haploid genome sequences, but it wasn’t successful. The long reads from Moleculo and PacBio improved the genome assembly, and the ultra-high density map of re-sequencing F1 individual genomes substantially improved the quality of the genome assembly and corrected 199 chimeric scaffolds.

Did you expect to see such high levels of conservation of synteny with ancestral monocots in the pineapple?

No. It was a surprise, but it makes sense since pineapple is self-incompatible and vegetatively propagated, hence having fewer generations of sexual reproduction in its evolutionary history.

How do you envision others using the pineapple genome sequence in their research?

The pineapple genome will be used for CAM photosynthesis research as a model system, and it will be used as a reference genome or even the reference genome for comparative genomics in monocots.

Bonus question: What is your favorite fruit?

Pineapple for its extraordinary flavor and aroma, and papaya for its number 1 nutritional value among fruits, and for its flavor.

The Method of the Year for 2013 is… single-cell sequencing

Single-cell sequencing edged out other contenders as our choice of Method of the Year in 2013. These techniques really came into their own in 2013 and are fast providing new insights into the workings of single cells that ensemble methods are incapable of.

Back in 2008 we chose next-generation sequencing as our Method of the Year not only because of how the new techniques would improve performance in conventional sequencing applications, but also because they opened up whole new applications, unthinkable with traditional Sanger sequencing. Our choice of Method of the Year in 2013 bears this out, as none of these single-cell sequencing applications would be possible without next-generation sequencing. And in some applications the sequencing is used almost exclusively for identifying and counting tagged molecules.

Our choice likely comes as a surprise to all those who were certain that we would pick CRISPR/Cas9 technology for targeted genome modification. This is certainly an exciting technology, and not only for genome engineering, but also for epigenome editing as described in a Method to Watch. But genome editing with engineered nucleases was our pick for the 2011 Method of the Year and although CRISPR/Cas9 provides a huge practical improvement by largely dispensing with the need to engineer the nuclease and relying instead on a programmable guide RNA, the advance over 2011 is mostly one of ease-of-use.

Methods to investigate biology at the level of single cells have been of keen interest to Nature Methods since the journal started. Our first research article from Robert Singer described a paraffin-embedded tissue FISH (peT-FISH) method to simultaneously detect expression of several genes in situ in single cells while maintaining tissue morphology (Capodieci, P. 2005). This was followed by many other imaging-based methods for such things as measuring cell growth (Groisman, A. 2006), quantifying mRNA (Raj, A. 2008) and protein (Gordon, A. 2006) levels, profiling intracellular signaling (Krutzik, P.O. & Nolan, G.P. 2006; Loo, L.-H. 2007) and DNA insertion-site analysis (Schmidt, M. 2008) in single cells.

The number of original research articles published in Nature journals exploded in 2013. These numbers may not be complete.

The publication of M. Azim Surani’s article on mRNA-Seq whole-transcriptome analysis of a single cell (Tang, F. 2009) in 2009 helped signal the rise of sequencing-based methods for single-cell analysis. But even two years later the Reviews and Perspectives in our supplement on single-cell analysis were more focused on imaging-based than sequencing-based approaches to single-cell analysis.

It was only in 2013 that we finally saw an explosion of original research articles using or reporting single-cell sequencing methods in Nature-family journals. Numerous studies reported new biological results that relied on sequencing of whole or partial genomes or transcriptomes from single cells.

Our Method of the Year special feature has three Commentaries by researchers in the field, including some of the earliest developers and users of methods for single-cell analysis. An Editorial, News Feature and Primer describe our choice and provide helpful background information. We hope you enjoy the selection of articles in our special feature.

A star is born: the updated Human Reference Genome

The release of the 38th build of the human reference genome gets a well-deserved rock-star greeting by the scientific community.

The new GRCh38 is already a rock-star{credit}Wikimedia Commons/Flickr:Starman/K.Spencer{/credit}

Fans know it is worth the effort to camp out for tickets to a concert by a beloved rock, pop or country star. GRCh38, the newest build of the human reference genome, is that kind of star. Delayed by a few snags and also held up by the US government shutdown, the sequence has just traveled to GenBank for use by the scientific community.

Not only has Genome Reference Consortium build 38 (GRCh38) eliminated some pesky previous gaps, it will be the first human reference assembly to have sequence information for centromeres. Up until now, centromeres, which are specialized structural components of chromosomes, have been represented in the reference by gaps of 3 million base pairs. The news about centromere sequence will be of interest to cell biologists and genomics researchers alike.

“This will be a major boon to evolutionary studies of human populations and to the many groups doing mechanistic work on human centromeres and kinetochores,” says Stanford University researcher Aaron Straight, whose work focuses on cell division and chromosome segregation. “Finally, now we can stop saying ‘mind the gap’.”

The reference genome finishers are the members of the Genome Reference Consortium (GRC) at the European Bioinformatics Institute, the US National Center for Biotechnology Information, The Wellcome Trust Sanger Institute and The Genome Institute at Washington University.

Scientists may not have physically camped like concert-goers in front of the buildings where genome finishers scurry to get the sequence out the door. But the throngs have been virtually present. The GRC, which works on human, mouse and zebrafish reference genomes, is “having to field a lot of questions from folks who want to know the minute they can have the assembly,” says Deanna Church, a genomicist formerly at the US National Center for Biotechnology Information who has, since this interview, moved to Personalis, a genetic testing and analysis company.

The din has faded from the 2001 celebration marking the end of the Human Genome Project. But the sequence was not complete then, nor is it complete now. As colleagues at Nature Methods have pointed out, the sequence originally had around 150,000 gaps.

The most recent reference genome, Genome Reference Consortium build 37 (GRCh37), has 357 gaps and is missing sequence around the centromeres. No longer.

Come here, centromere

The structure and repetitive nature of centromeric regions have made them largely inaccessible to methods used to create the reference assembly, says Church. The concept and the methods to produce the centromere sequences for this reference build were developed by a research team at University of California at Santa Cruz (UCSC). They constructed sequences using the Sanger technique and the data helped the team behind GRCh38 to fill in these important gaps.

The centromere community will be happy to no longer say this.{credit}Wikimedia Commons/Clicsouris{/credit}

In a paper, the UCSC team, led by Karen Miga and Jim Kent, a member of GRC’s scientific advisory board, noted that centromeric regions are replete with near-identical tandem repeats, known as satellite DNA. The difficulty of assembling these regions has frequently led to their exclusion from genomic studies. In the new reference genome, the scientists used reads generated during the Venter genome assembly and created models for the centromeres, says Church.

“These models don’t exactly represent the centromere sequences in the Venter assembly, but they are a good approximation of the ‘average’ centromere in this genome,” she says. And these sequence models are not exact representations of any one centromere, either. But including these sequences in the reference assembly “will likely improve genome analysis using current methods, and allow for some further study of population variation in centromere sequences,” says Church.

Fishy research at the Biology of Genomes 2012

Strange things are afoot at the Biology of Genomes meeting at Cold Spring Harbor Laboratory in New York this week. When Jeramiah Smith, a geneticist at the University of Kentucky in Lexington, delivered his talk this evening, he started by apologizing that what he was about to present was “kinda weird.”

Smith studies lampreys, ghastly looking jawless fish that hold a special place in the hearts of evolutionary biologists. Our common ancestor with these beasties resides somewhere deep within the Precambrian boughs of vertebrate ancestry.

Smith says he faced a puzzling problem with the lamprey genome, though. Some DNA sequence he had produced from lamprey sperm cells simply wasn’t lining up with the lamprey genome assembled by Sanger. Some bits aligned partially, and then veered off into unmatched DNA. Other bits were completely without a match. “That turned out to be a red herring in a sense,” he says. The sequence wasn’t lining up because up to about half a billion base pairs of DNA found in the reproductive cells of lampreys are deleted from all other adult cells.

Much of Smith’s work since has been devoted to figuring out both why and how the lamprey seems to make about 20% of its genome disappear during the development of all but its gametes.

A Pacific lamprey {credit}Dave Herasimtschuk, Fresh Waters Illustrated{/credit}

Through sequencing DNA and RNA and comparing what he’s found with sequence from other animals, he’s identified a handful of genes that disappear at some point between fertilized egg and full-grown fish. Among them are APOBEC1, some genes for zinc finger proteins and WNT7A/B. Several, said Smith, have qualities of “stem celly-ness”.

This makes sense. A fertilized egg would make good use of stem-cell-related genes as it divides and differentiates into all the cells of an adult organism. It might also explain why the genes would be deleted. Genes that favour a stem-cell state also have a tendency to be oncogenic. Humans and other vertebrates have ways of tightly controlling the expression of such genes in adult cells to prevent cancers from occurring. But, asks Smith, “What better way to get rid of them,” than completely throwing them out?

A few other organisms also seem to delete genes in this way, including the only other known jawless vertebrate, the hagfish. And although the genetics of these species are definitely weird, Smith hopes that further study might make it clearer what’s going on and possibly help explain why other vertebrates, including humans, tend to do things differently. “I think it has something to tell us about the way our genes are regulated,” said Smith.
