Learning every way to break a gene

From Fig. 1 in Majithia et al. Nature Genetics 2016


Finding the genetic cause of a disease—a mutation or genetic variant—is a lot like looking for a needle in a haystack. Except in the case of exome sequencing, it’s not always clear what a needle even looks like.

When a clinician finds a protein-altering variant in a gene known to cause disease, it could be the cause of the patient’s disease…or it could be nothing. This is the definition of a variant of uncertain significance (VUS). VUS’s often stay unknown unless someone puts in the time and resources to functionally characterize the variant. For obvious reasons, functional characterization of each individual VUS of every gene implicated in human disease is not practical.

One way to determine which variants cause disease and which don’t is to look at the DNA of many healthy individuals. If a healthy person carries a variant, it is unlikely to cause disease. The Exome Aggregation Consortium (ExAC) is the largest such effort—over 60,000 people have contributed their exome sequences to ExAC and thus provided a unique resource for clinicians to de-prioritize specific VUS’s as causes of disease.

But what if your variant is too rare or isn’t in ExAC?

Amit Majithia, David Altshuler and colleagues from the Broad Institute of Harvard and MIT developed a different strategy, published last week in Nature Genetics. The authors chose a gene, PPARG, that can cause Mendelian lipodystrophy when mutated. Some variants in this gene can also increase the risk of developing type 2 diabetes. PPARG also has many VUS’s in the population.

To find out which variants are likely to be pathogenic, the authors constructed a library of all 9,595 possible protein-altering variants of PPARG and tested them in pools using a functional assay in human macrophages. Cell pools that showed a positive result were sequenced and the numbers of each variant were counted, allowing the authors to assign a function score to each variant. These scores were then used, along with known benign and pathogenic variants from the prior literature, to train a machine-learning algorithm that could then classify each variant as pathogenic or not. The classifier found 6 new likely pathogenic variants, which were then validated through additional tests.
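The counting step lends itself to a small illustration. The sketch below is not the authors' pipeline; it assumes, purely for illustration, that a function score is the log2 enrichment of a variant's read frequency in the assay-positive pool relative to the starting library. The function name `function_scores`, the pseudocount and the toy counts are all hypothetical.

```python
import math

def function_scores(input_counts, sorted_counts, pseudocount=0.5):
    """Log2 enrichment of each variant in the assay-positive pool vs. the input library.

    Illustrative assumption only: the published scoring scheme may differ.
    """
    n_variants = len(input_counts)
    # A pseudocount keeps variants with zero reads from producing log(0).
    total_in = sum(input_counts.values()) + pseudocount * n_variants
    total_sorted = (
        sum(sorted_counts.get(v, 0) for v in input_counts) + pseudocount * n_variants
    )
    scores = {}
    for variant, n_in in input_counts.items():
        freq_in = (n_in + pseudocount) / total_in
        freq_sorted = (sorted_counts.get(variant, 0) + pseudocount) / total_sorted
        scores[variant] = math.log2(freq_sorted / freq_in)
    return scores

# Toy read counts, invented for this example (not data from the paper).
input_counts = {"variant_A": 100, "variant_B": 100, "variant_C": 100}
sorted_counts = {"variant_A": 180, "variant_B": 100, "variant_C": 20}

scores = function_scores(input_counts, sorted_counts)
for variant, score in sorted(scores.items()):
    print(variant, round(score, 2))
```

In this toy setup a variant enriched in the positive pool gets a score above zero and a depleted one gets a score below zero. Scores like these would then be the input features for a classifier trained on variants of known effect.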

Summary of strategy used by Majithia et al.


The strategy used by Majithia et al. could potentially be used by other researchers to study other proteins implicated in disease. Although it is somewhat of a brute-force approach to test every possible variant, the use of a pooled functional assay and computational classifier increases both the efficiency and accuracy of the result. We asked Dr. Majithia to tell us a little more about this study.

Author Q&A:

Why did you choose to focus your study on PPARG?

In the diabetes community PPARG is a very important and storied gene. It has been linked to both common type 2 diabetes and rare familial forms. PPARG is also the target of multiple FDA approved drugs to treat diabetes. So on one hand, PPARG has been studied in humans and in the lab for two decades. On the other hand, we showed in 2014 (Majithia et al. PNAS) that even though PPARG had been sequenced in humans for so many years, we had only scratched the surface of the possible missense mutations that people in the general populations carry. Most of these mutations were benign, some strongly increased diabetes risk, and it took laboratory experiments with each and every mutation to sort them out. So PPARG was the perfect test case for our prospective experimental approach to test all possible missense mutations: it is relevant to common and rare genetic disease, has mutations of known function we could use for validation of our method, and has many unknown mutations, i.e. variants of uncertain significance that need to be functionally characterized.

Do you think that a similar strategy could be employed for other genes with many variants of unknown significance in the population? What would be the major challenges for applying this strategy to other genes?

Absolutely. A major purpose of this study was to demonstrate proof-of-concept that other investigators and clinicians could utilize for VUS in other genes/diseases. In principle there are three challenges in applying a prospective experimental approach to generate a “lookup table” for missense VUS: 1) making every possible missense mutation, 2) building an assay with the scale and throughput to study every missense mutation, and 3) connecting the lab experiments to what happens in people (i.e. phenotypes).

  1. The mutation synthesis technique we use, which was pioneered by Tarjei Mikkelsen, applies to any gene. Other groups like Jay Shendure’s in Seattle have independently developed methods to mutate genes at scale and now companies like TWIST Biosciences offer high quality mutation libraries for any gene. This is no longer a barrier to entry.
  2. Genes have myriad functions and so building an appropriately scaled, high throughput readout is a gene by gene process. No single assay can test the function of every gene, but there are partially generalizable strategies that can be used to study certain classes of genes. The strategy we used for PPARG, combining reporter gene expression and FACS, could be deployed for any gene that activates transcription. In fact we are taking this approach with another diabetes relevant transcription factor, HNF1A.
  3. Establishing clinical relevance of saturation mutagenesis data is a critical step. In our study we set a criterion for our assay, that in order for us to be able to discriminate variants of “unknown” significance, we should be able to accurately discriminate variants of “known” significance. For PPARG we benefitted from decades of research that had resulted in a series of missense variants with known function and disease effect. Many genes do not have such “allelic series” but with the increasingly widespread use of exome/genome sequencing our knowledge of allelic series for genes is rapidly growing.

From your perspective, what was the most surprising aspect of the study?

To independently prove the findings from our “lookup table” our collaborators in Cambridge took a series of VUS from patients referred to their clinic and tested them in single variant assays that are the standard for PPARG functional testing. In one case, our “lookup table” had a major discrepancy from the single variant assay. Careful follow-up work by our Cambridge colleagues resolved that the single variant assay that has been used for decades was actually incorrect and the “lookup table” made the correct call. We were surprised that, in this case, our new, high throughput assay had higher fidelity with human biology and it introduces a degree of nuance into what we consider “gold standard” when discrepancies inevitably arise.

How do you see your findings impacting on future research?

We hope this study will have its largest impact as a proof-of-concept for other genetic diseases and VUS interpretation. Our method is capable of improvement. For example, it does not assess missense effects on gene splicing, but others in the community are developing complementary methods that will overcome this (for example, see this paper from Gregory Findlay et al. published in Nature). We are particularly excited that our lookup table approach opens the door to systematically characterizing drug-by-genotype interactions that could be useful to guide treatment.

 

Spreadsheets have misprints – it is known

by Myles Axton

Normally we do not re-examine supplementary information in this detail, but there is a common minor problem that systematically affects a small number of gene IDs within long lists of gene names copied into spreadsheets in the supplementary tables of many articles. We suggest checking for this problem before submitting tables to journals. It is easy to see the altered gene names by sorting the column in a separate version of the file and then searching for the misspelled name to correct it in the replacement version intended for publication.
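For authors who would rather script the check than sort by eye, here is a minimal sketch. It assumes the altered names look like spreadsheet dates (for example, the symbol SEPT2 becoming "2-Sep") or like numbers in scientific notation (for example, an identifier becoming "2.31E+13"). The helper name `find_suspect_gene_names` and the toy column are hypothetical, and the patterns will not catch every variation of the problem.

```python
import re

# Date-like strings such as "2-Sep" or "1-Mar", which a spreadsheet may
# substitute for symbols like SEPT2 or MARCH1.
DATE_LIKE = re.compile(
    r"^\d{1,2}-(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)$",
    re.IGNORECASE,
)
# Scientific-notation strings such as "2.31E+13", which may replace
# identifiers made of digits around a letter E.
SCI_NOTATION = re.compile(r"^\d+(\.\d+)?E\+\d+$", re.IGNORECASE)

def find_suspect_gene_names(names):
    """Return (row index, value) pairs for entries that look auto-converted."""
    suspects = []
    for index, name in enumerate(names):
        value = name.strip()
        if DATE_LIKE.match(value) or SCI_NOTATION.match(value):
            suspects.append((index, value))
    return suspects

# Toy column of gene names, invented for this example.
column = ["TP53", "2-Sep", "BRCA1", "1-Mar", "2.31E+13", "PPARG"]
suspects = find_suspect_gene_names(column)
print(suspects)
```

Each flagged row still needs to be checked against the original gene list, because the converted value alone does not always tell you which symbol it replaced.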

Example of the Excel formatting issue


The authors of this paper claim that gene names in a large proportion of papers reporting gene expression data have this problem. Here we list the supplementary tables they identified in the journal prior to 2015 and from the first nine months of 2016 that we found to contain one or more misprints. We think that these mistakes do not prevent reuse of the datasets provided and as stated in the accompanying editorial Legible ledgers we do not propose to publish formal Corrigenda for the supplementary tables of these articles.

 

August issue cover: What’s going on here?

Rhinopithecus bieti. Credit: Yong-cheng Long

This month’s cover image is inspired by the paper on page 947 reporting the reference genome sequence of the black snub-nosed monkey, the second snub-nosed monkey genome paper published in Nature Genetics. The golden snub-nosed monkey genome was published in 2014.

In their paper, Li Yu and colleagues present the de novo genome sequence assembly of Rhinopithecus bieti as well as whole genome resequencing of all four other snub-nosed monkey species. All five species are among the world’s most endangered primate species. Three species, R. bieti, R. roxellana and R. strykeri, live at very high altitudes—above 3,000 meters. R. bieti lives exclusively on the Yunnan and Tibetan plateaus. The other two species, R. brelichi and R. avunculus, inhabit lowland regions. The authors compared the genome sequences between these species to identify genomic regions showing evidence of positive selection that could be related to living at high altitudes.

The photograph on the cover image was taken by one of the study’s co-authors, Dr. Yong-Cheng Long, who was profiled by the Nature Conservancy for his work on conservation of R. bieti (also called the Yunnan golden monkey by the locals). We asked Dr. Long to tell us a little about the monkey shown in the picture.

“The monkey is [a] male, whose name is ‘Big Guy’, and he is feeding on some leaves,” he said by email. “The Big Guy used to have 4 wives (about 6 years ago) and now has only 2, as he is getting old and is not strong enough to hold all of them because the females are more likely to find a strong shoulder to cry on.”

Dr. Long said there are 57 R. bieti individuals in the habituated “Yunnan snuby” group, which is open to the public. Because many of the individuals in the area are fully habituated to human presence, it is not difficult to get photographs of them. The group is only a small portion of the largest natural monkey troop (approximately 1,000 in total) in the world. Dr. Long emphasized the impact that illegal poaching has had on the monkeys. “This species has been endangered by human’s killing, and the monkeys can certainly survive once the killing is stopped.” In China, 2016 is the Year of the Monkey, and it has turned out to also be a lucky year for these particular monkeys. “We found the monkey group has boomed,” said Dr. Long. “12 of the 57 are the infants born this year.”


Nature Genetics office mascot

The lead author of the study, Dr. Yu, became interested in studying these species because of his focus on conservation genetics of endangered mammals distributed in Yunnan Province, China. This is one of the core regions of biodiversity in the world. “The most notable among the endangered mammals distributed in Yunnan Province is R. bieti, which is found exclusively on Yunnan and Tibetan Plateau”, said Dr. Yu by email. “It is unique in that it is the only primate having a red mouth like most humans, which [is why it’s called] one of the most beautiful animals.” Dr. Yu also noted that it is the highest altitude-dwelling nonhuman primate. It can survive in very cold and hypoxic environments that other primates cannot tolerate. “So, I was deeply attracted by this mysterious and interesting species, and was eager to come to understand it.”

 

We at Nature Genetics are also celebrating the Chinese Year of the Monkey. Our office mascot is this golden snub-nosed monkey (right), which was produced for marketing purposes in China (I snagged one during a recent visit to the Shanghai office). Scanning a barcode on the monkey’s rear end (left) will take you to the publication of the R. roxellana (golden snub-nosed monkey) genome paper.

 

 

Joint calling of the ExAC publications

ExAC publications in Nature

We report this week in Nature and Nature Genetics the first publications from the Exome Aggregation Consortium (ExAC), a project that has generated the largest catalogue to date of variation in the protein-coding regions of the genome (known collectively as the exome), aggregating sequence data from over 60,000 individuals from across 21 research studies. Most importantly, the consortium has provided a publicly accessible database (https://exac.broadinstitute.org), which has already become a critical resource for research and clinical studies. While more than 1 million individuals are estimated to have had their exome or whole genome sequenced, only a small fraction of these data have been made publicly available, as there are many challenges to sharing and providing open access to these datasets. We applaud the authors for recognizing this need and meeting these challenges.

This work comes 15 years after we published the Human Genome Project, and follows in a series of community resources to catalogue variation in human genomes within and across populations. We continue to support these efforts, recognizing the necessity of developing these resources to further studies to understand the information encoded in our genome, genetic variation and genetic basis of disease.

Mapping ExAC publications

Very rare genetic variation: a first look

The scale of the ExAC sequencing dataset has provided some of our first glimpses into very rare genetic variation across populations, with several important early insights. First, the authors identify more than 7.4 million high-confidence genetic variants, on average one every 8 bases, the majority of them entirely novel (not present in any existing database) and extremely rare (more than half of the variants are seen only once across all 60,706 samples). Second, they are able to document recurrent rare mutations emerging independently, providing an estimate of the frequency of recurrence, which had never been observed systematically before because it requires such large sample sizes. Third, they are able to examine the level of selective constraint against protein-truncating variation, identifying 3,230 genes that appear highly loss-of-function-intolerant. Reassuringly, this includes most known human haploinsufficient disease genes; however, 72% do not yet have an established human disease phenotype. While some of these genes may be associated with weaker phenotypes or embryonic lethality, this points to how much more we have yet to understand about the phenotypic consequences of loss of function in human genes.

Copy number variation in ExAC

In a companion paper in Nature Genetics, Douglas Ruderfer, Shaun Purcell and colleagues examine rare copy number variation (CNV) with the ExAC dataset, specifically the rates and properties of genic CNVs with <0.5% frequency. They use their previous method XHMM to characterize CNV calls from this exome sequencing dataset. They find that ~70% of individuals carry at least one rare genic CNV, with an average of 0.81 deleted and 1.75 duplicated genes. The authors also estimate relative intolerance to CNVs for each gene. This CNV dataset is incorporated into ExAC and will be useful for continuing population and disease association studies, together with other measures of genic intolerance, and the authors provide an example of this in analysis of a schizophrenia case-control study.

Clinical genetics: classifying pathogenic variation

The current work also brings an important message for clinical genetics: the literature classifying pathogenic variation for rare disorders needs to be reexamined. The average ExAC participant harbors ~54 variants that have previously been classified as causal for a disease, and considering the ascertainment of the study it is likely that most of these are misclassified variants.

Using ExAC as a reference panel for classifying disease relevant variation, Lek et al. review the evidence for pathogenicity of 192 previously reported pathogenic variants for rare Mendelian disorders. Only 9 of these variants had sufficient support for disease association, with a high proportion of these variants present at an implausibly high frequency in the ExAC dataset. This suggests that many of these were false positive associations and incorrectly classified as pathogenic, the implications of which are not merely academic, as these findings are often used in clinical diagnoses and treatment.

In two additional companion publications, the authors take this a step further and demonstrate what is needed to move towards resolution of the nature of these prior associations, by bringing together large case series combined with ExAC. Walsh et al. (Genetics in Medicine, 10.1038/GIM.2016.90 published online August 17, 2016) systematically reexamine evidence for genes implicated in cardiomyopathy, one of the most common and severe rare disorders, and find many well known purported cardiomyopathy genes do not show support for pathogenicity, including some that are included in routine clinical genetic testing. Similarly, Minikel et al. collect 16,025 prion disease cases, the largest case series ever available for prion disease, for which ~10-15% of cases are estimated to be caused by mutations in the PRNP gene. They find a number of variants in PRNP thought to be pathogenic and with high penetrance appear to be likely benign (Minikel et al. Science Translational Medicine 10.1126/scitranslmed.aad5169). This led to a corrected patient diagnosis soon after this report, as Robert Green explained in his Perspective accompanying this publication (Lebo et al. Science Translational Medicine 10.1126/scitranslmed.aad9460).

These findings highlight the necessity to carefully evaluate the literature for rare genetic disorders. This also reinforces the value of large reference panels such as ExAC for filtering variants seen in patient exomes, a practice most of the genomics community has adopted in establishing standards for assessing sequence variants in human disease (MacArthur et al. Nature 508, 469–476 (2014), 10.1038/nature13127). The ExAC project continues to expand in size, hoping to increase to more than 120,000 exome sequences over this next year, as well as 20,000 whole genome sequences, bringing additional sample size, diversity and exploration of non-coding regions that will aid these efforts.
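The filtering idea can be sketched in a few lines. This is an illustration only, assuming a reference panel reduced to a mapping from variant identifier to allele frequency. The function name `filter_candidates`, the placeholder identifiers and the 0.1% cutoff are hypothetical, and a sensible cutoff depends on the prevalence and inheritance pattern of the disease in question.

```python
def filter_candidates(patient_variants, panel_allele_freq, max_allele_freq=0.001):
    """Keep variants that are absent from the panel or rare enough to remain candidates.

    Illustrative assumption: the 0.1% default is arbitrary, not a recommended value.
    """
    kept = []
    for variant in patient_variants:
        freq = panel_allele_freq.get(variant)  # None when the panel never saw it
        if freq is None or freq <= max_allele_freq:
            kept.append(variant)
    return kept

# Toy data, invented for this example.
panel_allele_freq = {"var1": 0.12, "var2": 0.00002}
patient_variants = ["var1", "var2", "var3"]

candidates = filter_candidates(patient_variants, panel_allele_freq)
print(candidates)
```

A variant that survives this filter is not thereby pathogenic. The filter only removes variants that are too common in the reference panel to plausibly cause a rare disorder.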

ClinVar and contributing to variant interpretation databases

This project, which relied on the willingness of many large research consortia to provide their raw data, demonstrates the extreme value of promoting the sharing, aggregation and harmonization of genomic data. This is true also for patient genetic variants, as there is a need for databases that provide greater confidence in variant interpretation. NCBI’s ClinVar database, which accepts contributions of clinically annotated genetic variation from clinical labs, clinicians and researchers, has become a key resource for clinical variant interpretation.

Improvements to the landscape of clinical genetics will require continued investment in such variant databases, continued expansion of human genetic reference panels, as well as efforts to link these to phenotype data. Recontacting to obtain phenotype data will be trialed on a subset of the ExAC dataset where consents allow, while new initiatives such as the UK 100,000 Genomes Project and the US Precision Medicine Initiative will also provide linked genome and phenotype information. Finally, enabling the ethical sharing of linked genetic and clinical data without violating participant privacy will require fundamental innovation in regulation and ethics policy, work that has been started by bodies such as the NIH and the Global Alliance for Genomics and Health, but around which considerable uncertainty remains.

Farm to Genomes: African Rice

Meyer et al., Nature Genetics, 2016

Rice is one of the most important crops on the planet, responsible for feeding billions of people. Given this global significance, studying rice in different geographies can be useful and aid in harnessing genetic diversity underlying particular traits and adaptations favorable to different environments. African rice (Oryza glaberrima Steud.) is mainly grown in sub-Saharan Africa and known for its stress tolerance. In a new article this week in Nature Genetics, Michael Purugganan and colleagues report the whole genome re-sequencing of 93 African rice landraces from various regions of Western coastal and sub-Saharan Africa. They create a genome-wide SNP map and through comparative genomic analysis study the domestication and population history of African rice. They use their map to perform GWAS for salt tolerance and find 11 significantly associated regions, highlighting the value of this unique genetic resource.

Meyer et al., Nature Genetics, 2016


By studying various regions with distinct environments, the authors were able to get clues about adaptation and geographic spread of the populations. They focused on coastal Senegal and inland Togo, which have higher and lower levels of soil salinity, respectively, and interviewed farmers in the region to understand the agricultural practices they employ in each region. The knowledge of the farmers helped to inform the genetic analysis and contributed to the model of African rice domestication and dispersal.

You can watch some of the interviews with the farmers here:

African rice farmers- interviews

Additionally, we spoke with authors Michael Purugganan and Rachel Meyer to get some background on this research.

Why do you think that rice is understudied in Africa compared to other places?

MP: I think it’s because it is not widely grown, unlike its Asian counterpart which has pretty much taken over the world.  But there definitely is more interest in African rice as breeders are trying to figure out how to increase food production in Africa, as well as to try to see what genes in African rice can be used to improve Asian rice.

RM: There is a lot of great research on improving Asian rice for African farmers that is being done by brilliant AfricaRice scientists, and they are working hard on the social science side too. But there are so many challenges that Africa disproportionately faces, particularly climate variation, that demand ramping up rice research. There is insufficient support for programs that integrate crop experiments and trials into the different farmlands. A better connection between scientists and small-scale farmers would really help farmers adopt new varieties too, because there is sometimes resistance to trying new ones.

How did you choose which samples to include in your analysis?

RM: Recognizing that a lot of NGO work encouraging farmers to grow Asian rice ramped up in the 80’s and 90’s, we took advantage of the germplasm largely donated in the 70’s to the West Africa Rice Development Association, which were duplicated and available through IRRI (International Rice Research Institute). We chose accessions with the most metadata available, preferring ones with georeferenced location and a cultivar name. It wasn’t until later that we realized water tables far inland were high in salinity, so we just tried to make sure we had a fair number of samples within 250km of the coast, or along rivers connecting to the ocean.

Were you surprised by any of your findings?

MP: There definitely were a few surprises in the data, but the big revelation for me was the long time for the population bottleneck that led to domestication.  We found from the genomic data that it may have taken more than 10,000 years of steady population decline before full-blown domesticated African rice shows up in the archaeological record.  This suggests the possibility that humans were already cultivating or managing its ancestor for thousands of years, and I think if this pattern holds for other domesticated crop species it will change our thinking on how domestication has taken place.

RM: I was surprised we got nice GWAS results with so few samples, and even more surprised that we saw several of those exhibiting signatures of geographic selection. We were lucky to find a broad distribution of traits in the landraces we chose to sequence, for we had made the DNA libraries ahead of the phenotyping experiments.

What was it like to meet and talk with the farmers?

RM: It was one of the highlights of my life to meet the farmers! I’m grateful to have gotten a glimpse of their heritage, their pride, and their struggles. We were all so impressed with the generosity of women, in particular, to help each other. We were also shocked by how many farms are run by the elderly; their children don’t see farming as profitable and many have left. For the three of us in the field, it made us think hard about how we can give back to the communities that gave us their time. I hope that crop science, publicity (like this blog) and policy changes can raise the profile of the small-scale farmer.

In each interview, the farmers also had a chance to interview us, and that part was especially interesting. Several asked really good questions about African and Asian rice domestication. You could see the cultural value of the basic science.

You chose to focus on salinity tolerance as a trait particularly relevant to farming in Africa.  In what ways do you see your results being used for crop improvement?

RM: One of the authors, from AfricaRice, Dr. Kofi Bimpong, had actually been working on salt tolerance separately as well, and has two graduate students collecting African rice landraces in Casamance. If from this paper we can consider that domestication possibly occurred in the Inner Niger Delta region and also in the West, then these collecting efforts are all the more important because they are from a center of origin, promising more genetic variation than people would have ever estimated. If you look through the available germplasm there is so little that has been collected or studied from Casamance. It’s tricky collecting there, for there is social unrest, and landmines. Hats off to the young graduate students, Mamadou Sock and Bathe Diop, doing that fieldwork; I’m sure there is a lot of discovery to be made with those collections, and more promising salt tolerant landraces to integrate into breeding programs.

In addition, our results suggesting many of the salt tolerance genes are shared in both rice species make them more valuable to explore in other crops.  Shared adaptive mechanisms are especially fascinating to evolutionary biologists and are powerful assets of the breeder’s toolbox.

July issue cover: What’s going on here?

This month’s cover features the inspiring block-like karst mountains of the Li River between Guilin and Yangshuo in Guangxi province. The image was inspired by a study in this month’s issue reporting deep sequencing of the MHC region in individuals of Han Chinese ancestry. The study represents an important resource for the study of immune-related disorders in Asian populations. It also identifies loci associated with risk of psoriasis, thus demonstrating the power of this resource.

In addition to simply being a beautiful image evocative of the mountains in Guangxi province, the image also brings to mind the peaks that might be observed in many types of genomic data, such as Sanger sequencing reads, ChIP-seq peaks, etc.

Our own chief editor, Myles Axton, did first-hand research leading to the selection of this cover image. As he found, the Yulong River in Yangshuo is less muddy than the Li River and better for swimming and sightseeing from bamboo rafts (arrow indicates NG editor in the field).

Yulong River. Credit: Myles Axton

 

Myles holding a 20 yuan note with drawing of karst mountains. Credit: Myles Axton

June issue cover: What’s going on here?

Carrot canang sari by Rachel Meyer


As June comes to a close, it’s time to look back at our June issue and ask “what’s going on here?” with the cover image. As you may have guessed, the image is related to the publication of the carrot genome sequence in this month’s issue.

The cover image was provided by Rachel Meyer, a scientist who was not a co-author of the genome paper. Dr. Meyer was previously a postdoctoral researcher with Michael Purugganan at NYU and is an AAAS Science and Technology Policy fellow. She is also a co-founder of Shoots & Roots in New York.

Dr. Meyer gave us the following information about the carrot canang sari on the June cover:

Celebrating the recent availability of rainbow carrots year-round in Washington DC, I cut them in various ways and laid them out in a public dirt plot between the sidewalk and the street that was still bare because Spring had barely started and planting was far from beginning. The cold kept the carrots nicely preserved for three days. The installation took about 6 hours, and the design itself was lifted from a Persian carpet, sharing an origin with some of the earliest domesticated carrots. I had no intention to leave the installation there but people in the busy U-street/Shaw district, coming home late at night from the bars, would stop and photograph it, and even some of the suits interrupted their morning power walks to work to investigate it. After a few days, to my surprise it was not rats, but a middle-aged man who had decimated the carrots for a meal.

Shelby Ellison, an author of the carrot genome article this cover references, did this research as part of her NSF Plant Genome Postdoctoral Fellowship. We were in the same class of Fellows together and became friends because we would look for cool restaurants around DC together during our brief visits to NSF for annual Plant Genome meetings. I’m grateful to be able to explore the subject of her science through installation.

For more about the carrot genome paper, see our previous blog post, featuring Q&A with the corresponding author.

Cancer clones- mixing and spreading


McPherson et al., Nature Genetics 2016

The trajectory of tumor cells during metastasis can be influenced by many factors, including the physical environment and the genetic makeup of metastatic clones. In high-grade serous ovarian cancer, there are limited barriers in the intraperitoneal space, allowing for extensive spreading and mixing of tumor cells. A recent article published in Nature Genetics explores these different patterns of clonal evolution in metastatic ovarian cancer using a combination of bulk and single cell sequencing.

The authors characterized the mutation landscapes of different metastatic tumors and found both monophyletic and polyphyletic clones. While in most patients there was unidirectional seeding from the original ovarian tumor, two patients exhibited polyclonal spread and reseeding. Therefore, high-grade serous ovarian cancer cells can migrate through and establish metastasis within the intraperitoneal space via different evolutionary routes.

McPherson et al., Nature Genetics 2016


We spoke to lead author, Sohrab Shah, to get some background on this research.

What features of this particular cancer made you want to study its metastasis? Were you surprised by your findings?

High-grade serous ovarian cancers are often widespread through the peritoneal cavity at diagnosis. We wanted to ask what the characteristics of cells that spread are, and how these cells are distributed throughout the abdominal lesions. The focus was to study the disease state prior to any treatment, to characterize the diversity and take an inventory of the ‘substrate’ of clones upon which treatment selective pressures may be acting. Many patients experience relapse after initial response to treatment. Mapping which clones lead to relapse remains a key question in the field. This was borne out in one patient in our study where specific clones that led to relapses were already present at diagnosis but only represented a minority of branches in the clonal phylogeny.

It is important to note that the mode of spread in this disease differs from most solid cancers, where spread is achieved through the bloodstream or lymphatics.  Ovarian cancer represents a unique opportunity to study disease spread through a relatively physically unencumbered anatomic space.  One might expect that in such an environment the potential for clonal intermixing is high.   This might lead to many clones co-existing at many sites.  But the majority of intraperitoneal samples were clonally pure, suggesting unidirectional spreading from ovary sites with diverse clonal repertoires, and a lack of clonal intermixing.

You provide evidence that the microenvironment influences the metastatic success of tumors. What does this say about in vitro cancer models that don’t account for tissue context?

One of the intriguing findings suggested that specific clones were present in specific sites.  This may indicate that particular microenvironments are differently suited to particular clones. Another surprising finding was that every patient harbored at least one lesion that was very diverse in its clonal make-up (typically within primary ovary sites).  This leads to the natural question of whether properties of specific microenvironments in some way promote or ‘tolerate’ clonal diversity.  If this were the case, then both in vitro and in vivo model systems such as cell lines, organoids and mouse xenografts may not adequately represent the natural disease state we find in patients prior to treatment.

How did you choose your sampling strategy?

The study results are naturally biased by the sampling strategy. The study design was subject to what material could be obtained during the provision of care. In our setup, we consented patients for collection and study of all material removed at primary debulking surgery. Wherever possible, tissue was cryopreserved, but inevitably many deposits were preserved in formalin. Our strategy led to acquisition of a median of n=10 samples per patient. The nature of the samples and their locations are presented in Figure 4 and are also available in interactive web form at:

https://compbio.bccrc.ca/research/tumour-evolution/

Users can click on the links for each patient and explore the clonal maps.

You utilize both bulk and single-cell sequencing as complementary approaches to elucidating tumor evolution. Can you comment on the trade-offs between cost and throughput and how you chose your sample sizes?

The field is entering an interesting time. There are several limitations to both bulk and single-cell sequencing strategies to define the clonal constituents of a tumor sample. Most single-cell techniques suffer from vast under-sampling of the clonal repertoire since they are limited in throughput and may only practically yield data from 100s of cells. Furthermore, single-cell techniques are prone to two key experimental sources of noise: missing data and allele dropout. We used targeted, multiplexed single-cell sequencing as a form of validation of inferences made from the bulk sampling, including validating co-occurrence of point mutations and structural variations in the same cells. Hypotheses were generated from multi-site bulk analysis and were then tested using orthogonal single-cell approaches. Accordingly, the sample sizes in single cell were chosen to identify clones that were detected in bulk samples, in the range of 5% prevalence. Notably, the noise properties of targeted multiplexed single-cell data required some careful statistical treatment, the results of which were published as a standalone contribution in Nature Methods simultaneously with this publication. As the field moves forward, it may become practical to sequence the whole genomes of 1000s of cells per sample. I look forward to the day when a single experimental design would be sufficient to dissect the important clones present in a cancer. This would enable studying evolutionary properties at scale, leveraging richly defined principles and statistical models from the field of population genetics.
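
As a rough illustration of the sample-size reasoning in this answer, the sketch below estimates how many cells would need to be sequenced to observe at least one cell from a clone at a given prevalence. It is not the authors' calculation: it assumes cells are sampled independently, and the `dropout` parameter is a hypothetical stand-in for the chance that a clone-defining mutation is missed in a cell that carries it.

```python
import math


def cells_needed(prevalence, confidence=0.95, dropout=0.0):
    """Smallest number of cells giving at least `confidence` probability
    of observing one or more cells from a clone at `prevalence`.

    Assumes independent sampling of cells (an illustrative simplification).
    `dropout` is a hypothetical per-cell probability that the clone's
    marker mutation goes undetected.
    """
    if not 0 < prevalence <= 1:
        raise ValueError("prevalence must be in (0, 1]")
    if not 0 < confidence < 1:
        raise ValueError("confidence must be in (0, 1)")
    if not 0 <= dropout < 1:
        raise ValueError("dropout must be in [0, 1)")

    # Chance that one sampled cell both belongs to the clone and is detected.
    p_detect = prevalence * (1.0 - dropout)
    if p_detect >= 1.0:
        return 1

    # Require P(clone missed in all n cells) = (1 - p_detect)**n <= 1 - confidence.
    return math.ceil(math.log(1.0 - confidence) / math.log1p(-p_detect))


print(cells_needed(0.05))               # 59 cells for a 5% clone, no dropout
print(cells_needed(0.05, dropout=0.2))  # 74 cells if 20% of carriers are missed
```

Under these assumptions, a few dozen cells are enough to see a 5% clone at least once, which is consistent with single-cell experiments yielding data from hundreds of cells. Much rarer clones would need far more.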

You find that there are differences in the potential for migration and metastasis across the tumors from your patients. What clinical implications might this have?

Our study is underpowered to provide a clear answer on this. Our results hint anecdotally that cases with strong patterns of unidirectional spread fared poorly in their treatment trajectories. Whether the presence of clones with strong potential to invade new microenvironments and dominate their local landscapes indicates a potential to evade chemotherapy remains an important question to consider. As we take this study forward in model systems derived from spatially distinct sites, reproducible treatment selection experiments can be carried out to robustly address this question.

May issue cover: What’s going on here?

This month’s cover image is inspired by the Article on p. 528 of this issue, by Jeff Wall, Nicola Illing, Nadav Ahituv and colleagues. The paper reports the genome of the bat Miniopterus natalensis and transcriptional dynamics in the developing bat wing. This species, one of a group known as vesper bats, is also known as the Natal long-fingered bat and is found in parts of Africa.

The image chosen for the cover is a frontal view of a bat embryo at a late stage of development (stage CS21) taken by study co-author Mandy Mason. This developmental stage is known as “Translucent Wing”, as you can clearly see the skeletal structures in the wing and the membrane between the outstretched digits. The embryo in this image was stained with Alizarin red (maroon-red-pink) for bone and Alcian blue (blue-cyan) for cartilage. The image was actually taken as part of an earlier study to understand the progression of limb development in this species and to compare it with that of the mouse.

The current study presents not only the genome sequence of the Natal long-fingered bat, but also RNA-seq and ChIP-seq (for H3K27ac and H3K27me3) profiling of the developing limbs. The authors identified more than 7,000 genes that were differentially expressed between the forelimbs (the eventual wings) and the hindlimbs. Through comparative genomics analyses, they found nearly 3,000 regions showing evidence of accelerated evolution along the bat lineage that overlapped with H3K27ac peaks, suggesting that these are candidate enhancer regions for wing development. “This study offers a comprehensive resource for future work in comparative limb development,” co-author Mandy Mason told us. “Aside from the results that we have presented in this paper, these open datasets can be queried to help answer additional questions that may be asked by both our and other research groups.”

The Colorful Carrot Genome

Iorizzo et al. Nature Genetics, 2016

A high-quality assembly of the carrot (Daucus carota) genome is reported this week in Nature Genetics. Carrot is an important crop due to its high content of vitamin A precursors, alpha- and beta-carotenes, as well as its popularity in global cuisines. The bright orange color of the modern carrot and its high carotenoid content are features that emerged through selection and breeding; the complete genome sequence will serve as a resource to aid breeders in crop improvement strategies.

Sequencing the carrot genome allowed for the identification of two novel whole-genome duplication events and 634 proposed pest and disease resistance genes. In addition, a novel candidate gene regulating carotenoid accumulation was found. Finally, the authors re-sequenced 35 carrot species and outgroups to identify genomic regions associated with domestication and to estimate genetic diversity. Further phylogenomic comparisons with other plants clarified evolutionary divergence between carrot and tomato, grape and kiwifruit.

We spoke with lead author Philipp Simon to get some background on the research.

How did you end up working on carrots?

The position I am in focuses on carrot genetics and breeding. It was advertised soon after I completed my Ph.D. in genetics. The ability to do genetic research on a crop with a strong positive impact on consumers appealed to me. I was fortunate enough to enter that position.

What do you consider your most surprising result coming out of sequencing the whole genome?

The discovery of a candidate gene for the Y locus, which conditions the accumulation of carotenoid pigments in carrot roots. In previous work we were able to map the trait and also genes for enzymes in the carotenoid biosynthetic pathway, but none of those genes involved in carotenoid biosynthesis mapped with the Y locus. With a well-characterized genome available, we discovered a candidate for that important gene. The Y locus is one of the two genes responsible for the domestication of wild white carrots (ancestral wild type) to orange.

What user group do you think will benefit the most from these data?

The immediate users of the whole genome sequence will be plant breeders, for the marker-assisted selection they have underway for carrot disease resistance and seed production traits. There are also several public sector labs doing more basic research on carrot pigments, biotic and abiotic stress response, reproduction, and evolution that will find it useful.

You propose an interesting model for carotenoid accumulation in the carrot. How might this knowledge be applied to the potential improvement of other crops?

There are several possibilities. The knowledge of this mutation in carrot may provide insights for identifying similar mutations in sequenced genomes of other crops, or generating similar mutations with genome editing technologies, for example. This could have application with other root crops such as cassava, but similar mutations are also known to influence pigment accumulation in fruit crops, so there may be applications beyond root crops.

What are some of your future directions going forward now that the genome assembly is complete?

Now we are using the carrot genome to understand genes for other carrot traits, including traits influencing accumulation of carotenoids, anthocyanins, carbohydrates and flavor terpenoids; pest and disease resistance; abiotic stress responses; plant reproduction and growth.

Bonus: do you have a favorite carrot recipe?

Regarding carrots in my diet, I usually eat raw carrots, but roasted or stir-fried carrots are also very tasty.