From the archives (2004): Large-scale structural variation in the human genome

{credit}Iafrate et al. Nature Genetics 2004{/credit}

During the past 25 years, Nature Genetics has been lucky to publish many exciting papers, more than a few of which can be described as “landmark” papers—publications that have had a dramatic and long-lasting impact on a field. In 2004, the Journal published such a study by Stephen Scherer, Charles Lee and colleagues (Iafrate et al.) in which they reported 255 loci across the human genome containing large structural variants.

In 2017, the idea that there exist large numbers of structural variants in the genome (such as rearrangements, deletions and insertions) that differ from person to person is an established fact. But in 2004, this was not the prevailing wisdom. Prof. Scherer has already written an excellent essay at The Winnower about the study and its importance to the field, so I won’t recap it in detail here; I will simply encourage you to read the piece.

Charles Lee wrote us about the study by email. “I saw a talk by Dr. Dan Pinkel at the 2002 ASHG meeting where he presented his latest array CGH findings,” he remembers. “In his talk, one of the slides showed the array CGH results of a trisomy 18 patient and Dan remarked how cleanly his array platform performed, especially for the other chromosomes. But in fact, I (and others, I’m sure) could see that there were actually occasional clones that deviated from the expected log2 ratio of 0. During the question period, I sheepishly asked him about these clones. I really didn’t mean to criticize his platform, but I think that he took it that way. Those “blips” bothered me and when I returned to Boston, John Iafrate (who was a postdoc with me at the time) began our own array CGH experiments. Ironically, there were several other groups that were way ahead of us with respect to technical expertise and experience with array CGH, but it could be that they considered these “blips” as technical artifacts – without biological implications.”

Prof. Lee added, “In late 2003, I gave a talk at the University of Toronto and met Stephen Scherer in person for the first time. In a casual conversation, we realized that we were both using the same 1 MB chromosome microarray platform from Spectral Genomics and that we were both seeing these recurrent ‘blips’ in our data.”

Stephen Scherer also corresponded with us by email about the study and the mutual decision to collaborate with the Lee lab. “We were both fresh enough to look beyond what others were calling ‘noise’ to realize these aberrations represented intermediate and gene-level copy number variation.”

“Many of us suspected it was there,” he said of the large-scale variation they uncovered, “based on the fact there were lots of smaller indels and that 0.6% of the population carried cytogenetic alterations. We kind of predicted it in our chromosome 7 mapping and sequence paper, but only at the chromosomal level.”

Circles to the right of each chromosome ideogram show the number of individuals with copy gains (blue) and losses (red) for each clone among 39 unrelated, healthy control individuals. Green circles to the left indicate known genome sequence gaps within 100 kb of the clone, or segmental duplications known to overlap the clone, as compared to the Human Recent Segmental Duplication Browser. Cytogenetic band positions are shown to the left. {credit}Fig. 1 from Iafrate et al. 2004{/credit}

The study by Iafrate et al. was published on August 1, 2004. Exactly one week prior, a very similar study by Michael Wigler and colleagues (Sebat et al.) was published in Science. The methods used by the two groups were different, but the findings and implications were consistent with each other. “Charles and I were happy to see the Wigler paper,” said Prof. Scherer, “because nobody believed our results.” Prof. Lee added, “This was one of the most difficult papers for me to publish. The reviewers were very skeptical. We had to keep providing more and more validation data, and one of the reviewers even commented that s/he did not believe that the paper was worthy of being an article and we had to shorten the paper into a Brief Communication. At the end, Reviewer #2, who was persistently negative, wrote: ‘… I still feel hesitant about publication of this work in Nature Genetics… and I still doubt the importance and novelty of their work.’” Prof. Scherer remembers similar levels of skepticism in the community. “Prior to publication I was showing the data at talks, including one at Michigan where they were trying to recruit me, and I remember getting trashed. People in my own department were mostly the same.”

[I looked up the referee reports and internal notes from the review process and Prof. Lee is correct that at least one of the reviewers was very skeptical about the impact of the study. However, I do want to note the very unusual fact, at least by today’s standards, that the study was published a little more than 2 months after initial submission, according to our records. I wish this was more common!]

After publication, however, the importance of the studies was immediately clear, at least to those working most closely in the field. Nigel Carter contributed a News and Views article in Nature Genetics about the studies. He wrote, “This unexpected level of LCV [large-scale copy-number variation] forces us to re-evaluate our view of the structure of the normal human genome.”

However, Prof. Lee remembers some ongoing skepticism about the work. “For more than 18 months after the paper was published, I had trouble getting grant funding for continuing my work in human copy number variation. Some comments that I received included, ‘If this was real, the Human Genome Project would have found it.’ I am embarrassed to say that I was forced to write for smaller grants on other topics and when funded, did everything I could to complete the projects using less money and use the ‘extra’ funds for my human copy number variation interests. It was very, very frustrating.”

In 2007, Science announced Human Genetic Variation as the Breakthrough of the Year. “When I saw this article in Science,” Prof. Lee said, “I felt like there was finally some widespread acceptance of our findings in the general scientific community.”

“However, this came with different issues.” For example, he often received the response from the GWAS community that structural variation is interesting, but it is too difficult to incorporate into GWAS. “So, most association studies continued to focus on SNPs, which is a problem that persists to this very day.”

The findings in Iafrate et al. were based on, by today’s standards, a fairly small sample of 55 individuals profiled by array comparative genomic hybridization on an array covering ~12% of the genome (the study in Science reported results from 20 individuals using representational oligonucleotide microarray analysis). However, the impact on the field was anything but small. Part of the legacy of the studies was the establishment of the Database of Genomic Variants (originally the Genome Variation Database) that has now collected over 550,000 CNVs. The discovery that so many structural variants are present in our genomes, even in healthy individuals, opened up an entire field of study to understand the function of these variants, and much is still to be discovered (see for example a recent study on the impact of structural variation on human gene expression).

Prof. Scherer summed up the impact of the studies this way: “If you remember the fights between the public Human Genome Project and Celera Genomics, and them finger-pointing to the errors in each other’s assemblies, in many cases these were due to CNV and other structural variations. They had no idea these CNV variants existed. It was really the 2004 Nature Genetics and Science papers, coincident, pure discovery, that opened the eyes of the community and it took some longer than others to believe it.”

From the archives (1995): Guidelines for interpreting and reporting linkage results

In 1995, Nature Genetics published a report by Eric Lander and Leonid Kruglyak, recommending clear statistical guidelines for reporting linkage results for complex traits. The paper had an immediate impact, setting the bar for what could or could not be called “significant” in the literature. Although originally focused on human genetic linkage studies, the guidelines set forth by Lander & Kruglyak influenced fields from model organism genetics to plant genetics, and eventually genome-wide association studies (GWAS).

The mid-1990s were a very exciting time in genetics. The Human Genome Project had recently been announced and advances like microsatellite linkage maps of the human genome and multiplex sequencing technology were now available. Mapping genes underlying complex phenotypes was now a real possibility, and human geneticists were busy prospecting for genetic gold. However, as Lander & Kruglyak cautioned in their paper, the lack of clear guidelines could foster a spate of false positive reports that would, if left unchecked, discredit the nascent field (for example, see this 1993 paper in Nature Genetics finding no evidence for a previously-reported linkage region for manic depressive illness).

On the other hand, setting too high a bar for reporting significance would mean missing many true signals where they exist, an equally dangerous proposition for a new field. As explained in the paper, “striking the right balance requires both a mathematical understanding of how positive results will occur just by chance and a value judgment about the relative costs of false positives and false negatives.” The paper then outlines the mathematical and statistical arguments in favor of the standards we now all know and love.
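To make the “positive results by chance” point a little more concrete, here is a minimal Python sketch. It is not the derivation used by Lander & Kruglyak; it only applies the standard large-sample conversion of a LOD score to a one-sided pointwise P value, followed by a naive count of expected chance hits. The function names and the figure of 400 independent tests are illustrative assumptions.

```python
import math


def lod_to_pointwise_p(lod: float) -> float:
    """Approximate one-sided pointwise P value for a LOD score.

    Assumes the usual large-sample result that 2 * ln(10) * LOD is
    chi-square distributed with 1 degree of freedom under no linkage.
    """
    if lod < 0:
        raise ValueError("LOD score must be non-negative")
    chi2 = 2.0 * math.log(10.0) * lod
    # Upper tail of chi-square(1 df) is erfc(sqrt(x / 2)); halve it for a one-sided test.
    return 0.5 * math.erfc(math.sqrt(chi2 / 2.0))


def expected_false_positives(pointwise_p: float, n_independent_tests: int) -> float:
    """Naive expected number of chance hits across independent tests."""
    return pointwise_p * n_independent_tests


# Hypothetical scan: 400 effectively independent tests (an assumption for illustration).
N_TESTS = 400
for lod in (1.0, 2.0, 3.0):
    p = lod_to_pointwise_p(lod)
    print(f"LOD {lod:.1f}: pointwise P ~ {p:.2e}, "
          f"expected chance hits in {N_TESTS} tests ~ {expected_false_positives(p, N_TESTS):.3f}")
```

Even in this simplified picture, a pointwise threshold that looks strict for a single test can still yield chance hits once a whole genome is scanned, which is the trade-off the paper set out to quantify properly.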

{credit}Lander & Kruglyak, Nature Genetics 1995{/credit}

I spoke with Leonid Kruglyak, co-author of this landmark paper, to get a sense of the context in which this paper came about, and the impact it had on the field at the time of publication. He first explained that it was finally possible to conduct genome-wide linkage studies with hundreds of individuals, allowing linkage mapping methods to be applied to complex traits (for example, this genome-wide screen for schizophrenia susceptibility genes published in the same issue). However, unlike Mendelian genes, there was no clue as to “how many signals there should be, or what their expected sizes were.” Thus, the need for a statistical framework.

This need was recognized as well by the Journal. As Prof Kruglyak recalls, Kevin Davies (founding editor of Nature Genetics) originally commissioned this work as a News & Views article, but it then evolved into a more extensive piece as its implications became clear. However, as he remembers, there was still a very strict deadline for the paper as it had to make the next issue (and these were still the days of hard-copy submissions). At the time, Prof Kruglyak was a young postdoc, so it fell to him to rush to the main FedEx office in downtown Boston before closing time, to make sure the manuscript got to the printer on time.

Prior to submitting the final text, Lander & Kruglyak produced some of the “original preprints”, sending a copy of the paper by snail mail or email to “everyone we knew in statistical genetics”, for comments and suggestions. After all, these guidelines would affect quite a lot of people and “signals that people would like to be results might not be real results anymore”.

{credit}Curtis, Nature Genetics 1996{/credit}

Following publication, “the reactions came in essentially two flavors,” Prof Kruglyak recalls. There were those who thanked the authors, saying that someone really needed to do this. Others were less enthused. “They said, ‘you’re standing in the way of progress and making it harder to publish.’” In fact, Nature Genetics published two letters to the editor arguing that the proposed genome-wide significance threshold was too strict, or that at the very least additional discussion was warranted before these guidelines were adopted (see the letters here and here, and the authors’ reply here). Personally, I agree with the overall sentiment of Lander & Kruglyak as summed up in this portion of their reply: “The correspondents (all trained statisticians) argue that there is no need for guidelines because everyone should be able to interpret the genomewide significance of pointwise P values on their own. In our view, this is naïve. Most geneticists are not statisticians, and rules of thumb can be extremely helpful in promoting sensible discussion.”

The legacy of this paper is clear to anyone familiar with GWAS. “The GWAS community learned a lot from that whole experience [of false positive linkage reports],” says Prof Kruglyak. “There were many serious statistical geneticists involved [in the GWAS field] from the beginning, with a lot of carryover from the linkage era to the GWAS era.”

“Guidelines are not just ‘external gatekeepers’”, he noted. They are not just there to tell you what you can and can’t publish. “You know what they say, the easiest person to fool is yourself.” These guidelines were developed to help researchers understand their own findings better and decide which are worth following up. “You can often make up a plausible story, but how strong is the evidence?”

25 years of Nature Genetics

 

This April marks the 25th anniversary of the first issue of Nature Genetics, and I think it’s safe to say that the field of genetics has come quite a long way. In 1992, we were still nearly a decade away from the draft human genome sequence, “omics” was not yet a word in common usage, and CRISPR/Cas9 gene editing wasn’t even a pipe dream.

Most of the content in our current issue would have possibly seemed like far-fetched science fiction to geneticists in 1992. Take for instance the new-and-improved domestic goat genome assembly reported on page 643 of this issue, for which multiple, relatively new technologies were employed to create one of the most complete and contiguous genome assemblies to date. However, as the News & Views by Kim Worley exemplifies, science marches on. While the geneticists of the past might have marveled at the possibility of a whole-genome shotgun assembly (indeed, a major advance reported in that first issue was a new technology allowing for automated sequencing of 106kb), Worley refers to the scientists of the present who are “frustrated with the highly fragmented genome sequences available for most species.”

Still, many things have remained the same.

Taking a look back at the very first editorial published in the journal, much of the journal’s mission in 1992 is still applicable to 2017. Take this passage:

“Researchers should not be dismayed that developments like this are widely reported in the general press. That is merely a measure of the widespread compassionate interest in inheritable disease. Who can be but flattered by such public testimony to the importance of a field of research?

“The research community’s interest, rather, is that there should also be a wide general understanding that the identification of an aberrant gene does not imply that there is a cure at hand for the condition for which it is responsible. […] The elucidation of the mechanisms by which genes determine the behaviour of the cells that carry them will be a general preoccupation in the years ahead. Nature Genetics intends to play its part in the publication of this important research, and also of course, in classical genetics that throws light on the human genome.”

{credit}doi:10.1038/ng0492-1{/credit}

While there is no denying that important medical advances have been enabled by the identification of disease genes, it is still painfully true that simply finding the gene does not directly lead to a cure on its own. Thus, both the identification of new disease-causing genetic alterations and studies that bring new mechanistic understanding of how a given mutation gives rise to disease are still core to the journal’s scope and aims.

The focus of the journal, as can be seen from this first editorial, was very much on human genetics at the beginning. Model organisms were considered just that, models for human biology. One of the major changes in the journal since that time has been our expansion to genetics (and genomics) more broadly, as represented by the many reference genomes and population genetics studies published for other organisms.

Too many landmarks to count

The editorial published in this month’s issue highlights a few selected articles from among our more than 5,000 research publications over the years. These are obviously a restricted set of examples, and they are by no means the “best” papers, as such a ranking system would be ill-advised and ultimately useless. But the papers selected cover a wide range (though not all) of the sub-fields represented by the journal. This list includes landmark papers in human genome mapping (Kong et al. 2002) and cataloging of genetic variation (Iafrate et al. 2004); statistical methods that helped drive an entire field of research (Price et al. 2006); Mendelian disease gene discoveries that shed new light on biological mechanisms (Amir et al. 1999); key advances in the field of epigenetics (Heintzman et al. 2007); and advances in crop plant improvement (Ren et al. 2005).

We invite you to take a trip down memory lane and revisit these and other landmark papers from our archives. As a part of the celebration of 25 years of Nature Genetics, the editors will be blogging throughout April to highlight some of our past content.

A brief history of Nature Genetics

Nature Genetics was launched as the first of the Nature Research journals (if we ignore the very brief existence of Nature New Biology and Nature Physical Science in the early 1970s and the earlier version of Nature Biotechnology, Bio/Technology, published first in 1983).

While the history of genetics as a field is far more interesting than the history of a single journal, the occasion of our 25th anniversary has us thinking about our roots. For our 15th anniversary, founding editor Kevin Davies contributed a guest editorial telling the story of how Nature Genetics came about. I highly recommend that you check it out, if you haven’t seen it before.

Another feature of our 15th birthday celebration was the Question of the year. What would you do if the $1,000 genome were a reality today? To read the nearly 50 replies we received from leaders in the field, see the Question of the Year special here: https://go.nature.com/2mTMKBf.

The next 25 years

Just as researchers in 1992 would have been very unlikely to predict the many breakthroughs that have occurred in genetics over the past 25 years, we have no idea where the next 25 years will take us. The goals will remain the same: to elucidate the mechanisms by which the genetic material produces the many phenotypic variations we see in nature and to identify the causes (and, more hopefully, cures) for human genetic disease.

That said, let’s take a stab at looking toward the future. What do you think will be the next major breakthrough in genetics? What will the field of genetics look like in another 25 years? Tell us below in the comments.

25 years from now, I hope to still be watching as geneticists make some of the greatest discoveries in biology. And I am confident that Nature Genetics will be there, playing its small role in announcing those discoveries to the world.

 

Learning every way to break a gene

From Fig. 1 in Majithia et al. Nature Genetics 2016

Finding the genetic cause of a disease—a mutation or genetic variant—is a lot like looking for a needle in a haystack. Except in the case of exome sequencing, it’s not always clear what a needle even looks like.

When a clinician finds a protein-altering variant in a gene known to cause disease, it could be the cause of the patient’s disease…or it could be nothing. This is the definition of a variant of uncertain significance (VUS). VUS’s often stay unknown unless someone puts in the time and resources to functionally characterize the variant. For obvious reasons, functional characterization of each individual VUS of every gene implicated in human disease is not practical.

One way to determine which variants cause disease and which don’t is to look at the DNA of many healthy individuals. If a healthy person carries a variant, it is unlikely to cause disease. The Exome Aggregation Consortium (ExAC) is the largest such effort—over 60,000 people have contributed their exome sequences to ExAC and thus provided a unique resource for clinicians to de-prioritize specific VUS’s as causes of disease.

But what if your variant is too rare or isn’t in ExAC?

Amit Majithia, David Altshuler and colleagues from the Broad Institute of Harvard and MIT developed a different strategy, published last week in Nature Genetics. The authors chose a gene, PPARG, that can cause Mendelian lipodystrophy when mutated. Some variants in this gene can also increase the risk of developing type 2 diabetes. PPARG also has many VUS’s in the population.

To find out which variants are likely to be pathogenic, the authors constructed a library of all 9,595 possible protein-altering variants of PPARG and tested them in pools using a functional assay in human macrophages. Cell pools that showed a positive result were sequenced and the numbers of each variant were counted, allowing the authors to assign a function score to each variant. These scores were then used, along with known benign and pathogenic variants from the prior literature, to train a machine-learning algorithm that could then classify each variant as pathogenic or not. The classifier found 6 new likely pathogenic variants, which were then validated through additional tests.
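As a rough illustration of the counting step, the Python sketch below scores each variant by its log2 enrichment in the assay-positive pool relative to the starting library, then splits variants at the midpoint between the scores of known benign and known pathogenic variants. This is an assumption-laden toy, not the authors’ pipeline: the enrichment formula, the pseudocount, the midpoint rule (a simple stand-in for their machine-learning classifier), the variant names and the read counts are all hypothetical.

```python
import math
from typing import Dict, List


def function_scores(selected: Dict[str, int],
                    library: Dict[str, int],
                    pseudocount: float = 0.5) -> Dict[str, float]:
    """log2 enrichment of each variant in the assay-positive pool vs. the input library."""
    n = len(library)
    sel_total = sum(selected.get(v, 0) for v in library) + pseudocount * n
    lib_total = sum(library.values()) + pseudocount * n
    scores = {}
    for variant, lib_count in library.items():
        sel_freq = (selected.get(variant, 0) + pseudocount) / sel_total
        lib_freq = (lib_count + pseudocount) / lib_total
        scores[variant] = math.log2(sel_freq / lib_freq)
    return scores


def midpoint_threshold(scores: Dict[str, float],
                       benign: List[str],
                       pathogenic: List[str]) -> float:
    """Midpoint between the mean scores of known benign and known pathogenic variants."""
    mean_benign = sum(scores[v] for v in benign) / len(benign)
    mean_pathogenic = sum(scores[v] for v in pathogenic) / len(pathogenic)
    return (mean_benign + mean_pathogenic) / 2.0


def classify(scores: Dict[str, float], threshold: float) -> Dict[str, str]:
    """Label variants by which side of the threshold their score falls on."""
    return {v: ("likely benign" if s >= threshold else "likely pathogenic")
            for v, s in scores.items()}


# Made-up read counts for four hypothetical variants.
library = {"varA": 100, "varB": 100, "varC": 100, "varD": 100}
selected = {"varA": 190, "varB": 10, "varC": 150, "varD": 20}

scores = function_scores(selected, library)
# Pretend varA is a known benign variant and varB a known pathogenic one.
threshold = midpoint_threshold(scores, benign=["varA"], pathogenic=["varB"])
calls = classify(scores, threshold)
for variant in sorted(calls):
    print(variant, round(scores[variant], 2), calls[variant])
```

A real analysis would need replicate pools, many more labeled variants and a properly validated classifier; the point here is only that pooled sequencing turns a functional readout into a per-variant number that can be calibrated against variants of known effect.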

Summary of strategy used by Majithia et al.

The strategy used by Majithia et al. could potentially be used by other researchers to study other proteins implicated in disease. Although it is somewhat of a brute-force approach to test every possible variant, the use of a pooled functional assay and computational classifier increases both the efficiency and accuracy of the result. We asked Dr. Majithia to tell us a little more about this study.

Author Q&A:

Why did you choose to focus your study on PPARG?

In the diabetes community PPARG is a very important and storied gene. It has been linked to both common type 2 diabetes and rare familial forms. PPARG is also the target of multiple FDA-approved drugs to treat diabetes. So on one hand, PPARG has been studied in humans and in the lab for two decades. On the other hand, we showed in 2014 (Majithia et al. PNAS) that even though PPARG had been sequenced in humans for so many years, we had only scratched the surface of the possible missense mutations that people in the general population carry. Most of these mutations were benign, some strongly increased diabetes risk, and it took laboratory experiments with each and every mutation to sort them out. So PPARG was the perfect test case for our prospective experimental approach to test all possible missense mutations: it is relevant to common and rare genetic disease, has mutations of known function we could use for validation of our method, and has many unknown mutations, i.e. variants of uncertain significance that need to be functionally characterized.

Do you think that a similar strategy could be employed for other genes with many variants of unknown significance in the population? What would be the major challenges for applying this strategy to other genes?

Absolutely. A major purpose of this study was to demonstrate proof-of-concept that other investigators and clinicians could utilize for VUS in other genes/diseases. In principle there are three challenges in applying a prospective experimental approach to generate a “lookup table” for missense VUS: 1) making every possible missense mutation, 2) building an assay with the scale and throughput to study every missense mutation, and 3) connecting the lab experiments to what happens in people (i.e. phenotypes).

  1. The mutation synthesis technique we use, which was pioneered by Tarjei Mikkelsen, applies to any gene. Other groups like Jay Shendure’s in Seattle have independently developed methods to mutate genes at scale and now companies like TWIST Biosciences offer high quality mutation libraries for any gene. This is no longer a barrier to entry.
  2. Genes have myriad functions and so building an appropriately scaled, high throughput readout is a gene by gene process. No single assay can test the function of every gene, but there are partially generalizable strategies that can be used to study certain classes of genes. The strategy we used for PPARG, combining reporter gene expression and FACS, could be deployed for any gene that activates transcription. In fact we are taking this approach with another diabetes relevant transcription factor, HNF1A.
  3. Establishing clinical relevance of saturation mutagenesis data is a critical step. In our study we set a criterion for our assay, that in order for us to be able to discriminate variants of “unknown” significance, we should be able to accurately discriminate variants of “known” significance. For PPARG we benefitted from decades of research that had resulted in a series of missense variants with known function and disease effect. Many genes do not have such “allelic series” but with the increasingly widespread use of exome/genome sequencing our knowledge of allelic series for genes is rapidly growing.

From your perspective, what was the most surprising aspect of the study?

To independently prove the findings from our “lookup table” our collaborators in Cambridge took a series of VUS from patients referred to their clinic and tested them in single variant assays that are the standard for PPARG functional testing. In one case, our “lookup table” had a major discrepancy from the single variant assay. Careful follow-up work by our Cambridge colleagues resolved that the single variant assay that has been used for decades was actually incorrect and the “lookup table” made the correct call. We were surprised that, in this case, our new, high throughput assay had higher fidelity with human biology and it introduces a degree of nuance into what we consider “gold standard” when discrepancies inevitably arise.

How do you see your findings impacting on future research?

We hope this study will have its largest impact as a proof-of-concept for other genetic diseases and VUS interpretation. Our method is capable of improvement. For example, it does not assess missense effects on gene splicing, but others in the community are developing complementary methods that will overcome this (for example, see this paper from Gregory Findlay et al. published in Nature). We are particularly excited that our lookup table approach opens the door to systematically characterizing drug-by-genotype interactions that could be useful to guide treatment.

 

August issue cover: What’s going on here?

Rhinopithecus bieti{credit}Yong-cheng Long{/credit}

This month’s cover image is inspired by the paper on page 947 reporting the reference genome sequence of the black snub-nosed monkey, the second snub-nosed monkey genome paper published in Nature Genetics. The golden snub-nosed monkey genome was published in 2014.

In their paper, Li Yu and colleagues present the de novo genome sequence assembly of Rhinopithecus bieti as well as whole genome resequencing of all four other snub-nosed monkey species. All five species are among the world’s most endangered primate species. Three species, R. bieti, R. roxellana and R. strykeri, live at very high altitudes—above 3,000 meters. R. bieti lives exclusively on the Yunnan and Tibetan plateaus. The other two species, R. brelichi and R. avunculus, inhabit lowland regions. The authors compared the genome sequences between these species to identify genomic regions showing evidence of positive selection that could be related to living at high altitudes.

The photograph on the cover image was taken by one of the study’s co-authors, Dr. Yong-Cheng Long, who was profiled by the Nature Conservancy for his work on conservation of R. bieti (also called the Yunnan golden monkey by the locals). We asked Dr. Long to tell us a little about the monkey shown in the picture.

“The monkey is [a] male, whose name is ‘Big Guy’, and he is feeding on some leaves,” he said by email. “The Big Guy used to have 4 wives (about 6 years ago) and now has only 2, as he is getting old and is not strong enough to hold all of them because the females are more likely to find a strong shoulder to cry on.”

Dr. Long said there are 57 R. bieti individuals in the habituated “Yunnan snuby” group, which is open to the public. Because many of the individuals in the area are fully habituated to human presence, it is not difficult to get photographs of them. The group is only a small portion of the largest natural monkey troop (approximately 1,000 in total) in the world. Dr. Long emphasized the impact that illegal poaching has had on the monkeys. “This species has been endangered by human’s killing, and the monkeys can certainly survive once the killing is stopped.” In China, 2016 is the Year of the Monkey, and it has turned out to also be a lucky year for these particular monkeys. “We found the monkey group has boomed,” said Dr. Long. “12 of the 57 are the infants born this year.”

Nature Genetics office mascot

The lead author of the study, Dr. Yu, became interested in studying these species because of his focus on conservation genetics of endangered mammals distributed in Yunnan Province, China. This is one of the core regions of biodiversity in the world. “The most notable among the endangered mammals distributed in Yunnan Province is R. bieti, which is found exclusively on Yunnan and Tibetan Plateau”, said Dr. Yu by email. “It is unique in that it is the only primate having a red mouth like most humans, which [is why it’s called] one of the most beautiful animals.” Dr. Yu also noted that it is the highest altitude-dwelling nonhuman primate. It can survive in very cold and hypoxic environments that other primates cannot tolerate. “So, I was deeply attracted by this mysterious and interesting species, and was eager to come to understand it.”

 

We at Nature Genetics are also celebrating the Chinese Year of the Monkey. Our office mascot is this golden snub-nosed monkey (right), which was produced for marketing purposes in China (I snagged one during a recent visit to the Shanghai office). Scanning a barcode on the monkey’s rear end (left) will take you to the publication of the R. roxellana (golden snub-nosed monkey) genome paper.

 

 

May issue cover: What’s going on here?

This month’s cover image is inspired by the Article on p. 528 of this issue, by Jeff Wall, Nicola Illing, Nadav Ahituv and colleagues. The paper reports the genome of the bat Miniopterus natalensis and transcriptional dynamics in the developing bat wing. This species, one of a group known as vesper bats, is also known as the Natal long-fingered bat and is found in parts of Africa.

The image chosen for the cover is a frontal view of a bat embryo at a late stage of development (stage CS21) taken by study co-author Mandy Mason. This developmental stage is known as “Translucent Wing”, as you can clearly see the skeletal structures in the wing and the membrane between the outstretched digits. The embryo in this image was stained with Alizarin red (maroon-red-pink) for bone and Alcian blue (blue-cyan) for cartilage. The image was actually taken as part of an earlier study to understand the progression of limb development in this species and to compare it with that of the mouse.

The current study presents not only the genome sequence of the Natal long-fingered bat, but also RNA-seq and ChIP-seq (for H3K27ac and H3K27me3) profiling of the developing limbs. The authors identified more than 7,000 genes that were differentially expressed between the forelimbs—the eventual wings—and the hindlimbs. Through comparative genomics analyses, they found nearly 3,000 regions showing evidence of accelerated evolution along the bat lineage that overlapped with H3K27ac peaks, suggesting that these are candidate enhancer regions for wing development. “This study offers a comprehensive resource for future work in comparative limb development,” co-author Mandy Mason told us. “Aside from the results that we have presented in this paper, these open datasets can be queried to help answer additional questions that may be asked by both our and other research groups.”

Ancient regulatory logic

Yao et al. found that certain brain enhancers were functionally conserved between mice (left) and acorn worm (right), despite very limited sequence conservation. {credit}Douglas Epstein{/credit}

A study published this week in Nature Genetics shows that enhancers can be conserved across very long evolutionary distances, even without extensive sequence conservation.

The many ways MYB drives cancer

Two papers published online this week in Nature Genetics demonstrate that MYB, long known as a cancer gene, has many different strategies for driving tumorigenesis.

A positive feedback loop drives MYB overexpression in ACC{credit}Drier et al. Nat. Genet. 2016{/credit}

Bradley Bernstein, Birgit Knoechel and colleagues studied the role of MYB translocations in adenoid cystic carcinoma (ACC) and found that MYB translocations can reposition the gene to be driven by super-enhancers—which themselves are bound by MYB to drive its own expression even higher. In an interesting twist, they also found that MYB drives different regulatory programs in different ACC cell lineages: MYB’s oncogenic function is mediated by TP63 in myoepithelial cells, while in luminal epithelial cells, MYB appears to act through the Notch signaling pathway.

In an independent study focused on pediatric angiocentric gliomas, Keith Ligon, Rameen Beroukhim, Adam Resnick and colleagues found that MYB translocations resulting in MYB-QKI fusion genes are the most common MYB alteration in this cancer type. The fusion results in higher expression of MYB and loss of QKI expression, both of which contribute to the development of these gliomas. As in the ACC study, this translocation resulted in repositioning of MYB near enhancers that help drive its expression up. At the same time, the translocation caused loss of some regulatory elements, also leading to aberrant expression of MYB, and loss of function of QKI, a tumor suppressor. Thus, MYB-QKI uses three different mechanisms to drive gliomagenesis.

MYB-QKI promotes tumorigenesis through 3 mechanisms{credit}Bandopadhayay et al. Nat. Genet. 2016{/credit}

Angiocentric glioma. Angiocentric gliomas are characterized by cells that typically grow around blood vessels. {credit}Shakti Ramkissoon {/credit}

Both cancer types are relatively rare but aggressive, and new treatment options are sorely needed. Adenoid cystic carcinoma (ACC) occurs in secretory glands, mainly the salivary glands in the head and neck, and can spread to the nerves as well as metastasizing to distant sites, such as the lungs. The tumors are often resistant to therapy and can recur many years after the primary tumor has been removed surgically. Angiocentric gliomas are very rare brain tumors that generally affect children and young adults. Very little is known about the genetic changes that occur in this tumor type and, prior to this study, there were no known recurrent driver mutations, which are often good candidates for new targeted drug therapies. “The discovery of a recurrent rearrangement in angiocentric glioma provides a clinically relevant diagnostic marker, and insights into the biology that drives these tumors,” said Pratiti Bandopadhayay, one of the lead authors of the study.

We asked some of the authors from both studies to tell us a little more about the work and why it is important. Yotam Drier and Birgit Knoechel talked to us about the study in ACC. Pratiti Bandopadhayay, Lori Ramkissoon, Guillaume Bergthold and Payal Jain talked to us about the study in angiocentric gliomas.

How do your findings clarify earlier results showing a role for MYB in ACC? Do you think these findings are relevant for other cancer types?

Yotam Drier and Birgit Knoechel (Broad Institute):

Our work identified a unifying mechanism for MYB over-expression in ACC. Persson et al. suggested in 2009 that MYB over-expression occurs where the MYB 3′ untranslated region (UTR) is lost. However, in most cases of ACC the MYB 3′ UTR remains intact, and we now describe that in all cases of detected MYB rearrangements in this cancer, independent of whether the 3′ UTR is retained or lost, MYB is being driven by hijacking MYB-bound super-enhancers, thus creating a positive feedback loop. This is complementary to the previous model, and we believe that in those cases where the MYB 3′ UTR is lost, both mechanisms would contribute to increased MYB expression.

We believe that similar rearrangements involving enhancer translocations may contribute to MYB overexpression in other cancer types. For example, our colleagues at Dana-Farber simultaneously report a similar mechanism of MYB activation in angiocentric gliomas.

How do the mechanisms described in your paper compare to what is described in the related paper by Drier et al.?

Pratiti Bandopadhayay, Lori Ramkissoon and Guillaume Bergthold (Dana-Farber Cancer Institute) and Payal Jain (Children’s Hospital of Philadelphia):

We were excited to learn about the findings from the Bernstein group, as their findings complement ours in a completely different tumor type. We found that angiocentric gliomas harbor rearrangements involving the MYB and QKI genes, while Dr. Bernstein’s team focused on adenoid cystic carcinomas, which frequently have similar MYB rearrangements. Both papers show that MYB rearrangements result in aberrant activation of the MYB promoter to drive expression of the oncogenic fusion proteins, and that these fusion proteins then participate in auto-regulatory feedback loops to drive their own expression.

From your perspective, what was the most unexpected finding in this study?

Yotam Drier and Birgit Knoechel:

We were surprised by our finding that MYB orchestrates 2 opposing epigenetic states—a TP63-dependent program in myoepithelial cells and a NOTCH-dependent program in luminal cells. Thus, overexpression of a single transcription factor can drive distinct epigenetic states that depend on the cellular context in which the overexpression occurs.

Pratiti Bandopadhayay, Lori Ramkissoon, Guillaume Bergthold and Payal Jain:

The unexpected result of our study that we find very exciting is that this one single driver rearrangement contributes to tumor growth through multiple mechanisms. The MYB-QKI rearrangement simultaneously drives expression of a fusion protein that causes cells to grow faster and form tumors, changes the regulatory landscape of the gene to promote expression of this protein and disrupts a tumor suppressor gene (QKI), which in turn also makes the cells divide faster. We feel that this finding is likely relevant to a number of other pediatric and adult cancers.

How does the fusion with QKI impact the function of the translocated MYB and do you think it is necessary for its role in driving gliomagenesis?

Pratiti Bandopadhayay, Lori Ramkissoon, Guillaume Bergthold and Payal Jain:

The rearrangement with QKI results in displacement of regulatory elements on QKI towards MYB and these elements help drive expression of MYB-QKI. In addition, it disrupts the function of QKI itself, which is a tumor suppressor gene.  We feel that the association with QKI is important in angiocentric glioma since the rearrangement between MYB and QKI occurred with such high frequency in our study.

What are the additional steps needed before your findings can be implemented in the clinic?

Yotam Drier and Birgit Knoechel:

Interestingly, while BET inhibition can slow tumor growth in low-grade ACCs, high-grade ACCs often show genetic activation of NOTCH and are thus amenable to treatment with gamma secretase inhibitors or other NOTCH-targeting therapies. It will be important to evaluate whether combining BET inhibition with NOTCH inhibition may show additional effects over BET inhibition alone. It is conceivable that by adding the NOTCH inhibitor one might preferentially target the luminal epithelial cells, which are characterized by a NOTCH-driven regulatory program. This will need to be tested further in preclinical models. Moreover, the fact that grade 3 tumors failed to respond to BET inhibition requires further preclinical analyses. Identifying mechanisms of failure of BET inhibitors, which are just entering clinical trials, will be of utmost importance in order to predict which patients may benefit from them.

Pratiti Bandopadhayay, Lori Ramkissoon, Guillaume Bergthold and Payal Jain:

We are excited that our results provide us with novel possibilities to treat angiocentric gliomas. As MYB is a transcription factor, targeting it or the MYB-QKI fusion is challenging; however, we identified several downstream targets that represent potential therapeutic strategies. In addition, the finding of altered regulatory elements represents another exciting therapeutic strategy. Our findings directly impact clinical care for children with angiocentric glioma through development of two diagnostic tests that will be used to support the diagnosis of angiocentric glioma. We also feel our findings are likely relevant to other pediatric and adult cancers that are driven by rearrangements.

Finally we would like to highlight that multiple institutions and funding sources helped facilitate this study. We would also like to acknowledge the families whose children have been afflicted with Pediatric Low-Grade Glioma.

What makes a parasite?

Strongyloides worm

Genetic clues to what makes parasitic worms different from free-living worms are reported in a paper published online this week in Nature Genetics. Groups led by Mark Viney, Matthew Berriman and Taisei Kikuchi carried out the sequencing and assembly of genomes from six nematode species from the clade that includes the human parasitic roundworm Strongyloides stercoralis. We asked one of the authors, Professor Mark Viney of the University of Bristol, to tell us a little bit about the study.

Although the genomes of several parasitic worm species have been published to date, Strongyloides represents a unique opportunity to learn some of the general rules of being a parasitic worm. According to Mark Viney, “what makes Strongyloides so special is that this clade contains parasites, facultative parasites and free-living species that are all close relatives. This gives us real power to our analysis.  Our work will be used by the international research community who work on these globally important parasites of people and other animals.”

S. stercoralis infects approximately 30-100 million people worldwide and causes a wide range of symptoms. Closely related species in the clade Strongyloides include both free-living and parasitic species that infect a wide range of hosts. In parasitic species, generations alternate between parasitic and free-living, resulting in genetically identical females with starkly different lifestyles.

The authors first compared the genomes of free-living and parasitic species to identify genes specific to the parasites. They found that acquisition of 1,075 gene families was associated with the evolution of parasitism and parasitism was associated with greater expansion of genes and gene families overall.

When asked what the most unexpected aspect of the study was, Professor Viney said, “I think the really surprising thing that we found was just how largely expanded some gene families were in the parasitic species. This is quite unprecedented in the nematodes.” The authors also found that most parasitism-related genes were located in genomic clusters. “The important thing about these clusters is that nothing like this has ever been seen before in parasitic worms and it certainly speaks to the possible importance of these in their evolution of parasitism,” said Professor Viney.

The life cycle of the 6 sequenced species and the gene gains and losses in each lineage. {credit}Hunt et al. Nat. Genet. 2016{/credit}

Two gene families were especially expanded in parasitic genomes—those encoding SCP/TAPS and astacin-domain proteins—and based on RNA-sequencing studies, these were also much more highly expressed in parasitic females than free-living females of the same species. This suggests that these gene families in particular are important for the ability of the worm to infect its host. In support of this hypothesis, the authors found that proteins from these two families are secreted by the worms, and would therefore be able to interact with host tissues to aid in invasion and migration.

Asked about the next steps that need to be taken for these findings, Mark Viney said, “For these SCP/TAPS coding genes what we really need to do is to find out what these genes are doing—this is completely unknown at the moment. For the astacins we can probably guess what they do—being involved in digesting host tissue so that the parasites can feed. They might be potential drug targets.”

The study brought together groups from the UK, Japan, Taiwan, Germany, USA, Mexico and Australia and is one of many examples of successful collaboration in science. “The field of parasitology is a very friendly and interactive community,” said Professor Viney, “so this collaboration was very easy to bring together, and worked extremely well—and will do in the future as well.”

To learn more about this study, check out this blog post from one of the co-first authors, Adam Reid, at the Wellcome Trust Sanger Institute. More coverage can also be found at the University of Bristol website.

Reference:

Hunt, V.L., Tsai, I.J., Coghlan, A., Reid, A.J., et al. The genomic basis of parasitism in the Strongyloides clade of nematodes. Nat. Genet. (doi: 10.1038/ng.3495, 1 February 2016)

The paper is available for free online: https://www.nature.com/ng/journal/vaop/ncurrent/full/ng.3495.html

Biting into the pineapple genome

“Pineapple and cross section”. Taken by fir0002 | flagstaffotos.com.au, Canon 20D + Sigma 150mm f/2.8 – Own work. Licensed under GFDL 1.2 via Commons: https://commons.wikimedia.org/wiki/File:Pineapple_and_cross_section.jpg

The genome sequences of cultivated pineapple (Ananas comosus) and a related wild species (Ananas bracteatus) were published last week by Ming et al. in Nature Genetics. The genome has already led to insights into monocot evolution and CAM photosynthesis. In the future, studies that use the pineapple genome have the potential to lead to innovations in engineering drought resistant crops.

Every species, plant, animal or microorganism, that is sequenced is a useful resource for the research community. But each time a new genome is sequenced, we ask “what is really new about this one” and “what are we learning about biology”?  Pineapple is of course a delicious and economically important crop, but what makes its genome special?

Several aspects of pineapple biology make its genome worth sequencing. First, pineapple uses a metabolic strategy known as crassulacean acid metabolism (CAM). CAM allows the plant to conserve water, making it more resistant to drought. Only one other CAM plant has had its genome sequenced, the orchid Phalaenopsis equestris.

{credit}Zhong-Jian Liu, National Orchid Conservation Center of China {/credit}

Another reason to study the pineapple’s genome is to understand how self-incompatibility has evolved in monocotyledon plants. Wild pineapple species are self-compatible, but cultivated pineapples are not. As a result, cultivated pineapple is highly heterozygous. This aspect of pineapple biology also makes sequencing its genome technically challenging. Fortunately, the authors of the study devised a way around this potential problem to generate an extremely high-quality genome assembly (see the image on the right, courtesy of Zhong-Jian Liu, who was not affiliated with the study).

One of the most interesting aspects of the pineapple genome was only discovered after the genome was assembled. As the study’s authors found, pineapple has conserved the order of genes on its chromosomes more so than any other monocot studied to date. This high degree of synteny with the hypothetical ancestral monocot makes pineapple an ideal outgroup for comparative evolutionary studies involving other monocot species, such as grasses.

We spoke to the lead author of the study, Ray Ming, to learn a little more about how the study was conducted.

The genomes of many plants have been sequenced, or are in the process of being sequenced. Why did you decide to focus on pineapple?

I started my career at the Hawaii Agriculture Research Center and have been working on genomics of Hawaiian crops, including papaya, pineapple, sugarcane, and coffee.  We sequenced the papaya genome first.  It is a logical choice to sequence the smallest genome of the remaining three next. In addition, pineapple is the most economically important CAM plant crop, the second most important tropical fruit, is self-incompatible, and prone to somatic mutations.

How did you arrive at the idea of using hybrids (the F153 x CB5 F1 cross) to overcome issues of high heterozygosity in the assembly process? Was this the initial plan, or were there other ideas as well?

We anticipated the difficulty of assembling the heterozygous pineapple genome.  Before we started the genome project, I discussed this issue with co-author John Bowers during the International Plant and Animal Genome Conference in San Diego, and John was the one who came up with the idea to sequence an F1 individual at deep coverage to have a single molecule from each parent for phasing to improve the assembly of the reference genome F153. Co-author Michael Schatz implemented this strategy, and also designed sophisticated approaches to improve the assembly of this heterozygous genome as detailed in the method section. Mike’s team did an outstanding job to produce a high quality assembly of this highly heterozygous genome. Mike is a pioneer and a leading scientist in assembling complicated and complex plant genomes.

We also tried to sequence the genome from a single sperm cell to generate haploid genome sequences, but it wasn’t successful. The long reads from Moleculo and PacBio improved the genome assembly, and the ultra-high density map of re-sequencing F1 individual genomes substantially improved the quality of the genome assembly and corrected 199 chimeric scaffolds.

Did you expect to see such high levels of conservation of synteny with ancestral monocots in the pineapple?

No. It was a surprise, but it makes sense since pineapple is self-incompatible and vegetatively propagated, hence having fewer generations of sexual reproduction in its evolutionary history.

How do you envision others using the pineapple genome sequence in their research?

The pineapple genome will be used for CAM photosynthesis research as a model system, and it will be used as a reference genome or even the reference genome for comparative genomics in monocots.

Bonus question: What is your favorite fruit?

Pineapple for its extraordinary flavor and aroma, and papaya for its number 1 nutritional value among fruits, and for its flavor.