Sea lamprey genomics

Jeramiah Smith

The sea lamprey (Petromyzon marinus) is an important model in evolutionary biology. It was discovered in 2009 (https://www.pnas.org/content/106/27/11212.long) that the genome of the sea lamprey undergoes extensive programmed genome rearrangement during development, where ~0.5 Gb (around 20%) of DNA is eliminated from the genome. The somatic tissues contain smaller genomes and only the germ cells retain the full complement of genetic material. The genome of the sea lamprey had been sequenced previously from the blood and liver, so only the somatic genome has been thoroughly characterized (https://www.nature.com/articles/ng.2568).

Smith et al., Nature Genetics, 2018

In a paper published this week in Nature Genetics, Jeramiah Smith and colleagues report the germline genome sequence of the sea lamprey. Using a combination of shotgun and long-read sequencing integrated with scaffolding data and a meiotic map, the authors assembled a high-quality genome with near-chromosome-level contiguity. This allowed them to identify hundreds of genes that are systematically eliminated from the genome during development. Comparative analysis showed that mouse homologues of these genes are often marked by repressive complexes, indicating parallel strategies for programmed development.

We spoke with lead author Jeramiah Smith from the University of Kentucky to get some background on this research:

  • What inspired you to sequence the germline of the sea lamprey?

I have worked with lamprey for years. I originally got involved with lamprey because it holds a special place in the vertebrate tree of life that sheds light on the common ancestor of all vertebrates. That was the motivation for the first lamprey genome project, which sequenced DNA from blood and liver cells. Once we started working with lamprey we found out that the genome was much more complex than we ever anticipated. This included the fact that the genome changes its sequence content in a reproducible manner over the course of its normal development: something we call programmed genome rearrangement. The amount of DNA that is eliminated from sea lamprey is more than is present in some entire fish genomes, roughly half a billion bases. For me, this finding was the major inspiration behind sequencing the germline genome.

  • What do you think were the most surprising or interesting findings to come out of the sequencing?

There were quite a few, but the strong overlap between programmed genome rearrangement and Polycomb-mediated silencing was near the top. The other was the rather strong evidence suggesting that some chromosomes, including chromosomes carrying the HOX genes, appear to have duplicated rather recently and seemingly independently from the rest of the genome. It’s a really strange genome.

  • Can you comment on programmed genetic elimination as a developmental strategy versus Polycomb-mediated silencing? 

Polycomb-mediated silencing arose deep in our evolutionary history, and is even present in unicellular organisms. We know that lamprey possesses homologs of all the human Polycomb genes, but also uses programmed elimination. The difference between programmed elimination and other mechanisms of gene silencing is that programmed elimination is essentially irreversible, given that the DNA is physically removed. This means that the genes can never be expressed after an embryonic cell lineage has undergone elimination. Other silencing mechanisms are generally reversible, meaning that gene expression can be reactivated. In some cases reactivation is important, for example in the context of development and regeneration. But in other cases activation of genes in the wrong tissue can cause diseases, such as cancer. Lamprey seems to know which genes should never be reactivated outside of the germline.

  • What is the most challenging part about working with sea lamprey?

The genome! Aside from undergoing complex changes during development, it also contains a large amount of repetitive DNA and a lot of sequence polymorphism. These features present substantial challenges for assembly and downstream analyses, but we’ve found that they can also be useful tools. We’ve used the abundance of sequence polymorphisms as a tool for mapping genes in lamprey, and we now think that some classes of repeats are going to be critical for our future work aimed at figuring out how eliminated DNA is identified and packaged in the early embryos. Lampreys also only breed once a year and take from 5 to maybe 20 years to mature. This makes some experiments impossible, but lamprey researchers are very creative and the community has figured out how to get a lot done in this system.

  • What organisms would you like to see sequenced in the future to help resolve the evolutionary relationships of vertebrates?

There are so many! Hagfish are going to be critical. They are another deep lineage that provides important perspective on vertebrate evolution and also happen to undergo programmed DNA elimination. There are two other deep lamprey lineages that I think will be important as well. Those species live in the southern hemisphere and diverged from sea lamprey around 300 million years ago, as opposed to the roughly 600 million year divergence between lampreys and other vertebrates. A lot of evolution can happen over 600 million years and these species should help bridge that gap. Salamanders and other amphibians are also going to fill important gaps and teach us a lot about the way vertebrate genomes evolve and function. It seems certain that new sequencing technologies are going to give us better genomes for other important species that have already been sequenced (e.g. amphioxus, sharks and shark relatives, and even sea lamprey). Finally, I think the zebra finch germline genome will be really interesting. They seem to have recently evolved something similar to lamprey’s programmed eliminations, and have a chromosome that’s unique to their germline. I’d really like to know what’s on that chromosome.

A CRISPR screen for HIV targets

A new study published online this week in Nature Genetics reports the discovery of novel host targets of HIV infection identified from a high-throughput CRISPR/Cas9-based screen. This screen was performed in CD4+ T cells and was designed to find candidate genes that are required for successful HIV infection but whose inactivation does not affect cell viability. In this way, potential drug targets for anti-HIV therapy could be discovered.

Park et al., Nature Genetics 2016

The authors found two known (CCR5 and CD4) and three novel (ALCAM, SLC35B2 and TPST2) cellular factors that, upon abrogation, prevented HIV infection but did not have any negative effects on the cell itself. These new genes are involved in sulfation and cell aggregation pathways and represent candidate targets for interventional HIV therapy.
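As a rough illustration of how hits can be read out of a pooled CRISPR screen, the sketch below ranks genes by how strongly their guide RNAs are enriched in a selected cell population relative to the starting pool. It assumes that guides against required host factors become over-represented among cells that resist infection; the scoring scheme, function names, guide labels, and read counts are all hypothetical and are not taken from the paper.

```python
import math
import statistics
from collections import defaultdict

def guide_log2_enrichment(selected, control, pseudocount=1.0):
    """log2 ratio of each guide's read fraction in the selected vs. control pool."""
    guides = sorted(set(selected) | set(control))
    # A pseudocount keeps guides with zero reads from producing log(0).
    sel_total = sum(selected.get(g, 0) + pseudocount for g in guides)
    ctl_total = sum(control.get(g, 0) + pseudocount for g in guides)
    return {
        g: math.log2(
            ((selected.get(g, 0) + pseudocount) / sel_total)
            / ((control.get(g, 0) + pseudocount) / ctl_total)
        )
        for g in guides
    }

def rank_genes(guide_scores, guide_to_gene, min_guides=2):
    """Rank genes by the median enrichment of their guides (most enriched first)."""
    per_gene = defaultdict(list)
    for guide, score in guide_scores.items():
        per_gene[guide_to_gene[guide]].append(score)
    gene_scores = {
        gene: statistics.median(values)
        for gene, values in per_gene.items()
        if len(values) >= min_guides  # require agreement between several guides
    }
    return sorted(gene_scores.items(), key=lambda item: item[1], reverse=True)

# Invented read counts; "CTRL" stands in for a gene with no effect on infection.
control = {"CCR5_g1": 100, "CCR5_g2": 120, "CTRL_g1": 110, "CTRL_g2": 90}
selected = {"CCR5_g1": 400, "CCR5_g2": 380, "CTRL_g1": 20, "CTRL_g2": 30}
guide_to_gene = {"CCR5_g1": "CCR5", "CCR5_g2": "CCR5",
                 "CTRL_g1": "CTRL", "CTRL_g2": "CTRL"}

ranking = rank_genes(guide_log2_enrichment(selected, control), guide_to_gene)
print(ranking)
```

Requiring several concordant guides per gene is one simple way to keep a screen stringent, at the cost of missing genes that are poorly covered by the library.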

We spoke with first author Ryan Park to get some background on this research:

 Previous screens for host factors affecting HIV pathogenesis found a high number of hits, with low reproducibility across screens.  With your CRISPR/Cas9 approach, were you expecting similar results? Did the low number of hits in your screen surprise you?

We designed our screen stringently, as the existing literature has not been clear on what genes would potentially serve as good targets for host-directed anti-HIV therapies. Our goal was thus to identify these host factors with high confidence while maintaining an unbiased approach. The very low number of hits was certainly surprising, though, as you note, the limited overlap among the previous screens raised the suspicion of a high false positive rate and/or low reproducibility.

You find three novel genes that are dispensable for cell viability but that are needed for successful HIV infection.  Do you think that there could be natural polymorphisms in these genes in human populations that might mitigate susceptibility to HIV entry and transmission?

In the Exome Aggregation Consortium (ExAC) dataset recently published in Nature, there are individuals with truncations and/or homozygous missense mutations in each of the three genes, as well as ITGAL (the loss of which we find is protective against HIV infection in primary CD4+ T cells). More work remains to be done to determine whether these individuals are relatively less susceptible to HIV infection.

Due to the high mutation rate of HIV and the emergence of resistance to drug therapies, potential targeting of host factors can be a useful strategy.  Do you envision these findings being utilized to develop novel anti-HIV therapies?

Host-targeted HIV therapies are of great interest for multiple reasons. Firstly, as you note, the emergence of drug-resistant HIV strains remains a major issue, particularly in settings where adherence to a daily antiretroviral regimen is challenging. Drug-resistant strains are less likely to emerge in the face of incomplete adherence to host-targeted therapies. Secondly, the identification of host factors may also serve as a basis for gene therapies (in which gene editing is used to produce a population of HIV-resistant target cells) that could result in a permanent HIV cure. As noted above, more work remains to be done to determine whether inactivation of these genes protects against HIV infection at the organismal level without causing detrimental effects.

How might this screen be adapted to find host factors important at other stages of the HIV life cycle and do you have future plans to explore such work?

Our screen captured all but the latest stages of the HIV life cycle (particularly virion assembly, budding, and maturation); this is because HIV Tat, which drives the GFP reporter in our cell line model, is expressed prior to these steps. Development of an alternative reporter system that is activated by virion budding or maturation would allow identification of host factors involved only at these late stages. Because completion of the HIV life cycle is not required for host cell killing by HIV, cells lacking these late-acting host factors may still not be captured in a screen; more importantly, these late-acting host factors may therefore not be attractive therapeutic targets.

Can this screening method be employed to find host factors important for infection by other viruses?  Do you speculate that there would be viruses for which a large number of non-essential host factors would be identified as important for infection?

The key elements of our approach, which include identification of a physiologically relevant cell line and the use of a high-complexity genome-wide sgRNA library, can be readily generalized to identify host factors that are critical to the propagation of any viral pathogen yet dispensable for cell viability. Our findings suggest that the number of non-essential host factors that are critical for HIV infection is quite limited, and that many candidate host factors identified by other screens or targeted studies may not be required for HIV infection or may compromise cell viability. Whether this is the case for other viruses is hard to know, but we have demonstrated that our approach can be quite powerful and specific in identifying the range of potential host targets with high confidence.

Ubiquitin, keratin and skin fragility

Lin et al. Nature Genetics, 2016

Protein degradation is a highly coordinated process with multiple levels of regulation, including both targeted and autodegradation. This sophisticated cascade of protein turnover must be precisely balanced to maintain proper physiological function. A recent article published in Nature Genetics reports the discovery of a gene in which protein-truncating mutations lead to the skin condition epidermolysis bullosa, which is characterized by a tendency to blister, itching and other abnormalities. The authors found five patients, all with start codon mutations in the KLHL24 gene, which encodes Kelch-like protein 24, a substrate receptor of the cullin 3 (CUL3)–RBX1–KLHL24 ubiquitin ligase complex.

Lin et al., Nature Genetics, 2016

The mutant proteins from these patients were found to be stabilized, with increased levels in patient samples, leading the authors to hypothesize that KLHL24 may target a substrate that is important for the structural integrity of the skin.  Indeed, through mass spectrometry and biochemical analysis, they identify keratin 14 (KRT14) as a KLHL24 substrate, and find that KRT14 levels are decreased in patient samples. Keratin 14 is an intermediate filament component important for maintaining keratinocyte integrity and mutations in the gene are found in some epidermolysis bullosa patients. The authors further show that KLHL24 is autoubiquitinated and that the truncated mutant has reduced levels of autoubiquitination, stabilizing the protein. This increased KLHL24 stability leads to increased KRT14 degradation, resulting in the skin fragility phenotype observed in the patients.

Lin et al., Nature Genetics, 2016

Although dynamic regulation of keratins by the ubiquitin–proteasome system had been proposed, no targeting E3 ligases had been identified. This work established KLHL24 as a keratin-targeting E3 ligase.

We spoke with authors Dr. Xu Tan and Dr. Yong Yang to get some background on their research.

Can you briefly describe how you found the KLHL24 mutations in these different patients?

The first three epidermolysis bullosa (EB) patients were initially screened for the 18 previously known causative genes, but no mutations were found. We then performed whole-exome sequencing and found only one gene carrying variants in all three patients, namely KLHL24. We then acquired samples from two additional patients without mutations in the 18 known causative genes and used Sanger sequencing to show that both of them also carry mutations in KLHL24, confirming that this is a new causative gene of EB.
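Editorial aside: the filtering logic in this answer can be sketched as a set intersection. The snippet below is a minimal illustration, not the authors' pipeline. It assumes each patient's exome has already been reduced to a set of genes carrying candidate variants; the placeholder gene names (GENE_A and so on) and the one-gene "known" list are invented for the example.

```python
def shared_candidate_genes(variant_genes_by_patient, known_genes):
    """Genes with a candidate variant in every patient, excluding already-screened genes."""
    gene_sets = [set(genes) for genes in variant_genes_by_patient.values()]
    if not gene_sets:
        return set()
    shared = set.intersection(*gene_sets)
    return shared - set(known_genes)

# Hypothetical per-patient gene lists after variant filtering.
patients = {
    "patient_1": {"KLHL24", "GENE_A", "GENE_B"},
    "patient_2": {"KLHL24", "GENE_B", "GENE_C"},
    "patient_3": {"KLHL24", "GENE_A", "GENE_D"},
}

# Stand-in for the list of known causative genes that were screened first.
known_causative = {"KRT14"}

candidates = shared_candidate_genes(patients, known_causative)
print(candidates)
```

With only a few patients, an intersection like this can still leave several candidates, which is why confirming the gene in additional, independent patients matters.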

All the patients you studied had start codon mutations leading to truncations in the protein. This must have been intriguing. What were your initial thoughts about this finding?

We were shocked. The first thought was that these must be gain-of-function mutations, unlike all the other EB mutations, which are loss-of-function mutations that can occur all over the place.

You very nicely demonstrate a model whereby Keratin 14 is a ubiquitination substrate of KLHL24, and that the truncated mutant is stabilized, thus leading to greater Keratin 14 degradation and the skin fragility phenotype. Can you walk us through how you teased apart this model? What do you consider the key piece of evidence that supports this model?

We used an unbiased “pull down + mass spectrometry” method to look for proteins that bind the substrate-binding domain of KLHL24. Keratin 14 was the only one we found that specifically binds KLHL24 but not a carefully designed mutant that is predicted structurally to lose the substrate-binding capacity. We immediately verified the binding and also showed that knocking down or overexpressing KLHL24 can increase or decrease Keratin 14 levels, respectively. A key piece of evidence is that transfection of KLHL24 in cell lines can boost Keratin 14 ubiquitination. Afterwards, we obtained two important pieces of in vivo evidence showing the anti-correlation of KLHL24 level and Keratin 14 level (in human skin samples and a knock-in mouse model), nicely confirming that Keratin 14 is a ubiquitination substrate of KLHL24.

You make a knock-in mouse, which recapitulates the decreased Keratin 14 levels similar to what is seen in patients, but not the skin fragility phenotype. Can you comment on why this might be so?

Many differences exist between human and mouse skin. The most obvious is the presence of fur in mouse skin, which might afford better mechanical support of the epidermis than in human skin. In addition, there is actually a small but significant difference between the degrees of Keratin 14 decrease in patients and the mouse model (~70% decrease in patients vs. ~50% decrease in mice). Previously described mouse models with a ~50% decrease in Keratin 14 (the Krt14+/- mouse model) also did not show skin fragility. We don’t yet know the reason for the different degrees of decrease in human and mouse skin, but we are working on finding out.

Do your findings have any potential implications for novel therapies for epidermolysis bullosa?

Absolutely. As I mentioned, these are the first gain-of-function mutations found for EB, which should be easier to target therapeutically than loss-of-function mutations. Inhibiting KLHL24 in the patients we identified with these types of mutations should be able to effectively treat the condition. We are now actively working on finding a specific KLHL24 inhibitor. In addition, because KLHL24 is a negative regulator of Keratin 14, other EB patients with partial loss-of-function mutations of Keratin 14 could also be helped by treatment with a KLHL24 inhibitor. In general, drug development targeting the ubiquitin-proteasome pathway has raised high hopes, but it is not obvious how to target the pathway specifically. Our studies provide a good example showing the importance of autoubiquitination of an E3 ligase, which might suggest previously overlooked strategies to target E3s.

Learning every way to break a gene

From Fig. 1 in Majithia et al. Nature Genetics 2016

Finding the genetic cause of a disease—a mutation or genetic variant—is a lot like looking for a needle in a haystack. Except in the case of exome sequencing, it’s not always clear what a needle even looks like.

When a clinician finds a protein-altering variant in a gene known to cause disease, it could be the cause of the patient’s disease…or it could be nothing. This is the definition of a variant of uncertain significance (VUS). VUS’s often stay unknown unless someone puts in the time and resources to functionally characterize the variant. For obvious reasons, functional characterization of each individual VUS of every gene implicated in human disease is not practical.

One way to determine which variants cause disease and which don’t is to look at the DNA of many healthy individuals. If a healthy person carries a variant, it is unlikely to cause disease. The Exome Aggregation Consortium (ExAC) is the largest such effort—over 60,000 people have contributed their exome sequences to ExAC and thus provided a unique resource for clinicians to de-prioritize specific VUS’s as causes of disease.
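To make the idea of de-prioritizing a variant concrete, here is a minimal sketch of a population-frequency filter. It assumes a variant is summarized by an allele count and a total number of chromosomes sampled, and it uses an arbitrary cutoff of 1 in 10,000; the cutoff and function names are illustrative assumptions, not ExAC recommendations.

```python
def population_frequency(allele_count, allele_number):
    """Fraction of sampled chromosomes that carry the variant."""
    if allele_number <= 0:
        raise ValueError("allele_number must be positive")
    return allele_count / allele_number

def is_deprioritized(allele_count, allele_number, max_frequency=1e-4):
    """True when the variant is too common in the reference population
    to be a plausible cause of a rare disease (cutoff is an assumption)."""
    return population_frequency(allele_count, allele_number) > max_frequency

# Roughly 60,000 exomes correspond to about 120,000 autosomal chromosomes.
chromosomes = 120_000

print(is_deprioritized(30, chromosomes))  # seen 30 times
print(is_deprioritized(1, chromosomes))   # seen once
print(is_deprioritized(0, chromosomes))   # never seen
```

A variant seen once or never is not flagged by this filter, which leaves the rarest variants unresolved.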

But what if your variant is too rare or isn’t in ExAC?

Amit Majithia, David Altshuler and colleagues from the Broad Institute of Harvard and MIT developed a different strategy, published last week in Nature Genetics. The authors chose a gene, PPARG, that can cause Mendelian lipodystrophy when mutated. Some variants in this gene can also increase the risk of developing type 2 diabetes. PPARG also has many VUS’s in the population.

To find out which variants are likely to be pathogenic, the authors constructed a library of all 9,595 possible protein-altering variants of PPARG and tested them in pools using a functional assay in human macrophages. Cell pools that showed a positive result were sequenced and the numbers of each variant were counted, allowing the authors to assign a function score to each variant. These scores were then used, along with known benign and pathogenic variants from the prior literature, to train a machine-learning algorithm that could then classify each variant as pathogenic or not. The classifier found 6 new likely pathogenic variants, which were then validated through additional tests.
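The two computational steps in this paragraph, enumerating every protein-altering variant and turning sequencing counts into a function score, can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: the score is taken to be a log2 ratio of each variant's read fraction in the selected pool versus the input library, and the toy peptide and read counts are invented. The reported total of 9,595 equals 19 × 505, which would be consistent with 19 alternative amino acids at each of 505 positions; that reading is our inference.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

def all_single_substitutions(protein):
    """Yield (position, ref, alt) for every single-residue change; positions are 1-based."""
    for position, ref in enumerate(protein, start=1):
        for alt in AMINO_ACIDS:
            if alt != ref:
                yield (position, ref, alt)

def function_scores(selected_counts, library_counts, pseudocount=0.5):
    """log2 ratio of each variant's read fraction, selected pool vs. input library."""
    selected_total = sum(selected_counts.values())
    library_total = sum(library_counts.values())
    scores = {}
    for variant, library_n in library_counts.items():
        # The pseudocount avoids log(0) for variants absent from the selected pool.
        selected_frac = (selected_counts.get(variant, 0) + pseudocount) / selected_total
        library_frac = (library_n + pseudocount) / library_total
        scores[variant] = math.log2(selected_frac / library_frac)
    return scores

# Toy three-residue peptide (hypothetical, not PPARG): 3 positions x 19 alternatives.
toy_variants = list(all_single_substitutions("MKT"))
print(len(toy_variants))

# Invented read counts for two of those variants.
library = {(1, "M", "A"): 100, (2, "K", "E"): 100}
selected = {(1, "M", "A"): 150, (2, "K", "E"): 50}
scores = function_scores(selected, library)
print(scores)
```

In this toy example the first variant is enriched in the selected pool (positive score) and the second is depleted (negative score).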

Summary of strategy used by Majithia et al.

The strategy used by Majithia et al. could potentially be used by other researchers to study other proteins implicated in disease. Although it is somewhat of a brute-force approach to test every possible variant, the use of a pooled functional assay and computational classifier increases both the efficiency and accuracy of the result. We asked Dr. Majithia to tell us a little more about this study.

Author Q&A:

Why did you choose to focus your study on PPARG?

In the diabetes community PPARG is a very important and storied gene. It has been linked to both common type 2 diabetes and rare familial forms. PPARG is also the target of multiple FDA approved drugs to treat diabetes. So on one hand, PPARG has been studied in humans and in the lab for two decades. On the other hand, we showed in 2014 (Majithia et al. PNAS) that even though PPARG had been sequenced in humans for so many years, we had only scratched the surface of the possible missense mutations that people in the general populations carry. Most of these mutations were benign, some strongly increased diabetes risk, and it took laboratory experiments with each and every mutation to sort them out. So PPARG was the perfect test case for our prospective experimental approach to test all possible missense mutations: it is relevant to common and rare genetic disease, has mutations of known function we could use for validation of our method, and has many unknown mutations, i.e. variants of uncertain significance that need to be functionally characterized.

Do you think that a similar strategy could be employed for other genes with many variants of unknown significance in the population? What would be the major challenges for applying this strategy to other genes?

Absolutely. A major purpose of this study was to demonstrate a proof-of-concept that other investigators and clinicians could use for VUS in other genes and diseases. In principle there are three challenges in applying a prospective experimental approach to generate a “lookup table” for missense VUS: 1) making every possible missense mutation; 2) building an assay with the scale and throughput to study every missense mutation; and 3) connecting the lab experiments to what happens in people (i.e. phenotypes).

  1. The mutation synthesis technique we use, which was pioneered by Tarjei Mikkelsen, applies to any gene. Other groups like Jay Shendure’s in Seattle have independently developed methods to mutate genes at scale and now companies like TWIST Biosciences offer high quality mutation libraries for any gene. This is no longer a barrier to entry.
  2. Genes have myriad functions and so building an appropriately scaled, high throughput readout is a gene by gene process. No single assay can test the function of every gene, but there are partially generalizable strategies that can be used to study certain classes of genes. The strategy we used for PPARG, combining reporter gene expression and FACS, could be deployed for any gene that activates transcription. In fact we are taking this approach with another diabetes relevant transcription factor, HNF1A.
  3. Establishing clinical relevance of saturation mutagenesis data is a critical step. In our study we set a criterion for our assay, that in order for us to be able to discriminate variants of “unknown” significance, we should be able to accurately discriminate variants of “known” significance. For PPARG we benefitted from decades of research that had resulted in a series of missense variants with known function and disease effect. Many genes do not have such “allelic series” but with the increasingly widespread use of exome/genome sequencing our knowledge of allelic series for genes is rapidly growing.
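Editorial aside: one simple way to check the criterion in point 3, that an assay separates variants of known significance, is the area under the ROC curve (AUC) computed on the known benign and known pathogenic variants. The sketch below is an assumed way of doing that check, not the authors' procedure, and the scores are invented. Here the AUC is the probability that a randomly chosen benign variant has a higher function score than a randomly chosen pathogenic one.

```python
def auc(benign_scores, pathogenic_scores):
    """Probability that a benign variant outscores a pathogenic one (ties count as 0.5)."""
    if not benign_scores or not pathogenic_scores:
        raise ValueError("both groups need at least one score")
    wins = 0.0
    for benign in benign_scores:
        for pathogenic in pathogenic_scores:
            if benign > pathogenic:
                wins += 1.0
            elif benign == pathogenic:
                wins += 0.5
    return wins / (len(benign_scores) * len(pathogenic_scores))

# Invented function scores for variants of known significance.
known_benign = [0.9, 0.4, 0.7]
known_pathogenic = [-1.2, -0.3, 0.5]

separation = auc(known_benign, known_pathogenic)
print(round(separation, 3))
```

An AUC near 1.0 means the assay cleanly separates the two known classes, while a value near 0.5 means it does no better than chance. Only in the first case would scores for variants of unknown significance be worth interpreting.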

From your perspective, what was the most surprising aspect of the study?

To independently prove the findings from our “lookup table” our collaborators in Cambridge took a series of VUS from patients referred to their clinic and tested them in single variant assays that are the standard for PPARG functional testing. In one case, our “lookup table” had a major discrepancy from the single variant assay. Careful follow-up work by our Cambridge colleagues resolved that the single variant assay that has been used for decades was actually incorrect and the “lookup table” made the correct call. We were surprised that, in this case, our new, high throughput assay had higher fidelity with human biology and it introduces a degree of nuance into what we consider “gold standard” when discrepancies inevitably arise.

How do you see your findings impacting on future research?

We hope this study will have its largest impact as a proof-of-concept for other genetic diseases and VUS interpretation. Our method is capable of improvement. For example, it does not assess missense effects on gene splicing, but others in the community are developing complementary methods that will overcome this (for example, see this paper from Gregory Findlay et al. published in Nature). We are particularly excited that our lookup table approach opens the door to systematically characterizing drug-by-genotype interactions that could be useful to guide treatment.