TCGA | Free Association

Sharp and broad H3K4me3 peak definitions {credit}Chen et al. Nat. Genet. 2015{/credit}

Modifications to histones, including methylation and acetylation, are used by cells to regulate gene expression. Though a lot is now known about how different histone marks correlate with transcriptional activation or repression, the “histone code” has not yet been fully elucidated. As we discussed last week, a recent study found that, contrary to expectation, genes that are dynamically regulated during development do not display histone modifications normally associated with active transcription.

A new study published this week in Nature Genetics reports another unexpected epigenetic pattern. Tri-methylation of histone 3 at lysine 4 (H3K4me3), a mark associated with active transcription, is usually present as a sharp, narrow peak at the gene promoter. The authors of the study observed that some genes show a different pattern of H3K4me3: broad, low-density methylation spanning up to 10 kb along the gene body.

The broad H3K4me3 mark was associated with high gene expression levels and transcriptional stability in this study. The authors also found that cell identity genes and, interestingly, tumor suppressor genes, were enriched for the broad H3K4me3 mark.

H3K4me3 density at housekeeping genes and tumor suppressor genes. Right panel is a zoomed-in version of the left panel. {credit}Chen et al. Nat. Genet. 2015{/credit}

Though it is unclear why tumor suppressors specifically would be associated with this mark, a comparison between normal and tumor cells showed that H3K4me3 peaks at tumor suppressor genes became narrower in cancer cells and that this was associated with transcriptional repression. Finally, the authors showed that candidate tumor suppressor genes could be identified by the broad H3K4me3 mark.

We asked one of the study’s lead authors, Wei Li, to tell us a little more about the study:

What was the motivation for your studies?

The general motivation was to make novel discoveries based on existing ‘big data’ in epigenomics. In order to do so, we have had to develop novel bioinformatic tools that enable us to look at the data from a completely different angle. In this study in particular, we developed a new tool to quantify the H3K4me3 signal based on its width only. Most previous studies have focused only on its height or total signal, because the majority of genes (>95%) have only narrow (<1 kb) and high H3K4me3 peaks. This simple method has never been used in epigenomic data analysis before. We further proved that this computer-derived broad H3K4me3 signal alone is sufficient to define both known and novel tumor suppressors, and its performance is even better than that of the human-curated KEGG pathway in cancer (a collection of well-curated signaling networks involved in cancer development).
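To make the idea of scoring a peak by its width concrete, here is a minimal sketch. It is not the authors' tool: the input format (gene, start, end tuples), the function names and the 4 kb cutoff are all assumptions chosen for illustration, and the gene names are placeholders.

```python
from collections import defaultdict

# Hypothetical cutoff for illustration only; the study's actual threshold
# for calling a peak "broad" is not stated in this post.
BROAD_MIN_WIDTH_BP = 4000


def widest_peak_per_gene(peaks):
    """Return {gene: width in bp of the widest peak assigned to that gene}.

    `peaks` is an iterable of (gene, start, end) tuples with half-open
    coordinates in base pairs (an assumed, BED-like input format).
    """
    widths = defaultdict(int)
    for gene, start, end in peaks:
        if end <= start:
            raise ValueError(f"invalid peak for {gene}: {start}-{end}")
        widths[gene] = max(widths[gene], end - start)
    return dict(widths)


def classify_by_width(widths, broad_min_width=BROAD_MIN_WIDTH_BP):
    """Label each gene 'broad' or 'narrow' using peak width alone."""
    return {
        gene: "broad" if width >= broad_min_width else "narrow"
        for gene, width in widths.items()
    }


# Toy peaks with made-up coordinates.
toy_peaks = [
    ("GENE_A", 1000, 1800),    # 800 bp, a typical sharp promoter peak
    ("GENE_B", 5000, 12500),   # 7,500 bp, extends into the gene body
    ("GENE_B", 20000, 20600),  # a second, narrow peak on the same gene
]
widths = widest_peak_per_gene(toy_peaks)
labels = classify_by_width(widths)
```

Note that peak height and total signal never enter the calculation, which is the point of the approach described above: two peaks with the same total signal can have very different widths.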

When you first observed broad H3K4me3 peaks, did you expect that it would be such a widespread feature of tumor suppressor genes?

No, it was totally unexpected. Many people in the field (including ourselves) observed broad H3K4me3 peaks a long time ago (even in the first histone mark ChIP-seq paper published in 2007), but all ignored them and treated them as potential sequencing artifacts. My lab used the UCSC Genome Browser to check epigenetic patterns gene by gene on a daily basis, and we gradually noticed that broad H3K4me3 peaks are consistently observed in different datasets and are specific to a small group of genes. To test whether it is an artifact or not, we decided to perform a functional enrichment analysis of genes marked with broad H3K4me3. If nothing is enriched, it must be a sequencing artifact. Interestingly, we found an unexpectedly strong enrichment in tumor suppressor genes.
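For readers unfamiliar with enrichment analysis, one common way to ask whether a gene category is over-represented in a gene list is a one-sided hypergeometric test. The sketch below shows that generic test; the post does not say which statistic the authors used, so treat the choice of test, the function name and the numbers in the example as assumptions.

```python
from math import comb


def hypergeom_enrichment_p(total_genes, category_genes, marked_genes, overlap):
    """One-sided hypergeometric p-value, P(X >= overlap).

    total_genes    -- number of genes in the background
    category_genes -- background genes that belong to the category
                      (for example, known tumor suppressors)
    marked_genes   -- genes carrying the mark of interest
    overlap        -- marked genes that also belong to the category
    """
    upper = min(category_genes, marked_genes)
    denominator = comb(total_genes, marked_genes)
    numerator = sum(
        comb(category_genes, k)
        * comb(total_genes - category_genes, marked_genes - k)
        for k in range(overlap, upper + 1)
    )
    return numerator / denominator


# Made-up numbers, for illustration only: 20,000 background genes, 500 in the
# category, 1,000 marked genes, 60 of which fall in the category.
p_value = hypergeom_enrichment_p(20000, 500, 1000, 60)
```

With these toy numbers about 25 category genes would be expected among the marked genes by chance, so an overlap of 60 gives a small p-value. A real analysis would also correct for testing many categories at once.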

Did you consider whether any other classes of genes were enriched in this histone mark?

We used an unbiased, data-driven approach (rather than a hypothesis-driven one) to study the genes marked with broad H3K4me3 peaks. It turns out that only cell identity genes and tumor suppressors are enriched. When we removed cell-type-specific broad H3K4me3 peaks by epigenomic conservation analysis, tumor suppressors were the only class of genes enriched in the conserved broad H3K4me3.
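The post does not describe how the conservation analysis was done. As a rough sketch of the general idea, one could keep only genes that carry a broad peak in most of the cell types surveyed. The 80% default, the function name and the input layout below are assumptions, not the authors' procedure.

```python
def conserved_broad_genes(broad_by_cell_type, min_fraction=0.8):
    """Return genes called 'broad' in at least `min_fraction` of cell types.

    `broad_by_cell_type` maps a cell-type name to the collection of genes
    with a broad peak in that cell type (an assumed input layout).
    """
    n_cell_types = len(broad_by_cell_type)
    if n_cell_types == 0:
        return set()

    counts = {}
    for genes in broad_by_cell_type.values():
        for gene in set(genes):  # count each gene once per cell type
            counts[gene] = counts.get(gene, 0) + 1

    return {
        gene
        for gene, count in counts.items()
        if count / n_cell_types >= min_fraction
    }


# Toy input with placeholder names.
toy_calls = {
    "cell_type_1": {"GENE_A", "GENE_B"},
    "cell_type_2": {"GENE_A", "GENE_C"},
    "cell_type_3": {"GENE_A", "GENE_B"},
}
conserved = conserved_broad_genes(toy_calls, min_fraction=1.0)
```

Genes that are broad in only one cell type, such as GENE_C here, drop out. That is the behavior one would want for separating cell identity genes from genes marked everywhere.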

Widespread shortening of H3K4me3 peaks in cancer {credit}Chen et al. Nat. Genet. 2015{/credit}

Tumor suppressors are defined by their role in cancer. Why do you think they show a similar pattern of H3K4me3 in normal cells?

A common feature of tumor suppressors is that they are usually highly expressed in normal cells to prevent tumor formation. This is likely why they show a similar pattern of H3K4me3: broad H3K4me3 is associated with increased transcription elongation and enhancer activity, which together lead to exceptionally high gene expression in normal cells.

Not all tumor suppressors show the broad H3K4me3 mark. Why do you think this is?

Cancer is always heterogeneous. To my knowledge, there is no single mechanism in the literature that can specifically explain all tumor suppressors. Broad H3K4me3 is not an exception.

Nature Genetics is pleased to present today the first installment of our Focus on TCGA Pan-Cancer Analysis.

The Cancer Genome Atlas (TCGA) has analyzed over 8,000 cancer cases across 27 tumor types to date, and aims to have over 100,000 specimens analyzed by the end of 2015. They have commendably made both data and exploration tools publicly available at https://www.cancergenome.nih.gov. They have previously published 8 papers reporting in-depth genomic characterization of individual tumor types.

The TCGA Pan-Cancer initiative, launched in October 2012 at a meeting in Santa Cruz, California, seeks to combine analysis across tumor types in order to identify both similarities and differences in genomic alterations. The work presented in this collection of Pan-Cancer publications includes analysis of the first 12 TCGA tumor types. This includes over 3,000 cancer patients profiled with 6 different platforms to assess genomic, transcriptional, epigenetic and proteomic alterations, combined with clinical data. The authors demonstrate that, while a majority of the tumor samples show unique genomic alterations, combining analysis across tumor types allows them both to increase statistical power for the detection of molecular drivers and to identify common pathways that are altered across tumor types.

The Pan-Cancer initiative provides a model for large-scale collaborative analysis as well as data sharing, bringing together over 250 collaborators from ~30 institutions working together on over 60 projects analyzing the same dataset. These efforts required a strong collaborative framework, a commitment to rapid distribution of data, and means to facilitate shared analysis. Josh Stuart and colleagues provide an overview of this project in an accompanying Commentary.

This work also relied on the development of new bioinformatics tools and platforms, providing a foundation that should prove useful in future large-scale analysis projects. A Commentary by Larsson Omberg and colleagues highlights these approaches and the use of the Synapse software platform to share and evolve data, analysis and results among the Pan-Cancer Working Group. The Synapse platform was developed by Sage Bionetworks to facilitate open and data-driven collaborative research efforts, and is also widely used in DREAM challenges. The use of this platform supported the discovery efforts reported in this collection of Pan-Cancer papers, which also provide a public resource of highly curated and standardized data sets across a series of data freezes, along with automated analysis systems.

In the first of two Analysis papers published today in Nature Genetics, Chris Sander and colleagues provide a hierarchical classification of 3,299 tumors from 12 cancer types from the Pan-Cancer dataset, using a newly developed algorithmic approach. Their analysis separates tumors into those with primarily somatic mutations and those with primarily copy number alterations. They also identify oncogenic signatures that characterize ~30 tumor subclasses, which may suggest therapeutic targets of relevance across tumor types.

In a second Analysis published in Nature Genetics, Rameen Beroukhim and colleagues characterized somatic copy number alterations (SCNAs) in 11 cancer types and 4,934 primary cancer specimens from the Pan-Cancer dataset. They observed whole-genome doubling in 37% of cancers, associated with higher rates of all SCNAs.

We are pleased to support the TCGA Pan-Cancer efforts as a model for large-scale collaborative genomics projects combined with open data sharing, one that demonstrates the ready benefits this approach can bring to our understanding of the molecular drivers of cancer. The TCGA Pan-Cancer project continues to develop, and so will this Focus, so please get primed with this selection of publications and stay tuned. In the meantime, here is a selection of social media and press stories: https://storify.com/obahcall/nature-genetics-pan-cancer-focus.


Free Association

a blog from Nature Genetics


Methylation marks tumor suppressors