Data reproducibility and transparency mean different things to different people, but one aspect involves allowing scientists to view and manipulate the data or code underlying published figures, both to double-check others’ work and to repeat those analyses using custom data. Over the past year, for instance, the open-access journal F1000Research has implemented integrations with Code Ocean and Plotly for viewing and manipulating programming code and figures, respectively. Now, a new publication showcases interactive figures for 3D genome analysis, too.
In a paper published in the 5 October issue of Cell, graduate student Suhas Rao and colleagues in the laboratory of Erez Lieberman Aiden at the Baylor College of Medicine in Houston, Texas, report a high-resolution, time-resolved analysis of genome structure following the loss of cohesin, a protein complex implicated in genome folding.
The team used a modified version of the protein RAD21, which is part of the cohesin complex, to degrade and then replenish cohesin activity in cultured human cells. By taking snapshots of the chromatin structure over time as cohesin was removed and restored, the team demonstrated that cohesin is required for the formation and maintenance of loop domains in eukaryotic chromosomes, a finding that is consistent with the ‘loop extrusion’ hypothesis of chromatin structure.
The technique at the study’s heart is Hi-C, a method for mapping chromatin contacts in 3D space across the entire genome, which I wrote about in a recent Toolbox article. As I explained then:
The technology identifies sequences that are far apart in the linear DNA sequence but close neighbours in 3D space. “You look at a pair of positions in the genome, and it tells you [how] often they bump into one another,” Aiden explains. Typically, those data are rendered as heat maps, with colour intensity reflecting the interaction frequency between two points.
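To make the heat-map idea concrete, here is a minimal Python sketch of how pairwise contacts could be binned into a symmetric count matrix, the kind of grid a heat map then colours by intensity. It is an illustration only, not the Aiden lab's pipeline or Juicebox's code: the function name `contact_matrix`, the bin size and the toy coordinates are all assumptions made up for this example.

```python
def contact_matrix(contacts, bin_size, n_bins):
    """Bin pairwise contacts (pos_a, pos_b) into a symmetric count matrix.

    Hypothetical helper for illustration; real Hi-C pipelines also filter,
    normalize and store the data far more compactly.
    """
    matrix = [[0] * n_bins for _ in range(n_bins)]
    for pos_a, pos_b in contacts:
        i, j = pos_a // bin_size, pos_b // bin_size
        matrix[i][j] += 1
        if i != j:
            # A contact between bins i and j is also a contact between j and i.
            matrix[j][i] += 1
    return matrix


# Invented toy data: four contacts on a 1,000-base stretch, 250-base bins.
toy_contacts = [(120, 880), (150, 910), (300, 340), (870, 130)]
toy_matrix = contact_matrix(toy_contacts, bin_size=250, n_bins=4)

for row in toy_matrix:
    print(row)
```

In this toy case, the first and last bins are far apart in the linear sequence yet share the most contacts, which is the sort of off-diagonal signal a Hi-C heat map is meant to reveal.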
To explore those maps, Neva Durand in Aiden’s lab, working with Jim Robinson of the Integrative Genomics Viewer (IGV) project, developed a desktop application called Juicebox; more recently, they worked with IGV team member Douglass Turner to create a browser-based version, Juicebox.js (see also the bioRxiv preprint, posted 19 October). In the recent Cell paper, the lab used that web-based version to create and share interactive versions of its otherwise static, published figures. (See Fig. 2, Fig. 3, Fig. 4, Fig. 5, Fig. 6 and Fig. 7.)
As Rao explains, Hi-C datasets are so data-rich — involving potentially billions of contacts — that it’s impossible to convey every nuance in a single snapshot. So, the team worked with Cell to provide interactive versions that readers could explore on their own.
“[That] no reader will take our word for anything but rather will be able to make sure that we didn’t cut any corners and look at our figures and look at other regions in the genome beyond what we show in our figures, I think was a big hope for us,” Rao says. “That this would push the field forward in the sense of data transparency and data reproducibility.”
In fact, Rao notes, that strategy proved its worth during the peer-review process, when a reviewer expressed concerns that the 2D maps were so clean they may have been manipulated. “That question is just a moot point when you have an interactive figure, because if you think we’re manipulating the color scale, well, go to the interactive figure, change the color scale to whatever you want and see what happens.”
Juicebox is just one of a collection of tools for exploring 2D chromatin contact matrices. But there’s more than one way to fold a genome, to mangle the old saw. Though Hi-C typically identifies interactions between two chromosomal regions, it also can pick up contacts between three, four, or even more loci, representing higher-order chromatin structures.
“This is actually an interesting data-visualization problem,” Rao says, “because now we’re not dealing with a 2D matrix anymore. Now we’re looking at, in the case of three pieces of DNA ligated together, a cube. We have a 3D tensor.”
To visualize one such complex, team member Ziyi Ye developed a new interactive tool, in which each voxel in a manipulable 3D cube represents the averaged contacts between three distinct loci. (You can view the figure here.)
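As a rough illustration of the data structure Rao describes, the sketch below bins three-way contacts into a cube in which each voxel counts how often three bins were found together. It is not Ye's tool: the function name `contact_cube` and the toy coordinates are hypothetical, and the sketch stores raw counts rather than the averaged contacts shown in the paper's figure.

```python
from itertools import permutations


def contact_cube(triples, bin_size, n_bins):
    """Bin three-way contacts (pos_a, pos_b, pos_c) into an n x n x n cube.

    Hypothetical helper for illustration only. Each voxel holds a raw count;
    no averaging or normalization is attempted here.
    """
    cube = [[[0] * n_bins for _ in range(n_bins)] for _ in range(n_bins)]
    for triple in triples:
        bins = tuple(pos // bin_size for pos in triple)
        # The order of the three loci carries no meaning, so fill every
        # distinct ordering to keep the cube symmetric.
        for i, j, k in set(permutations(bins)):
            cube[i][j][k] += 1
    return cube


# Invented toy data: three triplet contacts, 250-base bins.
toy_triples = [(100, 600, 900), (120, 610, 950), (100, 150, 900)]
toy_cube = contact_cube(toy_triples, bin_size=250, n_bins=4)

print(toy_cube[0][2][3])  # voxel for bins 0, 2 and 3
```

Even this toy cube has 64 voxels for just four bins, which hints at why a static 2D figure struggles to show such data.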
“You have the ability to take this cube that we’ve shown and rotate it around, look at it from all different angles, and zoom in, zoom out, [and] change the color scale,” Rao says. And, he adds, although that visualization was made specially for this paper, the team is working to integrate it into Juicebox moving forward. “When you start to get into these higher-dimensional views, it becomes crucial to have these more-interactive ways of visualizing [them], because it’s just not possible to properly visualize it in a static, 2D way.”
In the meantime, the chromatin visualization toolbox grows ever richer. On 8 October, Zhihua Zhang at the Beijing Institute of Genomics and colleagues reported on bioRxiv a new tool called Delta. “Delta takes Hi-C or ChIA-PET contact matrix as input and predicts the topology associated domains and chromatin loops in the genome, and generates a physical 3D model which represents the plausible consensus 3D structure of the genome,” the team writes. Their tool is available here.
Jeffrey Perkel is Technology Editor, Nature
24 Oct 2017: Post updated to include link to bioRxiv preprint.