As scientific career goals go, being digitized into a 3D game avatar probably rarely cracks researchers’ top 10. But for Emma Lundberg, it’s an achievement unlocked, says Jeffrey Perkel in his first workplace technology blog post for Naturejobs.
Since March 2016, Lundberg – or rather, her digital doppelgänger, complete with cool black body armour (“It even has freckles!” she says) – has been stalking the virtual world of EVE Online, a sci-fi massively multiplayer online game, introducing players to the subtleties of subcellular protein distribution.
As detailed in Nature Biotechnology in May 2016, Project Discovery is a “mini-game” within the larger sci-fi universe of EVE Online. In it, players – guided by “Professor Lundberg” – are presented with high-resolution photomicrographs of cells, typically stained blue (nucleus), red (cytoskeleton), and green (protein of interest). Their task? To match that green signal with its cellular address – a process called spatial proteomics. To date, some 160,000 gamers have devoted more than 7 million minutes to solving that problem, generating 17.6 million classifications, Lundberg says. “This equals 65 Swedish working years.”
On December 4 2016, at the annual meeting of the American Society for Cell Biology in San Francisco, Lundberg, an associate professor at the KTH Royal Institute of Technology in Stockholm, and her colleagues formally launched the resulting resource.
The Cell Atlas is a free website and database that details the gene expression and subcellular addresses of 12,036 genes and gene products across 22 human cell lines. Also included are organelle-specific proteomes, as well as catalogues of proteins that are cell cycle-dependent or distributed in more than one compartment.
To create the atlas, Lundberg and her 15-member team coupled a custom high-throughput confocal fluorescence microscopy workflow with a library of over 14,000 antibodies to map each protein to any of 30 different subcellular locales. These range from visually obvious structures like the nucleus, cytosol, and plasma membrane, to more esoteric objects, such as the “cytokinetic bridge.”
The atlas’s 82,374 images were collected on a custom oil-immersion Leica SP5 upright confocal microscope and workflow that the team spent years developing. At first, everything was done manually. Today, each protein is imaged in each of three cell lines, which are grown and stained in glass-bottomed 96-well plates and imaged at low resolution to get a sense of protein intensity and distribution. Those pictures are then fed into custom software that works out the optimal settings to capture those data, after which the samples are imaged a second time at high resolution. About 30% of samples still have to be rephotographed manually, Lundberg estimates, to cope with the human proteome’s wide dynamic range. Nevertheless, the team is able to process a thousand samples a week, she says.
The Cell Atlas is one of three subprojects under the aegis of the larger Human Protein Atlas project; there’s also a tissue atlas and a cancer atlas, both of which are also freely available online. “The common denominator between these different atlases is that they’re all doing spatial proteomics,” Lundberg says. “We are mapping where in the human body proteins are localized, but it’s being done at different levels.” The team has built up a vast library of well-validated, custom polyclonal antibodies to drive these efforts, 13,113 of which were used to build the Cell Atlas and all of which are available through the HPA website.
Each Cell Atlas page details the RNA abundance and protein localization patterns for the protein of interest, as well as a description of the gene function and links to external databases (such as Ensembl, UniProt, and Antibodypedia). In a nifty twist, the photomicrographic data are not static: users can turn different channels on and off to highlight, say, a protein’s nuclear distribution (by turning off the nuclear DAPI channel, which would otherwise overlie it).
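For readers curious how channel toggling works in principle, here is a minimal sketch. It is not the Cell Atlas viewer’s code; the function name `toggle_channels`, the image layout, and the pixel values are all made up for illustration. It assumes the colour scheme described earlier (red for cytoskeleton, green for the protein of interest, blue for the nuclear stain) and simply zeroes whichever channels are switched off.

```python
def toggle_channels(image, show_red=True, show_green=True, show_blue=True):
    """Return a copy of an RGB image with the hidden channels set to 0.

    `image` is a list of rows; each row is a list of (r, g, b) tuples.
    This is a hypothetical helper, not part of any Cell Atlas software.
    """
    keep = (show_red, show_green, show_blue)
    return [
        [tuple(value if on else 0 for value, on in zip(pixel, keep)) for pixel in row]
        for row in image
    ]


# Tiny made-up 1x2 image: (cytoskeleton red, protein green, nuclear blue)
image = [[(10, 200, 180), (90, 5, 0)]]

# Hide the blue nuclear stain so the green protein signal stands out.
no_dapi = toggle_channels(image, show_blue=False)
print(no_dapi)  # [[(10, 200, 0), (90, 5, 0)]]
```

The original image is left untouched, so a viewer built this way could switch channels back on without reloading the data.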
Lundberg’s team’s analysis of the data to date shows that about half of all proteins localize to multiple compartments and 16% vary in intensity and subcellular localization from cell to cell. “This indicates that these proteins may be involved in several biological processes, and that spatial confinement may be a key cellular regulatory mechanism,” she says.
Researchers can use these data, she adds, not only to determine the subcellular localization of their proteins of interest, but also to assign putative functions to unknown proteins. Systems biologists can use the data to “constrain” ‘omics datasets, for instance by eliminating hypothesized protein-protein interactions that cannot occur because the two proteins occupy different subcellular locales. And machine learning and artificial intelligence researchers can use the data to hone pattern recognition algorithms.
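The idea of “constraining” an interaction dataset by localization can be sketched in a few lines. This is an illustration only, under stated assumptions: the protein names, compartment annotations, and candidate pairs below are invented, and the helper `share_compartment` is hypothetical rather than part of any Cell Atlas tool. A real analysis would load localization annotations from the atlas itself and would probably treat missing or low-confidence annotations more carefully.

```python
# Hypothetical localization annotations: protein -> set of compartments.
localization = {
    "PROT_A": {"nucleus"},
    "PROT_B": {"nucleus", "cytosol"},
    "PROT_C": {"plasma membrane"},
}

# Hypothetical candidate interactions, e.g. from a screen or a prediction.
candidate_pairs = [
    ("PROT_A", "PROT_B"),
    ("PROT_A", "PROT_C"),
    ("PROT_B", "PROT_C"),
]


def share_compartment(a, b, loc):
    """Return True if the two proteins have at least one compartment in common.

    Proteins with no annotation are kept, since they cannot be ruled out.
    """
    if a not in loc or b not in loc:
        return True
    return bool(loc[a] & loc[b])


# Keep only the pairs whose partners could plausibly meet in the cell.
plausible_pairs = [
    (a, b) for a, b in candidate_pairs if share_compartment(a, b, localization)
]
print(plausible_pairs)  # [('PROT_A', 'PROT_B')]
```

Because about half of all proteins localize to more than one compartment, the sketch stores a set of locations per protein rather than a single label.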
All told, the Cell Atlas project has collected over 350,000 photomicrographs, Lundberg says. The challenge of sifting through them all and interpreting what they show led Lundberg and her team to contemplate a citizen-science project in the first place, she says. But that’s not the only reason: the project team initially assigned proteins to any of 10 “major” subcellular destinations. Images, she says, are dense with data, and given enough eyeballs looking at them, it would be possible to work out more precise cellular addresses.
Project Discovery players, she says, have proven remarkably adept at that, especially given that most are not trained scientists. “They’re not good at everything, but they’re doing quite good,” Lundberg says. “They’re … kind of at the level of good artificial intelligence at the moment.”
Lundberg plans to submit a paper detailing the EVE Online project “within a couple of months,” she says. A paper describing the Cell Atlas is under review.
Jeff Perkel is technology editor, Nature.