Some resources and tools related to noncoding RNAs

In ‘Meet some code-breakers of noncoding RNAs,’ the technology feature in the February issue of Nature Methods, we speak with a few scientists about the path ahead in methods for characterizing noncoding RNAs.

With their input, we compiled a list of some resources and tools in this field.

We will gladly include additional resources. Please comment on this page. You can also tweet us: @naturemethods or @metricausa

Some resources and tools related to noncoding RNAs:

 

Resource Description Publication
DASHR Database of small human noncoding RNAs

Leung, Y.Y. et al. DASHR: database of small human noncoding RNAs. Nucleic Acids Res. 44, D216–D222 (2016).

FANTOM CAT Functional Annotation of the mammalian genome (FANTOM) is an international consortium.

This resource is an atlas of human long noncoding RNAs with accurate 5’ ends

 

 

Annotation of noncoding transcripts, for example to find functional lncRNAs that show an effect on global expression after knockout/knockdown. Hon, C.-C. et al. An atlas of human long non-coding RNAs with accurate 5′ ends. Nature 543, 199–204 (2017).

Okazaki, Y. et al. Analysis of the mouse transcriptome based on functional annotation of 60,770 full-length cDNAs. Nature 420, 563–573 (2002).

GENCODE Resource about human and mouse noncoding RNAs, drawing on data generated by the Encyclopedia of DNA Elements (ENCODE) consortium. Information about the noncoding RNA species and their annotations is here. Harrow, J. et al. GENCODE: the reference human genome annotation for The ENCODE Project. Genome Res. doi: 10.1101/gr.135350.111 (2012).
LNCipedia Database of annotated human long noncoding RNA transcript sequences and structures Volders, P.-J. et al. LNCipedia: a database for annotated human lncRNA transcript sequences and structures. Nucleic Acids Res. 41, D246–D251 (2013).
lncRNAdb Database of annotations of functional long noncoding RNAs manually curated from the scientific literature Amaral, P.P. et al. lncRNAdb: a reference database for long noncoding RNAs. Nucleic Acids Res. 39, D146–D151 (2011).
lncRNAWiki A wiki to encourage community-based curation of human long noncoding RNAs. Ma, L. et al. LncRNAWiki: harnessing community knowledge in collaborative curation of human long non-coding RNAs. Nucleic Acids Res. 43, D187–D192 (2015).

 

lncRNAtor A portal for long noncoding RNA with information such as expression profiles and coding potential. Data sources include TCGA, GEO, ENCODE and modENCODE. Park, C. et al. lncRNAtor: a comprehensive resource for functional investigation of long non-coding RNAs. Bioinformatics. 30(17):2480-5. (2014).
MINTbase Database of tRNA fragments from 11,000 people and 32 cancer types Pliatsika, V. et al. Nucleic Acids Res. 46, D152–D159 (2018).
miRBase Database of published miRNA sequences and annotations Griffiths-Jones S. et al. Nucleic Acids Res. 36, D154-158 (2008).
mirDIP A resource with human data for finding microRNAs that target a gene, or genes targeted by a microRNA Tokar, T. et al. mirDIP 4.1: integrative database of human microRNA target predictions. Nucleic Acids Res. 46, D360–D370 (2018).
MirGeneDB A database of validated and annotated human microRNA genes Fromm, B. et al. MirGeneDB2.0: the curated microRNA Gene Database. Manuscript in bioRxiv. doi: https://doi.org/10.1101/258749
NONCODE A noncoding RNA database with information from 17 species, especially on long noncoding RNAs. The information is mined from the scientific literature and data resources such as lncRNAdb and LNCipedia.

It includes links to literature about tools such as ncFANs for functional annotation of lncRNAs.

Liu, C. et al. NONCODE: an integrated knowledge database of non-coding RNAs. Nucleic Acids Res. 33 (Database issue), D112–D115 (2005).
Regulome resources and data Resources and data from the Center for Personal Dynamic Regulomes, including the ATAC-seq protocol and transcriptional landscape data from 13 cell types from healthy people and 3 cell types from people afflicted by leukemia. Corces, M.R. et al. Lineage-specific and single-cell chromatin accessibility charts human hematopoiesis and leukemia evolution. Nat. Genet. 48, 1193–1203 (2016).
RNAcentral Resource hosted at the European Bioinformatics Institute that draws on a number of other database resources, such as

LncBase

This resource includes, for example, a database of experimentally supported miRNA:gene interactions and analysis tools and pipelines such as for miRNA pathway analysis

snOPY

snoRNA orthological gene database with information about snoRNAs, snoRNA gene loci and target RNAs.

TarBase

Manually curated experimentally validated miRNA-gene interactions

 

 Tools 
miRDeep
miRDeep2
Tools for miRNA identification from RNA-seq data An, J. et al. miRDeep*: an integrated application tool for miRNA identification from RNA sequencing data. Nucleic Acids Res. 41, 727–737 (2013).

Friedländer MR et al. miRDeep2 accurately identifies known and hundreds of novel microRNA genes in seven animal clades. Nucleic Acids Res. 40(1):37-52. (2012)

miRNA prediction tool miRNA prediction Miranda, K.C. et al. A pattern-based method for the identification of MicroRNA binding sites and their corresponding heteroduplexes. Cell 126, 1203–1217 (2006).
 OASIS  Small non-coding RNA detection and expression analysis tool Capece, V. et al. Oasis: online analysis of small RNA deep sequencing data. Bioinformatics 31, 2205–2207 (2015).
Datasets
Analysis of 13 cell types; expression of primate and tissue-specific microRNAs Human miRNAs, their targets, and visualization of the loci on the human genome browser Londin, E. et al. Analysis of 13 cell types reveals evidence for the expression of numerous novel primate- and tissue-specific microRNAs. Proc. Natl. Acad. Sci. U.S.A. 112, E1106–E1115 (2015).

Sources: H. Chang, Stanford University School of Medicine; Rory Johnson, University of Bern; E. Marshall, BC Cancer Agency; M. Turner, Babraham Institute; U. Ohler, Max Delbrück Center for Molecular Medicine; I. Rigoutsos, Philadelphia University + Thomas Jefferson University; Nature Research.

 

 

XFEL projects, tools, data portals

Earlier this year, the EuXFEL’s first laser beam reached the ‘hutch’.

Jessica Mancuso

As of September 1, the European X-ray free-electron laser (EuXFEL) is ready for the research community’s experiments; the user page is here.

In the September issue of Nature Methods, we present some of the experimental ideas researchers are exploring in that facility.

Other XFELs are operational or in the works: the FERMI facility, the Linac Coherent Light Source at Stanford (LCLS), the Pohang Accelerator Laboratory (PAL) X-ray Free-Electron Laser, the SPring-8 Angstrom Compact Free Electron Laser (SACLA) and the Swiss Free-Electron Laser (SwissFEL).

One day, there might even be an XFEL that fits on a tabletop (see below). The day is already here when scientists need to analyze mountains of XFEL data. The EuXFEL will likely make those mountains grow in height. There are tools for that and likely more tools to come (see far below for a list of some tools).

Tabletop XFEL

To complement the large XFEL facilities, a number of research groups are developing benchtop XFELs (refs. 1,2). Such projects involve miniaturization of all aspects of the technology, including the accelerator. Some groups explore ways to pass a laser through plasma to produce bright, high-energy, short-pulsed beams. Separately, some researchers use a terahertz generator, which can provide sufficiently high pulse energies, says Franz Kärtner, a physicist at the University of Hamburg who also holds an appointment at MIT. His team, along with Petra Fromme of Arizona State University, is developing such a tabletop XFEL instrument.

The scientists would like to use the instrument for coherent diffractive imaging and spectroscopy experiments on photosystem II, a protein complex involved in photosynthesis. Their compact XFEL approach, which will use a terahertz generator, lasers and nonlinear optics, is calculated to achieve photon energies between 10 and 12 keV, hard X-rays that can be harnessed for imaging at atomic resolution, says Kärtner.

Although this compact XFEL will generate fewer photons per shot than a large-scale FEL (10^6 to 10^9 photons per shot as opposed to 10^12 photons or more), the machine will be able to produce very short pulses, on the order of 0.5 femtoseconds. That is 10–100 times shorter than current FEL pulses. And if that comes to be, says Kärtner, the peak power of the instrument may be almost on par with that of an XFEL.
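The peak-power comparison can be checked with rough arithmetic: energy per pulse (photons per shot times energy per photon) divided by pulse duration. The sketch below is only a back-of-the-envelope estimate. It assumes rectangular pulses and 12 keV photons for both machines, and it takes a 50-femtosecond pulse for the large-scale FEL (100 times the 0.5 femtoseconds quoted above); none of these simplifications come from Kärtner, and the function name is ours.

```python
# Back-of-the-envelope peak power: pulse energy / pulse duration.
# Assumptions (ours, for illustration): rectangular pulses, 12 keV photons
# for both machines, and a 50 fs pulse for the large-scale FEL.

EV_TO_JOULE = 1.602176634e-19  # joules per electronvolt (exact SI value)


def peak_power_watts(photons_per_shot: float,
                     photon_energy_kev: float,
                     pulse_fs: float) -> float:
    """Return the mean power delivered during one pulse, in watts."""
    pulse_energy_j = photons_per_shot * photon_energy_kev * 1e3 * EV_TO_JOULE
    return pulse_energy_j / (pulse_fs * 1e-15)


# Compact source at the upper end of the quoted range: 10^9 photons in 0.5 fs.
compact = peak_power_watts(1e9, 12.0, 0.5)
# Large-scale FEL: 10^12 photons in a pulse 100 times longer.
large = peak_power_watts(1e12, 12.0, 50.0)

print(f"compact: {compact:.2e} W")
print(f"large-scale: {large:.2e} W")
print(f"ratio compact/large: {compact / large:.2f}")
```

Under these assumptions, a thousandfold deficit in photons per shot is largely offset by the hundredfold shorter pulse, which leaves the compact source within about one order of magnitude of the large machine in peak power. With fewer photons per shot or a shorter comparison pulse, the gap widens accordingly.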

A terahertz accelerator module for a table-top XFEL in the making

DESY/Heiner Müller-Elsner

In the instrument’s terahertz-driven accelerator there will be acceleration gradients between 500 MV/m and 1 GV/m. It’s this high frequency that helps to compress electron bunches over short distances and that will let the developers use compact electron guns. In this fashion, they will be able to shoot a coherent electron beam directly from a gun and emit an X-ray beam much like a FEL, says Kärtner.

Tabletop XFELs fill an important experimental gap between Röntgen’s X-ray tube and the large-scale FELs. “There is nothing in between,” says Kärtner. That’s akin to a situation in optical science in which researchers need to choose between a light bulb and a large-scale optical laser such as the one at the National Ignition Facility at Lawrence Livermore National Laboratory. If the developers of the compact XFEL succeed at packing enough photons into each shot, the instrument will have potential applications in many fields, he says, including enhanced characterization of materials or higher-resolution medical and structural biology imaging.

  1.  Kneip, S. et al. Nat. Phys. 6, 980–983 (2010).
  2.  Kärtner, F.X. et al. Nucl. Instrum. Methods Phys. Res. A 829, 24–29 (2016).

Data mountains

XFEL-based experiments produce mountains of data. At EuXFEL, there are two two-dimensional pixel detectors, which will each deliver 10-40 gigabytes of data every second of an experiment.
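To give a sense of scale, the quoted detector rates can be turned into a data volume. This is a rough sketch that assumes both detectors stream continuously at the quoted rate for a full hour, which real experiments need not do; the one-hour window and the function name are illustrative choices, not facility figures.

```python
# Rough data-volume estimate from the quoted detector rates.
# Assumption (ours): both detectors read out continuously for the whole period.

GB_PER_TB = 1000  # decimal units


def data_volume_tb(rate_gb_per_s: float, seconds: float, detectors: int = 2) -> float:
    """Return the total data volume in terabytes."""
    return detectors * rate_gb_per_s * seconds / GB_PER_TB


ONE_HOUR_S = 3600
low = data_volume_tb(10, ONE_HOUR_S)   # both detectors at 10 GB/s
high = data_volume_tb(40, ONE_HOUR_S)  # both detectors at 40 GB/s

print(f"one hour of continuous acquisition: {low:.0f} to {high:.0f} TB")
```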

Experimental data will be housed in the facility’s online systems and then moved to offline disk-based systems, also at the facility, where researchers can access and analyze it, says Filipe Maia, a biophysicist at Uppsala University.

The data torrent makes for “a daunting problem,” says Maia, “and currently there’s clearly a lack of user friendly tools.” The issue is a general one and not unique to XFEL-based research, but researchers are getting better at handling datasets as they familiarize themselves with the increasingly available tools. After publication, he hopes, XFEL data will be deposited in an online repository and shared with the community. One such resource is the Coherent X-ray Imaging Data Bank (CXIDB), which he built.


Here are some tools for analyzing and managing XFEL data:

 Resource Description
CASS-CFEL-ASG Suite of tools for real-time monitoring of XFEL experiments, data analysis and visualization, raw data correction, crystal hit finding.
cctbx.xfel Suite of tools for processing measurements made during SFX experiments at an XFEL. Built on Computational Crystallographic Toolbox.
CrystFEL Software suite for processing SFX data.
 Cheetah Data analysis and high-throughput data reduction tools for SFX data.
 Condor Simulation of Flash X-ray imaging to help solve structures without needing crystallization
 Dragonfly  Software/algorithm for single-particle imaging with XFELs
 Hummingbird  Real-time monitoring of XFEL experiments
 Hawk  Package for analyzing and phasing diffraction patterns from single particle-based experiments
 IOTA Spot-finding software for XFEL-based diffraction images. Part of the cctbx.xfel suite.
 OnDA  Real-time monitoring and data analysis of XFEL experiments
 psana  A data analysis framework at LCLS
 SACLA analysis  framework Real-time data processing pipeline at SACLA for serial femtosecond crystallography; it uses modified Cheetah and CrystFEL.
WavePropaGator Software framework for simulating XFEL experiments.
XATOM Software for calculating and simulating X-ray-atom interactions. Part of the software package Xraypac.
 XMDYN  Simulation tool for modeling dynamics of matter that is exposed to high-intensity X-rays. Part of the software package Xraypac.
 Resources and Portals 
 Coherent X-ray Imaging Data Bank (CXIDB) A database for coherent X-ray imaging experiments.
 LCLS data  analysis  Data analysis resources at LCLS.
 Protein Data  Bank (PDB)  Data repository for protein structures.
 SIMEX  A project that aims to develop an experimental simulation platform for use at XFELs

Sources: Henry Chapman, DESY; Janos Hajdu, Filipe Maia, Uppsala University; Sébastien Boutet, LCLS

LCLS: Linac Coherent Light Source at SLAC National Accelerator Laboratory (formerly named Stanford Linear Accelerator Center)
SACLA: SPring-8 Angstrom Compact Free Electron Laser
SFX: Serial Femtosecond Crystallography
XFEL: X-ray free-electron laser

Biology through rose-colored filters

In this month’s editorial, we visit the topic of red to near-infrared fluorescent probes for use in fluorescence microscopy, with special emphasis on imaging live specimens. The editorial was inspired by a Commentary from Laissue and colleagues in this issue on assessing and reporting phototoxicity during live imaging.

As additional reading, we suggest a 2013 editorial on phototoxicity from our pages.

We welcome any feedback!

Stem cells: a conversation with Sally Temple and Lorenz Studer

From June 14-17, scientists convened in Boston for the annual meeting of the International Society for Stem Cell Research (ISSCR). Here is the browsable program.

In advance of the meeting, we had the opportunity to chat with Sally Temple, president of ISSCR, co-founder and scientific director of the Neural Stem Cell Institute, and Lorenz Studer, founder and director of the Center for Stem Cell Biology at Memorial Sloan Kettering Cancer Center, to hear about some of the sessions but also to learn about some larger trends shaping the stem cell biology field.

Here is a video (19 minutes).

And here is a podcast of this conversation, both moderated by Tal Nawy, a senior editor at Nature Methods.

 

Computable sugars: some computational resources in glycoscience

Glycoscience is sweet science

PhotoDisc/ Getty Images

As glycoscience advances, labs will increasingly want to ask questions about glycosylation sites on a protein or the structure of a sugar, says Raja Mazumder, a bioinformatician at George Washington University. They might ask, for example: are there glycosyltransferases that are expressed in liver but not in the heart? Or, which ones are overexpressed by a factor of three in more than two cancers? Such questions require infrastructure building, he says, because right now there is no mechanism to allow such queries. But he and others are building such capabilities. Mazumder, along with William York at the University of Georgia, is starting to build a glycoscience informatics portal.
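To make the two example questions concrete, here is a minimal sketch of how they could be expressed once expression data sit in one queryable place. Everything in it is hypothetical: the gene labels (GT_A, GT_B, GT_C), the cancer labels and the numbers are made-up placeholders, and the function names are ours. It does not describe the portal Mazumder and York are building.

```python
# Toy, made-up data: which tissues each (hypothetical) glycosyltransferase is
# expressed in, and its tumor/normal fold change in three (hypothetical) cancers.
expressed_in = {
    "GT_A": {"liver", "heart"},
    "GT_B": {"liver"},
    "GT_C": {"heart"},
}
fold_change = {
    "GT_A": {"cancer_1": 3.5, "cancer_2": 4.0, "cancer_3": 3.1},
    "GT_B": {"cancer_1": 3.2, "cancer_2": 1.1, "cancer_3": 0.9},
    "GT_C": {"cancer_1": 0.8, "cancer_2": 1.0, "cancer_3": 1.2},
}


def expressed_in_but_not(table, present, absent):
    """Genes expressed in tissue `present` and not in tissue `absent`."""
    return sorted(gene for gene, tissues in table.items()
                  if present in tissues and absent not in tissues)


def overexpressed_in_more_than(table, min_fold, min_cancers):
    """Genes with fold change >= min_fold in more than min_cancers cancers."""
    return sorted(gene for gene, changes in table.items()
                  if sum(fc >= min_fold for fc in changes.values()) > min_cancers)


liver_not_heart = expressed_in_but_not(expressed_in, "liver", "heart")
threefold_hits = overexpressed_in_more_than(fold_change, 3.0, 2)
print(liver_not_heart, threefold_hits)
```

The queries themselves are simple. The hard part, as the paragraph above notes, is the infrastructure that brings tissue, disease and enzyme data together with consistent identifiers.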

Mazumder wants to leverage existing ontologies in the developer community in order to build systems that can be queried on a large scale. For example, Mazumder is working with Cathy Wu at Georgetown University, who is developing the Protein Ontology. Such ontologies are collected, for example, by the non-profit OBO Foundry. To allow flexible querying, the computational resources will draw on different ontologies: ones that relate to glycans, genes, proteins, tissues, diseases and more.

Ontologies are part of the team’s effort to build application programming interfaces (APIs) that expose the data in a given database to incoming queries. Given how complex sugars are, the informatics framework has to be well-organized for both human and machine-based querying, says Mazumder.

When using the resource, a researcher will receive results that also document the search process itself, such as the version of the queried database. “You need to be able to tell where you got that information from,” says Mazumder. Tracking data provenance matters especially in an age when databases continuously integrate information emerging in the literature.

For the Food and Drug Administration, Mazumder is developing computational standards for high-throughput sequencing, which he wants to also apply to glycoscience. His ‘biocompute object’ captures the given computational workflow a lab might have used to generate results: the software used, the databases queried and their version, and identifiers of data inputs and outputs. These biocompute objects are intended to help regulatory scientists interpret submitted work. They can also help scientists generally see if, for example, the version of software they used worked as it should, says Mazumder.
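As an illustration only, the elements listed above (software, databases with versions, input and output identifiers) could be captured in a small record like the one sketched here. The class names, field names and example values are hypothetical and do not follow the actual biocompute object specification.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class DatabaseQuery:
    """A database that was queried, with the version seen at query time."""
    name: str
    version: str


@dataclass
class WorkflowRecord:
    """Hypothetical provenance record; not the real biocompute object schema."""
    software: List[str]             # tools used, with version strings
    databases: List[DatabaseQuery]  # databases queried and their versions
    input_ids: List[str]            # identifiers of data inputs
    output_ids: List[str]           # identifiers of data outputs

    def to_json(self) -> str:
        """Serialize the record so it can travel with the results."""
        return json.dumps(asdict(self), indent=2, sort_keys=True)


# Placeholder values for illustration.
record = WorkflowRecord(
    software=["example-aligner 1.0"],
    databases=[DatabaseQuery(name="example-glycan-db", version="2018-01")],
    input_ids=["input-0001"],
    output_ids=["output-0001"],
)
print(record.to_json())
```

Keeping the record as plain, serializable data means it can be stored next to the results and read by both people and programs.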

Too often labs use computational tools without benchmarking them, says Mazumder. “It would be unthinkable for a wet-lab scientist to not have a positive and negative control,” he says.  In informatics, developers benchmark their software but users often do not have these habits. “They don’t even know: if I don’t find anything, is it because my software did not run well or not?”
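One way to carry the wet-lab habit over to software is to run a tool on inputs with known answers before trusting its result on a real sample. The sketch below is a hypothetical pattern, not an established protocol: the helper name, the toy substring search standing in for an analysis tool, and the example strings are all our own placeholders.

```python
from typing import Callable, List, Sequence


def run_with_controls(tool: Callable[[Sequence[str]], List[str]],
                      sample: Sequence[str],
                      positive_control: Sequence[str],
                      negative_control: Sequence[str]) -> List[str]:
    """Run `tool` on the sample only after both controls behave as expected."""
    if not tool(positive_control):
        raise RuntimeError("positive control gave no hits: the tool may not be working")
    if tool(negative_control):
        raise RuntimeError("negative control gave hits: results may be spurious")
    return tool(sample)


def find_motif(sequences: Sequence[str]) -> List[str]:
    """Toy stand-in for an analysis tool: literal substring search."""
    return [s for s in sequences if "NGT" in s]


hits = run_with_controls(find_motif,
                         sample=["AAA", "CNGTC"],
                         positive_control=["XNGTX"],
                         negative_control=["AAAA"])
print(hits)
```

With both controls passing, an empty result on the sample can be read as a genuine negative rather than a failed run, which is the ambiguity Mazumder describes.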

As labs move to big data analysis in genomics and also, eventually, in glycoscience, this aspect is ever more important, says Mazumder. In his view, biocompute objects will help glycobiology researchers communicate with one another about their results, such as where on a protein they found a sugar with a given structure. More generally, it will help glycoscientists to have a better way to connect the available sugar resources as they pursue their questions of interest.


Here are some resources that glycoscientists can tap into:

 Category Resource Description
General resources and funding information
Transforming Glycoscience: A Roadmap for the Future Report by the National Research Council of the National Academies of Science
NIH Common Fund program in glycoscience  Funding opportunities from the NIH Common Fund program in glycoscience
A roadmap for Glycoscience In Europe by BBSRC, EGSF, European Science Foundation   Glycoscience roadmap for Europe
GlycoNet Resources related to glycoscience research in Canada, based at the University of Alberta where the Alberta Glycomics Centre is located
National Center for Functional Glycomics A Glycomics-related Biomedical Technology Resource Center based at Beth Israel Deaconess Medical Center, Harvard Medical School with resources on, for example, microarrays and microarray services, protocols, training and databases
Databases and  portals 
CAZy Carbohydrate-Active Enzymes, a database of enzyme families that degrade, modify or create glycosidic bonds
Consortium for Functional Glycomics Resources and glycoscience data. Part of the National Center for Functional Glycomics.
ExPASy Software tools and databases to simulate, predict and visualize glycans, glycoproteins and glycan-binding proteins
Glycan Library  A list of lipid-linked sequence-defined glycan probes
Glyco3D A portal for structural glycoscience
GlycoBase 3.2 A database of N- and O-linked glycan structures with HPLC, UPLC, exoglycosidase sequencing and mass spectrometry data
GlycoPattern Portal for glycan array experimental results from the Consortium for Functional Glycomics
Glycosciences.de Collection of databases and tools in glycoscience
GlyToucan Repository for glycan structures based in Japan
MatrixDB A database of experimental data of interactions by proteoglycans, polysaccharides and extracellular matrix proteins
Repository of Glyco-enzyme expression constructs University of Georgia Complex Carbohydrate Research Center repository for glyco-enzyme constructs
SugarBind A database of carbohydrate sequences to which bacteria, toxins and viruses adhere
UniCarbKB A resource curated by scientists in five countries. It includes GlycoSuiteDB, a database of glycan structures; EUROCarbDB, an experimental and structural database; and UniCarb-DB, a mass spec database of glycan structures
Software tools
CASPER Web-based tool to calculate NMR chemical shifts of oligo- and polysaccharides
Glycan Builder An online tool at ExPASy for predicting possible oligosaccharide structures on proteins
GlycoMiner/GlycoPattern Software tools to automatically identify mass spec spectra of N-glycopeptides
GlyMAP An online resource for mapping glyco-active enzymes
NetOGlyc Software tool for predicting O-glycosylation sites on proteins
SweetUnityMol Molecular visualization software

Sources: NIH; R. Mazumder, George Washington University; New England Biolabs; Thermo Fisher Scientific; Nature Research