Tara Oceans back home

Tara sailed more than 60,000 miles.

The Tara Oceans expedition (oceans.taraexpeditions.org) arrived last Saturday in Lorient, France, after sailing the seas of the planet for more than two years, systematically collecting samples of planktonic life and recording physical, geographical and climatic parameters at a total of 153 stations.

A couple of weeks ago, on March 13, we were fortunate to be able to join Eric Karsenti, who had just boarded Tara in the Azores. In this interview and in an accompanying invited Editorial, “Towards an Oceans Systems Biology” (Karsenti, 2012), Eric explains how the data collected by the expedition will help in “understanding how populations of organisms are structured by their interaction with the environment and how such complex systems have evolved” in the marine ecosystem.

The integration of the collected biological and geochemical data into predictive models will be a formidable challenge and will require the development of appropriate analysis methods (Raes et al, 2011). But preliminary results already indicate that the data will provide exciting insights into the biodiversity of the marine environment: “it looks like there are many more eukaryotic species than bacteria and 90% of these species are unknown”.

Beyond its scientific outcome, the philosophy of the expedition was also to “promote broader thinking” by revealing the interdependence between marine life and environment and thus reminding us “we all depend on each other on this planet”.

A nice lesson in systems biology!


Karsenti E (2012) Towards an ‘Oceans Systems Biology’. Mol Syst Biol 8:575

Raes J, Letunic I, Yamada T, Jensen LJ & Bork P (2011) Toward molecular trait-based ecology through integration of biogeochemical, geographical and metagenomic data. Mol Syst Biol 7:473

[Research highlight] modENCODE releases extensive functional investigation of fly and worm genomes

Recently, a series of publications by members of the modENCODE consortium was released online at Science, Nature, and Genome Research. These works collectively describe a massive effort to functionally characterize and annotate the Drosophila melanogaster and Caenorhabditis elegans genomes, including in-depth analyses of genes and transcripts, epigenetic marks, transcription factor binding, and replication timing, across a range of developmental stages and tissues.

Integrated analyses of these data are described in two articles published in Science (Gerstein et al, 2010; modENCODE Consortium et al, 2010). These works provide compelling support for the existence of highly occupied target (HOT) regions, genomic regions bound by a complex mix of many transcription factors whose connection with gene regulation is still largely unclear. They also show that the dense epigenetic datasets can be used to segment the genomes into “chromatin states” with distinct functional properties (see also the recent work by Filion et al, 2010).
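
At its core, calling a HOT region is a counting problem: how many different factors bind the same stretch of DNA. The sketch below illustrates only that counting idea, assuming binding peaks are given as (factor, chromosome, start, end) tuples. The 1 kb bin size and the five-factor cutoff are arbitrary values chosen for illustration, not the criteria used in the modENCODE papers.

```python
from collections import Counter


def tf_occupancy(peaks, bin_size):
    """Count distinct TFs with at least one peak overlapping each genomic bin.

    peaks: iterable of (tf, chrom, start, end), half-open coordinates [start, end).
    Returns a Counter mapping (chrom, bin_index) -> number of distinct TFs.
    """
    tfs_per_bin = {}
    for tf, chrom, start, end in peaks:
        # A single peak may span several consecutive bins.
        for b in range(start // bin_size, (end - 1) // bin_size + 1):
            tfs_per_bin.setdefault((chrom, b), set()).add(tf)
    # Using sets means a TF with several peaks in one bin is counted once.
    return Counter({key: len(tfs) for key, tfs in tfs_per_bin.items()})


def hot_bins(peaks, bin_size=1000, min_tfs=5):
    """Return bins bound by at least `min_tfs` distinct TFs (hypothetical cutoff)."""
    occupancy = tf_occupancy(peaks, bin_size)
    return sorted(key for key, n in occupancy.items() if n >= min_tfs)
```

Counting distinct factors rather than raw peaks keeps a single factor with many nearby peaks from making a region look "hot".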

In a related Perspective, Mark Blaxter argues that these works represent an important step toward the ability “to compute an organism from its genome” (Blaxter, 2010). A prime example of progress toward this goal is the particularly comprehensive genomic regulatory network built by the Drosophila modENCODE team, which is inferred from a combination of ChIP-based transcription factor binding, sequence motifs, epigenetic marks, and coexpression (modENCODE Consortium et al, 2010). A relatively simple linear combination of predicted regulatory inputs can predict, with some accuracy, the expression of about one quarter of the transcriptome. In addition, the authors find that the remaining, unpredictable genes tend to have noisier expression levels, suggesting that they may be intrinsically more weakly regulated.
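
To make the idea of a "linear combination of regulatory inputs" more concrete, here is a minimal sketch. It assumes that each target gene is modeled by ordinary least squares as a weighted sum of the expression profiles of its predicted regulators across conditions, and it counts a gene as "predictable" above an arbitrary R² cutoff. The input layout and the cutoff are hypothetical, and this is not the published modENCODE model.

```python
import numpy as np


def r_squared(regulator_expr, target_expr):
    """Fit target ~ regulators @ w + b by least squares and return the in-sample R^2.

    regulator_expr: (n_conditions, n_regulators) expression of the gene's predicted regulators
    target_expr:    (n_conditions,) expression of the target gene
    """
    n = regulator_expr.shape[0]
    design = np.column_stack([regulator_expr, np.ones(n)])  # intercept column
    coef, *_ = np.linalg.lstsq(design, target_expr, rcond=None)
    residuals = target_expr - design @ coef
    ss_res = float(residuals @ residuals)
    centered = target_expr - target_expr.mean()
    ss_tot = float(centered @ centered)
    return 1.0 - ss_res / ss_tot if ss_tot > 0 else 0.0


def fraction_predictable(genes, min_r2=0.5):
    """genes: list of (regulator_expr, target_expr) pairs; min_r2 is a hypothetical cutoff."""
    scores = [r_squared(X, y) for X, y in genes]
    return sum(s >= min_r2 for s in scores) / len(scores)
```

In-sample R² is optimistic when there are few conditions relative to regulators. Evaluating on held-out conditions would give a fairer estimate of how much of the transcriptome is truly predictable.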


 

Blaxter M (2010) Genetics. Revealing the dark matter of the genome. Science 330:1758-9

Filion GJ, van Bemmel JG, Braunschweig U, Talhout W, Kind J, Ward LD, Brugman W, de Castro IJ, Kerkhoven RM, Bussemaker HJ, van Steensel B (2010) Systematic protein location mapping reveals five principal chromatin types in Drosophila cells. Cell 143:212-24

Gerstein MB, Lu ZJ, Van Nostrand EL, Cheng C, Arshinoff BI, Liu T, Yip KY, Robilotto R, Rechtsteiner A, Ikegami K, Alves P, Chateigner A, Perry M, Morris M, Auerbach RK, Feng X, Leng J, Vielle A, Niu W, Rhrissorrakrai K et al (2010) Integrative Analysis of the Caenorhabditis elegans Genome by the modENCODE Project. Science 330:1775-1787

modENCODE Consortium, Roy S, Ernst J, Kharchenko PV, Kheradpour P, Negre N, Eaton ML, Landolin JM, Bristow CA, Ma L, Lin MF, Washietl S, Arshinoff BI, Ay F, Meyer PE, Robine N, Washington NL, Di Stefano L, Berezikov E, Brown CD et al (2010) Identification of Functional Elements and Regulatory Circuits by Drosophila modENCODE. Science 330:1787-1797

Keystone Symposium – Omics Meets Cell Biology (II)


Before I carry on with a summary of the second part of the Keystone Symposium ‘Omics Meets Cell Biology’, I should clarify that this post and the previous one dedicated to this conference are not intended to provide a comprehensive account of all the talks, but rather to communicate some general (and subjective) impressions of the meeting. To keep these posts reasonably short (and sometimes due to a lack of memory…), I had to omit several of the excellent presentations given at this meeting. The full program and the complete list of speakers are available on the Keystone Symposia website.

Many of the presentations in the second part of the meeting reported findings from cell-based high- or medium-throughput functional screens, most of them relying on RNAi-mediated knock-down. Here is an overview of the screens presented at this meeting, whose diversity in scope and scale illustrates the versatility of this approach:

Focus                      Genes tested    Screen type               Speaker
autophagy                  21,000?         RNAi                      M Lipinski
sensory organ dev.         20,000          RNAi                      J Mummery-Widmer
cell polarity              16,000          RNAi                      J Ahringer
imatinib modifiers         9,500 (pooled)  RNAi                      D Sabatini
viral entry                4,000           RNAi                      L Pelkmans
cell-cell contacts         2,000           RNAi                      T Pawson
cell migration             1,000           RNAi                      J Brugge
centrosome                 113             RNAi                      L Pelletier
bipolar spindle            45              RNAi                      R Medema
DNA repair                 –               RNAi                      D Durocher
neuronal differentiation   700             TF overexpression         M Snyder
gene-centered TF location  –               yeast one-hybrid library  M Walhout
protein degradation        –               reporter library          S Elledge

Perhaps not surprisingly, many speakers emphasized that RNAi screens invariably need to be followed up by time-consuming and tedious validation. The off-target problem in mammalian cell-based RNAi screens also appears to be taken very seriously: it was reported that four to seven siRNAs directed against the same gene were necessary to reach a good level of confidence. Given the growing number of RNAi-based functional screens, standards for describing such experiments (e.g. MIARE, MIACA) are likely to become increasingly useful.
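
The reasoning behind using several siRNAs per gene is that off-target effects depend largely on each siRNA's sequence. A phenotype reproduced by several independent siRNAs is therefore harder to attribute to an off-target effect. A minimal hit-calling rule along those lines might look like the sketch below. The thresholds (at least four siRNAs tested, at least three concordant) are hypothetical values chosen to echo the range reported at the meeting, not a published standard.

```python
def call_hits(sirna_results, min_tested=4, min_concordant=3):
    """Return genes whose phenotype is reproduced by enough independent siRNAs.

    sirna_results: dict mapping gene -> list of booleans, one per distinct siRNA
                   (True if that siRNA produced the phenotype).
    min_tested, min_concordant: hypothetical thresholds.
    """
    hits = []
    for gene, outcomes in sirna_results.items():
        if len(outcomes) < min_tested:
            continue  # too few independent reagents to rule out off-target effects
        if sum(outcomes) >= min_concordant:
            hits.append(gene)
    return sorted(hits)
```

For example, a gene scored by only one of four siRNAs would be rejected, while a gene scored by three of four would be kept as a candidate for follow-up validation.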

In systems biology, network models are often central to the interpretation of omics data on molecular interactions, and they make it possible to generate biological insights different from those derived from the more classical screening/mechanistic-dissection paradigm. In this regard, Uwe Sauer presented exciting work on the relationship between transcriptional regulatory networks, protein expression and the state of the yeast metabolic network. Using a combination of genetic and drug perturbations, a series of parallel ‘fluxomic’ and metabolomic measurements revealed that metabolic fluxes, in contrast to metabolite concentrations, remain robust to perturbations and, at steady state in a given condition, are apparently affected by only a handful of transcription factors.

At the computational level, integrating different types of data poses significant challenges. For example, it is far from trivial to exploit the information contained in interaction networks and combine it with other types of large-scale molecular measurements. Trey Ideker presented an efficient solution to this problem in the context of microarray profiling of breast cancers, showing that expression data can be combined with information on physical protein interactions to define improved and biologically meaningful pathway-based biomarkers for classifying metastatic vs non-metastatic tumors.
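
One way to picture pathway-based biomarkers is to score small interaction subnetworks rather than single genes. The idea is to average the standardized expression of the member genes in each tumor sample, then grow the subnetwork greedily along protein interactions as long as this "activity" separates the two classes better. The sketch below makes several assumptions. Genes are row indices of an expression matrix, a Welch t statistic is used as the discriminative score, and growth stops when no neighbor improves that score. It illustrates the general idea only and is not presented as the exact procedure used in the work described in the talk.

```python
import numpy as np


def zscore_rows(expr):
    """Standardize each gene (row) across samples."""
    mu = expr.mean(axis=1, keepdims=True)
    sd = expr.std(axis=1, keepdims=True)
    sd[sd == 0] = 1.0  # avoid division by zero for constant genes
    return (expr - mu) / sd


def activity(z, members):
    """Subnetwork activity per sample: mean z-score of the member genes."""
    return z[sorted(members)].mean(axis=0)


def welch_t(act, labels):
    """Absolute Welch t statistic between samples labeled True and False."""
    a, b = act[labels], act[~labels]
    diff = abs(a.mean() - b.mean())
    spread = np.sqrt(a.var(ddof=1) / len(a) + b.var(ddof=1) / len(b))
    if spread == 0:
        return float("inf") if diff > 0 else 0.0
    return float(diff / spread)


def grow_subnetwork(expr, labels, neighbors, seed, max_size=10):
    """Greedily add interaction partners while the activity score improves.

    expr: (n_genes, n_samples) array; labels: boolean array (e.g. True = metastatic);
    neighbors: dict gene index -> iterable of interacting gene indices.
    """
    z = zscore_rows(np.asarray(expr, dtype=float))
    members = {seed}
    best = welch_t(activity(z, members), labels)
    while len(members) < max_size:
        frontier = {n for g in members for n in neighbors.get(g, ())} - members
        if not frontier:
            break
        scored = {n: welch_t(activity(z, members | {n}), labels) for n in frontier}
        cand = max(scored, key=scored.get)
        if scored[cand] <= best:
            break  # no interaction partner improves class separation
        members.add(cand)
        best = scored[cand]
    return members, best
```

Averaging over interacting genes lets consistent but individually weak signals reinforce each other, which is one intuition for why network-based markers can outperform single-gene markers.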

While superposing parallel datasets leads to a ‘vertical’ integration of networks, Marian Walhout presented an approach to integrate transcriptional and miRNA-dependent regulatory links ‘horizontally’ and to map a composite transcription factor/miRNA regulatory network in Caenorhabditis elegans. In this elegant work, the yeast one-hybrid assay was used as a gene-centric screening method to identify regulatory links between hundreds of transcription factors and the promoters of both miRNA genes and genes encoding transcription factors. Closing the loop, the network was completed by computationally predicting the transcription factors potentially targeted by miRNAs. Interestingly, the resulting network contained numerous composite motifs, including negative feedback loops (TF → miR –| TF), which are otherwise under-represented in purely transcriptional regulatory networks.
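
Once both kinds of links are available, finding the TF → miR –| TF motif amounts to a simple join between the two edge lists. A minimal sketch follows. It assumes the transcriptional links (e.g. from yeast one-hybrid) and the predicted miRNA targets are stored as dictionaries of sets. All gene names in the example are placeholders.

```python
def feedback_loops(tf_targets, mirna_targets):
    """Return (tf, mirna) pairs forming a TF -> miR -| TF negative feedback loop.

    tf_targets:    dict TF -> set of genes whose promoters it binds
    mirna_targets: dict miRNA -> set of TF genes it is predicted to repress
    """
    loops = []
    for tf, targets in tf_targets.items():
        for mir in targets:
            # Non-miRNA targets are simply absent from mirna_targets.
            if tf in mirna_targets.get(mir, set()):
                loops.append((tf, mir))
    return sorted(loops)
```

Testing whether such loops are over- or under-represented would additionally require comparison with suitably randomized networks, which this sketch omits.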

Completing network models may require tedious and repetitive work. To the question “who will fill the gaps?”, Steve Oliver replied: “a Robot Scientist”. He showed that an actual implementation of such a robot can iteratively use a computational model of the yeast metabolic network to automatically design informative experiments, perform them, and use the results to extend the model. In an effort to provide a genome-scale overview of the molecular interactions that underlie the regulation of gene expression, Tim Hughes presented a variety of microarray-based technologies to systematically map transcription factor-DNA, nucleosome-DNA and protein-RNA interactions. The protein-RNA results were particularly intriguing, given that high-throughput identification of the targets of RNA-binding proteins remains a relatively unexplored route that may reveal novel insights into the complexity of post-transcriptional regulation.
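
The core of such a closed loop can be caricatured as hypothesis elimination. Each candidate hypothesis predicts the outcome of each possible experiment. The next experiment chosen is the one whose predicted outcome is most uncertain across the surviving hypotheses (highest entropy under a uniform prior), and hypotheses contradicted by the observed result are discarded. The sketch below is a toy version under those assumptions. It ignores experiment cost and measurement noise and is not the Robot Scientist's actual implementation. The gene names and deletion strains in the example are hypothetical.

```python
import math
from collections import Counter


def expected_information(experiment, hypotheses, predict):
    """Entropy (bits) of the predicted outcome under a uniform prior over hypotheses."""
    counts = Counter(predict(h, experiment) for h in hypotheses)
    n = len(hypotheses)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def discovery_loop(hypotheses, experiments, predict, run_experiment, max_rounds=20):
    """Pick the most informative experiment, run it, and discard refuted hypotheses."""
    surviving = list(hypotheses)
    remaining = list(experiments)
    for _ in range(max_rounds):
        if len(surviving) <= 1 or not remaining:
            break
        best = max(remaining, key=lambda e: expected_information(e, surviving, predict))
        if expected_information(best, surviving, predict) == 0:
            break  # no remaining experiment can discriminate between survivors
        remaining.remove(best)
        observed = run_experiment(best)
        surviving = [h for h in surviving if predict(h, best) == observed]
    return surviving


# Toy example: which (hypothetical) gene encodes an enzyme needed for growth?
# Each experiment deletes a set of genes; growth is predicted if the gene survives.
if __name__ == "__main__":
    genes = ["g1", "g2", "g3", "g4"]
    strains = [frozenset({"g1", "g2"}), frozenset({"g1", "g3"}), frozenset({"g4"})]
    predict = lambda h, deleted: h not in deleted
    truth = "g3"  # hidden answer that the simulated lab "measures"
    print(discovery_loop(genes, strains, predict, lambda d: truth not in d))
```

With these inputs the loop needs two deletion experiments to narrow four candidate genes down to one.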

To conclude on a somewhat different note, it was also interesting to observe that an increasing number of studies were accompanied by extensive web resources providing access to the respective datasets:

Resource                       Lab
PhosphoPep                     R Aebersold
Human Protein Atlas            M Uhlen
3Dcomplexes.org                S Teichmann
Nature Cell Migration Gateway  J Brugge
EDGEdb.org                     M Walhout
CellCircuits                   T Ideker
STRING                         C von Mering

This situation underscores the need for a proper infrastructure to host and share (or publish?) large datasets in biology, and the central role of web technologies in this regard. In view of the proliferation of biological databases, I wonder whether it might be helpful to have general recommendations on some minimal requirements for this type of database, e.g. searching, visualization and data integration functionalities, the availability of (web) APIs, download of datasets, the possibility to integrate external datasets, and so on? Or would something like a ‘Minimum Information About a Biological Database’ perhaps be useful to specify the capabilities of databases? One may also dream that these databases will become progressively interoperable and eventually include web-based APIs facilitating programmatic access to the information stored, ultimately sending Omics into the Cloud.


And, oh yes, the slopes were very nice, even though, I have to admit, the air was thin and a little fresh…