Is rare always real?

Deep sequencing of metagenomic samples with next-generation technology has revealed a richness and diversity of microbial species that far exceeds previous expectations.

In 2006 Mitchell Sogin presented the concept of a rare biosphere: low-abundance microbial populations that have persisted over large evolutionary time scales. Sogin described it as a ‘potentially inexhaustible reservoir of genomic innovation’ that lets microbial communities recover from environmental assault and adapt to changed circumstances.

How do scientists determine microbial diversity? Sequence reads covering the hypervariable regions of 16S rRNA genes are often used for classification. These ‘pyrotags’ (so named because they are obtained by pyrosequencing on Roche’s 454 sequencer) can be classified by matching them against an rRNA sequence database.
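To make the database-matching idea concrete, here is a minimal sketch in Python. It is only an illustration under strong assumptions: reads and reference sequences are taken to be already aligned and of equal length, similarity is a plain per-position identity, and the `classify` helper, the 97% cutoff and the toy reference entries are all hypothetical, not taken from any real classifier or rRNA database.

```python
def identity(a: str, b: str) -> float:
    """Fraction of matching positions between two pre-aligned, equal-length sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and aligned to the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def classify(read: str, reference: dict, min_identity: float = 0.97):
    """Assign a read to the best-matching reference taxon.

    Returns (taxon, identity). If the best match falls below
    min_identity (an assumed cutoff), the read is left unclassified.
    """
    best_taxon, best_score = None, 0.0
    for taxon, ref_seq in reference.items():
        score = identity(read, ref_seq)
        if score > best_score:
            best_taxon, best_score = taxon, score
    if best_taxon is not None and best_score >= min_identity:
        return best_taxon, best_score
    return "unclassified", best_score


# Toy reference "database": made-up sequences, not real 16S data.
reference = {
    "taxon_A": "ACGTACGTACGTACGTACGT",
    "taxon_B": "TTGTACCTACGAACGTACGA",
}

if __name__ == "__main__":
    print(classify("ACGTACGTACGTACGTACGT", reference))  # exact match to taxon_A
    print(classify("ACGTACGTACGTACGTACGA", reference))  # one mismatch in 20 positions
```

A read with no sufficiently close reference simply comes back as unclassified, which is the situation the next paragraph describes for most microbial species.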

The challenge is that for most microbial species the 16S rRNA gene sequence is not known. An alternative is shotgun sequencing and grouping of reads into operational taxonomic units (OTUs), that is, clusters of sequences that differ from neighboring clusters by a defined amount; commonly a sequence difference of 3% is required at the species level and 5% at the genus level.
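As a rough illustration of how a difference threshold turns reads into operational taxonomic units (OTUs), the sketch below uses a simple greedy, seed-based clustering. This is one of several possible approaches and is not necessarily what any given study used. It assumes pre-aligned reads of equal length and a per-position mismatch fraction as the distance; the function names and the toy reads are hypothetical, and whether a read at exactly the threshold joins a cluster is an arbitrary choice made here.

```python
def distance(a: str, b: str) -> float:
    """Fraction of differing positions between two pre-aligned, equal-length reads."""
    if len(a) != len(b) or not a:
        raise ValueError("reads must be non-empty and aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def cluster_otus(reads, threshold=0.03):
    """Greedy clustering: each read joins the first OTU whose seed is within
    `threshold` of it; otherwise it becomes the seed of a new OTU."""
    otus = []  # each OTU is a list of reads; element 0 is the seed
    for read in reads:
        for otu in otus:
            if distance(read, otu[0]) <= threshold:
                otu.append(read)
                break
        else:
            otus.append([read])
    return otus


# Toy reads of length 100 (made up): 0, 2 and 5 differences from the first read.
reads = [
    "A" * 100,
    "A" * 98 + "CC",
    "A" * 95 + "CCCCC",
]

if __name__ == "__main__":
    print(len(cluster_otus(reads, threshold=0.03)))  # species-level cutoff
    print(len(cluster_otus(reads, threshold=0.05)))  # genus-level cutoff
```

With these toy reads the 3% cutoff yields two OTUs and the 5% cutoff yields one. It also shows why read accuracy matters: a handful of miscalled bases can push a read past the cutoff and seed an OTU of its own.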

But with predictions of an ever larger rare biosphere, voices of caution are starting to be heard, saying that its size is overestimated. Is everything that is classified as a new and rare species indeed real, or simply the product of sequencing errors?

Most recently we featured an error correction program in our own pages that addressed this issue, and a note of caution by Phil Hugenholtz and colleagues about ‘Wrinkles in the rare biosphere’ just appeared in “Early View” in Environmental Microbiology.

How well founded are the concerns of an artificially inflated rare biosphere? Are current estimates of the rare biosphere really 10-fold too high? If so, what are the consequences — does it mean that the rare biosphere is far less important than assumed? Is it not such an inexhaustible reservoir after all but just background noise?

We understand that these are very controversial issues and we would love to hear from you.

Metagenomics versus Moore’s law

Moore’s law refers to the trend observed in computing hardware that the number of transistors on a computer chip doubles about every two years, thus effectively doubling computing power. This has been considered quite a rapid increase.
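For a sense of what a two-year doubling time means over longer periods, the arithmetic can be sketched in a few lines of Python. The function name is hypothetical, and the doubling time is left as a parameter so that readers can plug in their own estimate for other technologies.

```python
def growth_factor(years: float, doubling_time_years: float = 2.0) -> float:
    """Fold increase after `years`, given a fixed doubling time.

    The default of 2 years is the Moore's law figure quoted above.
    """
    if doubling_time_years <= 0:
        raise ValueError("doubling time must be positive")
    return 2 ** (years / doubling_time_years)


if __name__ == "__main__":
    for years in (2, 10, 20):
        print(years, "years:", growth_factor(years), "fold")
```

Doubling every two years compounds to a 32-fold increase over a decade and a 1,024-fold increase over two decades.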

However, this increase pales in comparison to recent and continuing advances in the throughput of DNA sequencing technology, which have resulted in an astonishing increase in the production of DNA sequence by biologists. This is certainly true in the field of metagenomics, which involves shotgun sequencing of the genomes (or transcriptomes) of all the organisms in an environmental sample. Biologists are adopting this technology at a rate that was completely unanticipated by most people in the field. As a result, comprehensive analysis of the resulting sequences, which is far more complex than analysis of single-genome sequence, is becoming computationally intractable with existing resources and pipelines. The Joint Genome Institute’s call for large-scale (Terabase) “Grand Challenge” metagenomic projects highlights the scale of datasets that people are now discussing.

The editorial in the September issue of Nature Methods discusses this situation and calls for concerted efforts to ameliorate the metagenome-analysis gridlock that appears imminent. The recently formed M5 (metagenomics, metadata, metaanalysis, multiscale-models and metainfrastructure) Consortium will be proposing a promising solution, the ‘M5 Platform’, later this year. We hope these efforts will find support and be successful at ensuring this deluge of valuable data is analyzed efficiently and productively.

Delay in delivery of Nature Methods in Italy

Our print subscribers in Italy will unfortunately experience a delay in receiving their print copies of the August edition of Nature Methods. Regrettably, all 2,000 copies delivered to Italy were stolen and haven’t been recovered. We are working to have the issue reprinted and delivered as soon as possible.

We often receive comments from our print subscribers that their copy of Nature Methods tends to get pilfered from their mailbox or desk, but we certainly never expected to witness a theft of this scale. We are doubtful that demand for the journal is such that a black market has developed for copies at cut-rate prices, but it is humorous to imagine what the culprits’ response was when they opened the boxes.

Top downloads for July ’09

Two Correspondences made the list of top downloads for July, coming in at #3 and #4, demonstrating that while this format may not report new methods, it does have information of high interest to readers. The two top downloads seem to highlight a high level of interest in assaying single cells and using FRET to examine protein dynamics.

Top 8 research papers published in the July issue

1. Quantitative analysis of gene expression in a single cell by qPCR

2. Mapping the structure and conformational movements of proteins with transition metal ion FRET

3. Limitations and possibilities of small RNA digital gene expression profiling

4. Enabling IMAC purification of low abundance recombinant proteins from E. coli lysates

5. In vivo fluorescence imaging with high-resolution microlenses

6. Protein interaction platforms: visualization of interacting proteins in yeast

7. Agouti C57BL/6N embryonic stem cells for mouse genetic resources

8. Reaching the protein folding speed limit with large, sub-microsecond pressure jumps

There has been very little movement in the ten most popular papers published prior to our July issue and downloaded during July. The exception is the HUPO test sample paper from June, which appears at #7. The papers below this have changed, but mostly because a large number of papers have very similar download numbers and shuffle around from month to month.

Top 10 research papers published prior to the July issue

1. Mapping and quantifying mammalian transcriptomes by RNA-Seq

2. mRNA-Seq whole-transcriptome analysis of a single cell

3. Universal sample preparation method for proteome analysis

4. Stem cell transcriptome profiling via massive-scale mRNA sequencing

5. Isolation of human iPS cells using EOS lentiviral vectors to select for pluripotency

6. Genome-wide profiles of STAT1 DNA association using chromatin immunoprecipitation and massively parallel sequencing

7. A HUPO test sample study reveals common problems in mass spectrometry–based proteomics

8. Generation of transgene-free induced pluripotent mouse stem cells by the piggyBac transposon

9. Amplification-free Illumina sequencing-library preparation facilitates improved mapping and assembly of (G+C)-biased genomes

10. A versatile tool for conditional gene expression and knockdown

New methods in the literature

Our August issue went live last week; check out the Research Highlights section for a few “news” stories about interesting new methods described in the literature over the last month or two. Unfortunately we cannot highlight every interesting methods paper we find in the pages of the journal, so check out some of the others we considered that didn’t quite make the cut.

Proteomic analysis of S-nitrosylation and denitrosylation by resin-assisted capture

Nature Biotechnology 27, 557-559 (2009)

Genome-Wide Identification of Human RNA Editing Sites by Parallel DNA Capturing and Sequencing

Science 324, 1210-1213 (2009)

Validated germline-competent embryonic stem cell lines from nonobese diabetic mice

Nature Medicine 15, 814-818 (2009)

Unfolding Individual Als5p Adhesion Proteins on Live Cells

ACS Nano 3, 1677-1682 (2009)

Development of aliphatic biodegradable photoluminescent polymers

PNAS 106, 10086-10091 (2009)

Directing cell motions on micropatterned ratchets

Nature Physics 5, 606-612 (2009)

Sequence and structural variation in a human genome uncovered by short-read, massively parallel ligation sequencing using two-base encoding

Genome Research, published online 22 June 2009

Twin-spot MARCM to reveal the developmental origin and identity of neurons

Nature Neuroscience 12, 947-953 (2009)

The establishment of gene silencing at single-cell resolution

Nature Genetics 41, 800-806 (2009)

DNA relaxation dynamics as a probe for the intracellular environment

PNAS 106, 9250-9255 (2009)

Interactive exploration of chemical space with Scaffold Hunter

Nature Chemical Biology 5, 581-583 (2009)

Bioactivity-guided mapping and navigation of chemical space

Nature Chemical Biology 5, 585-592 (2009)