The GEBA-MDM project overview

Reconstructing bacterial and archaeal genomes has revolutionized our understanding of microbial metabolism and evolutionary processes, and has significantly accelerated discoveries in bioprospecting. More than 5,000 bacterial and archaeal genomes have now been sequenced worldwide, yet the great majority of them cover only limited phylogenetic diversity. This observation motivated the Genomic Encyclopedia of Bacteria and Archaea (GEBA) Project [1], which populated the tree of life with phylogenetically diverse reference genomes. While the bulk of all sequenced microbial genomes are derived from cultivated representatives, the vast majority of microorganisms elude current culturing attempts, severely limiting the ability to recover complete or even partial genomes of these largely mysterious species. Following in the footsteps of the GEBA project, we initiated the Microbial Dark Matter (MDM) Project in 2011 to massively expand genomic representation by targeting 200 representatives of candidate phyla (phyla proposed on the basis of environmental sequences that have no cultivated representatives) via high-throughput single-cell sequencing.

The insights derived from the ~200 MDM genomes belonging to 29 major uncharted branches of the tree of life were invaluable [2], providing a first glimpse into the coding potential and the phylogeny of candidate phyla. The single-cell data enabled us to resolve numerous intra- and inter-phylum level relationships and to propose new superphyla. In addition, we named 18 candidate phyla for which we greatly expanded sequence space. We discovered unique genomic and metabolic features such as a previously unseen codon reassignment for the opal stop codon, an archaeal-type purine synthesis in Bacteria, and complete bacterial-like sigma factors in Archaea. The single-cell genomes also improved the binning of metagenome data by allowing reads to be assigned to an organism, facilitating our ability to interpret sequence data from diverse environments, which will be of tremendous value for future microbial ecology and evolutionary studies.

The proposed phase II of the MDM project is designed to further deepen our understanding of microbial dark matter by targeting 1,000 genomes from candidate phyla. Habitats of high phylogenetic diversity (PD), as determined by SSU rRNA surveys, will be selected for single-cell sorting to recover single amplified genomes (SAGs) belonging to candidate phyla. We expect this to be a 2-year project that will require a total sequencing effort of 3 flowcells for a total of ~1.1 Tbp (assuming a pooling strategy of 48 SAGs per HiSeq channel). To maximize phylogenetic coverage, we propose to target only taxonomic groups within candidate phyla that have no or few sequenced representatives; the selection of these groups will in part be driven by the outcome of the single-cell sorts. For some taxonomic groups we aim to specifically target and sequence populations.
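
The sequencing estimate above can be checked with a short back-of-the-envelope calculation. The sketch below is illustrative only: it assumes that a "channel" corresponds to one flowcell lane and that a flowcell has 8 lanes, neither of which is stated in this overview, and the function and variable names are hypothetical.

```python
import math


def sequencing_plan(n_sags, sags_per_lane, lanes_per_flowcell, total_bp):
    """Estimate lanes, flowcells and per-SAG yield for a pooled SAG run."""
    # Lanes needed to fit all SAGs at the stated pooling level.
    lanes = math.ceil(n_sags / sags_per_lane)
    # Whole flowcells needed to provide that many lanes.
    flowcells = math.ceil(lanes / lanes_per_flowcell)
    # Average raw sequence per SAG if the total yield is split evenly.
    bp_per_sag = total_bp / n_sags
    return {"lanes": lanes, "flowcells": flowcells, "bp_per_sag": bp_per_sag}


plan = sequencing_plan(
    n_sags=1000,            # phase II target (from the text)
    sags_per_lane=48,       # pooling strategy (from the text)
    lanes_per_flowcell=8,   # ASSUMPTION: not stated in the text
    total_bp=1.1e12,        # ~1.1 Tbp total effort (from the text)
)
print(plan)
```

Under these assumptions, 1,000 SAGs at 48 per lane occupy 21 lanes, which rounds up to the 3 flowcells quoted above, and the ~1.1 Tbp total corresponds to roughly 1.1 Gbp of raw sequence per SAG on average.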

1. Wu, D. et al. A phylogeny-driven genomic encyclopaedia of Bacteria and Archaea. Nature 462, 1056-1060 (2009).

2. Rinke, C. et al. Insights into the phylogeny and coding potential of microbial dark matter. Nature (2013).