Metagenomics versus amplicon sequencing
The difference between metagenomics and amplicon sequencing, i.e. why a 16S rRNA survey is NOT metagenomics.
This post was triggered by a manuscript in which the authors referred to their 16S rRNA gene survey as “amplicon-metagenomics” and “amplicon-based metagenomics”. A subsequent Google search revealed that these misleading terms have been used in several scientific papers, including in Nature journals, e.g. (Xu et al. 2015). The range of chimeric terms invented for 16S amplicon studies includes misnomers such as “16S rRNA gene-based metagenomic analysis” and “16S rRNA metagenomics-based survey”.
A large sequencing company takes the cake by jumping on this misleading bandwagon and naming their app, designed to classify 16S rRNA amplicon reads, the “16S Metagenomics app”. They also provide their own curated version of the public Greengenes 16S database and call it a “16S Metagenomics Database”.
Differences between metagenomics and amplicon sequencing
Let us give these colleagues the benefit of the doubt and assume they are not familiar with the differences between metagenomics and amplicon sequencing. Indeed, there are similarities, since both terms refer to techniques designed to bypass the traditional culturing bottleneck and hence work on DNA extracted directly from the environment. However, the differences in (1) underlying methodology, (2) data utilisation, and (3) definitions are substantial.
1) Methodology
In contrast to amplicon sequencing, which targets only a selected gene such as the 16S rRNA gene, metagenomics targets all DNA in a sample, with the aim of recovering genomes or at least large fragments thereof. Nowadays shotgun sequencing, a technique to sequence random DNA fragments, has become the main workhorse in metagenomics. Shotgun metagenomics results in short DNA sequences (reads), which are bioinformatically assembled into larger fragments (contigs and scaffolds). These fragments are then binned to recover genomes, termed metagenome-assembled genomes (MAGs). Recently, long-read sequencing technologies (e.g. by Oxford Nanopore and PacBio) have been paired with shotgun sequencing to generate MAGs of superior quality, but that is another story.
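The methodological contrast can be illustrated with a toy simulation. This is only a sketch under stated assumptions, not a sequencing pipeline: the genome length, the marker gene position, the read length and the read counts are made-up values, the function names are hypothetical, and the amplicon is assumed to be exactly one read long.

```python
import random


def shotgun_reads(genome_len, n_reads, read_len, rng):
    """Shotgun sequencing: each read starts at a random genome position."""
    return [rng.randrange(0, genome_len - read_len + 1) for _ in range(n_reads)]


def amplicon_reads(marker_start, n_reads):
    """Amplicon sequencing: every read starts at the same primer site."""
    return [marker_start] * n_reads


def covered_fraction(starts, read_len, genome_len):
    """Fraction of genome positions covered by at least one read."""
    covered = [False] * genome_len
    for start in starts:
        for pos in range(start, min(start + read_len, genome_len)):
            covered[pos] = True
    return sum(covered) / genome_len


if __name__ == "__main__":
    GENOME_LEN = 10_000   # assumed toy genome size (bp)
    READ_LEN = 250        # assumed read length (bp)
    N_READS = 400         # same sequencing effort for both approaches
    MARKER_START = 4_000  # assumed position of the single marker gene

    rng = random.Random(0)
    shotgun = shotgun_reads(GENOME_LEN, N_READS, READ_LEN, rng)
    amplicon = amplicon_reads(MARKER_START, N_READS)

    print("shotgun coverage: ", covered_fraction(shotgun, READ_LEN, GENOME_LEN))
    print("amplicon coverage:", covered_fraction(amplicon, READ_LEN, GENOME_LEN))
```

With the same number of reads, the shotgun reads spread across nearly the whole toy genome, whereas the amplicon reads pile up on the one targeted region and leave the rest of the genome unobserved.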
2) Data utilisation
Metagenomics can provide a result similar to that of an amplicon survey, e.g. read aligners can extract 16S rRNA marker gene reads from metagenomic datasets for taxonomic assignments. However, in addition to this, the recovered genome information from metagenomic datasets permits a much more detailed and informative analysis. Taxonomic assignments from metagenomic datasets can achieve species and even strain-level resolution due to the use of up to a million marker genes (Truong et al. 2015). Gene annotations allow the detection of encoded enzymes and pathways, which in turn enables the inference of metabolic traits that can provide insights into the ecological functions of the recovered genomes.
Hence, metagenomics has been used to link microbial dynamics to biogeochemical cycles, decipher evolutionary processes shaping entire microbiomes (e.g. see Grossart et al. 2019), investigate microbial roles in functional ecology, apply genome-based culturing approaches and much more.
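How gene annotations lead to inferred metabolic traits can be sketched in a few lines. The example below is a deliberately simplified illustration: the two pathway definitions are reduced to a handful of well-known genes, the function names and the example gene lists are hypothetical, and real annotation pipelines rely on far larger pathway databases and account for alternative enzymes.

```python
# Heavily simplified pathway definitions, for illustration only.
PATHWAYS = {
    "nitrogen fixation": {"nifH", "nifD", "nifK"},
    "denitrification": {"narG", "nirK", "norB", "nosZ"},
}


def pathway_completeness(annotated_genes, pathways=PATHWAYS):
    """Return, per pathway, the fraction of its genes found in the annotations."""
    genes = set(annotated_genes)
    return {
        name: len(required & genes) / len(required)
        for name, required in pathways.items()
    }


def inferred_traits(annotated_genes, threshold=1.0, pathways=PATHWAYS):
    """List pathways whose completeness reaches the (assumed) threshold."""
    completeness = pathway_completeness(annotated_genes, pathways)
    return sorted(name for name, frac in completeness.items() if frac >= threshold)


if __name__ == "__main__":
    # Hypothetical annotations of one metagenome-assembled genome (MAG).
    mag_genes = ["nifH", "nifD", "nifK", "narG"]
    # A 16S amplicon survey only ever yields the marker gene itself.
    amplicon_genes = ["16S rRNA"]

    print(pathway_completeness(mag_genes))
    print(inferred_traits(mag_genes))
    print(inferred_traits(amplicon_genes))
```

The amplicon case returns no traits at all, because a single marker gene carries no information about the enzymes encoded elsewhere in the genome.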
3) Definitions
Metagenomics, derived from the term “Metagenome” first used by Handelsman (1998), has been defined as “the genomic analysis of a population of microorganisms” (Handelsman 2004). The authors go on to say that this “direct isolation of genomic DNA from an environment circumvents culturing the organisms under study.”
More recently, Marchesi and Ravel (2015) proposed an updated definition of metagenomics as “the process used to characterize the metagenome, from which information on the potential function of the microbiota can be gained.” The authors also define the metagenome as the “collection of genomes and genes from the members of a microbiota, which is obtained through shotgun sequencing of DNA extracted from a sample (metagenomics)”. Both definitions make explicit that amplicon sequencing, targeting only a selected set of genes, does not fulfil the criteria of metagenomics.
Based on the differences in 1) the applied methodology, 2) the resulting data/information, and 3) the definition of the term, metagenomics is clearly distinguished from amplicon sequencing. Therefore, disguising an amplicon study, e.g. a 16S survey, as a metagenomics analysis is not only bad practice, it also misleads the reader into believing that genomic data with all its possible inferences will be presented, when in fact the analysis is based on a single gene alone.
I hope this post clarifies what sets metagenomics apart from amplicon sequencing and will help to spread the use of the appropriate terms in the literature.
Q: What if I target multiple genes with amplicon sequencing? If I develop primers to amplify tens or even hundreds of genes, is it metagenomics then?