The difference between metagenomics and amplicon sequencing
(i.e. why a 16S survey is NOT metagenomics)
This post was triggered by a manuscript in which the authors referred to their 16S rRNA gene survey as “amplicon-metagenomics” and “amplicon-based metagenomics”. A subsequent Google search revealed that these misleading terms have been used in several scientific papers, including in Nature journals, e.g. Xu et al. (2015). The range of chimeric terms invented for 16S amplicon studies includes misnomers such as “16S rRNA gene-based metagenomic analysis” and “16S rRNA metagenomics-based survey”.

A large sequencing company takes the cake by jumping on this misleading bandwagon and naming their app, designed to classify 16S rRNA amplicon reads, the “16S Metagenomics app”. They even provide their own curated versions of the Greengenes 16S database under the name “16S Metagenomics Database”.

Let us give these scientists the benefit of the doubt and assume they are not familiar with the differences between metagenomics and amplicon sequencing. Indeed, there are similarities, since both terms refer to techniques designed to bypass the traditional culturing bottleneck and hence work on DNA extracted directly from the environment. However, the differences in (1) underlying methodology, (2) data utilisation, and (3) definitions are vast.

1) Methodology
Amplicon sequencing, also known as “metabarcoding” or “marker gene sequencing”, targets one or multiple genes via specific primers. The goal is to amplify only these selected genes for subsequent sequencing (e.g. on Illumina platforms). The most widely used gene is the 16S rRNA gene, which became the gold standard for bacterial and archaeal taxonomy assignments. Well, recently genome-based taxonomy has taken off, but that is a different story.

In contrast, metagenomics targets all DNA in a sample, with the aim of recovering genomes or at least large fragments thereof. Nowadays shotgun sequencing, a technique to sequence random DNA fragments, has become the main workhorse in metagenomics. Shotgun metagenomics results in short DNA sequences (reads), which are bioinformatically assembled into larger fragments (contigs and scaffolds), which are then binned to recover entire genomes, termed metagenome-assembled genomes (MAGs). Recently, long-read sequencing technologies (e.g. by Oxford Nanopore and PacBio) have been paired with shotgun sequencing to generate MAGs of superior quality, but that is again another story.
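To make the methodological contrast concrete, here is a minimal Python sketch on a made-up toy sequence. It is an illustration under stated assumptions, not a real pipeline: the sequence, the primers, and the function names (`amplicon`, `shotgun_reads`) are all hypothetical, and the reverse primer is written as it reads on the forward strand so that no reverse-complement handling is needed.

```python
import random

# Hypothetical toy sequence: flank + forward primer site + marker + reverse primer site + flank.
FWD_PRIMER = "AGAGTT"
MARKER = "TGATCCTGGCTCAG"
REV_PRIMER = "GGTTAC"  # assumption: given as it appears on the forward strand
TOY_GENOME = "TTGACCATCG" + FWD_PRIMER + MARKER + REV_PRIMER + "CCATGCAATCGGTA"


def amplicon(genome, fwd_primer, rev_primer):
    """Amplicon-style targeting: return only the region spanned by the two primer sites.

    Returns None if either primer site is not found (nothing is amplified).
    """
    start = genome.find(fwd_primer)
    if start == -1:
        return None
    end = genome.find(rev_primer, start + len(fwd_primer))
    if end == -1:
        return None
    return genome[start:end + len(rev_primer)]


def shotgun_reads(genome, read_len, n_reads, seed=0):
    """Shotgun-style sampling: return fixed-length fragments from random positions."""
    if read_len > len(genome):
        raise ValueError("read_len must not exceed the genome length")
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    last_start = len(genome) - read_len
    starts = [rng.randint(0, last_start) for _ in range(n_reads)]
    return [genome[s:s + read_len] for s in starts]


if __name__ == "__main__":
    print("amplicon:", amplicon(TOY_GENOME, FWD_PRIMER, REV_PRIMER))
    for read in shotgun_reads(TOY_GENOME, read_len=10, n_reads=5):
        print("shotgun read:", read)
```

In this sketch `amplicon` only ever returns the stretch between the two primer sites, whereas `shotgun_reads` draws fragments from anywhere in the sequence. That is the core of the methodological difference: one approach sees a selected gene, the other samples all the DNA.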

 

Image courtesy: astrobiomike.github.io/misc/amplicon_and_metagen

 

2) Data utilisation
Amplicon and metagenomics data are commonly used to answer different questions, e.g. in microbial ecology. The main use of amplicon sequencing data is to establish community profiles, e.g. microbial profiles based on 16S rRNA gene counts. Metagenomics can provide similar results, e.g. read aligners can extract 16S rRNA marker gene reads from metagenomic datasets for taxonomic assignments. However, in addition to this, the recovered genome information from metagenomic datasets permits a much more detailed and informative analysis. Taxonomic assignments from metagenomic datasets can achieve species and even strain-level resolution due to the use of up to a million marker genes (Truong et al. 2015). Gene annotations allow the detection of encoded enzymes and pathways, which in turn enables the inference of metabolic traits that can provide insights into the ecological functions of the recovered genomes. Hence, metagenomics has been used to link microbial dynamics to biogeochemical cycles, decipher evolutionary processes shaping entire microbiomes (e.g. see Grossart et al. 2019), investigate microbial roles in functional ecology, apply genome-based culturing approaches and much more.
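As a rough illustration of what a community profile based on 16S rRNA gene counts boils down to, the following Python sketch converts per-taxon read counts into relative abundances. The taxon names, the counts, and the function name `relative_abundance` are made up for the example. Real workflows add steps such as quality filtering and taxonomic classification that are not shown here.

```python
def relative_abundance(counts):
    """Convert per-taxon read counts into fractions that sum to 1."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no reads to profile")
    return {taxon: n / total for taxon, n in counts.items()}


# Hypothetical 16S read counts for one sample (illustrative numbers only).
example_counts = {"Bacteroides": 50, "Prevotella": 30, "Escherichia": 20}

if __name__ == "__main__":
    for taxon, fraction in relative_abundance(example_counts).items():
        print(f"{taxon}: {fraction:.2f}")
```

Note that such a profile says who is there and in what proportion, but nothing about encoded enzymes or pathways. That information only becomes available with the genome data that metagenomics recovers.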

3) Definition
Finally, the definition of the term metagenomics reflects the fact that this technique targets the entire genome and does not refer to the analysis of only one gene or a set of selected genes. Metagenomics, derived from the term “Metagenome” first used by Handelsman (1998), has been defined as “the genomic analysis of a population of microorganisms” (Handelsman 2004). The authors go on to say that this “direct isolation of genomic DNA from an environment circumvents culturing the organisms under study.” More recently, Marchesi and Ravel (2015) proposed an updated definition of metagenomics as “the process used to characterize the metagenome, from which information on the potential function of the microbiota can be gained.” The authors also define the metagenome as the “collection of genomes and genes from the members of a microbiota, which is obtained through shotgun sequencing of DNA extracted from a sample (metagenomics)”. Both definitions make explicit that amplicon sequencing, targeting only a selected set of genes, does not fulfil the criteria of metagenomics.

In summary, based on the differences in 1) the applied methodology, 2) the resulting data/information, and 3) the definition of the term, metagenomics is clearly distinguished from amplicon sequencing. Therefore, disguising an amplicon study, e.g. a 16S survey, as a metagenomics analysis is not only bad practice, it also misleads the reader into believing that genomic data with all its possible inferences will be presented, when in fact the analysis is based on a single gene alone.

I hope this post clarifies what sets metagenomics apart from amplicon sequencing and that it will help reduce the practice of using misleading characterisations for 16S rRNA gene-based and other amplicon studies.

 

Additional FAQs:

Q: What if I target multiple genes with amplicon sequencing? If I develop primers to amplify tens or even hundreds of genes, is it metagenomics then?
A: No. Amplifying hundreds of genes with targeted primers is still amplicon sequencing because you are targeting specific genes instead of the entire genome. It does not equal metagenomics, a technique based on shotgun sequencing.