CD Genomics-the genomics service company
Support Documents The CD Genomics Way of Thinking Explore the scientific documents we’ve developed, including sample submission guidelines, principles, applications, and bioinformatics of genetic technologies.
Home / Resource / Support Documents / Genome Research / Bioinformatics Analysis of 16S rRNA Amplicon Sequencing

Bioinformatics Analysis of 16S rRNA Amplicon Sequencing

This article provides a brief introduction to good practices for the bioinformatics analysis of 16S rRNA sequencing by NGS (next-generation sequencing). The bioinformatics pipeline involves two main stages: the preprocessing of data (quality control) and quantification (including taxonomic profiling and predictive metagenomics profiling).

Bioinformatics Analysis of 16S rRNA Amplicon Sequencing

Figure 1. Bioinformatics pipeline for NGS-based 16S rRNA amplicon sequencing (Mataragas et al. 2018).

Table 1. Software and statistical tests used in each stage of the pipeline (Mataragas et al. 2018).

Pipeline step Statistical test and software used Alternative software
Processing Qiime v.1.9.0 SILVAngs pipeline
BMPOS pipeline
Taxonomic profiling (OTUs) SILVAngs pipeline using the SILVA database BMPOS pipeline using the Greengenes databases
EzBioCloud database
One Codex pipeline
Statistical comparison of the metagenomic samples ANOSIM using the Past software Stamp
Microbial community overview Community-Analyzer
Stacked bars chart using GraphPad Prism software
Statistical significance of the identified OTUs METAGENassist MicrobiomeAnalyst
Symbiotic and antagonistic relationships within the microbial community Heatmap using the METAGENassist software MicrobiomeAnalyst
Predictive Metagenomics profiling (PMP) Tax4Fun Picrust
Statistical analysis of the PMP results Kruskal-Wallis H-test with Tuckey-Kramer corrected for multiple tests according to Benjamini-Hockberg
False Discovery Rate using the Stamp software
Orientation of the metagenomic samples of the most abundant KEGG pathways Principal Component Analysis (PCA) using the Past software MicrobiomeAnalyst
Metabolic interactions within the microbial community MMinte
  • Preprocessing to eliminate uninformative data

Removal of adapters, PCR primers, and low quality bases is a necessary step for quality control of sequences. And there are a variety of integrated tools have been developed for this purpose. ‘Q’ is the output quality score for Illumina platforms (Q10 represents 1 error is expected for every 10 bases; Q20 represents 1 error is expected for every 100 bases…). Elimination of sequences with low quality scores can improve the accuracy of bioinformatics analyses. Compared with shotgun sequencing, this is more significant for 16S rRNA amplicon sequencing. For 16S rRNA gene sequencing, it is supposed to set a quality threshold as high as possible and to trim sequences along the entire length.

  • Taxonomical classification of bacterial sequences

Prior to taxonomic classification, bacterial 16S rRNA genes are clustered by two main approaches. One is to cluster these sequences into phylotypes based on their similarity to reference database, the other is to cluster sequences into operational taxonomic units (OTUs) using a 97% similarity threshold, only according to their similarity. The available reference databases for annotation of 16S rRNA gene include Greengenes database, Ribosomal database project (RDP), SILVA, and Human microbiome project (HMP).

  • Beta (β) diversity to compare microbiomes

Beta (β) diversity measures the difference in bacterial community composition for different samples. Before quantifying β diversity, the read counts (reads mapped to each taxon) must be normalized to minimize the technical variability between samples. There are two common normalization procedures: the total sum and upper quartile normalization.

There are two main methods for quantifying β diversity: phylogenetic β diversity that considers the evolutionary differences between communities (such as UniFrac), and non-phylogenetic or taxon-based methods (such as Bray-Curtis dissimilarity). Once distances or dissimilarities between samples have been determined, they can be ordinated in a low-dimensional space to better illustrate how closely related they are to each other. The two most commonly used ordination tools are principal coordinate analyses (PCoA) and non-metric multidimensional scaling (NMDS).

Bioinformatics Analysis of 16S rRNA Amplicon Sequencing

Figure 2. NMDS and PCoA for quantification of beta diversity (Jovel et al. 2016).

  • Predictive metagenomics profiling

OTU abundance table can be further used to presume for metabolic functions. It is a process to understand the role of the microbiome on host metabolism and disease. There are currently three powerful tools for predictive metagenomics profiling (PMP): PICRUSt, Tax4Fun, and Piphillin.

Future perspectives

16S rRNA amplicon sequencing is popular due to its cost-efficient, time-effective, and informative features. But it is also limited by several disadvantages. First, 16S is well suited for multiple patients, longitudinal studies, but provides limited taxonomic and functional information. Second, The PCR amplification of different regions of 16S rRNA gene may generate discordant results owing not only to the distinct binding affinities for the corresponding flanking conserved regions, but also owing to the resolution of each variable region across taxa. Therefore, full-length 16S rRNA sequencing or shotgun metagenomics may sometimes be more favorable, especially the latter.


  1. Mataragas M, Alessandria V, Ferrocino I, et al. A bioinformatics pipeline integrating predictive metagenomics profiling for the analysis of 16S rDNA/rRNA sequencing data originated from foods. Food Microbiology, 2018.
  2. Yang B, Wang Y, Qian P Y. Sensitivity and correlation of hypervariable regions in 16S rRNA genes in phylogenetic analysis. BMC bioinformatics, 2016, 17(1): 135.
  3. Jovel J, Patterson J, Wang W, et al. Characterization of the gut microbiome using 16S or shotgun metagenomics. Frontiers in microbiology, 2016, 7: 459.

What would you like to discuss?

With whom will we be speaking?

Please input "genomics" as verification code.

* is a required item.

Get cutting-edge science information from CD Genomics sent straight to your inbox every month.


45-1 Ramsey Road, Shirley, NY 11967, USA
Tel: 1-631-275-3058
Fax: 1-631-614-7828