16S rRNA Amplicon Sequencing for Gut, Oral, and Environmental Microbiome Analysis: From Sample to Community Profile

You have your samples in the freezer. You know 16S rRNA amplicon sequencing is the right tool — not shotgun metagenomics, not qPCR, not culturomics. But the gap between "I know I need 16S" and "my data is in hand and makes biological sense" is wider than most protocols admit. Which hypervariable region? How many reads? Biological replicates or just technical? And what do you do when your soil sample yields 12 ng of DNA, half of it humic acid?

This article walks through the decisions that determine whether a 16S project produces meaningful community profiles or uninterpretable noise. We organize the discussion by sample type — gut, oral, and environmental — because the right answer to nearly every methodological question depends on where your samples come from.

The 16S rRNA Gene as a Molecular Clock

The 16S ribosomal RNA gene is roughly 1,500 base pairs long and contains nine hypervariable regions (V1 through V9) interspersed among highly conserved stretches. The conserved regions serve as universal primer binding sites; the variable regions provide phylogenetic signal. No single variable region captures the full taxonomic resolution the gene can offer.

V3-V4 is the de facto standard for gut microbiome studies. It spans roughly 460 bp and captures enough variation to resolve most genera and some species within the major gut phyla: Firmicutes, Bacteroidota, Actinobacteriota, and Proteobacteria. The shorter V4 amplicon is the main alternative: the Earth Microbiome Project standardized on the 515f/806r primer pair targeting V4, and tens of thousands of publicly available datasets use this amplicon, making V4 the most cross-comparable choice for gut work.

For oral microbiome samples, V1-V3 consistently outperforms V3-V4. The oral cavity is dominated by streptococci, and species-level discrimination within Streptococcus requires the hypervariability captured by V1 and V2. A 2025 simulation study across oral taxa found that V2-V3 alone identified 135 species, while pooling multiple amplicon regions reached 204 species, but V3-V4 alone missed key oral pathogens. If your study involves subgingival plaque, saliva, or tongue dorsum samples and you care about species-level assignments, choose V1-V3.

Full-length 16S sequencing — amplifying the entire ~1,500-bp gene via PacBio HiFi or Oxford Nanopore — raises species-level classification from roughly 48% (Illumina V3-V4) to 63-76%. For research applications requiring species-level pathogen discrimination — such as distinguishing Staphylococcus aureus from S. epidermidis in culture collections or research cohorts — that gap matters. The trade-off is per-sample cost: short-read amplicon sequencing can pool hundreds of samples in a single MiSeq run, while long-read runs typically accommodate fewer samples at higher per-read cost.

Figure 1: 16S rRNA Gene Structure and Hypervariable Region Coverage (V1-V9 hypervariable regions, with amplicon coverage of the V3-V4, V1-V3, and full-length strategies)

End-to-End 16S Sequencing Workflow

A 16S project moves through five stages: sample collection and preservation, DNA extraction, library construction, sequencing, and bioinformatic analysis. Each stage creates opportunities for bias — some addressable, some you have to live with and document.

Sample Collection and Preservation

The single most consequential variable is how fast microbial metabolism stops after collection. For fecal samples, immediate freezing at -80°C remains the gold standard, but DNA/RNA Shield or 95% ethanol at room temperature preserves community composition adequately for 16S when cold chain logistics are unavailable. Oral swabs and subgingival plaque samples degrade faster — aim to freeze or immerse in stabilization buffer within 30 minutes of collection. For environmental water samples, filter immediately on-site and freeze filters; for soil, sieve to 2 mm and freeze or dry within hours.

DNA Extraction

Extraction method introduces more compositional bias than any other wet-lab step. Bead-beating lyses Gram-positive cell walls more effectively than enzymatic lysis alone, but the bead size and beating duration shift the resulting community profile. The practical rule: pick one extraction kit and use it for every sample in your study. Do not mix kits, and do not switch kit lot numbers mid-project without running a side-by-side comparison on a subset of samples.

Yield requirements: most library preparation protocols ask for 1-5 ng/μL of DNA at a minimum volume of 10-20 μL. Fluorometric quantification (Qubit or PicoGreen) is essential — NanoDrop alone overestimates concentration in the presence of RNA, salts, or humic acids. For environmental samples, OD 260/230 below 1.5 signals humic acid carryover, which inhibits downstream PCR. A post-extraction cleanup with SPRI beads or a commercial inhibitor removal kit can salvage borderline samples.
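The thresholds in the paragraph above can be turned into a simple pre-submission check. The sketch below is only illustrative: the function name, the flag labels, and the exact cutoffs (1 ng/μL, 10 μL, 260/230 of 1.5) are assumptions taken from the low end of the ranges quoted here, and the concentration is assumed to come from a fluorometric reading.

```python
def qc_dna(conc_ng_ul, volume_ul, a260_230,
           min_conc=1.0, min_volume=10.0, min_260_230=1.5):
    """Return a list of QC flags for one DNA extract (empty list = pass).

    conc_ng_ul is assumed to be a Qubit/PicoGreen value, not NanoDrop.
    """
    flags = []
    if conc_ng_ul < min_conc:
        flags.append("low_concentration")
    if volume_ul < min_volume:
        flags.append("low_volume")
    if a260_230 < min_260_230:
        # Low 260/230 is the humic-acid warning sign described above.
        flags.append("possible_humic_carryover")
    return flags


# Hypothetical extracts: a difficult soil sample and a routine stool sample.
soil = qc_dna(conc_ng_ul=0.8, volume_ul=15, a260_230=1.1)
stool = qc_dna(conc_ng_ul=25.0, volume_ul=50, a260_230=2.0)
print(soil)   # flags raised for the soil extract
print(stool)  # empty list: passes all three checks
```

A flagged sample is not necessarily lost; a flag is a prompt to try the cleanup step before committing the sample to library preparation.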

Library Construction: Single vs. Dual-Index Barcoding

Library preparation appends sample-specific barcodes (indexes) and sequencing adapters to the amplicon. Dual-index barcoding, where unique i5 and i7 index pairs identify each sample, is now standard for any project pooling more than 48 samples on one sequencing lane. With single-index schemes, index-switching artifacts cannot be detected and filtered out: typically 0.1-0.5% of reads get misassigned to the wrong sample, which inflates spurious ASV counts in low-biomass samples that share a flow cell with high-biomass samples.

When pooling samples for sequencing, two additional considerations apply. First, balance the DNA input across samples: a 96-well plate where well A1 has 50 ng and well H12 has 2 ng will produce dramatically uneven read counts unless the libraries are normalized before pooling. Second, for low-biomass samples, consider running them on a separate sequencing lane from high-biomass samples, or at minimum, physically separate them on the plate (e.g., group low-biomass samples in columns 1-3 rather than interspersing them). This limits the impact of index-switching artifacts on your lowest-concentration samples, which are also the ones most vulnerable to contamination.
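To make the balancing step concrete, here is a minimal sketch of an equal-mass pooling calculation. It assumes all libraries carry the same amplicon, so equal mass approximates equimolar input. The target of 20 ng per library, the 15 μL volume cap, and the well concentrations are hypothetical values chosen for illustration.

```python
def pooling_volumes(conc_ng_per_ul, target_ng=20.0, max_volume_ul=15.0):
    """Volume of each library to pool so every sample contributes target_ng.

    Returns {sample: (volume_ul, reaches_target)}. Libraries too dilute to
    reach the target within max_volume_ul are capped and marked False.
    """
    plan = {}
    for sample, conc in conc_ng_per_ul.items():
        volume = target_ng / conc if conc > 0 else float("inf")
        if volume > max_volume_ul:
            plan[sample] = (max_volume_ul, False)
        else:
            plan[sample] = (round(volume, 2), True)
    return plan


# Hypothetical library concentrations (ng/uL) for three wells.
libraries = {"A1": 50.0, "D6": 10.0, "H12": 1.0}
plan = pooling_volumes(libraries)
for well, (volume, ok) in plan.items():
    print(well, volume, "ok" if ok else "under target")
```

Libraries marked as under target are candidates for re-amplification, concentration, or a separate low-biomass pool rather than silent inclusion at low input.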

Sequencing Platform Options

Platform	Amplicon	Throughput per Run	Typical Cost per Sample (96-plex)	Best For
MiSeq v2 (2×250)	V3-V4, V4	12-15M reads	Low-moderate	Small-medium projects, V4
MiSeq v3 (2×300)	V3-V4, V1-V3	22-25M reads	Moderate	Overlapping paired-end for longer amplicons
NovaSeq SP/XP	V3-V4, V4	800M+ reads	Low (at scale)	Large cohorts, 200+ samples
Nanopore MinION	Full-length 16S	Variable (user-controlled)	Moderate	Species-level resolution, field deployment
PacBio Sequel II	Full-length 16S	4M CCS reads	Higher	Highest accuracy long reads

For most academic projects with 50-200 samples targeting V3-V4 or V1-V3, the MiSeq v3 chemistry (2×300 bp) provides adequate coverage depth at the lowest practical cost. NovaSeq becomes economical above roughly 300 samples and is the preferred platform for large cohort studies, though it requires careful lane-allocation planning to avoid batch effects. Full-Length 16S/18S/ITS Amplicon Sequencing via Nanopore or PacBio is the choice when species-level taxonomy is scientifically necessary — for biomarker discovery, research isolate characterization, or studies of genera with high species diversity like Bifidobacterium or Pseudomonas.
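A quick way to sanity-check a platform choice is to estimate reads per library from the run throughput. The sketch below assumes a 10% PhiX spike-in and that 70% of the remaining reads survive filtering, merging, and chimera removal. Both percentages are placeholders, not measured values, so substitute figures from your own facility.

```python
def reads_per_library(run_reads, n_samples, n_controls=0,
                      phix_pct=10, usable_pct=70):
    """Rough expected reads per library after PhiX and pipeline losses.

    phix_pct and usable_pct are assumed percentages; integer arithmetic
    keeps the estimate free of floating-point rounding.
    """
    libraries = n_samples + n_controls
    usable_reads = run_reads * (100 - phix_pct) * usable_pct // 10_000
    return usable_reads // libraries


# MiSeq v3 at the low end of the table above (22M reads per run).
full_plate = reads_per_library(22_000_000, n_samples=96)
half_plate = reads_per_library(22_000_000, n_samples=48, n_controls=6)
print(full_plate, half_plate)
```

Under these assumptions a full 96-sample plate still clears the 80,000-100,000 read target for human fecal samples, and a 48-sample run with controls leaves a wide margin.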

Figure 2: End-to-End 16S rRNA Amplicon Sequencing Workflow

Gut Microbiome: From Healthy Cohorts to Disease Studies

The gut microbiome is the most sequenced ecosystem on Earth. Because of this, the reference databases are richest for gut taxa, and the methodological norms are most mature. But the maturity of the field has a downside: statistical power requirements have escalated, and reviewers now expect study designs that many PIs underestimate at the grant stage.

V4 or V3-V4 for Fecal Samples

For human fecal samples, either V4 alone (515f/806r, ~250 bp) or V3-V4 (~460 bp) produces robust genus-level profiles. V4-only has the advantage of perfect overlap with the Earth Microbiome Project, enabling direct comparison to thousands of published samples. V3-V4 provides marginally better species-level discrimination in the Bacteroidota and Firmicutes. For mouse fecal samples, the same primer sets work, but be aware that mouse gut communities are far less diverse than human — targeting 50,000 reads per sample is more than sufficient, whereas human samples benefit from 80,000-100,000.

Study Design: Replicates, Confounders, and Longitudinal Sampling

The most common design error in gut 16S studies is insufficient biological replication. A single mouse cage or a single timepoint from one human subject is not a replicate of anything except that cage or that person. For human cross-sectional studies, a minimum of 20-30 subjects per group is necessary to detect genus-level abundance differences of 2-fold or greater with 80% power, and that assumes the groups are reasonably homogeneous in diet, age, and medication history. In practice, many published studies with n=10 per group are underpowered, and the "statistically significant" taxa they report are as likely noise as signal.

Longitudinal designs — multiple timepoints from the same subjects — are statistically more efficient because each subject serves as their own control. A study with 15 subjects sampled at three timepoints can outperform a cross-sectional study with 40 subjects per group in detecting within-subject shifts. The caveat: longitudinal designs require explicit paired-sample statistical models (paired PERMANOVA, mixed-effects models with subject as random effect). Running a standard unpaired test on paired data discards the statistical power you paid to create. In practical terms: if you collected three timepoints from the same 20 subjects, you have 60 samples — but treating all 60 as independent inflates your false positive rate because samples from the same person are correlated. A mixed-effects model with subject ID as a random intercept accounts for this within-subject correlation.
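The cost of ignoring pairing can be shown with a toy example. The sketch below is not a mixed-effects model or a paired PERMANOVA; it swaps in exact permutation tests on a single alpha-diversity value per subject at two timepoints, which is the simplest setting where the same logic applies. The data are invented: subjects differ widely from each other, but every subject shifts upward slightly.

```python
from itertools import combinations, product


def paired_sign_flip_p(before, after):
    """Exact two-sided sign-flip test on within-subject differences."""
    diffs = [a - b for a, b in zip(after, before)]
    observed = abs(sum(diffs))
    hits = total = 0
    for signs in product((1, -1), repeat=len(diffs)):
        stat = abs(sum(s * d for s, d in zip(signs, diffs)))
        hits += stat >= observed - 1e-12
        total += 1
    return hits / total


def unpaired_permutation_p(group_a, group_b):
    """Exact two-sided permutation test that ignores subject identity."""
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    n_b = len(pooled) - n_a
    grand = sum(pooled)
    observed = abs(sum(group_b) / n_b - sum(group_a) / n_a)
    hits = total = 0
    for idx in combinations(range(len(pooled)), n_a):
        sum_a = sum(pooled[i] for i in idx)
        stat = abs((grand - sum_a) / n_b - sum_a / n_a)
        hits += stat >= observed - 1e-12
        total += 1
    return hits / total


# Hypothetical Shannon values for 8 subjects at baseline and follow-up.
before = [2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 5.5]
after = [2.3, 2.7, 3.4, 3.8, 4.25, 4.85, 5.3, 5.7]

paired_p = paired_sign_flip_p(before, after)
unpaired_p = unpaired_permutation_p(before, after)
print(f"paired p = {paired_p:.4f}, unpaired p = {unpaired_p:.4f}")
```

The paired test sees a consistent shift in all eight subjects, while the unpaired test loses that signal in the between-subject spread. With three or more timepoints or with covariates, the mixed-effects model described above is the appropriate generalization.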

For dietary intervention studies and randomized controlled trials, the practical benchmark has shifted. Recent RCTs with 16S as a primary outcome routinely enroll 80-200 subjects and collect fecal samples at baseline, midpoint, endpoint, and washout. 16S/18S/ITS Amplicon Sequencing at this scale demands careful batching: randomize treatment and control samples across sequencing runs, never sequence all controls in one run and all treatments in another. Batch effect is real, and it confounds treatment effect when allocation is not randomized across plates.
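Randomizing across runs is easy to get wrong by hand. A minimal sketch of stratified assignment follows: samples are shuffled within each group and then dealt round-robin to runs, so every run receives a near-equal share of each group. The sample IDs, group labels, and function name are hypothetical.

```python
import random
from collections import Counter


def assign_runs(samples, n_runs, seed=0):
    """Assign (sample_id, group) pairs to sequencing runs, balanced by group.

    Returns {sample_id: run_index}. A fixed seed keeps the layout reproducible.
    """
    rng = random.Random(seed)
    by_group = {}
    for sample_id, group in samples:
        by_group.setdefault(group, []).append(sample_id)

    assignment, offset = {}, 0
    for group in sorted(by_group):
        ids = by_group[group]
        rng.shuffle(ids)  # random order within the group
        for i, sample_id in enumerate(ids):
            # offset staggers the starting run so leftovers spread out
            assignment[sample_id] = (i + offset) % n_runs
        offset += len(ids)
    return assignment


samples = ([(f"T{i:02d}", "treatment") for i in range(40)]
           + [(f"C{i:02d}", "control") for i in range(40)])
assignment = assign_runs(samples, n_runs=2)
per_run = Counter((group, assignment[sid]) for sid, group in samples)
print(per_run)
```

For longitudinal studies the same idea extends to stratifying by timepoint as well as treatment arm, and the resulting run index should be recorded as a covariate for downstream models.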

When 16S Is Not Enough

If your biological question involves strain-level transmission, antimicrobial resistance gene content, or metabolic pathway activity, 16S taxonomy alone will not answer it. See the complementary service discussion in "How to Plan Your 16S Project" below for a full breakdown.

Oral Microbiome: Beyond Caries

The oral cavity contains at least 700 bacterial species distributed across distinct niches — subgingival crevice, supragingival plaque, tongue dorsum, buccal mucosa, and saliva. Each niche has a different community structure, and the optimal 16S strategy differs by niche.

Why V1-V3, Not V4

The oral microbiome is dominated by streptococci, and as noted earlier, V1-V3 provides far better streptococcal species discrimination than V3-V4 — S. mitis, S. oralis, and S. pneumoniae share near-identical V4 sequences but are resolved by V1-V2. However, primer choice alone is insufficient without the right reference database. In a 2025 benchmarking study, even the optimal V-region underperformed when paired with a generic database, which brings us to eHOMD.

eHOMD: The Oral-Specific Database

For taxonomic classification of oral 16S data, the extended Human Oral Microbiome Database (eHOMD) provides species-level resolution that SILVA and Greengenes2 cannot match. eHOMD is curated specifically for oral taxa and includes provisional species designations for uncultured oral bacteria. The practical workflow: run DADA2 to generate ASVs, classify against SILVA for broad taxonomy, then re-classify against eHOMD for oral-specific resolution. This two-step approach catches oral taxa that SILVA misclassifies or leaves at genus level.

Sample Types and Collection

Subgingival plaque collected with paper points provides the most clinically informative signal for periodontitis studies but yields the lowest DNA quantities — often 1-5 ng total. Saliva is high-yield but represents a pooled community that blurs niche-specific signals. Tongue dorsum swabs capture a distinct community enriched in anaerobes that correlates surprisingly well with halitosis-associated volatile sulfur compound production. For studies linking oral health to systemic conditions, sampling multiple niches is ideal, but if only one sample type is feasible, subgingival plaque provides the strongest disease association signal.

A 2026 population-based study (PAROMIND, n=1,026) using subgingival 16S profiling linked Porphyromonas, Fretibacterium, Tannerella, and Dialister abundances to cognitive decline, reinforcing what the periodontal literature long suspected: the oral cavity is a window to systemic inflammation. Studies of this scale are becoming the expected norm for oral-systemic connection research.

Figure 3: Oral Microbiome Sampling Sites and Recommended 16S Strategies (oral cavity cross-section showing five sampling sites with recommended 16S regions and DNA yields)

Environmental 16S: Soil, Water, and Extreme Environments

Environmental samples break the standard 16S playbook. Reference databases are sparse, community diversity is orders of magnitude higher than host-associated samples, and the physical matrix — soil humics, sediment particles, filter membranes — interferes with every step from extraction to PCR.

The Low-Biomass Problem

A gram of rich soil may yield micrograms of DNA, but a liter of oligotrophic seawater filtered onto a 0.22 μm membrane may yield nanograms. Low-biomass samples amplify every source of contamination: kit reagents (the "kitome"), lab air, pipette tips, and even the sterile water used for blanks. The minimum defense is running at least three types of negative controls in every sequencing batch: an extraction blank (no sample, processed through the entire extraction workflow), a PCR blank (molecular-grade water substituted for template), and a field blank (a sterile swab or filter exposed to the sampling environment). If a taxon appears at higher relative abundance in a negative control than in your samples, exclude it.
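The exclusion rule in the last sentence above can be sketched as a comparison of mean relative abundances. This is a deliberately simple illustration with invented counts; the function names are hypothetical, and dedicated statistical contaminant-identification tools are more appropriate for real low-biomass data.

```python
def relative_abundance(counts):
    """Convert {taxon: reads} to {taxon: fraction of total reads}."""
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()} if total else {}


def mean_relative_abundance(tables):
    """Mean relative abundance per taxon across a non-empty list of tables."""
    rel = [relative_abundance(t) for t in tables]
    taxa = {t for r in rel for t in r}
    return {t: sum(r.get(t, 0.0) for r in rel) / len(rel) for t in taxa}


def flag_contaminants(sample_tables, control_tables):
    """Taxa whose mean relative abundance is higher in controls than samples."""
    in_samples = mean_relative_abundance(sample_tables)
    in_controls = mean_relative_abundance(control_tables)
    return sorted(t for t, ab in in_controls.items()
                  if ab > in_samples.get(t, 0.0))


# Hypothetical read counts: two real samples and one extraction blank.
samples = [
    {"Bacteroides": 600, "Faecalibacterium": 390, "Ralstonia": 10},
    {"Bacteroides": 500, "Faecalibacterium": 495, "Ralstonia": 5},
]
blanks = [{"Ralstonia": 80, "Bacteroides": 20}]

flagged = flag_contaminants(samples, blanks)
print(flagged)
```

Bacteroides appears in the blank but at lower relative abundance than in the samples, so it is kept; that pattern is more consistent with sample-to-blank carryover than with reagent contamination.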

For studies involving groundwater, deep marine sediment, glacial ice, or other extremely low-biomass matrices, Absolute Quantitative 16S/18S/ITS Amplicon Sequencing, which adds spike-in standards to convert relative abundances to absolute cell counts per sample, provides a critical sanity check when total 16S copy numbers are near the detection limit.
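The arithmetic behind spike-in quantification is short. In the sketch below, a known number of synthetic 16S copies is assumed to have been added to each sample before extraction, and each taxon's reads are scaled by the copies-per-read ratio observed for the spike. The taxon names, the spike label, and the copy number are hypothetical. The result is in 16S gene copies; converting to cell counts would additionally require per-taxon copy-number correction, which is not shown.

```python
def absolute_copies(read_counts, spike_id, spike_copies_added):
    """Scale read counts to 16S copies using a spike-in of known quantity.

    read_counts: {taxon: reads}, including the spike-in under spike_id.
    Returns {taxon: estimated 16S copies}, excluding the spike itself.
    """
    spike_reads = read_counts[spike_id]
    if spike_reads == 0:
        raise ValueError("no spike-in reads recovered; cannot quantify")
    copies_per_read = spike_copies_added / spike_reads
    return {taxon: reads * copies_per_read
            for taxon, reads in read_counts.items() if taxon != spike_id}


# Hypothetical sample with 10,000 spike-in copies added before extraction.
reads = {"taxon_A": 3000, "taxon_B": 1000, "spike": 2000}
copies = absolute_copies(reads, spike_id="spike", spike_copies_added=10_000)
print(copies)
```

If the spike-in accounts for most of a sample's reads, the native community is near the detection limit and its composition should be interpreted with corresponding caution.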

Soil: Humic Acids and Inhibitor Management

Soil DNA extractions are uniquely challenging because humic acids co-extract with DNA and inhibit Taq polymerase. The visible sign is brown-colored eluate; the invisible sign is qPCR Cq values that shift 3-5 cycles later than expected. DNeasy PowerSoil Pro remains the most widely validated option. For high-humic soils, post-extraction cleanup with SPRI beads at a 0.8x ratio removes most inhibitors without substantial DNA loss. Do not dilute the DNA to overcome inhibition — you are also diluting the template, and low-abundance taxa will drop below detection.

ASV vs. OTU in Environmental Contexts

For environmental samples, DADA2's default error model can over-split genuine biological microdiversity: a single genome can generate multiple ASVs because of intragenomic variation among its 16S copies. A 2025 Environmental Microbiome benchmarking study using a 227-strain mock community reported that HmmUFOtu, an OTU-picking tool built on HMM alignment and phylogenetic placement, retained 89-93% of reads compared to DADA2's 18-44% in some environmental datasets, which makes it a better choice when sample diversity is high and reference coverage is low. If using ASVs, consider post-clustering at 97-99% identity to collapse likely intragenomic variants. The conclusion from the 2025 Environmental Microbiome benchmarking work is that this trade-off sacrifices some biological resolution but substantially reduces spurious taxa.
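Post-clustering can be illustrated with a greedy, abundance-sorted pass over the ASVs. The sketch below is a toy: it measures identity position by position on equal-length sequences, whereas real tools compute identity from pairwise alignments. The sequences and counts are synthetic.

```python
def identity(a, b):
    """Fraction of matching positions; 0.0 if the lengths differ (toy rule)."""
    if len(a) != len(b):
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def post_cluster(asv_counts, threshold=0.99):
    """Greedily merge ASVs into the most abundant centroid they match.

    asv_counts: {sequence: total_reads}. Returns {centroid: summed_reads}.
    """
    clusters = {}
    ordered = sorted(asv_counts.items(), key=lambda kv: (-kv[1], kv[0]))
    for seq, count in ordered:
        for centroid in clusters:
            if identity(seq, centroid) >= threshold:
                clusters[centroid] += count
                break
        else:
            clusters[seq] = count  # no centroid close enough: new cluster
    return clusters


# Synthetic 100-nt sequences: one dominant ASV and two variants.
base = "ACGT" * 25
one_mismatch = "T" + base[1:]        # 99% identical to base
four_mismatch = "TTTTT" + base[5:]   # 96% identical to base

asvs = {base: 1000, one_mismatch: 40, four_mismatch: 300}
clusters = post_cluster(asvs, threshold=0.99)
print(len(clusters), clusters[base])
```

At a 99% threshold the single-mismatch variant is absorbed into the dominant sequence while the 96% variant stays separate; lowering the threshold toward 97% collapses more variants at the cost of resolution.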

Marine and Freshwater

For water samples, filter enough volume to capture the microbial biomass without clogging the membrane. Sterivex filters (0.22 μm) are the standard for seawater and large-volume freshwater. For turbid freshwater, pre-filter through a 5 μm membrane to remove particulates, then collect microbes on a 0.22 μm membrane. Filter membrane material matters: polyethersulfone (PES) membranes generally yield higher DNA recovery than polycarbonate for bacterial cells, but polycarbonate is preferred when eukaryotic DNA (18S) will also be extracted from the same filter.

For marine samples, sequencing V6-V8 captures more phylogenetic diversity than V4 in under-characterized aquatic clades including SAR11, marine Actinobacteria, and uncultured Gammaproteobacteria. Freshwater samples, particularly from eutrophic lakes, benefit from V4 for cross-comparability with existing freshwater datasets. In both cases, the limited representation of aquatic taxa in reference databases means that a high proportion of ASVs may classify only to family or order level — this is a database limitation, not a sequencing failure, and filtering these unclassified ASVs will discard ecologically meaningful members of your community.

Figure 4: Environmental 16S Low-Biomass Workflow with QC Checkpoints

From FASTQ to Community Profile

Bioinformatic analysis converts millions of short reads into interpretable community profiles. The pipeline choices you make here are as consequential as the wet-lab decisions upstream.

ASV vs. OTU

Amplicon sequence variants (ASVs) produced by DADA2 are now the default for most 16S studies. ASVs offer single-nucleotide resolution, are reproducible across studies, and eliminate the arbitrary 97% clustering threshold of traditional OTUs. However, the over-splitting problem is real — especially for taxa with multiple rRNA operon copies (Bacillus, Clostridium, and many environmental bacteria). If your ASV table shows 5,000+ ASVs from 30 gut samples, something is likely wrong. Filtering ASVs present in fewer than 2-3 samples or at a mean relative abundance below 0.01% usually cleans up artifacts without losing ecologically meaningful rare taxa.
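The two filters in the last sentence above can be sketched as follows. Per the text, an ASV is removed if it is present in too few samples or if its mean relative abundance is below 0.01%. The table layout (a dict of per-sample count dicts), the function name, and the counts are assumptions made for illustration.

```python
def filter_asvs(table, min_samples=2, min_mean_rel=1e-4):
    """Return the set of ASV ids passing prevalence and abundance filters.

    table: {sample: {asv: read_count}}. min_mean_rel=1e-4 is 0.01%.
    The mean is taken over all samples, counting absences as zero.
    """
    n_samples = len(table)
    prevalence, rel_sum = {}, {}
    for counts in table.values():
        depth = sum(counts.values())
        for asv, count in counts.items():
            if count > 0:
                prevalence[asv] = prevalence.get(asv, 0) + 1
                rel_sum[asv] = rel_sum.get(asv, 0.0) + count / depth
    return {asv for asv in prevalence
            if prevalence[asv] >= min_samples
            and rel_sum[asv] / n_samples >= min_mean_rel}


# Hypothetical table: three samples at 10,000 reads each.
table = {
    "s1": {"asv1": 9000, "asv2": 998, "asv3": 2},
    "s2": {"asv1": 8000, "asv2": 1999, "asv4": 1},
    "s3": {"asv1": 9500, "asv2": 499, "asv4": 1},
}
kept = filter_asvs(table)
print(sorted(kept))
```

Here asv3 fails the prevalence filter (one sample only) and asv4 fails the abundance filter despite appearing in two samples.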

DADA2 Pipeline Essentials

The standard DADA2 workflow in R processes paired-end reads through quality filtering (filterAndTrim with maxEE=c(2,2)), error model learning, sample inference, paired-end merging (minimum overlap 12-20 bp), chimera removal, and taxonomy assignment. Two parameters that deserve more attention than they get:

1. Minimum overlap for merging: Set too low (8-10 bp) and you get spurious merged reads; set too high (over 30 bp) and you lose reads from the right tail of the amplicon length distribution. For V3-V4 with 2×300 sequencing, 20 bp is a safe default.

2. Taxonomy assignment: SILVA v138.1 remains the most broadly validated reference, but Greengenes2 and GTDB offer advantages for specific questions. Greengenes2 is phylogenetically consistent and well-suited for gut taxa; GTDB provides genome-based taxonomy that avoids outdated phenotypic classifications. For oral samples, the two-step SILVA-then-eHOMD approach described above is the current best practice.
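The overlap arithmetic behind the first parameter above is worth checking before choosing truncation lengths. In this sketch the expected overlap is the sum of the truncated forward and reverse read lengths minus the amplicon length. The truncation values are examples rather than recommendations, and the amplicon length is assumed to be measured on the same basis as the reads (both with or both without primers).

```python
def merge_overlap(amplicon_len, trunc_fwd, trunc_rev):
    """Expected overlap (bp) between truncated forward and reverse reads."""
    return trunc_fwd + trunc_rev - amplicon_len


def check_overlap(amplicon_len, trunc_fwd, trunc_rev, min_overlap=20):
    """Return (overlap, ok) where ok means the overlap meets min_overlap."""
    overlap = merge_overlap(amplicon_len, trunc_fwd, trunc_rev)
    return overlap, overlap >= min_overlap


# V3-V4 (~460 bp) on 2x300 reads with moderate quality trimming.
moderate = check_overlap(460, trunc_fwd=280, trunc_rev=220)
# The same amplicon after aggressive trimming of low-quality tails.
aggressive = check_overlap(460, trunc_fwd=240, trunc_rev=200)
print(moderate, aggressive)
```

A negative overlap means the reads cannot merge at all, which shows up downstream as a near-total loss of reads at the merging step rather than as an error message.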

Alpha and Beta Diversity: Choosing the Right Metric

Observed ASVs and Shannon diversity are the most commonly reported alpha-diversity metrics, and they are often reported by habit rather than because they fit the biological question. If you care about richness (how many taxa are present), use Chao1 or observed ASVs. If you care about evenness (how equally taxa are distributed), use Pielou's evenness; Shannon and Simpson blend richness and evenness into a single index. If you care about phylogenetic diversity, use Faith's PD. Reporting Shannon because every paper reports Shannon is a missed opportunity to align the metric with the question.
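The difference between these metrics is easiest to see on two toy communities with identical richness. The sketch below computes observed richness, Shannon (natural log), Simpson (1 minus the sum of squared proportions), and Pielou's evenness (Shannon divided by log richness) from raw counts. Chao1 and Faith's PD are omitted because they need singleton counts and a phylogeny, respectively.

```python
import math


def alpha_metrics(counts):
    """Basic alpha-diversity metrics from a list of per-taxon read counts."""
    counts = [c for c in counts if c > 0]
    total = sum(counts)
    props = [c / total for c in counts]
    richness = len(counts)
    shannon = -sum(p * math.log(p) for p in props)
    simpson = 1 - sum(p * p for p in props)
    pielou = shannon / math.log(richness) if richness > 1 else 0.0
    return {"richness": richness, "shannon": shannon,
            "simpson": simpson, "pielou": pielou}


# Two hypothetical communities, each with four taxa.
even = alpha_metrics([25, 25, 25, 25])
skewed = alpha_metrics([97, 1, 1, 1])
for name, m in (("even", even), ("skewed", skewed)):
    print(name, {k: round(v, 3) for k, v in m.items()})
```

Both communities have a richness of four, so a richness-only comparison would call them identical, while the evenness-sensitive metrics separate them clearly.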

For beta diversity, weighted UniFrac incorporates both presence/absence and relative abundance of phylogenetically related taxa; unweighted UniFrac considers only presence/absence. Bray-Curtis is a non-phylogenetic alternative that performs well when reference phylogenies are unreliable — as is often the case for environmental samples with poorly characterized taxa.
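Bray-Curtis is simple enough to compute by hand, which makes it a useful check on pipeline output. The sketch below takes two abundance profiles as dicts and returns the sum of absolute differences divided by the sum of all abundances. The profiles are invented, and inputs are assumed to be on a comparable scale (relative abundances or counts rarefied to equal depth).

```python
def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two {taxon: abundance} profiles."""
    taxa = set(a) | set(b)
    numerator = sum(abs(a.get(t, 0) - b.get(t, 0)) for t in taxa)
    denominator = sum(a.get(t, 0) + b.get(t, 0) for t in taxa)
    return numerator / denominator if denominator else 0.0


# Hypothetical relative-abundance profiles.
site_1 = {"x": 0.5, "y": 0.5}
site_2 = {"x": 0.5, "z": 0.5}
site_3 = {"w": 1.0}

half_shared = bray_curtis(site_1, site_2)
disjoint = bray_curtis(site_1, site_3)
identical = bray_curtis(site_1, site_1)
print(half_shared, disjoint, identical)
```

Because the calculation never consults a tree, two samples dominated by closely related but distinct taxa score as fully dissimilar, which is exactly the property that separates Bray-Curtis from UniFrac.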

PICRUSt2 and the NSTI Caveat

PICRUSt2 and Tax4Fun2 predict functional gene content from 16S data by matching ASVs to the nearest sequenced genomes in a reference database. The key quality metric for PICRUSt2 is the Nearest Sequenced Taxon Index (NSTI): the abundance-weighted average phylogenetic distance between each ASV in your sample and its closest sequenced reference genome. The default NSTI cutoff is 2.0, which only excludes ASVs with very distant references, so passing it says little about prediction quality. NSTI values above 0.15 are considered high for human gut samples and indicate that a substantial fraction of your community lacks close sequenced relatives. For environmental samples, NSTI values routinely exceed 0.5, at which point functional predictions should be treated as suggestive at best. Do not build a paper's central conclusion on PICRUSt2 output from samples with NSTI above 0.25.
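A sample-level NSTI can be sketched as an abundance-weighted mean of per-ASV values. The code below is an illustration of that arithmetic and of the thresholds quoted above, not a reimplementation of PICRUSt2. The per-ASV NSTI values, the counts, and the interpretation labels are hypothetical.

```python
def weighted_nsti(asv_counts, asv_nsti, max_nsti=2.0):
    """Abundance-weighted mean NSTI for one sample.

    ASVs whose NSTI exceeds max_nsti are dropped first, mirroring the
    cutoff described above. Returns None if nothing is retained.
    """
    kept = {a: c for a, c in asv_counts.items() if asv_nsti[a] <= max_nsti}
    total = sum(kept.values())
    if total == 0:
        return None
    return sum(c * asv_nsti[a] for a, c in kept.items()) / total


def interpret(nsti):
    """Map a sample NSTI onto the rules of thumb from the text."""
    if nsti is None:
        return "no ASVs retained"
    if nsti > 0.25:
        return "hypothesis-generating only"
    if nsti > 0.15:
        return "high for gut; interpret cautiously"
    return "acceptable"


# Hypothetical gut and environmental samples.
gut = weighted_nsti({"a": 900, "b": 100}, {"a": 0.02, "b": 0.5})
env = weighted_nsti({"c": 500, "d": 500}, {"c": 0.4, "d": 0.8})
print(round(gut, 3), interpret(gut))
print(round(env, 3), interpret(env))
```

Reporting the per-sample value alongside the interpretation lets readers judge the functional predictions for themselves.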

If functional inference is central to your research question, skip PICRUSt2 and invest in Metagenomic Shotgun Sequencing, which directly sequences the gene content rather than predicting it from taxonomy. The cost differential has narrowed considerably: a shallow shotgun metagenome (5M reads/sample) now costs roughly 2-3 times a 16S amplicon library and provides direct functional annotation plus improved species-level taxonomy. For projects where functional questions are the primary endpoint, this is money well spent.

Figure 5: Bioinformatics Pipeline from FASTQ to Biological Insight

How to Plan Your 16S Project

The difference between a project that finishes in 8 weeks and one that drags on for 6 months often comes down to decisions made before the first sample is collected.

Consider a scenario we see frequently: a PhD student has collected 48 fecal samples from a dietary intervention trial. She has budget for one MiSeq run. The question is not "can I sequence these?" but "how do I allocate 48 samples across one 96-well plate, which controls do I include, and what read depth can I realistically expect?" The answers determine whether three years of sample collection produce publishable data or a frustrating lesson in experimental design.

Replicates and Sequencing Depth

Biological replicates (independent samples from different subjects or field plots) are non-negotiable. Technical replicates (the same sample sequenced twice) are almost never worth the cost — modern library preparation and sequencing are precise enough that technical replication adds negligible information for 16S.

Sample Type	Minimum Biological Replicates per Group	Recommended Reads per Sample	Notes
Human fecal	20-30	80,000-100,000	More for cross-sectional; fewer for longitudinal
Mouse fecal	5-8	50,000	Cage effects; treat cage as a random effect
Subgingival plaque	15-25	50,000-80,000	Low biomass; monitor negative controls closely
Saliva	20-30	50,000-80,000	Pooled community; higher within-group variance
Soil (agricultural)	8-12 per treatment	80,000-100,000	High diversity; spatial heterogeneity
Water (filtered)	5-8 per site	50,000-80,000	Volume-dependent; negative controls essential

Budget and Platform Logistics

The per-sample cost of 16S sequencing has dropped dramatically, but hidden costs remain. Library preparation kits, DNA extraction reagents, shipping, and bioinformatics time add 30-50% to the sequencing-only quote. When comparing CRO quotes, ask for an all-inclusive per-sample price covering extraction through basic bioinformatics (FASTQ + ASV table + taxonomy). For projects with over 96 samples, confirm that the CRO randomizes samples across sequencing plates rather than batching by group — this should be non-negotiable and explicitly stated in the service agreement.

Amplicon Sequencing Services at CD Genomics cover the full 16S workflow from sample QC to data delivery, including low-biomass sample handling and customized bioinformatics support. For projects where budget is the primary constraint, the article "Cost-Effective Amplicon Sequencing for Student Projects, Pilot Studies, and Small Labs" outlines strategies to reduce cost without sacrificing data quality.

When 16S Alone Is Insufficient

A 16S survey tells you who is there. It does not tell you what they are doing, what genes they carry, or whether they are alive or dead at the time of sampling. If your hypotheses require functional annotation, consider complementing 16S with Metagenomic Shotgun Sequencing. If you need to know which community members are transcriptionally active, Metatranscriptomic Sequencing or RNA-Seq adds an expression layer. If you have isolated a specific bacterial strain of interest and want to characterize its genome, Bacterial Whole Genome de novo Sequencing provides complete genomic context that 16S cannot.

For species-level identification without culturing, Microbial Identification services integrate 16S profiling with complementary approaches. And if you are targeting taxa at the very edge of 16S resolution, Full-Length 16S/18S/ITS Amplicon Sequencing using long-read platforms closes the resolution gap.

For a broader view of how 16S fits into the amplicon sequencing landscape — including 18S, ITS, and DNA barcoding options — see the article "Amplicon Sequencing Services for Microbiome and Biodiversity Research: 16S, 18S, ITS, and DNA Barcoding Solutions."

FAQ

How many reads per sample do I really need for 16S?

50,000-100,000 for most sample types. Human fecal and soil need the high end; mouse gut and low-diversity consortia can work with 30,000-50,000. Run a rarefaction curve to confirm saturation at your chosen depth.
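A rarefaction curve does not require repeated random subsampling; the expected richness at a given depth has a closed form. The sketch below uses the standard hypergeometric expectation, summing over taxa the probability that at least one read of that taxon is drawn. The count vector and depths are invented for illustration.

```python
from math import comb


def expected_richness(counts, depth):
    """Expected number of taxa observed when subsampling `depth` reads.

    counts: per-taxon read counts for one sample. Sampling is without
    replacement, so depth cannot exceed the total read count.
    """
    total = sum(counts)
    if depth > total:
        raise ValueError("depth exceeds the number of reads in the sample")
    all_draws = comb(total, depth)
    # For each taxon: 1 - P(none of its reads are drawn).
    return sum(1 - comb(total - c, depth) / all_draws
               for c in counts if c > 0)


# Hypothetical sample: 1,000 reads across six taxa of very uneven abundance.
counts = [500, 300, 150, 40, 8, 2]
depths = [10, 100, 500, 1000]
curve = [expected_richness(counts, d) for d in depths]
for d, s in zip(depths, curve):
    print(d, round(s, 2))
```

When the curve is still climbing at your chosen depth, rare taxa are being missed; when it has flattened, additional reads add little new richness.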

V3-V4 or full-length 16S — which should I choose?

Short-read V3-V4 for genus-level profiling on a budget. Full-length 16S (PacBio or Nanopore) when species-level resolution matters — pathogen discrimination, biomarker discovery, or genera with high species diversity like Pseudomonas and Bifidobacterium.

Can I compare 16S data generated on different sequencing platforms or primer sets?

Only with caution. Platform and primer effects are real and can be larger than the biological effect you are studying. If you must combine datasets, use ComBat or MMUPHin for batch correction, and acknowledge the limitation explicitly. Never pool data from different primer sets without batch-adjusting.

How many biological replicates do I need?

At least 5 per group for animal studies (with cage as a random effect), 20-30 per group for human cross-sectional studies, and 8-12 per treatment for agricultural field plots. These numbers assume medium-to-large effect sizes (2-fold abundance differences). If you are looking for subtle shifts, double them.

What negative controls should I include?

Three types per sequencing run: an extraction blank (no-sample control processed through the entire extraction workflow), a PCR blank (water substituted for DNA template), and a field blank (sterile collection device exposed to the sampling environment). If a taxon appears at higher relative abundance in a negative control than in real samples, exclude it.

Should I use ASVs or OTUs?

ASVs (via DADA2) for most applications — they are reproducible, offer single-nucleotide resolution, and are the current standard. OTUs (via HmmUFOtu or UPARSE) when working with environmental samples where reference databases are sparse and intragenomic 16S variation causes over-splitting.

How reliable is PICRUSt2 functional prediction?

Reliable for human gut samples (NSTI typically below 0.15) where reference genome coverage is excellent. Unreliable for environmental samples (NSTI often above 0.5) and non-model host species. Always report NSTI values and treat predictions from high-NSTI samples as hypothesis-generating, not conclusive.

What is the turnaround time for outsourced 16S sequencing?

Typical CRO timelines range from 2-8 weeks depending on project size, sample type, and bioinformatics deliverables. Factor in an additional 2-3 weeks for sample shipping, customs clearance (if international), and quality control. Communicate the expected timeline before collecting samples.

References:

Quast C, Pruesse E, Yilmaz P, et al. The SILVA ribosomal RNA gene database project: improved data processing and web-based tools. Nucleic Acids Research. 2013;41(D1):D590-D596. doi:10.1093/nar/gks1219
Callahan BJ, McMurdie PJ, Rosen MJ, Han AW, Johnson AJA, Holmes SP. DADA2: High-resolution sample inference from Illumina amplicon data. Nature Methods. 2016;13(7):581-583. doi:10.1038/nmeth.3869
Wemheuer F, Taylor JA, Daniel R, et al. Tax4Fun2: prediction of habitat-specific functional profiles and functional redundancy based on 16S rRNA gene sequences. Environmental Microbiome. 2020;15:11. doi:10.1186/s40793-020-00358-7
Tabari K, Goyal A, Floyd A, et al. FAVABEAN and FALAPhyl: open-source pipelines for scalable 16S rRNA microbiome data processing and visualization. PLoS ONE. 2026;21(4):e0331145. doi:10.1371/journal.pone.0331145
Escapa IF, Chen T, Huang Y, Gajare P, Dewhirst FE, Lemon KP. New insights into human nostril microbiome from the expanded Human Oral Microbiome Database (eHOMD). mSystems. 2018;3(6):e00187-18. doi:10.1128/mSystems.00187-18
Chen T, Yu WH, Izard J, Baranova OV, Lakshmanan A, Dewhirst FE. The Human Oral Microbiome Database: a web accessible resource for investigating oral microbe taxonomic and genomic information. Database. 2010;2010:baq013. doi:10.1093/database/baq013
Thompson LR, Sanders JG, McDonald D, et al. A communal catalogue reveals Earth's multiscale microbial diversity. Nature. 2017;551(7681):457-463. doi:10.1038/nature24621
Nearing JT, Douglas GM, Hayes MG, et al. Microbiome differential abundance methods produce different results across 38 datasets. Nature Communications. 2022;13(1):342. doi:10.1038/s41467-022-28034-z

For research purposes only, not intended for clinical diagnosis, treatment, or individual health assessments.
