Shotgun Metagenomic Sequencing for Gut Microbiome Studies: From Fecal DNA to Taxonomic and Functional Profiling of Large Clinical Cohorts

A gastroenterologist launches a 300-patient prospective study to understand why some patients with Crohn's disease respond to biologic therapy and others do not. The grant is written, the IRB approved, the clinical coordinators hired. Then comes the microbiome question: what sequencing approach will deliver the answers the grant reviewers asked for?

16S rRNA gene sequencing would reveal which bacterial genera differ between responders and non-responders. But the reviewers wanted metabolic pathways — short-chain fatty acid synthesis, bile acid transformation, drug metabolism — and they asked about fungi and viruses, not just bacteria. 16S cannot answer those questions. Shotgun metagenomics can.

This guide walks through the end-to-end workflow for a large-scale gut microbiome study using shotgun metagenomic sequencing, from the moment a fecal sample leaves a participant's home to the point where a biostatistician runs the final differential abundance model. It covers what 16S-focused guides omit: how to preserve samples at scale, how to deplete host DNA without distorting microbial profiles, which taxonomic and functional profilers to choose for different questions, and how to size a cohort so the results survive multiple testing correction.

Figure 1: Shotgun metagenomics gut microbiome workflow, six stages from fecal sample collection and preservation through DNA extraction, library preparation, sequencing, and taxonomic and functional profiling to clinical insight.

The Human Gut Microbiome — More Than Composition

The gut microbiome is not a census. Knowing that Bacteroides thetaiotaomicron constitutes 8% of a sample and Faecalibacterium prausnitzii 5% is a start. But what matters biologically is what those organisms are doing — which polysaccharides they ferment, which short-chain fatty acids they produce, whether they carry toxin genes, and whether their metabolic output modulates host immunity.

Shotgun metagenomic sequencing addresses this by sequencing all the DNA in a stool sample — microbial, human, and dietary — rather than amplifying a single marker gene. The result is not just a list of taxa but a catalog of genes, pathways, and functional capacities. This matters because different strains of the same species can carry radically different gene repertoires. One Escherichia coli strain in the gut may be a harmless commensal; another, carrying the pks genomic island, produces colibactin, a genotoxin linked to colorectal cancer. 16S rRNA sequencing sees E. coli. Shotgun sequencing distinguishes the harmless strain from the dangerous one (1).

Three capabilities separate shotgun metagenomics from amplicon-based approaches. First, species- and strain-level taxonomy. 16S reliably reaches genus level for most bacteria. Shotgun metagenomics with tools like MetaPhlAn 4 profiles a stool sample at species level against a marker catalog covering thousands of species and, with strain-level profilers like inStrain, can track individual strains across timepoints and between individuals, which is essential for transmission studies and for donor-recipient tracking in fecal microbiota transplantation (FMT). Second, functional profiling. HUMAnN 3 maps sequencing reads onto pathway databases like MetaCyc and KEGG, quantifying the abundance of metabolic pathways and enzyme families without requiring genome assembly. Third, coverage of non-bacterial microbes. Fungi, DNA viruses, and protists are invisible to 16S, and archaea are often under-detected by bacteria-targeted primers; shotgun sequencing captures all of them, providing a more complete picture of the gut ecosystem (2).

The trade-off is cost. A shotgun metagenomic library at 20 million read pairs per sample costs more than a 16S V3-V4 library at 50,000 reads. But the gap is narrowing. Shallow shotgun metagenomics — generating 2 to 5 million reads per sample — now provides species-level taxonomic resolution at a cost approaching that of 16S. For large cohort studies where functional profiling is not the primary endpoint, shallow shotgun is an increasingly pragmatic middle ground.
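
To make the depth trade-off concrete, the sketch below estimates total sequencing output for a cohort at the two depths discussed above. The 300-sample cohort comes from the opening scenario. Paired-end 150 bp reads, and treating the shallow figure as read pairs, are assumptions made here for comparability; the function name is illustrative.

```python
def cohort_gigabases(n_samples: int, read_pairs_per_sample: float,
                     read_length_bp: int = 150) -> float:
    """Total sequencing output in gigabases for a paired-end design.

    read_length_bp=150 is an assumed read length, not a value from the text.
    """
    bases = n_samples * read_pairs_per_sample * 2 * read_length_bp
    return bases / 1e9


# 300 samples, deep (20 million pairs) versus shallow (3 million pairs)
deep = cohort_gigabases(300, 20e6)
shallow = cohort_gigabases(300, 3e6)
print(f"deep: {deep:.0f} Gb, shallow: {shallow:.0f} Gb, "
      f"ratio: {deep / shallow:.1f}x")
```

Output requirement scales linearly with depth, so the same budget covers several times as many shallow samples as deep ones.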

CD Genomics' Metagenomic Shotgun Sequencing service supports both deep and shallow shotgun approaches, adapting sequencing depth to study-specific needs: pathway-level functional profiling at 20 million read pairs per sample, or species-level taxonomic screening at 3 million reads.

Figure 2: 16S amplicon vs. shallow shotgun vs. deep shotgun metagenomics, compared by cost per sample, taxonomic resolution, functional gene detection, and suitability for cohort-scale versus in-depth studies.

Clinical-Scale Study Design

The decisions made before the first sample is collected determine whether results survive statistical scrutiny. Three of them matter most.

  • Cohort Size and Statistical Power

A 2025 analysis by Zouiouich and colleagues used shallow shotgun metagenomic data from fecal samples collected six months apart to calculate intraclass correlation coefficients — measures of temporal stability — for hundreds of microbial features. The conclusion was sobering. Detecting a modest disease association (odds ratio of 1.5) with 80% power requires hundreds to thousands of cases, not dozens. For low-prevalence species present in 5 to 10% of individuals, the required sample size exceeds 15,000 with a single specimen per participant. With three specimens per person, this drops to roughly 6,000. Matching each case to three controls provides a further reduction (3).
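
One way to see why temporal stability and repeat specimens change the required sample size is the standard measurement-error approximation sketched below. It uses the Spearman-Brown formula for the reliability of an average of k specimens and inflates an idealized sample size by the inverse of that reliability. This is a textbook approximation, not necessarily the method used in the cited analysis, and the ICC and baseline sample size in the example are hypothetical.

```python
import math


def reliability_of_mean(icc: float, k: int) -> float:
    """Spearman-Brown reliability of the average of k repeated specimens."""
    if not 0.0 < icc <= 1.0:
        raise ValueError("icc must be in (0, 1]")
    if k < 1:
        raise ValueError("k must be at least 1")
    return k * icc / (1.0 + (k - 1) * icc)


def inflated_sample_size(n_ideal: int, icc: float, k: int = 1) -> int:
    """Approximate cases needed when the microbial feature is measured
    with error, given n_ideal cases for a perfectly stable measurement."""
    return math.ceil(n_ideal / reliability_of_mean(icc, k))


# Hypothetical feature: ICC of 0.3, 1,000 cases needed if perfectly stable
for k in (1, 2, 3):
    print(k, inflated_sample_size(1000, icc=0.3, k=k))
```

Under this approximation the benefit of extra specimens is largest for unstable features (low ICC) and shrinks as the ICC approaches 1.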

In practice, a well-powered study starts at roughly 100 to 200 subjects per group for common species and functional pathways, and substantially more for rare features. Pilot studies with smaller cohorts remain informative, but the statistical limitations should be acknowledged in the manuscript.

  • Sample Collection and Preservation at Scale

The gold standard is immediate freezing at -80 °C. In a multi-site study, this is rarely achievable. Participants collect samples at home. Clinics are in different cities. Freezers fail. The pragmatic question is which preservation method introduces the least bias.

Three options dominate. OMNIgene GUT (DNA Genotek) provides room-temperature stability for up to eight weeks and is the most extensively validated commercial option for large multi-site studies. The standardized kit reduces handling variability across collection sites. Zymo DNA/RNA Shield offers comparable performance with the added advantage of simultaneous RNA stabilization — useful if metatranscriptomics may be added later. Ninety-five percent ethanol is the leading low-cost alternative, validated against OMNIgene GUT for microbiome profiling with comparable stability at a fraction of the cost (4).

A practical caveat: a 2024 study found that for shotgun metagenomics, samples stored without preservative for up to 24 hours followed by freezing performed comparably to ethanol-preserved samples. Samples from participants with active intestinal inflammation showed elevated human DNA fractions when stored with ethanol — relevant for IBD cohorts.

The recommendation for a multi-site study: standardize on one commercial collection kit across all sites, provide identical collection instructions with illustrated guides, record the time between collection and freezer storage for every sample, and include kit-blank negative controls in each batch. The cost of commercial kits is real but smaller than the cost of an underpowered study with uninterpretable batch effects.

  • Metadata That Matter

The gut microbiome covaries with diet, medication, age, BMI, and geography. A study that does not capture these variables cannot disentangle disease-associated microbial shifts from confounding.

At minimum, collect: age, sex, BMI; medication use — especially antibiotics, proton pump inhibitors, metformin, and immunosuppressants — with timing and dosage; dietary pattern, even if only a simple omnivore versus vegetarian classification; bowel movement frequency and consistency; and country of residence. For longitudinal studies, collect this metadata at every timepoint. Medication use is a dominant confounder — metformin alone explains more gut microbiome variation than many disease states, and failure to adjust for it generates spurious associations (5).
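
The minimum metadata set above can be enforced at accessioning with a simple record type. The sketch below is one possible layout; every field name is illustrative rather than a standard schema, and the collection-to-freezer interval is included because the preservation section recommends recording it for every sample.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class SampleMetadata:
    """Per-sample, per-timepoint metadata. Field names are hypothetical."""
    sample_id: str
    age_years: Optional[float] = None
    sex: Optional[str] = None
    bmi: Optional[float] = None
    # List of dicts such as {"name": ..., "dose": ..., "last_taken": ...}.
    # An empty list means "no medication"; None means "not recorded".
    medications: Optional[list] = None
    dietary_pattern: Optional[str] = None
    stool_frequency_per_week: Optional[float] = None
    stool_consistency: Optional[str] = None
    country: Optional[str] = None
    hours_collection_to_freezer: Optional[float] = None


def missing_fields(record: SampleMetadata) -> list:
    """Names of fields that were never recorded (still None)."""
    return [f.name for f in fields(record) if getattr(record, f.name) is None]


record = SampleMetadata("S001", age_years=41, sex="F", bmi=23.4, medications=[])
print(missing_fields(record))
```

Distinguishing "no medication" from "not recorded" matters because an unrecorded confounder cannot be adjusted for later.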

Figure 3: Fecal sample preservation methods (OMNIgene GUT, Zymo DNA/RNA Shield, ethanol, and fresh frozen), compared by DNA stability, microbial community composition, and suitability for multi-site cohort studies.

DNA Extraction and Library Preparation

DNA extraction method accounts for approximately 21% of overall microbiome variation and significantly affects roughly 32% of detected species. The field has largely converged on bead-beating-based kits. The QIAamp PowerFecal Pro uses 0.1-mm zirconium beads with approximately six minutes of mechanical lysis, providing efficient recovery of Gram-positive organisms whose thick peptidoglycan walls resist gentler enzymatic lysis.

The key principle is not which specific kit to use but that every sample in a given study must be extracted with the same kit, same reagent lot, and same technician. Batch effects from extraction are real, detectable, and preventable.

  • Host DNA Depletion

A stool sample from a healthy individual is roughly 99% microbial DNA. In samples from participants with intestinal inflammation — where epithelial shedding and blood in the stool elevate the human DNA fraction above 50% — a shotgun library without host depletion wastes most sequencing capacity on human reads.

The lyPMA method, combining osmotic lysis of human cells with propidium monoazide treatment to block amplification of free human DNA, reduces the human read fraction from roughly 89% to 8.5% in high-host samples with minimal taxonomic bias. The NEBNext Microbiome DNA Enrichment kit and QIAamp DNA Microbiome Kit are alternatives, though both introduce some AT-rich organism bias.
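
The practical effect of host depletion is easiest to see as effective microbial depth. The sketch below applies the human read fractions quoted above (89% before depletion, 8.5% after) to a 20 million read-pair library. The function names are illustrative, and the calculation assumes that every non-human read is microbial, which ignores dietary and unclassifiable reads.

```python
def microbial_read_pairs(total_read_pairs: float, host_fraction: float) -> float:
    """Read pairs left for microbial analysis after removing host reads."""
    if not 0.0 <= host_fraction <= 1.0:
        raise ValueError("host_fraction must be between 0 and 1")
    return total_read_pairs * (1.0 - host_fraction)


def depth_needed(target_microbial_pairs: float, host_fraction: float) -> float:
    """Total read pairs to sequence to reach a microbial depth target."""
    if not 0.0 <= host_fraction < 1.0:
        raise ValueError("host_fraction must be in [0, 1)")
    return target_microbial_pairs / (1.0 - host_fraction)


before = microbial_read_pairs(20e6, 0.89)   # no depletion
after = microbial_read_pairs(20e6, 0.085)   # after depletion
print(f"microbial pairs: {before / 1e6:.1f}M before, {after / 1e6:.1f}M after")
print(f"pairs needed for 20M microbial at 89% host: "
      f"{depth_needed(20e6, 0.89) / 1e6:.0f}M")
```

Without depletion, a nominally deep library from a high-host sample ends up in the shallow range for microbial reads.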

Computational host filtration after sequencing is essential regardless of whether wet-lab depletion was performed. Current best practice aligns reads against a combined human reference including both GRCh38 and the complete telomere-to-telomere assembly T2T-CHM13v2.0, which resolves Y-chromosome regions missing from GRCh38 that can otherwise leak into microbial classifications (6).

CD Genomics' Metagenomic Shotgun Sequencing service includes host DNA depletion assessment as part of quality control, flagging samples with elevated host fractions and offering depletion options when needed.

Figure 4: Host DNA depletion efficiency for lyPMA chemical treatment, the NEBNext enrichment kit, and computational filtration, showing the reduction in human read fraction in high-host stool samples with minimal taxonomic bias.

Taxonomic Profiling — Who Is There, and at What Resolution

Once sequencing data are cleaned of human reads and quality-filtered, the first analytical step is taxonomic profiling. The choice of tool shapes the results as much as the experimental design does.

  • Read-Based Profilers: Kraken 2 and MetaPhlAn

Kraken 2 uses k-mer matching against a comprehensive reference database to assign each read to the lowest common ancestor in the taxonomic tree. It is fast and sensitive — classifying a high fraction of microbial reads — but its sensitivity to database composition means that species absent from the database may be assigned to the closest present relative, sometimes incorrectly. Kraken 2 with the PlusPFP database, which includes protozoa, fungi, and plants, is the standard for comprehensive classification.
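
The lowest-common-ancestor idea can be illustrated on a toy taxonomy. The sketch below is a deliberate simplification: it shows only the LCA step over a hand-written parent map, not Kraken 2's actual minimizer hashing or its per-read scoring of k-mer hits. The taxonomy and function names are illustrative.

```python
# Toy taxonomy as child -> parent. A real run would use the NCBI taxonomy.
PARENT = {
    "root": None,
    "Bacteria": "root",
    "Enterobacteriaceae": "Bacteria",
    "Escherichia": "Enterobacteriaceae",
    "Escherichia coli": "Escherichia",
    "Escherichia fergusonii": "Escherichia",
    "Salmonella": "Enterobacteriaceae",
    "Salmonella enterica": "Salmonella",
}


def lineage(taxon, parent):
    """Path from the root down to the given taxon."""
    path = []
    while taxon is not None:
        path.append(taxon)
        taxon = parent[taxon]
    return path[::-1]


def lowest_common_ancestor(taxa, parent=PARENT):
    """Deepest node shared by the lineages of all given taxa."""
    if not taxa:
        raise ValueError("need at least one taxon")
    lca = None
    for level in zip(*(lineage(t, parent) for t in taxa)):
        if len(set(level)) != 1:
            break
        lca = level[0]
    return lca


# A k-mer found in both E. coli and S. enterica is only informative
# down to the family level.
print(lowest_common_ancestor(["Escherichia coli", "Salmonella enterica"]))
```

This is also why database composition matters: if the true source species is missing, its k-mers resolve to whichever relatives are present, or to a shared ancestor.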

MetaPhlAn 4 takes a different approach. Instead of classifying every read, it aligns against a curated set of clade-specific marker genes. This makes it less sensitive to database contamination and more precise for abundance estimation of detectable organisms, but it will miss species not represented in its marker catalog. That catalog covers thousands of species, including many defined only from metagenome-assembled genomes, and MetaPhlAn is a common default profiler in large consortium studies (7).
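
The marker-based idea can be sketched as follows: estimate coverage per marker, summarize markers per clade with a robust average, then normalize across clades. The trimmed mean, the 150 bp read length, and the input layout are assumptions chosen for illustration; MetaPhlAn's own statistics, thresholds, and file formats are not reproduced here.

```python
def marker_coverage(read_count, read_length_bp, marker_length_bp):
    """Average depth over one marker gene."""
    return read_count * read_length_bp / marker_length_bp


def trimmed_mean(values, trim_fraction=0.2):
    """Mean after dropping the lowest and highest trim_fraction of values."""
    if not values:
        raise ValueError("need at least one value")
    ordered = sorted(values)
    k = int(len(ordered) * trim_fraction)
    kept = ordered[k:len(ordered) - k] or ordered
    return sum(kept) / len(kept)


def relative_abundance(clade_markers, read_length_bp=150):
    """clade_markers maps clade -> list of (read_count, marker_length_bp)."""
    coverage = {
        clade: trimmed_mean(
            [marker_coverage(n, read_length_bp, length) for n, length in markers]
        )
        for clade, markers in clade_markers.items()
    }
    total = sum(coverage.values())
    if total == 0:
        return {clade: 0.0 for clade in coverage}
    return {clade: cov / total for clade, cov in coverage.items()}


# Clade A has one marker with inflated coverage (for example, a gene shared
# with an unprofiled organism); the trimmed mean keeps it from dominating.
toy = {
    "A": [(10, 1000), (10, 1000), (10, 1000), (10, 1000), (1000, 1000)],
    "B": [(30, 1000)] * 5,
}
print(relative_abundance(toy))
```

Because only marker-mapped reads contribute, organisms without markers in the catalog receive no abundance at all, which is the trade-off described above.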

For most gut microbiome studies, using both tools in parallel is the pragmatic choice. Kraken 2 captures broader diversity including fungal and dietary DNA. MetaPhlAn 4 provides more conservative, better-validated abundance estimates for the core human gut microbiota.

  • Strain-Level Tracking

Species-level taxonomy is often insufficient. Two participants may both carry Bacteroides thetaiotaomicron, but one harbors a strain that efficiently degrades dietary fiber into butyrate precursors while the other's strain lacks the necessary polysaccharide utilization loci.

inStrain uses whole-genome read alignment to compare populations at single-nucleotide resolution, calculating a population ANI metric that accounts for both major and minor alleles within a sample. Its precision is extraordinary — distinguishing strains that diverged as recently as two years ago — making it the tool of choice for transmission studies and FMT donor-recipient tracking. A 2025 meta-analysis of 810 mother-infant pairs used strain-level profiling to show that approximately 30% of shared species represent true strain transmission from mother to infant, with Bifidobacterium bifidum and B. longum among the most consistently transmitted species.
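
The difference between a consensus-based ANI and a population ANI can be shown on a toy example. In the sketch below, a position counts as a consensus difference when the majority bases differ, and as a population difference only when the two samples share no allele at all. The input layout and the 5% minimum allele fraction are illustrative assumptions, and none of inStrain's statistical filtering is reproduced.

```python
def consensus(allele_counts):
    """Most frequent base at a position."""
    return max(allele_counts, key=allele_counts.get)


def alleles(allele_counts, min_fraction):
    """Bases supported by at least min_fraction of reads at a position."""
    total = sum(allele_counts.values())
    return {b for b, n in allele_counts.items() if n / total >= min_fraction}


def ani_metrics(sample_a, sample_b, min_fraction=0.05):
    """Return (consensus ANI, population ANI) over shared positions.

    Each sample maps position -> {base: read_count}.
    """
    shared = sample_a.keys() & sample_b.keys()
    if not shared:
        raise ValueError("no positions covered in both samples")
    con_diff = pop_diff = 0
    for pos in shared:
        a, b = sample_a[pos], sample_b[pos]
        if consensus(a) != consensus(b):
            con_diff += 1
        if not (alleles(a, min_fraction) & alleles(b, min_fraction)):
            pop_diff += 1
    n = len(shared)
    return 1 - con_diff / n, 1 - pop_diff / n


a = {1: {"A": 30}, 2: {"A": 18, "G": 12}, 3: {"C": 30}, 4: {"T": 30}}
b = {1: {"A": 28}, 2: {"G": 25, "A": 5}, 3: {"T": 30}, 4: {"T": 31}}
con_ani, pop_ani = ani_metrics(a, b)
print(con_ani, pop_ani)
```

Position 2 lowers the consensus value but not the population value, because both samples carry both alleles and only their proportions differ.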

StrainPhlAn 3 uses consensus SNPs in species-specific marker genes, trading inStrain's whole-genome precision for lower computational cost. It profiles fewer species per sample but is well-suited to tracking dominant strains across hundreds of samples. For a cohort study examining strain-level associations with disease outcome, inStrain provides higher-resolution data; for tracking a specific pathogen or probiotic strain across a large population, StrainPhlAn is often sufficient (8).

Figure 5: Taxonomic profiling tools (Kraken 2, MetaPhlAn 4, inStrain, and StrainPhlAn), compared by resolution from species to strain, database coverage, computational requirements, and recommended use cases.

Functional Profiling — What the Community Can Do

Taxonomic profiling tells you which microbes are present. Functional profiling tells you what those microbes are capable of — a fundamentally different and often more clinically relevant question.

  • HUMAnN 3 and Pathway-Level Analysis

HUMAnN 3 (HMP Unified Metabolic Analysis Network) maps shotgun reads onto the UniRef90 protein cluster catalog and then rolls protein family abundances into metabolic pathway abundances using MetaCyc and KEGG. The output is a pathway abundance table that gives, for each sample, the estimated abundance of pathways such as butyrate synthesis, bile acid transformation, and tryptophan-to-indole conversion. These tables can be compared between groups using the same differential abundance tools applied to taxonomic data.
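
Before samples can be compared, abundances need to be corrected for gene length and sequencing depth. The sketch below shows the two normalizations commonly used with this kind of table: reads per kilobase (RPK) and copies per million (CPM). It is a from-scratch illustration with made-up family names and counts, not HUMAnN's code or file format.

```python
def reads_per_kilobase(read_counts, gene_lengths_bp):
    """Length-normalize counts: reads divided by gene length in kilobases."""
    return {
        gene: count / (gene_lengths_bp[gene] / 1000.0)
        for gene, count in read_counts.items()
    }


def copies_per_million(rpk):
    """Depth-normalize RPK values so each sample sums to one million."""
    total = sum(rpk.values())
    if total == 0:
        return {gene: 0.0 for gene in rpk}
    return {gene: value / total * 1e6 for gene, value in rpk.items()}


# Two hypothetical gene families with equal read counts but different lengths
counts = {"famA": 100, "famB": 100}
lengths = {"famA": 1000, "famB": 500}
rpk = reads_per_kilobase(counts, lengths)
cpm = copies_per_million(rpk)
print(rpk, cpm)
```

The shorter family ends up with twice the abundance of the longer one despite identical read counts, which is the bias that length normalization removes.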

Several functional categories consistently produce meaningful results in gut microbiome studies. Short-chain fatty acid synthesis pathways — acetate, propionate, and butyrate production — are linked to colonic health and immune regulation. Carbohydrate-active enzymes (CAZymes) reveal a community's capacity to break down specific dietary fibers. Antibiotic resistance genes, profiled against the CARD database, quantify the gut resistome — a clinically relevant parameter in hospitalized and immunocompromised populations. Virulence factor genes, profiled against VFDB, identify potential pathogens among the commensal background (9).

  • Integration with Other Omics

A metagenomic pathway abundance table becomes far more powerful when paired with other data types. Metabolomics — measuring actual small molecules in stool, serum, or urine — reveals which predicted pathways are actually active. A metagenomic prediction of high butyrate synthesis capacity combined with low measured fecal butyrate suggests a disruption in substrate supply or enzymatic activity that metagenomics alone would miss. Similarly, pairing metagenomic functional profiles with host transcriptomic data from intestinal biopsies reveals how the microbial metabolic repertoire interacts with host gene expression — a frontier approach in IBD and colorectal cancer research.

CD Genomics' Multi-Omics Service supports integrated metagenomic, metabolomic, and transcriptomic analysis for projects that connect microbial gene content to host physiology.

Figure 6: HUMAnN 3 functional profiling pipeline, from raw reads through quality filtering, host read removal, and alignment to UniRef90 protein clusters to MetaCyc and KEGG pathway abundance tables.

Statistical Considerations

The bioinformatic analysis produces large tables — species by samples, pathways by samples. The statistical question is which features differ between groups after accounting for confounders. Getting this wrong produces microbiome associations that appear in press releases and disappear in replication studies.

  • Multiple Testing Correction

A typical gut metagenomic dataset contains hundreds of species and thousands of pathways. Testing each feature individually generates a large multiple testing burden. The standard approach is the Benjamini-Hochberg false discovery rate correction at a threshold of 0.05 or 0.10. Some researchers use a dual-threshold approach, combining an FDR-adjusted q-value below 0.05 with a minimum effect size, to reduce the risk of statistically significant but biologically trivial results.
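
The Benjamini-Hochberg adjustment is short enough to write out. The sketch below computes adjusted values (q-values) from a list of raw p-values; the example p-values are made up, and in practice an established statistics library would normally be used instead.

```python
def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q_values[i] = running_min
    return q_values


# Hypothetical p-values for five features
p = [0.001, 0.008, 0.039, 0.041, 0.6]
q = benjamini_hochberg(p)
print([round(x, 5) for x in q])
print("significant at FDR 0.05:", sum(x < 0.05 for x in q))
print("significant at FDR 0.10:", sum(x < 0.10 for x in q))
```

In this toy case two features pass at 0.05 and four at 0.10, which shows how much the choice of threshold can change the reported feature list.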

  • Adjusting for Confounders

MaAsLin 2 is the current standard for clinical microbiome studies because it supports mixed-effects models with multiple fixed and random covariates within a single framework. A typical model for a case-control gut microbiome study includes disease status as the primary predictor with age, sex, BMI, medication use, dietary pattern, and sequencing batch as covariates.

Beware of compositional effects. Microbiome data are compositional — an increase in one taxon necessarily decreases the relative abundance of others because the data sum to a constant. Tools that do not account for compositionality can generate spurious associations. ANCOM-BC and ALDEx2 explicitly model compositional structure and are preferred over raw relative abundance comparisons (10).
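
A small numeric example makes the compositional problem visible, and the centered log-ratio (CLR) transform is one common building block for handling it. The sketch below is illustrative only: the counts are made up, the pseudocount is an assumption, and it does not implement ANCOM-BC or ALDEx2, which add bias correction and sampling steps on top of ideas like this.

```python
import math


def relative_abundance(counts):
    """Proportions that sum to 1."""
    total = sum(counts)
    return [c / total for c in counts]


def clr(counts, pseudocount=0.5):
    """Centered log-ratio: log counts minus their mean log (geometric mean).

    The pseudocount avoids log(0) and is an arbitrary illustrative choice.
    """
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [x - mean_log for x in logs]


# Only taxon 1 changes in absolute terms between the two conditions.
before = [100, 100, 100]
after = [400, 100, 100]
print(relative_abundance(before))
print(relative_abundance(after))   # taxa 2 and 3 appear to drop
print(clr(after))
```

Taxa 2 and 3 fall from one third to one sixth of the community without any absolute change. CLR values are expressed relative to the sample's geometric mean, which removes dependence on sequencing depth, but the transform does not by itself recover absolute abundances.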

  • Power Analysis

The Zouiouich et al. 2025 data provide practical benchmarks. For species present in more than 75% of individuals, detecting an odds ratio of 1.5 with 80% power requires approximately 3,500 cases with a single specimen, or roughly 2,400 with a 1:3 matched design. For functional pathways, power estimates vary but tend to be more favorable because core metabolic pathways are present in most individuals.

The practical message: a cohort of 30 cases and 30 controls may detect the largest effects — a tenfold difference in a dominant species — but will miss the moderate, clinically meaningful shifts that characterize most disease-associated dysbiosis. Not every study needs thousands of participants, but every study should report a power analysis that justifies its sample size.

For a broader overview of metagenomic sequencing approaches including environmental metagenomics, viromics, and multi-omics integration, see our guide on Metagenomic Sequencing Services — Overview.

How CD Genomics Delivers Your Gut Metagenomic Project

A well-executed gut metagenomic study follows a defined pipeline. Samples are collected using a standardized preservation kit, shipped at ambient temperature, and accessioned with metadata verification. DNA is extracted using bead-beating-based kits with negative controls in every batch. Libraries are prepared with fragmentation, end repair, adapter ligation, and barcoded indexing, then pooled and sequenced on an Illumina NovaSeq platform at the target depth — typically 20 million read pairs per sample for deep functional profiling or 3 to 5 million for shallow taxonomic screening.

Bioinformatic processing includes quality trimming, host read removal against a combined human reference, taxonomic profiling with Kraken 2 and MetaPhlAn 4, and functional profiling with HUMAnN 3. Strain-level analysis using inStrain or StrainPhlAn is available for studies requiring transmission tracking or within-species discrimination.

The final deliverable includes raw FASTQ files, processed taxonomic and functional abundance tables, alpha and beta diversity analyses, differential abundance testing with covariate adjustment, and a comprehensive report with publication-ready figures. Turnaround for a 100-sample project is approximately four to six weeks from sample receipt to analyzed data delivery.

For projects that require isolate-level genomic context beyond community profiling, CD Genomics' Microbial Whole Genome Sequencing service provides genome sequencing for cultured strains of interest. For studies examining which genes the gut microbial community is actively expressing, our Metatranscriptomic Sequencing service adds the gene expression dimension.

Figure 7: CD Genomics shotgun metagenomics bioinformatics pipeline, from raw FASTQ files through quality trimming, host read removal, dual taxonomic profiling with Kraken 2 and MetaPhlAn 4, and functional annotation to final report delivery.

FAQ

What is the difference between shallow and deep shotgun metagenomics for gut studies?

Shallow shotgun (2–5 million reads per sample) provides species-level taxonomic resolution at a cost approaching 16S. Deep shotgun (20 million reads or more) adds functional pathway profiling and rare gene detection. Choose shallow when taxonomy is the primary endpoint; choose deep when metabolic pathway analysis or resistome profiling is essential.

How many samples do I need for a statistically robust gut microbiome study?

For detecting modest disease associations in common species and pathways, 100 to 200 subjects per group is a practical starting point. For rare species, thousands of subjects may be needed. Collecting multiple specimens per participant reduces the required sample size by 35 to 60 percent.

Does host DNA contamination matter for stool samples?

In healthy individuals, stool typically contains less than 10% human DNA. In participants with intestinal inflammation, the human fraction can exceed 50%, reducing effective microbial sequencing depth. Both wet-lab depletion and computational filtration should be applied when host DNA is elevated.

Which taxonomic profiler should I use — Kraken 2 or MetaPhlAn?

Use both. Kraken 2 classifies a broader range of reads and captures non-bacterial DNA. MetaPhlAn 4 provides more conservative, better-validated abundance estimates. Running both in parallel gives complementary information.

Can shotgun metagenomics detect antibiotic resistance genes?

Yes. Reads mapped against the CARD database quantify the abundance of known antibiotic resistance genes in a sample. This resistome profiling is relevant for hospitalized, immunocompromised, and antibiotic-treated populations. Gene presence does not guarantee phenotypic resistance — expression and genetic context matter.

How should I preserve fecal samples for a multi-site study?

Standardize on one commercial collection kit — OMNIgene GUT or Zymo DNA/RNA Shield — across all sites. Record the time between collection and freezer storage for every sample. Include kit-blank negative controls in each batch. Ethanol is a valid low-cost alternative but may introduce variability in shotgun yield for samples from participants with intestinal inflammation.

Can I add metatranscriptomics or metabolomics later from the same samples?

If you anticipate adding metatranscriptomics, use a preservation method that stabilizes RNA, such as Zymo DNA/RNA Shield or immediate freezing at -80 °C. For metabolomics, 95% ethanol or OMNImet GUT provides metabolite stabilization. DNA-only preservation kits will not support RNA or metabolite analysis from the same aliquot.

References:

  1. Thomas AM, Manghi P, Asnicar F, et al. Metagenomic analysis of colorectal cancer datasets identifies cross-cohort microbial diagnostic signatures and a link with choline degradation. Nature Medicine. 2019;25(4):667-678. doi:10.1038/s41591-019-0405-7 (CC BY 4.0): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9533319/
  2. Beghini F, McIver LJ, Blanco-Míguez A, et al. Integrating taxonomic, functional, and strain-level profiling of diverse microbial communities with bioBakery 3. eLife. 2021;10:e65088. doi:10.7554/eLife.65088 (CC BY 4.0): https://doi.org/10.7554/eLife.65088
  3. Zouiouich S, Wan Y, Vogtmann E, et al. Sample size estimations based on human microbiome temporal stability over six months: a shallow shotgun metagenome sequencing analysis. Cancer Epidemiology, Biomarkers & Prevention. 2025;34(4):588-597. doi:10.1158/1055-9965.EPI-24-0839 (CC BY 4.0): https://doi.org/10.1158/1055-9965.EPI-24-0839
  4. Vich Vila A, Collij V, Sanna S, et al. Impact of commonly used drugs on the composition and metabolic function of the gut microbiota. Nature Communications. 2020;11:362. doi:10.1038/s41467-019-14177-z (CC BY 4.0): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6969170/
  5. Gacesa R, Kurilshikov A, Vich Vila A, et al. Environmental factors shaping the gut microbiome in a Dutch population. Nature. 2022;604:732-739. doi:10.1038/s41586-022-04567-7 (CC BY 4.0): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9048813/
  6. Wright RJ, Comeau AM, Langille MGI. From defaults to databases: parameter and database choice dramatically impact the performance of metagenomic taxonomic classification tools. Microbial Genomics. 2023;9(3):000949. doi:10.1099/mgen.0.000949 (CC BY 4.0): https://doi.org/10.1099/mgen.0.000949
  7. Zhao S, Lieberman TD, Poyet M, et al. Adaptive evolution within gut microbiomes of healthy people. Cell Host & Microbe. 2019;25(5):656-667. doi:10.1016/j.chom.2019.03.007 (CC BY 4.0): https://doi.org/10.1016/j.chom.2019.03.007
  8. Alcock BP, Huynh W, Chalil R, et al. CARD 2023: expanded curation, support for machine learning, and resistome prediction at the Comprehensive Antibiotic Resistance Database. Nucleic Acids Research. 2023;51(D1):D690-D699. doi:10.1093/nar/gkac920 (CC BY 4.0): https://doi.org/10.1093/nar/gkac920
  9. Mallick H, Rahnavard A, McIver LJ, et al. Multivariable association discovery in population-scale meta-omics studies. PLoS Computational Biology. 2021;17(11):e1009442. doi:10.1371/journal.pcbi.1009442 (CC BY 4.0): https://doi.org/10.1371/journal.pcbi.1009442

For research purposes only, not intended for clinical diagnosis, treatment, or individual health assessments.
