Why Long Reads for Metagenomics
If you are running short-read (Illumina) shotgun metagenomics, you know the limits. Short reads (2×150 bp or 2×300 bp) assemble into thousands of fragmented contigs. The result: fragmented metagenome-assembled genomes (MAGs), genus-level taxonomic resolution at best, and functional gene annotation that often stops at the domain level because the full gene context is lost.
Long reads solve these problems:
Species- and strain-level classification without assembly. A 15–20 kb PacBio HiFi read carries thousands of accurate k-mers and often several complete genes, so it can be matched to reference genomes and classified directly at the species level. For many bacterial genomes, a single HiFi read can cover an entire operon or mobile genetic element, preserving the genomic context that short reads fragment.
Higher-quality MAGs. HiFi-based circular MAGs (cMAGs) typically outperform short-read MAGs in completeness, contamination, and N50. Nanopore ultra-long reads can further improve MAG contiguity for the most complex communities.
Complete gene clusters and ARG context. Biosynthetic gene clusters (BGCs) often span 10–50 kb or more, and antimicrobial resistance gene (ARG) cassettes commonly extend over several kilobases. A 15–20 kb HiFi read can capture an entire ARG cassette or a smaller BGC in one molecule; larger clusters are resolved by long-read assembly or Nanopore ultra-long reads. When a full cassette sits on one read, you can link it to its host species, see the neighboring genes, and often tell whether it is on a plasmid or the chromosome.
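To see why read length matters for capturing a whole cassette, here is a back-of-the-envelope sketch. It assumes every read has the same length R and that read start positions are uniform along the genome (a Lander–Waterman-style idealization; real read-length distributions are broad). At depth c, about c/R reads start at each position, and a read fully contains a feature of length G only if it starts within a window of R − G + 1 positions. That gives c(R − G + 1)/R spanning reads when R ≥ G, and none otherwise. The function name and example values are illustrative.

```python
def expected_spanning_reads(depth: float, read_len: int, feature_len: int) -> float:
    """Expected number of reads that fully contain a feature.

    Idealized model: all reads have length read_len and start uniformly
    along the genome, so depth / read_len reads begin at each position.
    """
    if read_len < feature_len:
        return 0.0  # a read shorter than the feature can never span it
    starts_per_base = depth / read_len
    valid_start_window = read_len - feature_len + 1
    return starts_per_base * valid_start_window


if __name__ == "__main__":
    # Hypothetical 10 kb ARG cassette in a genome sequenced to 10x depth
    for label, r in [("Illumina 150 bp", 150), ("HiFi 18 kb", 18_000)]:
        n = expected_spanning_reads(depth=10, read_len=r, feature_len=10_000)
        print(f"{label}: {n:.2f} spanning reads")
```

With these idealized inputs, roughly four to five 18 kb reads would be expected to span a 10 kb cassette at 10× depth, while no 150 bp read can.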
Applications of Long-Read Metagenomic Sequencing
Where long reads add the most value over short-read metagenomics.
Environmental microbiology
Profile soil, water, sediment, and extreme-environment microbial communities at species-level resolution. Track keystone species, functional guilds, and biogeochemical pathway distributions.
Human microbiome research
Resolve gut, oral, skin, and vaginal microbiome composition to the strain level. Link specific strains to metabolites, host phenotypes, or disease states — resolution that genus-level short-read profiling cannot deliver.
Antimicrobial resistance (AMR) surveillance
Capture full ARG cassettes in single long reads. Identify the host species, plasmid context, and co-localized resistance genes simultaneously — critical for tracking ARG transmission.
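Read-level host assignment amounts to joining two per-read annotations: the species each read was classified as, and the ARGs detected on it. The sketch below assumes both have already been parsed into Python structures; the input shapes and the function name are hypothetical, not the output format of any particular tool. A read's classification only identifies the host when the read carries enough chromosomal flanking sequence; reads that fall entirely within a plasmid or mobile element may be unclassified or assigned to the wrong taxon.

```python
from collections import Counter, defaultdict
from collections.abc import Iterable


def tally_arg_hosts(
    read_species: dict[str, str],
    arg_hits: Iterable[tuple[str, str]],
) -> dict[str, dict[str, int]]:
    """Count host-species assignments for each ARG.

    read_species: read ID -> species name (hypothetical, e.g. from a
        per-read taxonomic classifier).
    arg_hits: (read ID, ARG name) pairs (hypothetical, e.g. from aligning
        reads to CARD). Each supplied pair is counted once.
    Reads without a species assignment are counted as "unclassified".
    """
    hosts: dict[str, Counter] = defaultdict(Counter)
    for read_id, arg in arg_hits:
        species = read_species.get(read_id, "unclassified")
        hosts[arg][species] += 1
    return {arg: dict(counts) for arg, counts in hosts.items()}


if __name__ == "__main__":
    species = {"r1": "Escherichia coli", "r2": "Klebsiella pneumoniae", "r3": "Escherichia coli"}
    hits = [("r1", "blaCTX-M-15"), ("r2", "blaCTX-M-15"), ("r3", "tet(A)"), ("r4", "tet(A)")]
    for arg, counts in tally_arg_hosts(species, hits).items():
        print(arg, counts)
```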
Industrial and agricultural microbiology
Optimize fermentation consortia, screen for novel enzymes and BGCs, and monitor soil or rhizosphere microbiomes for biocontrol and plant-growth-promoting organisms.
Long-Read Metagenomic Sequencing Workflow
- Sample intake & QC: gDNA quantitation and purity check (Qubit, spectrophotometry). Acceptance targets: A260/280 1.8–2.0, A260/230 ≥2.0, and ≥2 μg of high-quality metagenomic DNA. Host DNA depletion is available for clinical samples.
- Library preparation: SMRTbell library construction with size selection (PacBio) or rapid/ligation prep (Nanopore). Barcoded multiplexing for multi-sample projects.
- Sequencing: PacBio Sequel II/IIe or Revio for HiFi metagenomics; Oxford Nanopore PromethION or GridION for large-scale ONT metagenomics. Target 5–10 Gbp per sample for species profiling and 10–20 Gbp for MAGs.
- Basecalling & QC: PacBio CCS → HiFi reads (≥3 passes; HiFi is defined as ≥QV20, with median accuracy typically ≥QV30). Nanopore: Dorado basecalling. QC: yield, read-length N50, Q-score, barcode balance.
- Bioinformatics: Taxonomy (Kraken2/Bracken), functional annotation (KEGG, eggNOG, CAZy, CARD), metagenome assembly and MAG binning (metaMDBG and HiFi-MAG for HiFi; metaFlye for ONT), comparative analysis.
- Delivery: Long reads (FASTQ/BAM), QC report, taxonomy tables, annotation results, MAGs (FASTA), analysis report.
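The QC metrics listed in the workflow have standard definitions. Below is a minimal sketch of two of them, assuming read lengths and per-base Phred scores have already been parsed from the FASTQ or BAM files; the helper names are illustrative and not taken from any specific QC tool. Read-length N50 is the length such that reads of that length or longer contain at least half of all sequenced bases. A read-level QV averages per-base error probabilities (Q = −10·log10 P) before converting back, because averaging the Q values directly overstates accuracy.

```python
import math


def read_length_n50(lengths: list[int]) -> int:
    """Length L such that reads of length >= L hold at least half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # no reads


def read_qv(phred_scores: list[int]) -> float:
    """Read-level QV from per-base Phred scores.

    Phred: Q = -10 * log10(P_error). Error probabilities are averaged first,
    then converted back to the Phred scale.
    """
    if not phred_scores:
        raise ValueError("empty quality string")
    mean_error = sum(10 ** (-q / 10) for q in phred_scores) / len(phred_scores)
    return -10 * math.log10(mean_error)


if __name__ == "__main__":
    print(read_length_n50([2_000, 3_000, 4_000, 5_000, 6_000]))  # 5000
    print(round(read_qv([30, 10]), 1))  # 13.0 (a naive mean of the Q values gives 20)
```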

Sample Requirements
| Sample type | Recommended amount | Minimum | Notes |
|---|---|---|---|
| Metagenomic DNA | ≥2 μg, ≥30 ng/μL | 1 μg | A260/280 1.8–2.0; RNase-treated |
| Soil / Sediment | 6 g | 2 g | Freeze immediately; avoid thawing |
| Fecal / Gut contents | 5 g | 2 g | Sterile tube; −80°C |
| Water filter membrane | 6 membranes | 2 membranes | 0.22–0.45 μm; −80°C |
| Swabs | 10–20 swabs | 6 swabs | Use preservation buffer |
| Tissue | 2 g | 1 g | Snap-freeze in liquid N₂ |
| Fermentation liquid | 6–10 mL (pellet ≥2 g) | 2 mL (pellet ≥1 g) | Ship pellet on dry ice |
- Ship samples on dry ice; extracted DNA can instead ship on frozen (−20°C) ice packs. Include collection date, extraction method, and any known inhibitors. Contact us for low-biomass, FFPE, or otherwise challenging samples. Also available: Nanopore Ultra-Long Sequencing for the most complex communities.
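The extracted-DNA thresholds on this page (≥1 μg minimum; ≥2 μg and ≥30 ng/μL recommended; A260/280 1.8–2.0; A260/230 ≥2.0) can be written as a simple pre-submission self-check. This is a hypothetical helper for illustration, not the lab's intake procedure, and the status labels are made up for this sketch.

```python
from dataclasses import dataclass


@dataclass
class DnaSample:
    total_ug: float        # total DNA mass in μg (e.g. from Qubit)
    conc_ng_per_ul: float  # concentration in ng/μL
    a260_280: float        # spectrophotometer purity ratios
    a260_230: float


def assess_dna(s: DnaSample) -> tuple[str, list[str]]:
    """Return (amount status, purity warnings) using the thresholds on this page."""
    warnings = []
    if not 1.8 <= s.a260_280 <= 2.0:
        warnings.append("A260/280 outside 1.8-2.0")
    if s.a260_230 < 2.0:
        warnings.append("A260/230 below 2.0")

    if s.total_ug < 1.0:
        status = "below minimum"
    elif s.total_ug < 2.0 or s.conc_ng_per_ul < 30:
        status = "meets minimum only"
    else:
        status = "meets recommended"
    return status, warnings


if __name__ == "__main__":
    print(assess_dna(DnaSample(total_ug=2.4, conc_ng_per_ul=48, a260_280=1.86, a260_230=2.1)))
    # ('meets recommended', [])
```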
Bioinformatics Analysis
Standard (included)
- Read processing: CCS → HiFi (PacBio) or basecalling (Nanopore), demultiplexing
- Run QC: yield, read-length N50, Q-score distribution, barcode balance
- Taxonomic profiling: Kraken2/Bracken, species-level resolution
- Functional annotation: KEGG, eggNOG, CAZy, CARD
- Alpha and beta diversity analysis; differential abundance testing
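This page does not say which diversity metrics the standard analysis reports, so the sketch below uses two common choices as examples: the Shannon index for alpha diversity and Bray–Curtis dissimilarity for beta diversity. It assumes species-level counts (for example, Bracken-re-estimated read counts) have already been aligned into vectors over the same taxa; the sample values are hypothetical.

```python
import math


def shannon_index(counts: list[float]) -> float:
    """Shannon diversity H' = -sum(p_i * ln p_i), skipping zero-count taxa."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def bray_curtis(a: list[float], b: list[float]) -> float:
    """Bray-Curtis dissimilarity: 0 = identical profiles, 1 = no shared taxa."""
    if len(a) != len(b):
        raise ValueError("both samples must list the same taxa in the same order")
    shared = sum(min(x, y) for x, y in zip(a, b))
    return 1 - 2 * shared / (sum(a) + sum(b))


if __name__ == "__main__":
    # Hypothetical species-level read counts for three taxa in two samples
    sample_a = [500, 300, 200]
    sample_b = [100, 300, 600]
    print(f"H'(A) = {shannon_index(sample_a):.3f}")
    print(f"BC(A, B) = {bray_curtis(sample_a, sample_b):.3f}")
```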
Optional add-ons
- MAG binning and quality assessment (CheckM2)
- Comparative metagenomics across conditions or time series
- Biosynthetic gene cluster (BGC) prediction (antiSMASH)
- Custom database construction; multi-omics integration
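CheckM2 reports completeness and contamination for each bin. One widely used way to turn those two numbers into quality tiers is the MIMAG standard; this page does not say which tiering the service reports, so treat it as an assumed convention. The sketch applies only MIMAG's completeness and contamination cut-offs. MIMAG's high-quality tier also requires 23S, 16S, and 5S rRNA genes and at least 18 tRNAs, which this function does not check.

```python
def mimag_tier(completeness: float, contamination: float) -> str:
    """Tier a bin using MIMAG completeness/contamination cut-offs (percent).

    Only these two criteria are applied. MIMAG high quality also needs
    23S, 16S and 5S rRNA genes plus >=18 tRNAs, which are not checked here.
    """
    if contamination >= 10:
        return "outside MIMAG tiers"
    if completeness > 90 and contamination < 5:
        return "high-quality candidate"
    if completeness >= 50:
        return "medium-quality"
    return "low-quality"


if __name__ == "__main__":
    # Hypothetical CheckM2 results: (bin name, completeness %, contamination %)
    for name, comp, cont in [("bin.1", 98.2, 0.8), ("bin.2", 91.0, 6.3), ("bin.3", 42.5, 1.1)]:
        print(name, mimag_tier(comp, cont))
```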

Deliverables
| Category | Deliverables |
|---|---|
| Raw data | HiFi reads or ONT reads (FASTQ/BAM), demultiplexed per sample |
| QC report | Yield, read-length N50, Q-score distribution, CCS pass count, barcode assignment |
| Taxonomy | Species- and genus-level abundance tables (TSV), stacked bar charts, Krona plots |
| Function | KEGG pathway abundance, eggNOG/COG annotation, CAZy enzyme families, CARD ARG profiles |
| MAGs | Binned MAGs (FASTA), CheckM2 quality report (completeness, contamination) |
| Comparative | Alpha/beta diversity, PCoA/NMDS, differential abundance (DESeq2/ALDEx2), heatmaps |
| Project report | Methods, parameters, results, figure-ready plots |
Need help interpreting your metagenomics data? Explore our Bioinformatics Services or Genomic Data Analysis options.
Long-Read vs Short-Read Metagenomics — Platform Comparison
Which approach fits your metagenomics project? Here is how PacBio HiFi, Oxford Nanopore, and short-read (Illumina) metagenomics compare on the dimensions that matter for microbial community analysis.
| Dimension | Short-Read (Illumina) | PacBio HiFi | Oxford Nanopore |
|---|---|---|---|
| Read length | 2×150 bp or 2×300 bp | ~15–20 kb HiFi | 10–100 kb routine; ultra-long to 2 Mb+ |
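| Per-read accuracy | ≥99.9% | ≥QV20 (99%) by definition; median typically ≥QV30 (99.9%) | ~Q10–Q20+ raw, depending on chemistry and basecalling model; consensus accuracy rises with depth |
| Taxonomic resolution | Mostly genus; limited species resolution among close relatives | Species, often strain, directly from raw reads | Species/strain with sufficient depth |
| Assembly-free taxonomy | Yes (e.g. Kraken2), but each short read carries few discriminating k-mers | Yes; HiFi reads classify directly | Yes; higher raw error rates can reduce precision |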
| MAG quality | Fragmented, many chimeras | Circular cMAGs, higher completeness | Longer contigs; more polishing |
| Full operon / BGC capture | Assembly-dependent, often broken | Single-read capture (15–20 kb spans most operons) | Single-read capture; ultra-long covers largest clusters |
| ARG host identification | Contig-level, host usually unknown | Read-level: host species + plasmid context | Read-level; longer = more context |
| Real-time monitoring | No | No | Yes — stop run when data sufficient |
| Field deployment | No | No | Yes (MinION) |
| Bioinformatics maturity | Most mature | Mature HiFi tools (HiFi-MAG, metaMDBG) | Growing ONT metagenomics ecosystem |
Quick chooser
- Prioritize taxonomic accuracy and MAG quality → PacBio HiFi
- Need real-time or field-deployable metagenomics → Nanopore
- Target the largest plasmids, prophages, or multi-operon clusters → Nanopore ultra-long
- Best of both: HiFi for taxonomy/MAGs + ONT ultra-long for structural context
- Short-read metagenomics on a familiar pipeline → Illumina (typically genus-level)
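The quick chooser can be read as a small set of rules. The sketch below encodes it as a function; the need flags are hypothetical names, and the precedence order (field or real-time needs first, then hybrid, then ultra-long) is an assumption, since the list itself does not rank its rules.

```python
def choose_platform(needs: set[str]) -> str:
    """Map project needs to a quick-chooser recommendation.

    Hypothetical flags: "taxonomy_mags", "real_time", "field",
    "largest_structures", "familiar_short_read_pipeline".
    Rule precedence is an assumption made for this sketch.
    """
    if "real_time" in needs or "field" in needs:
        return "Oxford Nanopore"
    if "largest_structures" in needs and "taxonomy_mags" in needs:
        return "Hybrid: PacBio HiFi + Nanopore ultra-long"
    if "largest_structures" in needs:
        return "Nanopore ultra-long"
    if "familiar_short_read_pipeline" in needs and "taxonomy_mags" not in needs:
        return "Illumina short-read"
    return "PacBio HiFi"  # default mirrors the first line of the chooser


if __name__ == "__main__":
    print(choose_platform({"taxonomy_mags", "largest_structures"}))
```

Defaulting to HiFi follows the chooser's first line; a real decision also weighs sample type and community complexity, which this sketch ignores.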
CD Genomics offers all three platforms. We help you choose based on sample type, community complexity, and target resolution.
Demo Results

Species-Level Taxonomic Classification — HiFi reads classify directly at species and strain level without assembly

cMAG Quality Assessment — HiFi circular MAGs: higher completeness, lower contamination than short-read MAGs

HiFi Read-Length Distribution — 15–20 kb reads capture full operons and ARG cassettes in single molecules
Long-Read Metagenomic Sequencing FAQ
1. Why use long reads instead of short-read metagenomics?
Long reads deliver species- and strain-level taxonomy directly from raw reads, with no assembly needed. Short-read metagenomics often stops at genus level for closely related taxa, and recovering gene context requires assembly, which introduces chimeras and fragmentation. Long reads also capture full operons and ARG cassettes in single reads, preserving genomic context.
2. PacBio HiFi vs Nanopore — which is better for metagenomics?
Both have strengths. HiFi gives you higher per-read accuracy (typically QV30 or higher), which translates to more accurate taxonomic classification and higher-quality MAGs. Nanopore can produce ultra-long reads useful for large plasmids and prophages, and supports real-time and portable sequencing. For most projects, HiFi is the first choice for taxonomy and MAGs; Nanopore adds value when ultra-long structural context or field capability matters.
3. Can long reads classify at the species or strain level without assembly?
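Yes. A 15–20 kb HiFi read with accuracy typically at QV30 or above carries thousands of k-mers and often several complete genes, so it can be matched to reference genomes with high specificity and classified directly. Short reads can also be classified read by read, but each 150–300 bp read carries far fewer discriminating k-mers, which limits resolution among closely related species and strains.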
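A rough way to quantify the read-length advantage is to count how many k-mers a read offers a k-mer-based classifier. The sketch assumes independent, uniformly distributed base errors and uses k = 35 (Kraken2's default k-mer length); the per-base error rate of 0.001 is an assumed value applied to both read types. A read of length L has L − k + 1 overlapping k-mers, each error-free with probability (1 − e)^k.

```python
def expected_error_free_kmers(read_len: int, k: int, per_base_error: float) -> float:
    """Expected number of k-mers in a read that contain no sequencing error.

    A read of length L has L - k + 1 overlapping k-mers; assuming independent
    base errors at rate e, each k-mer is error-free with probability (1 - e)**k.
    """
    n_kmers = max(read_len - k + 1, 0)
    return n_kmers * (1 - per_base_error) ** k


if __name__ == "__main__":
    K = 35       # Kraken2's default k-mer length
    ERR = 0.001  # roughly Q30 per-base accuracy, assumed for both read types
    for label, length in [("150 bp short read", 150), ("15 kb HiFi read", 15_000)]:
        n = expected_error_free_kmers(length, K, ERR)
        print(f"{label}: ~{n:,.0f} error-free {K}-mers")
```

Under these assumptions a 15 kb read offers roughly 130 times more usable k-mers than a 150 bp read (14,966 versus 116 k-mer positions before the error adjustment). Kraken2 actually works on minimizers and lowest-common-ancestor assignments, so this illustrates the scaling only, not classifier accuracy.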
4. What types of samples do you accept?
Soil, sediment, water (filtered), fecal/stool, gut contents, swabs, tissue, fermentation liquids, and extracted metagenomic DNA. See the Sample Requirements table for amounts and shipping conditions. Contact us for low-biomass, FFPE, or challenging samples.
5. How much data do I need per sample?
Typically 5–10 Gbp per sample for species-level taxonomic profiling. For high-quality MAG recovery from complex communities, 10–20 Gbp. We scope coverage during project consultation based on your sample type and expected community complexity.
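The link between sequencing yield and what you can recover per species is simple arithmetic: expected depth = total bases × the species' share of the DNA ÷ its genome size. The sketch assumes uniform coverage and expresses abundance as a fraction of sequenced DNA (not of cells); the function names, the 5 Mb genome, and the 10× target are hypothetical values for illustration.

```python
def species_depth(yield_gbp: float, rel_abundance: float, genome_mb: float) -> float:
    """Expected mean depth for one species.

    depth = total bases * fraction of bases from that species / genome size.
    rel_abundance is the species' share of sequenced DNA (not of cells).
    """
    return (yield_gbp * 1e9) * rel_abundance / (genome_mb * 1e6)


def min_abundance(yield_gbp: float, genome_mb: float, target_depth: float) -> float:
    """Smallest DNA fraction at which a species reaches target_depth."""
    return target_depth * (genome_mb * 1e6) / (yield_gbp * 1e9)


if __name__ == "__main__":
    # Hypothetical: 10 Gbp sample, 5 Mb genome
    print(species_depth(10, 0.01, 5))              # depth for a species at 1% of the DNA
    print(min_abundance(10, 5, target_depth=10))   # DNA fraction needed for 10x
```

With 10 Gbp, a species contributing 1% of the DNA gets about 20× mean depth on a 5 Mb genome, and species below about 0.5% of the DNA fall under a 10× target. The depth actually needed for a good MAG depends on strain diversity and the assembler, which is why coverage is scoped per project.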
6. What bioinformatics do you provide?
Standard delivery includes taxonomic profiling (Kraken2/Bracken), functional annotation (KEGG, eggNOG, CAZy, CARD), diversity analysis, and differential abundance testing. Optional add-ons: MAG binning and QC (CheckM2), BGC prediction, custom database construction, and multi-omics integration.
7. Can you do both PacBio HiFi and Nanopore metagenomics on the same project?
Yes. Hybrid PacBio + Nanopore metagenomics is a powerful strategy: use HiFi for accurate taxonomy and high-quality MAGs, and Nanopore ultra-long reads for capturing large plasmids, prophages, and complex genomic regions. We design hybrid workflows during project consultation.
8. How does this differ from 16S amplicon sequencing?
16S amplicon sequencing targets only the 16S rRNA gene. Short-read 16S (one or two hypervariable regions) typically provides genus-level taxonomy at best, and functional profiles can only be inferred, not measured. Long-read metagenomic sequencing captures all DNA in the sample, providing species-level taxonomy and functional gene annotation (metabolic pathways, ARGs, BGCs) from the same dataset.
Case Study — Long-Read Metagenomics Reveals Species-Level Microbiome Composition in San Francisco Estuary
Open Access Publication Highlight
Decomposing a San Francisco estuary microbiome using long-read metagenomics reveals species- and strain-level dominance from picoeukaryotes to viruses
Journal: mSystems (ASM), 2024 | DOI: 10.1128/msystems.00242-24
Background
Estuarine microbiomes are highly complex ecosystems shaped by dynamic freshwater and marine inputs. Understanding their species- and strain-level composition is essential for predicting ecosystem responses to environmental change, but short-read metagenomics often fails to resolve closely related species and strains due to fragmented assemblies.
Methods
This study applied Oxford Nanopore long-read metagenomic sequencing (~150 Gbp total) to water samples from the San Francisco estuary. Long-read data were analyzed using a combination of taxonomic classification (Kraken2/Bracken), metagenome-assembled genome (MAG) generation (Flye + metaMDBG), and strain-level profiling. Both short-read (Illumina) and long-read data were generated from the same samples for direct platform comparison.
Results
- Approximately 500 bacterial and archaeal species identified at species-level resolution
- 68 high-quality MAGs recovered, including several from poorly characterized lineages
- ~40,000 viral populations detected, with long reads enabling complete viral genome recovery
- Species- and strain-level dominance patterns resolved that were ambiguous in short-read data
- Picoeukaryotic genomes assembled directly from metagenomic data, revealing hidden diversity
Figure 2 from mSystems, 2024. Species-level taxonomic composition by long-read metagenomic sequencing.
Conclusion
This study demonstrates that long-read metagenomic sequencing provides species- and strain-level resolution that is difficult to achieve with short-read approaches, particularly for complex environmental microbiomes. Recovering high-quality MAGs and complete viral genomes from a single long-read dataset shows the power of Nanopore metagenomics for comprehensive ecosystem profiling, the same approach we apply in our long-read metagenomic sequencing service.
Reference
- Decomposing a San Francisco estuary microbiome using long-read metagenomics reveals species- and strain-level dominance from picoeukaryotes to viruses. mSystems, 2024. https://doi.org/10.1128/msystems.00242-24
Related Publications
Here are publications from researchers who have used our metagenomic sequencing services:
Nutrient structure dynamics and microbial communities at the water–sediment interface in an extremely acidic lake
Journal: Frontiers in Microbiology
Year: 2024
Indole-3-Propionic Acid, a Gut Microbiota Metabolite and Postoperative Delirium
Journal: Annals of Surgery
Year: 2023
DOI: 10.1097/SLA.0000000000005886
Abundance and phylogenetic distribution of eight key enzymes of the phosphorus biogeochemical cycle in grassland soils
Journal: Environmental Microbiology
Year: 2023
See more articles published by our clients.
