Introduction to Next Generation Sequencing Technology

CD Genomics Blog

Posted on May 20, 2026

Next-generation sequencing (NGS) refers to the set of high-throughput sequencing technologies that emerged in the mid-2000s and collectively transformed the scale and scope of genomic research. Rather than sequencing individual DNA fragments one at a time as Sanger sequencing does, NGS platforms sequence millions to billions of fragments in parallel, reducing the cost of sequencing a human genome by more than five orders of magnitude since the Human Genome Project was completed in 2003.

This guide provides a 2026 update on the NGS landscape for researchers who have some familiarity with sequencing and need a structured overview of currently active platforms, their capabilities, and their limitations. It covers the major technology families—Illumina short-read sequencing, Ion Torrent semiconductor sequencing, and the complementary role of long-read technologies from PacBio and Oxford Nanopore—with a focus on practical decision-making for project design.

What Is Next-Generation Sequencing?

NGS encompasses all sequencing technologies that perform massively parallel sequencing of DNA fragments. The defining characteristic is scalability: while Sanger sequencing produces a single read of 600-1,000 bp per reaction, NGS platforms produce millions to billions of reads in a single instrument run, with read lengths ranging from 50 bp to over 300 bp depending on the platform and configuration. This scalability is what makes NGS the standard technology for most genomic applications — the ability to sequence an entire human genome in 24 hours rather than a decade depends on parallel processing rather than on any single read length or accuracy advantage over earlier technologies.

The key distinction between NGS and third-generation sequencing technologies (PacBio SMRT, Oxford Nanopore) is the amplification step. NGS platforms require PCR amplification of individual DNA molecules to generate detectable signals—either through bridge amplification on a flow cell surface (Illumina) or emulsion PCR on beads (Ion Torrent). This amplification step is both a strength and a limitation: it provides the high signal-to-noise ratio needed for accurate base calling but introduces amplification bias and limits read length. Third-generation technologies sequence single molecules without amplification, enabling longer reads but with different error profiles.

The practical implication for researchers is that NGS and third-generation sequencing are complementary rather than competing. NGS provides the highest throughput and accuracy for applications where read length is adequate—SNV detection, RNA-seq quantification, and metagenomic profiling. Long-read technologies provide information that NGS cannot access—structural variant resolution, de novo assembly contiguity, and direct epigenetic modification detection. Many projects now use both approaches in a hybrid strategy, combining Illumina short reads at 30-60× depth for cost-effective variant calling with PacBio HiFi or Nanopore long reads at 15-30× for resolving complex genomic regions. The total cost of a hybrid project is higher than single-platform sequencing but the data completeness is substantially greater, particularly for genome assembly and comprehensive variant detection. The decision to use a hybrid approach should be based on the specific requirements of the research question — for projects where comprehensive genome coverage is essential, the additional cost is justified by the data quality improvement.

NGS services cover the full range of short-read sequencing platforms, with configurations optimized for different project types and throughput requirements.

The NGS Ecosystem in 2026 — Active Platforms and Market Status

The NGS technology landscape has consolidated significantly since the early 2010s, when platforms from Roche, Illumina, Life Technologies, and Complete Genomics competed for market share. Today, the market structure is clearer.

Roche 454 (discontinued): The first commercial NGS platform, launched in 2005, used pyrosequencing and emulsion PCR to generate reads of up to 1,000 bp. Roche discontinued the platform in 2016 after failing to compete with Illumina’s throughput advantage. Researchers working with legacy 454 data should be aware that the platform’s error profile (high error rates in homopolymer tracts) differs from that of modern platforms and may affect comparisons with newer data.

Ion Torrent (Thermo Fisher, limited market share): Ion Torrent uses semiconductor technology to detect hydrogen ions released during nucleotide incorporation. Its main advantage is speed: a sequencing run can be completed in 2-7 hours. The trade-off is a high error rate in homopolymer regions, because the analog ion detection signal cannot reliably distinguish a single base incorporation from several identical bases incorporated in a homopolymer run. Ion Torrent is used primarily for targeted small panels and clinical applications where rapid turnaround is prioritized over data quality. Its market share has declined steadily as Illumina’s run times have improved and as long-read platforms have become more accessible for applications requiring longer reads.

Illumina (dominant, ~80% market share): Illumina’s sequencing by synthesis (SBS) technology, introduced with the Genome Analyzer in 2007, has become the standard for short-read sequencing through continuous platform innovation. The Illumina platform family spans a roughly 10,000-fold throughput range, from the benchtop iSeq 100 producing 1.2 Gb per run to the NovaSeq X Plus producing 16 Tb per run. The 2023 introduction of XLEAP-SBS chemistry on the NovaSeq X reduced cycle times by 30-50% and increased signal intensity by 30-40%, lowering per-genome sequencing costs while maintaining the accuracy that has made Illumina the platform of choice for most genomic applications.

Platform	Max Output	Max Read Length	Run Time	Best Suited For
iSeq 100	1.2 Gb	2 × 150 bp	9-17 hr	Small panels, validation
MiniSeq	7.5 Gb	2 × 150 bp	7-24 hr	Targeted small panels
MiSeq	15 Gb	2 × 300 bp	4-55 hr	16S amplicons, small genomes
NextSeq 2000	330 Gb	2 × 150 bp	11-48 hr	RNA-seq, exomes, mid-scale WGS
NovaSeq 6000	6 Tb	2 × 250 bp	13-44 hr	Large-scale WGS, population studies
NovaSeq X Plus	16 Tb	2 × 150 bp	12-48 hr	Ultra-large WGS, single-cell at scale

PacBio and Oxford Nanopore are not NGS platforms in the strict sense (they are single-molecule sequencing technologies without amplification), but they are increasingly integrated into projects alongside NGS. PacBio HiFi reads of 10-20 kb at >99.9% accuracy now rival Illumina accuracy for single-nucleotide variant calling while providing long-range information for phasing, structural variant detection, and de novo assembly. Oxford Nanopore’s ultra-long reads, which can reach 2 Mb, provide unmatched contiguity for genome assembly, with the ability to span entire centromeres and other complex repetitive regions that short reads cannot resolve. The complementary strengths of these three technology families mean that many large-scale genomics projects now employ more than one platform.

Each sequencing platform has a characteristic error profile. Illumina errors are predominantly random substitutions (~0.1% per base at Q30). PacBio raw-read errors are also random and are corrected by CCS, which is what gives HiFi reads their accuracy. Ion Torrent and Nanopore have higher homopolymer error rates, which can cause false-positive indels if not accounted for in the analysis pipeline. Understanding these error profiles is important for selecting appropriate variant calling thresholds and for evaluating whether detected variants are likely to be real or platform-specific artifacts.
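The Q30 figure quoted for Illumina follows from the Phred scale, on which a quality score Q corresponds to a per-base error probability of 10^(-Q/10). The Python sketch below shows the conversion in both directions. The function names are illustrative and not taken from any particular toolkit, and the expected-error helper assumes every base in the read has the same quality, which real reads do not.

```python
import math


def phred_to_error_prob(q: float) -> float:
    """Per-base error probability implied by Phred quality score q."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p: float) -> float:
    """Phred quality score implied by per-base error probability p."""
    return -10 * math.log10(p)


def expected_errors(read_length: int, q: float) -> float:
    """Expected miscalled bases in a read, assuming uniform quality q."""
    return read_length * phred_to_error_prob(q)


if __name__ == "__main__":
    # Q30 corresponds to a 0.1% per-base error rate.
    print(phred_to_error_prob(30))
    # A 150 bp read at uniform Q30 carries about 0.15 expected errors.
    print(expected_errors(150, 30))
```

On this scale the >99.9% accuracy quoted for HiFi reads is also Q30 or better, which is why the two platforms are comparable for single-nucleotide variant calling.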

Figure 1. NGS platform ecosystem — active platforms, market status, and throughput range

Illumina Sequencing by Synthesis — The Workhorse Platform

Illumina’s SBS technology works through three core processes: library preparation (adding platform-specific adapters), cluster generation (amplifying single molecules into detectable clusters on a flow cell surface), and sequencing by synthesis (incorporating fluorescently labeled, reversibly terminated nucleotides one at a time with imaging after each cycle). The most recent chemistry iteration, XLEAP-SBS launched with the NovaSeq X in 2023, reduces cycle times by 30-50% and improves signal intensity by 30-40% compared to standard SBS chemistry.

Key capability for project design: The Illumina platform family’s 10,000-fold throughput range means that the platform should be matched to the project scale. A 16S amplicon project with 96 samples producing 25 million reads total is efficiently run on a MiSeq. A 100-sample human WGS project at 30× needs roughly 30-35 billion read pairs (about 10 Tb of data), which calls for NovaSeq-class throughput. A platform mismatched to the project scale either demands an impractical number of runs (MiSeq for large projects) or leaves paid-for throughput unused (NovaSeq for small projects). The sequencing center or service provider should have access to the full platform range so that the platform is selected based on project needs rather than instrument availability. NGS platform selection services can match project parameters to the optimal Illumina system.
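The matching logic can be sketched in a few lines of Python. The sketch takes maximum outputs and read lengths from the platform table above, assumes a 3.1 Gb human genome, and treats the 25 million amplicon reads as 2 × 300 bp read pairs. The names `Platform`, `pick_platform`, and `runs_needed` are hypothetical, and real planning would also account for quality filtering losses, index balancing, and flow cell options.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Platform:
    name: str
    max_output_gb: float  # maximum output per run, from the platform table
    max_read_len: int     # maximum paired-end read length in bp


# Values copied from the platform table, ordered by increasing output.
PLATFORMS = [
    Platform("iSeq 100", 1.2, 150),
    Platform("MiniSeq", 7.5, 150),
    Platform("MiSeq", 15, 300),
    Platform("NextSeq 2000", 330, 150),
    Platform("NovaSeq 6000", 6000, 250),
    Platform("NovaSeq X Plus", 16000, 150),
]

HUMAN_GENOME_MB = 3100  # assumption: ~3.1 Gb human genome


def wgs_yield_gb(samples: int, depth: int, genome_mb: int = HUMAN_GENOME_MB) -> float:
    """Total bases (in Gb) needed to cover every sample at the given depth."""
    return samples * depth * genome_mb / 1000


def amplicon_yield_gb(read_pairs: int, read_len: int) -> float:
    """Total bases (in Gb) produced by a number of paired-end read pairs."""
    return read_pairs * 2 * read_len / 1e9


def pick_platform(required_gb: float, read_len: int) -> Optional[Platform]:
    """Smallest platform that delivers the yield in one run at this read length."""
    for platform in PLATFORMS:
        if platform.max_read_len >= read_len and platform.max_output_gb >= required_gb:
            return platform
    return None


def runs_needed(required_gb: float, platform: Platform) -> int:
    """Number of full runs needed on a given platform."""
    return math.ceil(required_gb / platform.max_output_gb)


if __name__ == "__main__":
    amplicon_gb = amplicon_yield_gb(25_000_000, 300)  # 16S project
    wgs_gb = wgs_yield_gb(samples=100, depth=30)      # 100 human genomes at 30x
    print(pick_platform(amplicon_gb, 300).name)
    print(pick_platform(wgs_gb, 150).name)
    print(runs_needed(wgs_gb, PLATFORMS[2]))          # the same WGS project on a MiSeq
```

Under these assumptions the 16S project fits a single MiSeq run, while the WGS project needs about 9.3 Tb, which only the NovaSeq X Plus covers in one run and which would take hundreds of MiSeq runs.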

Read type considerations: Paired-end sequencing (reading both ends of each fragment) provides more accurate alignment and enables detection of structural variants and fusion transcripts. Single-end sequencing is lower cost and may be sufficient for applications like small RNA-seq where fragment length is shorter than the read length. Most standard NGS projects use paired-end reads as the default configuration. The choice of 2 × 150 bp vs. 2 × 300 bp depends on the required read length — 2 × 150 bp is standard for WGS and RNA-seq, while 2 × 300 bp is needed for 16S V3-V4 amplicon sequencing where the amplicon length (~460 bp) requires longer reads for complete coverage.
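The amplicon example comes down to simple arithmetic: a read pair covers an amplicon only if the two reads together are longer than the amplicon, and merging the pair needs some overlap in the middle. A small Python sketch follows. The 20 bp minimum overlap is an assumed placeholder, since the overlap actually required depends on the merging tool and its settings, and quality trimming reduces the usable overlap further.

```python
def paired_end_overlap(amplicon_len: int, read_len: int) -> int:
    """Overlap in bp between the two reads of a pair.

    A negative value is the size of the uncovered gap in the middle
    of the amplicon.
    """
    return 2 * read_len - amplicon_len


def can_merge(amplicon_len: int, read_len: int, min_overlap: int = 20) -> bool:
    """True if the read pair overlaps by at least min_overlap bp (assumed threshold)."""
    return paired_end_overlap(amplicon_len, read_len) >= min_overlap


if __name__ == "__main__":
    v3v4 = 460  # approximate 16S V3-V4 amplicon length from the text
    print(paired_end_overlap(v3v4, 300), can_merge(v3v4, 300))  # 140 True
    print(paired_end_overlap(v3v4, 150), can_merge(v3v4, 150))  # -160 False
```

With 2 × 300 bp reads the pair overlaps by about 140 bp, whereas 2 × 150 bp reads leave roughly 160 bp in the middle of the amplicon unsequenced.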

Figure 2. Illumina platform family — throughput range, read length, and project fit

Beyond read length — why Illumina dominates: Illumina’s dominance in short-read sequencing is not solely due to read length or accuracy. The platform benefits from a mature ecosystem including validated library preparation kits, standardized data formats, a large user community, and comprehensive bioinformatics tool support. These ecosystem advantages reduce the risk and complexity of adopting NGS for new laboratories and contribute to Illumina’s position as the default platform for most projects. Competitors face the challenge of not only matching the technical performance but also building the ecosystem that makes the platform practical for routine use.

Ion Torrent Semiconductor Sequencing — Speed Over Accuracy

Ion Torrent sequencing detects hydrogen ions released when nucleotides are incorporated during DNA synthesis. The pH change is proportional to the number of bases incorporated, enabling direct detection without fluorescence or optics. The platform’s main advantage is speed—a complete sequencing run from library to data can take as little as 2 hours. The limitation is that homopolymer regions (stretches of the same base, e.g., AAAAA) cannot be accurately quantified because the signal magnitude from multiple identical bases in a row cannot be reliably distinguished from a single base with signal variation.
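A toy model helps show why the problem grows with homopolymer length. The Python sketch below assumes the ideal signal equals the number of incorporated bases and that each measurement is off by a fixed fraction of that signal. Both the noise model and the 8% figure are illustrative assumptions, not measured Ion Torrent characteristics.

```python
def called_length(true_len: int, rel_error: float) -> int:
    """Call a homopolymer length by rounding a noisy flow signal.

    Toy assumption: the ideal signal equals the number of incorporated
    bases, and the measurement deviates by the fraction rel_error.
    """
    signal = true_len * (1.0 + rel_error)
    return int(round(signal))


def longest_reliable_run(rel_error: float, max_len: int = 20) -> int:
    """Longest homopolymer still called correctly at +/- rel_error."""
    longest = 0
    for n in range(1, max_len + 1):
        too_high = called_length(n, rel_error) != n
        too_low = called_length(n, -rel_error) != n
        if too_high or too_low:
            break
        longest = n
    return longest


if __name__ == "__main__":
    print(called_length(1, 0.08))      # a single base is still called as 1
    print(called_length(7, 0.08))      # a run of 7 is miscalled as 8
    print(longest_reliable_run(0.08))  # runs up to 6 bases survive 8% error
```

Because the absolute deviation scales with the signal, the same relative error that is harmless for a single base pushes a long run past the rounding boundary, which appears in the data as an insertion or deletion.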

In 2026, Ion Torrent is used primarily for small targeted panels in clinical settings where rapid turnaround is critical. Its market share has declined to <5% of total sequencing output, and new projects are generally better served by Illumina platforms or by long-read technologies depending on the application.

What NGS Can and Cannot Do

Application	Short-read NGS (Illumina) performance	Limitation
SNV / small InDel detection	High sensitivity at ≥30×	Limited in repetitive regions and GC-rich areas
Structural variant detection	Detects deletions <1 kb; limited for larger events	Cannot span SVs > read length; requires specialized tools
Gene expression quantification	Standard for RNA-seq; 20-50M reads per sample	Isoform resolution limited by short (~150 bp) read length
De novo genome assembly	Produces fragmented assemblies for >100 Mb genomes	Cannot resolve repetitive content > read length
Epigenetic modification detection	Bisulfite sequencing (conversion-based)	Cannot detect native modifications; requires chemical treatment
Metagenomic profiling	High depth for community composition	Species-level resolution limited for closely related taxa

Figure 3. NGS capabilities — what short-read sequencing can and cannot achieve

Understanding the limits is as important as understanding the capabilities: A researcher designing a project to detect structural variants in a cancer genome using only short-read NGS will miss approximately 30-50% of deletions and insertions larger than 1 kb. A project aiming to assemble a bacterial genome from short reads alone will typically produce a fragmented assembly of tens to hundreds of contigs rather than a complete chromosome. Recognizing these limitations before the project starts prevents wasted sequencing expenditure and ensures that the experimental design matches the data requirements. For projects where comprehensive genomic analysis is critical, incorporating long-read sequencing or adopting a hybrid approach should be considered from the planning stage rather than as an afterthought when short-read data proves insufficient.

Key Applications of NGS

NGS is applied across four major domains of genomic research, each with distinct technical requirements and data interpretation challenges.

Genomics: Whole-genome sequencing (WGS) and whole-exome sequencing (WES) detect germline and somatic variants across the genome or its coding regions. The standard for human WGS is 30× coverage with paired-end 150 bp reads producing approximately 90-100 Gb of data per sample. WES targets the ~1-2% of the genome that codes for proteins, requiring 100-200× depth for reliable variant detection but at approximately one-fifth the sequencing cost of WGS. The choice between WGS and WES depends on whether non-coding regions are relevant to the research question — WES is appropriate for known coding variant discovery, while WGS is required for comprehensive genome-wide analysis including structural variants and non-coding regulatory regions that may affect gene expression. Whole genome sequencing services can be configured for short-read, long-read, or hybrid approaches depending on the required data completeness and project budget.
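The 90-100 Gb figure for a 30× human genome can be checked with the usual depth relationship: mean depth equals total sequenced bases divided by genome size. The Python sketch below assumes a 3.1 Gb genome and ignores duplicates, unmapped reads, and adapter content, so usable depth in practice is somewhat lower than the raw figure. The function names are illustrative.

```python
HUMAN_GENOME_BP = 3_100_000_000  # assumption: ~3.1 Gb human genome


def yield_for_depth(depth: int, genome_bp: int = HUMAN_GENOME_BP) -> int:
    """Total bases needed for a given mean depth."""
    return genome_bp * depth


def read_pairs_for_depth(depth: int, read_len: int,
                         genome_bp: int = HUMAN_GENOME_BP) -> int:
    """Paired-end read pairs needed for a given mean depth (rounded up)."""
    bases_per_pair = 2 * read_len
    total = yield_for_depth(depth, genome_bp)
    return (total + bases_per_pair - 1) // bases_per_pair


def depth_from_yield(yield_bp: float, genome_bp: int = HUMAN_GENOME_BP) -> float:
    """Mean depth obtained from a given number of sequenced bases."""
    return yield_bp / genome_bp


if __name__ == "__main__":
    print(yield_for_depth(30))            # 93,000,000,000 bases, i.e. 93 Gb
    print(read_pairs_for_depth(30, 150))  # 310,000,000 read pairs at 2 x 150 bp
    print(depth_from_yield(90e9), depth_from_yield(100e9))
```

Under these assumptions 90-100 Gb corresponds to roughly 29-32× raw depth, consistent with the 30× standard.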

Transcriptomics: RNA sequencing quantifies gene expression, detects alternative splicing, and identifies transcript isoforms. Standard mRNA-seq requires 20-50 million reads per sample for gene-level quantification. Long-read RNA-seq (Iso-Seq, Nanopore cDNA) provides full-length transcript information that short-read RNA-seq cannot achieve, including complete isoform structures and fusion transcript sequences. RNA-seq services offer both poly(A)-selected and rRNA-depleted library options depending on the RNA type and research objectives, with strand-specific protocols available for applications requiring transcript orientation information.

Epigenomics: Bisulfite sequencing maps DNA methylation at single-base resolution. ChIP-seq identifies protein-DNA interaction sites genome-wide. ATAC-seq maps chromatin accessibility. Each application uses NGS as a readout for a different aspect of epigenetic regulation. For methylation analysis, whole-genome bisulfite sequencing (WGBS) provides the most comprehensive view but at high sequencing cost; reduced representation bisulfite sequencing (RRBS) targets CpG-rich regions at lower cost. Epigenomic data analysis services provide specialized pipelines for each assay type, including methylation calling, peak detection, and differential accessibility analysis.

Microbiomics: 16S/ITS amplicon sequencing profiles microbial community composition using conserved marker genes, providing genus-level resolution at low cost per sample. Shotgun metagenomics sequences all DNA in a sample, enabling functional profiling, strain-level analysis, and detection of pathogens and antimicrobial resistance genes without the PCR bias inherent to amplicon approaches. The choice between amplicon and shotgun metagenomics depends on the required resolution — amplicon sequencing is sufficient for broad community comparisons across many samples, while shotgun metagenomics is required for functional analysis or species-level taxonomic resolution. 16S/ITS amplicon sequencing services provide standardized protocols for marker gene analysis with validated primer sets and bioinformatics pipelines. For complex microbial communities requiring functional insights, shotgun metagenomics also enables recovery of metagenome-assembled genomes (MAGs), providing genome-level resolution for uncultivated microorganisms.

Figure 4. Major NGS application areas — genomics, transcriptomics, epigenomics, and microbiomics

Third-Generation Sequencing — The Long-Read Complement

Short-read NGS cannot access approximately 5-10% of the human genome—primarily centromeric, telomeric, and segmentally duplicated regions. Third-generation sequencing technologies address this limitation through single-molecule, real-time sequencing that does not require amplification.

PacBio SMRT sequencing: Circular consensus sequencing (CCS) produces HiFi reads at 10-20 kb with >99.9% accuracy — comparable to Illumina for single-nucleotide resolution while spanning repetitive regions thousands of bases long. HiFi reads are the gold standard for de novo genome assembly and enable detection of structural variants longer than 50 bp that short-read methods miss. The trade-off is higher cost per gigabase and more stringent DNA quality requirements.

Oxford Nanopore sequencing: Nanopore sequencing passes single DNA molecules through a protein nanopore, detecting ionic current changes as each nucleotide passes through the pore. Read lengths routinely exceed 100 kb and can reach 2 Mb, providing unparalleled contiguity for genome assembly. The platform directly detects base modifications (5mC, 6mA) without chemical conversion, a capability that short-read platforms lack. Raw read accuracy is 90-97%, but consensus accuracy exceeds 99% with 30-50× coverage. Nanopore is used for ultra-long read genome assembly, rapid pathogen identification in outbreak settings, and direct RNA sequencing for transcriptomics without reverse transcription bias.

The choice between Illumina NGS, PacBio, and Nanopore is a project design decision rather than a competition. For most SNV detection and RNA quantification projects, Illumina NGS provides the best value. For de novo assembly and structural variant detection, long-read platforms are essential. Hybrid strategies that combine short-read depth with long-read contiguity are increasingly common for comprehensive genome analysis. For researchers evaluating long-read options, the PacBio vs Oxford Nanopore comparison covers the two platforms in detail.

Figure 5. Short-read vs. long-read — complementary roles in genomic analysis

How to Choose the Right Sequencing Approach

Selecting between NGS, long-read sequencing, or a hybrid approach depends on three factors: data type required, read length needed, and budget. For projects requiring SNV detection at scale, Illumina NGS at 30× provides the most cost-effective solution. For genome assembly or comprehensive structural variant detection, long-read platforms are necessary — no amount of short-read depth can resolve a 10 kb repeat, and attempting to do so wastes sequencing budget on data that cannot answer the research question. For projects that need both depth and contiguity—for example, a cancer genome requiring both SNV detection and structural variant analysis—a hybrid approach using Illumina for depth and PacBio or Nanopore for long reads is the recommended strategy.

Decision guide by project type: A typical microbial isolate genome for publication requires both assembly and annotation — PacBio HiFi at 50× provides a complete assembly, often supplemented by Illumina reads for polishing. A population-scale human WGS study of 1,000 samples requires the throughput of NovaSeq X at 30×, and the budget would not support long-read sequencing at this scale. A targeted study of a known gene family across 500 samples is most efficiently performed by amplicon sequencing on MiSeq. Aligning the platform choice with the project scale and data requirements is the most important project design decision. Researchers unsure about the optimal approach should start with a small pilot study to assess data quality and feasibility before committing to full-scale production sequencing.
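The decision rules in this section can be written down as a short Python function. This is a simplified sketch of the guidance above, not a complete planning tool: the class and function names are hypothetical, and budget is left out because the text does not quantify it.

```python
from dataclasses import dataclass


@dataclass
class ProjectNeeds:
    small_variants: bool          # SNV / small indel detection at scale
    assembly_or_structural: bool  # de novo assembly or comprehensive SV detection


def recommend_approach(needs: ProjectNeeds) -> str:
    """Map the data requirements of a project to a sequencing strategy."""
    if needs.small_variants and needs.assembly_or_structural:
        # Both depth and contiguity are needed, e.g. a cancer genome.
        return "hybrid: Illumina for depth plus PacBio or Nanopore long reads"
    if needs.assembly_or_structural:
        # Short-read depth alone cannot resolve long repeats.
        return "long-read: PacBio HiFi or Nanopore"
    # SNV detection, expression quantification, amplicon studies.
    return "short-read: Illumina"


if __name__ == "__main__":
    cancer_genome = ProjectNeeds(small_variants=True, assembly_or_structural=True)
    population_wgs = ProjectNeeds(small_variants=True, assembly_or_structural=False)
    microbial_isolate = ProjectNeeds(small_variants=False, assembly_or_structural=True)
    for project in (cancer_genome, population_wgs, microbial_isolate):
        print(recommend_approach(project))
```

A pilot study remains the way to confirm the choice, since the function encodes only the data type and read length factors.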

Computational and Data Management Considerations

NGS generates data volumes that require advance planning. A 100-sample human WGS project at 30× produces approximately 10 TB of FASTQ data, plus 5-8 TB of aligned BAM files and 50-100 GB of VCF files after variant calling. Storage, backup, and compute capacity must be planned before the project begins: lost sequencing data can only be recovered by resequencing, and reprocessing through an analysis pipeline requires the original FASTQ files to be available. Cloud-based analysis is increasingly common for projects exceeding 50 samples, avoiding the need for capital investment in compute hardware and enabling flexible scaling. For smaller projects (<50 samples), a local workstation with 64 GB RAM and 8+ CPU cores can handle standard NGS analysis pipelines including alignment, variant calling, and RNA-seq quantification without requiring cloud computing resources. Bioinformatics services can provide storage, compute, and analysis pipelines tailored to the selected platform’s data output characteristics.
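A rough storage budget can be derived from per-sample figures. The Python sketch below uses approximately 100 GB of FASTQ and 50 GB of BAM per 30× human genome (the per-sample numbers given in the FAQ) and 1 GB of VCF per sample (the upper end of the 50-100 GB quoted for 100 samples). The `copies` parameter for backups is an assumption added for illustration, and actual sizes vary with compression and pipeline settings.

```python
FASTQ_GB_PER_SAMPLE = 100  # approximate, 30x human WGS
BAM_GB_PER_SAMPLE = 50     # approximate aligned BAM
VCF_GB_PER_SAMPLE = 1      # upper end of 50-100 GB per 100 samples


def wgs_storage_tb(n_samples: int, copies: int = 1) -> float:
    """Estimated storage in TB for FASTQ, BAM, and VCF files.

    copies is the number of full copies kept (1 = primary only,
    2 = primary plus one backup); this is an illustrative assumption.
    """
    per_sample_gb = FASTQ_GB_PER_SAMPLE + BAM_GB_PER_SAMPLE + VCF_GB_PER_SAMPLE
    return n_samples * per_sample_gb * copies / 1000


if __name__ == "__main__":
    print(wgs_storage_tb(100))            # primary copy only
    print(wgs_storage_tb(100, copies=2))  # primary plus one backup
```

For 100 samples this comes to about 15 TB for a single copy and about 30 TB with one full backup, before counting intermediate files.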

FAQ

What is the difference between NGS and third-generation sequencing?
NGS requires PCR amplification of DNA fragments before sequencing, producing high accuracy but short reads. Third-generation sequencing (PacBio SMRT, Oxford Nanopore) sequences single molecules without amplification, producing longer reads but with different error profiles.

Which NGS platform is best for my project?
The choice depends on required throughput and read length. For 16S amplicon projects, MiSeq (2×300 bp) is standard. For RNA-seq and exome sequencing, NextSeq 2000 provides appropriate throughput. For large-scale WGS, NovaSeq 6000 or NovaSeq X is required. For rapid targeted sequencing with same-day turnaround, Ion Torrent may be appropriate despite its limitations in homopolymer accuracy. The key is to match the platform to the project scale rather than selecting a platform and designing the project around its limitations.

Is Roche 454 still available?
No. Roche discontinued the 454 platform in 2016. The active short-read platforms covered in this guide are Illumina and Ion Torrent. For long-read sequencing, PacBio and Oxford Nanopore are the available options.

Can I combine NGS and long-read sequencing in one project?
Yes. Hybrid strategies that use Illumina short reads for depth and PacBio or Nanopore long reads for contiguity are standard for de novo genome assembly and comprehensive structural variant detection.

How much storage do I need for an NGS project?
A single human WGS sample at 30× generates approximately 100 GB of FASTQ data plus 50 GB of aligned BAM files for downstream analysis and long-term storage. For a 100-sample project, plan for 15-20 TB of total storage including intermediate and output files. RNA-seq projects generate less data per sample — approximately 15-20 GB of FASTQ for 30 million reads — but the storage requirements still scale with sample number and must be budgeted accordingly.

What is the difference between whole-genome and whole-exome sequencing?
Whole-genome sequencing reads the entire genome, including both coding and non-coding regions. Whole-exome sequencing targets only the protein-coding exons (~1-2% of the genome). WGS provides more comprehensive variant detection at higher cost; WES is more cost-effective for studies focused on coding variants.

How do I choose between single-end and paired-end sequencing?
Paired-end sequencing reads both ends of each DNA fragment, providing more accurate alignment, better detection of structural variants, and improved assembly quality. Single-end sequencing reads one end only and is lower cost. Paired-end is the standard for most NGS applications; single-end is used primarily for small RNA-seq and some tag-based methods where the fragment is shorter than the read length and additional information from the second read would be redundant.
