What is Next Generation Sequencing (NGS)?
Next-generation sequencing (NGS) is not a single technology. It is a project system that integrates sample preparation, library construction, platform selection, sequencing parameters, and bioinformatics analysis. The choice of each component determines whether the final data can answer the original biological question.
This guide is designed for researchers who already understand what NGS is and need a decision-oriented framework for planning their next sequencing project. It covers how to select the right sequencing strategy based on research objectives, sample characteristics, platform capabilities, and data quality requirements. The focus is on actionable project design logic, not on repeating basic NGS definitions that are already well covered in existing resources.
From Sequencing Technology to Project Design System
NGS differs fundamentally from Sanger sequencing in three aspects that affect every project design decision. First, NGS reads millions to billions of DNA fragments in parallel, producing data volumes that require computational infrastructure for processing and interpretation. Second, NGS generates short or long reads that must be aligned to a reference or assembled de novo—the read type determines what biological questions can be addressed. Third, NGS data quality depends on sequencing depth, coverage uniformity, error models, and reference database quality, not just on platform accuracy.
Researchers evaluating an NGS project should focus on five practical questions before selecting a platform or service provider:
- Can my sample type and quality support the intended analysis?
- Should I choose short-read, long-read, or a hybrid strategy?
- How much sequencing depth do I need for reliable results?
- What QC metrics should I track at each stage?
- What biological questions can the resulting data answer, and which questions remain outside its scope?
These questions form the backbone of the project design framework described in this guide. For researchers beginning their first NGS project or evaluating a new application, comprehensive NGS services provide expert guidance on experimental design and strategy selection.
Figure 1. NGS project design stack — from sample to biological interpretation
Caption: Five-layer NGS project design stack showing the progression from biological question through sample preparation, library construction, platform selection, sequencing parameters, and bioinformatics analysis to biological interpretation.
From Research Question to NGS Strategy — A Decision Framework
The most common mistake in NGS project planning is starting with the platform rather than the biological question. The right sequencing strategy depends on what the research aims to discover, measure, or compare.
Classifying research goals by NGS requirement: The first step in designing an NGS project is to classify the research goal by the type of biological information required. Each goal type has specific sequencing parameter requirements that determine platform, depth, and analysis pipeline choices.
| Research Goal | Better NGS Strategy | Key Design Variable | Risk If Misaligned |
|---|---|---|---|
| SNP / small InDel detection | Short-read WGS / WES / targeted panel | Depth, mapping quality, duplication rate | Low-confidence calls |
| Structural variant discovery | Long-read or hybrid WGS | Read length, molecule integrity | Missed SVs |
| De novo genome assembly | PacBio HiFi / ONT / hybrid | N50, coverage, heterozygosity | Fragmented assembly |
| Differential expression | RNA-seq | RNA integrity, library type, biological replicates | False biological interpretation |
| Microbiome profiling | Amplicon or shotgun metagenomics | Marker region, database, host depletion | Biased taxonomic profile |
Decision logic: If a high-quality reference genome is available for the target species, short-read sequencing is the most cost-effective approach for most variant detection and quantification applications. The trade-off is limited resolution in repetitive regions. If the research requires resolving repetitive regions longer than the read length, comprehensively detecting structural variants (deletions, insertions, and inversions of 50 bp or more), or assembling a genome de novo, long-read sequencing is usually necessary despite its higher per-base cost and more stringent sample requirements.
For projects requiring both accuracy and contiguity, hybrid strategies combining short-read and long-read platforms provide the best balance. A typical hybrid genome assembly uses PacBio HiFi or Nanopore long reads for contig construction and Illumina short reads for polishing and error correction. The cost of this dual-platform approach is higher but the resulting assembly quality justifies the investment for high-priority projects.
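The decision logic above can be written down as a small function. This is only a sketch: the function name, its parameters, and the "needs design review" fallback are illustrative assumptions, not part of any established tool.

```python
def recommend_read_strategy(reference_available: bool,
                            needs_long_range: bool,
                            needs_accuracy_and_contiguity: bool = False) -> str:
    """Map the decision logic in the text to a read-type suggestion.

    needs_long_range: True when the project must resolve repeats longer than
    the read length, detect large structural variants, or assemble de novo.
    """
    if needs_accuracy_and_contiguity:
        return "hybrid"
    if needs_long_range:
        return "long-read"
    if reference_available:
        return "short-read"
    # Assumption: with no reference and no stated long-range need, the text
    # gives no rule, so this sketch defers to a manual design review.
    return "needs design review"


print(recommend_read_strategy(reference_available=True, needs_long_range=False))
# short-read
```

A real project would weigh budget and sample constraints as well; the function only encodes the three branches described in the text.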
Common mistakes to avoid:
- Asking "how many gigabases" without first defining the biological question
- Comparing platform prices without evaluating whether the resulting data can support the intended conclusion
- Treating WGS, WES, RNA-seq, and amplicon sequencing as interchangeable project types
Figure 2. Research question to NGS strategy decision tree
Caption: Decision tree mapping five common research goals—SNP detection, structural variant discovery, de novo assembly, differential expression, and microbiome profiling—to their optimal NGS strategies with key design variables and misalignment risks.
Platform Selection — How to Evaluate Platform Fit for Your Project
Platform selection is not about ranking technologies but about matching their characteristics to the specific requirements of a research project. The ABRF Next-Generation Sequencing Study demonstrated that different platforms produce measurably different results in coverage consistency, error rates, and variant detection performance. These differences mean that platform choice directly affects which biological findings are discoverable.
Key project-level considerations: For projects where throughput and per-base accuracy are the primary requirements—SNV detection, RNA-seq quantification, exome sequencing, and targeted panels—short-read sequencing is the most established approach. The NGS services portfolio includes multiple short-read platforms to match throughput to project scale.
For projects that require resolving genomic regions longer than the read length—de novo assembly, structural variant detection, and full-length transcript sequencing—long-read platforms are necessary despite higher per-base cost and more stringent sample requirements. A detailed comparison of PacBio and Oxford Nanopore technologies is available for researchers evaluating long-read options.
For projects that need both accuracy and contiguity—comprehensive genome assembly or complete variant detection—hybrid strategies combining short-read and long-read sequencing provide the best balance. This dual-platform approach requires higher total investment but produces data quality that neither platform can achieve alone.
Common platform selection mistakes: Assuming longer reads are universally better, assuming short reads cannot contribute to structural variant analysis, and expecting one platform to be optimal for all project types are among the most frequent errors in NGS project design.
Figure 3. NGS platform trade-off triangle — accuracy, read length, throughput, and sample input requirements
Caption: Trade-off triangle illustrating the interrelationship between accuracy, read length, throughput, and sample input requirements across short-read, long-read, and hybrid NGS platforms.
Sample Quality Is the First Constraint in Any NGS Project
No amount of sequencing depth or sophisticated bioinformatics can compensate for poor sample quality. Sample quality assessment should be the first step in project design, before platform or library selection.
Key variables for DNA samples: Input amount (total mass and concentration), fragment size distribution, degradation level (assessed by gel electrophoresis or TapeStation), purity ratios (A260/280, A260/230), and contaminants (phenol, ethanol, salts, polysaccharides, heme, humic acid) all affect library construction efficiency and sequencing data quality. A sample that passes concentration QC but contains residual phenol can still fail at the ligation step because phenol inhibits enzymes such as DNA ligase. For this reason, spectrophotometric purity assessment combined with fluorometric quantification provides a more reliable quality picture than either method alone.
Quantitative QC thresholds for DNA: A260/280 should be 1.8-2.0 for pure DNA; lower values indicate protein or phenol contamination, and higher values usually indicate RNA carryover. A260/230 should be 2.0-2.2; lower values suggest organic compound, salt, or carbohydrate carryover. For high-molecular-weight DNA required by long-read platforms, the genomic DNA should show a dominant band above 20 kb on a gel or TapeStation trace without significant smearing below 10 kb.
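As a minimal sketch, the purity thresholds above can be turned into an automated pre-submission check. The function name and message wording are illustrative assumptions; the numeric limits are the ones stated in the text.

```python
from typing import List


def check_dna_purity(a260_280: float, a260_230: float) -> List[str]:
    """Return a list of purity warnings; an empty list means both ratios pass."""
    flags = []
    if a260_280 < 1.8:
        flags.append("A260/280 below 1.8: possible protein or phenol contamination")
    elif a260_280 > 2.0:
        flags.append("A260/280 above 2.0: outside the 1.8-2.0 range")
    if a260_230 < 2.0:
        flags.append("A260/230 below 2.0: possible organic compound or carbohydrate carryover")
    return flags


print(check_dna_purity(1.85, 2.10))  # []
print(len(check_dna_purity(1.60, 1.50)))  # 2
```

The check covers only spectrophotometric ratios; fragment size and fluorometric concentration would need their own gates.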
Key variables for RNA samples: RIN score (RIN ≥ 7 for mRNA-seq, RIN ≥ 5 for total RNA-seq), DV200 for FFPE RNA samples, rRNA contamination level, and tissue preservation method. FFPE-derived RNA requires specific library preparation protocols with damage repair steps, and the expected yield is typically lower than from fresh-frozen tissue.
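The RIN cut-offs above lend themselves to a simple lookup. The sketch below assumes a hypothetical `RIN_MINIMUM` table and function name; DV200 for FFPE samples is deliberately left out because the text gives no threshold for it.

```python
# Minimum RIN per library type, taken from the thresholds stated in the text.
RIN_MINIMUM = {"mRNA-seq": 7.0, "total RNA-seq": 5.0}


def rin_passes(rin: float, library_type: str) -> bool:
    """Return True when the RIN meets the minimum for the given library type."""
    return rin >= RIN_MINIMUM[library_type]


print(rin_passes(6.2, "mRNA-seq"))       # False
print(rin_passes(6.2, "total RNA-seq"))  # True
```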
Additional considerations for long-read sequencing: High-molecular-weight DNA extraction is essential. Gentle handling during extraction, minimal freeze-thaw cycles, and avoiding mechanical shearing during pipetting are critical for preserving the long fragments required for PacBio and Nanopore library preparation.
Diagnosing common quality problems:
- Uneven coverage: GC bias, fragmentation bias, or low library complexity. Solution: reassess input quality, adjust library method, control PCR cycles.
- Low mapping rate: Contamination, reference genome mismatch, or sample degradation. Solution: add contamination screening, verify reference suitability.
- High duplication rate: Low input DNA, excessive PCR amplification, or low library complexity. Solution: reduce PCR cycles, optimize library complexity, consider PCR-free protocols. For sample types with limited input material, genomic data analysis can help assess whether the duplication rate is within acceptable limits for the intended detection method.
Figure 4. Sample quality gates before NGS — QC thresholds and risk assessment
Caption: Sample quality control gates for DNA and RNA samples before NGS, showing quantitative QC thresholds (A260/280, A260/230, RIN, DV200) and risk assessment for common quality problems including uneven coverage, low mapping rate, and high duplication rate.
Library Preparation Determines Data Usability
Library preparation is the bridge between raw nucleic acid and sequencing-ready molecules. Its primary functions are to convert DNA or RNA into platform-compatible molecules, introduce adapters and barcodes for flow cell binding and sample identification, control insert size and library complexity, and preserve strand information where required.
Key variables that affect sequencing output: Fragmentation strategy (mechanical vs. enzymatic vs. tagmentation) affects coverage bias and reproducibility. Adapter ligation efficiency determines the proportion of fragments that can be sequenced. PCR cycle number directly influences duplication rate: as a rough rule of thumb, each additional cycle beyond 10 adds approximately 5-10% more duplicates, although the actual increase depends on input amount and library complexity. Size selection window controls the insert size distribution, which affects cluster density and mapping rates. Library quantification method (qPCR vs. Qubit vs. Bioanalyzer) must be chosen carefully: qPCR is the most accurate for determining sequencing-ready concentration because it measures only amplifiable, adapter-ligated molecules.
Common library preparation mistakes:
- Assuming high library concentration equals high library quality
- Ignoring adapter dimer contamination, which wastes sequencing reads
- Using total DNA quantification (Qubit) alone instead of also measuring amplifiable molecules (qPCR)
- Excessive PCR amplification in low-input samples, causing high duplication rates
For a detailed discussion of library preparation optimization, see the NGS library preparation resource, which covers fragmentation, end repair, adapter ligation, amplification, cleanup, and QC in depth. For projects involving specialized sample types such as FFPE tissues or cfDNA, targeted sequencing approaches often require specific library protocols optimized for degraded or low-input material.
Figure 5. Library preparation variables that affect sequencing output
Caption: Key library preparation variables affecting NGS sequencing output—fragmentation strategy, adapter ligation efficiency, PCR cycle count, size selection window, and quantification method—with common mistakes and their impact on data quality.
Read Length, Depth, and Coverage — Three Distinct Concepts
These three terms are frequently used interchangeably in project discussions, but they describe different parameters that each affect data quality and project cost independently. NGS can be applied to DNA or RNA from virtually any biological source — blood, tissue, cells, FFPE blocks, plasma (cfDNA), microbial cultures, environmental samples, and single cells. The key constraint is that sample quality must meet the requirements of the chosen library preparation method and sequencing platform.
Read length is the number of contiguous bases determined per sequencing read. It affects alignment accuracy, the ability to span repetitive regions, isoform resolution in RNA-seq, and assembly contiguity. Longer reads are not always better — they require longer run times and produce fewer total reads per flow cell.
Sequencing depth (or coverage depth) is the average number of times each base in the target region is sequenced. It determines confidence in variant calls — higher depth enables detection of lower-frequency variants and provides more robust statistical power for differential expression analysis.
Coverage can refer to either the fraction of the target genome that is covered by at least one read (breadth of coverage) or the distribution of depth across the genome (uniformity). Project discussions should specify which meaning is intended.
| Metric | What It Measures | Why It Matters | Common Misinterpretation |
|---|---|---|---|
| Read length | Length of sequencing reads | Alignment, assembly, repeat spanning | Longer always means better |
| Raw data | Total output before filtering | Run scale | Equals usable data |
| Clean data | Filtered high-quality reads | Downstream input | Guarantees mapping quality |
| Depth | Average reads per locus | Variant confidence | Same across all genome regions |
| Coverage uniformity | Distribution of depth | Reliability across regions | Ignored if mean depth looks high |
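To make the distinction between depth and breadth concrete, the sketch below computes both from a list of per-base depths. The input list is invented for illustration; in practice these values would come from an alignment tool's depth output.

```python
def mean_depth(per_base_depth):
    """Average number of reads covering each base in the target."""
    return sum(per_base_depth) / len(per_base_depth)


def breadth_of_coverage(per_base_depth, min_depth=1):
    """Fraction of target bases covered by at least `min_depth` reads."""
    covered = sum(1 for d in per_base_depth if d >= min_depth)
    return covered / len(per_base_depth)


# Toy example: ten positions, two of which have no coverage at all.
depths = [0, 0, 10, 20, 30, 40, 50, 50, 50, 50]
print(mean_depth(depths))               # 30.0
print(breadth_of_coverage(depths))      # 0.8
print(breadth_of_coverage(depths, 20))  # 0.7
```

The toy target has a mean depth of 30× yet only 80% breadth, which is exactly the situation that a mean-depth figure alone hides.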
Project-specific depth requirements: Human WGS for germline SNV detection requires 30× coverage as a standard benchmark. Cancer somatic mutation detection requires 60-100× to identify low-frequency variants. RNA-seq gene expression analysis requires 20-50 million reads per sample. 16S amplicon profiling requires 10,000-50,000 reads per sample. These targets should be used as minimum values, with 10-20% over-sequencing added to account for sample-specific quality variation.
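The depth targets above translate into a data order through simple arithmetic: total bases = genome size × depth × (1 + over-sequencing margin). The sketch below assumes a human genome size of about 3.1 Gb and paired-end 150 bp reads; both values, and the function names, are assumptions chosen for illustration.

```python
import math


def required_bases(genome_size_bp, target_depth, oversequencing=0.15):
    """Total bases to order for a target mean depth plus a safety margin."""
    return genome_size_bp * target_depth * (1 + oversequencing)


def required_read_pairs(total_bases, read_length_bp=150):
    """Number of paired-end read pairs needed to deliver `total_bases`."""
    return math.ceil(total_bases / (2 * read_length_bp))


HUMAN_GENOME_BP = 3_100_000_000  # assumed approximate haploid genome size

baseline = required_bases(HUMAN_GENOME_BP, 30, oversequencing=0.0)
print(baseline / 1e9)                 # 93.0 (Gb at exactly 30x)
print(required_read_pairs(baseline))  # 310000000 read pairs at 2 x 150 bp
```

With the 10-20% margin recommended in the text, the same 30× project lands at roughly 102-112 Gb.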
Project-specific interpretation: For WGS, the focus should be on genome-wide depth and uniformity. A 30× WGS run where some regions are covered at 5× and others at 60× is not equivalent to a run where all regions are covered at 25-35×. Coverage uniformity metrics such as the coefficient of variation (CV) of depth across bins, or the fraction of the genome whose depth falls between 0.2× and 2× the mean depth, provide a more complete quality picture than the mean depth alone. For WES and targeted panels, the key metrics are on-target rate, target coverage, and capture uniformity rather than total data output. For RNA-seq, mapped reads per sample, gene body coverage, and library strandedness are more informative than raw read count. For metagenomics, host read proportion, microbial diversity recovery, and rare taxa detection thresholds determine whether the depth is adequate.
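The two uniformity metrics mentioned above can be computed in a few lines. The sketch below uses invented per-bin depths for two runs with the same 30× mean; the function names are illustrative.

```python
from statistics import mean, pstdev


def depth_cv(bin_depths):
    """Coefficient of variation of depth across bins (population SD / mean)."""
    return pstdev(bin_depths) / mean(bin_depths)


def fraction_within_band(bin_depths, low=0.2, high=2.0):
    """Fraction of bins whose depth lies between low x mean and high x mean."""
    m = mean(bin_depths)
    inside = sum(1 for d in bin_depths if low * m <= d <= high * m)
    return inside / len(bin_depths)


uneven = [5, 55] * 5   # mean 30x, but half the bins are nearly uncovered
even = [25, 35] * 5    # mean 30x, all bins close to the mean

print(fraction_within_band(uneven))  # 0.5
print(fraction_within_band(even))    # 1.0
print(depth_cv(even) < depth_cv(uneven))  # True
```

Both runs would be reported as "30×", yet only the second one supports reliable calls across the whole target.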
Figure 6. Read length vs. depth vs. coverage — three independent metrics in NGS project design
Caption: Conceptual diagram distinguishing read length, sequencing depth, and coverage as three independent NGS metrics, with a table clarifying definitions, practical importance, and common misinterpretations for each metric.
NGS Data Quality Metrics — What to Look for in a Sequencing Report
A comprehensive QC report should include metrics at three levels: sequencing-level quality, alignment-level quality, and library-level quality. Researchers evaluating a sequencing service provider should know which metrics are standard and how to interpret them.
Sequencing QC: Q20/Q30 percentages, GC content distribution, adapter content, duplication rate, and read length distribution. For a standard Illumina 2 × 150 bp run, more than 85% of bases should be at or above Q30. The per-cycle quality heatmap should show a gradual decline from high to moderate quality toward the end of each read; a sharp drop at any cycle indicates a run-specific issue that should be investigated before proceeding with data analysis.
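For readers who want to see what the Q30 percentage measures, the sketch below converts Phred scores to error probabilities and counts bases at or above Q30 in FASTQ quality strings. It assumes the common Phred+33 encoding; the two quality strings are invented.

```python
def phred_to_error_prob(q):
    """Phred score Q corresponds to an error probability of 10^(-Q/10)."""
    return 10 ** (-q / 10)


def fraction_q30(quality_strings, threshold=30, offset=33):
    """Fraction of bases with quality >= threshold (Phred+33 encoding assumed)."""
    total = 0
    passing = 0
    for qs in quality_strings:
        for ch in qs:
            total += 1
            if ord(ch) - offset >= threshold:
                passing += 1
    return passing / total if total else 0.0


# 'I' encodes Q40, '?' encodes Q30, '5' encodes Q20, '#' encodes Q2.
print(fraction_q30(["IIII?", "55#II"]))  # 0.7
```

Q30 therefore means an expected error rate of about 1 in 1,000 bases, and the report's Q30 percentage is the share of bases at or above that level.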
Alignment QC: Mapping rate, properly paired reads percentage, insert size distribution, mean coverage depth, and coverage uniformity. Low mapping rates (<80% for human DNA) should be investigated. Possible causes include contamination (bacterial or fungal DNA, or human DNA introduced during handling of non-human samples), reference genome mismatch (wrong species or genome version), or sample degradation producing fragments that cannot align uniquely.
Library QC: Library concentration, molarity, size distribution, adapter dimer content, and library complexity estimate. Adapter dimer content above 5% of total library mass wastes sequencing capacity.
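Library molarity is usually derived from mass concentration and mean fragment length using the approximation of 660 g/mol per base pair of double-stranded DNA. The sketch below shows that conversion together with the 5% adapter dimer check from the text; the function names and example values are assumptions.

```python
AVG_BP_MASS = 660.0  # approximate g/mol per base pair of double-stranded DNA


def library_molarity_nm(conc_ng_per_ul, mean_fragment_bp):
    """Convert a dsDNA library concentration (ng/uL) to nanomolar."""
    return conc_ng_per_ul * 1e6 / (AVG_BP_MASS * mean_fragment_bp)


def dimer_fraction_too_high(dimer_mass_ng, total_mass_ng, limit=0.05):
    """Flag libraries whose adapter dimer mass exceeds 5% of total mass."""
    return dimer_mass_ng / total_mass_ng > limit


# A 6.6 ng/uL library with a 500 bp mean fragment size is about 20 nM.
print(round(library_molarity_nm(6.6, 500), 2))
print(dimer_fraction_too_high(dimer_mass_ng=3.0, total_mass_ng=40.0))  # True
```

Because molarity depends on fragment length, two libraries with the same ng/µL reading can differ several-fold in the number of sequenceable molecules.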
Project-specific QC: For WES or targeted panels, on-target rate and target coverage at specified depths (e.g., % of target bases at 20×, 50×, 100×) are essential. Whole exome sequencing services typically report these metrics as standard deliverables. For RNA-seq, rRNA rate, exonic/intronic/intergenic distribution, and gene body coverage should be reported. For metagenomics, host contamination level, taxonomic assignment rate, and database version should be documented. For long-read sequencing, read N50, read length distribution, total yield, and raw vs. corrected accuracy are key metrics.
Figure 7. NGS QC dashboard — sample report showing key quality metrics
Caption: Comprehensive NGS QC dashboard showing three levels of quality metrics—sequencing QC (Q30, GC content, adapter content), alignment QC (mapping rate, insert size, coverage uniformity), and library QC (concentration, dimer content)—with project-specific indicators for WES, RNA-seq, metagenomics, and long-read data.
Bioinformatics Analysis — NGS Value Is in the Question, Not the Sequencer
The sequencing instrument produces raw data. Bioinformatics analysis transforms that data into biological insight. The choice of analysis pipeline should be determined by the research question, not by default settings or standard workflows.
Core analysis pipeline components: Raw data QC, trimming and filtering, alignment or assembly, quantification or variant calling, annotation, statistical analysis, and biological interpretation. Each step has platform-specific and application-specific variations that affect results.
Key analysis differences by project type:
| NGS Project | Core Bioinformatics Output | Key Dependency | Key Decision Question |
|---|---|---|---|
| WGS | Variant list, annotation, SV/CNV | Reference quality, depth | Can this design detect my target variant type? |
| RNA-seq | DEG, pathway, expression profile | RNA quality, replicates | Is the design statistically interpretable? |
| Metagenomics | Taxonomy, function, diversity | Database, host depletion | Can rare taxa or functional genes be resolved? |
| Long-read assembly | Contigs, N50, BUSCO, annotation | HMW DNA, coverage | Is assembly continuity enough for the research goal? |
Common bioinformatics mistakes: Assuming the same pipeline works for all project types, ignoring reference genome quality and version, neglecting database version impact on annotation results, designing experiments without biological replicates or appropriate statistical models, and confusing data visualization with biological conclusion. For projects that require customized bioinformatics analysis, discussing pipeline options with the service provider before sequencing begins ensures that the data output format matches the analysis requirements.
Bioinformatics analysis services can provide pipelines tailored to specific project types, ensuring that data processing and interpretation align with the research objectives.
Figure 8. Bioinformatics workflow by NGS project type
Caption: Bioinformatics analysis workflows for four NGS project types (WGS, RNA-seq, metagenomics, and long-read assembly), showing core pipeline components, key dependencies, and the key decision question that should be answered before pipeline selection.
NGS Application Design — WGS, WES, RNA-seq, Metagenomics, and More
Whole Genome Sequencing
WGS is appropriate for genome-wide variant discovery, population genomics, de novo assembly, and comparative genomics. Key design variables include genome size, heterozygosity rate, repeat content, required variant type, and reference genome availability. Short-read WGS at 30× is standard for human germline SNV detection. Long-read WGS is preferred for assembly, structural variant detection, and repeat-rich regions. Hybrid strategies combining both read types provide the best balance for comprehensive genome analysis.
For a human WGS project, the standard deliverable includes approximately 90-100 Gb of raw data per sample at 30×. The bioinformatics pipeline must handle variant calling for SNVs, small indels, and copy number variants as a minimum, with structural variant analysis as an optional extension. Whole genome sequencing services can be configured for short-read, long-read, or hybrid approaches depending on the research objectives.
Whole Exome Sequencing / Targeted Sequencing
WES and targeted panels focus on specific genomic regions, reducing cost while enabling higher depth on target regions. Key design variables include capture region design, on-target rate expectations, target coverage requirements, probe compatibility with the target species, and GC-rich or repetitive target regions. Risks include uneven coverage across target regions, capture bias, and the inability to interpret non-target regions.
For human WES, a typical deliverable includes 100-200× mean depth on target regions, with at least 90% of target bases covered at 20× or above. The on-target rate (percentage of reads mapping within or near the capture design) should exceed 60% for standard exome capture kits.
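The two WES benchmarks above (at least 90% of target bases at 20× or more, and an on-target rate above 60%) can be checked directly from per-base target depths and read counts. This is a sketch with invented inputs and hypothetical function names.

```python
def fraction_at_depth(target_base_depths, min_depth=20):
    """Fraction of target bases covered at or above `min_depth`."""
    hits = sum(1 for d in target_base_depths if d >= min_depth)
    return hits / len(target_base_depths)


def on_target_rate(on_target_reads, mapped_reads):
    """Share of mapped reads that fall within or near the capture design."""
    return on_target_reads / mapped_reads


def meets_wes_benchmarks(target_base_depths, on_target_reads, mapped_reads):
    """Apply the two thresholds stated in the text for standard exome kits."""
    return (fraction_at_depth(target_base_depths, 20) >= 0.90
            and on_target_rate(on_target_reads, mapped_reads) > 0.60)


depths = [150] * 95 + [10] * 5  # 95% of target bases at 150x, 5% at 10x
print(meets_wes_benchmarks(depths, on_target_reads=70, mapped_reads=100))  # True
```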
RNA Sequencing
RNA-seq measures gene expression, detects alternative splicing, identifies fusion transcripts, and discovers novel transcripts. Key variables include RNA integrity, library strandedness (preserving strand orientation), poly(A) selection vs. rRNA depletion strategy, and biological replicate number. Risks include RNA degradation affecting quantification accuracy, batch effects from processing samples at different times, and insufficient replicate design for statistical power.
A standard mRNA-seq project requires 20-50 million reads per sample for gene-level quantification. For isoform-level analysis, 100+ million reads per sample may be needed. At least three biological replicates per condition are recommended for reliable differential expression analysis. RNA-seq services support both poly(A)-selected and rRNA-depleted library types.
Metagenomic Sequencing
Metagenomics profiles microbial community structure, functional potential, and strain-level composition. Key variables include host DNA proportion, microbial biomass, database choice, and sequencing depth. Risks include host contamination overwhelming microbial reads, insufficient depth for rare taxa detection, and database-dependent annotation bias that varies with the reference set used.
For shotgun metagenomics, 50-100 million reads per sample is typical for comprehensive functional profiling. For 16S amplicon sequencing, 10,000-50,000 reads per sample are sufficient for community composition analysis. Host DNA depletion strategies — including differential lysis and probe-based capture — should be considered for low-biomass microbiome samples from host-associated sites. 16S/ITS amplicon sequencing services provide standardized protocols for community profiling, while shotgun metagenomics services offer higher resolution for functional and strain-level analysis.
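Host contamination reduces the effective microbial depth in proportion to the host read fraction, which is why depletion matters for host-associated samples. The sketch below shows the arithmetic; the 75% host fraction is an invented example value.

```python
def microbial_reads(total_reads, host_fraction):
    """Reads left for microbial analysis after host reads are removed."""
    return total_reads * (1.0 - host_fraction)


def total_reads_needed(target_microbial_reads, host_fraction):
    """Total reads to sequence so that the microbial target is still met."""
    if not 0.0 <= host_fraction < 1.0:
        raise ValueError("host_fraction must be in [0, 1)")
    return target_microbial_reads / (1.0 - host_fraction)


# With 75% host reads, 50 million microbial reads require 200 million total.
print(total_reads_needed(50_000_000, 0.75))  # 200000000.0
```

At very high host fractions the required total grows quickly, which is the quantitative case for differential lysis or probe-based depletion.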
Figure 9. NGS application selection matrix — matching research goals to sequencing services
Caption: Application selection matrix for four major NGS approaches—WGS, WES, RNA-seq, and metagenomics—showing optimal use cases, design variables, risks, and recommended read depth for each application type.
Choosing Between Amplicon, Panel, Exome, Genome, and Transcriptome
The choice between these five assay types depends on the scope of the research question and the required discovery potential.
| Option | Best For | Discovery Potential | Data Complexity | Main Limitation |
|---|---|---|---|---|
| Amplicon | Known small regions, high sample numbers | Low | Low–Medium | Narrow scope |
| Target panel | Known gene sets | Medium | Medium | Design-dependent |
| WES | Coding variants | Medium–High | Medium–High | Misses non-coding regions |
| WGS | Genome-wide discovery | High | High | Higher data burden |
| RNA-seq | Expression / transcriptome | High for RNA level | High | RNA quality sensitive |
Buyer decision logic: Is the target known or unknown? Is discovery required or is confirmation sufficient? Is genome-wide coverage needed or is a focused region adequate? Is the question at the DNA or RNA level? Do the sample quality and budget support the complexity of the chosen approach? Answering these questions before selecting an assay type prevents costly mid-project pivots.
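One possible way to encode these questions is a small rule-based function. The mapping below is an assumption derived from the table above, not a validated selection algorithm, and the `scope` labels are hypothetical; sample quality and budget are left to human judgment.

```python
def suggest_assay(molecule: str, target_known: bool, scope: str) -> str:
    """Suggest an assay type from three of the buyer questions.

    scope is one of: "small_regions", "gene_set", "coding", "genome_wide".
    """
    if molecule == "RNA":
        return "RNA-seq"
    if not target_known:
        # Discovery without a known target: stay with the broad DNA assays.
        return "WES" if scope == "coding" else "WGS"
    by_scope = {
        "small_regions": "Amplicon",
        "gene_set": "Target panel",
        "coding": "WES",
        "genome_wide": "WGS",
    }
    return by_scope[scope]


print(suggest_assay("DNA", target_known=True, scope="gene_set"))      # Target panel
print(suggest_assay("DNA", target_known=False, scope="genome_wide"))  # WGS
```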
Figure 10. NGS assay selection ladder — from amplicon to whole genome
Caption: Five-level NGS assay selection ladder from amplicon (lowest discovery potential, narrowest scope) to WGS (highest discovery potential, broadest coverage), with decision logic for choosing the appropriate assay based on target knowledge and research requirements.
Common Causes of NGS Project Failure — and How to Prevent Them
Understanding failure modes before starting a project is the most effective prevention strategy. Failures can occur at every stage.
| Problem Observed | Possible Cause | Design-Level Prevention |
|---|---|---|
| Low usable reads | Contamination, adapter dimers, low library quality | Pre-sequencing QC and cleanup |
| Uneven coverage | GC bias, capture bias, fragmentation bias | Platform and library strategy adjustment |
| High duplication | Low input DNA, over-PCR | Library complexity monitoring, PCR-free protocols |
| Weak biological signal | Poor replicate design, batch effects | Statistical design before sequencing |
| Poor annotation | Database mismatch, outdated reference | Database selection and version control |
Sample-level failures: Degradation, contamination, insufficient input, or poor extraction method — address by rigorous pre-sequencing QC.
Library-level failures: Adapter dimer contamination, low conversion efficiency, over-amplification, biased fragmentation — address by optimizing library protocols and including QC checkpoints.
Sequencing-level failures: Under-loading or over-loading the flow cell, low-diversity libraries causing calibration failures, imbalance among multiplexed samples, insufficient depth — address by careful loading calculations and diversity planning.
Data-level failures: Low mapping rate, poor reference genome selection, high duplication, batch effects, database mismatch — address by including controls and planning analysis before sequencing.
Project design failures: Research question too broad for the chosen approach, wrong platform or assay type, no biological replicates, unrealistic downstream expectations — address by using the framework in this guide before committing resources.
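Imbalance among multiplexed samples, listed under sequencing-level failures above, is often prevented by pooling libraries at equal molar amounts. Since 1 nM equals 1 fmol/µL, the volume to pipette is the target amount divided by the library concentration. The sketch below uses invented library names and concentrations.

```python
def equimolar_volumes_ul(library_nm, fmol_per_library):
    """Volume (uL) of each library that contributes `fmol_per_library`.

    library_nm maps library name -> concentration in nM (1 nM = 1 fmol/uL).
    """
    volumes = {}
    for name, conc in library_nm.items():
        if conc <= 0:
            raise ValueError(f"library {name!r} has a non-positive concentration")
        volumes[name] = fmol_per_library / conc
    return volumes


pool = equimolar_volumes_ul({"A": 10.0, "B": 20.0, "C": 5.0}, fmol_per_library=20.0)
print(pool)  # {'A': 2.0, 'B': 1.0, 'C': 4.0}
```

Pooling by volume or by ng/µL instead of molarity is a common source of the read imbalance described above.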
Figure 11. NGS troubleshooting map — problem, cause, and prevention
Caption: Comprehensive NGS troubleshooting map covering five failure levels—sample, library, sequencing, data, and project design—with observed problems, possible causes, and design-level prevention strategies for each category.
How to Prepare a Project Inquiry for an NGS Service Provider
A well-prepared project inquiry accelerates the consultation process and reduces the risk of misaligned expectations. The following information should be prepared before contacting a sequencing service provider.
Basic information to provide: Species and sample type, DNA or RNA, sample number, extraction method, concentration and total amount, integrity data (RIN, DV200, or gel image), research objective, and expected deliverables. Including this information in the initial inquiry allows the service provider to assess feasibility and recommend an appropriate strategy without back-and-forth clarification.
Project design information: Whether a reference genome is available, whether the study targets known regions or requires discovery, whether variant calling, expression profiling, assembly, or annotation is needed, whether biological replicates are included, and batch information.
Questions to ask the service provider: What NGS strategy is recommended and why? What QC metrics will be reported at each stage? What are the raw, clean, and final deliverables? How are abnormal QC results handled? Can the bioinformatics pipeline be customized for the project?
Framing questions constructively: Instead of "Can you guarantee success?" ask "What sample or design factors affect data quality in this type of project?" Instead of "How fast can you deliver?" ask "What are the key QC checkpoints in the project timeline?"
Figure 12. NGS project inquiry checklist — information to prepare and questions to ask
Caption: Project inquiry preparation checklist showing basic information (species, sample type, extraction method, concentration, integrity data), project design information (reference genome, discovery vs targeted, replicates), and constructive questions to ask a sequencing service provider.
NGS Strategy Selection Checklist — A 10-Step Framework
- Define the biological question
- Identify whether the target is DNA, RNA, epigenetic, microbial, or single-cell
- Confirm sample quality and input feasibility
- Select assay type: amplicon / panel / WES / WGS / RNA-seq / metagenomics
- Select sequencing platform: short-read / long-read / hybrid
- Define read length, depth, and coverage expectations
- Confirm library preparation strategy
- Define, before sequencing begins, the QC metrics to track at each stage
- Define bioinformatics deliverables
- Prepare project information for consultation
Following this checklist systematically minimizes the risk of costly mid-project corrections and ensures that the sequencing strategy is aligned with the research objectives from the start.
Figure 13. NGS strategy checklist for research projects — 10-step design framework
Caption: 10-step NGS strategy selection framework from defining the biological question through selecting assay type, sequencing platform, read length, depth, library strategy, QC metrics, bioinformatics deliverables, and preparing a project inquiry for consultation.
Conclusion — NGS Value Is in the Match Between Strategy and Question
Next-generation sequencing is a multi-variable project system. Platform, library, depth, sample quality, and analysis pipeline all contribute to the final data interpretation capability. For researchers evaluating NGS options, the most important questions are not about what NGS is, but about which strategy best fits the research goal, whether the sample supports the chosen approach, which data metrics need to be defined in advance, and whether the bioinformatics deliverables can answer the original biological question.
For research-use project planning, researchers may prepare sample type, research objective, expected analysis output, and available QC information before discussing an NGS strategy with their chosen service provider.
The most successful NGS projects are those where the experimental design is driven by the biological question, the sample quality is assessed before sequencing begins, the platform and depth are selected based on the target variant type and genome characteristics, and the bioinformatics analysis is planned as an integral part of the project rather than an afterthought. By applying the framework described in this guide, researchers can significantly reduce the risk of costly mid-project corrections and ensure that their sequencing investment produces interpretable, publication-ready results.
FAQ
What is the difference between sequencing depth and coverage?
Depth refers to the average number of reads covering each base in the target region. Coverage can refer to either the fraction of the genome covered by at least one read (breadth) or the uniformity of depth across the genome. Both metrics are needed to assess data quality.
Can I combine short-read and long-read sequencing in one project?
Yes. Hybrid strategies that use long reads for contiguity and short reads for polishing are standard for de novo assembly and structural variant detection. Many published genome assemblies use this combined approach.
What is the minimum DNA input required for NGS?
Minimum input varies by library preparation method: standard PCR-based kits work with 0.1 ng to 1 µg, PCR-free kits require 100 ng to 1 µg, tagmentation-based kits work with 1-50 ng, and ultra-low-input kits can work with as little as 50 pg. Selecting the appropriate kit for the available input is critical.
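The input ranges in this answer can be kept as a lookup table for quick triage. In the sketch below the ranges are the ones quoted above; the table name, function name, and the choice to leave the ultra-low-input upper bound open (the text states none) are assumptions.

```python
# Input ranges in nanograms, as quoted in the answer above.
KIT_INPUT_RANGES_NG = {
    "standard PCR-based": (0.1, 1000.0),
    "PCR-free": (100.0, 1000.0),
    "tagmentation-based": (1.0, 50.0),
    "ultra-low-input": (0.05, None),  # upper bound not stated in the text
}


def compatible_kits(input_ng):
    """Return the kit categories whose quoted range covers the input amount."""
    kits = []
    for name, (low, high) in KIT_INPUT_RANGES_NG.items():
        if input_ng >= low and (high is None or input_ng <= high):
            kits.append(name)
    return kits


print(compatible_kits(0.05))  # ['ultra-low-input']
print(compatible_kits(10))    # ['standard PCR-based', 'tagmentation-based', 'ultra-low-input']
```

Actual limits differ between commercial kits, so the table should be replaced with the manufacturer's stated ranges before use.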
How do I evaluate NGS data quality from a sequencing report?
Key metrics to check: Q30 percentage (>85% for good runs), mapping rate (>80% for human DNA), duplication rate (<15% for WGS), adapter content (<1% after trimming), and on-target rate for capture-based methods. A good QC report should include all of these metrics with clear explanations.
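As a sketch, the thresholds in this answer can be applied to a report programmatically. The metric key names below are hypothetical and would need to be mapped to the fields of an actual provider report.

```python
# Thresholds from the answer above: (comparison, limit in percent).
THRESHOLDS = {
    "q30_pct": (">", 85.0),
    "mapping_rate_pct": (">", 80.0),
    "duplication_pct": ("<", 15.0),
    "adapter_pct": ("<", 1.0),
}


def failed_metrics(report):
    """Return the names of metrics that are missing or miss their threshold."""
    failed = []
    for metric, (op, limit) in THRESHOLDS.items():
        value = report.get(metric)
        if value is None:
            failed.append(metric + " (missing)")
            continue
        ok = value > limit if op == ">" else value < limit
        if not ok:
            failed.append(metric)
    return failed


report = {"q30_pct": 91.2, "mapping_rate_pct": 76.0,
          "duplication_pct": 9.5, "adapter_pct": 0.3}
print(failed_metrics(report))  # ['mapping_rate_pct']
```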
How do I choose between WGS and WES for a human genetics project?
Choose WGS when comprehensive variant detection (including non-coding, structural, and regulatory variants) is needed and budget allows. Choose WES when the focus is on coding variants and the project requires higher depth on exonic regions at lower overall cost. WES misses approximately 98% of the genome including most regulatory and intronic regions, which are increasingly recognized as important in complex disease genetics.
For Research Use Only.
References:
- Illumina NGS workflow overview. Illumina, Inc.
- Performance assessment of DNA sequencing platforms in the ABRF Next-Generation Sequencing Study. Nature Biotechnology. 2021;39:1129-1140.
- The chemistry of next-generation sequencing. Nature Biotechnology. 2023;41:1709-1715.
- Nanopore sequencing technology, bioinformatics and applications. Nature Biotechnology. 2021;39:1348-1365.
- Accurate circular consensus long-read sequencing. Nature Biotechnology. 2019;37:1155-1162.