Illumina NGS: Principles, Platforms, and Best Practices for Successful Sequencing Projects
Illumina sequencing by synthesis (SBS) technology has dominated the short-read sequencing landscape for over a decade, powering the majority of genomic studies published each year. The platform's combination of high accuracy, scalable throughput, and mature ecosystem makes it the default choice for most NGS applications—from targeted amplicon panels and whole-exome sequencing to population-scale whole-genome studies.
However, running a successful Illumina sequencing project requires more than knowing the basic workflow. Selecting the right platform, preparing high-quality libraries, interpreting quality metrics, and avoiding common workflow failures are all critical to achieving reproducible, publication-ready results. A single poorly prepared library can waste an entire sequencing run—and with NovaSeq X flow cells costing tens of thousands of dollars per run, the financial impact of failure is substantial.
This article provides a practical guide to Illumina NGS, covering platform selection, library preparation best practices, sequencing quality interpretation, and data analysis workflows. It is designed for researchers who already understand the basic principles and need actionable guidance for experimental planning and execution. The focus throughout is on practical, decision-oriented content: which platform to choose, how to avoid the most common library preparation failures, how to read a sequencing QC report, and how to plan a project from start to finish.
Why Illumina NGS Dominates the Short-Read Sequencing Landscape
Illumina's sequencing by synthesis (SBS) technology has remained the dominant short-read platform through continuous innovation. The chemistry has evolved from standard SBS to the more recent XLEAP-SBS, introduced with the NovaSeq X Series, which delivers faster run times, higher signal intensity, and reduced reagent consumption. These improvements have dramatically reduced the cost per genome over the past decade, making population-scale sequencing projects economically feasible.
The Illumina platform family spans more than a 10,000-fold range in maximum output per run (from 1.2 Gb on the iSeq 100 to 16 Tb on the NovaSeq X Plus), covering virtually every scale of sequencing project:
| Platform | Max Output | Max Read Length | Typical Run Time | Ideal Applications |
|---|---|---|---|---|
| iSeq 100 | 1.2 Gb | 2 × 150 bp | 9–17.5 hr | Small panels, validation runs |
| MiniSeq | 7.5 Gb | 2 × 150 bp | 7–24 hr | Small targeted sequencing |
| MiSeq | 15 Gb | 2 × 300 bp | 4–55 hr | 16S/ITS amplicons, small genomes, amplicon panels |
| NextSeq 2000 | 330 Gb | 2 × 150 bp | 11–48 hr | RNA-seq, exomes, medium WGS |
| NovaSeq 6000 | 6 Tb | 2 × 250 bp | 13–44 hr | Large-scale WGS, population studies |
| NovaSeq X / X Plus | 16 Tb | 2 × 150 bp | 12–48 hr | Ultra-large WGS, >30× human genomes at scale |
For researchers planning their first Illumina project or looking to upgrade their platform, understanding where each system fits is the first critical decision. Comprehensive next-generation sequencing services cover the full range of Illumina platforms, making it possible to select the right instrument for each project's specific throughput and read-length requirements.
Figure 1: Illumina platform matrix — throughput versus read length for MiSeq, NextSeq, NovaSeq 6000, and NovaSeq X

The Three Core Steps — A Quick Overview
Every Illumina sequencing project follows the same three-stage workflow:
- Library preparation: DNA (or cDNA reverse-transcribed from RNA) is fragmented, end-repaired, A-tailed, and ligated to sequencing adapters. The resulting library is amplified, quantified, and quality-checked before loading.
- Cluster generation and sequencing: Libraries are loaded onto a flow cell, where each molecule is clonally amplified into a cluster (by bridge amplification on non-patterned flow cells, or by exclusion amplification on patterned flow cells such as those of the NovaSeq 6000). Sequencing by synthesis proceeds in cycles, with each cycle incorporating one fluorescently labeled, reversibly terminating nucleotide. The instrument captures images after each cycle, and base-calling software converts the fluorescence signals into sequence reads.
- Data analysis: Raw BCL files are converted to FASTQ format (primary analysis), reads are aligned to a reference genome (secondary analysis), and biological interpretation follows (tertiary analysis).
Figure 2: NGS three-step workflow — library preparation, cluster generation, and sequencing

Selecting the Right Illumina Platform — Throughput, Read Length, and Application Fit
Choosing the wrong platform is one of the most common and costly mistakes in NGS project planning. The right choice depends on the interaction between three parameters: total sequencing output needed, read length required, and budget.
Application-driven platform selection: A typical research group may run projects across multiple scales. Understanding how each platform maps to common study types ensures efficient resource use.
- 16S/ITS amplicon sequencing: Requires 2 × 250 bp or 2 × 300 bp reads to cover full-length variable regions. MiSeq is the standard platform, processing 96–384 samples per run at low cost per sample.
- Whole-exome sequencing (WES): Requires ~10 Gb per sample. For 96 samples, a NovaSeq 6000 S4 flow cell handles the full batch in one run. For smaller batches of 12–24 samples, NextSeq 2000 is more practical and avoids paying for unused flow cell capacity.
- Whole-genome sequencing (WGS): Requires roughly 90–100 Gb of raw data per human sample for 30× coverage (about 3.1 Gb × 30), and proportionally more for deeper coverage. NovaSeq 6000 or NovaSeq X are the appropriate platforms. The NovaSeq X with XLEAP-SBS chemistry has significantly reduced the per-genome sequencing cost, making ultra-large WGS studies more accessible.
- RNA-seq (mRNA): Requires 20–50 million reads per sample for standard gene expression; 100+ million for isoform-level analysis. NextSeq 2000 fits standard projects well, while NovaSeq class platforms support single-cell RNA-seq projects requiring 500 million to 3 billion reads per run.
- Targeted panels (small): 10–100 genes with 1–5 million reads per sample. MiniSeq or MiSeq are cost-effective, offering fast turnaround. For panels exceeding 500 amplicons, NextSeq may be needed to ensure sufficient read depth per amplicon.
Practical decision framework: Start by calculating the total number of reads needed (reads per sample × number of samples). Then check the minimum read length. Finally, choose the platform that delivers that throughput in the shortest time at the lowest per-sample cost. The NGS sequencing services team can help match your project parameters to the optimal platform configuration.
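The decision framework above can be expressed as a short filter. This is a minimal sketch, not a quoting tool: it works in gigabases rather than reads for simplicity, and the per-run capacities are the maximum outputs from the platform table, treated as planning assumptions (real output depends on the flow cell and read length, so check current Illumina specifications). The `Platform` class and `choose_platform` helper are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Platform:
    name: str
    max_output_gb: float  # maximum output per run (Gb), from the platform table
    max_read_len: int     # longest read per direction, e.g. 300 for 2 x 300 bp


# Planning values copied from the platform table (maximum configurations only).
PLATFORMS = [
    Platform("iSeq 100", 1.2, 150),
    Platform("MiniSeq", 7.5, 150),
    Platform("MiSeq", 15, 300),
    Platform("NextSeq 2000", 330, 150),
    Platform("NovaSeq 6000", 6_000, 250),
    Platform("NovaSeq X Plus", 16_000, 150),
]


def choose_platform(gb_per_sample: float, n_samples: int, read_len: int) -> str:
    """Pick the smallest platform whose single run covers the whole project.

    Step 1: total output = per-sample output x number of samples.
    Step 2: drop platforms that cannot reach the required read length.
    Step 3: among the rest, keep those with enough output and choose the one
            with the least capacity, as a rough proxy for lowest per-sample cost.
    """
    need_gb = gb_per_sample * n_samples
    candidates = [
        p for p in PLATFORMS
        if p.max_read_len >= read_len and p.max_output_gb >= need_gb
    ]
    if not candidates:
        raise ValueError(f"no single run provides {need_gb:.0f} Gb at 2 x {read_len} bp")
    return min(candidates, key=lambda p: p.max_output_gb).name
```

For 24 exomes at ~10 Gb each (240 Gb at 2 × 150 bp) the sketch returns NextSeq 2000, and for 96 exomes (960 Gb) it returns NovaSeq 6000. "Smallest sufficient platform" is only a proxy for cost; a real plan would also weigh flow-cell sizes, turnaround, and per-run pricing.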
Figure 3: Platform selection decision tree — from project parameters to recommended Illumina system

Library Preparation — The Step Where Most Projects Succeed or Fail
Library preparation is the most variable step in the NGS workflow and the most common source of project failure. A well-designed library prep protocol with rigorous QC checkpoints is essential for consistent results.
Five critical QC checkpoints:
- Input nucleic acid quality: DNA should have OD260/280 of 1.8–2.0 and OD260/230 > 1.5. RNA should have RIN ≥ 7 for mRNA-seq and RIN ≥ 5 for total RNA-seq. Degraded input is the single most common cause of library failure and cannot be compensated for by increasing input amount.
- Fragmentation consistency: Enzymatic fragmentation is more reproducible than mechanical shearing for most applications. The target fragment size distribution should match the sequencing read length—for 2 × 150 bp, the insert size should center around 300–500 bp.
- Adapter ligation efficiency: Inefficient ligation produces libraries with high adapter-dimer content. A Bioanalyzer trace showing a prominent peak at 80–120 bp with no corresponding insert indicates adapter dimers, which waste sequencing capacity and reduce data quality.
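- PCR amplification bias: Limit PCR cycles to 6–10 for DNA libraries and 12–15 for RNA libraries. Excessive amplification increases duplication rates without improving library complexity. When input is plentiful (for example, WGS from ample genomic DNA), consider PCR-free library preparation; low-input samples generally still need some amplification, so use the minimum number of cycles that yields enough library.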
- Final library quantification: qPCR-based quantification is more accurate than Qubit or Bioanalyzer for determining loading concentration. A 2–3× discrepancy between methods is common, and relying on the wrong measurement is a leading cause of poor cluster density.
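Loading protocols are written in molarity, while fluorometric readings such as Qubit are mass concentrations, so comparing methods requires a conversion. The sketch below uses the common approximation of 660 g/mol per base pair of double-stranded DNA and assumes the mean fragment length (adapters included) comes from a Bioanalyzer trace; the helper names and the qPCR value in the example are hypothetical. The conversion only makes a Qubit reading comparable to qPCR; it does not make it a substitute.

```python
BP_MASS_G_PER_MOL = 660.0  # approximate mass of one base pair of dsDNA (g/mol)


def ng_per_ul_to_nm(conc_ng_per_ul: float, mean_fragment_bp: float) -> float:
    """Convert a dsDNA mass concentration (e.g. a Qubit reading) to molarity in nM.

    nM = (ng/uL x 10^6) / (660 g/mol/bp x mean fragment length in bp)
    """
    if conc_ng_per_ul < 0 or mean_fragment_bp <= 0:
        raise ValueError("concentration must be >= 0 and fragment length > 0")
    return conc_ng_per_ul * 1e6 / (BP_MASS_G_PER_MOL * mean_fragment_bp)


def fold_discrepancy(a_nm: float, b_nm: float) -> float:
    """Ratio of the larger to the smaller of two molarity estimates."""
    low, high = sorted((a_nm, b_nm))
    if low <= 0:
        raise ValueError("both estimates must be positive")
    return high / low


qubit_nm = ng_per_ul_to_nm(2.0, 400)  # Qubit reading of 2 ng/uL, 400 bp mean size
qpcr_nm = 3.0                          # hypothetical qPCR result for the same library
print(f"{qubit_nm:.1f} nM vs {qpcr_nm} nM: {fold_discrepancy(qubit_nm, qpcr_nm):.1f}x apart")
```

In this example the Qubit-derived value (about 7.6 nM) and the qPCR value differ by roughly 2.5×, inside the discrepancy range described above; following the text's recommendation, the qPCR value would set the loading concentration.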
Common library failures and their solutions:
- Low cluster density: Library concentration was overestimated, so too little library was loaded. Cross-validate quantification with qPCR. For patterned flow cells (NovaSeq), the optimal loading concentration range is narrow; a 10–20% deviation can produce poor results.
- Over-clustering: Library concentration was underestimated, so too much library was loaded. Re-quantify and re-pool at a lower concentration. Over-clustering produces overlapping clusters that cannot be resolved, reducing the number of usable reads.
- Adapter dimer contamination in reads: Post-ligation cleanup was insufficient. Lower the SPRI bead-to-sample ratio (a lower ratio retains only longer fragments) or add a gel-based size selection step. For stubborn cases, use a double-sided SPRI cleanup. A Bioanalyzer trace with a dominant peak below the expected library size range indicates adapter dimer contamination.
- High duplication rate (>30%): Insufficient input DNA or too many PCR cycles. Increase input material if available; reduce PCR cycles; or switch to a PCR-free library protocol for WGS applications.
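- Index hopping: On patterned flow cells, free adapters or index primers left in a pooled library can be extended onto other library molecules during cluster amplification, transferring one sample's index to another sample's insert. Unique Dual Indexes (UDIs) do not stop this from happening, but because a hopped read carries an i7/i5 combination that matches no sample, it is discarded during demultiplexing instead of being assigned to the wrong sample. For large multiplexed projects with many samples, UDIs are strongly recommended over single-index strategies.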
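Why UDIs help can be seen with a toy demultiplexer. This is a minimal sketch assuming exact index matching (real demultiplexers usually tolerate mismatches); the index sequences are used only for illustration, and the sample names, pairings, and helper names are hypothetical. It compares a UDI design with a combinatorial design in which every i7 is paired with every i5.

```python
from typing import Dict, Tuple

IndexPair = Tuple[str, str]  # (i7, i5)

# Illustrative index sequences; the sample names and pairings are hypothetical.
I7_1, I7_2 = "ATTACTCG", "TCCGGAGA"
I5_1, I5_2 = "TATAGCCT", "ATAGAGGC"

# Unique dual indexes: no i7 or i5 sequence is reused by another sample.
UDI_SHEET: Dict[IndexPair, str] = {
    (I7_1, I5_1): "sample_A",
    (I7_2, I5_2): "sample_B",
}

# Combinatorial dual indexes: every i7 is paired with every i5.
COMBINATORIAL_SHEET: Dict[IndexPair, str] = {
    (I7_1, I5_1): "sample_A",
    (I7_2, I5_2): "sample_B",
    (I7_1, I5_2): "sample_C",
    (I7_2, I5_1): "sample_D",
}


def assign_read(sheet: Dict[IndexPair, str], i7: str, i5: str) -> str:
    """Exact-match demultiplexing: unknown index pairs go to 'undetermined'."""
    return sheet.get((i7, i5), "undetermined")


# A sample_A molecule that picked up sample_B's i5 index during clustering.
HOPPED = (I7_1, I5_2)
```

Under the UDI sheet, `assign_read(UDI_SHEET, *HOPPED)` lands in "undetermined", whereas the combinatorial sheet silently credits the same read to sample_C. Hopping still costs a small fraction of reads either way; UDIs change where those reads end up.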
Multiplexing strategy and barcode allocation: A key decision in library preparation is how many samples to multiplex per sequencing run. The number of samples per run is set by the required reads per sample and the total output of the flow cell. A NextSeq 2000 run generating 400 million reads (about 120 Gb at 2 × 150 bp) accommodates roughly 12 exome samples at ~10 Gb each, or 96 small-panel samples at 4 million reads each. A MiSeq run generating about 25 million reads gives each of 384 pooled 16S samples roughly 65,000 reads before PhiX is subtracted, which is usually workable for diversity estimates; multiplexing beyond that point risks insufficient reads per sample.
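The multiplexing arithmetic above is simple division, but reserving part of the run for PhiX is easy to forget. This sketch treats one "read" as one cluster (read pair), following the way platform specifications quote reads; the PhiX fraction and the function names are illustrative assumptions.

```python
def reads_per_sample(run_reads: float, n_samples: int, phix_fraction: float = 0.0) -> float:
    """Average reads per sample after reserving a fraction of the run for PhiX."""
    if n_samples <= 0 or not 0.0 <= phix_fraction < 1.0:
        raise ValueError("n_samples must be > 0 and 0 <= phix_fraction < 1")
    return run_reads * (1.0 - phix_fraction) / n_samples


def max_samples(run_reads: float, reads_needed: float, phix_fraction: float = 0.0) -> int:
    """Largest sample count that still gives every sample `reads_needed` reads."""
    if reads_needed <= 0 or not 0.0 <= phix_fraction < 1.0:
        raise ValueError("reads_needed must be > 0 and 0 <= phix_fraction < 1")
    return int(run_reads * (1.0 - phix_fraction) // reads_needed)
```

An exome at ~10 Gb and 2 × 150 bp needs about 33 million read pairs, so `max_samples(400e6, 33e6)` returns 12. For 384 16S samples on a 25-million-read MiSeq run with 10% PhiX, `reads_per_sample(25e6, 384, 0.10)` gives about 58,600 reads each.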
Index quality is another often-overlooked factor. Low-quality indexes with high similarity between barcode sequences increase the risk of misassignment. Using validated index sets from the library prep manufacturer—with a minimum Hamming distance of 3 between any two indexes—minimizes cross-talk between samples in the same run.
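Checking a custom or mixed index set against the minimum-distance rule takes only a few lines. A minimal sketch, assuming all indexes in the set have the same length; the function names are hypothetical.

```python
from itertools import combinations
from typing import List, Sequence, Tuple


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length index sequences differ."""
    if len(a) != len(b):
        raise ValueError("indexes must have equal length")
    return sum(x != y for x, y in zip(a, b))


def min_pairwise_distance(indexes: Sequence[str]) -> int:
    """Smallest Hamming distance between any two indexes in the set."""
    return min(hamming(a, b) for a, b in combinations(indexes, 2))


def too_close(indexes: Sequence[str], min_distance: int = 3) -> List[Tuple[str, str]]:
    """Index pairs that fall below the required minimum Hamming distance."""
    return [(a, b) for a, b in combinations(indexes, 2) if hamming(a, b) < min_distance]
```

The threshold of 3 is what makes one-mismatch tolerance safe. With a minimum distance of 3, an index read with a single sequencing error is still one mismatch from its true index and at least two from every other, so assignment stays unambiguous.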
For teams that prefer to outsource library preparation, genomic data analysis services include library QC and preparation as part of a comprehensive sequencing workflow.
Figure 4: Library preparation QC workflow — five critical quality checkpoints from input DNA to final library quantification

Understanding the Sequencing by Synthesis (SBS) Cycle in Detail
The workflow overview above introduced the basic SBS principle; understanding the cycle-level mechanics is useful for troubleshooting and interpreting QC metrics.
Each SBS cycle proceeds through four steps: (1) incorporation: a fluorescently labeled, reversibly terminated nucleotide is added by the polymerase; (2) imaging: the instrument images the flow cell surface to identify which base was incorporated at each cluster, using four color channels on the MiSeq, two on NextSeq and NovaSeq systems, and one on the iSeq 100; (3) cleavage: the fluorescent dye and the terminating group are removed; (4) wash: unincorporated reagents are flushed before the next cycle.
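Not every instrument images four colors. On two-channel systems such as the NextSeq 500/550 and NovaSeq 6000, four bases are encoded in two images: A clusters appear in both channels, C in the red channel only, T in the green channel only, and G in neither. The sketch below decodes one cluster's normalized intensities under that scheme. The fixed 0.5 threshold is an illustrative assumption (real base callers fit intensity distributions per cycle and tile), and other instruments may assign channels differently.

```python
# Assumed two-channel encoding: (red signal?, green signal?) -> base.
TWO_CHANNEL = {
    (True, True): "A",    # signal in both channels
    (True, False): "C",   # red only
    (False, True): "T",   # green only
    (False, False): "G",  # dark: no signal in either channel
}


def call_base(red: float, green: float, threshold: float = 0.5) -> str:
    """Call one base from a cluster's normalized red/green intensities."""
    return TWO_CHANNEL[(red >= threshold, green >= threshold)]
```

Because a cluster that emits no light is read as G, clusters that stop producing signal tend to yield runs of G calls on two-channel instruments, a known artifact to watch for in read tails.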
The time required per cycle varies by platform. On the NovaSeq 6000, each cycle takes approximately 5–10 minutes, including imaging time. On the NovaSeq X with XLEAP-SBS chemistry, the cycle time is reduced to 3–5 minutes due to faster enzyme kinetics and a redesigned imaging system that captures the full flow cell surface in fewer exposures.
The key failure modes at the cycle level are "phasing" and "pre-phasing." Phasing occurs when some templates in a cluster fail to incorporate a nucleotide in a given cycle, trailing behind by one base. Pre-phasing occurs when some templates incorporate two bases in a single cycle, pulling ahead. Both effects reduce the synchrony of the cluster and cause signal decay over successive cycles. This is the fundamental reason why quality scores decline toward the end of a read. It is not an instrument flaw but a natural consequence of imperfect synchrony in a multi-cycle chemical process.
Phasing rates are typically expressed as a percentage per cycle. With a phasing rate of 0.1%, roughly 9.5% of the templates in a cluster have fallen at least one base behind after 100 cycles (0.999^100 ≈ 0.905 remain in phase), and about 14% by cycle 150; the simple linear estimates of 10% and 15% slightly overstate the compounding effect. The cumulative effect determines the practical read length limit for each platform. Illumina's specifications typically allow up to about 0.5% phasing per cycle for standard SBS chemistry, with lower values for XLEAP-SBS.
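The compounding behind these figures can be checked with a simple model. This sketch assumes each strand independently lags with probability equal to the phasing rate in every cycle, and it ignores pre-phasing and Illumina's phasing correction, so it illustrates the trend rather than predicting run quality. The function name is hypothetical.

```python
def fraction_in_phase(phasing_rate: float, cycles: int) -> float:
    """Fraction of a cluster's strands that have never lagged after `cycles` cycles.

    Model: each strand independently misses an incorporation with probability
    `phasing_rate` in every cycle, so it stays in phase with probability
    (1 - rate) ** cycles. Pre-phasing and phasing correction are ignored.
    """
    if not 0.0 <= phasing_rate < 1.0 or cycles < 0:
        raise ValueError("phasing_rate must be in [0, 1) and cycles >= 0")
    return (1.0 - phasing_rate) ** cycles


for n in (50, 100, 150):
    lagging = 1.0 - fraction_in_phase(0.001, n)
    print(f"cycle {n}: {lagging:.1%} lagging (linear estimate {0.001 * n:.1%})")
```

At low rates the compounded lagging fraction stays slightly below the linear rule of thumb (rate × cycles), and the gap widens as the rate or cycle count grows.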
Illumina platforms manage phasing through proprietary algorithms that estimate and correct for the percentage of molecules running ahead or behind. However, as cycle count increases beyond 150–300 cycles (depending on the platform), the accumulated effect reduces both Q-scores and the usable read length. This is why NovaSeq X with XLEAP-SBS, which has reduced phasing rates due to faster kinetics and improved washing, can maintain higher Q-scores across longer reads compared to standard SBS chemistry.
Understanding Sequencing Quality — Q-Scores, Error Profiles, and Data QC
Quality scores (Q-scores) provide the primary metric for assessing Illumina sequencing run performance. The Phred quality score (Q) is logarithmically related to the probability of an incorrect base call: Q30 corresponds to an error probability of 1/1000 (99.9% accuracy), while Q20 corresponds to 1/100 (99% accuracy). The score is calculated as Q = -10 log₁₀(P), where P is the probability of an incorrect base call.
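The Phred relationship works in both directions, and summing per-base error probabilities gives the expected number of miscalled bases in a read. A minimal sketch; the helper names are illustrative.

```python
import math
from typing import Iterable


def q_to_error(q: float) -> float:
    """Error probability for a Phred score: P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def error_to_q(p: float) -> float:
    """Phred score for an error probability: Q = -10 * log10(P)."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    return -10 * math.log10(p)


def expected_errors(qualities: Iterable[float]) -> float:
    """Expected number of miscalled bases in a read: the sum of per-base P values."""
    return sum(q_to_error(q) for q in qualities)
```

For example, a 150-base read with every base at Q30 carries about 0.15 expected errors (150 × 0.001), while the same read at Q20 carries about 1.5.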
For a typical Illumina run, the following benchmarks indicate good performance:
- >85% of bases at Q30 or higher for 2 × 150 bp runs
- >75% of bases at Q30 for 2 × 250 bp or longer runs
- Error rate (PhiX alignment mismatch) < 1%
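The Q30 benchmark can also be checked directly from FASTQ data. This sketch assumes uncompressed, four-line FASTQ records with Phred+33 quality encoding (the convention in current Illumina output); the function names are hypothetical, and the run-level value is normally reported by Illumina's run metrics without any extra code.

```python
from typing import Iterable, Iterator

PHRED_OFFSET = 33  # Phred+33: quality Q is stored as the character chr(Q + 33)


def quality_lines(fastq_lines: Iterable[str]) -> Iterator[str]:
    """Yield the quality string (4th line) of each four-line FASTQ record."""
    for i, line in enumerate(fastq_lines):
        if i % 4 == 3:
            yield line.rstrip("\n")


def percent_at_or_above(quality_strings: Iterable[str], threshold: int = 30) -> float:
    """Percentage of bases whose Phred score is >= threshold (default Q30)."""
    total = passing = 0
    for qual in quality_strings:
        for ch in qual:
            total += 1
            if ord(ch) - PHRED_OFFSET >= threshold:
                passing += 1
    if total == 0:
        raise ValueError("no quality values supplied")
    return 100.0 * passing / total
```

Passing an open file handle, as in `percent_at_or_above(quality_lines(fh))`, streams the file without loading it into memory; the result can then be compared with the >85% (2 × 150 bp) or >75% (2 × 250 bp) benchmarks above.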
Interpreting a sequencing QC report: The standard Illumina analysis viewer provides several key metrics that should be reviewed after every run:
- Per-cycle quality heatmap: Shows Q-score distribution across all cycles. A gradual decline from start to end is normal; a sharp drop mid-run may indicate a reagent or fluidics issue.
- Base composition by cycle: For balanced libraries, A and T curves should overlap, as should G and C curves. Divergence indicates library composition bias, particularly in amplicon or enrichment panels.
- GC content distribution: A unimodal peak matching the expected GC content of the target genome indicates a library without obvious contamination or strong GC bias. Multiple peaks or a broad flat distribution suggest contamination or PCR bias.
- Duplication rate: For WGS libraries, expected duplication rates are 5–15%. Higher rates indicate low input DNA, excessive PCR, or insufficient library complexity.
Factors affecting quality scores: Several parameters during the sequencing run influence the final Q-score distribution. Understanding these helps both in planning experiments and in troubleshooting poor runs.
- Read position: Quality decreases toward the end of the read as fluorescent signal decay accumulates and phasing effects become more pronounced. The final 5–10 cycles of a 150 bp read typically show lower Q-scores than the first 50 cycles. This is normal and expected—the rate of decline is a useful diagnostic.
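- Sequence composition: GC-rich regions and homopolymer tracts tend to have lower quality. Low-diversity libraries, such as amplicons in which most clusters incorporate the same base in a given cycle, degrade quality further because cluster registration and base calling rely on a balanced signal across bases. Adding PhiX control (5–20% of total library mass) to low-diversity libraries provides a balanced signal reference that significantly improves quality scores across the entire run.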
- Cluster density: Both under-clustering and over-clustering reduce quality. The optimal density range varies by platform—for NovaSeq 6000 S4 flow cells, 250–350 K clusters/mm² is typical. For NextSeq 2000, 150–250 K clusters/mm² is optimal. Deviating by more than 20% from the optimal range typically produces a measurable drop in Q30 percentages.
- Index sequence diversity: Low-diversity index sequences (e.g., all A or all T) can cause registration failures during the first few sequencing cycles of the index read. Using a pre-designed, validated index set from the library prep kit manufacturer avoids this issue entirely.
- Reagent quality and storage: Expired or improperly stored sequencing reagents are a common hidden cause of quality degradation. SBS chemistry is sensitive to freeze-thaw cycles and temperature fluctuations. Following the manufacturer's storage and handling guidelines—and logging reagent lot numbers and expiration dates—is a simple but often overlooked step.
Reviewing these QC metrics before proceeding to data analysis is essential. If any metric falls outside acceptable ranges, flag the run and investigate the root cause before the data is used for downstream analysis.
Figure 5: Typical Illumina Q-score heatmap showing per-cycle quality distribution across a 2 × 150 bp run

NovaSeq X and XLEAP-SBS Chemistry — What Changed and Why It Matters
The 2023 introduction of the NovaSeq X Series with XLEAP-SBS chemistry represents the most significant Illumina chemistry update in a decade. XLEAP-SBS is not a minor revision but a redesigned sequencing chemistry with measurable improvements in speed, accuracy, and cost. The NovaSeq X Plus, operating at full capacity, can generate up to 16 Tb of data per run, enough for roughly 160 human genomes at 30× coverage (at about 100 Gb each) in a single 48-hour run.
Key improvements over standard SBS:
- Faster enzyme kinetics: XLEAP-SBS enzymes incorporate nucleotides more rapidly, reducing 2 × 150 bp run times from ~40 hours (NovaSeq 6000) to ~24 hours (NovaSeq X).
- Improved signal intensity: Higher signal-to-noise ratio reduces miscall rates, particularly in the later cycles of long reads. Published data from Illumina shows a 30-40% reduction in error rates compared to standard SBS on the NovaSeq 6000.
- Reduced reagent consumption: The new chemistry uses less reagent per base, significantly lowering the cost per Gb compared to standard SBS chemistry.
- Higher throughput per run: The 25B flow cell supports unprecedented per-run scale; a single NovaSeq X Plus run can produce 16 Tb of data, equivalent to roughly 160 human genomes at 30× coverage.
Practical implications for researchers: The NovaSeq X does not replace all previous Illumina platforms. For small-scale projects (fewer than 50 samples), MiSeq and NextSeq remain more practical due to their lower minimum run costs and faster turnaround. The NovaSeq X is transformative for projects requiring large-scale, cost-efficient sequencing—population studies, longitudinal cohort analyses, and single-cell atlas projects.
Figure 6: XLEAP-SBS versus standard SBS chemistry — key improvements in speed, signal intensity, and reagent consumption

NGS Data Analysis — From BCL to Biological Insight
The data analysis pipeline for Illumina sequencing follows a standard three-tier structure:
Primary analysis (on-instrument): The sequencing instrument performs real-time base calling, converting fluorescence images into BCL (Binary Base Call) files. These are then converted into demultiplexed FASTQ files, either on the instrument or with Illumina's BCL Convert software, using a sample sheet that maps index sequences to samples. Apart from preparing the sample sheet, this step is largely automated. Modern platforms provide real-time quality metrics accessible during the run.
Secondary analysis (user-managed): FASTQ files are processed through alignment (BWA-MEM for DNA; STAR or HISAT2 for RNA-seq) and variant calling (GATK, FreeBayes, Strelka2). This stage requires 32–64 GB RAM for human WGS and substantial storage: a single 30× human genome generates ~100–200 GB of FASTQ data and ~50–100 GB of aligned BAM files.
Tertiary analysis (biological interpretation): Annotated variants are filtered, prioritized, and interpreted in the biological context of the study. Common tertiary analysis tools include ANNOVAR, SnpEff, and VEP for annotation, along with a variety of pathway and enrichment analysis packages.
Critical considerations for data analysis:
- Reference genome version: GRCh38 (with patches) remains the standard human reference. The T2T-CHM13 reference offers a more complete representation but is not yet universally adopted. Pipeline results can differ substantially between reference versions.
- Storage planning: A typical WGS project requires 3–5× the raw FASTQ storage for intermediate files. Plan for 600 GB–1 TB per 30× human genome, including FASTQ, BAM, VCF, and temporary pipeline files.
- Computing infrastructure: Cloud-based analysis (AWS, Google Cloud, or dedicated bioinformatics platforms) is increasingly preferred over local servers for large projects, eliminating the need for capital investment in computing hardware. The primary trade-off is data transfer time—uploading 10 TB of FASTQ files can take 2–5 days depending on connection speed. Hybrid approaches (local storage + cloud compute) are common for large-scale projects.
- Pipeline reproducibility: Using containerized pipelines (Docker, Singularity) or workflow managers (Nextflow, Snakemake, Cromwell) ensures that the same analysis is applied consistently across all samples in a project. This is essential for maintaining data comparability, particularly in multi-batch or collaborative studies.
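The storage multiplier and transfer-time figures above are simple arithmetic and can be reproduced for a specific project. A sketch, assuming decimal units (1 TB = 10^12 bytes) and a sustained link speed; the `efficiency` parameter is a hypothetical allowance for protocol overhead, and the function names are illustrative.

```python
def storage_tb(n_samples: int, fastq_gb_per_sample: float, multiplier: float) -> float:
    """Total project storage in TB: raw FASTQ size scaled by an overhead multiplier."""
    return n_samples * fastq_gb_per_sample * multiplier / 1000.0


def transfer_days(data_tb: float, link_mbps: float, efficiency: float = 1.0) -> float:
    """Days needed to move `data_tb` terabytes over a link of `link_mbps` megabits/s."""
    if link_mbps <= 0 or not 0.0 < efficiency <= 1.0:
        raise ValueError("link_mbps must be > 0 and 0 < efficiency <= 1")
    bits = data_tb * 1e12 * 8
    seconds = bits / (link_mbps * 1e6 * efficiency)
    return seconds / 86_400
```

For 100 genomes at 100–200 GB of FASTQ each, a 3–5× multiplier gives 30–100 TB. Moving 10 TB takes just under a day at a sustained 1 Gbps but about 4.6 days at 200 Mbps, which is where multi-day upload estimates come from.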
For research teams without in-house bioinformatics capacity, genomic data analysis services provide access to established pipelines covering alignment, variant calling, and biological interpretation.
Figure 7: Three-tier NGS data analysis pipeline — from BCL to FASTQ to aligned BAM to biological interpretation

Planning a Successful Illumina Sequencing Project — A Step-by-Step Framework
Beyond the technical details of each workflow step, successful Illumina projects share a common planning framework. Following this structure minimizes the risk of costly mid-project revisions or reruns.
- Define the biological question and determine the optimal assay type. Is this a discovery study (WGS, RNA-seq), a targeted follow-up (WES, targeted panel), or a screening application (amplicon panel)? The assay type determines all downstream parameters.
- Calculate required sequencing depth. For human WGS, 30× is sufficient for most germline applications. Rare variant detection may require 60×. RNA-seq gene expression requires 20–50 million reads per sample; isoform-level analysis requires 100+ million. Targeted panels need 500–1,000× coverage per amplicon for reliable variant calling.
- Select the platform and flow cell. Match the total read requirement (reads per sample × number of samples + 10–20% over-sequencing) to the available platforms. The selected platform should deliver the required throughput with minimal unused capacity. A MiSeq run generating 15 Gb is appropriate for small amplicon studies but far too small for a large exome project; a NovaSeq X flow cell with terabase-scale output is overkill for a small pilot study.
- Design libraries with QC checkpoints. Plan for Bioanalyzer traces after fragmentation and on the final library, qPCR quantification, and a small-scale pilot titration run for novel library types. Each checkpoint should have a predefined pass/fail criterion.
- Include experimental controls. A positive control sample with known variants validates the workflow from library preparation through variant calling. A negative (no-template) control identifies contamination. PhiX spike-in (typically 1% for WGS, 5–20% for low-diversity libraries like amplicons) provides a calibration standard for quality scoring.
- Plan data analysis before sequencing begins. Pipeline selection, reference genome version, computing resources, and storage capacity should all be in place before the first sequencing run completes. Sequencing generates data faster than most researchers expect—a NovaSeq X producing 16 Tb in 48 hours requires correspondingly rapid downstream capacity.
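The depth, over-sequencing, and PhiX figures in this checklist combine into a single read budget. This sketch uses the standard expected-coverage relation C = N × L / G (N fragments, L bases sequenced per fragment, G genome size), a 15% over-sequencing margin, and a 1% PhiX spike-in for WGS. The margin, PhiX fraction, and function names are illustrative assumptions, and the result is raw mean depth before duplicate removal or filtering.

```python
import math


def reads_for_depth(genome_size_bp: float, depth: float, read_len: int,
                    paired: bool = True) -> int:
    """Read pairs (or single reads) needed for a mean depth: N = C * G / L."""
    bases_per_fragment = read_len * (2 if paired else 1)
    return math.ceil(depth * genome_size_bp / bases_per_fragment)


def project_reads(per_sample: int, n_samples: int, overseq: float = 0.15,
                  phix_fraction: float = 0.01) -> int:
    """Total reads to order, adding an over-sequencing margin and a PhiX spike-in."""
    if overseq < 0 or not 0.0 <= phix_fraction < 1.0:
        raise ValueError("overseq must be >= 0 and 0 <= phix_fraction < 1")
    base = per_sample * n_samples * (1.0 + overseq)
    return math.ceil(base / (1.0 - phix_fraction))
```

For a 3.1 Gb human genome at 30× with 2 × 150 bp reads, `reads_for_depth(3.1e9, 30, 150)` gives 310 million read pairs (about 93 Gb). Ten such samples with the default margin and PhiX fraction come to roughly 3.6 billion read pairs.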
Common Workflow Failures and How to Avoid Them
| Failure Mode | Root Cause | Prevention |
|---|---|---|
| Low cluster density | Library concentration overestimated (e.g., inaccurate quantification), so too little library was loaded | Use qPCR for final quantification; run a titration loading test for new library types; cross-validate with Qubit |
| Over-clustering | Library concentration underestimated; patterned flow cell loading too high | Validate with two orthogonal methods; dilute conservatively; start at recommended loading range midpoint |
| >30% duplication rate | Insufficient input DNA; too many PCR cycles; low library complexity | Use ≥100 ng input DNA where possible; limit to ≤8 PCR cycles; consider PCR-free library prep for WGS |
| Index hopping | Free adapters or index primers in the pool extended onto other library molecules on patterned flow cells | Use unique dual indexes (UDIs) instead of single indexes; UDIs let hopped reads be identified and discarded during demultiplexing |
| Low Q30 in final cycles | Read length exceeds effective chemistry range; phasing accumulation | Use platform's recommended max read length; run pilot test before full-scale production |
| Adapter contamination in reads | Incomplete cleanup after adapter ligation; short insert fragments | Optimize SPRI bead ratio; add gel-based size selection for problematic sample types |
| PhiX mismatch rate >2% | Reagent degradation; flow cell defects; base calling calibration drift | Log reagent lot numbers and expiration dates; check flow cell; recalibrate if issue persists |
Each failure mode has a specific root cause and a clear preventive action. Catching problems early through small-scale pilot runs—testing loading concentrations across 3–4 dilutions before full-scale production—prevents the most expensive sequencing failures.
How CD Genomics Supports Illumina NGS Projects
CD Genomics provides end-to-end Illumina sequencing services covering the full project pipeline from experimental design through data delivery.
Platform availability: Our laboratory is equipped with NovaSeq X Plus, NovaSeq 6000, NextSeq 2000, and MiSeq systems, covering the full throughput range from small targeted panels to population-scale WGS. Each platform is maintained under rigorous QC protocols to ensure consistent data quality. Because all four systems are in active operation, platform choice is driven by your project parameters rather than by instrument availability.
Comprehensive library preparation: We offer standard, low-input, PCR-free, and ultra-low-input library preparation protocols optimized for different sample types, including blood, tissue, FFPE, cfDNA, and single cells, with QC checks at every stage.
Data analysis and interpretation: Standard deliverables include FASTQ files with QC reports and optional secondary analysis (BWA/GATK pipeline, RNA-seq quantification) and tertiary analysis (variant annotation, functional enrichment). For larger projects, we can provide cloud-based analysis pipelines that scale with your data volume.
Project consultation: Our team helps match your project parameters to the optimal platform, flow cell configuration, and sequencing strategy to maximize data quality while minimizing cost. A typical consultation covers: expected data output, optimal read length and coverage depth, run configuration (single vs. paired-end), multiplexing strategy, and data analysis requirements.
For more details, explore our NGS services or contact our team for a project-specific consultation.
FAQ
What is the difference between SBS and XLEAP-SBS chemistry?
XLEAP-SBS is a redesigned sequencing chemistry introduced with NovaSeq X. It offers faster run times, higher signal intensity, and lower reagent consumption compared to standard SBS chemistry used on earlier Illumina platforms.
How do I choose between MiSeq, NextSeq, and NovaSeq for my project?
Start by calculating your total read requirement (reads per sample × number of samples). MiSeq suits small panels and amplicon projects. NextSeq fits medium-scale projects like RNA-seq and exome sequencing. NovaSeq class platforms are designed for large-scale WGS and population studies.
What cluster density should I aim for on a NovaSeq 6000 S4 flow cell?
The optimal range is typically 250–350 K clusters per mm². Values outside this range may reduce data yield or quality.
Why is my sequencing Q30 score lower than expected?
Common causes include: library with low nucleotide diversity (add more PhiX), over- or under-clustering, degraded input DNA/RNA, or using a read length that exceeds the platform's optimal range.
How can I tell if my library preparation was successful before sequencing?
A successful library should show a clear peak on the Bioanalyzer trace at the expected size range, minimal adapter dimer contamination (<5% of library mass), and consistent qPCR quantification results.
What causes adapter dimers and how do I remove them?
Adapter dimers form when adapter molecules ligate to each other instead of to insert DNA. They can be removed by lowering the SPRI bead-to-sample ratio during cleanup (so that only longer fragments are retained) or by adding a gel-based size selection step.
What is the difference between index hopping and index cross-talk?
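Index hopping occurs on patterned flow cells when free adapters or index primers in a pooled library are extended onto other library molecules, so reads from one sample carry another sample's index. Index cross-talk (or index misassignment) is a broader term for any route by which reads end up assigned to the wrong sample, including index hopping, sequencing errors in the index read, and contamination between index oligos. UDIs (unique dual indexes) do not prevent hopping, but they allow hopped reads to be identified and discarded during demultiplexing.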
What data output should I expect from a human WGS 30× run?
Approximately 90–100 Gb of raw data per sample, producing ~100–200 GB of FASTQ files, ~50–100 GB of aligned BAM files, and ~1–2 GB of gVCF files.
How much storage space do I need for an NGS project?
Plan for 3–5× the raw FASTQ size to accommodate intermediate analysis files. At ~100–200 GB of FASTQ per 30× human genome, a 100-sample WGS project needs roughly 30–100 TB of total storage.
What reference genome version should I use for human sequencing data?
GRCh38 is the current standard for most applications. T2T-CHM13 is more complete but not yet supported by all analysis tools. Match the reference version to the tool requirements and community standards for your specific application.