Whole Genome Sequencing for Structural Variant Detection in Pharma Research: When WGS Beats WES

Pharmaceutical research teams investigating drug targets, pharmacogenomic biomarkers, and disease mechanisms face a fundamental question: how much of the genome must be sequenced to confidently detect structural variants (SVs)? Structural variants — deletions, duplications, inversions, translocations, and copy number changes — account for a substantial fraction of human genetic diversity and play critical roles in drug response variability, target identification, and disease biology. Yet the choice between whole genome sequencing (WGS) and whole exome sequencing (WES) for SV detection is not always straightforward, as each platform carries distinct tradeoffs in sensitivity, coverage, cost, and analytical complexity.
For pharma R&D teams, the stakes of this decision are practical. A target discovery program built on WES data alone may overlook structural rearrangements affecting drug binding sites or regulatory elements. A pharmacogenomic study relying on exome-based genotyping may misclassify metabolizer phenotypes when gene deletion boundaries remain unresolved. Conversely, deploying WGS across large cohort studies without a clear hypothesis about the SV classes of interest may consume budget without proportional scientific return. This article provides a practical framework for evaluating when WGS is necessary, when WES may suffice, and how each approach performs across the range of structural variant classes that matter in drug development.
The Structural Variant Detection Gap
Structural variants collectively account for more base-pair differences between individual genomes than any other variant class. A typical human genome carries thousands of SVs, ranging from 50 bp deletions to megabase-scale chromosomal rearrangements. While single nucleotide variants (SNVs) have historically received greater attention in pharmaceutical genomics, accumulating evidence points to SVs as major contributors to pharmacogenetic phenotypes, disease susceptibility, and drug target variability.
In pharmacogenomics, gene deletions and duplications directly alter drug metabolism phenotypes across multiple clinically important genes:
- CYP2D6 — the enzyme contributes to the metabolism of ~25% of clinically prescribed drugs, and loss of both gene copies through deletion shifts an individual from extensive to poor metabolizer status; a 2024 analysis of 479,144 UK Biobank genomes found 20.8% of individuals carry CYP2D6 SVs (deletions, duplications, or hybrids)
- DPYD — structural rearrangements affect chemotherapy toxicity risk by altering fluoropyrimidine metabolism
- UGT1A1 — promoter region structural variation impacts drug clearance profiles and irinotecan toxicity
- GSTM1 — homozygous deletion alters detoxification capacity and chemotherapy response
These variants frequently involve breakpoints that fall outside coding regions, placing them beyond the reach of conventional WES analysis.
For oncology drug development, somatic structural variants represent both therapeutic targets and resistance mechanisms:
- Gene fusions (e.g., ALK, ROS1, NTRK1) define target patient populations for precision therapies
- Intragenic deletions (e.g., PTEN, CDKN2A) drive resistance to targeted agents
- Copy number alterations affect drug sensitivity and tumor evolution trajectories
Detecting these events with confidence requires sequencing approaches that can resolve breakpoints, quantify copy number changes, and distinguish true structural events from alignment artifacts.
The core question, then, is not whether WGS can detect more SVs than WES — it clearly can — but rather which research scenarios demand the additional genome-wide coverage, and what tradeoffs in cost, throughput, and analytical complexity accompany the choice.
What Exome Sequencing Leaves Hidden
Whole exome sequencing targets the ~2% of the genome that codes for proteins, using hybridization-based capture probes to enrich for exonic regions. This design makes WES highly efficient for SNV and small indel discovery in coding sequences, but it imposes fundamental limitations for structural variant detection.
The primary constraint is spatial: most SV breakpoints occur in intronic, intergenic, or repetitive regions that fall outside capture probe coverage. When a deletion spans an exon but its breakpoints lie in flanking intronic sequence, WES may detect a reduction in read depth over the exon — a copy number signal — but cannot resolve the deletion boundaries or confirm its structure. For inversions, translocations, and other balanced rearrangements that do not alter copy number, WES typically provides no signal at all.
A 2024 study by Demidov and colleagues, analyzing 6,224 unsolved rare disease exomes from the Solve-RD consortium, demonstrated the magnitude of this limitation. Systematic SV calling from existing WES data yielded only 23 pathogenic SVs (a diagnostic yield of 0.4%), and just 8 of those had been missed by standard read-depth CNV tools. The study concluded that while SVs can occasionally be recovered from WES data through specialized bioinformatic analysis, the vast majority remain undetectable within the exome’s narrow genomic window.
Additional constraints compound these limitations:
- Uneven coverage — WES capture efficiency varies across genomic regions due to GC content and probe density differences, creating coverage bias that complicates read-depth-based CNV calling
- Small CNVs — structural variants affecting only a portion of a single exon (66 bp to a few kb) are routinely missed even by dedicated WES CNV callers
- Repetitive regions — pseudogenes, segmental duplications, and repetitive elements where many pharmaceutically relevant SVs reside suffer from probe design limitations and alignment ambiguity
- Breakpoint resolution — even when WES detects a copy number signal, it cannot resolve deletion boundaries or confirm rearrangement structure
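The read-depth signal that WES-based CNV calling relies on can be sketched in a few lines. This is a minimal illustration, assuming per-exon mean depths for a test sample and a reference panel are already normalised for library size; the function name, the log2 cutoffs of ±0.4, and the example depths are hypothetical and do not come from any of the callers discussed here.

```python
import math

def call_exon_cnv(sample_depth, reference_depth, loss_cutoff=-0.4, gain_cutoff=0.4):
    """Label each exon 'loss', 'gain', 'neutral', or 'no_data' from a log2 depth ratio.

    Both inputs map exon IDs to mean read depth. Cutoffs are illustrative
    assumptions, not validated thresholds.
    """
    calls = {}
    for exon, ref in reference_depth.items():
        if ref <= 0:
            # No reference coverage (e.g. a poorly captured exon): cannot call.
            calls[exon] = "no_data"
            continue
        obs = sample_depth.get(exon, 0.0)
        # A small pseudocount keeps log2 defined when the sample has zero depth.
        ratio = math.log2((obs + 0.5) / (ref + 0.5))
        if ratio <= loss_cutoff:
            calls[exon] = "loss"
        elif ratio >= gain_cutoff:
            calls[exon] = "gain"
        else:
            calls[exon] = "neutral"
    return calls

# Toy data: exon2 has roughly half the expected depth, exon3 roughly 1.5x.
reference = {"exon1": 100.0, "exon2": 100.0, "exon3": 100.0}
sample = {"exon1": 98.0, "exon2": 51.0, "exon3": 149.0}
print(call_exon_cnv(sample, reference))
```

Note what the output does not contain: a "loss" label says an exon is under-covered, but nothing about where in the flanking introns the deletion starts or ends.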
Tool dependency further complicates WES-based SV analysis. A 2024 study of CNV detection in WES data found that 95.1% of CNV-positive exons were detected by only one of four commonly used callers (ExomeDepth, XHMM, CODEX2, EXCAVATOR2), and only 0.1% of calls were shared across all four tools. This low concordance means that relying on a single WES CNV caller carries a substantial risk of both false negatives and false positives — a risk that is difficult to quantify without orthogonal validation.
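One way to quantify this kind of concordance on your own data is to count how many callers support each CNV-positive exon. The sketch below assumes each caller's output has already been reduced to a set of exon identifiers; the function name and the toy exon sets are hypothetical, and the caller names are used only as dictionary labels.

```python
from collections import Counter

def concordance_summary(calls_by_caller):
    """Return {k: fraction of CNV-positive exons called by exactly k callers}."""
    support = Counter()
    for exons in calls_by_caller.values():
        # set() guards against a caller reporting the same exon twice.
        support.update(set(exons))
    total = len(support)
    n_callers = len(calls_by_caller)
    if total == 0:
        return {k: 0.0 for k in range(1, n_callers + 1)}
    by_support = Counter(support.values())
    return {k: by_support.get(k, 0) / total for k in range(1, n_callers + 1)}

# Toy input, not real caller output.
toy_calls = {
    "ExomeDepth": {"A", "B", "C"},
    "XHMM": {"A", "D"},
    "CODEX2": {"A", "E"},
    "EXCAVATOR2": {"A", "B", "F"},
}
for k, frac in concordance_summary(toy_calls).items():
    print(f"called by exactly {k} caller(s): {frac:.1%}")
```

A high fraction at k = 1 and a low fraction at the maximum k is the pattern the study reports, and it is a signal that orthogonal validation is needed before trusting any single caller.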
Breakpoints Only WGS Can Resolve
Whole genome sequencing addresses these limitations by providing far more uniform coverage across the entire genome, enabling SV detection through three complementary signal types: paired-end mapping (discordant insert sizes or orientations), split-read alignment (reads spanning breakpoints), and read-depth analysis (copy number changes). Together, these signals allow WGS to resolve many SV breakpoints at base-pair resolution across coding and noncoding regions alike.
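The two read-level signals can be illustrated with a simplified classifier. This sketch assumes each read pair has already been reduced to a small record (it does not parse BAM files), and the record fields, the insert-size parameters (mean 400 bp, SD 50 bp), and the 20-base soft-clip cutoff are all hypothetical choices. Read depth is a window-level statistic rather than a per-pair one, so it is left out here.

```python
from dataclasses import dataclass

@dataclass
class ReadPair:
    chrom1: str
    pos1: int
    chrom2: str
    pos2: int
    forward_reverse: bool    # True when mates face each other, as expected
    soft_clipped_bases: int  # longest soft clip on either mate

def classify_evidence(pair, insert_mean=400, insert_sd=50, n_sd=3, min_clip=20):
    """Return a list of SV evidence labels supported by one read pair."""
    evidence = []
    if pair.chrom1 != pair.chrom2:
        # Mates on different chromosomes: candidate translocation.
        evidence.append("paired_end:interchromosomal")
    else:
        span = abs(pair.pos2 - pair.pos1)
        if span > insert_mean + n_sd * insert_sd:
            # Mates too far apart: candidate deletion between them.
            evidence.append("paired_end:insert_too_large")
        elif span < insert_mean - n_sd * insert_sd:
            # Mates too close together: candidate insertion.
            evidence.append("paired_end:insert_too_small")
        if not pair.forward_reverse:
            # Unexpected mate orientation: candidate inversion or duplication.
            evidence.append("paired_end:orientation")
    if pair.soft_clipped_bases >= min_clip:
        # A long soft clip suggests the read itself crosses a breakpoint.
        evidence.append("split_read")
    return evidence

print(classify_evidence(ReadPair("chr1", 1000, "chr1", 6000, True, 35)))
```

Production callers combine many such pairs, model the insert-size distribution from the data, and realign clipped sequence to pinpoint the junction; the point here is only that the two signals arise from different properties of the same reads.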
Recent head-to-head comparisons underscore the real-world impact of this technical advantage. In a 2025 study by Brashear and colleagues, WGS and WES were compared in a cohort of 36 pediatric patients with heterogeneous neuromusculoskeletal disorders. WGS identified a median of 90.5 candidate variants per patient compared to 57.5 by WES. Critically, 31.6% of Tier-1 diagnostic variants (12 out of 38) were identified exclusively by WGS, including six duplications or deletions — structural variants that WES could not detect. DMD gene deletions spanning hundreds of kilobases were captured by WGS but completely missed by WES, as their breakpoints fell outside exonic capture regions.
Similarly, a prospective comparison by Cullufi and colleagues in 2025 reported that WGS achieved a diagnostic yield of 68.1% compared to 30.6% by WES in 72 pediatric patients. WGS uniquely identified structural variants including a 16p11.2 microdeletion and a 12p duplication, along with deep intronic splice variants and regulatory variants that WES could not access.
For pharmaceutical research, the ability to resolve breakpoint junctions matters beyond diagnostic yield. When evaluating gene-edited cell lines or animal models, WGS provides definitive evidence of on-target and off-target structural rearrangements — information essential for preclinical safety assessment. In pharmacogenomic studies, breakpoint resolution enables accurate genotyping of structural alleles in genes like CYP2D6, CYP2A6, and UGT2B17, where deletion boundaries determine allele function and phenotype classification.

Detection Power Across SV Classes
The performance gap between WGS and WES is not uniform across all structural variant types. Some SV classes are partially accessible through WES read-depth analysis, while others remain entirely invisible. Understanding these differences guides method selection for specific research questions.
| SV Class | WGS Detection | WES Detection | Key Limitation of WES |
|---|---|---|---|
| Large deletions (>1 kb) | High sensitivity via paired-end + read-depth signals | Low to moderate; read-depth only | Breakpoints in intronic/intergenic space |
| Tandem duplications | High; split-read and paired-end | Low; read-depth only | Duplication boundaries unresolvable |
| Inversions | Moderate to high; paired-end and split-read | Not detectable | Copy-number neutral; no read-depth signature |
| Translocations | High; discordant paired-end reads | Not detectable | Breakpoints in non-targeted regions |
| Mobile element insertions | Moderate; split-read signatures | Very low | Insertions rarely in capture targets |
| Small CNVs (50–500 bp) | Moderate; depends on coverage depth | Poor; below resolution of depth-based callers | Single-exon CNVs missed |
| Complex SVs (multi-breakpoint) | Moderate to high with appropriate callers | Not detectable | Requires breakpoint-level resolution |
These detection asymmetries have direct implications for pharmaceutical study design. For research programs focused on known pharmacogenes where relevant SVs are predominantly large deletions or duplications (e.g., CYP2D6, GSTM1, SLC22A2), WGS provides definitive genotyping while WES leaves substantial ambiguity. For studies investigating novel drug targets where the SV landscape is unknown, WGS is the only approach that captures the full spectrum of structural variation.
The impact of sequencing platform extends beyond variant discovery. In a comprehensive benchmarking study from the Francis Crick Institute (2025), optimal somatic SV detection in cancer genomes required combining multiple WGS-based callers — Delly, GRIDSS, Manta, and SvABA — to achieve high sensitivity and specificity. No WES-based approach was included in the consensus because the SV classes of interest (gene fusions, large deletions, complex rearrangements) are not reliably detectable from exome data.
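A multi-caller consensus of this kind can be sketched as a breakpoint-matching step followed by a support filter. The example below is an illustration only: it assumes each caller's output has been reduced to simple (chromosome, start, end, type) records, and the 100 bp matching tolerance, the two-caller minimum, and the toy coordinates are hypothetical rather than the rule used in the benchmarking study.

```python
from typing import NamedTuple

class SV(NamedTuple):
    chrom: str
    start: int
    end: int
    svtype: str

def same_event(a, b, tol=100):
    """Two calls match if type and chromosome agree and both breakpoints are within tol bp."""
    return (a.chrom == b.chrom and a.svtype == b.svtype
            and abs(a.start - b.start) <= tol
            and abs(a.end - b.end) <= tol)

def consensus_calls(calls_by_caller, min_callers=2, tol=100):
    """Greedily cluster calls across callers; keep clusters with enough support."""
    clusters = []  # list of (representative SV, set of supporting caller names)
    for caller, calls in calls_by_caller.items():
        for sv in calls:
            for rep, supporters in clusters:
                if same_event(rep, sv, tol):
                    supporters.add(caller)
                    break
            else:
                # No existing cluster matched: start a new one.
                clusters.append((sv, {caller}))
    return [(rep, sorted(s)) for rep, s in clusters if len(s) >= min_callers]

# Toy calls with made-up coordinates; caller names are labels only.
toy = {
    "Delly": [SV("chr2", 29_400_000, 42_500_000, "INV"),
              SV("chr9", 21_900_000, 22_000_000, "DEL")],
    "GRIDSS": [SV("chr2", 29_400_030, 42_500_010, "INV")],
    "Manta": [SV("chr9", 21_900_050, 22_000_020, "DEL"),
              SV("chr1", 1_000, 5_000, "DUP")],
    "SvABA": [SV("chr2", 29_400_005, 42_499_990, "INV")],
}
for rep, supporters in consensus_calls(toy):
    print(rep.svtype, rep.chrom, rep.start, rep.end, supporters)
```

Raising `min_callers` trades sensitivity for specificity, which is the balance a consensus design has to tune against a truth set.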
The practical implication for pharma research teams is that study designs for SV analysis should align detection expectations with the chosen platform’s capabilities. For oncology drug development programs, where SVs such as gene fusions (ALK, ROS1, NTRK1) and intragenic deletions (PTEN, CDKN2A) define target populations and resistance mechanisms, WGS-based detection provides the resolution needed for candidate prioritization. For preclinical pharmacogenomic studies characterizing metabolizer phenotypes across discovery cohorts, WGS enables definitive SV genotyping that WES-based read-depth approaches cannot match.
Coverage Depth and Practical Tradeoffs
Standard WGS at 30× mean coverage depth provides robust SV detection across most variant classes. However, pharmaceutical research projects often face constraints that require balancing coverage depth against sample numbers and budget. The choice of coverage depth directly affects detection sensitivity across different SV classes and study contexts:
- Low-pass WGS (4–8×) — detects large copy number changes and some deletions, but sensitivity for smaller SVs, balanced rearrangements, and precise breakpoint resolution is substantially reduced. Even at this depth, WGS outperforms WES for large CNV detection because uniform coverage eliminates capture bias inherent to exome enrichment
- Moderate-coverage WGS (15–20×) — provides sufficient resolution for confident SV genotyping in pharmacogenomic studies where target genes are known. Sherman and colleagues (2024) demonstrated this in 2,504 genomes from the 1000 Genomes Project, using PyPGx to detect SVs across 58 pharmacogenes including gene deletions, duplications, and hybrids
- Standard WGS (30×) — recommended when comprehensive SV discovery across all variant classes is required, particularly for novel target identification and oncology applications
Sample type also influences coverage decisions:
- FFPE tumor samples — fragmented DNA reduces effective coverage for SV detection; higher raw sequencing depth or long-read WGS may be necessary to compensate for template damage
- Liquid biopsy samples — low tumor fraction presents challenges, as SV detection requires sufficient representation of the variant-carrying genome fraction
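The effect of depth and variant fraction on detection can be approximated with a simple Poisson model. This is an idealised sketch: it assumes the number of reads supporting a breakpoint is Poisson-distributed with mean equal to depth times the fraction of DNA molecules carrying the variant, it ignores mappability, fragment length, and DNA damage, and the three-read support threshold is a hypothetical choice. Real sensitivity will generally be lower.

```python
import math

def prob_min_support(depth, allele_fraction=0.5, min_reads=3):
    """P(at least min_reads variant-supporting reads) under a Poisson model."""
    lam = depth * allele_fraction  # expected number of supporting reads
    p_below = sum(math.exp(-lam) * lam ** i / math.factorial(i)
                  for i in range(min_reads))
    return 1.0 - p_below

# Heterozygous germline SV (allele fraction 0.5) at several WGS depths.
for depth in (4, 8, 15, 30):
    print(f"{depth:>2}x germline het: {prob_min_support(depth):.3f}")

# Heterozygous somatic SV in a sample with 5% tumor fraction (assumed value),
# giving a variant fraction of about 0.025.
print(f"30x, 5% tumor fraction: {prob_min_support(30, allele_fraction=0.025):.3f}")
```

Under these assumptions the model reproduces the qualitative pattern described above: support is unreliable at low-pass depths, approaches saturation between 15× and 30× for germline variants, and collapses when the variant-carrying fraction is small.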
Large-scale studies have helped benchmark these tradeoffs. The UK Biobank analysis by Gaynor and colleagues (2024), comparing WGS, WES, and imputation across 149,195 participants for complex trait associations, found that WGS captured approximately five times more variants than WES with imputation, though the incremental yield for common variant association signals was modest. Expanding the WES analysis to 468,169 participants increased association signals ~4-fold, underscoring that sample size often matters more than platform choice for complex trait mapping. For SV detection, however, where the variant class itself is poorly captured by WES, the yield differential is qualitative rather than quantitative.
For pharma R&D teams evaluating these tradeoffs, the practical question is not simply “WGS vs WES,” but what depth and configuration of WGS provides sufficient SV resolution for the specific research question, given sample quality, budget, and timeline constraints. A useful heuristic is to model the expected SV detection yield against sequencing cost for that research context. In biomarker discovery studies where the SV landscape is unknown and the cost of missing a relevant SV exceeds the sequencing cost differential, WGS at moderate to standard coverage (15–30×) is the defensible choice. In large cohort pharmacogenomic studies where the target genes are known and validated WGS-based genotyping pipelines exist, the per-sample cost of WGS may be comparable to or lower than that of WES once the additional assays otherwise needed for SV genotyping are factored in.
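The yield-versus-cost heuristic can be made concrete with a small fixed-budget model. Every number in this sketch is a placeholder: the per-sample costs, the SV sensitivities, the carrier fraction, and the budget are hypothetical inputs to be replaced with quotes and validation data from your own program.

```python
from dataclasses import dataclass

@dataclass
class Platform:
    name: str
    cost_per_sample: float  # placeholder currency units
    sv_sensitivity: float   # assumed fraction of true SV carriers detected

def expected_carriers_detected(platform, budget, carrier_fraction):
    """Return (samples affordable, expected SV carriers detected) for a fixed budget."""
    n_samples = int(budget // platform.cost_per_sample)
    detected = n_samples * carrier_fraction * platform.sv_sensitivity
    return n_samples, detected

# Hypothetical inputs for illustration only.
budget = 100_000
carrier_fraction = 0.2
platforms = [
    Platform("WGS", cost_per_sample=600.0, sv_sensitivity=0.9),
    Platform("WES", cost_per_sample=300.0, sv_sensitivity=0.2),
]
for p in platforms:
    n, detected = expected_carriers_detected(p, budget, carrier_fraction)
    print(f"{p.name}: {n} samples, about {detected:.1f} carriers detected")
```

With these placeholder values the cheaper platform sequences twice as many samples yet detects fewer carriers, because the sensitivity gap outweighs the sample-size gain. Different inputs can reverse that ordering, which is why the model should be run per study rather than assumed.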

Choosing the Right SV Detection Strategy
Selecting between WGS and WES for SV detection in pharmaceutical research depends on several study-specific factors. The following decision framework organizes these considerations:
When WGS is the preferred approach:
- The research question involves unknown or uncharacterized SVs
- Breakpoint-level resolution is required for target characterization
- Balanced rearrangements (inversions, translocations) are of interest
- Pharmacogenomic genes with complex SV architecture (CYP2D6, CYP2A6, UGT2B17) are central to the study
- Preclinical safety assessment of gene-edited models requires comprehensive genomic characterization
- Somatic SV detection in oncology samples, where gene fusions and large rearrangements are common
When WES may be sufficient:
- The primary interest is coding SNVs and small indels, with SV analysis as a secondary objective
- Known CNVs in well-characterized genes can be detected with validated WES-based CNV pipelines
- Large cohort studies where WGS cost would be prohibitive and the primary focus is on coding variation
- Follow-up validation or replication studies where SV candidates were already identified through prior WGS discovery in independent cohorts
When a multi-modal approach adds value:
- Combining WGS with transcriptome sequencing (RNA-seq) to assess the functional impact of detected SVs on gene expression
- Using targeted long-read sequencing to resolve complex SVs initially identified by short-read WGS
- Integrating WGS-based SV detection with cancer panel sequencing for high-depth mutation profiling in oncology studies
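The first two lists in this framework can be encoded as a simple rule check, which is useful as a study-design checklist in planning templates. The function and parameter names below are hypothetical, and the rule that any single WGS criterion outweighs the WES criteria is a simplifying assumption rather than a recommendation from the cited studies.

```python
from typing import List, Tuple

def recommend_platform(
    *,
    sv_landscape_unknown: bool = False,
    needs_breakpoint_resolution: bool = False,
    balanced_rearrangements: bool = False,
    complex_pharmacogenes: bool = False,
    gene_edit_safety: bool = False,
    somatic_oncology_svs: bool = False,
    coding_variants_primary: bool = False,
    validated_wes_cnv_pipeline: bool = False,
    wgs_cost_prohibitive: bool = False,
    replicating_prior_wgs_findings: bool = False,
) -> Tuple[str, List[str]]:
    """Return (recommendation, reasons) from study-design flags."""
    wgs_criteria = {
        "unknown or uncharacterized SVs": sv_landscape_unknown,
        "breakpoint-level resolution required": needs_breakpoint_resolution,
        "balanced rearrangements of interest": balanced_rearrangements,
        "pharmacogenes with complex SV architecture": complex_pharmacogenes,
        "safety assessment of gene-edited models": gene_edit_safety,
        "somatic SVs in oncology samples": somatic_oncology_svs,
    }
    reasons = [label for label, flag in wgs_criteria.items() if flag]
    if reasons:
        # Assumption: any single WGS criterion is decisive.
        return "WGS", reasons

    wes_criteria = {
        "coding SNVs and indels are the primary interest": coding_variants_primary,
        "validated WES-based CNV pipeline available": validated_wes_cnv_pipeline,
        "WGS cost prohibitive at cohort scale": wgs_cost_prohibitive,
        "replication of SVs found by prior WGS": replicating_prior_wgs_findings,
    }
    reasons = [label for label, flag in wes_criteria.items() if flag]
    if reasons:
        return "WES may suffice", reasons
    return "undetermined", []

print(recommend_platform(complex_pharmacogenes=True, wgs_cost_prohibitive=True))
```

Returning the reasons alongside the recommendation keeps the decision auditable when a study plan is reviewed.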
Pharma R&D teams designing biomarker discovery or pharmacogenomic studies should also consider that SV detection methodology affects downstream analyses. Variant annotation, population frequency filtering, and functional prediction pipelines must accommodate SV data, which requires different bioinformatic infrastructure than SNV-focused analysis. Pharma research solutions that integrate WGS, WES, and transcriptomic data with SV-aware analysis pipelines provide a more complete picture of the genomic variation relevant to drug development.
For research programs where WGS is the chosen approach, selecting an experienced sequencing partner is essential. SV detection requires appropriate library preparation, sequencing coverage, and bioinformatic analysis — factors that vary substantially across service providers. Whole genome sequencing services designed for pharma applications typically offer coverage depth options tailored to SV detection, along with validated SV calling pipelines.
FAQ
What types of structural variants can WGS detect that WES cannot?
WGS can detect SV classes that are inaccessible to WES, including balanced rearrangements (inversions, translocations) that produce no copy number signal, mobile element insertions, small CNVs (50–500 bp), and complex multi-breakpoint SVs, although sensitivity varies by class and coverage depth. WES can detect only a subset of large deletions and duplications through read-depth signals, and cannot resolve breakpoints or confirm rearrangement structure. Inversions and translocations are effectively invisible to WES.
Is low-pass WGS sufficient for structural variant detection?
Low-pass WGS (4–8×) can detect large copy number changes and outperforms WES for this purpose due to uniform genome coverage. However, sensitivity for smaller SVs, balanced rearrangements, and precise breakpoint resolution is substantially reduced. For studies requiring comprehensive SV characterization, 15× or higher coverage is recommended.
How does sample type affect WGS-based SV detection?
FFPE tumor samples produce fragmented DNA that reduces effective coverage for SV detection, often requiring higher sequencing depth or long-read WGS approaches. Liquid biopsy samples with low tumor fraction present additional challenges, as SV detection depends on sufficient representation of the variant-carrying genome fraction. Fresh or fresh-frozen samples with high-molecular-weight DNA provide the most reliable SV detection.
When should I choose WES over WGS for SV analysis?
WES may be sufficient when the primary study focus is coding SNVs and small indels and SV analysis is a secondary objective, or when known CNVs in well-characterized genes can be detected with validated WES-based CNV pipelines. For large cohort studies where WGS cost would be prohibitive and the primary focus is on coding variation, WES with dedicated CNV callers provides a practical alternative.
What bioinformatics tools are recommended for SV detection from WGS data?
Comprehensive SV detection typically requires combining multiple callers to capture different SV classes: Delly (deletions, duplications, inversions), GRIDSS (breakpoint resolution), Manta (structural variant discovery), and SvABA (somatic SV detection) are commonly used in combination. SV annotation and population frequency filtering require additional bioinformatic infrastructure beyond standard SNV-focused pipelines.
Key Takeaways for Study Design
- Structural variants are a major source of pharmacogenetic diversity and drug target variability, yet the majority of SVs fall outside the genomic regions captured by WES
- WES reliably detects only a narrow subset of SVs, primarily large CNVs visible through read-depth signals; it misses inversions and translocations entirely and detects small CNVs poorly
- WGS provides three complementary signal types (paired-end, split-read, read-depth) that enable breakpoint-level SV resolution across coding and noncoding regions
- In one head-to-head comparison, 31.6% of Tier-1 diagnostic variants (12 of 38), including six deletions or duplications, were identified only by WGS; in another, diagnostic yield was 68.1% with WGS versus 30.6% with WES
- Coverage depth decisions depend on the SV classes of interest, sample type, and study scale; moderate-coverage WGS (15–20×) often provides sufficient resolution for pharmacogenomic SV genotyping
- Study design should match SV detection strategy to the specific research question, considering that WGS is essential for comprehensive SV discovery while WES may suffice for targeted coding CNV analysis in well-characterized genes
References
- Demidov G, Laurie S, Torella A, et al. Structural variant calling and clinical interpretation in 6224 unsolved rare disease exomes. Eur J Hum Genet. 2024;32(8):998–1004. DOI: 10.1038/s41431-024-01637-4
- Brashear AM, Gustafson AG, Quitadamo A, et al. Improved genomic characterization of a clinically heterogeneous pediatric cohort with WGS vs. WES. Sci Rep. 2025;15:37679. DOI: 10.1038/s41598-025-21421-8
- Cullufi P, Tabaku M, Skrahin A, et al. Comparative analysis of WGS and WES for genetic diagnosis in a pediatric Albanian population. medRxiv [Preprint]. 2025. DOI: 10.1101/2025.07.24.25332056
- Sherman CA, Claw KG, Lee SB. Pharmacogenetic analysis of structural variation in the 1000 genomes project using whole genome sequences. Sci Rep. 2024;14:22774. DOI: 10.1038/s41598-024-73748-3
- Gaynor SM, Joseph T, Bai X, et al. Yield of genetic association signals from genomes, exomes and imputation in the UK Biobank. Nat Genet. 2024;56:2345–2351. DOI: 10.1038/s41588-024-01930-4
- Waise S, Mensah N, Lesluyes T, et al. An optimised computational approach for the identification of somatic structural variants in cancer. bioRxiv [Preprint]. 2025. DOI: 10.1101/2025.07.01.662575
- Watanabe D, Okamoto N, Kobayashi Y, et al. Biallelic structural variants in three patients with ERCC8-related Cockayne syndrome and a potential pitfall of copy number variation analysis. Sci Rep. 2024;14:19741. DOI: 10.1038/s41598-024-70831-7
- Jiang X, Hu F, Zou XZ. Genetic ancestral patterns in CYP2D6 alleles: structural variants, rare variants, and clinical associations in 479,144 UK Biobank genomes. medRxiv [Preprint]. 2024. DOI: 10.1101/2024.07.23.24310892
For Research Use Only. Not for use in diagnostic procedures.