Whole-exome sequencing (WES) targets and captures the exons (the protein-coding regions), which make up only about 1%-2% of the genome yet harbor approximately 85% of known pathogenic mutations, and it has become a core technology for genetic disease diagnosis and tumor research. However, WES has significant limitations: it cannot reliably detect regulatory variants in non-coding regions (such as promoters and deep intronic splice-altering sites), it is confounded by pseudogene interference, and it misses dynamic mutations (such as the CGG repeat expansion in Fragile X syndrome). Furthermore, WES alone has limited ability to resolve complex phenotypes (such as cancer heterogeneity and the immune microenvironment), so it must be integrated with other omics technologies to overcome these limitations.
Multi-omics integration strategies (such as WES+RNA-Seq and WES+epigenetic analysis) capture more comprehensive molecular information through complementary techniques. They significantly improve diagnostic rates, link mutations to function (e.g., the discovery of TP53 mutation-driven glycolysis pathway abnormalities in esophageal cancer), and reveal host-microbe interaction mechanisms (e.g., the impact of GPR151 mutations on gut microbiota composition in IBD). Research by consortia such as the GREGoR Consortium has further advanced technologies such as long-read sequencing and multi-omics matrix construction to address challenges related to non-coding region variations and complex inheritance patterns. Integrating WES with other omics has become a core direction in precision medicine, providing a systematic perspective for elucidating disease mechanisms, developing targeted therapies, and discovering biomarkers.
I. Technological Complementarity: Overcoming the Limitations of Single-Omic Analysis
While WES can cover 85% of known pathogenic mutations, its limitations have spurred the rapid development of multi-omics integration strategies. The following are key complementary directions and typical cases:
Analysis of Non-coding Regions and Regulatory Variations
- WES has limited ability to detect deep intronic variants and other non-coding regulatory elements (such as distal promoters and enhancers). In contrast, RNA-seq can directly reveal aberrant splicing and cryptic splice site activation caused by these hidden intronic mutations or even by synonymous mutations within exons (e.g., TP53 mutations leading to mis-splicing in esophageal cancer). Studies show that combining the technologies covers all variant types: WES provides high-depth identification of DNA point mutations and low-frequency mutations, scRNA-seq profiles the transcriptome, and CNV analysis adds large-fragment alterations. The combination offsets the weaknesses of each single technology (WES has limited sensitivity at low VAF, and scRNA-seq has insufficient DNA-level detection), allows DNA and transcriptome findings to be cross-validated and correlated, and gives an accurate view of the disease's "mutation-expression" panorama.
- Lu W et al. combined whole-exome sequencing (WES) and single-cell RNA sequencing (scRNA-seq), supplemented by immunohistochemistry, CellChat analysis, and other methods, to characterize clear cell renal cell carcinoma (ccRCC) arising with a VHL germline mutation. They found a rare OBP2A/BCR1 translocation at the DNA level, high expression of COX7A1 at the transcriptome level (associated with good prognosis), and extensive interaction between tumor-associated macrophages (TAMs) and exhausted CD8+ T cells in the tumor microenvironment (key to immunosuppression). Together these results reveal a "mutation-transcription-microenvironment" panorama and provide a basis for the diagnosis (targeted sequencing) and treatment (immune intervention) of rare ccRCC.
Gene mutation characteristics of ccRCC with VHL germline mutation (Lu W et al., 2024)
- Epigenetic Analysis: Abnormal DNA methylation is associated with gene silencing. For example, multi-omics integration (WES+RRBS+RNA-seq) in metastatic melanoma identified four methylation subgroups: demethylated (DEM), low-methylated (LOW), intermediate (INT), and high-methylated (CIMP). The LOW subgroup was immune-activated (long survival, sensitive to immune checkpoint blockade, ICB), whereas the CIMP subgroup was immunosuppressed (short survival, ICB resistant). Furthermore, DNMT inhibitors can induce demethylation to mimic the LOW phenotype, providing a basis for precision immunotherapy (Anichini A et al., 2025).
Detection of Dynamic Mutations and Structural Variations
- Long-read sequencing (ONT/PacBio): Addresses regions that WES misses because of pseudogene homology (such as the pseudogene region of the PKD1 gene) and dynamic mutations (such as the CGG repeat expansion in Fragile X syndrome). Clinical guidelines recommend a combined long-range PCR (LR-PCR) + NGS protocol, which increases the PKD1 mutation detection rate from 70% to 90%.
- WGS Supplement: WGS can capture complex structural variations (such as chromosomal inversions and balanced translocations) missed by WES. For example, in a study of diabetic kidney disease (DKD) in type 1 diabetes, WGS covered the non-coding regions of the entire genome and uncovered regulatory region variations (promoters, enhancers), intergenic variations, and sex-specific associations that WES could not detect. It also revealed a novel mechanism by which low-frequency variations participate in DKD through transcriptional regulation (such as the METTL4 enhancer). These findings compensate for the exon-only focus of WES and provide new non-coding targets for the precision diagnosis and treatment of DKD (Haukka JK et al., 2024).
Study setup for single-variant and gene and intergenic region aggregate analyses (Haukka JK et al., 2024)
Host-Microbe Interaction Mechanisms
- Metagenomic analysis: In inflammatory bowel disease (IBD) studies, WES found that common variants of immune-related genes (such as MYRF and IL17REL) regulate microbial metabolic pathways (such as short-chain fatty acid synthesis) and abundance (such as Alistipes) through microbial quantitative trait loci (mbQTLs), and identified disease-specific genetic-microbial interactions (such as BTNL2 and TNFSF15). Metagenomic analysis revealed that, in the gut microbiota of IBD patients, the abundance of the phylum Bacteroidetes was increased, Firmicutes (such as Faecalibacterium) were decreased, and the short-chain fatty acid metabolic pathway (pyruvate → propionic acid/acetic acid) was reduced. Furthermore, these microbial characteristics were associated with host variants (such as MYRF and IL17REL) (Hu S et al., 2021).
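The mbQTL idea above, testing whether host genotype is associated with a microbial trait, can be illustrated with a minimal sketch. It assumes a simple additive model (0, 1, or 2 copies of the alternate allele) and an ordinary least-squares slope. The function name and all data values are invented for illustration and are not taken from Hu S et al. A real analysis would add covariates and multiple-testing correction.

```python
from statistics import mean

def mbqtl_slope(dosages, abundances):
    """Least-squares slope of a microbial trait on allele dosage (0, 1 or 2)."""
    if len(dosages) != len(abundances) or len(dosages) < 2:
        raise ValueError("need paired observations for at least two samples")
    d_mean, a_mean = mean(dosages), mean(abundances)
    # Covariance and variance terms of the ordinary least-squares estimator.
    sxy = sum((d - d_mean) * (a - a_mean) for d, a in zip(dosages, abundances))
    sxx = sum((d - d_mean) ** 2 for d in dosages)
    if sxx == 0:
        raise ValueError("variant is monomorphic in this sample set")
    return sxy / sxx

# Hypothetical data: alternate-allele dosage per sample and a log-transformed
# relative abundance of one taxon (illustrative values only).
dosages = [0, 0, 1, 1, 2, 2]
abundances = [1.0, 1.2, 1.9, 2.1, 3.0, 3.2]
print(round(mbqtl_slope(dosages, abundances), 3))
```

A positive slope means the taxon is more abundant in carriers of the alternate allele. The sign and size of the effect are what an mbQTL scan would then test for significance.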
II. Technical Pathways and Innovations in Multi-omics Joint Analysis
Technology Combination Optimization Strategy
- Low-Depth WGS + High-Depth WES (WEGS): Combining low-depth WGS (2X/5X) with reused WES improves recall rate (e.g., SNV recall rate of 8 reused + 5X WGS is 0.9847 > 0.9830 without reused WES), achieving coding region performance comparable to WES, and capturing more population-specific non-coding variants (identifying 938 more SNVs and 60% more Indels than genotyping arrays).
- WEGS achieves high-precision detection of coding regions (average coverage of 120×) at 1.7-2 times the cost, while simultaneously capturing non-coding region variants (e.g., enhancer SNPs). In a cohort of 862 patients with peripheral artery disease (PAD), WEGS identified 44.74 million variants (including 12.89 million new variants), covering all known PAD loci and identifying 4056 more variants per locus than arrays (Bhérer C et al., 2024).
WEGS experimental design overview (Bhérer C et al., 2024)
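The recall figures quoted for WEGS compare a call set against a truth set. As a minimal sketch of that arithmetic (not the evaluation pipeline used by Bhérer C et al.), variants can be keyed by chromosome, position, and alleles and compared as sets. The function name and the example variants below are hypothetical.

```python
def recall_and_precision(called, truth):
    """Compare two sets of (chrom, pos, ref, alt) variant keys."""
    true_positives = len(called & truth)
    # Recall: fraction of truth-set variants that the method recovered.
    recall = true_positives / len(truth) if truth else float("nan")
    # Precision: fraction of calls that are present in the truth set.
    precision = true_positives / len(called) if called else float("nan")
    return recall, precision

# Hypothetical variants for illustration only.
truth = {
    ("chr1", 1000, "A", "G"),
    ("chr1", 2000, "C", "T"),
    ("chr2", 3000, "G", "A"),
    ("chr2", 4000, "T", "C"),
}
called = {
    ("chr1", 1000, "A", "G"),
    ("chr1", 2000, "C", "T"),
    ("chr2", 3000, "G", "A"),
    ("chr3", 5000, "A", "T"),  # call absent from the truth set
}
recall, precision = recall_and_precision(called, truth)
print(f"recall={recall:.4f} precision={precision:.4f}")
```

Reporting precision next to recall matters here, because adding low-depth WGS can raise recall while also adding false calls.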
- Complementary targeted capture technologies: The combination of HaloPlex (selective circularization) and SureSelect V5 (hybrid capture) covers 97% of the CDS region and accurately detects somatic point variants with low VAF (< 10%), such as MTOR in-frame deletions and MAP2K1/PTPN11 variants. This addresses the challenge of detecting extremely low VAF somatic variants (Type IIB < 4%, Type I < 10%) in focal cortical dysplasia (FCD).
- Fujita A et al., through a combination of technologies, discovered that in epilepsy-related brain malformations, such as focal cortical dysplasia (FCD) and hemimegalencephaly (HME), somatic/germline variants are enriched in the PI3K-AKT3-mTOR and RAS-MAPK pathways (such as in-frame deletions of MTOR, MAP2K1/PTPN11 variants, and large CNVs of DEPDC5/TSC1), and these variants activate pathway signaling (elevated p-S6/p-ERK). This indicates that abnormal activation of these pathways is a common cause of epileptic brain malformations, providing molecular evidence for precise diagnosis (such as targeted sequencing to detect low VAF somatic variants), treatment (such as mTOR inhibitors), and prognostic assessment (pathway activity monitoring).
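To see why somatic variants with very low VAF are hard to detect, it helps to work through the read-count arithmetic. The sketch below computes VAF as alternate reads divided by total reads. It also estimates the depth at which the expected number of supporting reads reaches a chosen minimum. The function names, the minimum of 8 supporting reads, and the read counts are illustrative assumptions, not values from Fujita A et al.

```python
import math

def variant_allele_fraction(ref_reads, alt_reads):
    """VAF = alt / (ref + alt) at a single position."""
    depth = ref_reads + alt_reads
    if depth == 0:
        raise ValueError("no reads cover this position")
    return alt_reads / depth

def min_depth_for_support(vaf, min_alt_reads):
    """Depth at which the expected alt-read count reaches min_alt_reads.

    This is an expectation only; sampling noise means real designs
    need additional depth on top of this value.
    """
    if not 0 < vaf <= 1:
        raise ValueError("vaf must be in (0, 1]")
    # round() guards against floating-point noise before taking the ceiling.
    return math.ceil(round(min_alt_reads / vaf, 9))

# Hypothetical read counts for one somatic variant.
print(variant_allele_fraction(ref_reads=485, alt_reads=15))
# Depth needed to expect 8 supporting reads at the 4% and 10% VAF levels.
print(min_depth_for_support(vaf=0.04, min_alt_reads=8))
print(min_depth_for_support(vaf=0.10, min_alt_reads=8))
```

The required depth grows as 1/VAF, which is why deep targeted capture is favored over standard-depth WES for mosaic variants in brain tissue.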
Data Integration and Intelligent Analysis
Family Analysis and Mendelian Inheritance
- WES Cosegregation Validation: For dominant genetic diseases (e.g., Huntington's disease), WES data is used to screen for variants cosegregating with the patient's phenotype (e.g., ≥2 generations of patients in a family carrying the same exon mutation), combined with HPO terminology to standardize phenotypic descriptions (e.g., "dystonia").
- Recessive Inheritance Filtering: For recessive genetic diseases (e.g., cystic fibrosis), only homozygous or compound heterozygous mutations are retained (e.g., the CFTR gene p.Arg117His/p.Gly542X combination), while benign variants with a population frequency >1% are excluded (based on the gnomAD database).
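The recessive filter described above can be sketched in a few lines. This is a minimal illustration, assuming each variant is a dictionary with hypothetical fields `gene`, `hgvs_p`, `zygosity`, `inherited_from`, and `gnomad_af`. The allele frequencies and the placeholder genes `GENE_B` and `GENE_C` are invented. Compound heterozygosity is approximated by requiring one rare heterozygous variant from each parent, which presumes trio data are available.

```python
from collections import defaultdict

MAX_POPULATION_AF = 0.01  # variants with frequency > 1% are excluded (per the text)

def recessive_candidates(variants, max_af=MAX_POPULATION_AF):
    """Return genes with a rare homozygous or compound heterozygous genotype."""
    rare_by_gene = defaultdict(list)
    for variant in variants:
        if variant["gnomad_af"] <= max_af:
            rare_by_gene[variant["gene"]].append(variant)

    candidates = {}
    for gene, rare in rare_by_gene.items():
        homozygous = [v for v in rare if v["zygosity"] == "hom"]
        heterozygous = [v for v in rare if v["zygosity"] == "het"]
        parents = {v["inherited_from"] for v in heterozygous}
        if homozygous:
            candidates[gene] = homozygous
        elif {"mother", "father"} <= parents:
            # One rare allele from each parent: variants are in trans.
            candidates[gene] = heterozygous
    return candidates

# Illustrative input; frequencies are made up, not real gnomAD values.
example_variants = [
    {"gene": "CFTR", "hgvs_p": "p.Arg117His", "zygosity": "het",
     "inherited_from": "mother", "gnomad_af": 0.0015},
    {"gene": "CFTR", "hgvs_p": "p.Gly542X", "zygosity": "het",
     "inherited_from": "father", "gnomad_af": 0.0001},
    {"gene": "GENE_B", "hgvs_p": "p.Ala10Thr", "zygosity": "het",
     "inherited_from": "father", "gnomad_af": 0.0002},
    {"gene": "GENE_C", "hgvs_p": "p.Pro20Leu", "zygosity": "hom",
     "inherited_from": "both", "gnomad_af": 0.05},
]
for gene, hits in recessive_candidates(example_variants).items():
    print(gene, [v["hgvs_p"] for v in hits])
```

Only the CFTR pair passes in this example: `GENE_B` has a single heterozygous variant, and the `GENE_C` homozygote is too common in the population.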
Functional Annotation and Structural Prediction
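- AlphaMissense-WES Integration: For missense variants in WES exon regions, AlphaMissense is used to predict pathogenicity. Note that AlphaMissense outputs a pathogenicity score between 0 and 1 rather than a stability change, so a stability estimate such as the cited example (BRCA1 p.Arg1753Gln leading to a conformational change in the BRCT domain, predicted ΔΔG = 3.2 kcal/mol) would come from a separate structure-based stability predictor used alongside it.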
- WES-Specific Annotation: Functional annotation (e.g., missense mutations, frameshift mutations) is performed on exon region variants captured by WES using the ANNOVAR tool, and linked to the ClinVar database (e.g., TP53 p.Arg273His is labeled as pathogenic).
Machine Learning-Driven Pathogenicity Grading
- REVEL-Integrated Pathogenicity Assessment: By integrating REVEL scores (an ensemble method based on evolutionary conservation and biophysical features) with WES-specific sequencing parameters (e.g., exon coverage >20× and variant allele frequency (VAF) >15%), researchers can construct a more robust pathogenicity prediction model. This combined approach (achieving an AUC of 0.89) provides a 12% improvement in diagnostic accuracy over general models by filtering out sequencing artifacts while prioritizing functionally critical mutations.
- LLM-Assisted Literature Mining: Clinical evidence related to WES variants in PubMed is automatically extracted using the EvAgg large model (e.g., "BRAF p.Val600Glu is associated with BRAF inhibitor response in melanoma"), and a chain of evidence (PS3/BS3 level evidence) is generated after manual review.
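The REVEL-based grading described above combines a pathogenicity score with WES quality gates. A minimal sketch of that gating step follows. The depth and VAF thresholds come from the text, while the REVEL cutoff of 0.5, the field names, and the example variants are assumptions made for illustration. This is a simple filter-and-rank step, not the trained model behind the reported AUC.

```python
MIN_DEPTH = 20       # from the text: exon coverage > 20x
MIN_VAF = 0.15       # from the text: variant allele frequency > 15%
REVEL_CUTOFF = 0.5   # assumption: illustrative cutoff, not from the source

def prioritize(variants, revel_cutoff=REVEL_CUTOFF):
    """Drop likely sequencing artifacts, then rank the rest by REVEL score."""
    passed = [
        v for v in variants
        if v["depth"] > MIN_DEPTH
        and v["vaf"] > MIN_VAF
        and v["revel"] >= revel_cutoff
    ]
    return sorted(passed, key=lambda v: v["revel"], reverse=True)

# Hypothetical variants; identifiers and values are invented.
example = [
    {"id": "var_A", "depth": 85, "vaf": 0.48, "revel": 0.91},
    {"id": "var_B", "depth": 12, "vaf": 0.50, "revel": 0.95},  # fails depth
    {"id": "var_C", "depth": 60, "vaf": 0.08, "revel": 0.88},  # fails VAF
    {"id": "var_D", "depth": 70, "vaf": 0.51, "revel": 0.20},  # low REVEL
    {"id": "var_E", "depth": 40, "vaf": 0.45, "revel": 0.67},
]
print([v["id"] for v in prioritize(example)])
```

The 15% VAF gate suits germline calling. The same gate would discard the low-VAF somatic variants discussed earlier, so mosaic analyses need a different threshold.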
Clinical Translational Case Study
- Precision Treatment of Esophageal Cancer: WES combined with proteomics showed how key gene mutations (TP53, MACF1, AKAP9) regulate protein functions (DNA replication, Wnt signaling, glycolysis). Early amplification of chromosome 3q promotes Ca²⁺ signaling and proliferation, and eight dynamic pathways drive progression stage by stage. Environmental exposures (alcohol consumption, including in non-smokers) promote DNA replication and mitosis through the SBS16 and APOBEC mutational signatures. The target PGK1 (S203 phosphorylation promotes glycolysis, gemcitabine inhibition) was identified, elucidating the causal chain of "genomic variation - protein function - pathway progression" and providing a basis for precision diagnosis and treatment (Li L et al., 2023).
- RP-ILD diagnosis: In dermatomyositis (DM) patients, WES identified six variants of the IFIH1 gene, and Sanger sequencing verified the association between these variants (especially rs12479043, rs10930046, and rs141134657) and the risk of interstitial lung disease (ILD), anti-MDA5 antibody positivity, and acute ILD onset. This provides a basis for studying genetic biomarkers of rapidly progressive ILD (RP-ILD) (Okamoto M et al., 2025).
III. Challenges and Frontier Directions
Technical Bottlenecks and Solutions
- Data Complexity: Single-cell sequencing (scRNA-Seq) data are noisy and require noise reduction and batch correction (such as the integration methods in Seurat V5). Raw long-read error rates (10%-15%, typical of earlier chemistries) need to be reduced through error correction algorithms (such as Canu).
- Cost-Efficiency Balance: WGS costs 3-5 times more than WES, but region-specific sequencing (such as exons only + key regulatory regions) can reduce costs to 60% of WGS.
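The cost comparison above can be put on one scale by expressing everything relative to WES. The short sketch below uses only the two figures from the text (WGS at 3-5 times the cost of WES, and region-specific sequencing at 60% of the cost of WGS). The function name is hypothetical.

```python
def targeted_cost_relative_to_wes(wgs_to_wes_ratio, targeted_fraction_of_wgs=0.6):
    """Cost of region-specific sequencing expressed in multiples of WES cost."""
    return wgs_to_wes_ratio * targeted_fraction_of_wgs

# Lower and upper bounds of the WGS/WES cost ratio given in the text.
for ratio in (3, 5):
    relative = targeted_cost_relative_to_wes(ratio)
    print(f"WGS = {ratio}x WES -> region-specific = {relative:.1f}x WES")
```

Under these figures, region-specific sequencing still costs roughly 1.8 to 3 times as much as WES. The saving is relative to WGS, not to WES.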
Clinical Application Challenges
- Ethics and Privacy: Standardized databases (such as the All of Us project) need to be established, covering different ethnic groups (such as HLA regional variations in East Asian populations) to avoid diagnostic bias.
- Dynamic Mutation Monitoring: Develop PCR-based assays (such as triplet repeat-primed PCR, TP-PCR) for the CGG repeat expansion in the FMR1 gene to enable prenatal monitoring of dynamic mutations.
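Once a read spanning the FMR1 repeat is available (for example from long-read sequencing), sizing the expansion reduces to counting consecutive CGG units. The sketch below is a simplified illustration. It takes the longest uninterrupted CGG run and ignores AGG interruptions, so it can undercount alleles that a clinical assay would size across interruptions. The category boundaries follow the commonly used FMR1 ranges (normal below 45, intermediate 45-54, premutation 55-200, full mutation above 200). The function names and the example sequence are hypothetical.

```python
import re

def longest_cgg_run(sequence):
    """Number of units in the longest uninterrupted CGG tract."""
    runs = re.findall(r"(?:CGG)+", sequence.upper())
    return max((len(run) // 3 for run in runs), default=0)

def classify_fmr1(repeat_count):
    """Map a CGG repeat count to the commonly used FMR1 allele categories."""
    if repeat_count > 200:
        return "full mutation"
    if repeat_count >= 55:
        return "premutation"
    if repeat_count >= 45:
        return "intermediate"
    return "normal"

# Hypothetical read: flanking bases around 30 CGG units, one AGG, 10 CGG units.
read = "AT" + "CGG" * 30 + "AGG" + "CGG" * 10 + "TT"
count = longest_cgg_run(read)
print(count, classify_fmr1(count))
```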
Future Trends
- Modular Analysis Framework: Support for dynamically updated ACMG guidelines and emerging data streams (such as microbiome-host interactions). For example, a GATK4-based pipeline could be extended with a microbiome functional annotation module to enhance the ability to elucidate IBD mechanisms.
- Precision treatment translation: Personalized drug target screening based on WES + epigenetics. For instance, WES discovered that IDH1 mutations drive gliomas, and combined with methylation analysis, guided the use of IDH inhibitors (such as Ivosidenib).
Conclusion
Integrating WES with other omics technologies (RNA-Seq, WGS, single-cell sequencing) has increased the diagnostic rate of genetic diseases from 35% to 65%. In the future, it is necessary to build a closed loop of "data-algorithm-clinical," using multi-omics matrices to analyze disease heterogeneity, ultimately achieving a leap from "empirical medicine" to "digital twin medicine."
People Also Ask
Why would a patient choose whole genome sequencing rather than whole exome sequencing?
Although currently more expensive, WGS is more powerful than WES for detecting potential disease-causing mutations, even within the regions that WES targets and particularly for SNVs. WES nevertheless remains routinely used and is gradually being optimized for the detection of rare and common genetic variants in humans.
What are the limitations of exome sequencing?
Exome sequencing does not target 100% of the genes in the human genome; approximately 97% of exons are targeted. However, ~10% of exons may not be covered at sufficient levels to reliably call heterozygous variants. Each individual may have slightly different coverage yield distributions across the exome.
What is an advantage of WGS over exome sequencing?
One of the major advantages of WGS is that it provides a more comprehensive view of an individual's genetic makeup. WGS can identify variants that are not present in the exome, including those in non-coding regions and structural variants.
How does multi-omics improve cancer diagnosis?
Multi-omics improves cancer diagnosis by integrating data from genomics, transcriptomics, proteomics, and metabolomics, enabling a comprehensive understanding of tumor biology and personalized treatment strategies.
References
- Bhérer C, Eveleigh R, Trajanoska K, St-Cyr J, Paccard A, Nadukkalam Ravindran P, Caron E, Bader Asbah N, McClelland P, Wei C, Baumgartner I, Schindewolf M, Döring Y, Perley D, Lefebvre F, Lepage P, Bourgey M, Bourque G, Ragoussis J, Mooser V, Taliun D. A cost-effective sequencing method for genetic studies combining high-depth whole exome and low-depth whole genome. NPJ Genom Med. 2024 Feb 7;9(1):8.
- Fujita A, Kato M, Sugano H, Iimura Y, Suzuki H, Tohyama J, Fukuda M, Ito Y, Baba S, Okanishi T, Enoki H, Fujimoto A, Yamamoto A, Kawamura K, Kato S, Honda R, Ono T, Shiraishi H, Egawa K, Shirai K, Yamamoto S, Hayakawa I, Kawawaki H, Saida K, Tsuchida N, Uchiyama Y, Hamanaka K, Miyatake S, Mizuguchi T, Nakashima M, Saitsu H, Miyake N, Kakita A, Matsumoto N. An integrated genetic analysis of epileptogenic brain malformed lesions. Acta Neuropathol Commun. 2023 Mar 2;11(1):33.
- Lu W, Jin T, Yu X, Liu Y, Lu Z, Huang S, Wen Z, Yan H, Su C, Ye Y, Huang Z, Mo Z, Yu Z. Integrating whole-exome sequencing and scRNA-seq reveal the characteristic in one clear cell renal cell carcinoma sample arising in the setting of VHL disease. Sci Rep. 2025 Dec 18;15(1):44077.
- Anichini A, Caruso FP, Lagano V, Noviello TMR, Tufano R, Nicolini G, Molla A, Bersani I, Sgambelluri F, Covre A, Lofiego MF, Coral S, Di Giacomo AM, Simonetti E, Valeri B, Cossa M, Ugolini F, Simi S, Massi D, Milione M, Maurichi A, Patuzzo R, Santinami M, Maio M, Ceccarelli M, Mortarini R; EPigenetic Immune-oncology Consortium Airc (EPICA) investigators. Integrated multi-omics profiling reveals the role of the DNA methylation landscape in shaping biological heterogeneity and clinical behaviour of metastatic melanoma. J Exp Clin Cancer Res. 2025 Jul 18;44(1):212.
- Haukka JK, Antikainen AA, Valo E, Syreeni A, Dahlström EH, Lin BM, Franceschini N, Krolewski AS, Harjutsalo V, Groop PH, Sandholm N; FinnDiane Study Group. Whole-exome and whole-genome sequencing of 1064 individuals with type 1 diabetes reveals novel genes for diabetic kidney disease. Diabetologia. 2024 Nov;67(11):2494-2506.
- Hu S, Vich Vila A, Gacesa R, Collij V, Stevens C, Fu JM, Wong I, Talkowski ME, Rivas MA, Imhann F, Bolte L, van Dullemen H, Dijkstra G, Visschedijk MC, Festen EA, Xavier RJ, Fu J, Daly MJ, Wijmenga C, Zhernakova A, Kurilshikov A, Weersma RK. Whole exome sequencing analyses reveal gene-microbiota interactions in the context of IBD. Gut. 2021 Feb;70(2):285-296.
- Li L, Jiang D, Zhang Q, Liu H, Xu F, Guo C, Qin Z, Wang H, Feng J, Liu Y, Chen W, Zhang X, Bai L, Tian S, Tan S, Xu C, Song Q, Liu Y, Zhong Y, Chen T, Zhou P, Zhao JY, Hou Y, Ding C. Integrative proteogenomic characterization of early esophageal cancer. Nat Commun. 2023 Mar 25;14(1):1666.
- Okamoto M, Yoshida A, Zaizen Y, Ishida M, Shimizu T, Sakamoto N, Hozumi H, Yamano Y, Gono T, Matsuo N, Kaieda S, Kuwana M, Miyamura T, Kawakami A, Mukae H, Suda T, Kondoh Y, Yamamoto K, Hoshino T. Gene variants of interferon induced with helicase C domain 1 in Japanese patients with Dermatomyositis-associated rapidly progressive interstitial lung disease: a genetic association study using whole-exome and Sanger sequencing. Respir Res. 2025 Oct 31;26(1):304.