Whole Exome Sequencing in Population Genetics and Evolution Studies

With the rapid development of high-throughput sequencing technology, whole-exome sequencing (WES) has become a core tool for elucidating population genetic structure and evolutionary mechanisms. By capturing and sequencing the protein-coding regions, which make up approximately 1% of the genome, WES can efficiently identify genetic variants associated with adaptive evolution and disease susceptibility, making it well suited to large-scale population studies and cross-species comparisons.

This article reviews recent advances in the use of WES in population genetics and evolutionary research and, weighing its technical advantages and limitations, explores its potential in studies of adaptive evolution, gene function annotation, and interdisciplinary applications, with the goal of providing methodological references for future research.

I. Technical Principles and Core Advantages

WES identifies genetic variants associated with disease or adaptation by targeting the protein-coding regions (exons), which account for approximately 1% of the genome. Compared with whole-genome sequencing (WGS), WES is less expensive and produces more focused data, making it particularly suitable for large-scale sample analysis. For example, a single WES run can detect over 100,000 exonic variants, covering more than 95% of known pathogenic mutations.

II. Key Applications in Population Genetics

WES Reveals Novel Variations and Population Differences

Heutinck PA et al. used WES to analyze 111 Iranian patients with non-syndromic inherited retinal dystrophy (IRD). A clear genetic cause was found in 66 patients (59%), involving 31 genes and 53 pathogenic variants (mainly translocation mutations, accounting for 36%). The most frequently implicated genes were CERKL (12%), EYS (11%), and RPE65 (9%). The authors identified 14 novel variants (26%, distributed across 12 genes) and, through copy number analysis, confirmed two cases of hemizygous RP2 deletions (X-linked recessive inheritance), expanding the understanding of IRD genetic diversity.

Among the solved cases, 94% (62 cases) showed autosomal recessive inheritance, mostly as homozygous variants, which reflects the high rate of consanguinity in this population; 2 cases were X-linked and 2 were autosomal dominant. Five patients were reclassified from "non-syndromic retinitis pigmentosa (RP)" to syndromic IRD (such as Bardet-Biedl syndrome), and 3 were reclassified as macular dystrophy. Variants of uncertain significance (VUS) were found in 35 cases (32%), and the cause remained undetermined in 10 cases (9%). Population analysis showed that CERKL variants were most common in Turkish patients, RPE65 variants in Kurdish patients, and EYS variants in Persian (Fars) patients, highlighting genetic diversity among ethnic groups. The study emphasizes the impact of consanguinity on genetic structure and the need to improve local genetic testing capabilities.

Figure 1. Comparing the most frequent causal genes in the three largest ethnic groups (Heutinck PA et al., 2025)

Population-Level Association Analysis of Disease-Related Genes

Bernstein N et al. analyzed whole-blood exome sequencing data from 200,618 UK Biobank participants aged 40-73 years and revealed, for the first time at this scale, the pattern of positive selection acting on somatic mutations in aging blood. The team identified 52,700 somatic mutations in coding regions, whose number and variant allele frequency (VAF) increased with age, and estimated that approximately one-eighth of non-synonymous mutations were positively selected (dN/dS = 1.13). Building on this, the team looked beyond the 74 canonical clonal hematopoiesis (CH) genes and identified 17 additional driver genes under positive selection (such as BAX, CHEK2, and MYD88) using methods such as dNdScv. Mutations in these genes increased the prevalence of large-clone CH (VAF > 0.1) by 18%, and their clonal frequency and size also increased with age, consistent with the pattern seen for classic CH genes.

The 17 newly discovered genes not only expand the spectrum of clonal hematopoiesis (CH) drivers but also show strong associations with clinical phenotypes: MYD88/IGLL5 mutations significantly increase the risk of chronic lymphocytic leukemia (CLL), ZBTB33 mutations lead to abnormal bone marrow cell counts, and CH driven by most of the newly discovered genes is closely associated with increased risk of infection and all-cause mortality, with outcomes worsening as clones enlarge. These findings offer a new perspective on the role of clonal hematopoiesis in aging, cancer, and hematological disease, and point to possible directions for targeted intervention in abnormal clonal expansion and for delaying related aging phenotypes.
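
The link between the quoted dN/dS value of 1.13 and the "approximately one-eighth" figure can be illustrated with a short calculation. The sketch below uses a common back-of-the-envelope reading of dN/dS, in which the excess of observed non-synonymous mutations over the neutral expectation is attributed to positive selection. It assumes negative selection is negligible, and it is not necessarily the exact procedure the authors used; the function name is hypothetical.

```python
def fraction_positively_selected(dnds: float) -> float:
    """Approximate share of observed non-synonymous mutations in excess of
    the neutral expectation.

    Assumption: negative selection is negligible, so the excess (dnds - 1)
    out of the observed total (dnds) is attributed to positive selection.
    """
    if dnds <= 0:
        raise ValueError("dN/dS must be positive")
    return max(0.0, (dnds - 1.0) / dnds)


share = fraction_positively_selected(1.13)  # dN/dS value quoted in the text
print(f"{share:.3f}")  # 0.115, i.e. between one in nine and one in eight
```

Under this reading, a dN/dS of exactly 1 corresponds to no detectable excess, and values below 1 are clipped to zero because they indicate net negative selection rather than a negative share.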

Identification of Genes Associated with Strabismus

Duan W et al. used WES combined with Sanger sequencing to analyze 10 pedigrees with hereditary concomitant exotropia (a form of strabismus) from Yunnan, China (47 individuals, including 32 patients), revealing the genetic basis of the condition in these families for the first time. The study identified seven potential pathogenic genes (COL4A2, SYNE1, LOXHD1, AUTS2, GTDC2, HERC2, and CDH3) and found 11 missense variants that co-segregated with disease within the pedigrees. All 11 were predicted to be deleterious, were low-frequency (<5% in the general population), and were consistent with autosomal dominant inheritance. The mean age of onset was 3.35 ± 1.51 years, earlier than in sporadic cases. COL4A2 (variants in 3 pedigrees) and SYNE1 (variants in 2 pedigrees) were directly associated with the disease, and five genes (COL4A2, SYNE1, AUTS2, HERC2, and CDH3) overlapped with the Human Phenotype Ontology (HPO) strabismus gene list.

These genes fall into two functional categories: neural-related (COL4A2, SYNE1, etc., involved in neural development and synaptic anchoring) and muscle-related (CDH3, GTDC2, etc., involved in extraocular muscle development). Subsequent target-capture sequencing examined these genes in 220 sporadic cases: AUTS2 carried 15 variants (12 SNPs and 3 indels), and GTDC2 carried 4 SNPs. The study confirms the genetic heterogeneity of exotropia and provides candidate genes for early diagnosis and precision treatment, but larger sample sizes and functional validation (e.g., at the mRNA or protein level) are needed to confirm the mechanism.

Figure 2. Workflow for genetic variant detection and validation in concomitant exotropia pedigrees (Duan W et al., 2025)
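
The filtering logic described for the pedigree analysis (rare, predicted-deleterious missense variants that co-segregate with disease) can be sketched as follows. This is a minimal illustration rather than the authors' pipeline: the `Variant` class, the function names, the gene labels, and the member IDs are all hypothetical, and the co-segregation test assumes a fully penetrant dominant model in which every affected sequenced member carries the variant and no unaffected member does.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    gene: str                    # gene symbol (placeholder names in the toy data)
    consequence: str             # e.g. "missense", "synonymous"
    population_af: float         # allele frequency in a reference population
    predicted_deleterious: bool  # outcome of an in-silico predictor
    carriers: frozenset          # IDs of sequenced pedigree members carrying it


def cosegregates(variant, affected, unaffected):
    """True if every affected member carries the variant and no unaffected member does."""
    return affected <= variant.carriers and not (unaffected & variant.carriers)


def candidate_variants(variants, affected, unaffected, max_af=0.05):
    """Keep rare, predicted-deleterious missense variants that co-segregate with disease."""
    return [
        v for v in variants
        if v.consequence == "missense"
        and v.predicted_deleterious
        and v.population_af < max_af
        and cosegregates(v, affected, unaffected)
    ]


# Toy pedigree: three affected and two unaffected sequenced members (hypothetical IDs).
affected = frozenset({"I-1", "II-2", "III-1"})
unaffected = frozenset({"I-2", "II-3"})

variants = [
    Variant("GENE_A", "missense", 0.002, True, frozenset({"I-1", "II-2", "III-1"})),
    Variant("GENE_B", "missense", 0.200, True, frozenset({"I-1", "II-2", "III-1"})),  # too common
    Variant("GENE_C", "missense", 0.001, True, frozenset({"I-1", "II-3"})),           # does not co-segregate
    Variant("GENE_D", "synonymous", 0.001, False, frozenset({"I-1", "II-2", "III-1"})),
]

print([v.gene for v in candidate_variants(variants, affected, unaffected)])  # ['GENE_A']
```

In practice, incomplete penetrance or late onset would call for a more tolerant test than the strict set comparison used here, which is why surviving candidates are typically confirmed by Sanger sequencing as in the study.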

III. Case Studies in Evolutionary Biology

Primate Adaptive Evolution

Hayakawa T et al. used WES with human-designed capture baits to analyze the genetic structure and evolutionary characteristics of 42 chimpanzees, combining non-invasively collected fecal samples from wild individuals with blood samples from captive populations. A maximum-likelihood tree of mitochondrial genes showed that the chimpanzee subspecies (western, central, and eastern) each formed monophyletic groups, whereas some local populations did not. Multivariate analyses of autosomal data (multidimensional scaling (MDS), hierarchical clustering, and admixture analysis) distinguished subspecies (western vs. central/eastern) and local populations (e.g., the separation of Bossou western chimpanzees from captive populations), and identified hybrid individuals (e.g., the western-central hybrid Chloe), confirming that exome sequences can trace geographical origin and population differentiation.

The study also found a significant negative correlation between exome heterozygosity and the genome-wide nonsynonymous/synonymous substitution ratio (N/S), indicating a mutational burden (accumulation of mildly deleterious mutations in small populations). Of the 23,534 coding genes identified, 60% contained segregating pseudogenes (genes for which both functional and pseudogenized alleles are present in the population). Among these, pseudogenized alleles of chemosensory receptor genes (the olfactory receptor OR7D4 and the bitter taste receptor TAS2R42) that are shared across subspecies (e.g., the OR7D4 insertion pseudogene shared between Bossou and Mahale) are potential targets of balancing selection or may be related to olfactory and gustatory adaptation. The work validated non-invasive sampling with lysis buffer, providing a cost-effective tool for the conservation genetics of wild chimpanzees that can be extended to other non-model organisms.

Figure 3. Exome nucleotide variations clustered at the subspecies and local population levels (Hayakawa T et al., 2025)
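
The relationship between heterozygosity and the N/S ratio reported for the chimpanzee data can be illustrated with a small calculation. The sketch below shows one simple way to compute per-individual heterozygosity, a per-individual N/S ratio, and their Pearson correlation. The function names, the genotype encoding (0, 1, or 2 alternate alleles, `None` for missing), and the toy genotypes are assumptions for illustration; the study's own definitions and statistics may differ.

```python
from math import sqrt


def heterozygosity(genotypes):
    """Fraction of called genotypes that are heterozygous (value 1)."""
    called = [g for g in genotypes if g is not None]
    if not called:
        raise ValueError("no called genotypes")
    return sum(1 for g in called if g == 1) / len(called)


def ns_ratio(genotypes, is_nonsynonymous):
    """Ratio of non-synonymous to synonymous sites at which the individual
    carries at least one alternate allele."""
    n = sum(1 for g, ns in zip(genotypes, is_nonsynonymous) if g and ns)
    s = sum(1 for g, ns in zip(genotypes, is_nonsynonymous) if g and not ns)
    if s == 0:
        raise ValueError("no synonymous variants carried")
    return n / s


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        raise ValueError("correlation undefined for constant input")
    return cov / sqrt(vx * vy)


# Toy data: six sites (first three non-synonymous), three hypothetical individuals.
site_is_nonsyn = [True, True, True, False, False, False]
individuals = {
    "ind_A": [1, 0, 0, 1, 1, 1],
    "ind_B": [1, 2, 0, 1, 2, 0],
    "ind_C": [2, 2, 2, 0, 2, 0],
}

het = [heterozygosity(g) for g in individuals.values()]
ns = [ns_ratio(g, site_is_nonsyn) for g in individuals.values()]
print(pearson_r(het, ns) < 0)  # True: lower heterozygosity goes with higher N/S here
```

The toy genotypes are constructed so that the least heterozygous individual carries proportionally more non-synonymous variants, mirroring the direction of the reported correlation; real data would require many more sites and individuals.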

WES Reveals Genes of Coat Color Differentiation

Yan X et al. used WES with a human exome capture kit to analyze 46 individuals from five macaque (Macaca) species endemic to Sulawesi Island, identifying approximately 550 highly differentiated genes (SNP-rich regions in the top 5% of Fst values). These genes are involved in key biological processes such as pigmentation, cell adhesion, signal transduction (e.g., Rho GTPase and TGF-beta pathways), and stress response. Differentiation in pigmentation-related genes was particularly prominent: missense mutations in TYR (tyrosinase; e.g., D132N) and LRIT3 (S394P, Y363D) are shared only by the dark-coated species (M. nigra, M. nigrescens) and may drive coat color differences by affecting the rate-limiting step of melanin synthesis or FGFR1 signaling; missense variants in genes such as MC1R and ASIP may affect receptor function and melanin deposition, highlighting the contribution of genetic differentiation to species-specific traits.

The study also identified a large number of fixed SNPs (Fst = 1) that can serve as species-specific genetic markers. Four pairwise species comparisons revealed a total of 8,380 fixed SNPs (41.18%-45% located in exons), involving 705-861 genes (such as TGM4 and ZFHX3), which can be used for species identification and for monitoring hybridization risk. Fst analysis showed uneven differentiation among species (the highest average Fst, 0.179, was observed in the NgNc comparison), suggesting that local adaptation (such as positive selection on coat color) and genetic drift jointly drive rapid differentiation. These findings provide genomic evidence for understanding primate speciation, the adaptive evolution of coat color, and the conservation of endangered Sulawesi macaques (for example, by maintaining genetic integrity).

Figure 4. The distribution of the top 5% SNPs Fst values across all four pairwise species comparisons (Yan, X et al., 2025)
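
The two Fst-based selections described above (SNPs in the top 5% of Fst values, and fixed SNPs with Fst = 1) can be sketched in a few lines. The example below assumes Hudson's per-SNP estimator computed from allele frequencies; the study may have used a different estimator, and the function names and toy frequencies are hypothetical.

```python
def hudson_fst(p1, p2, n1, n2):
    """Per-SNP Hudson Fst from alternate-allele frequencies p1, p2 and the
    number of sampled chromosomes n1, n2 in each population.

    Returns None when the site is monomorphic for the same allele in both
    populations (denominator of zero).
    """
    if n1 < 2 or n2 < 2:
        raise ValueError("need at least two sampled chromosomes per population")
    num = (p1 - p2) ** 2 - p1 * (1 - p1) / (n1 - 1) - p2 * (1 - p2) / (n2 - 1)
    den = p1 * (1 - p2) + p2 * (1 - p1)
    return num / den if den > 0 else None


def is_fixed_difference(p1, p2):
    """True when the two populations are fixed for alternative alleles."""
    return (p1 == 0.0 and p2 == 1.0) or (p1 == 1.0 and p2 == 0.0)


def top_fraction(fst_by_snp, fraction=0.05):
    """IDs of the SNPs with the highest Fst, keeping at least one SNP."""
    ranked = sorted(fst_by_snp, key=fst_by_snp.get, reverse=True)
    k = max(1, int(len(ranked) * fraction))
    return ranked[:k]


# Toy allele frequencies for two populations, 20 sampled chromosomes each.
freqs = {"snp1": (1.0, 0.0), "snp2": (0.6, 0.4), "snp3": (0.9, 0.1), "snp4": (0.5, 0.5)}
fst = {snp: hudson_fst(p1, p2, 20, 20) for snp, (p1, p2) in freqs.items()}

print(top_fraction(fst, fraction=0.5))                                   # ['snp1', 'snp3']
print([snp for snp, (p1, p2) in freqs.items() if is_fixed_difference(p1, p2)])  # ['snp1']
```

With only four toy SNPs the cutoff is set to 50% so that the selection is visible; with exome-scale data the default `fraction=0.05` corresponds to the top 5% threshold described in the text.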

IV. Technical Advantages and Challenges

  • Advantages:
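    • High Cost-Effectiveness: Single-sample sequencing cost is a fraction of that of WGS (figures from about one-tenth to one-half are quoted, depending on depth and platform), making WES suitable for cohort studies involving thousands of participants.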
    • Highly Efficient Variant Detection: The detection rate for low-frequency variants (<1%) is more than 30% higher than that of SNP arrays.
  • Challenges:
    • Coverage Bias: Exon capture efficiency is affected by genomic repetitive sequences, and some low-frequency variants may be missed.
    • Limitations in Functional Interpretation: Covering only coding regions makes it difficult to assess the role of regulatory variants or non-coding RNAs.

V. Future Development Directions

  • Multi-omics Integration: Combining epigenomic and transcriptomic data to analyze the functional impact of variants. For example, validating the epigenetic regulatory mechanisms of exon variants through methylation analysis.
  • Artificial Intelligence Assistance: Utilizing deep learning to predict the pathogenicity of variants, accelerating disease gene screening.
  • Cross-Species Comparison: Extending to non-model organisms (such as wild primates) to reveal evolutionarily conserved adaptive mechanisms.

Conclusion

WES provides an efficient tool for population genetics and evolutionary research, but its application requires validation from multiple dimensions, including epigenetics and functional experiments. With decreasing sequencing costs and algorithm optimization, WES will play a more central role in areas such as human disease mechanisms and adaptive evolution of species.

People Also Ask

What is the whole exome sequencing test used for?
The whole exome sequencing test is primarily used in clinical diagnostics to identify genetic mutations causing rare inherited disorders, and in research to study cancer genomics, complex diseases, and population genetics.
Is WGS or WES more expensive?
WGS currently costs two to three times as much as WES, but most of the cost of WGS (>90%) is directly related to sequencing whereas WES cost is mainly due to the capture kit.
What diseases can whole exome sequencing detect?
WES is valuable for pediatric patients with conditions such as multiple congenital anomalies, neurodevelopmental disorders, and epilepsy where genetic etiology is suspected.
How long does it take to get results from whole exome sequencing?
Turnaround time is about 5 days, but can vary based on factors such as collection date, sample quality/quantity and completeness of patient information provided.

References

  1. Heutinck PAT, Iglesias AI, Farhud DD, van Tienhoven M, Khoshraftar A, Zarif-Yeganeh M, Kia SK, Ghanbari M, Smoor MA, Klaver CCW, Hoefsloot LH, Thiadens AAHJ, Verhoeven VJM. Diagnostic whole exome sequencing in presumably autosomal recessive inherited retinal dystrophies in an Iranian population. Sci Rep. 2025 Jul 3;15(1):23745.
  2. Bernstein N, Spencer Chapman M, Nyamondo K, Chen Z, Williams N, Mitchell E, Campbell PJ, Cohen RL, Nangalia J. Analysis of somatic mutations in whole blood from 200,618 individuals identifies pervasive positive selection and novel drivers of clonal hematopoiesis. Nat Genet. 2024 Jun;56(6):1147-1155.
  3. He F, Pasam R, Shi F, Kant S, Keeble-Gagnere G, Kay P, Forrest K, Fritz A, Hucl P, Wiebe K, Knox R, Cuthbert R, Pozniak C, Akhunova A, Morrell PL, Davies JP, Webb SR, Spangenberg G, Hayes B, Daetwyler H, Tibbits J, Hayden M, Akhunov E. Exome sequencing highlights the role of wild-relative introgression in shaping the adaptive landscape of the wheat genome. Nat Genet. 2019 May;51(5):896-904.
  4. Duan W, Zhou T, Huang X, He D, Hu M. Whole-exome sequencing uncovers the genetic basis of hereditary concomitant exotropia in ten Chinese pedigrees. BMC Med Genomics. 2025 Jan 7;18(1):4.
  5. Hayakawa T, Kishida T, Go Y, Inoue E, Kawaguchi E, Aizu T, Ishizaki H, Toyoda A, Fujiyama A, Matsuzawa T, Hashimoto C, Furuichi T, Agata K. Genome-scale evolution in local populations of wild chimpanzees. Sci Rep. 2025 Jan 2;15(1):548.
  6. Yan X, Arakawa N, Widayati KA, Purba LHPS, Fahri F, Suryobroto B, Terai Y, Imai H. Exome analysis reveals species divergence in TYR and identifies species genetic markers in five endemic Macaca species on Sulawesi Island. BMC Ecol Evol. 2025 Jul 3;25(1):66.
For research purposes only, not intended for clinical diagnosis, treatment, or individual health assessments.