Two Core Technologies for Epigenetic Clock DNA Methylation Detection

As a key tool for quantifying biological age, the epigenetic clock depends heavily on the performance of the underlying DNA methylation detection technology. Efficient and reliable methylation detection is the foundation for building, validating, and translating clock models. At present, Illumina BeadChip arrays and whole-genome bisulfite sequencing (WGBS) are the two core technologies in this field, and together they support the development of the epigenetic clock from basic research to clinical practice. Among them:

  • The Illumina BeadChip, with its standardized workflow, controllable cost, and close fit with mainstream clock models, has become the gold standard for large-scale cohort studies, consumer age prediction, and similar scenarios, and enables high-throughput quantitative analysis of hundreds of thousands of key CpG loci.
  • WGBS, with single-base resolution and unbiased coverage of the whole genome, is irreplaceable for in-depth research such as discovering new age-related CpG loci, analyzing methylation regulation mechanisms, and precise disease diagnosis.

This article details the two core DNA methylation detection technologies for epigenetic clocks, Illumina BeadChip and WGBS, compares their performance, explains the bioinformatic pipelines, and offers guidance on technology selection.

The Gold Standard: Illumina's BeadChip Arrays

In DNA methylation research, the Illumina BeadChip has earned gold-standard status through more than ten years of technical validation and a mature application ecosystem. Its core products include the Infinium MethylationEPIC (850k) array and the previous-generation 450k array, which achieve high-throughput detection of specific CpG sites through precise probe design. The platform offers controllable cost, stable data, and a well-developed analysis toolchain. It is widely used for epigenetic clock construction and disease mechanism research, and has become a key tool for connecting basic research with clinical translation.

Analysis of the 850k and 450k Chips

Within Illumina's BeadChip series, the 850k chip and its predecessor, the 450k chip, have become the core tools of epigenetic research because of their stable performance and standardized workflow:

  • 450k chip: Launched in 2011 as an early milestone product, this chip made high-throughput methylation detection of large numbers of CpG sites possible for the first time. It covers about 485,000 CpG loci across the genome, including promoter regions (about 76% of RefSeq promoters), CpG islands, and some gene body regions, and it provided key technical support for the first epigenetic clocks.
  • 850k chip: Launched in 2016, the 850k chip is a comprehensive upgrade that expands detection to about 853,000 CpG sites. While retaining the core sites of the 450k chip, it adds about 368,000 sites, with a focus on enhancers, open chromatin regions, and intergenic regions. It also includes disease-related differentially methylated regions (DMRs) and core epigenetic clock sites, so it performs better in applications such as aging assessment and disease diagnosis.

Both chips use the Infinium chemistry, and the methylation state is detected with two probe designs:

  • Type I probe: The methylated and unmethylated states of a CpG site are detected by two different beads.
  • Type II probe: Both states are detected by a single bead.

This combined design ensures detection accuracy while also extending site coverage.

The distribution of chronological age by sex estimated from tooth growth layer groups for the calibration dataset (Bors et al., 2021)

High-Throughput Detection Mechanism of Chip Technology

The Illumina BeadChip is based on microbead array technology, which allows hundreds of thousands of specific CpG loci across the genome to be analyzed simultaneously. Its core carrier is a solid substrate with millions of microbeads, each about 3 μm in diameter, fixed on the surface. Hundreds of thousands of copies of the same oligonucleotide probe are covalently coupled to the surface of each bead, and each CpG site is measured by multiple replicate beads to ensure repeatable and accurate detection. The workflow proceeds from sample pretreatment through probe hybridization and signal detection to quantitative analysis:

  • Sample pretreatment: Genomic DNA is treated with bisulfite, which converts unmethylated cytosine (C) into uracil (U), while methylated cytosine (5mC) remains chemically stable.
  • Specific hybridization: The amplified, fluorescently labeled DNA binds specifically to the chip probes. Probes for the methylated state recognize only DNA strands that retain C, while probes for the unmethylated state are complementary to strands in which C was converted to U (read as T after amplification).
  • Signal acquisition: The fluorescence intensity on the bead surface is measured by a laser scanning system using the Cy3 (green) and Cy5 (red) channels. Type I probes read the methylated and unmethylated signals from two separate beads in the same color channel, whereas Type II probes use a single bead and distinguish the two states by color.
  • Data processing: The methylation level of a single CpG locus is quantified from the ratio of the methylated signal to the total signal. The technique analyzes hundreds of thousands of loci in parallel in a single experiment, the turnaround time for a single sample is 24-48 hours, and the between-batch coefficient of variation (CV) is less than 5%, which provides reliable technical support for large-scale cohort research and clinical application.
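The ratio described in the data-processing step is usually reported as a β value. The following is a minimal sketch, assuming the commonly used form β = M / (M + U + offset), where M and U are the methylated and unmethylated intensities; the offset of 100 and the example intensities are illustrative assumptions, not values taken from this article.

```python
def beta_value(methylated, unmethylated, offset=100.0):
    """Return the methylation beta value for one CpG locus.

    methylated / unmethylated: background-corrected signal intensities.
    offset: small stabilising constant (assumed default of 100) that keeps
            the ratio well behaved when both signals are weak.
    """
    m = max(methylated, 0.0)  # negative intensities are clipped to zero
    u = max(unmethylated, 0.0)
    return m / (m + u + offset)


# Hypothetical intensities for three loci: (methylated, unmethylated)
intensities = {
    "cg_example_1": (9000.0, 900.0),
    "cg_example_2": (500.0, 9400.0),
    "cg_example_3": (4950.0, 4950.0),
}

betas = {cpg: round(beta_value(m, u), 3) for cpg, (m, u) in intensities.items()}
print(betas)
```

With these made-up intensities the first locus is mostly methylated, the second mostly unmethylated, and the third close to half methylated.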

Difference in the survival curves between those with and without PhenoAgeAccel* (Ho et al., 2023)

The Emerging Challenger: WGBS

The Illumina BeadChip has long dominated DNA methylation detection, but its reliance on preset sites limits coverage. WGBS combines bisulfite conversion with high-throughput sequencing, removes the restriction of site selection, and detects cytosine methylation across the whole genome at single-base resolution. With its unbiased coverage, this emerging technology opens a new path for epigenetic research and clock model optimization, and it has become a strong competitor to chip technology.

Principle and Function of Bisulfite Treatment

WGBS is regarded as the gold standard for DNA methylation detection. Its core is bisulfite conversion, which chemically distinguishes methylated cytosine from unmethylated cytosine. The details are as follows:

  • Chemical conversion mechanism: In an acidic environment, bisulfite reacts with unmethylated cytosine in the DNA strand to produce cytosine sulfonate. Deamination then forms uracil sulfonate, and finally U is produced by desulfonation under alkaline conditions. 5mC resists bisulfite modification and keeps its original structure because of the steric effect of the methyl group.
  • PCR amplification and site determination: Under ideal conditions, the bisulfite conversion rate can exceed 99%. When the converted DNA is amplified by PCR, DNA polymerase reads uracil as T, while methylated cytosine remains C. The sequencing result is then compared with the reference genome: if the read shows T where the reference has C, the site is called unmethylated; if both show C, the site is called methylated.
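The calling rule above can be sketched in a few lines. This is a simplified illustration that assumes a single forward-strand read already aligned to the reference with no gaps; the sequence and the methylated position are invented for the example, and real pipelines also handle the reverse strand, sequencing errors, and genetic variants.

```python
def bisulfite_convert(sequence, methylated_positions):
    """Simulate bisulfite conversion followed by PCR on one strand.

    Unmethylated C is read as T; methylated C (5mC) stays C.
    """
    return "".join(
        "T" if base == "C" and i not in methylated_positions else base
        for i, base in enumerate(sequence)
    )


def call_methylation(reference, read):
    """Compare an aligned read with the reference at every reference C."""
    calls = {}
    for i, (ref_base, read_base) in enumerate(zip(reference, read)):
        if ref_base != "C":
            continue
        if read_base == "C":
            calls[i] = "methylated"      # C retained, so it was protected
        elif read_base == "T":
            calls[i] = "unmethylated"    # C converted to U, read as T
        else:
            calls[i] = "ambiguous"       # sequencing error or a true variant
    return calls


reference = "ACGTCCGAC"                  # hypothetical reference fragment
read = bisulfite_convert(reference, methylated_positions={1})
print(read)
print(call_methylation(reference, read))
```

Only the cytosine at index 1 was marked as methylated, so it is the only reference C that survives as C in the simulated read.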

Technical Challenges and Optimization Schemes

  • Technical challenge: The quality of bisulfite treatment directly affects detection accuracy. The main problems are incomplete conversion (which produces false-positive methylation signals), DNA degradation (which hampers sequencing library construction), and over-conversion (which leads to false-negative signals).
  • Optimization strategy: To overcome these challenges, a low-pH, high-concentration bisulfite reagent combined with heat treatment is currently used to improve conversion efficiency, and a DNA protective agent is added to reduce degradation. In addition, positive and negative controls are included to monitor conversion quality.
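One way to monitor conversion quality with a control is to count how many cytosines in a fully unmethylated control were read as T. The sketch below assumes such an unmethylated spike-in control is available; the counts are made up, and the 99% threshold follows the figure quoted in this article.

```python
def conversion_rate(converted_c, unconverted_c):
    """Fraction of control cytosines that were converted (read as T)."""
    total = converted_c + unconverted_c
    if total == 0:
        raise ValueError("no cytosine observations in the control")
    return converted_c / total


# Hypothetical counts from an unmethylated control
control_counts = {"converted_c": 99_650, "unconverted_c": 350}

rate = conversion_rate(**control_counts)
passes = rate >= 0.99  # threshold taken from the text above
print(f"conversion rate = {rate:.4f}, passes QC: {passes}")
```

Because every cytosine in an unmethylated control should convert, any C that remains is a direct estimate of the false-positive methylation rate.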

Coverage distribution of Arabidopsis WGBS (Liao et al., 2015)

Advantages of WGBS Unbiased Detection

As a core technology of epigenetics research, WGBS detects cytosine methylation across the whole genome at single-base resolution by combining bisulfite conversion with high-throughput sequencing. Its technical advantages can be viewed from the following dimensions:

  • A. Comprehensive and unbiased detection range
    • a) Complete site coverage: WGBS is not limited by preset probes as chip technology is, and it can cover all CpG sites in the genome, including promoters, CpG islands, gene bodies, enhancers, insulators, and intergenic regions. It can also detect the methylation status of non-CpG loci (CHG, CHH), which provides key information for studying the epigenetic regulation of complex physiological and pathological processes such as embryonic development and tumorigenesis.
  • B. Advantages in detection accuracy and resolution
    • a) Single-base quantitative analysis: WGBS quantifies methylation at the single-base level. The ratio of methylated reads to total reads at a given site yields a continuous methylation level from 0 to 100%. Compared with chip technology, its detection sensitivity is significantly higher, and it can identify low-frequency methylation variation and mosaic methylation patterns. This supports the discovery of new age-related CpG loci in epigenetic clock research and helps build more accurate clock models.
  • C. Application value in clinical research
    • a) Analysis of disease mechanisms: WGBS shows unique advantages in the diagnosis of developmental and epileptic encephalopathy in children. After preliminary screening with 850k chips found differentially methylated regions related to the CHD2 gene, researchers used WGBS for in-depth analysis, refining the epigenetic characterization of the gene and clarifying the pathogenesis.
    • b) Detection of rare regulatory regions: WGBS can detect methylation changes in rare regulatory regions that the chip cannot cover, and so provides more comprehensive epigenetic information for disease diagnosis and prognosis evaluation. As sequencing costs continue to fall, the use of WGBS in epigenetic clock research is rising markedly.
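The single-base quantification described under B can be written out directly: the methylation level of a site is its methylated read count divided by its total read count. The sketch below uses invented coordinates and counts, and returns None for sites with no coverage rather than guessing a value.

```python
def methylation_level(methylated_reads, total_reads):
    """Methylation level (0.0-1.0) at one cytosine, or None if uncovered."""
    if total_reads == 0:
        return None
    return methylated_reads / total_reads


# Hypothetical per-site counts: (methylated reads, total reads)
site_counts = {
    ("chr1", 10468): (18, 20),
    ("chr1", 10470): (3, 30),
    ("chr1", 10483): (0, 0),   # no reads covered this site
}

levels = {
    site: methylation_level(meth, total)
    for site, (meth, total) in site_counts.items()
}
for site, level in levels.items():
    print(site, "uncovered" if level is None else f"{level:.0%}")
```

The precision of each estimate depends on depth: with 20 reads the level can only move in steps of 5%, which is why low-frequency variation needs deep coverage to be resolved.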

Choosing the Right Tool: Microarray vs. Sequencing

In DNA methylation detection, the choice between chip and sequencing technology largely determines the efficiency and depth of a study.

  • Chip technology is characterized by preset sites, high throughput, and low cost, which makes it suitable for standardized analysis of large-scale cohorts.
  • Sequencing technology offers unbiased whole-genome coverage and single-base resolution, which makes it suitable for discovering new sites and analyzing mechanisms.

Their fundamental differences in coverage, cost, and applicable scenarios mean that technology selection should be weighed against research objectives and available resources.

Chip technology and WGBS differ significantly in their core performance indicators, which directly determines their suitability for different research scenarios, as shown in the following table:

Comparison Between Chip and Sequencing Technology

Performance Indicator | Illumina BeadChip Array | WGBS
Cost | Approximately 1,000–2,000 RMB per sample | Approximately 5,000–15,000 RMB per sample
Throughput | 96–384 samples per run | 8–24 samples per run (depending on sequencing depth)
Genomic Coverage | ~850,000 preselected CpG sites (~3% of the CpG sites in the genome) | All CpG sites across the whole genome (~28 million)
Resolution | Locus-level quantification (β value) | Single-base resolution quantification (methylation rate)
Detection Limit | Difficult to detect methylation variants < 5% | Can detect mosaic methylation events as low as 1%
Starting Sample Requirement | 50–250 ng of genomic DNA | 100–500 ng of genomic DNA

In terms of cost, single-sample detection on the chip costs only about 1/5 to 1/10 as much as WGBS, which is a clear advantage in large-scale cohort research. In terms of throughput, the chip's multi-sample plate format can process hundreds of samples at a time, whereas WGBS is limited by sequencer capacity, so its sample throughput is lower.

The core difference between them lies in coverage: the chip focuses on preset CpG sites in known regulatory regions, while WGBS covers the whole genome without blind spots, which helps in discovering new functional sites. In terms of resolution, WGBS reaches the single-base level and can calculate the methylation rate precisely.

On the chip, by contrast, detection accuracy at some sites is slightly weaker because of probe design. In terms of sensitivity, WGBS can identify low-frequency methylation variation, which makes it suitable for applications such as tumor liquid biopsy, whereas the chip has difficulty detecting methylation variation below 5%.

From Raw Data to Methylation Proportions: The Bioinformatic Pipeline

The raw data generated by DNA methylation detection must pass through a standardized bioinformatics workflow before it becomes quantitative data representing methylation levels. This workflow removes unreliable data through quality control, reduces technical bias and batch effects through normalization, and derives methylation proportions through quantification. It is the link between detection technology and epigenetic clock modeling, and it directly determines the reliability and accuracy of downstream results.

Core Steps of Quality Control

Turning the raw chip or sequencing output into methylation proportions that can be used for age prediction involves three key steps: quality control (QC), normalization, and methylation quantification. The specific methods differ between technology platforms.

Quality control is the basis of data reliability. For chip data, the raw fluorescence signal is extracted with Illumina GenomeStudio software, and samples with an average detection rate above 95%, balanced fluorescence intensity, and normal internal controls are retained. For WGBS data, base quality is evaluated with FastQC (keeping Phred ≥ 30), adapters and low-quality bases are removed with Trim Galore or similar software, and the bisulfite conversion efficiency is required to be higher than 99%.

Normalization reduces technical bias. Chip data are processed with background correction, SWAN normalization, and ComBat batch-effect correction, while WGBS data are normalized by RPM calculation or quantile normalization to even out differences in sequencing depth between samples.

Methylation quantification is the core output. Chip data are quantified as β values (0-1), while WGBS data yield a methylation rate calculated as the proportion of methylated reads, which gives higher accuracy at single-base resolution.
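To illustrate one of the normalization options named above, here is a minimal pure-Python sketch of quantile normalization. It assumes every sample has a value for the same set of loci and ignores ties for simplicity; production pipelines use dedicated, well-tested packages rather than code like this, and the numbers are invented.

```python
def quantile_normalize(samples):
    """Quantile-normalize a list of samples (one list of values per sample).

    Each value is replaced by the mean, across samples, of the values that
    share its rank, so all samples end up with the same distribution.
    """
    n = len(samples[0])
    if any(len(s) != n for s in samples):
        raise ValueError("all samples must cover the same loci")

    sorted_samples = [sorted(s) for s in samples]
    rank_means = [
        sum(s[rank] for s in sorted_samples) / len(samples) for rank in range(n)
    ]

    normalized = []
    for s in samples:
        order = sorted(range(n), key=lambda i: s[i])  # indices by rising value
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = rank_means[rank]
        normalized.append(out)
    return normalized


# Two hypothetical samples measured at the same three loci
sample_a = [2.0, 4.0, 6.0]
sample_b = [3.0, 1.0, 5.0]
norm_a, norm_b = quantile_normalize([sample_a, sample_b])
print(norm_a)
print(norm_b)
```

After normalization both samples contain the same set of values, and only the order of loci within each sample is preserved.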

Transformation from Data to Age Prediction

After preliminary processing, the data pass through three further stages, data filtering, site selection, and feature integration, before they can serve as input to an epigenetic clock model.

  • Data filtering: For chip data, sites with a detection p-value > 0.05 or a detection rate < 90% in more than 5% of samples are removed. For WGBS data, regions with coverage depth below 10× or low reference genome quality are filtered out to reduce noise.
  • Site selection: When applying an existing model (for example in consumer testing), the core sites (such as the 353 sites of the Horvath clock) are extracted directly. When constructing a new model, correlation and differential methylation analyses are combined with elastic net regression to select significantly age-related loci and remove redundancy.
  • Feature integration: Missing values are handled first: a small number of missing values are imputed with the mean or with k-nearest neighbors, and sites with a high missing rate are removed. The β values of the samples are then arranged into a matrix to form the input data set.
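The filtering and feature-integration stages can be sketched as follows. This is a simplified illustration, not a reference implementation: the probe names and values are hypothetical, only the detection p-value rule and mean imputation are shown, and the thresholds (p > 0.05 in more than 5% of samples) follow the text above.

```python
def filter_probes(detection_p, p_cutoff=0.05, max_failed_fraction=0.05):
    """Keep probes whose detection p-value fails in at most 5% of samples."""
    kept = []
    for probe, pvals in detection_p.items():
        failed = sum(p > p_cutoff for p in pvals)
        if failed / len(pvals) <= max_failed_fraction:
            kept.append(probe)
    return kept


def impute_mean(values):
    """Replace missing beta values (None) with the mean of observed ones."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute a probe with no observed values")
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]


# Hypothetical data for three probes across four samples
detection_p = {
    "cg_a": [0.001, 0.002, 0.001, 0.003],
    "cg_b": [0.200, 0.001, 0.300, 0.001],  # fails in 2 of 4 samples
    "cg_c": [0.001, 0.001, 0.002, 0.001],
}
betas = {
    "cg_a": [0.80, None, 0.84, 0.82],
    "cg_b": [0.50, 0.52, 0.49, 0.51],
    "cg_c": [0.10, 0.12, 0.14, 0.12],
}

kept = filter_probes(detection_p)
imputed = {probe: impute_mean(betas[probe]) for probe in kept}

# Rows are samples, columns are retained probes: the clock model's input
n_samples = len(next(iter(betas.values())))
matrix = [[imputed[probe][s] for probe in kept] for s in range(n_samples)]
print(kept)
for row in matrix:
    print([round(v, 2) for v in row])
```

The resulting sample-by-probe matrix is the form in which β values are typically handed to a regression-based clock model.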

Importance of the top 15 biomarker principal components in the biomarker ages for healthy men and women (Chan et al., 2021)

Conclusion

DNA methylation detection technology underpins the development of the epigenetic clock, and the Illumina BeadChip and WGBS form its two complementary poles. The 850k chip has become the gold standard for research and consumer applications because of its strong site coverage, standardized workflow, and controllable cost. WGBS is irreplaceable in basic research because of its single-base resolution and genome-wide coverage.

The choice of technology depends on research objectives and resources. The 850k chip suits large-scale cohorts, consumer-level prediction, and similar scenarios. WGBS is better suited to in-depth research such as mechanism exploration and disease diagnosis. Reduced representation bisulfite sequencing (RRBS) also has unique value in balancing coverage and cost.

The bioinformatics workflow is the key link between detection technology and prediction: standardized processing determines prediction accuracy, and a stable workflow is the basis for standardized application. In the future, epigenetic clock detection technology will move toward higher accuracy, more dimensions, and lower cost. Chips will optimize their site content, WGBS will become cheaper, and multi-omics integration will make assessments more comprehensive, bringing the technology closer to clinical use and health management.

FAQ

1. For large-scale cohort epigenetic clock research, which is better between Illumina BeadChip and WGBS?

Choose Illumina BeadChip. It costs only about 1/5 to 1/10 as much as WGBS per sample, can process 96-384 samples per run, and is better suited to standardized analysis of large cohorts.

2. Can WGBS detect non-CpG loci?

Yes. Besides all CpG sites, it can cover the methylation status of non-CpG loci (CHG, CHH), which chip technology cannot do.

3. What's the key index to judge WGBS data quality?

Bisulfite conversion efficiency, which needs to be over 99%. Base quality (Phred ≥ 30) and coverage depth (regions below 10× are filtered out) also matter.

4. Why can't Illumina BeadChip detect methylation variants below 5%?

It is limited by probe design. Its detection sensitivity is lower than that of WGBS, so low-frequency variants (<5%) are hard to identify.

References

  1. Bors EK, Baker CS, Wade PR, et al. "An epigenetic clock to estimate the age of living beluga whales." Evol Appl. 2021 14(5): 1263-1273.
  2. Ho KM, Morgan DJ, Johnstone M, Edibam C. "Biological age is superior to chronological age in predicting hospital mortality of the critically ill." Intern Emerg Med. 2023 18(7): 2019-2028.
  3. Chan MS, Arnold M, Offer A, et al. "A Biomarker-based Biological Age in UK Biobank: Composition and Prediction of Mortality and Hospital Admissions." J Gerontol A Biol Sci Med Sci. 2021 76(7): 1295-1302.
  4. Liao WW, Yen MR, Ju E, Hsu FM, Lam L, Chen PY. "MethGo: a comprehensive tool for analyzing whole-genome bisulfite sequencing data." BMC Genomics. 2015 16 Suppl 12(Suppl 12): S11.
! For research purposes only, not intended for clinical diagnosis, treatment, or individual health assessments.