MSA Case Study: Combination with EWAS for Methylation Sites Detection

DNA methylation has been one of the most active research areas in both basic science and clinical translation in recent years, and basic research in the field has been highly productive. It also has far-reaching significance for product development and applications such as early tumor screening, auxiliary diagnosis of mental disorders, and epigenetic aging clocks. DNA methylation profiling is likewise important for clarifying the mechanisms of gene transcription, understanding how cell identity is maintained, tracking changes in cell composition, and studying gene-environment interactions in populations. The Infinium DNA methylation chip series developed by Illumina is designed to screen, at the whole-genome level, for loci that are significantly associated with or differ between traits, and to point to the underlying biological mechanisms.

From the early 27K chip, through the later 450K and 850K chips, to the 935K chip released at the end of 2022, steadily expanding genome coverage and optimized design of important regulatory regions have provided a reliable data foundation for EWAS research. These chips have become the most widely used technology for DNA methylation research worldwide. However, as scientific and statistical findings on DNA methylation have accumulated across a wide range of traits, and as demand for translation in health and medicine has grown, Illumina developed a new methylation chip, the Methylation Screening Array (MSA, 270K), at the beginning of 2024, focusing on the screening and validation of diverse human traits.

The article presents a case study on the Illumina Methylation Screening Array (MSA, 270K), detailing its principle, advantages, disadvantages, and application in combination with EWAS for methylation site detection.

Overview of the mammalian methylation array design process (Arneson et al., 2022)

The array draws on a large body of data from previous epigenome-wide association studies (EWAS) performed on the Infinium platform, and it incorporates the latest single-cell and cell-type-resolved knowledge of genome-wide methylation profiles. MSA is designed for scalable screening of epigenetic-trait associations at ultra-high sample throughput. It covers many human traits, including genetic, cellular, environmental, and demographic variables, as well as human diseases such as hereditary, neurodegenerative, cardiovascular, infectious, and immune diseases.

Principle of Methylation Screening Array

The Illumina 270K chip is designed in a highly targeted way, focusing on genomic regions with key regulatory functions. The chip covers 27,578 CpG loci, which act as "signal hubs" in the gene regulatory network and are closely tied to the regulation of gene expression. These loci span about 14,495 genes, with roughly 70% located in promoters, the key regulatory elements for transcription initiation that directly affect gene expression activity. The chip also includes CpG islands, which are regions especially rich in CpG sites, and selected other genomic regions. The methylation status of all these regions plays an important role in regulating gene expression.

Technically, the Illumina 270K chip relies on bisulfite conversion as its core detection method. Treating DNA with sodium bisulfite converts unmethylated cytosine (C) to uracil (U), whereas methylated cytosine (5mC) is chemically protected and remains unchanged. Probes designed against the converted sequences can therefore detect DNA methylation levels accurately and quantitatively.
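The conversion rule described above can be illustrated in silico. The following is a minimal sketch, not part of any Illumina tooling: the function name, the example fragment, and the methylated positions are all hypothetical, and uracil is written as T because that is how it is read after amplification.

```python
def bisulfite_convert(sequence, methylated_positions):
    """Simulate bisulfite conversion of a single DNA strand.

    Unmethylated C is read as T after conversion and amplification
    (C -> U -> T); methylated C (5mC) is protected and stays C.
    `methylated_positions` holds 0-based indices of methylated cytosines.
    """
    methylated = set(methylated_positions)
    converted = []
    for index, base in enumerate(sequence.upper()):
        if base == "C" and index not in methylated:
            converted.append("T")
        else:
            converted.append(base)
    return "".join(converted)


# Hypothetical 12-base fragment with CpG cytosines at positions 2 and 8.
fragment = "ATCGGACTCGCA"
fully_unmethylated = bisulfite_convert(fragment, [])
cpg_at_2_methylated = bisulfite_convert(fragment, [2])
```

After conversion, the two versions of the fragment differ only at the protected cytosine, and that single-base difference is what a probe designed against the converted sequence reads out.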

Methylation profile scores (MPS) (Nabais et al., 2023)

In probe design, the chip uses two probe types, Infinium I and Infinium II:

  • Infinium I probes use two bead types per CpG site, one complementary to the methylated sequence and one to the unmethylated (converted) sequence. Only the probe whose 3' end matches the target is extended with a labeled nucleotide, and the two signals are read in the same color channel.
  • Infinium II probes use a single bead type per CpG site. The probe ends immediately next to the target cytosine, and single-base extension incorporates a differently labeled nucleotide depending on the methylation status, so the two states are read in different color channels.

Finally, the chip quantifies the methylation level of each CpG site as a β value, calculated as β = methylated signal / (methylated signal + unmethylated signal), which ranges from 0 to 1:

  • β = 0 means the site is completely unmethylated.
  • β = 1 means the site is completely methylated.
  • Intermediate values reflect partial methylation.

This quantitative method provides a standardized evaluation index for subsequent data analysis and interpretation of biological significance.
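As a minimal sketch of the formula above, the snippet below computes β from a pair of intensities. The probe names and intensity values are hypothetical. Analysis pipelines commonly add a small constant to the denominator to stabilize low-intensity probes; that offset is left out here so the code matches the formula as stated in the text.

```python
def beta_value(methylated, unmethylated):
    """Return beta = M / (M + U) for one CpG site.

    Returns None when there is no signal at all, because beta is
    undefined in that case.
    """
    total = methylated + unmethylated
    if total <= 0:
        return None
    return methylated / total


# Hypothetical (methylated, unmethylated) intensities for four probes.
intensities = {
    "cg_example_1": (9000.0, 1000.0),  # mostly methylated
    "cg_example_2": (500.0, 4500.0),   # mostly unmethylated
    "cg_example_3": (3000.0, 3000.0),  # half methylated
    "cg_example_4": (0.0, 0.0),        # failed probe, no signal
}
betas = {probe: beta_value(m, u) for probe, (m, u) in intensities.items()}
```

Returning None for a probe with no signal keeps failed measurements from being mistaken for unmethylated sites.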

Methylation Screening Array Advantages and Disadvantages

As an early mainstream methylation detection tool, the Illumina 270K chip centers on 27,578 CpG sites in key regions such as promoters and CpG islands. Its low cost, standardized workflow, and high reproducibility have made it an important choice for large-scale population epigenetic research. However, its coverage is limited (less than 1% of all CpGs in the genome), and it is poorly suited to detecting complex regions and low-frequency variation.

Technical Advantages

  • Cost-effectiveness: Compared with whole-genome bisulfite sequencing (WGBS), the 270K chip costs about 20 times less, and hundreds of samples can be processed in a single experiment, which suits large cohort studies.
  • Targeted coverage: The chip focuses on promoters and CpG islands. Methylation changes in these regions are directly related to the regulation of gene expression, which makes functional interpretation easier. For example, hypermethylation of a tumor suppressor gene promoter often leads to gene silencing, a typical epigenetic marker of cancer.
  • Standardized workflow: Illumina provides a unified experimental and data analysis workflow (such as GenomeStudio software), giving highly reproducible data that compare well across laboratories.

Limitations

  • Limited coverage: Only about 27,000 CpG loci are detected, less than 1% of all CpGs in the human genome, so key regulatory regions (such as distal enhancers) may be missed.
  • Bias: Regions with high GC content are well covered, but repetitive sequences and non-CpG methylation sites are not, which limits the study of complex genomic regions.
  • Limited resolution: The chip cannot distinguish allele-specific methylation, and low-frequency methylation variation is difficult to detect.

Applications of DNA methylation-based trait prediction (Nabais et al., 2023)

CD Genomics provides a DNA Methylation Screening Array (Illumina 270K) service, which covers CpG sites in key functional regions and supports large-scale sample screening. The service is for research use only and does not involve clinical diagnosis or treatment.

270K Chip for EWAS Across a Wide Range of Traits

Title: MSA: scalable DNA methylation screening BeadChip for high-throughput trait association studies

Published in: bioRxiv

Publication date: 2024.05.21

DOI: https://doi.org/10.1101/2024.05.17.594606

As a key epigenetic regulatory mechanism, DNA methylation is closely linked to human phenotypes and diseases and is widely used in clinical diagnosis, risk assessment, and related fields. Existing Infinium methylation chips (such as EPICv2) have advanced large-scale epigenome-wide association studies (EWAS), but there is a trade-off between coverage and detection efficiency: whole-genome sequencing is expensive and hard to deploy widely, while traditional chip designs do not fully incorporate the latest single-cell methylation data and EWAS findings, leaving cell-type-specific markers and phenotype-associated sites under-covered.

To resolve this trade-off, the study designed the MSA, which aims to enable high-throughput phenotypic association screening by integrating large-scale EWAS data with single-cell and bulk whole-genome methylation maps, while retaining support for cell-type analysis, epigenetic clock prediction, and other functions. The goal is a more accurate and economical tool for large-scale population research and clinical application.

Overview of DNA methylation results (David et al., 2024)

MSA is Highly Reproducible and Accurate

For phenotypic association, the team mined data from 1,067 EWAS and selected key association sites covering 16 major disease phenotypes, including neurodegenerative diseases (such as Alzheimer's disease and Parkinson's disease), cardiovascular diseases (coronary atherosclerosis and myocardial infarction), and metabolic diseases (type 2 diabetes and obesity). The design also integrated single-cell methylation sequencing data: cell-type-specific methylation markers were identified with a bioinformatics algorithm, and a marker database was built covering major cell lineages such as immune cells, nerve cells, and epithelial cells.

The final MSA chip contains 284,317 probes covering 269,094 genomic sites, forming a high-density detection network. About 50% of the loci come from findings confirmed in previous large-scale EWAS, which ensures their clinical relevance. The other 50% come from dynamically methylated regions identified by recent single-cell and bulk sequencing, which capture methylation changes during tissue development and disease progression.

MSA reveals the biology of tissue-specific methylation (David et al., 2024)

Compared with the EPICv2 chip, the MSA chip shows clear performance advantages. For phenotypic association, each MSA locus is associated with an average of 5.6 phenotypes, about 2.5 times the 2.2 phenotypes per locus on EPICv2, which greatly improves the efficiency of multi-phenotype association analysis. For cell-specific markers, the MSA chip adds markers for 48 cell-type contrasts, including rare cell subtypes (such as VIP interneurons, with the unique molecular marker SRGAP1).

A multi-probe combination detection strategy raised methylation detection sensitivity for these rare cell types to three times that of traditional methods. These design optimizations give the MSA chip stronger analytical power in areas such as complex disease mechanism research and biomarker discovery.

Methylation detection sensitivity in rare cell types (David et al., 2024)

Analysis of Methylation Profile of Tissue Samples by MSA

MSA also performs strongly in tissue sample analysis. Principal component analysis (PCA) clearly separates the methylation profiles of immune cells, skin, heart, and other tissue types: the first three principal components together explain more than 85% of the variance and form distinct clusters. Combined with a reference-based deconvolution algorithm, MSA can resolve complex tissue composition. In skin samples, for example, it identifies epidermal keratinocytes (78.2%) and melanocytes (12.5%); in heart samples, the proportions of cardiomyocytes (65.3%) and endothelial cells (27.8%) are highly consistent with pathological analysis. This provides high-resolution epigenetic data to support multi-omics research.
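The idea behind deconvolution against a reference can be sketched as a constrained regression: the bulk β values are modeled as a weighted mixture of reference profiles for each cell type, with non-negative weights. The example below is a minimal sketch under that assumption, using projected gradient descent in NumPy. It is not the algorithm used in the study, and the reference matrix, cell-type order, and mixing proportions are hypothetical.

```python
import numpy as np


def deconvolve(reference, bulk, n_iter=2000):
    """Estimate cell-type proportions by non-negative least squares.

    reference: (n_cpgs, n_cell_types) mean beta value per cell type
    bulk:      (n_cpgs,) beta values measured in the mixed tissue
    Solved with projected gradient descent, then rescaled to sum to 1.
    """
    reference = np.asarray(reference, dtype=float)
    bulk = np.asarray(bulk, dtype=float)
    n_types = reference.shape[1]
    props = np.full(n_types, 1.0 / n_types)
    # A step of 1 / (largest eigenvalue of R^T R) guarantees convergence.
    step = 1.0 / np.linalg.eigvalsh(reference.T @ reference).max()
    for _ in range(n_iter):
        gradient = reference.T @ (reference @ props - bulk)
        props = np.clip(props - step * gradient, 0.0, None)
    total = props.sum()
    if total == 0:
        raise ValueError("all estimated proportions are zero")
    return props / total


# Hypothetical reference: 6 marker CpGs (rows) x 3 cell types (columns).
reference = np.array([
    [0.9, 0.1, 0.1],
    [0.8, 0.2, 0.1],
    [0.1, 0.9, 0.2],
    [0.2, 0.8, 0.1],
    [0.1, 0.1, 0.9],
    [0.1, 0.2, 0.8],
])
true_mix = np.array([0.6, 0.3, 0.1])  # hypothetical proportions
bulk = reference @ true_mix           # noise-free simulated bulk sample
estimated = deconvolve(reference, bulk)
```

On noise-free simulated data the estimate recovers the mixing proportions; with real arrays the quality of the result depends mainly on how well the reference markers separate the cell types.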

MSA can also capture the biology of tissue-specific methylation. Using one-versus-rest comparative analysis, the research team systematically identified thousands of highly tissue-specific CpG markers. Cerebellum tissue, for example, shows pronounced hypomethylation, and these hypomethylated sites are highly enriched in brain-specific enhancer regions. Functional annotation further showed that up to 78% of these tissue-specific CpG markers overlap key transcription factor binding sites of the corresponding tissue and that they are significantly associated with genes involved in functions such as neural development and synaptic function, revealing the important regulatory role of tissue-specific methylation in maintaining organ function.

MSA Reveals Dynamic 5hmC Biology in Human Tissues
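A comparison of one tissue against all others can be sketched as a simple filter on mean β values. This is a minimal sketch, assuming a hypothetical nested dictionary of β values and an arbitrary threshold of 0.5; the study's actual statistical procedure is not described here and is likely more involved.

```python
from statistics import mean


def hypomethylated_markers(betas, target, min_delta=0.5):
    """Find CpGs that are hypomethylated in `target` versus every other tissue.

    betas: {tissue: {cpg_id: [beta values across samples]}}
    A CpG qualifies when its mean beta in the target tissue is at least
    `min_delta` below the lowest mean beta among the other tissues.
    Returns {cpg_id: delta}.
    """
    markers = {}
    for cpg, target_values in betas[target].items():
        target_mean = mean(target_values)
        rest_means = [
            mean(tissue_betas[cpg])
            for tissue, tissue_betas in betas.items()
            if tissue != target
        ]
        delta = min(rest_means) - target_mean
        if delta >= min_delta:
            markers[cpg] = round(delta, 3)
    return markers


# Hypothetical beta values for three CpGs in three tissues, two samples each.
betas = {
    "cerebellum": {"cg_a": [0.10, 0.12], "cg_b": [0.80, 0.82], "cg_c": [0.30, 0.34]},
    "heart": {"cg_a": [0.85, 0.87], "cg_b": [0.78, 0.80], "cg_c": [0.60, 0.62]},
    "skin": {"cg_a": [0.90, 0.88], "cg_b": [0.15, 0.17], "cg_c": [0.35, 0.33]},
}
cerebellum_markers = hypomethylated_markers(betas, "cerebellum")
skin_markers = hypomethylated_markers(betas, "skin")
```

Comparing against the lowest mean among the other tissues, rather than their pooled average, keeps a CpG from qualifying when a second tissue is also hypomethylated at that site.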

Another technical advantage of MSA is its compatibility with 5-hydroxymethylcytosine (5hmC) profiling. The researchers profiled different tissues using the ACE-seq protocol. 5hmC levels in brain tissue were significantly higher than in other organs, with an average abundance of 2.3% in whole-brain samples. Distribution analysis showed that 5hmC was highly enriched in gene body regions (accounting for 65%) and actively transcribed regions. Across samples in different proliferation states, the 5hmC level was negatively correlated with the cell proliferation rate (r = -0.72, p < 0.001), which provides important evidence for a distinct regulatory role of 5hmC in cell identity maintenance and differentiation.
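The negative relationship between 5hmC level and proliferation rate is summarized by a Pearson correlation coefficient. The snippet below is a minimal sketch of that calculation; the sample values are invented for illustration and are not data from the study.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sum((a - mean_x) ** 2 for a in x)
    spread_y = sum((b - mean_y) ** 2 for b in y)
    return covariance / sqrt(spread_x * spread_y)


# Hypothetical samples: proliferation rate (arbitrary units) and 5hmC level (%).
proliferation_rate = [0.1, 0.5, 2.0, 6.0, 10.0]
hmc_level = [2.4, 1.8, 1.1, 0.6, 0.3]
r = pearson_r(proliferation_rate, hmc_level)
```

A value of r close to -1 indicates that samples with faster proliferation tend to carry less 5hmC; the coefficient alone says nothing about significance, which needs a separate test.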

MSA reveals the dynamic biology of 5-hydroxymethylation (David et al., 2024)

Predicting Age and Mitotic History Using Epigenetic Clocks

MSA also performs well in epigenetic clock applications. The chip retains more than 95% of the probes used by 12 epigenetic clocks, forming a high-density detection matrix. Taking the Horvath clock as an example, the MSA-based prediction model shows a highly significant correlation between predicted and actual age (Pearson correlation coefficient 0.87, P < 0.001), which confirms its accuracy in biological age assessment. For inferring cell division history, MSA can capture differences in proliferative activity between tissues from changes in methylation patterns: the predicted division rates of colon epithelial cells and T lymphocytes are 12.3 and 9.8 divisions per month, respectively, far higher than those of brain tissue (0.05 divisions/month) and myocardial tissue (0.1 divisions/month), which is consistent with known results.
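An epigenetic clock of the Horvath type is a weighted sum of β values followed by a transformation back to years. The sketch below assumes that general form: the CpG names, coefficients, and intercept are hypothetical placeholders (a real clock uses its published coefficient table), while the inverse transform with an adult age of 20 follows the form published for the Horvath clock. The coverage helper shows how one might check what fraction of a clock's CpGs are present on an array.

```python
from math import exp

ADULT_AGE = 20  # constant used in the Horvath age transformation


def inverse_age_transform(score, adult_age=ADULT_AGE):
    """Map the linear clock score back to age in years."""
    if score < 0:
        return (1 + adult_age) * exp(score) - 1
    return (1 + adult_age) * score + adult_age


def clock_coverage(coefficients, array_probes):
    """Fraction of a clock's CpGs that are present on the array."""
    present = sum(1 for cpg in coefficients if cpg in array_probes)
    return present / len(coefficients)


def predict_age(betas, coefficients, intercept):
    """Weighted sum of beta values plus intercept, then back-transformed."""
    missing = [cpg for cpg in coefficients if cpg not in betas]
    if missing:
        raise KeyError(f"missing clock CpGs: {missing}")
    score = intercept + sum(w * betas[cpg] for cpg, w in coefficients.items())
    return inverse_age_transform(score)


# Hypothetical two-CpG clock; real clocks use hundreds of CpGs.
coefficients = {"cg_x": 2.0, "cg_y": -1.0}
intercept = 0.5
sample_betas = {"cg_x": 0.5, "cg_y": 0.5, "cg_z": 0.9}

coverage = clock_coverage(coefficients, set(sample_betas))
predicted_age = predict_age(sample_betas, coefficients, intercept)
```

Raising an error on missing CpGs, instead of silently skipping them, makes it obvious when an array does not carry every probe a given clock expects.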

Epigenetic clocks predict age and mitotic history (David et al., 2024)

MSA Methylomes Reveal Strong Tissue Contexts

For genetic-epigenetic regulation, the research team integrated MSA data with the GWAS database. Colocalization analysis showed that more than 78% of phenotype-associated SNP loci were significantly colocalized with methylation markers of the corresponding tissues. In diabetes-related research, for example, methylation-level fluctuations at 23 key SNP loci enriched in pancreatic β cells correlated significantly with insulin secretion function (r = -0.62, p = 1.2 × 10^-8). This points to DNA methylation as a molecular bridge between genetic risk factors and disease phenotypes and supports the value of MSA for dissecting complex regulatory networks.
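As a rough illustration of how SNPs and methylation markers can be compared by genomic position, the sketch below counts the fraction of SNPs that fall within a fixed window of any marker CpG. This positional overlap is a deliberate simplification and not a statistical colocalization test; the function name, the 1 kb window, and all coordinates are hypothetical.

```python
from bisect import bisect_left


def fraction_near_markers(snps, cpgs, window=1000):
    """Fraction of SNPs lying within `window` bp of any marker CpG.

    snps, cpgs: lists of (chromosome, position) tuples.
    """
    positions_by_chrom = {}
    for chrom, pos in cpgs:
        positions_by_chrom.setdefault(chrom, []).append(pos)
    for positions in positions_by_chrom.values():
        positions.sort()

    hits = 0
    for chrom, pos in snps:
        positions = positions_by_chrom.get(chrom, [])
        # First CpG at or beyond the left edge of the window.
        i = bisect_left(positions, pos - window)
        if i < len(positions) and positions[i] <= pos + window:
            hits += 1
    return hits / len(snps) if snps else 0.0


# Hypothetical coordinates for tissue-marker CpGs and trait-associated SNPs.
marker_cpgs = [("chr1", 1000), ("chr1", 50000), ("chr2", 7000)]
trait_snps = [("chr1", 1500), ("chr1", 20000), ("chr2", 7900), ("chr3", 100)]
overlap = fraction_near_markers(trait_snps, marker_cpgs)
```

Sorting the CpG positions once per chromosome and using binary search keeps the lookup fast even for hundreds of thousands of sites.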

Strong tissue context of human trait associations (David et al., 2024)

Conclusion

Although the Illumina 270K chip has limitations such as limited coverage, as a milestone tool for early methylation research it has laid an important foundation for analyzing epigenetic mechanisms in cancer, neurodegenerative diseases, and other fields. Benchmarking shows that MSA is an accurate, reproducible, and scalable next-generation Infinium human methylation screening chip aimed at trait discovery in population-scale settings. MSA is expected to become a valuable tool for trait-associated methylation screening in large populations and to be widely used to analyze the cell-type-specific mechanisms of human diseases.

References

  1. David CG, Cameron C, et al. "MSA: scalable DNA methylation screening BeadChip for high-throughput trait association studies." bioRxiv. 2024. https://doi.org/10.1101/2024.05.17.594606
  2. Nabais MF, Gadd DA, Hannon E, et al. "An overview of DNA methylation-derived trait score methods and applications." Genome Biol. 2023. https://doi.org/10.1186/s13059-023-02855-7
  3. Arneson A, Haghani A, Thompson MJ, et al. "A mammalian methylation array for profiling methylation levels at conserved sequences." Nat Commun. 2022;13(1):783. https://doi.org/10.1038/s41467-022-28355-z
For research purposes only, not intended for clinical diagnosis, treatment, or individual health assessments.