In epigenetics, DNA methylation is a key epigenetic mark in the regulation of gene expression, and genome-wide detection technologies have evolved around analyzing methylation maps more efficiently and accurately. Traditional whole-genome bisulfite sequencing (WGBS) detects methylation at single-base resolution through bisulfite conversion, but it is limited by severe DNA degradation and high DNA input requirements. As demand for analyzing low-input samples has grown rapidly, enzymatic methyl-sequencing (EM-seq) has emerged as an option for methylation analysis of trace DNA, because enzymatic conversion treats DNA much more gently.
At the same time, post-bisulfite adaptor tagging (PBAT) suits extremely low-input scenarios such as single cells by optimizing library construction, while Oxford Nanopore sequencing adds a new dimension to methylation analysis through its long reads. These technologies differ in conversion principle, data quality, and application scenarios, and clarifying these differences is important for advancing epigenetics research and its clinical translation.
This article compares the performance of EM-seq with other DNA methylation detection technologies, highlighting the advantages of EM-seq for low-input DNA and GC-rich regions and its applications in plant and clinical research.
Among DNA methylation detection methods, EM-seq protects DNA by using enzymatic conversion, which makes it suitable for GC-rich regions. WGBS relies on bisulfite treatment and is the gold standard for genome-wide methylation detection, but it shows GC bias. The EPIC array targets more than 930,000 CpG sites and suits large sample cohorts, while ONT direct sequencing has no GC bias and its long reads suit complex regions. Each method has its own advantages and disadvantages.
Principle (EM-seq): Enzymatic reactions replace bisulfite treatment. First, the TET2 enzyme oxidizes methylated cytosine (5mC) to derivatives such as 5-hydroxymethylcytosine (5hmC), 5-formylcytosine and 5-carboxylcytosine, and T4-BGT glucosylates 5hmC; these modified bases are protected from deamination. APOBEC then deaminates unmethylated cytosine to uracil. After amplification and sequencing, unmethylated sites therefore read as thymine (T), while methylated sites remain cytosine (C).
Advantages: Avoids the DNA degradation caused by bisulfite, covers GC-rich regions more evenly, works with low-input DNA (down to the picogram level), and gives more accurate methylation quantification.
Disadvantages: The workflow is long (2-4 days), the cost is higher than WGBS, and the libraries are designed for the Illumina sequencing platform.
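The read-out logic in the EM-seq principle can be illustrated with a short Python sketch. It assumes a single top-strand read with a known, hypothetical set of methylated positions and ignores sequencing errors and the opposite strand; the same read-out logic applies to bisulfite-converted WGBS reads.

```python
# Sketch only: simulates the read-out of EM-seq (or bisulfite) conversion on the
# top strand of a short sequence. The set of methylated positions is a
# hypothetical input; real pipelines infer methylation from aligned reads.

def convert_read(seq: str, methylated_positions: set[int]) -> str:
    """Return the sequence as read after conversion and PCR.

    Unmethylated C is deaminated to U and read as T after amplification;
    methylated C (protected by TET2/T4-BGT in EM-seq) stays C.
    """
    out = []
    for i, base in enumerate(seq.upper()):
        if base == "C" and i not in methylated_positions:
            out.append("T")
        else:
            out.append(base)
    return "".join(out)


def call_methylation(reference: str, converted: str) -> dict[int, bool]:
    """Call each reference C: observed C means methylated, observed T means unmethylated."""
    calls = {}
    for i, (ref_base, obs_base) in enumerate(zip(reference.upper(), converted.upper())):
        if ref_base != "C":
            continue
        if obs_base == "C":
            calls[i] = True
        elif obs_base == "T":
            calls[i] = False
    return calls


ref = "ACGTCGACCG"
read = convert_read(ref, methylated_positions={1, 8})
print(read)                          # ACGTTGATCG
print(call_methylation(ref, read))   # {1: True, 4: False, 7: False, 8: True}
```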
Principle (WGBS): This is the classical method. DNA is treated with bisulfite, which converts unmethylated cytosine into uracil while leaving methylated cytosine unchanged; methylated sites are then identified by aligning the sequencing reads to the reference genome.
Advantages: The technology is mature and provides single-base resolution across the whole genome, with abundant reference data and relatively low cost.
Disadvantages: Bisulfite treatment fragments DNA, GC-rich regions are insufficiently covered, methylation levels tend to be overestimated, and DNA input requirements are high (100 ng or more).
Methylated DNA is enriched in WGBS data (Ji et al., 2014)
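A minimal Python sketch of how per-site methylation levels are typically derived from converted reads, and why incomplete conversion inflates them. It assumes C and T read counts per site are already available; the unmethylated spike-in control used to estimate the conversion rate is an assumption for illustration, not a step described in this article.

```python
from typing import Optional

# Sketch assuming per-site counts of reads showing C (methylated) and T
# (unmethylated) have already been extracted from aligned WGBS reads.

def methylation_level(c_count: int, t_count: int) -> Optional[float]:
    """Fraction methylated at one cytosine: C / (C + T); None if uncovered."""
    depth = c_count + t_count
    return c_count / depth if depth else None


def conversion_rate(c_count: int, t_count: int) -> float:
    """Conversion rate on DNA known to be unmethylated (e.g. a spike-in control):
    every C should read as T, so any remaining C is unconverted."""
    return t_count / (c_count + t_count)


print(methylation_level(18, 2))   # 0.9
# 995 T vs 5 C on unmethylated control DNA -> 99.5% conversion; the 0.5% of
# unconverted C would be miscounted as methylation in the sample.
print(conversion_rate(5, 995))    # 0.995
```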
Principle (EPIC array): Based on microarray technology, probes target about 935,000 CpG sites (mainly in gene promoters, coding regions, and other functional regions), and methylation levels are measured from fluorescence signals.
Advantages: Low cost, suitable for large sample size analysis, standardized experimental flow, and simple data analysis.
Disadvantages: Probes are predefined and cannot cover the whole genome; probes in GC-rich regions are prone to cross-hybridization, which leads to overestimation; and extreme methylation states (beta values close to 0 or 1) cannot be detected reliably.
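Array methylation levels are usually reported as beta values, with M-values as a common alternative. The sketch below uses the conventional definitions (an offset of 100 for beta and an alpha of 1 for M); these defaults are assumptions and specific analysis software may differ. It also shows why beta values are compressed near 0 and 1.

```python
import math

# Sketch of two per-probe summaries commonly derived from methylation array
# intensities. meth/unmeth are the methylated and unmethylated channel signals;
# offset=100 and alpha=1 are conventional defaults, assumed here.

def beta_value(meth: float, unmeth: float, offset: float = 100.0) -> float:
    """Beta = M / (M + U + offset), with negative intensities clipped to 0."""
    m, u = max(meth, 0.0), max(unmeth, 0.0)
    return m / (m + u + offset)


def m_value(meth: float, unmeth: float, alpha: float = 1.0) -> float:
    """M-value = log2((M + alpha) / (U + alpha)), intensities clipped to 0."""
    m, u = max(meth, 0.0), max(unmeth, 0.0)
    return math.log2((m + alpha) / (u + alpha))


print(beta_value(9000, 900))   # 0.9
# Near the extremes beta barely moves while the M-value still changes clearly:
for meth in (20000, 40000):
    print(round(beta_value(meth, 200), 4), round(m_value(meth, 200), 2))
```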
Principle (ONT): This direct sequencing technology distinguishes methylated from unmethylated cytosine by changes in the ionic current as DNA passes through the nanopore, reading the methylation state of long DNA molecules directly without chemical conversion.
Advantages: Long reads (kilobase scale), no GC bias, suitability for complex genomic regions (such as repetitive sequences and high-GC regions), and fast sequencing (real-time data output).
Disadvantages: The cost is relatively high, and data accuracy still has room for improvement in some scenarios; sample quality requirements are strict, and low-quality samples may compromise detection.
Structure of the Oxford Nanopore (Levkova et al., 2023)
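Nanopore methylation calls are typically produced per read and then pooled per site. This Python sketch assumes a hypothetical list of (site, probability of 5mC) pairs and illustrative confidence thresholds; it is not the output format or algorithm of any specific ONT tool.

```python
from collections import defaultdict

# Sketch assuming each nanopore read yields (site, probability_of_5mC) pairs.
# The 0.8 / 0.2 thresholds are illustrative assumptions, not values from the
# studies cited here.

def site_methylation_frequency(calls, high=0.8, low=0.2):
    """Pool per-read calls into a per-site methylation frequency.

    Calls with low < p < high are treated as ambiguous and skipped;
    sites with only ambiguous calls are omitted from the result.
    """
    methylated = defaultdict(int)
    confident = defaultdict(int)
    for site, p in calls:
        if p >= high:
            methylated[site] += 1
            confident[site] += 1
        elif p <= low:
            confident[site] += 1
    return {site: methylated[site] / n for site, n in confident.items()}


calls = [(100, 0.95), (100, 0.90), (100, 0.10), (100, 0.50),
         (250, 0.05), (250, 0.15)]
print(site_methylation_frequency(calls))
```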
In DNA methylation detection, EM-seq differs markedly from established methods. WGBS depends on bisulfite, which causes DNA degradation and GC bias; the EPIC array is limited by its predefined probes; and ONT faces analytical complexity and cost issues. EM-seq balances accuracy and coverage through enzymatic conversion, especially in GC-rich regions, and offers a new option for methylation research.
Title: Comparison of EM-seq and PBAT methylome library methods for low-input DNA
Journal: Epigenetics
Impact Factor: 2.9
Publication Date: 2022.10.17
DOI: https://doi.org/10.1080/15592294.2021.1997406
In this study, EM-seq and PBAT were systematically compared across several dimensions. For sequencing output, EM-seq showed a higher library conversion rate, and its effective sequencing output was about 25% higher than that of PBAT at the same 10 ng DNA input. For methylation accuracy, the Pearson correlation coefficient between PBAT methylation levels at CG sites and traditional WGBS was 0.92, slightly higher than the 0.89 obtained with EM-seq.
However, at CHG and CHH sites, EM-seq showed significantly higher detection sensitivity and identified 18% of rare methylation sites missed by PBAT. In the reproducibility evaluation of replicate samples, the intraclass correlation coefficient (ICC) of both methods exceeded 0.85, indicating good stability. In data analysis, PBAT data showed low GC bias, which simplifies subsequent correction, whereas EM-seq performed better at recognizing methylation signals in low-complexity regions owing to its enzymatic reaction mechanism.
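Correlation values such as the 0.92 and 0.89 above are Pearson coefficients between per-site methylation levels. A minimal sketch, assuming each method's results are a dict from site ID to methylation level and that only sites shared by both methods are compared; the site IDs and values are invented for illustration.

```python
import math

# Sketch: Pearson correlation between per-site methylation levels from two
# methods, computed only over sites present in both. Inputs are hypothetical
# dicts mapping a site ID to a methylation level between 0 and 1.

def pearson(x, y):
    """Pearson r = cov(x, y) / (sd(x) * sd(y)).

    Raises ZeroDivisionError if either input is constant.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with >= 2 values")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def method_agreement(levels_a, levels_b):
    """Correlate two methods over their shared sites."""
    shared = sorted(levels_a.keys() & levels_b.keys())
    return pearson([levels_a[s] for s in shared], [levels_b[s] for s in shared])


em_seq = {"cg1": 0.10, "cg2": 0.50, "cg3": 0.90, "cg4": 0.30}
wgbs = {"cg1": 0.15, "cg2": 0.55, "cg3": 0.80}
print(round(method_agreement(em_seq, wgbs), 3))
```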
EM-seq libraries performed better in regards to library and sequencing quality (Han et al., 2022)
Because it relies on enzymatic conversion, EM-seq avoids the severe DNA degradation caused by traditional bisulfite treatment through a cascade of enzymatic reactions (TET2 oxidation and T4-BGT glucosylation, followed by APOBEC deamination). Reported data show that the average insert size after EM-seq treatment reaches 300-500 bp, compared with 100-200 bp after bisulfite treatment, which substantially improves the accuracy of aligning sequencing reads to the reference genome. With 1-10 ng of input DNA, the library duplication rate can be kept below 10%, and library complexity is more than 30% higher than with the traditional method, ensuring uniform genome coverage. The technique is therefore especially suitable for precious trace samples.
PBAT relies on traditional bisulfite conversion chemistry, which deaminates unmethylated cytosine under harsh conditions that also fragment DNA, and this fragmentation is more pronounced in low-input samples. When DNA input is below 50 ng, the PBAT library duplication rate often exceeds 25%, so much of the sequencing data is redundant and the effective coverage depth drops. In addition, fragmented DNA is prone to amplification bias, leading to insufficient coverage of some genomic regions (such as promoters rich in CpG islands). Despite these limitations, PBAT remains irreplaceable for studies of single-cell methylation heterogeneity, such as research on early embryonic development, thanks to its single-cell resolution.
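The duplication rate directly limits effective coverage. The sketch below uses a simplified duplicate definition (same chromosome, start position and strand), which is an assumption; dedicated tools such as Picard MarkDuplicates apply more elaborate rules.

```python
# Simplified sketch of duplicate marking: reads that share the same chromosome,
# start position and strand are treated as PCR duplicates. Real tools use more
# information; this is illustrative only.

def duplication_rate(read_keys):
    """read_keys: iterable of (chrom, start, strand) tuples, one per read."""
    keys = list(read_keys)
    if not keys:
        return 0.0
    return 1 - len(set(keys)) / len(keys)


def effective_reads(total_reads, dup_rate):
    """Reads left after removing duplicates, i.e. usable for coverage."""
    return total_reads * (1 - dup_rate)


reads = [("chr1", 100, "+"), ("chr1", 100, "+"), ("chr1", 250, "-"), ("chr2", 50, "+")]
print(duplication_rate(reads))                  # 0.25
# Same sequencing effort at duplication rates of 10% and 25%:
print(effective_reads(100_000_000, 0.10))
print(effective_reads(100_000_000, 0.25))       # 75000000.0
```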
CpG coverage and overlap (Han et al., 2022)
For low-input whole-genome methylation analysis, EM-seq is the first recommendation, especially for clinical studies with extremely scarce samples, such as biopsy tissue from cancer patients, chorionic villus samples for prenatal diagnosis, or trace body fluid samples from patients with rare diseases. When the research goal is single-cell methylation characteristics and higher data redundancy is acceptable, PBAT can serve as an alternative. In practice, the two technologies can also be combined: for example, EM-seq and PBAT libraries can be built from the same batch of samples and the data integrated to achieve both genome-wide coverage and single-cell resolution.
Correlation of DNA methylation levels. Heatmaps illustrating correlation coefficients between input amounts (1, 2, 5 and 10 ng) and library methods (EM-seq and PBAT). CpGs covered by at least 5X and 10X were considered (Han et al., 2022)
Title: Efficient and accurate determination of genome-wide DNA methylation patterns in Arabidopsis thaliana with enzymatic methyl sequencing
Journal: Epigenetics & Chromatin
Impact Factor: 4.2
Publication Date: 2020.10.07
DOI: https://doi.org/10.1186/s13072-020-00361-9
In this study, EM-seq and WGBS were compared across multiple dimensions for low-input DNA methylation detection in Arabidopsis thaliana. In detection sensitivity, EM-seq captured methylation sites from samples with as little as 10 ng DNA significantly more efficiently than WGBS: across the CG, CHG, and CHH contexts, it detected on average 32% more methylated sites. The consistency evaluation showed that methylation levels from the two technologies correlated at 0.89 in high-input DNA samples.
However, when DNA input fell below 50 ng, the technical reproducibility of WGBS decreased significantly (its coefficient of variation, CV, increased by 45%), while EM-seq maintained stable performance. At single-base resolution, the methylation-status misclassification rate of EM-seq under low-input conditions was only 2.1%, nearly 64% lower than the 5.8% of WGBS, an advantage that is particularly valuable for analyzing tissue-specific methylation patterns. Based on these results, EM-seq shows higher reliability and detection efficiency for low-input DNA methylation detection in Arabidopsis thaliana.
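Both the 45% increase in CV and the "nearly 64% lower" misclassification rate are relative changes. A short worked example in Python; the replicate values passed to the CV function are placeholders, not study data.

```python
from statistics import mean, stdev

# Worked arithmetic for the figures quoted above. The replicate values below
# are made-up placeholders used only to show the CV formula.

def coefficient_of_variation(values):
    """CV = sample standard deviation / mean."""
    return stdev(values) / mean(values)


def relative_change(before, after):
    """Relative change from `before` to `after` (negative = reduction)."""
    return (after - before) / before


# Misclassification rate: WGBS 5.8% vs EM-seq 2.1%
print(f"{relative_change(5.8, 2.1):.1%}")   # -63.8%
print(round(coefficient_of_variation([0.70, 0.72, 0.68]), 4))
```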
Quality comparison between EM-seq and WGBS (Feng et al., 2020)
With its mild enzymatic conversion system, EM-seq maintains highly uniform CpG coverage even with 5 ng of starting DNA, effectively reducing the information loss caused by DNA degradation. In contrast, although WGBS retains its gold-standard advantages with high-input samples, it shows obvious GC bias and sequence loss in low-input scenarios. From a data analysis perspective, EM-seq produces lower background noise and identifies the methylation status of CHH and CHG sites more accurately, providing a reliable way to study non-CG methylation.
Notably, the two techniques agreed closely on methylation levels at CG and CHG sites (R² = 0.89), but their results at CHH sites differed significantly (p < 0.01), which may be attributable to the damaging effect of WGBS bisulfite conversion on single-stranded DNA. In terms of cost, EM-seq offers better overall cost-effectiveness for precious or trace samples, since it does not require additional DNA amplification steps.
Coverage comparison between EM-seq and WGBS (Feng et al., 2020)
In summary, EM-seq can serve as the preferred technology for low-input DNA methylation studies in Arabidopsis thaliana and other model plants, especially for samples from early developmental stages, single-cell-derived DNA, or degraded samples. Future research could explore the potential of EM-seq in other species and complex biological processes, and combine it with long-read sequencing to jointly analyze methylation sites and chromatin structure.
Differences in differentially methylated regions (DMRs) between EM-seq and WGBS (Feng et al., 2020)
Title: Comparing methylation levels assayed in GC-rich regions with current and emerging methods
Journal: BMC Genomics
Impact Factor: 3.5
Publication Date: 2024.07.30
DOI: https://doi.org/10.1186/s12864-024-10605-7
A genome-wide consistency analysis found that the methylation beta values measured by the four methods were highly correlated. The correlation coefficient r between EM-seq and WGBS ranged from 0.826 to 0.906, while r between EPIC and EM-seq exceeded 0.96, indicating that the methods agree well on overall trends. Notably, in terms of intra-group stability, the intra-group correlation of EM-seq reached 0.885 ± 0.007, higher than that of WGBS (0.844 ± 0.007), and the difference was statistically significant (p < 0.05), confirming that EM-seq has better replicate reproducibility.
Per-position methylation levels assayed using EM-seq, WGBS and EPIC are well correlated and mostly independent of GC% context (Guanzon et al., 2024)
Across genomic regions with different GC content, the methods showed clear performance differences. EM-seq performed well in high-GC regions (55-95% GC), with coverage significantly higher than that of WGBS (p < 0.01), owing to its enzymatic treatment mechanism, which effectively overcomes amplification obstacles in high-GC regions. However, WGBS showed a slight advantage in low-GC regions (below 35% GC), where its coverage was slightly higher than that of EM-seq. In addition, EPIC overestimated methylation levels in regions with more than 75% GC content. Further analysis showed that 8.5% of its high-GC probes suffer from cross-reactivity, which can cause misinterpretation of detection signals and thus inaccurate methylation estimates.
Methylation levels are well correlated in all biologically relevant genomic contexts across EM-seq, WGBS and EPIC (Guanzon et al., 2024)
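Coverage comparisons by GC content start by binning genomic windows by their GC fraction. The bin edges below (35%, 55%, 75%) follow the thresholds quoted above; the window sequences and depths are hypothetical inputs.

```python
from collections import defaultdict
from statistics import mean

# Sketch: group genomic windows by GC fraction and report mean depth per bin.
# Window sequences and depths are hypothetical; N bases are ignored in the
# GC fraction.

def gc_fraction(seq):
    """(G + C) / (A + C + G + T), ignoring other characters such as N."""
    seq = seq.upper()
    acgt = sum(seq.count(b) for b in "ACGT")
    return (seq.count("G") + seq.count("C")) / acgt if acgt else 0.0


def gc_bin(gc):
    """Assign a GC fraction to one of four bins."""
    if gc < 0.35:
        return "<35%"
    if gc < 0.55:
        return "35-55%"
    if gc < 0.75:
        return "55-75%"
    return ">=75%"


def coverage_by_gc(windows):
    """windows: iterable of (sequence, mean_depth) pairs -> {bin: mean depth}."""
    groups = defaultdict(list)
    for seq, depth in windows:
        groups[gc_bin(gc_fraction(seq))].append(depth)
    return {b: mean(d) for b, d in groups.items()}


windows = [("AATTAATTAC", 30), ("GCGCGCGCGA", 12), ("GCGCGCGCGC", 8)]
print(coverage_by_gc(windows))   # {'<35%': 30, '>=75%': 10}
```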
EM-seq and ONT showed significant advantages in methylation detection in GC-rich regions. EM-seq avoids the amplification bias that traditional bisulfite treatment introduces in high-GC regions and achieves accurate quantification at single-base resolution, making it especially suitable for accurately assessing methylation levels of tumor markers. ONT, based on nanopore sequencing, directly reads methylation information from long DNA fragments (>10 kb), which makes it irreplaceable for unbiased analysis of structurally complex genomic regions (such as gene clusters and repetitive sequences).
In contrast, WGBS coverage drops significantly (by about 30% on average) because bisulfite deaminates high-GC sequences inefficiently, while the EPIC array has a detection blind area of about 20% owing to probe design limitations. Nevertheless, the genome-wide coverage of WGBS and the low cost and high throughput of EPIC keep them dominant in routine methylation screening of large sample cohorts. These results provide a quantitative reference for choosing methylation detection methods in different scenarios, such as basic research and clinical diagnosis.
Picking the right tool for the job (Guanzon et al., 2024)
In summary, EM-seq shows unique advantages in methylation detection because its enzymatic conversion protects DNA integrity and covers GC-rich regions uniformly. Compared with the bisulfite bias of WGBS, the probe limitations of the EPIC array, and the high cost and complexity of ONT, EM-seq strikes a better balance between accuracy and practicality and is especially suitable for in-depth study of GC-rich regions such as CpG islands. As the technology continues to improve, EM-seq is expected to play a larger role in discovering cancer epigenetic markers and elucidating the mechanisms of monogenic diseases, providing more reliable technical support for epigenetics research.
References: